new release

Matthew Fluet fluet@CS.Cornell.EDU
Thu, 11 Jan 2001 10:25:12 -0500 (EST)


> * Figure out what to do about Matthew's (or some other new) contification
>   algorithm.

I've looked at contify.fun a few times thinking about trying out my
revision of the algorithm.  I don't think it would be that hard to modify
the code.

> * I inserted a cast on line 84 of x86codegen.h.  The new main.sml doesn't
>   compile the C file -wa and was getting an error.

Looks fine.

> * I noticed a few uses of List.insertionSort in your code.  I don't know how
>   big the lists are, but if they are big, it could be costing us.  You might
>   want to look into using MergeSort.sort, which is in my library.

I looked at all of these.  The one in x86.fun just sorts the profile
information (this should make it easier for mlprof to parse the profile
labels); there are usually only 2 entries, and occasionally 3.

The ones in allocate-registers.fun sort potential registers (of which
there are at most 8).

The ones in simplify.fun might be a little larger.
Two of them sort the pseudo-regs live into a block; in the cases I've
looked at, this usually only goes up to 15 or so.
The last one sorts the cases of a Switch statement; this could be large,
particularly when picking up some of the cases produced by mlyacc.  I'll
look into changing those to merge sort.
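
To put rough numbers on why the list sizes matter, here is a minimal
sketch that counts comparisons for an insertion sort and a merge sort.
It is written in Python purely for illustration; the function names are
hypothetical, and it is not the code behind List.insertionSort or
MergeSort.sort.

```python
def insertion_sort(xs, le):
    """Return (sorted copy of xs, number of calls to le)."""
    out = []
    comparisons = 0
    for x in xs:
        i = len(out)
        # Walk left past every element that must come after x.
        while i > 0:
            comparisons += 1
            if le(out[i - 1], x):
                break
            i -= 1
        out.insert(i, x)
    return out, comparisons


def merge_sort(xs, le):
    """Return (sorted copy of xs, number of calls to le)."""
    comparisons = 0

    def sort(ys):
        nonlocal comparisons
        if len(ys) <= 1:
            return list(ys)
        mid = len(ys) // 2
        left, right = sort(ys[:mid]), sort(ys[mid:])
        merged = []
        i = j = 0
        # Standard two-way merge, one comparison per emitted element
        # until one side runs out.
        while i < len(left) and j < len(right):
            comparisons += 1
            if le(left[i], right[j]):
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(xs), comparisons


def worst_case_counts(n):
    """Comparison counts for both sorts on a reverse-ordered list of n ints."""
    data = list(range(n - 1, -1, -1))
    le = lambda a, b: a <= b
    return insertion_sort(data, le)[1], merge_sort(data, le)[1]
```

On a reverse-ordered input the insertion sort does n(n-1)/2 comparisons,
so 8 registers cost at most 28, which is negligible, while a Switch with
a thousand cases would cost roughly half a million.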

> * There was recently an example mentioned on comp.lang.ml about slow compilation 
>   times with SML/NJ on long lists of integer constants.  I tried a few examples, 
>   and noticed that the native codegen is very slow on these as well.  If you 
>   could look into it, that would be great.

I know the basic issue: a long list of integer constants produces a giant
block, which is essentially composed of lots of moves.  This looks
tempting to a number of the simplification passes, but there is
essentially nothing to be done: the constantly changing frontier pointer
and dependencies in initializing cons cells prevent any code motion.  
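
For reproducing the problem, a small generator along the following lines
should produce a suitable input.  This is only a sketch under my own
assumptions: the function name, the variable name xs, and the choice of
constants are all hypothetical, and the emitted program is just a guess at
the shape of the examples discussed on comp.lang.ml.

```python
def long_list_program(n, name="xs"):
    """Return SML source that binds `name` to a list of n integer constants.

    The constants are simply 0 .. n-1.  A final declaration prints the
    list length so that the list is actually used by the program.
    """
    constants = ", ".join(str(i) for i in range(n))
    decl = f"val {name} = [{constants}]\n"
    use = f'val _ = print (Int.toString (List.length {name}) ^ "\\n")\n'
    return decl + use
```

Writing the result of long_list_program(10000) to a .sml file gives one
very long list literal, which is the kind of input that should turn into
a single giant block of moves as described above.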

I sidestepped part of this problem by simply skipping optimization on the
main chunk: that's the one containing initGlobals, which was one of these
huge blocks.  But clearly they can crop up elsewhere.

Could you send me a couple of the test files you were looking at?  I'll
see what phases are really being slowed down and see if there is anything
to be done there.