arrays

Henry Cejtin henry@sourcelight.com
Sun, 24 Dec 2000 16:28:04 -0600


I didn't see the assembler ever being called in the -v output.  Clearly that
should be fixed up.  I would love to have the current -v output come only
from something like -vv, with a single -v just indicating the various programs
being run (the MLton compiler, the assembler, the linker, gcc, etc.) but not
the whole detailed output of all the internal MLton compiler passes.

I did some quick tests of converting arrays with fat elements  into  parallel
arrays.  The case I tried was an
    (int * int * int) vector
vs.
    int vector * int vector * int vector
and  a routine which added up, for all the elements in the vector, either all
3 fields together or just the last field.  (The notion is to  pick  which  on
the basis of command line arguments so MLton won't optimize away any parts of
the arrays.)
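
A minimal sketch of the two layouts and the summing routine, in Standard ML.
The names here (aos, soa, sumAos, sumSoa, the allFields flag) are made up for
illustration; this is not the code that was actually timed.

```sml
(* Layout 1: one vector whose elements are triples. *)
type aos = (int * int * int) vector

(* Layout 2: three parallel vectors, one per field. *)
type soa = int vector * int vector * int vector

(* Sum all three fields, or just the last one, over layout 1. *)
fun sumAos (allFields: bool) (v: aos): int =
   Vector.foldl
   (fn ((a, b, c), acc) => if allFields then acc + a + b + c else acc + c)
   0 v

(* The same computation over layout 2. *)
fun sumSoa (allFields: bool) ((xs, ys, zs): soa): int =
   let
      fun loop (i, acc) =
         if i >= Vector.length zs
            then acc
         else
            let
               val c = Vector.sub (zs, i)
            in
               if allFields
                  then loop (i + 1,
                             acc + Vector.sub (xs, i) + Vector.sub (ys, i) + c)
               else loop (i + 1, acc + c)
            end
   in
      loop (0, 0)
   end
```

In a real test the allFields flag would be derived from
CommandLine.arguments (), so that the compiler cannot specialize on it.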

The moral was that MLton did not flatten out the triple of vectors, so either
way the number of memory indirections was the same.  For the normal version
there is one indirection to get the vector element (which is the address of
the tuple) and then a second to get the int.  For the `improved' version there
is one indirection to get the correct vector and then one to get the int.  All
of this was done by just writing new versions of sub which index into each
vector and then make a tuple.  The notion is that the optimizer will fix this
in the case where you are only using some of the slots.
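
The replacement indexing routine might look roughly like the following.  This
is only a sketch: the names soa, subSoa and lastField are hypothetical, and
whether the unused loads actually disappear depends on the optimizer.

```sml
type soa = int vector * int vector * int vector

(* Hypothetical stand-in for Vector.sub on the parallel layout:
   index each vector and rebuild the triple. *)
fun subSoa ((xs, ys, zs): soa, i: int): int * int * int =
   (Vector.sub (xs, i), Vector.sub (ys, i), Vector.sub (zs, i))

(* A caller that uses only the last slot.  The hope is that the
   optimizer avoids the work for the two slots that are never used. *)
fun lastField (v: soa, i: int): int = #3 (subSoa (v, i))
```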

This saved a lot of space (20 bytes per element for the normal version and 12
bytes for the `improved' one).  Interestingly, in the case where you couldn't
fit everything into the L1 cache, it sped things up by a lot because the tuple
of vectors DID make it into the L1.  It made things 2-3 times faster, depending
on whether you fit in the L2 cache or not.