[MLton] Profile driven packing...

Stephen Weeks MLton@mlton.org
Tue, 27 Apr 2004 13:03:18 -0700


> Consider a simple datatype
> 
> datatype t = T of Word32.word * Word32.word
> 
> Lets say you run a space profiler or via some domain knowledge happen to 
> know that most of the time you are just storing small words...
> 
> one could "optimize" the above to
> 
> datatype t = T1 of Word16.word * Word15.word
>             | T2 of Word32.word * Word32.word
>
> the basic idea is to use profile driven represenations of datatypes to get 
> better space utilization and possibly better cache locality.

A neat idea.  Another possibility would be to extend MLton's
whole-program constant propagation to see if it could prove that
certain bits in words were constant (as opposed to what it tries to do
now, which is to prove the entire word is constant).

> BTW just as a user feed back level.. is there some way of asserting that a 
> particular ML type will be laided out/packed in a certain way?
> 
> I'd be happy to see a pragma/comment
> 
> So that I can can say something like
> 
>     datatype t = A of Int31.int | B of Int31.int (*:packed: unboxed 32*)
> 
> and have the compiler complain when my expectation is not meet?

There's nothing there now.  However, one nice thing about the
representation pass is that the decisions are purely based on the
types.  So, if the type t above is layed out unboxed in one program,
it will be in all programs ... well sort of.  One issue is that prior
optimizations may have simplified the type.  For example, MLton may
have proved that the argument of A is unused or even that the variant
A is completely unused.  In those cases, the representation of t might
change (get smaller).  And this shows the dangers of such a pragma in
the first place.

BTW, the representation decisions are not as obvious as it might seem
at first blush, and are likely to change, which is yet another reason
not to expose them.  For example, your datatype above

 datatype t = T1 of Word16.word * Word15.word
             | T2 of Word32.word * Word32.word

will be represented as either a pointer (for T2) or a 32-bit word with
a low bit of one (for T1).

However, in

 datatype t = A | B | C of Int31.int

MLton will box the C variant.  This may be somewhat surprising.  The
issue is whether (and how many) tag bits should be allowed to
"overlap" with value bits and how many switch statements will we allow
to discriminate between the variants of a datatype.  In order to
represent the C variant above unboxed, because we need two bits to
distinguish between A and B, and so one of those bits must overlap
with the Int31.int of the C variant.  Implementing a case expression
would need to either first dispatch on the low bit, or add extra cases
for the value bits of the C variant. Things can get more complex as
there are more variants of different sizes.

As an extreme case, consider

  datatype t = W0 | W1 of Word1.t | W2 of Word2.t | ... | W31 of Word31.t

One could choose to represent this type unboxed in a single word.  But
implementing a case expression using this representation would be a
nightmare.

In any case, for this first effort, I took the conservative approach
of never letting tag bits overlap with value bits, except in the case
of pointers.  That way, I could guarantee that no more than three
switches were needed to implement a case expression.

For the pathalogical case above, this means that MLton boxes variants
W27 - W31, and uses a 6-bit tag to distinguish between the unboxed
variants W0 - W26, where the bottom two bits of the tag are not 00.