[MLton] Re: [MLton-commit] r5678

Thu Sep 20 20:37:51 PDT 2007

On Thu, 20 Sep 2007, Vesa Karvonen wrote:
> [I noticed the comment in this commit log message a long time ago, but
> didn't get around to commenting on it until now.]
>
>> --- mlton/trunk/mlton/backend/packed-representation.fun 2007-06-26 06:15:05 UTC (rev 5677)
>> +++ mlton/trunk/mlton/backend/packed-representation.fun 2007-06-27 00:11:16 UTC (rev 5678)
>> @@ -968,7 +968,7 @@
>>        (* TupleRep.make decides how to layout a sequence of types in an object,
>>         * or in the case of a vector, in a vector element.
>>         * Vectors are treated slightly specially because we don't require element
>> -       * widths to be a multiple of the word size.
>> +       * widths to be a multiple of the word32 size.
>>         * At the front of the object, we place all the word64s, followed by
>>         * all the word32s.  Then, we pack in all the types that are smaller than a
>>         * word32.  This is done by packing in a sequence of words, greedily,
>
> I'd just like to note that on some CPUs the above scheme results in
> rather bad record layouts.  More specifically, on some CPUs (IIRC,
> e.g. Hitachi SH-4) the register+immediate addressing mode is limited
> to small (e.g. 4-bit) immediate offsets (possibly) scaled by the
> operand size.  On such a CPU, packing the widest fields to the front
> means that only the first few fields can be accessed with a single
> instruction.  Narrow fields at the end will be beyond the reach of the
> small immediate offset.  I think that a better scheme would be to
> attempt to coalesce narrow fields into wider (aligned) fields (e.g.
> 1+1+2 -> 4 bytes) and put the coalesced fields to the front as long as
> alignment restrictions are respected and the aligned size of the whole
> record does not increase.  The point is to maximize the number of
> fields that can be accessed with a single instruction.

Fair enough; it would be fairly easy to modify the TupleRep.make function to 
put the small objects first.  Though, I don't believe that any currently 
supported architecture suffers from the problem you describe.
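
To make the trade-off concrete, here is a rough Python sketch.  It is not
MLton code and not how TupleRep.make is written.  It assumes byte-granular
fields with power-of-two sizes, natural alignment, and a hypothetical CPU
whose load/store displacement is a 4-bit immediate scaled by the operand
size.  The names place, wide_first, narrow_first and reachable are made up
for illustration.  narrow_first just sorts fields by ascending size (which,
under natural alignment, packs e.g. 1+1+2 into one aligned 4-byte unit) and
falls back to the wide-first layout if the padded record would grow.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, int]  # (name, size in bytes); sizes are powers of two


def align_up(n: int, a: int) -> int:
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


def place(fields: List[Field]) -> Tuple[Dict[str, int], int]:
    """Place fields in the given order at naturally aligned offsets.

    Returns (offsets by name, total size padded to the widest field).
    """
    offsets: Dict[str, int] = {}
    end = 0
    max_align = 1
    for name, size in fields:
        end = align_up(end, size)
        offsets[name] = end
        end += size
        max_align = max(max_align, size)
    return offsets, align_up(end, max_align)


def wide_first(fields: List[Field]) -> Tuple[Dict[str, int], int]:
    """Widest fields at the front, as in the scheme quoted above."""
    return place(sorted(fields, key=lambda f: -f[1]))


def narrow_first(fields: List[Field]) -> Tuple[Dict[str, int], int]:
    """Narrowest fields at the front, unless that enlarges the record."""
    wide = wide_first(fields)
    narrow = place(sorted(fields, key=lambda f: f[1]))
    return narrow if narrow[1] <= wide[1] else wide


def reachable(fields: List[Field], offsets: Dict[str, int],
              imm_bits: int = 4) -> int:
    """Count fields whose offset fits a scaled imm_bits-wide displacement."""
    limit = (1 << imm_bits) - 1
    return sum(1 for name, size in fields if offsets[name] // size <= limit)


# Eight 4-byte fields and four 1-byte fields (an invented example record).
fields = [(f"w{i}", 4) for i in range(8)] + [(f"b{i}", 1) for i in range(4)]

wide_offsets, wide_size = wide_first(fields)
narrow_offsets, narrow_size = narrow_first(fields)

print(wide_size, reachable(fields, wide_offsets))      # 36 8
print(narrow_size, reachable(fields, narrow_offsets))  # 36 12
```

With the widest fields first, the four byte fields land at offsets 32..35,
past the 0..15 range of a byte-scaled 4-bit displacement.  With the narrow
fields first, all twelve fields stay in range and the record is still 36
bytes.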