[MLton] array flattening

Matthew Fluet fluet@cs.cornell.edu
Fri, 9 Jul 2004 08:44:16 -0400 (EDT)


> 1. Add an extra "offset" field to sub and update to specify the offset
> within the array element.
>
> 	y = Array_sub (a, i, #1)
> 	z = Array_sub (a, i, #2)
>
> 	Array_update (a, i, #1, y)
> 	Array_update (a, i, #2, z)

That does seem easier.  Also, note that we will always be flattening known
types, so the offset will always be constant.

> As for RSSA and Machine, we have the ArrayOffset and Offset operands
>
>       structure Operand:
> 	 sig
> 	    datatype t =
> 	       ArrayOffset of {base: t,
> 			       index: t,
> 			       ty: Type.t}
> 	     | Offset of {base: t,
> 			  offset: Bytes.t,
> 			  ty: Type.t}
>             ...

Is there a reason we have both?  Why not just Offset, with ArrayOffset
being translated away as a computed address and then an Offset of 0.
(To answer my own question from the thread pointed to below:

>> Another way I could go would be to get rid of ArrayOffset altogether
>> and explicitly do the address computation in RSSA.  But I can only
>> guess that this will suffer from your other point:
>
>No, I don't think getting rid of ArrayOffset altogether is the right thing
>to do.  Like I said before, I like to think of ArrayOffset as a location;
>when it's just used once, we don't do an explicit address computation, we
>just produce an assembly address operand, like (%eax, %ecx, 4).  We'd burn
>a lot of code and cycles if we computed
>RP(0) = SP(12) + 4 * (SI(20))
>...OP(RP(0), 0) ...
>for every use of ArrayOffset.
>But, when we need to the address as a value, then it really should be
>dropped into a variable.

However, note that if the array offset came down to the codegen, then it
will happily produce: (0+4)(%eax, %ecx, 4).

So, I actually think the codegen would be happier with
ArrayOffset {base: t, index: t, offset: Bytes.t, ty: Type.t}

But, as Steve later pointed out:

>The generated code looks fine to me for now, and I don't mind having
>ArrayOffset do the dereference.  But in the long term, I think the
>right thing to do is to get rid of ArrayOffset, make all addressing
>computations explicit in Rssa, and put the necessary peepholing in the
>codegens.  Making the addressing explicit will help in optimization
>and when we have flatter arrays.  Of course it will mean that Rssa and
>Machine need to handle interior pointers, which they don't right now.
>But this is clearly a longer term thing and not for now.

But, until we got that peepholing in, the codegen would suffer in both
size and cycles.

On yet another hand, if you've flattened in something of significant size,
then it may be the case that doing:

?? ??,(0+4)(%eax, %ecx, 4).
?? ??,(0+8)(%eax, %ecx, 4).
?? ??,(0+12)(%eax, %ecx, 4).
?? ??,(0+16)(%eax, %ecx, 4).

is bigger than

leal %edx,(%eax, %ecx, 0)
?? ??,(0+4)(%edx)
?? ??,(0+8)(%edx)
?? ??,(0+12)(%edx)
?? ??,(0+16)(%edx)

You've also freed up %eax,%ecx, which is probably a good thing.

> Any thoughts?  Matthew, I am particularly interested to know if you
> see any problems this could cause the native codegen, or if another
> approach would be better.

This shouldn't cause any problems in the codegen.  You can always add an
immediate offset to an address, so there wouldn't be any problem handling
an extended ArrayOffset in the translation.

In fact, when you have an Int64.int array, this is exactly what the native
codegen does in order to access the lo and hi words.

However, I will suggest that if the array is flattened, then you do all
operands as computed address followed by offset.  This will help the
may-alias heuristics, because it is obvious that two operands with the
same base but different memlocs do not overlap.  It is less obvious that
an array offset and a computed address followed by (non-zero) offset do
not alias.  Especially when there is any control flow between the computed
address and the use, as might be the case with common sub-exp elim.


Bottom line: if MachineIL doesn't change, the x86-codegen shouldn't need
to change; if MachineIL changes ArrayOffset, then it is an easy change to
x86-translate.fun's Operand.toX86Operand.  In either case, I think some
experiments/tweaking will be in order.