x86

Fri, 18 Aug 2000 12:39:56 -0400 (EDT)

> > Has anything been done to fix the assumption that syntactically distinct
> > memory locations are distinct?  This REALLY REALLY scares me.  It isn't just
> > that it is incorrect, it is that often one will be able to get away with it.
> > Thus these bugs can remain hidden for a long time.
> 
> A good reason to keep around the C backend for a while, so we can compare.  But,
> I agree.  I think we (i.e. Matthew :-) need to look at the x86translate pass to
> understand what kind of must-not alias information holds.  I don't mind taking
> advantage of the fact that x86 only comes from machine and I think it would not
> cost much in terms of performance.

I was thinking about this issue some more yesterday, and decided I was
trying to use more information than I really needed.  Here's a solution
that I think will work:

Each memory location gets a "class".  For now, I think we only need four
classes:

datatype class = Stack | Heap | Runtime | Unknown

Memory locations in distinct classes must-not-alias (except that Unknowns
may-alias everything else). Recall that there are three types of memory
locations:

datatype t = Imm of {base : Immediate.t,        (* constant offsets off
                     index : Immediate.t,        * known labels;
                     ...}                        * like gcState.stackTop
                                                 *)
           | Simple of {base : t,               (* constant offsets off
                        index : Immediate.t,     * some deref;
                        ...}                     * like SP(4), and
                                                 * OI(RP(1), 4)
                                                 *)
           | Complex of {base : t,              (* runtime offset off
                         index : t,              * some deref;
                         ...}                    * like XI(SP(4), RI(2))
                                                 *)

Then Imm may-alias Imm iff their classes admit may-aliasing and they are
syntactically equal.  (Technically not true; one could have multiple
labels for the same location, or play funny games with different offsets
off of different labels to the same location; but I think this is a
reasonable assumption to make.)

Also, Simple may-alias Simple iff their classes admit may-aliasing and it
is not the case that their bases are equal but their offsets are
different.  (This asserts that OI(RP(4), 0) and OI(RP(4), 4) do not
alias; again, a slight assumption that we won't ever see OI(RP(4), 0) and
OI(RP(4), 1) which overlap in memory; MLton doesn't allow this, so again I
think it is a reasonable assumption).

Any other combination of memory location variants defaults to whether or
not their classes admit may-aliasing.

We might be able to strengthen that last case, but I'm guessing that we
won't need to do so.
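
A minimal sketch of the may-alias rules above, in SML.  The types are
simplified stand-ins, not the actual x86 structures: an immediate is
modelled as a label plus a constant offset, each memory location carries
its class directly, and the names imm, classOf, classesMayAlias, and
mayAlias are all hypothetical.

```sml
datatype class = Stack | Heap | Runtime | Unknown

(* Assumption: an immediate is just a label plus a constant offset. *)
type imm = {label : string, offset : int}

(* Simplified memory locations; every variant records its class. *)
datatype t =
   Imm of {addr : imm, class : class}
 | Simple of {base : t, offset : int, class : class}
 | Complex of {base : t, index : t, class : class}

fun classOf (Imm {class, ...}) = class
  | classOf (Simple {class, ...}) = class
  | classOf (Complex {class, ...}) = class

(* Distinct classes must-not-alias, except that Unknown may-alias anything. *)
fun classesMayAlias (Unknown, _) = true
  | classesMayAlias (_, Unknown) = true
  | classesMayAlias (c1, c2) = c1 = c2

fun mayAlias (m1, m2) =
   classesMayAlias (classOf m1, classOf m2)
   andalso
   (case (m1, m2) of
       (* Imm vs. Imm: only syntactically equal addresses may alias. *)
       (Imm {addr = a1, ...}, Imm {addr = a2, ...}) => a1 = a2
       (* Simple vs. Simple: equal bases with different offsets cannot alias. *)
     | (Simple {base = b1, offset = o1, ...},
        Simple {base = b2, offset = o2, ...}) =>
          not (b1 = b2 andalso o1 <> o2)
       (* Every other combination falls back to the class check alone. *)
     | _ => true)

(* Illustrative locations, loosely following the examples in the text. *)
val stackTop = Imm {addr = {label = "gcState.stackTop", offset = 0},
                    class = Runtime}
val rp1 = Imm {addr = {label = "RP", offset = 1}, class = Runtime}
val rp4 = Imm {addr = {label = "RP", offset = 4}, class = Runtime}
val sp4 = Simple {base = stackTop, offset = 4, class = Stack}
val oi40 = Simple {base = rp4, offset = 0, class = Heap}
val oi44 = Simple {base = rp4, offset = 4, class = Heap}
val oi14 = Simple {base = rp1, offset = 4, class = Heap}
```

With these definitions, mayAlias (oi40, oi44) is false (same base,
different offsets), mayAlias (sp4, oi40) is false (Stack vs. Heap), and
mayAlias (oi14, oi44) is true (same class, different bases, so nothing
rules it out).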

I think that does it.  At translation time, I give every memory location
its class.  For MachineOutput.Operand variants, we have

Register             => Runtime
Global               => Runtime
GlobalPointerNonRoot => Runtime
StackOffset          => Stack
Offset               => Heap
ArrayOffset          => Heap
Contents             => Heap
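
As a sketch, the table above amounts to a single case expression.  The
operand datatype here is a payload-free stand-in for the
MachineOutput.Operand variants (the real constructors carry arguments,
omitted here), and classOfOperand is a hypothetical name.

```sml
datatype class = Stack | Heap | Runtime | Unknown

(* Stand-in for the MachineOutput.Operand variants listed in the table;
   constructor arguments are deliberately left out. *)
datatype operand =
   Register
 | Global
 | GlobalPointerNonRoot
 | StackOffset
 | Offset
 | ArrayOffset
 | Contents

(* Assign a class to each operand variant at translation time. *)
fun classOfOperand oper =
   case oper of
      Register => Runtime
    | Global => Runtime
    | GlobalPointerNonRoot => Runtime
    | StackOffset => Stack
    | Offset => Heap
    | ArrayOffset => Heap
    | Contents => Heap
```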

Almost every other memory location falls into the Runtime class (things
like gcState.stackTop, the intInfTemp for returning the value and new
frontier pointer, etc.), with a few minor exceptions (derefs of
gcState.stackTop and gcState.frontier are Stack and Heap respectively,
some stuff with threads which live on the Heap, etc.)

Register allocation proceeds like I mentioned before, making use of this
may-alias information.

I've started running a few programs to see where may-alias returns true
for non-equal memory locations.  The only place so far has to do with
saved exception stacks: gcState.currentThread->exnStack lives on the heap
and so might alias other heap values.