Using C as a back end

Stephen Weeks MLton@sourcelight.com
Thu, 26 Oct 2000 10:15:20 -0700 (PDT)


> "Stephen Weeks" <sweeks@my-deja.com> writes:
> > 
> > Norman Ramsey <nr@labrador.eecs.harvard.edu> wrote:
> > > It's a pretty good solution if
> > > * your language doesn't have garbage collection, or you're willing
> > >   to use the marvellous Boehm/Weiser/Demers conservative collector
>  
> > ...  We went through several iterations of the interface between the
> > generated C and the runtime (all basically different ways of telling
> > the GC the root set) and eventually found one we liked.
>  
> Is this described in any of the MLton papers? If not, can you explain
> how you do it? In general one cannot know where the C compiler puts
> pointers (caller/callee-save register, stack slot, ...), so I wonder
> how you manage to accurately determine the locations of the root set.

It's not described in our papers.  In summary, we arrange it so that we don't
need to know where the C compiler puts things.

All of the roots for our GC are either in a global array of pointers, which
holds the values defined at the SML toplevel, or in the MLton stack, which is
*not* the same as the C stack.  MLton never uses C calls to implement SML
calls -- it uses its own calling convention and stack and uses a C goto to make
an SML call, so the C stack never grows.  Conceptually, you can think of the
entire SML program as one C procedure.  In practice we don't compile it that
way, for performance reasons: we break up the procedure into many C procedures
and use a trampoline in the main procedure.  When a goto needs to go to a label
in another procedure, the current procedure returns to the trampoline in the
main procedure, which then calls the right C procedure.  So there is never more
than one C frame on the C stack executing MLton code.
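
To make the control flow concrete, here is a minimal sketch of the trampoline
scheme in C.  It is not MLton's generated code: the names (label, chunk0,
chunk1, chunk_of, ml_stack, run_sum) and the two-slot stack layout are all
made up for illustration.  A jump within a chunk is a plain C goto; a jump to
another chunk returns the target label to the trampoline loop.

```c
#include <assert.h>

/* Hypothetical sketch; names are illustrative, not MLton's generated code. */

enum label { L_TEST, L_ADD, L_DEC, L_HALT };

/* The SML stack: a plain array, separate from the C stack. */
static long ml_stack[2];            /* slot 0: n, slot 1: accumulator */

/* Chunk 0 holds the label L_TEST. */
static enum label chunk0(enum label l) {
    assert(l == L_TEST);
    /* Both targets are outside this chunk, so return to the trampoline. */
    return ml_stack[0] == 0 ? L_HALT : L_ADD;
}

/* Chunk 1 holds the labels L_ADD and L_DEC. */
static enum label chunk1(enum label l) {
    switch (l) {
    case L_ADD: goto add;
    case L_DEC: goto dec;
    default:    assert(0); return L_HALT;
    }
add:
    ml_stack[1] += ml_stack[0];
    goto dec;                       /* same chunk: an ordinary C goto */
dec:
    ml_stack[0] -= 1;
    return L_TEST;                  /* other chunk: back to the trampoline */
}

typedef enum label (*chunk_fn)(enum label);

/* Which chunk each label lives in (L_HALT is handled by the trampoline). */
static const chunk_fn chunk_of[L_HALT] = { chunk0, chunk1, chunk1 };

/* The trampoline: at most one chunk frame sits above it on the C stack. */
long run_sum(long n) {
    assert(n >= 0);
    ml_stack[0] = n;
    ml_stack[1] = 0;
    enum label next = L_TEST;
    while (next != L_HALT)
        next = chunk_of[next](next);
    return ml_stack[1];
}
```

Calling run_sum(10) returns 55.  However many SML-level jumps the loop makes,
the C stack holds only run_sum's frame plus one chunk frame at a time.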

The only other thing that we need to do is to make sure that the GC never needs
to know about where the C compiler puts a pointer variable.  We do this by
ensuring that a C variable holding an SML pointer is never live at any point
where a GC might occur, in particular at limit check points.  Right now, our
backend is very simplistic -- if an SML pointer variable is ever live at any GC
point, then it permanently lives in a MLton stack slot.  
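
The following C sketch shows the idea under some loud assumptions: a toy
two-space copying collector over cons cells, a four-slot stack, and made-up
names (cell, ml_stack, ml_globals, limit_check, gc, build_and_sum).  None of
this is MLton's runtime.  The point is only that gc() scans nothing but
ml_stack and ml_globals, and that the mutator reloads its pointer from a stack
slot after every limit check instead of keeping it in a C local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy runtime; not MLton's actual data structures. */
typedef struct cell { long head; struct cell *tail; struct cell *fwd; } cell;

enum { SEMI = 8, STACK_SLOTS = 4 };
static cell space_a[SEMI], space_b[SEMI];
static cell *from = space_a, *to = space_b;
static size_t used = 0;                 /* cells allocated in from-space */
static cell *ml_stack[STACK_SLOTS];     /* SML stack slots (roots) */
static cell *ml_globals[1];             /* SML toplevel values (roots) */
int gc_count = 0;

/* Copy one cell into to-space, or return its forwarding pointer. */
static cell *copy(cell *p, size_t *next) {
    if (p == NULL) return NULL;
    if (p->fwd == NULL) {
        cell *q = &to[(*next)++];
        q->head = p->head; q->tail = p->tail; q->fwd = NULL;
        p->fwd = q;
    }
    return p->fwd;
}

/* The roots are exactly ml_stack and ml_globals; no C locals are scanned. */
static void gc(void) {
    size_t next = 0;
    for (size_t i = 0; i < STACK_SLOTS; i++)
        ml_stack[i] = copy(ml_stack[i], &next);
    ml_globals[0] = copy(ml_globals[0], &next);
    for (size_t scan = 0; scan < next; scan++)
        to[scan].tail = copy(to[scan].tail, &next);
    cell *t = from; from = to; to = t;
    used = next;
    gc_count++;
}

/* Limit check: the only GC point in this sketch. */
static void limit_check(size_t need) {
    if (used + need > SEMI) gc();
    assert(used + need <= SEMI);
}

/* Bump allocation; never collects, so it is not a GC point. */
static cell *alloc(long head, cell *tail) {
    cell *c = &from[used++];
    c->head = head; c->tail = tail; c->fwd = NULL;
    return c;
}

/* Builds the list n, ..., 1 and sums it, allocating one garbage cell per
   step.  The list is live across every limit check, so it lives in slot 0. */
long build_and_sum(long n) {
    ml_stack[0] = NULL;
    for (long i = 1; i <= n; i++) {
        limit_check(2);                 /* GC point: no C local holds a pointer */
        cell *list = ml_stack[0];       /* reload after the GC point */
        (void)alloc(-1, NULL);          /* garbage, unreachable from any root */
        ml_stack[0] = alloc(i, list);   /* store back before the next GC point */
    }
    long sum = 0;                       /* no GC point below, so a C local is safe */
    for (cell *p = ml_stack[0]; p != NULL; p = p->tail) sum += p->head;
    return sum;
}
```

With these sizes, build_and_sum(6) returns 21 and triggers one collection
along the way; the list survives because slot 0 is updated by gc().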

So, the GC can find all of the roots in MLton data structures, and doesn't need
to know about the C compiler at all.  In fact, we use a slight modification of
the GC for a native X86 code generator that we are currently developing.