Hacker's guide to MLton IR

Stephen Weeks sweeks@wasabi.epr.com
Sat, 27 Nov 1999 17:55:52 -0800 (PST)


> Hmmm... okay let's try an experiment... let me see if I can reverse engineer
> the IR from the sources and produce a nice little tex document, describing the
> Cps code with a formal semantics. I'll need it for the thesis anyway!
> It'll also probably be an easy way to communicate about the IR. 

Sounds good.  The only things I can think of offhand that differ from
a usual first-order, simply-typed language are
 * There is a special "variant" type for when we know that a value is
   a particular variant of a datatype
 * The type system keeps track of the local part of the handler
   stack, i.e. the portion of the handler stack within the current
   function.  See cps/infer-handlers.fun 

> > Also, you probably figured this out, but you can get a good overview
> > of the sequence of compiler passes in src/compile.sml.
> 
> Is there a flag to dump the Cps phase? 

When calling MLton from the command line, -ka is the best we've got.
This saves *all* the intermediate passes to files, including the cps
pass (after optimization), which is in foo.cps.  When mucking in the
compiler sources, Control.aux (in control/control.sml) controls
whether or not intermediate passes are saved.

> So, if you're curious, I'll probably take the Cps phase and extend it with
> "region types", so one can reason about memory management explicitly in a
> type safe way. Then I'll be writing a phase that emits a custom garbage
> collector along with the whole program. The result being a completely type
> safe program whose safety doesn't require you to believe the GC is correct.

Sounds interesting.  I'd thought about doing custom garbage collectors
at some point, but not in a type safe way.  Actually, now that I think
about it, it seems fairly straightforward to automatically generate
type-safe code for doing depth-first search copying of values in the
Cps language.  This would be enough for a simple stop and copy
collector that had its own stack, no?
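To make the idea concrete, here is a rough sketch of depth-first copying
with an explicit stack.  It is in Python rather than Cps, and it is
untyped: the Obj class, the copy_roots function, and the forwarding
table are all made up for illustration, and the type-directed part
(generating one copy routine per datatype) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)  # eq=False keeps identity-based hashing
class Obj:
    """Hypothetical heap object: each field is an unboxed int or a pointer."""
    fields: List[Union[int, "Obj"]] = field(default_factory=list)


def copy_roots(roots):
    """Copy everything reachable from roots; return (new_roots, to_space)."""
    forward = {}   # old object -> its copy (the forwarding pointers)
    to_space = []  # copies, in allocation order
    stack = []     # the collector's own stack of copies still to be scanned

    def evacuate(value):
        if not isinstance(value, Obj):
            return value                    # unboxed value, nothing to copy
        copy = forward.get(value)
        if copy is None:
            copy = Obj(list(value.fields))  # fields still point at old objects
            forward[value] = copy
            to_space.append(copy)
            stack.append(copy)
        return copy

    new_roots = [evacuate(r) for r in roots]
    while stack:
        obj = stack.pop()
        # Redirect every field to the copy of whatever it pointed at.
        obj.fields = [evacuate(f) for f in obj.fields]
    return new_roots, to_space
```

Sharing and cycles are preserved because an object is copied only the
first time it is reached; anything unreachable from the roots is never
visited and so never copied.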

Since there are so many passes that operate on Cps, if you're going to
modify the Cps language, you might be better off creating a new IR and
piping the output of the last Cps optimization pass into yours.  That
way, you won't have to modify all the existing code.

> At some point, I'll need to figure out the machine IR too
> in enough detail to understand how the current GC interacts with it. Also
> would it be particularly hard to retarget the backend to C--?

I don't think it would be too hard to port the backend to C--.  My
main worry would be getting access to the primitives.  There are a lot
of primitives used to implement the basis library (see all of the
_prim declarations in basis-library/*/*.sml).  These primitives are
just C function or macro names that get passed all the way through the
compiler and spat out at the backend to gcc.

> Does the fact you're compiling to C dirty the machine interface?

The fact that we are compiling to C comes up noticeably in one place
-- "chunks", which are there solely to make sure that C procedures
don't get too big.  In machine.sig, a chunk corresponds to a C
procedure.  There is a pass (backend/chunkify.sig) which partitions
all of the labels in the Cps program into disjoint sets corresponding
to chunks.  A machine program is broken up into a collection of
chunks, and some of the operations (farJump, newFrame, newHandler)
need to know which chunk they are operating on.
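As a rough illustration of what "partitioning labels into chunks"
means, here is a small Python sketch.  It is not the algorithm in
backend/chunkify: the greedy size-based policy, the per-label size
estimates, and the names chunkify and chunk_of are all assumptions made
for the example.

```python
def chunkify(label_sizes, max_chunk_size):
    """Greedily split (label, size) pairs into disjoint chunks.

    A chunk is closed as soon as adding the next label would push it
    past max_chunk_size.  A single oversized label gets its own chunk.
    """
    chunks = []
    current, current_size = [], 0
    for label, size in label_sizes:
        if current and current_size + size > max_chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(label)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def chunk_of(chunks):
    """Map each label to the index of the chunk that contains it."""
    return {label: i for i, chunk in enumerate(chunks) for label in chunk}


def is_far_jump(owner, src, dst):
    """A jump is 'far' when source and target live in different chunks."""
    return owner[src] != owner[dst]
```

With a label-to-chunk map like this, a jump between two labels can be
classified as local or far by comparing their chunk indices, which is
the kind of information an operation such as farJump needs.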

> BTW since you are compiling to C why are there architecture dependencies?
> i.e. how much work would it be to retarget everything to a Sparc and
> Alpha?

There's a little bit of assembler for stuff where the C standard
doesn't guarantee the semantics we need, or where gcc was just plain
buggy.  I don't think it would be too hard at all to port to other
Unices.

BTW, no need to send mail to both me and MLton, as I get all mail sent 
to MLton.