Hacker's guide to MLton IR

Daniel Wang danwang@cs.princeton.edu
24 Nov 1999 10:20:15 -0500


"Stephen Weeks" <sweeks@intertrust.com> writes:

> > Is there a "compiler hackers" guide to MLton's internals?
> Sadly there is not.  There is a document that I wrote way back in Nov
> 97, but it's probably too out of date to be worthwhile.  OTOH, I and
> Henry and Suresh read the MLton mail address, so should be able to
> answer your questions quickly.
> 

Hmmm... okay let's try an experiment... let me see if I can reverse engineer
the IR from the sources and produce a nice little tex document, describing the
Cps code with a formal semantics. I'll need it for the thesis anyway!
It'll also probably be an easy way to communicate about the IR. 

> ------   -------------------   -------------------------
> Ast      ast/ast.sig           none
> CoreML   core-ml/core-ml.sig   none
> Xml      xml/xml.sig           polymorphic, higher order
> Sxml     xml/sxml.sig          monomorphic, higher order
> Cps      cps/cps.sig           monomorphic, first order
> Machine  backend/machine.sig   lame

Okay. I'll probably just start hacking on the Cps, since it sounds like
exactly what I want. 

At some point, I'll need to figure out the machine IR too, in enough detail
to understand how the current GC interacts with it. Also, would it be
particularly hard to retarget the backend to C--? Does the fact that you're
compiling to C dirty the machine interface?


> If you're going to use Cps and would like a higher level description
> of the type system than cps/type-check.fun, I think Suresh may have a
> tex'ed up version that is part of a paper we are working on.

That would be a good start for me.

> Also, you probably figured this out, but you can get a good overview
> of the sequence of compiler passes in src/compile.sml.

Is there a flag to dump the Cps phase? 

So, if you're curious, I'll probably take the Cps phase and extend it with
"region types", so one can reason about memory management explicitly in a
type-safe way. Then I'll be writing a phase that emits a custom garbage
collector along with the whole program. The result would be a completely
type-safe program whose safety doesn't require you to believe the GC is correct.

BTW, since you are compiling to C, why are there architecture dependencies?
I.e., how much work would it be to retarget everything to Sparc and
Alpha?