[MLton] HOL's Moscow ML implementation, and pushing MLton to emulate it

Stephen Weeks MLton@mlton.org
Thu, 7 Apr 2005 10:38:45 -0700


> This is all true, though I think that moving scripts into the monolith
> wouldn't happen that often.  Strictly speaking it would never happen
> as scripts are only used to generate theory files (to use the HOL
> jargon).  Theory files are data and would never be incorporated
> directly into what was compiled, and nor would the scripts.  
> 
> But it is certainly true that advanced programmers might test their
> ideas out using the scripting interface (which will naturally look a
> great deal like the underlying environment). 

Sorry, "scripts" was the wrong terminology.  I was definitely referring
to code like the linear arithmetic solver, which might require some
interactive development and testing before it is deployed in the
monolith.

> As far as extensibility goes, we would particularly like a tool that
> enabled us to lift entire ML structures (with their own types and
> operations on them) into the interpreted domain, but with the
> operations implemented in the interpreter rather than having the
> source code interpreted.

Yeah, this sounds great.  When I've thought about this sort of thing
in the past, I've imagined a primitive built into MLton that would
expose the current environment in scope via an interpreter.  One could
use mlbs to produce the precise environment that one wants.  The
biggest problem, which I've never resolved to my satisfaction, is how
to lift polymorphic values from the host environment into the hosted
environment.
My intuition for the problem is that MLton's compilation strategy
requires seeing all the instances of a polymorphic function at compile
time.  Once one exposes a polymorphic function to the interpreter,
it's tough to make that true.
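
To make the difficulty concrete, here is roughly what the
monomorphiser does today (all the names below are made up for
illustration):

    (* MLton duplicates a polymorphic function once per instance. *)
    fun id x = x
    val a = id 13
    val b = id "foo"

    (* After monomorphisation, roughly: *)
    fun id_int (x: int): int = x
    fun id_string (x: string): string = x
    val a = id_int 13
    val b = id_string "foo"

Once id is visible to an interpreter, the full set of instances is no
longer known at compile time, so this duplication can't be done.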

A while ago, I read Nick Benton's paper
(http://mlton.org/References#Benton05) and really liked it, but if I
recall correctly, its solution to the problem involved coercions.
These are slow because they take time proportional to the size of the
data structure being coerced.  Plus, there are difficulties with
sharing (I don't remember if the paper adequately addressed those).
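
To give a feel for the kind of coercion involved (this is a made-up
sketch, not anything from the paper): lifting a compiled value into an
interpreter's universal representation has to walk the whole value.

    datatype univ = Int of int | List of univ list

    (* Copies every cell, so it costs time proportional to the
     * length of the list. *)
    fun toUniv (l: int list): univ = List (map Int l)

    fun fromUniv (List l) = map (fn Int i => i | _ => raise Fail "type") l
      | fromUniv _ = raise Fail "type"

Coercing the same list twice also produces two copies, which I imagine
is where the sharing problems come in.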

The only other solution I see is to introduce a universal type from
the start in the host code as part of the compilation strategy, so
that there is only one kind of value and no need for coercions, i.e.,
not the usual MLton compilation strategy.  This of course introduces a
big slowdown and defeats a lot of the benefits of using MLton.
Although maybe it could be done conditionally, so that if one has a
program with an interpreter, then one pays the cost, but removing the
interpreter allows MLton to do all its usual tricks.  That's a lot of
MLton hacking, but it could be very neat.

To spell out a little more detail -- this does not involve rewriting
all of MLton!  Another way to think about it is as a different
monomorphisation pass from our usual one.  Instead of duplicating
polymorphic functions for each type at which they are used, one
translates
everything into a universal type, which trivially makes polymorphic
functions monomorphic.  Then there is no duplication.  The output of
such a pass is perfectly expressible in the same IL that MLton's
monomorphiser currently produces, so one can reuse the rest of the
compiler unchanged.  For those not so familiar with MLton internals,
the monomorphiser is pretty close to the front of the compiler, before
most of the optimizer.  So, you will still get a lot of the benefits
of MLton; it will simply be hamstrung by dealing with a universal
type.  One could even get fancy by doing some flow analysis and only
introducing the universal type where necessary (i.e. where one must
communicate with the interpreter).  It would be interesting to see if
one could factor out enough for this to buy anything.  I bet it would.
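
To connect this back to the id example above, the universal-type pass
might turn it into something like this (again, the univ constructors
are made up for illustration):

    datatype univ = Int of int | String of string | Fn of univ -> univ

    (* One copy of id suffices, since it is now monomorphic. *)
    val id: univ -> univ = fn x => x
    val a = id (Int 13)
    val b = id (String "foo")

The interpreter can call id directly, since every value it constructs
is also a univ.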