[MLton] HOL's Moscow ML implementation, and pushing MLton to emulate it

Matthew Fluet fluet@cs.cornell.edu
Thu, 24 Mar 2005 08:54:02 -0500 (EST)


Michael,

Sorry to not have had time to respond to your earlier message.  Here are 
some notes I was writing up after seeing Scott's message.

Scott has the right idea, but one can make it more formal (and more
extensible) by using the ideas of embedded interpreters.  See, for
instance,

http://www.eecs.harvard.edu/~nr/pubs/embed-abstract.html
and
http://research.microsoft.com/~nick/benton03.pdf


> A. Implementation of a small ML kernel to prove theorems

As you note, MLton can't really help you here in any simple way, since 
MLton does not provide a REPL.

> B. A persistence mechanism for values of type "thm" (theories)
> C. A way to accompany theories with code that is dynamically loaded

I don't quite understand your stress on "dynamically"; see my other notes 
below.

> -- MLton ----------
>
> MLton doesn't have an interactive loop, but even if we were to
> implement an interactive system ourselves (and either invented our own
> domain-specific language for doing proof, or provided our own ML
> interpreter, as Scott Owens suggested), we still wouldn't be able to
> dynamically load code that was appropriate to the theorem-proving task
> at hand.

That is true.

> If we switched our theory representation away from SML
> structures to data files, we would at least be able to dynamically
> load theorems from disk.

Also true.

> Joe Hurd experimented with getting MLton to emulate the batch-compiler
> aspect of HOL.  There's lots of repeated work there because the same
> files keep getting recompiled to produce slightly different
> executables as the build progresses, but it should be theoretically
> possible.  Unfortunately, the compiler hates our big theory files,
> which are sometimes rather large ('trivial' val-bindings, but lots of
> them), and even this much doesn't really work.  If we changed our
> file-format on disk, we would solve that problem.

I think we've solved most of the compiler performance problems that arose 
from Joe Hurd's experiments.  However, the performance problems arose from 
having a huge number of global bindings, which are nothing more than 
initialized data.  It does seem that changing the file-format on disk 
would alleviate this problem.

> -- Creating a monolith ----------
>
> To create a MLton-based, possibly interactive system, supporting the
> various special-purpose bits of code we've written, it seems we will
> have to do the following:
>
> * change to use a data-file format for theorems on disk
> * implement our own proof language
> * eventually build one monolithic executable that includes all of the
>   code that accompanies various theories.  For example, the code for
>   deciding linear arithmetic over the real numbers will always be
>   present, even if the user is only working with sets of natural
>   numbers.

You are paying some executable size and possibly persistent heap usage, 
but I don't see other negatives for having all the code in one executable. 
You could certainly implement some sort of name-space management in your 
interpreter/proof-language so that various bits of code weren't available 
until the user loaded the appropriate theory.

> * provide for relatively easy creation of new versions of the
>   monolith, so that users can add their own special-purpose code.
>   (This is not likely to happen that often.)

Essentially, you'll need to recompile HOL.  The hard part is figuring out 
how to incorporate user code without them needing to do a lot of Makefile 
hackery.