[MLton] MLton HOL

Stephen Weeks MLton@mlton.org
Mon, 28 Jun 2004 20:21:06 -0700


Hi Joe.  

I wanted to update you on the progress with the memory problem while
compiling HOL.  We did find the source of the problem, which is the
live-variable analysis in the backend of the compiler.  This is
potentially a quadratic analysis, because, for each function, it
(potentially) records information about every variable at every basic
block.

There is only one problematic function in your code, but it's very
problematic.  The "main" function has 161,783 blocks and 257,519
variables.  This is an order of magnitude larger than we've seen with
other large programs (like MLton).  Looking through the code and
reading

	http://www.cl.cam.ac.uk/users/jeh1004/research/papers/fasthol.html

I suspect that the unusual aspect of your code (from MLton's
perspective) is the automatically generated theory files.  These
contain lots of explicit constructions of data structures, lists, and
the like, all of which take a lot of code and introduce a lot of
variables.

I have a couple of ideas for passes to put in that will break main (or
any other large function) into chunks so that the quadraticness of
live-variable analysis doesn't hurt.  I have some time available and
plan to look into it this week.

In the mean time, I have an idea you might try in the interim.  Rather
than include the automatically generated theory files, why not include
the proof scripts and let MLton run the proof checker to generate the
theorems?  Presumably the source files for the scripts are much
smaller (and less unusual looking).  If you're worried about the run
time of this initial proof checking dominating the run time of your
tests, you can use MLton's World.save facility to stage the
computation.

Another possibility, if you have the infrastructure, would be to
manually serialize the theories to files and then read them in at
program startup, which would again cut down on the code.

I'll get back to you soon, I hope, with other news.