[MLton] space safety for structures

Fri, 2 Jul 2004 13:56:11 -0700

I agree with Henry's analysis that in the following program there is
no reason why S.l should kept alive in the closure for f.

  structure S =
     struct
        val l = List.tabulate (1000000, fn i => i)
        val x = 13
     end

  val f = fn _ => .... S.x ...

Let me present a slightly more challenging example to get right from a
compiler's perspective.

  structure S =
     struct
        val x = 13
        val l = List.tabulate (1000000, fn i => i)
     end

  structure T =
     struct
        val _ = (* some expensive computation *)
     end

  structure U =
     struct
        val _ = print (Int.toString S.x)
     end

My question is: can S.l be kept alive while the computation in T is
being performed?  It would be nice to answer "no", and certainly
that's what MLton does.  However, imagine that S, T, and U are
different files and that a compiler is using separate compilation.
The way this is typically done, if I understand correctly, is to build
a tuple to represent S that contains all the values exported by S.  In
this program, the tuple will contain both x and l.  The file
containing U will be compiled to be a function that takes a tuple
representing S and builds a tuple representing U.

When compiling U, we can certainly cause the code to first extract the
"x" component from the tuple representing S so that U doesn't keep S.l
alive.  However, merely by the fact that U takes the tuple
representing S, it will keep S.l alive.

It seems that fixing this problem requires compiling special code
dealing with S based on its *uses*.  A couple of ideas come to mind.

Each module that uses S could be compiled to take a stripped-down
version, containing only the part of S that it needs.  All of the
stripped-down versions must be constructed immediately after S is, to
avoid any one of them keeping stuff alive to long.

Another option would be to do some kind of whole-program liveness
analysis and then to insert code to black-hole components of S when
they become dead.

What I'm wondering is how do separate-compilation systems deal with
this problem, if they do, or do they just use a weaker notion of space
safety than we do?  This question comes to mind because it seems like
part of the reason that MLton is so slow in compiling HOL is that it
is doing whole-program liveness analysis and completely decomposing
structures, hence achieving a very aggressive notion of space safety.