[MLton] regions

Matthew Fluet fluet@cs.cornell.edu
Fri, 1 Oct 2004 08:56:21 -0400 (EDT)


> > In particular, given MLton's focus on performance, is there a
> > particular reason why regions were not adopted as the primary
> > mechanism for memory management?
>
> I know more now, but am still no expert on regions.  My opinion is
> that it is far from clear that regions will provide MLton with
> improved performance, they add a lot of complexity to the compiler,
> and they complicate reasoning about and achieving about space safety,
> which we do care about and is difficult enough to achieve as it is.

I agree with Stephen's opinion.  Region-based memory management and
garbage collection have different strengths and weaknesses; its pretty
easy to come up with programs that do significantly better under regions
than under GC, and vice versa.  And, I believe that it is the case that
common SML idioms tend to work better under GC than under regions.

Certainly, a simple stop-and-copy GC is easier to implement and probably
gives better average performance than a simple region system.  The MLKit
has put a lot of effort into improving the supporting analyses and
representations of regions, which are all necessary to improve the
performance.  They have also worked on combining garbage collection with
region-based memory management.  Unfortunately, it was my impression that
rather than getting the best of both worlds, you ended up with
compromising both systems:

http://www.it-c.dk/research/mlkit/papers.html

# Martin Elsman. Garbage Collection Safety for Region-based Memory
Management. In Proceedings of ACM SIGPLAN Workshop on Types in Language
Design and Implementation (TLDI'03). New Orleans, Louisiana, USA. January
2003. pdf, ps, ps.gz, bibtex.

# Niels Hallenberg, Martin Elsman, and Mads Tofte. Combining Region
Inference and Garbage Collection. In ACM SIGPLAN 2002 Conference on
Programming Language Design and Implementation (PLDI'02). Berlin, Germany.
June 2002. pdf, ps, ps.gz, bibtex.

# Niels Hallenberg. Combining Garbage Collection and Region Inference in
The ML Kit. Master's Thesis. Department of Computer Science, University of
Copenhagen. 1999. a4.ps.gz, letter.ps.gz.


As I said above, there are scenarios where region based memory management
can be better than GC.  One possibility is to take a Cyclone like
approach, and provide both mechanisms, but at the programmer's option.

http://www.eecs.harvard.edu/~greg/cyclone/

Region-based Memory Management in Cyclone, Dan Grossman, Greg Morrisett,
Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney.  ACM Conference
on Programming Language Design and Implementation, pages 282--293, Berlin,
Germany, June, 2002.

Safe and Flexible Memory Management in Cyclone, Mike Hicks, Greg
Morrisett, Dan Grossman, and Trevor Jim.  University of Maryland Technical
Report CS-TR-4514, July 2003.


So, one might ask whether we might do the same thing -- i.e., provide a
MLton.Regions structure with explicit region based memory management
operations, so that the programmer could use them when appropriate.  I've
recently thought about this question:

http://www.cs.cornell.edu/People/fluet/rgn-monad/index.html

Unfortunately, I think my current conclusion is that the SML type system
is just a little too weak to really support this option.  We might be able
to provide a "poor-man's" version with dynamic checks, but I haven't
thought about this.


Final thoughts -- consider why one would choose region based memory
management.  One common arguement is that the region operations can all be
done in (approximately) constant time; therefore, you eliminate GC pause
times.  Comparing MLton to MLKit on the MLton benchmarks shows that we
certainly aren't losing out here, although I don't believe that any of the
benchmarks are GC bound.  Anoq implemented some games in MLton, and I
don't recall him having a problem with MLton's pause times.  However, if
this is really a concern, I suspect that trying to implement a real-time
GC for MLton would be an easier approach than trying to implement a region
system.