[MLton-user] Destructive update

Matthew Fluet fluet@cs.cornell.edu
Wed, 8 Feb 2006 09:55:37 -0500 (EST)


> 1. I compile all my programs with the command
>
> mlton -basis 1997 -verbose 1 -runtime 'ram-slop 0.4'
>
> The 1997 basis is so I can also develop with Moscow ML, mainly for its
> read-eval-print loop. The ram-slop is so I can comfortably run 2
> programs compiled like this on my machine without thrashing. However,
> one of my programs needs a lot of RAM, and so I invoke it with
>
> program @MLton ram-slop 0.8 -- arguments
>
> However, I've noticed that it only ever uses 40% of the machine's RAM,
> even though I strongly suspect it would use more if it could. Am I
> correctly overriding the ram-slop parameter that I set during
> compilation?

That is the correct way of overriding the ram-slop parameter.  You can 
confirm this by running your program with "@MLton gc-messages --".  The 
first message should be of the form

total RAM = 4,188,676,096  RAM = 3,350,941,696

where total RAM is the machine memory and RAM is the maximum memory that 
will be used by the heap.
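
For example, to check the override and watch the GC behavior at the same
time, you could invoke the program with something like

   program @MLton ram-slop 0.8 gc-messages -- arguments

since multiple runtime arguments may be given between @MLton and --.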

Various other factors can limit the actual amount of memory used by the 
program.  For one, MLton's garbage collector requires a contiguous heap, 
so we're at the mercy of malloc and mmap.  Also, there is a preference to 
keep the live ratio within certain bounds, so if a lot of garbage is 
recovered at each garbage collection, then the heap won't necessarily grow 
very large.

You can always try running with "@MLton fixed-heap ??? --", in which case 
the total heap will be exactly the specified size (either split into two 
semi-spaces for copying GC or one large space for mark-compact GC).
This is probably fine for non-interactive programs, where the relatively 
long pause time to GC a large heap isn't a problem.
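
For example (the exact size is of course workload dependent; the k/m
suffixes denote kilobytes/megabytes), something like

   program @MLton fixed-heap 512m -- arguments

asks for a heap of exactly 512 megabytes.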

> 2. My ML programming style, if I can call it that, makes intensive use
> of dictionaries, which have the following signature
>
> new : ('a * 'a -> order) -> ('a,'b) dict
> peek : ('a,'b) dict -> 'a -> 'b option
> insert : ('a,'b) dict -> 'a * 'b -> ('a,'b) dict
> foldl : ('a * 'b * 'c -> 'c) -> 'c -> ('a,'b) dict -> 'c
>
> I use them like arrays with a polymorphic key, and much of the time
> they could safely be destructively updated without breaking my
> programs.

That's a fairly common programming style, using purely functional data 
structures.  It does happen to play into the live ratio discussed above: 
using such data structures tends to produce a lot of garbage, because only 
the latest instance of the dictionary is live.  (On the other hand, this 
is exactly the scenario that supports the generational hypothesis.)
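
For instance, with the signature above, a typical single-threaded use
might look like the following (fromList is just an illustrative helper):

   fun fromList cmp kvs =
      List.foldl (fn (kv, d) => insert d kv) (new cmp) kvs

   val d = fromList Int.compare [(1, "one"), (2, "two"), (3, "three")]

Each insert returns a new dictionary and the previous one stops being
live right away, so most of what gets allocated is garbage by the next
collection.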

> a) Should I be using some other container type? How is this problem
> usually solved in the source code of the MLton compiler?

The imperative features of ML can be put to good use to destructively 
update a data structure.  One of the most commonly used data structures in 
the MLton compiler is a mutable HashSet:

http://mlton.org/cgi-bin/viewsvn.cgi/mlton/trunk/lib/mlton/basic/hash-set.sig?rev=4014
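
To give the flavor (this is only a rough sketch with made-up names, not
MLton's HashSet), a destructively updated table can be built directly
from an array of association lists:

   structure Table =
   struct
      (* buckets is an array of association lists, updated in place *)
      datatype ('a, 'b) t =
         T of {buckets: ('a * 'b) list array,
               hash: 'a -> word,
               eq: 'a * 'a -> bool}

      fun new {hash, eq} =
         T {buckets = Array.array (1024, []), hash = hash, eq = eq}

      fun index (T {buckets, hash, ...}, k) =
         Word.toInt (Word.mod (hash k, Word.fromInt (Array.length buckets)))

      fun peek (t as T {buckets, eq, ...}) k =
         Option.map (fn (_, v) => v)
            (List.find (fn (k', _) => eq (k, k'))
                       (Array.sub (buckets, index (t, k))))

      (* insert mutates the bucket array; no new table is allocated *)
      fun insert (t as T {buckets, eq, ...}) (k, v) =
         let
            val i = index (t, k)
            val rest = List.filter (fn (k', _) => not (eq (k, k')))
                                   (Array.sub (buckets, i))
         in
            Array.update (buckets, i, (k, v) :: rest)
         end
   end

The interesting difference from the dictionary signature above is that
insert returns unit: callers keep using the same table value, and
updates are visible through every reference to it.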

> b) Have you thought of doing a compiler analysis to see what data
> structures can be destructively updated? It's not my field of
> expertise, but I believe Clean-style uniqueness types can be used to
> do this, and I'm sure there are other methods that I'm ignorant of.
> Two nice things about this analysis are that (i) the programmer
> doesn't need to know anything about it, and (ii) if the analysis
> returns "I don't know" the compiler can just do what it would normally
> do.

It is certainly something that we've thought about.  I believe the 
compiler ILs are expressive enough to handle the transformation from a 
single-threaded functional data structure into a mutable data structure. 
What isn't entirely clear is how to find a sound analysis that scales to 
all of the IL features.