[MLton-user] Destructive update

Stephen Weeks MLton-user@mlton.org
Wed, 8 Feb 2006 12:45:09 -0800


> The ram-slop is so I can comfortably run 2
> programs compiled like this on my machine without thrashing. However,
> one of my programs needs a lot of RAM, and so I invoke it with
> 
> program @MLton ram-slop 0.8 -- arguments
> 
> However, I've noticed that it only ever uses 40% of the machine's RAM,
> even though I strongly suspect it would use more if it could. Am I
> correctly overriding the ram-slop parameter that I set during
> compilation?

If it is a 4G machine, then sometimes MLton will get stuck with only
1G due to address space fragmentation.  Another possibility is that
MLton is doing two-space copying collection, and so the 40% you see is
for one semispace, while there is an occasional peak at 80% during
GC.  That said, MLton usually tries to hold on to the unused semispace
because there is a penalty to re-mmapping the pages.

Running with

  @MLton gc-messages gc-summary --

should provide plenty of data to figure out what's going on.


As to the confusion between ram-slop, fixed-heap, and max-heap, here's
an explanation, along with some code.

ram-slop sets how much RAM MLton thinks there is.  Here's the actual
computation from gc.c.

        s->ram = align (s->ramSlop * s->totalRam, s->pageSize);

MLton uses the ratio of s->ram to the amount of live data to determine
how big to make the heap and what type of GC to use.  Again, from gc.c:

        ratio = (float)s->ram / (float)live;
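
To make the arithmetic concrete, here is a small standalone sketch of
those two lines.  The names align_up, effective_ram, and live_ratio
are mine, not from gc.c, and I am assuming align rounds up to a
multiple of the page size; its definition isn't quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* Round a up to a multiple of b.  Assumed behavior of align(); the
   definition in gc.c is not quoted in this message. */
static size_t align_up(size_t a, size_t b) {
    assert(b > 0);
    return ((a + b - 1) / b) * b;
}

/* Mirrors s->ram: the share of physical RAM the runtime plans around. */
static size_t effective_ram(double ram_slop, size_t total_ram,
                            size_t page_size) {
    return align_up((size_t)(ram_slop * (double)total_ram), page_size);
}

/* Mirrors the ratio computation: planned RAM over live data. */
static float live_ratio(size_t ram, size_t live) {
    assert(live > 0);
    return (float)ram / (float)live;
}
```

With a toy machine of 40960 bytes and 4096-byte pages, ram-slop 0.5
gives 20480 bytes, and with 4096 bytes live the ratio is 5.  Halving
ram-slop to 0.25 gives 10240 bytes, rounded up to 12288, so the ratio
drops to 3 for the same live data.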

If the ratio is high, MLton will use two-space copying GC.  If the
ratio gets very low, then MLton will use its mark-compact,
generational GC, with a very low space overhead, in an attempt to cut
down on paging.  ram-slop does not limit how much memory MLton will
use, but by setting it lower, MLton will switch to the low-overhead GC
sooner.
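
As a rough illustration of why a lower ram-slop makes the switch
happen sooner, here is a deliberately simplified sketch.  The single
cutoff of 4.0 and the names gc_kind and choose_gc are made up for the
example; the real runtime uses its own thresholds and has more than
two regimes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { GC_COPYING, GC_MARK_COMPACT } gc_kind;

/* Hypothetical cutoff, NOT taken from gc.c. */
#define COPY_RATIO_CUTOFF 4.0f

/* Pick a collector from the ratio of planned RAM to live data. */
static gc_kind choose_gc(size_t ram, size_t live) {
    assert(live > 0);
    float ratio = (float)ram / (float)live;
    return ratio >= COPY_RATIO_CUTOFF ? GC_COPYING : GC_MARK_COMPACT;
}
```

Under this made-up cutoff, 1000 units of live data with s->ram at 6400
gives a ratio of 6.4 and copying GC, while the same live data with
s->ram at 3200 gives 3.2 and mark-compact.  Nothing about the program
changed; only the amount of RAM the runtime was told to plan around.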

Once MLton has determined the desired heap size, taking ram-slop into
account, it considers fixed-heap and max-heap (only one of which may
be set).  Here's the code, where "res" has already been computed as
the desired heap size.

        if (s->fixedHeap > 0) {
                if (res > s->fixedHeap / 2)
                        res = s->fixedHeap;
                else
                        res = s->fixedHeap / 2;
                if (res < live)
                        die ("Out of memory with fixed heap size %s.",
                                uintToCommaString (s->fixedHeap));
        } else if (s->maxHeap > 0) {
                if (res > s->maxHeap)
                        res = s->maxHeap;
                if (res < live)
                        die ("Out of memory with max heap size %s.",
                                uintToCommaString (s->maxHeap));
        }
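
If you want to play with the logic outside the runtime, here is a
standalone version of the same clamping.  The function name
clamp_heap and the bool/out-parameter interface are mine; it returns
false where the runtime would call die.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Apply fixed-heap or max-heap to a desired heap size res.
   Returns false where the runtime would die with "Out of memory". */
static bool clamp_heap(size_t res, size_t live, size_t fixed_heap,
                       size_t max_heap, size_t *out) {
    /* Only one of the two options may be set. */
    assert(fixed_heap == 0 || max_heap == 0);
    if (fixed_heap > 0) {
        /* Either the whole fixed heap or exactly half of it. */
        res = (res > fixed_heap / 2) ? fixed_heap : fixed_heap / 2;
        if (res < live)
            return false;
    } else if (max_heap > 0) {
        /* Only a ceiling; smaller desired sizes pass through. */
        if (res > max_heap)
            res = max_heap;
        if (res < live)
            return false;
    }
    *out = res;
    return true;
}
```

With fixed_heap = 1000, a desired size of 300 becomes 500 and a
desired size of 600 becomes 1000.  With max_heap = 1000, a desired
size of 300 stays 300 and only a desired size above 1000 is cut back.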

The fixedHeap case explains Matthew's statement:

  You can always try running with "@MLton fixed-heap ??? --", in which
  case the total heap will be exactly the specified size (either split
  into two semi-spaces for copying GC or one large space for
  mark-compact GC).

It also explains how fixed-heap overrides ram-slop, while not
completely obviating it, as ram-slop is still used in computing the
desired heap size.

The maxHeap case explains Matthew's other comment:

  But max-heap will only limit the heap if the desired heap size (as
  calculated by the various live ratios) is greater than the max-heap
  flag. If the desired heap size never exceeds the max-heap, then you
  should get exactly the same behavior as if you didn't specify a
  max-heap.

If you really want to know what's going on, I encourage you to read
heapDesiredSize in gc.c.  It's about 30 lines of code and 30 lines of
comments, so is pretty easy to follow.


Personally, I use fixed-heap if I want something to run really fast, I
have some feel for peak memory use, and I have a machine to dedicate.
That way, I can give the process as much of the RAM as I can, and it
never needs to mess with growing/shrinking the heap and suffering the
associated costs.  This works fine with two MLton processes running
simultaneously too.  Just keep the sum of the fixed heaps

max-heap is more for limiting out-of-control things, since, as Matthew
correctly points out, it has no impact unless the program tries to use
more than max-heap allows.