[MLton] max-heap setting for 64-bit applications

Matthew Fluet matthew.fluet at gmail.com
Wed Jan 6 12:29:37 PST 2010


On Mon, Dec 14, 2009 at 1:50 PM, Wesley W. Terpstra <wesley at terpstra.ca> wrote:
> On Mon, Dec 14, 2009 at 6:53 PM, Matthew Fluet <matthew.fluet at gmail.com> wrote:
>> I was thinking about the 32-bit case, in which case fragmenting the 4G
>> VM isn't terribly difficult.
>
> Ok, then let me address your original comment again:
>> But, the Windows specific mremap could still fail --- note that the
>> growHeap function demands significant growth from mremap.  If that
>> fails, then it attempts the alloc/copy, but allowing for an alloc of a
>> heap down to the minimum size.
>
> This sounds like a bad idea, then. The problem applies equally to all
> platforms (even linux).
>
> So, to be clear, we're talking about this situation:
>  - memory is available for the minimum size
>  - that memory is NOT large enough to accommodate "significant growth"
>  - the memory region contains the current mapping
>
> Under this case, AFAICT, every system fails to grow the heap, though
> it should succeed. Both windows and linux would've been fine if the GC
> had attempted to use mremap backed off to a lower value.
>
> I submit that this is a bug which should be fixed.

Agreed, although the trade-offs are not clear-cut.  There was a bit of
churn in the heap resizing code in response to the thread started at:
  http://mlton.org/pipermail/mlton/2008-April/030230.html
It's worth reading through that thread, as it describes some
interesting behaviors and some of the pros/cons:
  http://mlton.org/pipermail/mlton/2008-May/030265.html
    -- should we shrink a heap before attempting to allocate a larger heap?
  http://mlton.org/pipermail/mlton/2008-July/030285.html
    -- mremap (Linux) can't always get as much memory as an unmap/mmap
       (i.e., by paging to disk)

In any case, I remember that one problem with the previous code was
that the backoff scheme in remapHeap and createHeap used a linear
scheme.  That is, it backed off by (desiredSize - minSize) / 16 each
iteration, and then, as a last resort, tried minSize.  When
approaching an out-of-memory situation, we would tend to have a
minSize that was just 4K larger than the current heap size and a very
large desiredSize (because there is lots of live data) that was
unattainable, but was also so large that we couldn't mremap to minSize
+ (desiredSize - minSize) / 16.  So, as a last resort, we would mremap
to minSize = curSize + 4K --- gaining us all of 4K as a result of the
garbage collection.  And we'd very quickly be back in the garbage
collector where it would happen all over again.  Eventually, we would
move up by 4K increments until mremap couldn't succeed, and we'd
either page to disk or die with an out-of-memory error.

That prompted the idea of "significant growth" in growHeap when
invoking remapHeap.
  http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=6783
Of course, this could just shift the problem to createHeap after a
page to disk, again not hitting the actual maximum available and
instead hitting minSize.

Some time later, I switched the backoff scheme to use a logarithmic(?) scheme.
  http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=7057
We would use more iterations, because we would shrink the amount of
backoff as we approached the minSize.  This makes the eventual
successful mremap or mmap much closer to the maximum size that could
be successfully mremap-ed or mmap-ed.  This has much better
"performance": on a memory-leak program, I would get an out-of-memory
error after about 5 minutes and two or three pagings to disk, rather
than after hour(s?) of 3G garbage collections that obtained 4K
increases in heap size and numerous pagings to disk.

Of course, this would also help with the original issue that prompted
the need for "significant growth".  Perhaps the adaptive backoff with
mremap would work satisfactorily with the true minSize.


