[MLton-user] MLton on mswin7 absurdly slow ?

Matthew Fluet matthew.fluet at gmail.com
Thu Feb 10 10:14:28 PST 2011


On Tue, Feb 8, 2011 at 4:05 PM, John B Thiel <jbthiel at gmail.com> wrote:
> Thanks, Wesley and Matthew.  Yes, my setups definitely do not
> facilitate any single program running off with 1/2 of physical RAM.
> That seems to me not the best default nowadays - 10 or 20% maybe, ok.
>  12+ apps is about my minimum working set and I never have half the
> core just sitting around unused, regardless of qty.  I'd say it should
> default to a lower % heap limit, and let the user dial up as needed.
> (and be more aggressive with GC, see below).  If it fails early with
> "out of memory", then at least the user has feedback to
> research/adjust parameters.  (such parameter could also be set 1-time
> in a user-specific config/ini file).

It's a classic time-space tradeoff.  The larger one allows the heap to
grow, the less time one spends in garbage collection.  Conversely, the
smaller one makes the heap, the more time one spends in garbage
collection.

With regards to setting the ratio, the "ram-slop" runtime option sets
the fraction of physical memory that a MLton compiled executable
attempts to use as an upper bound for its heap; as noted, it defaults
to 0.5.  In any MLton compiled executable (including the mlton compiler
itself), you can use "@MLton ram-slop 0.1 --" to change it.  There
isn't a convenient way to globally change it for all MLton compiled
executables that you build or run; one would need to recompile the
runtime system.  I suppose it would be possible to pull runtime
options from environment variables.
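For concreteness, here is the invocation syntax (the executable name
"./myprog" is hypothetical), plus the arithmetic behind the bound:
runtime options go between "@MLton" and "--", and the heap bound is
simply the ram-slop fraction of physical RAM.

```shell
# Runtime options go between @MLton and --; everything after -- is
# passed to the program itself.  (./myprog is a hypothetical
# MLton-compiled executable.)
#
#   ./myprog @MLton ram-slop 0.1 -- arg1 arg2

# The resulting heap bound is just ram-slop times physical RAM;
# e.g. with 4096 MB of RAM and ram-slop 0.1:
awk 'BEGIN { ram_mb = 4096; slop = 0.1
             printf "heap bound: %.0f MB\n", ram_mb * slop }'
```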

However, there is a "cheat" for the MLton compiler itself.  The "real"
compiler executable is invoked by a shell/batch script.  You can edit
that script, which lives at "/usr/local/bin/mlton" for a Unix package
and wherever the "mlton.bat" ends up for the MinGW package.  You'll
see that there is an instance of "ram-slop 0.5" in that script, which
you could change to a different value better suited for your machine.
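A sed substitution over a copy of that script is one way to do it.
This is only a sketch: the echoed line below is a hypothetical
stand-in for the real script's contents, which vary by version and
platform; only the "ram-slop 0.5" fragment matters.

```shell
# Work on a copy rather than the installed script.  The line below is
# a hypothetical stand-in for the driver script's real contents.
echo 'exec mlton-compile @MLton ram-slop 0.5 -- "$@"' > mlton.copy

# Lower the default from 0.5 to 0.2:
sed 's/ram-slop 0\.5/ram-slop 0.2/' mlton.copy > mlton.edited
grep -o 'ram-slop 0\.2' mlton.edited
```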

> In regards to the actual allocation, I traced those gc-messages, Matthew,
> see below.  The "major mark-compact" is very successful at trimming
> the heap, but is only invoked when allocation hits the limit.  See
> gc#15, and gc#36 in first trace (1) excerpt below (100MB heap limit).
> It looks like the high water mark of live data is around 62+M
> remaining allocated after the "major mark compact" in gc#15.
> (confirmed with further test -- helloworld compile succeeds with 75MB
> max-heap, fails with 70MB. Same with fixed-heap.)

Any major collection will retain exactly the same amount of live data.
 So, it isn't the case that a mark-compact is "better" at garbage
collection than a (major) Cheney-copy.  They simply have different
properties in terms of total heap space required (Cheney copy
requires two semi-spaces of equal size, so will require memory for 2x
max live data, while mark compact works in place, so will require only
memory for 1x max live data) and the running time overhead (Cheney
copy is proportional to the size of the live data, while mark compact
is proportional to the heap size).
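With the roughly 62 MB of live data you observed, the minimum space
each collector needs works out as follows (a quick shell sketch of
the 1x vs 2x rule above):

```shell
# Minimum memory needed by each major collector for a given amount
# of live data (roughly 62 MB, per the trace):
live_mb=62
echo "mark-compact (in place, 1x live): $(( 1 * live_mb )) MB"
echo "Cheney-copy (two semispaces, 2x): $(( 2 * live_mb )) MB"
```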

> The second trace (2) excerpt below illustrates running with no heap
> limit.  I did not see any "major mark compact" calls at all, and the
> heap rapidly balloons to 256MB, with the coup-de-grace in gc#19 of a
> heap-to-heap "major Cheney copy":
>        [GC: Starting major Cheney-copy;]
>        [GC:    from heap at 0x60000000 of size 256,770,048 bytes,]
>        [GC:    to heap at 0x08000000 of size 256,770,048 bytes.]
>
> So there's the expected 512MB allocation rail.
>
> Since there is really only at most 75M of live data at this point,
> this appears to be a theoretical 6x+ overallocation (or 3x if the 75M
> also involved a heap-to-heap "major Cheney copy", but in fact I did not
> see one in that trace).  Whether 6x or 3x, that is still a pretty
> high GC overhead, so it looks like some GC tuning development work
> could be helpful, with at least part of the answer being more "major
> mark compacts" to keep the heap size down.  I would rather spare the
> 300ms for those in lieu of allocations.

This "overallocation" is another runtime-configurable property:
"@MLton live-ratio 8.0".  By default, MLton attempts to (re)size the
total heap (which is either the one heap for mark-compact or the two
semi-spaces for Cheney-copy) after a major garbage collection to be
live-ratio times the live data retained by that garbage collection.
(With a little bit of slop to avoid resizing by a small amount and
other limits to stay within the ram-slop ratio.)  So, if you would
like heaps to stay closer to the live data, then run with a smaller
live-ratio.  The "penalty" will be more garbage collections.  But, as
you note, one may be willing to pay the time for an in RAM garbage
collection rather than incurring the overhead of paging when there are
other processes competing for RAM.
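Plugging the numbers from your trace into the default live-ratio makes
the heap sizes you saw unsurprising.  This is only a back-of-the-envelope
sketch; the actual policy also applies the slop and ram-slop limits
mentioned above.

```shell
# Default live-ratio is 8.0; the trace showed roughly 62 MB live
# after a major mark-compact.  Target total heap, and the size of
# each semispace if the next major GC is a Cheney-copy:
awk 'BEGIN { live = 62; ratio = 8.0
             total = live * ratio
             printf "target total heap: %.0f MB\n", total
             printf "per semispace:     %.0f MB\n", total / 2 }'
```

The ~248 MB per semispace is in the same ballpark as the ~256 MB
semispaces in the trace, and the ~496 MB total is essentially the
512 MB "allocation rail" observed.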


All that said, I don't disagree with the idea that GC tuning would be
helpful.  Indeed, there are some reasonable theoretical results about
garbage collections with heaps of fixed size, but I don't know of work
that really looks carefully at the question of dynamically resizing
the heap in response to the live data demands of the application or,
similarly, in response to the behavior of the operating system.
Indeed, MLton has a number of (undocumented) controls in addition to
the above that try to get the runtime to "do the right thing" in
various situations.  But it's a tricky problem.

