[MLton] Crash fread(...) failed (only read 0) (Cannot allocate memory) during deepFlatten with MLton 20070826

Matthew Fluet fluet at tti-c.org
Tue Apr 22 17:27:54 PDT 2008


On Tue, 22 Apr 2008, Nicolas Bertolotti wrote:
>>> One thing I notice is that when writing the heap to disk, we do not
>>> release the card/cross maps.  Although with a 1.6G heap, these are only
>>> 12.8M, if they happen to fall in the middle of the virtual address
>>> space, then it may not be possible to find the requested 1.6G heap.
>>>
>>> You could try the attached patch, which releases the card/cross maps
>>> after writing the heap to disk.
>
> I noticed that, when the heap is copied to disk and createHeap() is 
> called to allocate a new heap, there are multiple iterations of the 
> "backoff" loop before mmap() succeeds.  As a consequence, the newly 
> allocated heap is not as big as what the GC algorithm would normally 
> expect.

Right, if we are required to back off, then we don't have a heap of the 
desired size.  But the runtime doesn't carry information about the 
desired size from one GC to the next.  At a garbage collection, the 
decision whether to use the Cheney-copy GC or the mark-compact GC 
depends on whether or not there is space for a secondary heap.
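
For concreteness, the backoff loop in question has roughly the following 
shape (a minimal sketch in plain C, not the actual runtime code; the 
function name, step count, and parameters are made up for illustration, 
and it assumes desired >= minSize):

  #include <stddef.h>
  #include <sys/mman.h>

  enum { BACKOFF_STEPS = 16 };

  /* Shrink the request from the desired size toward the minimum until
   * mmap succeeds; under fragmentation the heap we end up with can be
   * much smaller than the GC policy wanted. */
  static void *allocWithBackoff(size_t desired, size_t minSize, size_t *got) {
    for (int i = 0; i <= BACKOFF_STEPS; i++) {
      /* interpolate from 'desired' down to (roughly) 'minSize' */
      size_t size = desired - (desired - minSize) / BACKOFF_STEPS * i;
      void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p != MAP_FAILED) {
        *got = size;   /* may be well below 'desired' */
        return p;
      }
    }
    *got = 0;
    return NULL;
  }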

> Using your latest patch (still combined with mine), I noticed that we 
> don't loop anymore and the new heap size corresponds to what the GC 
> algorithm expects.  (I guess this will also make it possible to run 
> more instructions before entering the GC again.)

The desired size gives the heap a lot more space than the minimum size. 
The minimum size is just enough to satisfy the last heap limit check that 
failed -- it probably doesn't cover more than 6 or so basic blocks, and it 
doesn't cover any allocating loops.  So, if we only get a heap of the 
minimum size, then we are almost certainly going to fail the next limit 
check and go through the entire process of a major collection and heap 
resizing (with a possible page to disk and back).
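
To make the minimum-size point concrete, a heap limit check has roughly 
this conceptual shape (illustrative names and layout, not MLton's actual 
generated code): the collector is entered with just the bytes the failing 
block needs, which is why a minimum-size heap barely outlives the check 
that triggered it.

  #include <stddef.h>

  typedef struct {
    char *frontier;   /* next free byte in the heap */
    char *limit;      /* end of usable heap space   */
  } AllocState;

  static void *bumpAlloc(AllocState *s, size_t bytes,
                         void (*collect)(AllocState *, size_t)) {
    if ((size_t)(s->limit - s->frontier) < bytes)
      collect(s, bytes);          /* ensure at least 'bytes' are available */
    void *p = s->frontier;
    s->frontier += bytes;
    return p;
  }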

> So, it really seems that releasing the card/cross map before allocating 
> the heap is a good way to reduce memory fragmentation.

Good.  I'll commit my patch as well as yours.
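
For reference, the order of operations on the paging path after the 
combined patches is roughly the following (placeholder names, not the 
real runtime API; only the ordering is the point):

  #include <stddef.h>

  struct Heap;
  extern void writeHeapToDisk(struct Heap *h);
  extern void releaseCardCrossMaps(struct Heap *h);  /* munmap the ~12.8M of maps */
  extern void releaseHeap(struct Heap *h);
  extern int  createHeapWithBackoff(size_t desired, size_t minSize);

  void pageToDiskAndGrow(struct Heap *old, size_t desired, size_t minSize) {
    writeHeapToDisk(old);        /* back up the live data first */
    releaseCardCrossMaps(old);   /* the fix: free the maps before the big mmap */
    releaseHeap(old);
    createHeapWithBackoff(desired, minSize);
    /* the card/cross maps are rebuilt for whichever heap we end up with */
  }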

> It might be interesting to release and rebuild the card/cross map in 
> other circumstances in order to achieve better performance:
> - if remapHeap() fails, we could retry after releasing the card/cross
>   map before deciding to create a new heap and copy the existing one.
> - if createHeap() fails, we could retry after releasing the card/cross
>   map before deciding to back up the existing heap to the disk.
> ... it depends on what the cost of rebuilding the card/cross map is (for 
> the 2nd option, I guess it is certainly much less than transferring 1.5 
> GB to disk).

Rebuilding the cross map is linear in the number of objects in the heap. 
But that is certainly cheaper than writing the heap to disk and reading 
it back.
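
The linear cost comes from a single pass over the heap, doing constant 
work per object, something like the sketch below (the card size and the 
cross-map encoding are invented for illustration; only the single linear 
walk is the point):

  #include <stddef.h>
  #include <stdint.h>

  enum { CARD_SIZE_LOG2 = 9 };   /* assume 512-byte cards for the example */

  void rebuildCrossMap(char *heapStart, char *frontier, uint16_t *crossMap,
                       size_t (*objectSize)(char *obj)) {
    for (char *p = heapStart; p < frontier; p += objectSize(p)) {
      size_t card = (size_t)(p - heapStart) >> CARD_SIZE_LOG2;
      /* remember where the last object starting in this card begins */
      crossMap[card] = (uint16_t)((p - heapStart) & ((1u << CARD_SIZE_LOG2) - 1));
    }
  }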

> What do you think?

Those seem like reasonable heuristics.  Another reasonable heuristic might 
be to release the card/cross maps preemptively if the desired size is > 
1/4 of the virtual address space, since in that situation, any 
fragmentation makes it difficult to satisfy that size.
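
As a sketch, that check could be as simple as the following (illustrative 
names; the address-space constant would really come from the platform):

  #include <stddef.h>
  #include <stdint.h>

  #define ADDRESS_SPACE_BYTES ((size_t)SIZE_MAX)   /* ~4G on a 32-bit system */

  /* Once a request exceeds a quarter of the address space, a single small
   * mapping near the middle of the space can already make it unsatisfiable,
   * so unmap the card/cross maps before even trying. */
  int shouldReleaseMapsFirst(size_t desiredHeapSize) {
    return desiredHeapSize > ADDRESS_SPACE_BYTES / 4;
  }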

Tested patches are certainly welcome, especially since you have examples 
that would benefit from such heuristics.  You might also put in some gc 
messages that report when remapHeap and createHeap fail, to determine 
whether it is really beneficial.

BTW, if you are just changing the runtime, you might find it useful to 
rebuild the compiler with '-keep o', which will leave all of the compiler's 
object files around.  Then you can run 'mlton -output mlton-compile *.o', 
which will just link the object files against the new runtime.

> P.S.: the compilation no longer crashes on my 2 GB machine, so it seems 
> that the initial issue has now been fully fixed.
>
> Unfortunately, the compilation swaps a lot (really a lot) during 
> deepFlatten and I'm afraid it has become impossible to build the product 
> on a machine that has less than 3 GB of RAM.  Any idea about what we 
> could do to limit the memory consumption during this phase?

I've previously suggested trying the 'hash-cons p' runtime option, which 
instructs the runtime to hash-cons the heap every 1/p (major) garbage 
collections.  This can reduce the amount of live data by sharing equal 
data.  Of course, the hash-consing table itself requires some space, 
which may be hard to come by in this situation.

Beyond that, there aren't any runtime or compiler options that will affect 
the memory usage of the deepFlatten pass.  You will either need to disable 
the pass completely ('-drop-pass deepFlatten') or investigate algorithmic 
changes to the deepFlatten pass.

-Matthew


