[MLton] RE: card/cross map in heap

Nicolas Bertolotti Nicolas.Bertolotti at mathworks.fr
Mon Sep 8 10:16:32 PDT 2008


I like the idea of recording a mmapMaxHeap statistic.

Anyway, there is one reason I have not already implemented it to give it a try: I am concerned that there may be memory pressure from other processes.

In order to exploit multi-core architectures, we would like to run multiple processes that perform parts of the verification in parallel.

Then, I would like to be able to suspend the execution of one process (after it has dumped its heap to disk) and wait until the other processes use less memory before continuing. (This is easy to do by replacing the "die" when "createHeap" fails with a sleep/retry loop.)
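
Here is a minimal sketch of that sleep/retry idea in C; "createHeap" and
the heap struct below are hypothetical stand-ins, not the runtime's
actual declarations:

    #include <stdbool.h>
    #include <stddef.h>
    #include <unistd.h>

    struct heap;                           /* runtime heap descriptor */
    bool createHeap(struct heap *h, size_t desiredSize);

    /* Instead of calling die() when createHeap fails, sleep and retry,
     * waiting for the other processes to release memory. */
    static bool createHeapWithRetry(struct heap *h, size_t desiredSize,
                                    unsigned retries, unsigned delaySecs) {
      for (unsigned i = 0; i < retries; i++) {
        if (createHeap(h, desiredSize))
          return true;
        sleep(delaySecs);   /* back off; memory pressure may ease */
      }
      return false;   /* caller may still die() after the last retry */
    }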

In that case, the mmapMaxHeap statistic is not exactly the appropriate answer, and using a fixed heap is also not what we want.

Also, there are two very different situations that can cause a "createHeap" to fail:
- not enough RAM to satisfy the allocation
- not a big enough logical address range to satisfy the allocation

In the first situation, assuming that the whole multi-process program is the only one that will really use a large amount of RAM, there is a way to determine whether or not the allocation can succeed in the future, based on the total amount of RAM and the fraction of RAM used by the system. We could approximate the latter by measuring the amount of memory already in use at the beginning of the execution of the process.
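
For instance, on Linux, that baseline could be approximated with
sysinfo(2) at startup; a rough, Linux-specific sketch:

    #include <sys/sysinfo.h>

    /* Approximate how much memory the system and other processes are
     * already using when we start; requests beyond (total RAM minus
     * this baseline) are unlikely ever to succeed. */
    static unsigned long long baselineBytes(void) {
      struct sysinfo si;
      if (sysinfo(&si) != 0)
        return 0;
      return ((unsigned long long)si.totalram
              - si.freeram - si.bufferram) * si.mem_unit;
    }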

I am more concerned about the second situation. In that case, we should be able to determine (before we actually try to page to disk) that the allocation will never succeed, and always choose to continue with the existing heap. On Linux, the file /proc/<pid>/maps can be used to answer that question, but I could not find a way to implement such an algorithm.
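
One way to approach it (a Linux-specific sketch, and only an upper
bound, since it ignores ranges the kernel reserves): scan
/proc/self/maps and take the largest gap between consecutive mappings.

    #include <inttypes.h>
    #include <stdio.h>

    /* Return the size of the largest gap between consecutive mappings,
     * an upper bound on the largest contiguous mmap that could
     * possibly succeed in this address space. */
    static uintptr_t largestUnmappedGap(void) {
      FILE *f = fopen("/proc/self/maps", "r");
      uintptr_t start, end, prevEnd = 0, best = 0;
      char line[512];
      if (f == NULL)
        return 0;
      while (fgets(line, sizeof line, f) != NULL) {
        /* Each line starts with "start-end", both in hex. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) != 2)
          continue;
        if (prevEnd != 0 && start - prevEnd > best)
          best = start - prevEnd;
        prevEnd = end;
      }
      fclose(f);
      return best;
    }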

> -----Original Message-----
> From: Matthew Fluet [mailto:fluet at tti-c.org]
> Sent: Wednesday, August 20, 2008 12:53 AM
> To: Nicolas Bertolotti
> Cc: mlton at mlton.org
> Subject: RE: [MLton] RE: card/cross map in heap
>
> On Thu, 17 Jul 2008, Matthew Fluet wrote:
> > The ideal solution, especially for a situation like yours, where you
> > are happy to use lots of memory on a dedicated machine, is to use
> > @MLton fixed-heap 3.5G -- to grab as large a heap as you can (that
> > comfortably fits in physical memory) at the beginning of the program
> > and never bother resizing it.  As I understand it, resizing is only
> > to 'play nice' with other processes in the system.
> >
> > The problem with fixed-heap, though, is that the runtime starts off
> > trying to use the Cheney-copy collector (so, it really grabs 1/2 *
> > 3.5G) and it may be some time before it is forced to use the
> > mark-compact collector, and it is only at that point that the runtime
> > will try to grab the 3.5G.  Since fixed-heap affects the desiredSize
> > (but not the minSize), you really need to set fixed-heap to the size
> > that is actually able to be allocated, so that desiredSize ==
> > currentSize, and no resizing occurs.
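
For reference, that runtime argument goes on the command line before the
program's own arguments, e.g. (the program name here is hypothetical):

    ./verifier @MLton fixed-heap 3.5G -- <program arguments>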
>
> Some thoughts that have occurred to me.
>
> First, I remarked earlier that we would sometimes like to know "if I unmap
> this memory, can I mmap this size?"  While there is no general way of
> answering this question, I believe that (for the tight memory situations
> where it becomes an issue) we have some useful information around.  In
> particular, after paging the heap to disk, if the subsequent createHeap is
> forced to back off, then we have a reasonable upper bound on the amount of
> memory we should ever ask for in the future.  In particular, because we
> have paged the heap to disk, we know that we have freed up as much memory
> as we possibly can.  If mmap can't satisfy our request in this situation,
> then we might have gone over the size of a contiguous mapping in our
> address space.  If that is the case, then there is no need to subsequently
> page the heap to disk and try to allocate a larger heap -- we've already
> got as large a heap as we can get.  (Given that, we may want createHeap to
> use a finer-grained backoff when used after paging the heap to disk; that
> would really find the largest sized heap.)
>
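A finer-grained backoff could binary-search between the last failing
and last succeeding sizes; a sketch, where tryMmapHeap is a
hypothetical probe that mmaps the given size and immediately releases
it:

    #include <stdbool.h>
    #include <stddef.h>

    bool tryMmapHeap(size_t size);  /* hypothetical mmap/munmap probe */

    /* Binary-search for (nearly) the largest mappable heap, instead of
     * shrinking the request by a coarse factor on each failure. */
    static size_t largestMappableHeap(size_t desired, size_t granularity) {
      size_t lo = 0, hi = desired;
      if (tryMmapHeap(hi))
        return hi;
      while (hi - lo > granularity) {
        size_t mid = lo + (hi - lo) / 2;
        if (tryMmapHeap(mid))
          lo = mid;   /* mid succeeded; look for something larger */
        else
          hi = mid;   /* mid failed; look smaller */
      }
      return lo;
    }
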
> Of course, the other reason mmap may fail is that the operating system
> virtual memory manager can't currently satisfy our request along with
> the outstanding requests of other processes.  That is, mmap may fail
> because the MM won't overcommit the physical/swap pages.  It is
> possible that a subsequent mmap will succeed, if in the intervening
> time, other processes have given up memory.
>
> That seems to suggest the following policy:
>
>   - record a  mmapMaxHeap  statistic.
>     This statistic is updated whenever createHeap is forced to backoff
>     when no extra memory is being used (that is, at the initial
>     createHeap and a createHeap after the heap has been paged to disk).
>
>   - take  mmapMaxHeap  into account when resizing the heap.
>     In particular, don't let the desiredSize exceed the mmapMaxHeap; it
>     is better to stick with a heap that is at mmapMaxHeap size than to
>     try paging to disk.
>     There is one exception to this policy.  If minSize > mmapMaxHeap,
>     then we should allow the runtime to page the heap to disk and try
>     to mmap the desired memory.  This helps handle the case that we saw
>     an mmapMaxHeap because of memory pressure in the system.
>
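That clamping might look like the following in the heap-sizing code;
the function name and shape are hypothetical, but mmapMaxHeap,
desiredSize, and minSize are used as in the policy above:

    #include <stddef.h>

    /* Clamp the desired heap size by the recorded mmapMaxHeap, except
     * when even the minimum required size exceeds it; then we still
     * page to disk and try the full request, in case mmapMaxHeap only
     * reflected transient memory pressure. */
    static size_t clampDesiredSize(size_t desiredSize, size_t minSize,
                                   size_t mmapMaxHeap) {
      if (mmapMaxHeap == 0)          /* no backoff observed yet */
        return desiredSize;
      if (minSize > mmapMaxHeap)     /* the exception: page and retry */
        return desiredSize;
      return desiredSize < mmapMaxHeap ? desiredSize : mmapMaxHeap;
    }
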
> Technically, it is possible that an inability of mmap to satisfy a minSize
> request could be a temporary situation, due to memory pressure from other
> processes.  One could always wait and try again in this situation.
> However, anyone running a program that needs >2.5G memory probably knows
> better than to run other high-memory processes at the same time and/or
> provides a decent swap file/partition.  So, I suspect that it is fairly
> safe to assume that mmap failing to satisfy a minSize request corresponds
> to a hard limit in the virtual address space for a contiguous map, and
> thus corresponds to a true out-of-memory situation.
>
>
> The second thought is that I wonder if the heap would be easier to
> predict/control if we used one contiguous heap at all times (that is, even
> for major copying collections).  In particular, it would be nice to have
> the behavior described above for fixed-heap --- namely, that one could use
> fixed-heap to grab a large block of memory at the beginning of the program
> and there would be no subsequent resizing, even if the runtime switched
> over from major copying collections to major mark-compact collections.  I
> note that the original Sansom91 paper
> (http://mlton.org/References#Sansom91) works with a single contiguous
> heap.  I'm not sure that I understand the advantage of the current
> implementation, where the secondaryHeap is managed separately, potentially
> created and released a number of times during the execution of a program.


