[MLton] callcc and space use

Wed Feb 13 19:11:44 PST 2008

On Mon, 11 Feb 2008, Daniel Spoonhower wrote:
> Vesa Karvonen wrote:
>>  On Feb 6, 2008 8:49 PM, Daniel Spoonhower <spoons at cs.cmu.edu> wrote:
>> >  After some investigation, I think I have an answer to my problem.  In
>> >  the end, it seems the problem had nothing to do with callcc, but with
>> >  part of the analysis used in the reference flattening optimization
>> >  (http://mlton.org/RefFlatten).
>>  [...]
>>
>>  Interestingly, the patch eliminates the space leak from the two test
>>  cases you gave earlier, but does not remove it from a modified version
>>  of your second test case.  The modified version uses threads rather
>>  than continuations.  I wrote it (back when you sent the test cases) to
>>  see whether or not continuations make it simpler or not (IMO, they do
>>  not; the version using threads is simpler, IMO).  (The modified
>>  version has been intentionally kept close to the original to reduce
>>  diffs.)  See the attachment.  At any rate, the space leak definitely
>>  seems to be caused by a bug in MLton.  I do not know enough about
>>  RefFlatten to comment on your patch.  Perhaps RefFlatten is still not
>>  conservative enough.
>
> I took a closer look at Vesa's version of the code (the one using threads). 
> The leak here goes away if I add the explicit GC at each iteration.  I can't 
> find any difference in the output of RefFlatten between the version with the 
> GC and the one without, so I don't think the problem is there.  In fact, 
> there isn't anything in the generated code that would lead me to expect the 
> version with an explicit GC to use less space (in terms of the maximum size 
> of live objects).
>
> Most (>99%) of the heap at the high-water mark is tied up with stack objects. 
> I think the difference in performance is caused by the fact that when a GC 
> occurs, it shrinks non-active stacks (i.e. it only reserves space for the 
> portion of the stack that is in use).  When there are many threads and GCs 
> are infrequent, much of the heap is tied up with the unused portions of these 
> stacks.  If I disable the thread shrinking (by setting thread-shrink-ratio to 
> 1.0), the performance of the two thread-based versions is about the same. 
> (The version with explicit GCs reports slightly higher "max live" as I would 
> expect since the version without GCs may not sample at the high-water mark.)

I think I agree with Dan here.  MLton's default thread shrinking policy is 
to attempt to halve the reserved size of the stack each time a stack is 
live at a garbage collection.  Hence, a thread may need to survive through 
many GCs to be shrunk completely.  Dan noted that disabling thread 
shrinking (by @MLton thread-shrink-ratio 1.0 --) causes the extra 
MLton.GC.collect calls to have no effect.  Similarly, I note that forcing 
threads to shrink completely (by @MLton thread-shrink-ratio 0.0 --) causes 
the "space leak" to disappear.

It is somewhat interesting that the callcc version doesn't suffer from the 
same problem.  The key is that when performing a thread copy (both copying 
the current thread and copying a saved thread), the new thread is 
allocated with reserved equal to used (up to alignment requirements). In 
both versions, the "myrev" function calls "copy", which is a non-tail 
recursive function.  Hence, "myrev" grows the stack proportional to the 
size of the input list.  However, once the "copy" is done, the used stack 
space is quite small.  In the callcc version, "fork" uses callcc and 
copies only the used stack.  Hence, every suspended computation has a very 
small stack.  In the thread version, "fork" uses switch and holds a 
reference to the current stack (with a large reserved size).  Hence, every 
suspended computation has a large stack.  This is particularly bad because 
the longest suspended computations are the first computations that were 
suspended, which copied a long input list and therefore have the largest 
reserved stack size.

Perhaps a more aggressive stack resizing policy should be used when the 
used/reserved ratio is very low.