[MLton] Ongoing parallel runtime work

Eric McCorkle ericmcc at cs.brown.edu
Sat Jan 19 18:57:22 PST 2008


Hello everyone,

I've been working steadily on my efforts to implement the parallel  
runtime system I described in earlier discussions.  The scheduler is  
at a point where I'm comfortable adding it to subversion.  I'll have  
to come back and do some work to implement the CML synchronization  
primitives, but the scheduler itself, as far as I can tell, is solid  
(I have a randomized test program I can send with it).  I already  
have subversion write access, but need a bit of direction on what  
subdirectories, etc my code should go under.

I also have an (as yet untested, but by-the-paper) implementation of  
Maged Michael's lock-free malloc.  This is primarily for the  
scheduler's sake, but will benefit whatever bits of C code exist as  
well.  It compiles, but this is C, so that means nothing.

Lastly, I'm currently working on a parallel, mostly lock-free garbage  
collector (it requires only a barrier at the beginning and end of  
collection right now, and I don't think this is going away).  It's  
unavoidable that I'll have to have a separate compilation mode to use  
the parallel collector, as I've had to alter the memory  
representations of objects, and different code generation will be  
necessary in some cases.

In particular, I've done the following:

- The object header now occupies a cache line, primarily because the  
collector has to perform a compare-and-set operation on a pointer to  
the object in the destination heap to "claim" it.
- The object header has some space for other information, like an  
array's length, as well as a pointer which is used to build queues.
- Arrays are (sometimes) preceded by a bitmap which is used to claim  
their elements (using compare-and-sets), allowing arrays to be  
scanned and copied by multiple collector threads.
- Global pointers have to be double-pointers, one of which is used by  
the collector, one by the program's threads (Cheng-Blelloch does  
this).  The roles are switched at the end of every collection (see my  
comments on generations).  There is a fairly clever trick to avoid  
having to check a global variable for every pointer access.
- Weak pointers got absorbed into the objects themselves.  They come  
after the regular pointers.
- Finalizers attach a cache line at the end of the object, and spawn  
a thread.  The thread wakes up only when the object is no longer  
reachable.
- Stack objects are gone (I'll allocate frames directly), and threads  
are dealt with by the scheduler, leaving only normal objects and  
arrays (for now).

In the beginning, I was planning not to do generational collection at  
this time, but I've built things to make it easy to add.  I'm  
reconsidering this, as supporting generational collection would make  
some aspects of calculating the root set alot easier (the global  
pointers and the currently-existing threads can be treated as  
externally-stored objects and described using type signatures).   
However, the price is that *all* pointers must be represented with  
double-pointers, not just global pointers.  Does anyone have  
practical experience/observations which can inform this decision?

-- 
Eric McCorkle
Brown University
CS Graduate Student


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mlton.org/pipermail/mlton/attachments/20080119/55708665/attachment.htm


More information about the MLton mailing list