[MLton] More on Parallel Runtime

Eric McCorkle ericmcc at cs.brown.edu
Sat Oct 20 17:13:19 PDT 2007


On Oct 20, 2007, at 3:40 PM, Philip Schatz wrote:

> My current solution uses separate heaps for each pthread, the mark- 
> compact algorithm without card-marking, and a global lock for  
> collection and atomic operations.
>

The design I had in mind grabs a big block of memory per executor,  
and allocates by advancing a pointer, grabbing more when it runs  
out.  What would be the advantages of a separate heap for each executor?

> Notes:
> - The "executor" model seemed necessary because I found no way in  
> the FFI to spawn pthreads with different (unit -> unit) functions.
>
> - A global lock for GC_collect and atomic operations  
> (Thread_atomicBegin/End). In the interest of changing as little of  
> MLton's runtime as possible, global locks seemed a quick-and-dirty  
> solution.
>
> - C-codegen: to look up per-pthread data with minimal changes to  
> the compiler (I didn't want to change the x86 codegen until I knew  
> it worked).
>
> - Per-pthread heaps: to not require locks when allocating heap space.
>
> - Split GC_state structure into global data (args, globals, summary  
> info) and pthread-specific data (heap, frontier, stackTop, etc).
>
> - Thread_atomicBegin/End are now function calls that acquire the  
> global lock. This was because I needed the frontier/stack  
> information to be when a GC_collect occurred in another pthread.
>
>

I'm investigating an idea, which requires me to build alot more from  
scratch.  I suspect a runtime based on M:N threading, garbage  
collection (and heap-allocated stack frames), and activations will be  
much more amenable to concurrent functional languages.  The C-style  
runtime is geared towards imperative (and mostly toward non- 
concurrent) programs.  I want to see how concurrent functional  
programming works when it's running in an environment built for  
specifically for it.

An interesting note about this runtime model is that it makes  
implementation of a bare-metal port pretty straightforward.


> I have tested this code with a few programs that spawn a simple  
> "executor" on multiple pthreads. Currently I'm trying to get CML  
> working using this model and have run into a little snag getting  
> signal handling to work (CML uses preemptive threads). Well-defined  
> safe-points would be an elegant way to overcome this.
>

I've not done signalling yet, but I've prepared for it.  I have a  
separate signal thread on the pthread version, and the activation  
versions will get a signal stack.

> I'd be happy to send you what I have. A home in MLton's SVN seems  
> like the best place to work from.
>
> To answer your questions:
> a) Correct. The runtime uses GC_state's frameLayouts and  
> objectTypes to traverse the heap.
> b) They are generated in the mlton/backend/backend.fun  
> (frameLayouts and frameOffsets) and in mlton/backend/ssa-to- 
> rssa.fun (objectTypes)
>

I'd like to compare the two implementations once I get my garbage  
collector going.  I'm hoping to be debugging the runtime in whole by  
the end of the year.  I already have a random test program for the  
threading system.

It bears mention that the code I have so far hits alot of corner  
cases in the C compiler dealing with volatile variables.  Different  
optimization levels result in crash vs no crash.  So far, I've  
identified 2 "oddities" (based on actually disassembling programs)  
which I strongly believe to be Intel C Compiler bugs.

-- 
Eric McCorkle
Brown University
CS Graduate Student





More information about the MLton mailing list