[MLton] Parallel Runtime System

Sat Jun 16 23:22:51 PDT 2007

On Sat, 2007-06-16 at 20:34 -0700, Eric McCorkle wrote:

> > I think you missed Matthew's meaning of "portable" -- he meant a
> > runtime portable across compilers (i.e. the runtime's clients), not
> > across platforms the runtime runs on.
> 
> 
> That was my point.  Ideally, it would be a runtime system that is easy
> for a compiler to target (or at least for a compiler that thinks the
> way most modern functional language compilers do).  Ideally it'd be
> something that other compilers could target.  I can't imagine a
> parallel runtime system for something like Haskell or Erlang (or even
> non-functional typesafe languages like Cyclone or Java) being terribly
> different from one for ML, at least at a distance.  Encoding type
> information is probably the hardest part of the interface, but even
> this would be a non-issue if typed assembly languages became common
> and some standard format was adopted.

Most functional languages don't require any 'type' information.
They would need 'shape' information to increase GC performance,
i.e. for a precise/exact GC.

Ocaml uses 1 bit to chose integer vs. pointer. Heap blocks
pointed at use a tag and count in one word to decide
the block length and if the data needs to be scanned by the gc.
Ocaml supports dynamic finalisers for Ocaml code (and static
finalisers for C code).

At the moment Felix uses a shape object with an array of offsets.
Each heap block contains a pointer to the shape object. There is
also some type information there such as human readable 
version of the C++ type name. Felix supports static finalisers.

> The real trouble will come from very nitpicky incompatibilities and
> issues between different compilers I imagine (for example, the 16-byte
> C stack alignment issue that was the trouble with Intel Darwin MLton
> or some similar wierdness)

I would think the differences are in execution models are more
profound. The garbage collector is only part of the story.

Ocaml uses the stack, and it supports threading, but threads
must be interleaved so it can't multi-processes. This is ensured
with a global lock. Ocaml appears to fully support callbacks
from C code. C++ exceptions aren't supported. Ocaml can tell
the difference between C and Ocaml pointers.

Felix uses the stack, but the gc can't be called unless it is 
empty because Felix has no way to scan the stack for pointers.
The execution model supports ultra-high performance fibres
(user space threads), as well as pre-emptive threading and
asynchronous I/O, which is supported by what amounts to
a user space operating system. The gc is thread safe, the model
supports multi-processing with world stop collection.
Felix only partially supports callbacks from C code.
Felix procedures have a distinct heap allocated spaghetti stack.

The Felix gc implementation is being changed to use Judy arrays
to find information about objects and keep track of all the
heap blocks -- the new method supports finding interior pointers
and distinguishing Felix and C pointers. Felix is 100% re-entrant,
that is, no 'static/global' variables are used anywhere,
although this will change for Posix signal handling.

Mlton has its own stack which is garbage collected.. this
is very cool! No idea how it manages its C interface.
MLton supports fast user space threading.

GHC also has a cool C interface and supports fast user space
threading.

OCS_scheme is written in Ocaml so its run-time is just a set
of Ocaml functions and it doesn't have a private GC.
It uses a hashtable of weak pointers to get unique addresses
for symbols to speed up comparisons.

GNU Java uses Boehm collector, as does Neko.

It's not clear when you say 'functional' language what you
actually mean. Underneath NONE of these systems are remotely
functional, and that is one of the major impediments to parallel
processing IMHO.

I posit that in fact if Haskell were to completely drop 
self-tail-rec-to-loop optimisation, optimisations for
STM monad, etc, so the execution engine was purely
functional, it would be a win with respect to parallel
programming because a parallel collector for a purely
functional execution engine appears to be very easy to
design. I.e. it would actually be faster to spew new closures
for all recursions that try to reuse a stack frame, because the
latter implies mutation.

OTOH a collector for an engine which supports mutable storage
is very hard to design.

Of course I'm not sure. Interestingly, circular data is
impossible for a purely functional language -- the only
way to create cycles is via closures. This would simplify
a collector for a purely functional execution engine even more.

BTW: design for purely functional gc: each thread has its
own minor heap in the form of a linear stack. When the stack
is full it can be compacted. This can be done independently
of other threads.

When a stack runs out of space, fully initialised objects
can be streamed into the major heap. This has to be done
serially, i.e. only one thread at a time can do this.
The major heap collection must also be stopped at this
time I think. The major heap can be collected by its own 
thread(s).

The parallel collector designs I've seen work like this
EXCEPT that they also allow mutation, which is typically
handled by a journal .. and this really starts to make
a superior mess of things. Once mutation is allowed the
major heap can point at one or more of the thread local
heaps and the invariant which allows the parallelism
of the design above is lost.

I actually think it would be interesting to look at using
type information better in collectors. For example most
collectors including Boehm distinguish objects that
can't contain pointers.

However some object can only contain certain pointers.
For example a list of 'a can only contain a 'a pointer
or a pointer to a list node. If 'a is int, for example,
there's no need to scan any int list nodes looking for pointers
when garbage collecting non-list-node type heap objects
(assuming int is unboxed).

More sophisticated static type analysis might help partition
the task of collecting garbage somehow.. 

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net