[MLton] Transactions for ML

Thu Apr 19 22:39:43 PDT 2007

On Apr 19, 2007, at 8:31 PM, Suresh Jagannathan wrote:

> As Matthew rightly points out, the stabilizer implementation is  
> easily adapted to support
> concurrent transactions.  Our current implementation provides an  
> STM interface that
> implements a pessimistic write/optimistic read incarnation,  
> although we have
> experimented in the past with an optimistic (logging) write version  
> as well.

McRT papers suggest that read versioning/undo-logging is the best  
approach.  These strategies are much better for compiler  
optimizations as well.  The biggest problem with undo logging,  
however, is that you either have to do eager conflict detection on  
every access, or you have to somehow deal with the possibility of  
seeing inconsistent data (McRT opts for the former).

I have a list of ideas for how a compiler can do things that a  
library can't possibly do.  To name a few: generate undo logs as  
actual code, rather than building a data structure (similar to  
implementing exceptions with continuations versus unwinding the  
stack), use knowledge of transaction behavior available at compile  
time to do things better.

There's also several optimizations one can perform on transactions  
themselves (most of these require very sophisticated analysis)

> Of course, porting this implementation to a multiprocessor/multi- 
> core environment is
> significantly more challenging, requiring modifying the garbage  
> collector, atomic
> primitives

The most advanced schedulers, like FreeBSD's ULE scheduler maintain a  
separate structure for each OS-level thread, and only communicate  
between schedulers to steal work.  You need a barrier to make sure no  
one tries to steal work while legitimate scheduling is going on, but  
aside from that, with lock-free data structures, one could should be  
able to implement a scheduler with no locks at all.

I know significantly less about garbage collectors.  If I recall,  
cheng-blelloch (which is the only one I know anything about) claims  
only to need a barrier and a room, though I'm not sure.

Modern STM systems use locks and memory allocation, and in the worst  
case can invoke arbitrarily complex contention managers.

My intuition at this point would be to have the scheduler run  
separate from everything, the garbage collector use locks, and the TM  
system allocate from the heap and use locks.  The sticking point for  
me is how to implement the scheduler.  If you do it in C or something  
else, then communicating with SML efficiently becomes an issue.   
However, if you do it in SML, you have to somehow avoid allocating  
memory while in the scheduler context.  How do you and philip deal  
with this, or do you organize things differently than what I'm  
imagining?

-- 
Eric McCorkle
Brown University
CS Graduate Student

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mlton.org/pipermail/mlton/attachments/20070420/b2ce9f4d/attachment.htm