local refs

Matthew Fluet fluet@CS.Cornell.EDU
Fri, 7 Dec 2001 11:59:02 -0500 (EST)


> copyCurrent can be used to implement both threads and continuations.
> To implement continuations, callcc uses copyCurrent and throw uses
> copyToThread followed by switchTo.  

Yep.

> To implement threads, the library
> does a copyCurrent at the beginning of the program to stash away a
> baseline thread that can later be used by Thread.spawn, which will
> copyToThread the baseline thread.

And the reason we do a copyCurrent at the beginning of the program is
because spawning a function f should execute f in essentially an empty
stack; when f finishes executing, control doesn't resume at the point of
the spawn (if f raises or doesn't exit properly, we get nasty messages);
likewise, f's execution shouldn't keep live the stuff on the stack at the
point of the spawn.

> Hopefully, this makes clear the fact that threads cannot be copied
> except by using copyCurrent, and hence the following worries
> unfounded.

O.k.  This makes a lot more sense.  From misc/primitive.sml we have:

      structure Thread =
	 struct
	    type thread = thread
	    val atomicBegin = _ffi "Thread_atomicBegin": unit -> unit;
	    val atomicEnd = _ffi "Thread_atomicEnd": unit -> unit;
	    val clearSaved = _ffi "Thread_clearSaved": unit -> unit;
	    (* copy stores the copy in saved. *)
	    val copy = _prim "Thread_copy": thread -> unit;
	    (* copyCurrent stores the copy in saved. *)
	    val copyCurrent = _prim "Thread_copyCurrent": unit -> unit;
	    val current = _prim "Thread_current": unit -> thread;
	    val finishHandler = _prim "Thread_finishHandler": thread -> unit;
	    val saved = _ffi "Thread_saved": unit -> thread;
	    val setHandler = _ffi "Thread_setHandler": thread -> unit;
	    val switchTo = _prim "Thread_switchTo": thread -> unit;
	 end      

So, 
> 	    val copyCurrent: unit -> preThread
corresponds to:  (copyCurrent () ; saved ())
> 	    val copyToThread: preThread -> thread
corresponds to:  (copy t ; saved ())
> 	    val current: unit -> thread
corresponds to:  (current ())
> 	    val switchTo: thread -> unit
corresponds to:  (switchTo t)

I note that 
   val copyToThread: preThread -> thread <==>
   val copy = _prim "Thread_copy": thread -> unit ==>
   Prim.Name.Thread_copy ==>
   invokeRuntime {prim = Prim.Name.Thread_copy, ...} ==>
   C-call to GC_copyThread

where

inline void
GC_copyThread(GC_state s, GC_thread t)
{
	GC_enter (s);
	assert (t->stack->reserved == t->stack->used);
	copyThread (s, t, STACK_HEADER, stackNeedsReserved(s, t->stack));
	assert(s->frontier <= s->limit);
	leave(s);
}

Now that assert makes sense.  It's never valid to try to do
 copy (current ())
because the current thread always has a little stack slop, whereas
suspended threads have reserved == used.

> BTW, there is still the usual mess with separating copyCurrent and
> copyToThread into two pieces, one that does the copy, and the other
> that extracts it from saved.  But that's an orthogonal issue.

Right; invokeRuntime's don't return arguments, because the invocation
might switch threads, and we don't have anywhere to stash the return
result until we get back to the thread that wanted it.  So, we store the
return result in globals and fetch them with ffi's.  This, in turn, only
works out in this case because we ensure that no other thread trashes the
stored values by using atomicBegin/End.  So, maybe a better
correspondence is:

> 	    val copyCurrent: unit -> preThread
corresponds to:  (atomicBegin (); copyCurrent () ; saved (); atomicEnd ())
> 	    val copyToThread: preThread -> thread
corresponds to:  (atomicBegin (); copy () ; saved (); atomicEnd ())

Although, in practice, we don't tightly wrap with atomicBegin/End.  Also,
switchTo performs an atomicEnd automatically.

Just trying to wrap my head around this...

> > The other aspect that is missing is atomicBegin/End, which really do
> > demark single-threaded portions of the program.  Of course, those are
> > FFI's, so I don't know how useful they are, but it would be trivial to
> > turn them into Prim.t's that are implemented as FFI's.
> 
> I'm not sure why this is relevant.  The fact that a region is atomic
> does not have any connection with whether it's labels are
> multiThreaded or multUse.

It's not relevant under my new-found understanding of threads.

> Here are my new rules.
> 
> Ref localization rule:
> You can localize a global ref if all uses are within a single function
> and the function is not multi-used.

Agreed.  One thing I was missing before, but is now incorporated in
local-ref.fun, is that uses of a global in other globals count as a use at
a function unequal to every other function (i.e., the initGlobals
function).

Just minor tweaking needed to get "idempotent" localization:
You can localize a global ref R in function f if 
(a) all uses of R in globals are of the form R' = Ref_ref R
     where R' is localizable in function f and
(b) all other uses of R are within function f and
(c) f is not multi-used

> A local ref can be flattened if all uses are assigns or derefs and it
> all uses that are live across a call to Thread_copyCurrent are of the
> same flavor.
> (which I believe is your original definition, modulo terminology :-)
> 
> The reasoning behind the flattening rule is that even if blocks are
> multiThreaded, because threads cannot be copied at arbitrary points, a
> ref can only be shared if a copyCurrent happens in the thread after
> the ref's creation.

I think we need "live across a call to a potential Thread_copyCurrent", by
which I mean either an actual call to Thread_copyCurrent or a non-tail
call to a function that reaches a call to Thread_copyCurrent.

This is all becoming a little more clear.  Thinking back to my earlier
example where I had a ref shared by two threads where thread A busy-waited
until thread B released it -- that ref doesn't really escape to the stack,
because the copyCurrent () would have occured at the very beginning of the
program; rather, the ref escapes into the closures of the spawned
functions, which would manifest itself as the ref being stuffed into some
Env closure; or, if the ref were globalized, then it would not be
localized because the functions in which it was used would be
multi-threaded.

> Assuming this flattening rule is OK, then flattening should work in
> programs that use threads (as MLton unfortunately still does), which
> will have a call to Thread.copyCurrent at the beginning of the
> program.  With the exception of possibly a couple of refs within main,
> all local refs should be flattenable.  

Agreed.

> Unfortunately, the existence of the Thread.copyCurrent at the
> beginning of the program kills localization, since every label and
> function (except for main) become multi-used.  One possible fix to
> this would be to have a new primitive for creating a raw thread that
> is only used by Thread.spawn, hence removing the copyCurrent from the
> beginning of the program (leaving copyCurrent only used by callcc).
> But I'm not sure offhand what that primitive should look like.  I
> think I'll sleep on it.  Over and out.

That is somewhat unfortunate.  You mean something like a primitive
threadNew that essentially returns an empty stack?