[MLton] switching to handler caught threads

Wed, 31 Mar 2004 19:07:25 -0800

> I've tracked down a problem to an interesting (and I presume buggy)
> interaction between the threads caught by the signal handler and other
> threads.
...
> Now, we're in a totally screwy state.  In particular, both this
> atomicSwitch' and the signal handler both have a hold of the current
> thread, which is bad.

Your analysis makes sense to me.

The goal of the code was to be in a critical section when calling
Prim.switchTo to avoid the problem (that you saw) of a signal handler
interfering with the switch.  It looks like the code is (incorrectly)
trying to do that by having the thread that is switched to perform an
atomicEnd as its first operation, which doesn't work for primitive
threads, because the atomicEnd occurs before the switch, not after.

Here's the code for creating new threads

   val func: (unit -> unit) option ref = ref NONE
   val base: Prim.preThread =
      (Prim.copyCurrent ()
       ; (case !func of
             NONE => Prim.savedPre ()
           | SOME x =>
                (* This branch never returns. *)
                (func := NONE
                 (* Close the atomicBegin of the thread that switched to me. *)
                 ; atomicEnd ()
                 ; (x () handle e => MLtonExn.topLevelHandler e)
                 ; die "Thread didn't exit properly.\n")))
   fun newThread (f: unit -> unit) =
      (func := SOME f; Prim.copy base)

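The comment "Close the atomicBegin ..." confirms what I said above.
This code also shows why it is essential to be in the critical region
until a new thread starts: the read of !func and the assignment
func := NONE go through the one shared ref, and must happen before
anything else can touch it.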

So, it looks like we need two different behaviors when switching to
threads.  For new threads created as above, we need to be in a
critical section both when starting the switch and when the switch
ends and SML code starts running.  For paused threads created by
fromPrimitive, we need to *not* be in a critical section after ending
the switch and returning to SML.  For paused threads created by
Thread.switch, we could go either way, depending on whether or not we
place a call to atomicEnd after the call to switchTo in atomicSwitch'.

The easiest fix I can think of is to have two variants of
Prim.switchTo.  One will be used for all paused threads, and perform
the canHandle--.  The other will be used for all new threads, and will
let the thread being switched to do the canHandle--.
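
To make the two variants concrete, here is a toy model of the idea.
It is only a sketch, not MLton's actual code: there are no real
threads or signals, canHandle is modeled as a plain int ref, and the
names switchToPaused and switchToNew are hypothetical.  A "thread" is
just a function to run after the switch.  The point is that in both
cases the thread's own code ends up running with canHandle back at 0,
but the decrement happens in a different place.

```sml
(* Toy model only: no real threads or signals are involved. *)
val canHandle = ref 0            (* 0 means signals may be handled *)
fun atomicBegin () = canHandle := !canHandle + 1
fun atomicEnd () = canHandle := !canHandle - 1

(* A "thread" is just a body to run after the switch. *)
datatype thread =
   New of unit -> unit           (* starts in base, does its own atomicEnd *)
 | Paused of unit -> unit        (* resumes wherever it was paused *)

(* Hypothetical variant for paused threads: the switch itself
 * performs the canHandle-- before control reaches SML code. *)
fun switchToPaused body = (atomicEnd (); body ())

(* Hypothetical variant for new threads: the switch leaves canHandle
 * alone, so the thread is still atomic when its SML code starts. *)
fun switchToNew body = body ()

fun switch t =
   (atomicBegin ()
    ; case t of
         (* The wrapper stands in for base: it closes the atomicBegin
          * of the switching thread, then runs the thread's function. *)
         New body => switchToNew (fn () => (atomicEnd (); body ()))
       | Paused body => switchToPaused body)

(* Record the value of canHandle seen by each thread body. *)
val seen = ref ([] : int list)
fun record () = seen := !canHandle :: !seen

val () = switch (New record)
val () = switch (Paused record)
val () = print (String.concatWith "," (map Int.toString (rev (!seen))) ^ "\n")
```

This prints "0,0": both bodies observe canHandle = 0, but for the new
thread the decrement is done by the thread itself after the switch,
while for the paused thread it is done as part of the switch.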

Does that make sense?  This stuff is always very tricky.