[MLton] cvs commit: critical sections during thread switch

Sun, 4 Apr 2004 11:15:36 -0700

> For example, if we switch to the signal handler, then the GC we did
> for the current thread is (in a sense) wasted, because we'll need to
> do an ensureFree (and possibly doGC) when switching back to the
> current thread (if ever).  Likewise, even if we did to a GC for the
> current thread, there is the possibility that the signal handler
> thread needs more bytes than the current thread, so that while a
> minor GC serviced the current thread, upon switching to the signal
> handler, we may need to do a major GC.

I agree that this may happen, and also agree that it is extremely
rare.

> All of this suggested to me that we should switch to the signal handler
> _first_.  Unfortunately, this plays havoc with the stackTopIsOk invariant
> (and also the force GC argument, but that is really a side issue).
...
> So, what is the rationale for maintaining the stackTopIsOk invariant; I
> see that the comment on it says:
> /* stackTopIsOk ensures that when this stack becomes current that
>  * the stackTop is less than the stackLimit.
>  */

Let s be a stack in some thread.  In mutator code, we maintain the
invariant that

	stackTop (s) + 2 * maxFrameSize <= endOfStack (s)

everywhere except possibly at the start of a function, after having
just pushed a frame on the stack.  At function start, we know that

	stackTop (s) + maxFrameSize <= endOfStack (s)

For all inactive stacks in the heap we maintain the stackTopIsOK
invariant, i.e.

	stackTop (s) <= endOfStack (s) - 2 * maxFrameSize + topFrameSize (s)

so that whenever we switch to a thread and return to the top frame on
its stack, popping the topFrameSize bytes, we are assured of our
mutator invariant.

	stackTop (s) <= endOfStack (s) - 2 * maxFrameSize

Getting back to the mutator, notice that

	endOfStack (s) = stackBottom (s) + reserved (s)

Also, the code in gc.c computes stack limit as

	stackLimit (s) = endOfStack (s) - 2 * maxFrameSize

In backend/limit-check.fun, we insert a stack limit check at the start
of each function

	if (stackTop (s) > stackLimit (s))
		GC_gc (...);

By the above, this check is the same as

	stackTop (s) > endOfStack (s) - 2 * maxFrameSize

That is, this checks in the one place where the strong invariant may
fail, and if it fails, invokes the GC.  Because the previous push
can't have pushed more than maxFrameSize bytes, we know that

	stackTop (s) <= endOfStack (s) - maxFrameSize

That is, we know there is enough space to push the frame to call
GC_gc.  

The mutator assumes that whenever we return to this thread either
immediately from the GC or after switching threads, that the invariant
holds

	stackTop (s) <= stackLimit (s)

We ensure that invariant by growing the stack of the current thread,
if necessary, in the GC. 

So, the only way for a thread to get in a state where stackTopIsOK =
FALSE is by running, and if it does so, it immediately invokes the GC
which grows the stack.

> when switching to a thread, we may GC based on its bytesNeeded, so
> why not GC based on its stackTopIsOk?

We didn't need to in the past because we maintained stackTopIsOK for
all stacks in the heap except possibly the current stack, and the
current stack's invariant is made true by the GC.  So, when switching
to a thread, we knew that stackTopIsOK = TRUE.

By switching threads before invoking the GC, the invariant is busted,
because the current stack becomes inactive, and yet may not satisfy
stackTopIsOk.

It seems reasonable to me change things and to treat the stack
invariant like we treat the frontier invariant.  That is, we could
change stackTopIsOK for inactive stacks to

	stackTop (s) <= endOfStack (s) - maxFrameSize + topFrameSize (s)

This would automatically be true for threads that call GC_gc, so there
would be no need to do a GC for the current thread to grow its stack.
We would then need to check when switching to a thread that the
stronger condition holds

	stackTop (s) <= endOfStack (s) - maxFrameSize + topFrameSize (s)

for the thread being switched to.   We could check this just like we
check ensureFree (bytesNeeded) for the thread being switched to.  

That would allow you to put the thread switch before the GC in GC_gc
and would avoid the "double GC" problem in GC_gc and
GC_threadSwitchto.  This approach seems a bit more uniform than what
we currently do.  And it doesn't require any mutator changes -- the
mutator sees the same invariant.  We're simply delaying the stack
check in the runtime until we're absolutely sure we need it, i.e. when
we're about to return to a mutator thread.