[MLton] inferring getAndSet via an Atomic{Begin, End} optimization

Thu Aug 31 14:17:35 PDT 2006

> I remember you (Stephen) talking about how it used to be that
> Atomic{Begin,End} pairs that weren't needed (because there were no
> limit checks between them) were previously removed, but that it had
> been a real problem (although I don't remember why).

I believe you are referring to a slightly different optimization that
was eliminated in r2183.  Here's the relevant part of the commit
message:

  Fixed the bug Ken Larsen triggered with mgtk.  It was due to broken
  use of atomic{Begin,End} in MLtonThread.sml.  The reason the bug
  didn't get caught in testing is that there is an optimization in the
  basis library that turns atomic{Begin,End} into noops if signals
  aren't handled by the program (because there can be no preemptive
  thread switching).  The only testing I had done with _export had
  been with a single-threaded program and hence the critical sections
  hadn't been tested at all.  But mgtk uses MLton.Finalizable, which
  uses signals, which triggered the bug.

  First off, I decided to eliminate that optimization.  Partly because
  of this problem, and partly because there are various places in the
  runtime and basis where you want to do asserts on canHandle, but
  these won't be right if atomic{Begin,End} are turned into noops.

The point is that atomic{Begin,End} has a side effect on the
"canHandle" counter in addition to providing a critical section.  The
old optimization (implemented in the basis, not the compiler) was
wrong because it indiscriminately eliminated this side effect.  The
optimization I propose could avoid this problem, I think.  First off,
it could look in the code region between the Atomic{Begin,End} in the
basic block and verify that canHandle isn't used.  And if it were
used, the optimization could still eliminate the test of
gcState.signalIsPending while keeping around the canHandle side
effect.

Hmmm, perhaps there are two optimizations.  Here's a more detailed
attempt to explain them.  First off, a reminder what Atomic{Begin,End}
expand to in the ssaToRssa pass.

  AtomicBegin
    gcState.canHandle++;
    if (gcState.signalIsPending)
      gcState.limit = gcState.limitPlusSlop - LIMIT_SLOP;

  AtomicEnd
    gcState.canHandle--;
    if (gcState.signalIsPending and 0 == gcState.canHandle)
      gc;

So, each is basically a side effect on canHandle followed by a test.
The optimization is to eliminate the side effect and/or the test
depending on what we know.

  1. If an Atomic{Begin,End} matches inside a basic block, then
    a. Eliminate the test.
    b. If there is no use of canHandle inside the Begin/End, then
       eliminate the side effect.
  2. If an Atomic{Begin,End} does not match inside a basic block and
     the program doesn't use signals, then eliminate the test.

Unfortunately, atomicEnd still has a "canHandle = 0" test in the basis
library code. 

      fun atomicEnd () = 
         if canHandle () = 0w0
            then raise Primitive.Exn.Fail8 "Thread.atomicEnd"
            else _prim "Thread_atomicEnd": unit -> unit; ()

I don't quite see how to get rid of that test.  Although perhaps that
should really only be on for debugging.

> Mind you, it also means that it wouldn't be clear from the source
> code when things were cheap and when they weren't.

Welcome to whole-program optimization :-).