local refs

Wed, 5 Dec 2001 16:38:56 -0800

> I tried running -diag localRef on a self-compile, but it left the
> program unchanged because it found threads or conts.
> 
> I'm fairly confident that mlton isn't using threads, so it must be using
> continuations somewhere.  Any idea where?  Besides localRef, aren't we
> missing out on some stuff in constantPropagation because the Once pass
> finds conts?

Here are the thread primitives I see used in mlton.ssa.

Thread_atomicBegin
Thread_atomicEnd
Thread_copyShrink
Thread_copy
Thread_current
Thread_finishHandler
Thread_saved
Thread_setHandler
Thread_switchTo

So, MLton uses threads and signal handlers, but not conts (there is no
use of Thread_switchToCont).  The threads arise because MLton uses
World.save, which always involved threads.  This was unnecessary, and
I have checked in a fix.

Unfortunately, the fix doesn't help, since MLton also refers to
MLton.Signal.Handler.{get,set}.  To see how disastrous this is, look
at the dot graph for the following program:

	val _ = MLton.Signal.Handler.get

That reference is enough to pull in all of the signal handler thread
and signal handler table initialization code, which uses all of the
thread primitives above.  If localRef ran, it could simplify a lot of
that code, but it doesn't run, because it sees those thread
primitives.

One thing we need to understand is that the various passes differ in
what they use to decide that they cannot assume any code executes only
once, and hence to turn off their optimization.

1. globalization (in the closure converter) and constant propagation
   (via once.fun) use the presence of Thread_switchToCont

2. local-ref uses the presence of any Thread_ primitive
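
To make the difference between the two tests concrete, here is a rough
sketch.  It is not the code in the passes: the string list standing in
for the SSA program and the names usesConts and usesAnyThreadPrim are
made up for illustration.

```sml
(* The thread primitives listed earlier in this message.  A plain list
 * of names stands in for the real SSA program (an assumption). *)
val primsInMlton =
   ["Thread_atomicBegin", "Thread_atomicEnd", "Thread_copyShrink",
    "Thread_copy", "Thread_current", "Thread_finishHandler",
    "Thread_saved", "Thread_setHandler", "Thread_switchTo"]

(* Test (1), as described for globalization and once.fun:
 * only Thread_switchToCont counts. *)
fun usesConts prims =
   List.exists (fn p => p = "Thread_switchToCont") prims

(* Test (2), as described for local-ref: any Thread_ primitive counts. *)
fun usesAnyThreadPrim prims =
   List.exists (String.isPrefix "Thread_") prims

val contsFound = usesConts primsInMlton            (* false *)
val threadsFound = usesAnyThreadPrim primsInMlton  (* true *)
```

On this list, test (1) leaves the optimizations on while test (2) turns
local-ref off, which matches what the self-compile showed.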

In thinking about it now, I believe (1) is wrong, because the Thread
primitives copy the initial thread very early in the program, and jump
back to that piece of code whenever starting a new thread.  This
control-flow is not expressed in the control-flow graph, so code that
appears to execute once can run again even when there are no conts.  I
believe my erroneous reasoning at the time was that threads are
one-shot and hence only go forward, and are unable to repeat code.

I believe looking for any Thread_ primitive is more than is needed,
and all you need to look for is Thread_switchTo or
Thread_switchToCont.  If those are not present, then there isn't any
control-flow that's not in the graph.  Since there is no longer
anything special about Thread_switchToCont, I propose to eliminate it,
replacing it with Thread_switchTo, and to make both (1) and (2) above
only look for Thread_switchTo.
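
As a sketch of the proposed single test (again over a list of primitive
names, which is an assumption, and with the hypothetical name
hasHiddenControlFlow):

```sml
(* Proposed test for both (1) and (2): once Thread_switchToCont has
 * been replaced by Thread_switchTo, that is the only primitive to
 * look for. *)
fun hasHiddenControlFlow prims =
   List.exists (fn p => p = "Thread_switchTo") prims

(* Thread primitives without a switch no longer disable anything. *)
val withoutSwitch =
   ["Thread_atomicBegin", "Thread_atomicEnd", "Thread_current"]
val withSwitch = "Thread_switchTo" :: withoutSwitch

val a = hasHiddenControlFlow withoutSwitch  (* false *)
val b = hasHiddenControlFlow withSwitch     (* true *)
```

Note that a program that uses Thread_switchTo, as mlton.ssa does, would
still be flagged by this test.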

On a related point, I believe that local-ref can be improved.  At
present it does two different things:

1. Move global refs into functions that only execute once
2. Turn local refs into SSA vars

Of course (1) only makes sense if the once computation is correct.
But (2) seems like it's OK to do no matter what.  So, I guess I'm
saying that local-ref shouldn't turn off all optimization when it
determines that a program has threads/conts.  It should just turn off
(1).  Does that make sense?
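
Roughly, I am picturing something like the following.  This is only a
sketch of the gating, not local-ref itself; the datatype, the
constructor names, and enabledRewrites are invented for illustration.

```sml
(* The two things local-ref does, as numbered above. *)
datatype rewrite =
   MoveGlobalRefs       (* (1): depends on the once computation *)
 | LocalRefsToSsaVars   (* (2): believed OK no matter what *)

(* Instead of turning everything off when the program has hidden
 * control-flow, only drop the rewrite that relies on "once". *)
fun enabledRewrites {hasSwitchTo: bool} : rewrite list =
   if hasSwitchTo
      then [LocalRefsToSsaVars]
   else [MoveGlobalRefs, LocalRefsToSsaVars]

val forThreaded = enabledRewrites {hasSwitchTo = true}
val forPlain = enabledRewrites {hasSwitchTo = false}
```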