single threading mlton

Stephen Weeks MLton@sourcelight.com
Tue, 15 Jan 2002 12:19:37 -0800


> I recalled (although I can't find the email) that Steve noted that
> lib/mlton/basic/engine.sml brought threads in.  Turns out that with
> reference to mlton, it's only used in lib/mlton/basic/net.sml which is
> pulled in for fullHostname; but, the computation of fullHostname doesn't
> require threads, so I copied the computation directly into
> src/mlton/control.sml.  (Maybe it belongs in
> lib/mlton/basic/proc-env.sml?) 

Fine with me.  But if I understand your later analysis, the presence
of engine.sml wasn't actually hurting anything, since Engine.run (and
hence Signal.handleWith') was never called.

> I discovered that what's really pulling threads in is
> lib/mlton/basic/vector.fun and the tabulator function by referencing
> lib/mlton/basic/thread.sml and the generate function.

Yuck.  I thought I'd gotten rid of that.

> More importantly, that function actually requires thread switching,
> so Thread_switchTo appears in the final program, foiling the multi
> analysis.  Vector.tabulator is only used by AppendList.toVector,
> and, while a clever use of threads (it took me a while to figure out
> exactly what was going on),

It may be clever, but it is *slow*.

> it's only really used in some of the elaboration and I suspect that
> no one will really notice the difference with an implementation as
> val toVector = Vector.fromList o toList.

Since this is a conversion from a tree to a vector, it is easier for
the tree walk to own the stack than for the vector creator.  That's
why I used a tabulator (which doesn't own the stack) instead of an
unfold (which does).  The real problem is the implementation of
Vector.tabulator.  One possibility would be to change it to

----------------------------------------------------------------------
fun tabulator (n: int, f: ('a -> unit) -> unit) =
   let
      val a = Pervasive.Array.array (n, NONE)
      val r = ref 0
      val _ =
	 f (fn x =>
	    let
	       val i = !r
	    in
	       if i >= n
		  then Error.bug "Vector.tabulator: too many elements"
	       else (Pervasive.Array.update (a, i, SOME x)
		     ; r := i + 1)
	    end)
   in
      if !r < n
	 then Error.bug "Vector.tabulator: not enough elements"
      else tabulate (n, fn i => valOf (Pervasive.Array.sub (a, i)))
   end
----------------------------------------------------------------------

I would like to find a way to avoid the array->vector copy, but don't
see it.

Another possibility, which I have heard Henry espouse, is to expose
the tabulator more like a folder, with the following signature

      type state
      val tabulator: int * (state * ('a * state -> state) -> state) -> 'a t 

Then, the implementation is

----------------------------------------------------------------------
local
   datatype state = S of int
in
   type state = state
   fun tabulator (n: int, f: (state * ('a * state -> state) -> state)) =
      let
	 val a = Pervasive.Array.array (n, NONE)
	 val S n' =
	    f (S 0,
	       fn (x, S i) =>
	       if i >= n
		  then Error.bug "Vector.tabulator: too many elements"
	       else (Pervasive.Array.update (a, i, SOME x)
		     ; S (i + 1)))
      in
	 if n' < n
	    then Error.bug "Vector.tabulator: not enough elements"
	 else tabulate (n, fn i => valOf (Pervasive.Array.sub (a, i)))
      end
end
----------------------------------------------------------------------

And the implementation of AppendList.toVector is

fun toVector ds = Vector.tabulator (length ds, fn (s, f) => fold (ds, s, f))

Of course, we would have liked to have written the type of tabulator
in ML as 

      val tabulator:
	Forall 'a. int * (Forall 'b. 'b * ('a * 'b -> 'b) -> 'b) -> 'a t 

but we can't because of prenexity.  If you change the type to

      val tabulator:
	Forall 'a. Forall 'b. int * ('b * ('a * 'b -> 'b) -> 'b) -> 'a t 

you get a completely different beast.

Anyways, here's my opinion on the right way to go. I prefer hiding the
state, because it is abstract and is a hack on the true type.  Also,
we don't have a linear type system to enforce that it is used
correctly.  So, I propose to leave the implementation of
AppendList.toVector as is, and to replace the implementation of
Vector.tabulator with the first one I gave above, unless someone can
think of something even better.

> Anyways, making those changes eliminates all references to
> Thread_switchTo, so the multi analysis should be significantly more
> accurate for constantPropagation and localRef.
...
> The multi analysis only marks block reachable from a
> Thread_copyCurrent as multiThreaded if the program has an instance
> of Thread_switchTo.  Since not, we're golden.

Ahh the joys/perils of whole-program optimization.  We're clearly
going to need some better thread implementation and analysis if we're
going to ever use threads seriously.
 
> It seems to have shaved about a minute off of my self compiles.

Quite nice.  I can't wait to get the new limit check stuff in to see
if we can break 400s.