local refs

Matthew Fluet fluet@CS.Cornell.EDU
Thu, 29 Nov 2001 21:44:47 -0500 (EST)


I was getting tired of looking at the code for printing 
"toplevel handler not installed" in every program when the toplevel
handler was clearly installed.  So, I started in on a local ref
elimination pass, which will also turn accumulation refs into function
local variables.  

It's a little over half done, I'd say.  The structure is as follows:

for each global
  record if it is a Ref_ref and what functions it is use in

for each function
  determine localizable refs: variables are of type ref and are only
   ever created, assigned to, or derefed (ie., that don't escape into any 
   datatype, tuple, non-ref related prim argument, or control transfer
   argument); this also includes any globals that are only used in this
   function
  do a dfs walk of the function, recording for each block the set of
   localizable refs in scope
  (Here, we need to do an analysis to determine which blocks really need
    to pass the refs around as arguments; it should be a minimal useless
    type analysis, I just haven't implemented it yet.  There's no big harm
    in always passing the refs into and out of each block, it just means
    more work for the shrinker and removeUnused.  Also, since there is no
    useless pass after localRef, nothing eliminates block arguments that
    are the same at all call sites.)
  each localizable ref maintains a stack of Var.t's, the top of which is
   variable holding the current value of the ref
  do a dfs walk of the function,
    at each block, add arguments for localizable refs that need to be
     passed at this block; push these arguments onto the localizable ref
     stack
    for each statement, rewrite Ref_ref to a simple copy, rewrite assigns
     to a simple copy into a new variable (and pop the top of the stack,
     and push the new variable), rewrite derefs intoa simple copy from the
     variable at the tope of the stack
    for each transfer to a block which needs localizable ref args, create
     a "route" block that passes the tops of the stacks as additional args
     in a goto to the destination block; (we need a "route" block that has
     the same argument type as the original block, because cases, prim,
     conts, etc. can't change the argument type; a lot of these will get
     shrunk away)
    after processing all the blocks' children, pop the top of the stack
     for each localizable ref that is passed at this block

That's it.  It works as I expect on small examples, but there's a problem
with handlers.

Consider:

val pi' = _ffi "printf": string * int -> int;
val pi = fn i => pi'("%i\n\000", i)

val ps' = _ffi "printf": string * string -> int;
val ps = fn s => ps'("%i\n\000", s)

fun fib 0 = 1
  | fib 1 = 1
  | fib n = (fib (n + 1)) + (fib (n + 1))

fun tak (x,y,z)
  = if not (y < x)
      then z
      else tak (tak (x - 1, y, z),
		tak (y - 1, z, x),
		tak (z - 1, x, y))

val w = MLton.Random.rand ()

val func = ref ""

val n = (if Word.andb(w,0wx1) = 0wx0
	   then (func := "fib";
		 fib 20)
	   else (func := "tak";
		 tak (1,2,3)))
        handle _ => (ps (!func); 0)

val _ = pi n

I'd like to localize the func ref; and according to my definition above,
it is localizable, because it is only used in the main function
and is only created, assigned to, and derefed (ie., doesn't escape
anywhere else).  But, the SSA code looks like

  L_11 ()
    x_13: lambdas_0 = Env_1 ()
    Ref_assign(lambdas_0) (x_0, x_13)
    x_12: word = Ref_deref(word) (global_13)
    x_11: word = Word32_mul (global_14, x_12)
    x_10: word = Word32_add (x_11, global_15)
    Ref_assign(word) (global_13, x_10)
    HandlerPush L_5
    x_9: word = Word32_andb (x_10, global_18)
    x_8: bool = MLton_eq(word) (x_9, global_19)
    case x_8 of
      false => L_8 | true => L_10
  L_10 ()
    Ref_assign(string) (global_17, global_20)
    L_9 (fib_0 (global_21)) handle L_5
  L_9 (x_7: int)
    L_6 (x_7)
  L_8 ()
    Ref_assign(string) (global_17, global_22)
    L_7 (tak_0 (global_23)) handle L_5
  L_7 (x_6: int)
    L_6 (x_6)
  L_6 (x_5: int)
    HandlerPop L_5
    L_4 (x_5)
  L_5 ()
    HandlerPop L_5
    x_4: string = Ref_deref(string) (global_17)
    printf (global_24, x_4)
    L_4 (global_0)

Just what you would expect.  I need to "route" the new values of func to
the handler at L_5, so I get something like this:

  L_46 ()
    x_13: lambdas_0 = Env_1 ()
    x_11: word = Word32_mul (global_14, global_31)
    x_10: word = Word32_add (x_11, global_15)
    x_9: word = Word32_andb (x_10, global_18)
    x_8: bool = MLton_eq(word) (x_9, global_19)
    case x_8 of
      false => L_42 | true => L_45
  L_45 ()
    L_43 (fib_0 (global_21)) handle L_44
  L_44 ()
    L_5 (x_13, global_20, x_10)
  L_43 (x_41: int)
    L_6 (x_41, x_13, global_20, x_10)
  L_42 ()
    L_40 (tak_0 (global_23)) handle L_41
  L_41 ()
    L_5 (x_13, global_22, x_10)
  L_5 (x_40: lambdas_0, global_38: string, global_37: word)
    printf (global_24, global_38)
    L_4 (global_0, x_40, global_38, global_37)
  L_40 (x_39: int)
    L_6 (x_39, x_13, global_22, x_10)
  L_6 (x_5: int, x_38: lambdas_0, global_36: string, global_35: word)
    L_4 (x_5, x_38, global_36, global_35)

Actually, you see here that we "route" all the localizable refs to the
handler (and everywhere else); (the lambdas_0 is the toplevel handler ref,
and the word is the MLton.Random seed ref.)

Now, you see the problem.  I need to change the handlers at the non-tail
calls, because each call needs to pass a different value for func.  
The shrinker has taken the liberty of removing the HandlerPush L_5, 
because L_5 isn't used in a handle slot.

I think the useless analysis should figure out that we don't need to route
the lambdas_0 and word localizable refs through the handlers (the value
of each of these refs is the same at the point the handler is installed
and at the point of handler invocation); But, I still need to do something
about the func ref.  In this case, there are two choices: 1. make func
non-localizable, 2. rewrite the handlers.

Until the exceptions paper, I guess we need to live with 1.  I suspect
that I can fudge the useless analysis to mark func as non-localizable.

After the exceptions paper, we could go either way.  The code I have right
now would automatically do 2, although there are arguments for sticking
with 1 (with 2, we need to install 2 handlers (one down each branch)
instead of 1 handler (above the branch)).

Comments?  Suggestions for the useless analysis?  Suggestions for the
kludge?