SSA

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 23 Oct 2001 13:45:49 -0400 (EDT)


> To make sure I understand, I would expect the liveFrame and cont field
> of Info.t in allocate-registers.sig to change from
> 
> 		     liveFrame: Machine.Operand.t list,
> 		     cont: {size: int} option,
> 
> to
> 
> 		     conts: {handler: Ssa.Label.t option,
> 			     live: Machine.Operand.t list,
> 			     size: int} list

I actually changed it to

		     liveFrame: (Ssa.Label.t option * Machine.Operand.t list) list,
		     cont: {size: int} option,

Your alternative would be fine.  But it's still a question of figuring out
the right size.

> > Note, this complicates any future work on avoiding the stub shuffle with
> > non-tail calls; the "real" continuation's arguments may not be allocated
> > at the top of the stub's frame.
> 
> Does my proposal above fix this?

Not really.  Recall what started this whole adventure:

  L_3 ()
    x_10: unit = Stdio_print (global_215)
    L_1022 (exit_0 (global_0, env_0)) handle L_1023
  L_166 ()
    L_1022 (exit_0 (global_12, x_21)) handle L_91
  L_1023 (x_1614: exn)
    L_2 ()
  L_1022 ()
    Bug
  L_91 (x_43: exn)
    SetHandler L_0
    case x_43 of
      Fail_0 => L_116 | SysErr_0 => L_101 | _ => L_100

Notice that the cont L_1022 has both L_1023 and L_91 as handlers.  L_91
has out_0 live in it and L_1023 does not.  So, the liveness analysis
thinks out_0 is live at the beginning of L_3, but the dominator check does
not. 

We're really back in this world, except that liveness analysis says that
out_0 is live in the frame corresponding to (L_1022, SOME L_91) while it
is not live in the frame corresponding to (L_1022, SOME L_1023).  That's
fine, and what we want.  Now, we get to allocate registers, which follows
the dominator tree produced from the intra-procedural control-flow graph
that has edges L_3 -> L_1022, L_3 -> L_1023, L_166 -> L_1022, 
L_166 ->L_191.  We can't use the control-flow graph
with L_3 -> L_1022 -> L_1023, L_166 -> L_1022 -> L_91, because that puts
the extraneous side conditions on the SSA IL that we're trying to avoid.

So, what is the size of L_1022's frame?  Well, allocate registers simply
gives it the current stack height.  But, L_1022 neither dominates nor is
dominated by L_1023 or L_91.  So, nothing guarantees that out_0 has been
allocated (it's only defined down some other branch of the dominator tree,
and we haven't gotten there yet).  When we finally allocate registers for
out_0, we're likely to give it a stack slot higher than the size given to
L_1022.

What you really want to do is allocate the continuation immediately after
the paired handler that requires the most stack.  But, there is no way of
forcing that with control-flow edges.  Furthermore, the SSA IL semantics
don't prevent two blocks to be related like:

  ...
  L_A (f ()) => L_B
  ...
  L_B (g ()) => L_A
  ...

That is, L_A is a continuation paired with handler L_B and L_B is a
continuation paired with handler L_A.  Who get's allocated first?

The "max" hack I used simply notes that when we reach a non-tail call with
continuation c and handler h, then the live slots for the frame consist of
slots live down c (which are < c's frame size) and slots live down h
(which are < h's frame size).  Hence, bumping the stack by 
max(c's frame size, h's frame size) for this non-tail call ensures that
the callee doesn't stomp on any live data.  On return, the continuation
stub copies the data from the top of the stack to wherever c's arguments
got allocated.