new SSA IL

Stephen Weeks MLton@sourcelight.com
Tue, 7 Aug 2001 19:21:24 -0700


Matthew and I have been kicking around the idea of replacing the CPS IL with a
new IL that is closer to SSA.  Attached is my first attempt at a signature for
such an IL.  The main difference between CPS and SSA is that local functions are
not nested and there is no variable scoping.  Instead, all local functions are
mutually recursive, and the type system requires that a variable definition
dominate all of its uses.

There are several reasons for the change.

* Many optimizations use control-flow-graph dominators and variable liveness
  information.  Our current IL has lexical scoping, which provides an
  approximation to both of these.  That is, a variable binding certainly
  dominates all of the expressions in the lexical scope of the binding, but may
  dominate more.  Also, a variable is certainly only live in its lexical scope,
  but it may be live less.  It would be better (i.e. enable more optimizations)
  if we used the exact notions of dominance and liveness instead of the
  approximation provided by lexical scoping.  It would also enable
  transformations to worry only about satisfying dominance condition instead of
  scoping constraints.

* We would like to express more optimizations on handlers.  HandlerPush and
  HandlerPop are too restrictive.  This is orthogonal to the SSA change, but
  makes sense to do at the same time.

* It should make syntax trees smaller.

* It should make optimization passes faster, since they won't have to recur down
  syntax trees -- instead they can loop over blocks.

The expected migration path is to write a CPS -> SSA translator, rework the
backend to use SSA instead of CPS, and then to incrementally rewrite each CPS
simplification pass in the optimization pipeline from back to front until
finally reaching the closure converter.  In the meantime, we'd have code for two
ILs around, and duplicated code for any optimization pass that lived in both
worlds.

After you take a look at the signature below, here are some design questions and
the way I am leaning.

* Should the common Var.t vector be lifted out of Exp.t and Transfer.t?
  It would save syntax tree space, but you lose a little bit of compile-time
  checking (like Select only taking one arg).  I lean very slightly towards
  not lifting.

* What should be done about case branches and value carrying constructors?
  Should we stick with the current scheme, in which all selects are done
  implicitly by the case, or go back to the old scheme, in which the case just
  narrows the value and there is a separate ConSelect expression?  I lean
  towards leaving as is, and revisiting the issue after the conversion is
  complete.

* Should we create separate classes of labels, Handler.t and Cont.t?
  If we did this, SetHandler would now take a Handler and nontail Call would
  take a Cont.  This would give the necessary hooks to the backend for
  generating the wrapper blocks, but I'm not sure if it helps anywhere else.  I
  lean against doing this.

* Should raise only take one arg?  
  I don't think we have ever used the fact that raise can take multiple
  arguments.  It would shrink the syntax trees slightly and simplify some passes
  to remove this generality.  Again, I lean towards leaving as is and revisiting
  once the conversion is complete.

Please send any thoughts you have or suggestions for the IL.

--------------------------------------------------------------------------------

type int = Int.t
type word = Word.t
   
signature SSA_TREE_STRUCTS = 
   sig
      include ATOMS
   end

signature SSA_TREE = 
   sig
      include SSA_TREE_STRUCTS

      structure Type:
	 sig
	    include HASH_TYPE
	       
	    datatype dest =
	       Char
	     | Int
	     | IntInf
	     | Pointer
	     | Word
	     | Word8
	     | Real
	     | String
	     | Thread
	     | Array of t
	     | Ref of t
	     | Datatype of Tycon.t
	     | Tuple of t vector
	     | Vector of t
	 end
      sharing Atoms = Type.Atoms

      structure Func: HASH_ID
      structure Label: HASH_ID

      structure PrimInfo:
	 sig
	    datatype t =
	       None
	     | Overflow of Label.t (* Must be nullary. *)
	 end

      structure Exp:
	 sig
	    datatype t =
	       ConApp of {con: Con.t,
			  args: Var.t vector}
	     | Const of Const.t
	     | PrimApp of {prim: Prim.t,
			   info: PrimInfo.t,
			   targs: Type.t vector,
			   args: Var.t vector}
	     | RestoreExnStack
	     | SaveExnStack
	     | Select of {tuple: Var.t,
			  offset: int}
	     | SetHandler of Label.t
	     | Tuple of Var.t vector
	     | Var of Var.t
	 end

      structure Statement:
	 sig
	    datatype t = T of {var: Var.t option, (* NONE iff ty = unit. *)
			       ty: Type.t,
			       exp: Exp.t}
	 end
      
      structure Cases: CASES sharing type Cases.con = Con.t
	 
      structure Transfer:
	 sig
	    datatype t =
	       Bug  (* MLton thought control couldn't reach here. *)
	     | Call of {
			func: Func.t,
			args: Var.t vector,
			cont: Label.t option (* NONE means tail-call *)
		       }
	     | Case of {
			test: Var.t,
			cases: Label.t Cases.t,
			default: Label.t option  (* Must be nullary. *)
		       }
	     | Goto of {
			dst: Label.t,
			args: Var.t vector
			}
	     | Raise of Var.t vector
	     | Return of Var.t vector
	 end

      structure Block:
	 sig
	    datatype t =
	       T of {
		     label: Label.t,
		     args: (Var.t * Type.t) vector,
		     statements: Statement.t vector,
		     transfer: Transfer.t
		     }
	 end

      structure Function:
	 sig
	    datatype t =
	       T of {
		     name: Func.t,
		     args: (Var.t * Type.t) vector,
		     start: Label.t, (* Must be nullary. *)
		     blocks: Block.t vector,
		     returns: Type.t vector
		     }
	 end
      
      structure Program:
	 sig
	    datatype t =
	       T of {
		     datatypes: {
				 tycon: Tycon.t,
				 cons: {
					con: Con.t,
					args: Type.t vector
				       } vector
				} vector,
		     globals: Statement.t vector,
		     functions: Function.t vector,
		     main: Func.t (* Must be nullary. *)
		    } 
	 end
   end