local refs

Matthew Fluet fluet@CS.Cornell.EDU
Mon, 10 Dec 2001 13:04:22 -0500 (EST)


> You do the
> new once pass, make Thread_copyCurrent a transfer, and fix local ref.
> Either you or I can then fix constant propagation.

I've introduced a new SsaTree.Transfer.t variant:
               Runtime of {prim: Prim.t,
                           args: Var.t vector,
                           return: Label.t (* Must be nullary. *)
                          }
into which closure-covert translates: MLton_halt, Thread_copyCurrent,
Thread_switchTo, and World_save (those seemed to be the entersRuntime
prims that had a control-flow feel).

I've implemented a new once pass (actually, under the functor Multi) with:
      val multi: Program.t -> 
	         {(* Program has an occurence of Thread_switchTo. *)
		  usesThreadsOrConts: bool,
		  (* usesThreadsOrConts == true
		   * and the func directly or indirectly invokes
		   * Thread_copyCurrent. 
		   *)
		  funcDoesThreadCopyCurrent: Func.t -> bool,
		  (* usesThreadsOrConts == true
		   * and the func may be called by two
		   * different threads during some run of the 
		   * program.
		   *)
		  funcIsMultiThreaded: Func.t -> bool,
		  (* The func may be called more than once
		   * during some run of the program.
		   *)
		  funcIsMultiUsed: Func.t -> bool,
		  (* usesThreadsOrConts == true
		   * and the label's block's transfer is
		   * either Runtime {prim, ...}
		   * with prim = Thread_copyCurrent
		   * or Call {func, ...}
		   * with funcDoesThreadCopyCurrent(func) == true.
		   *)
		  labelDoesThreadCopyCurrent: Label.t -> bool,
		  (* usesTheadsOrConts == true
		   * and the label may be executed by two
		   * different threads during some run of the
		   * program.
		   *)
		  labelIsMultiThreaded: Label.t -> bool,
		  (* The label may be executed more than once
		   * during some run of the program.
		   *)
		  labelIsMultiUsed: Label.t -> bool,
		  (* The var may be defined more than once
		   * during some run of the program;
		   * i.e., varIsMultiDefed(x) = 
		   * labelIsMultiUsed(label of x's def)
		   * when x is defined in a block;
		   *)
		  varIsMultiDefed: Var.t -> bool}

I've fixed localRef to flatten refs in the presence of threads using
back-propagation of assign/deref uses across labelDoesThreadCopyCurrent
labels.  Also, I made localRef's localization of global refs idempotent,
using a flat-lattice of Func.t's.

All this code passes regression and self compiles.

For a self-compile (with -diag localRef), I get the following:
	    display starting
	       localRef starting
		  multi starting
		  multi finished in 0.96 + 0.0 (0.0% GC)
		  restore starting
		  restore finished in 0.03 + 0.0 (0.0% GC)
                  ...
		  restore starting
		  restore finished in 0.21 + 0.0 (0.0% GC)
	       localRef finished in 7.97 + 5.41 (40% GC)
	    display finished in 7.98 + 5.41 (40% GC)
Timing is high because of diagnostics.  

But, here's some of the relevant diagnostic output:

Globals:
x_1: {isGlobalRef = true, funcUses = Top}
x_3: {isGlobalRef = true, funcUses = Top}
x_5: {isGlobalRef = true, funcUses = Top}
x_7: {isGlobalRef = true, funcUses = Top}
x_8: {isGlobalRef = true, funcUses = Top}

So, there were only 5 global refs, each of which was used in multiple
functions, so nothing got localized.  (This might be somewhat biased;
localization of global refs really undoes stuff done by
constantPropagation; but constantPropagation is still using the old once
pass, so it's not globalizing anything.  On the other hand, as Steve
points out, the existence of the Thread_copyCurrent at the beginning of
main will probably mark most functions as funcIsMultiThreaded, so
localization has little to do.)

Out of 1101 top level SSA functions, 126 has flattenable refs.
Most are lists and integers.  Those with readable variable names show that
some common idioms are being altered:

For example, there is:
x_2338 LocalRefs: [todo_7: {reff = Some ((L_115215, list_288)),
			     assigns = [L_115198, L_115210, L_115194],
			     derefs = [L_115198, loop_8103, L_115194],
			     locall = local,
			     threadCopyCurrent = {assign = false, deref = false}},
		    todo_6: {reff = Some ((L_115141, list_288)),
			     assigns = [L_115117, L_115136, L_115110],
			     derefs = [L_115117, loop_8097, L_115110],
			     locall = local,
			     threadCopyCurrent = {assign = false, deref = false}},
		    todo_5: {reff = Some ((L_115090, list_288)),
			     assigns = [L_115073, L_115085, L_115069],
			     derefs = [L_115073, loop_8088, L_115069],
			     locall = local,
			     threadCopyCurrent = {assign = false, deref = false}},
		    todo_4: {reff = Some ((L_115019, list_288)),
			     assigns = [L_114995, L_115014, L_114988],
			     derefs = [L_114995, loop_8082, L_114988],
			     locall = local,
			     threadCopyCurrent = {assign = false, deref = false}}]
x_2338 Violations: (todo_4, todo_5, todo_6, todo_7)

which probably corresponds to some working list loop.


Interestingly, we occasionally get things like:
x_208258 LocalRefs: [plists_92: {reff = Some ((L_114603, list_85)),
				  assigns = [L_114600, L_114569],
				  derefs = [L_114601, L_114569],
				  locall = local,
				  threadCopyCurrent = {assign = false,
						       deref = false}}]
x_208258 Violations: (plists_92)

which looks like something to do with property-lists.

This one is cute:
stats_7 LocalRefs: [numBlocks_0: {reff = Some ((L_84553, int)),
				   assigns = [L_84501],
				   derefs = [L_84547, L_84502],
				   locall = local,
				   threadCopyCurrent = {assign = false,
							deref = false}},
		     numStatements_0: {reff = Some ((L_84553, int)),
				       assigns = [L_84495],
				       derefs = [L_84544, L_84496],
				       locall = local,
				       threadCopyCurrent = {assign = false,
							    deref = false}}]
stats_7 Violations: (numStatements_0, numBlocks_0)

It's basically turning nested List.foreach and Vector.foreach into the
appropriate folds:

	    val numStatements = ref (Vector.length globals)
	    val numBlocks = ref 0
	    val _ =
	       List.foreach
	       (functions, fn f =>
		let
		   val {args, blocks, ...} = Function.dest f
		in
		   Vector.foreach (args, countType o #2)
		   ; (Vector.foreach
		      (blocks, fn Block.T {statements, ...} =>
		       (Int.inc numBlocks
			; (Vector.foreach
			   (statements, fn Statement.T {ty, ...} =>
			    (countType ty
			     ; Int.inc numStatements))))))
		end)


Some other ones that I don't immediatly recognize, but have interesting
function names:
cut_1 Violations: (strs_0, vals_0, types_2, r_422, r_423, r_424)
elaborateSigexp_1 Violations: (r_419, r_420, r_421)
elabDec_0 Violations: (r_407, hasCon_0, r_408, r_409, r_411, r_412)
elabStrexp_0 Violations: (r_395, r_396, r_397, r_398)
matchFlat_0 Violations: (r_356,
			 r_357,
			 r_358,
			 r_359,
			 r_360,
			 r_361,
			 r_362,
			 r_363,
			 r_364,
			 r_365,
			 r_366,
			 r_367,
			 r_368,
			 r_369,
			 r_370,
			 r_371)

Some from the native codegen:
elimBlocks_0 Violations: (changed_6)
computeLiveTransfers_2 Violations: (r_308, queue_24)
computeLiveTransfers_0 Violations: (r_293, queue_21)

This is the function with the most flattenable refs; I don't really know
what it corresponds to; tailCallsItself is from introduceLoops, while
worklist is from simplifyTypes, and newBlocks could be from any of a
number of passes (but it's not used in introduceLoops or simplifyTypes):
 x_24029 Violations: (n_14,
		     r_144,
		     n_15,
		     r_145,
		     n_16,
		     r_147,
		     n_17,
		     r_148,
		     worklist_0,
		     n_18,
		     r_149,
		     r_150,
		     n_19,
		     r_151,
		     n_20,
		     r_152,
		     plists_10,
		     tailCallsItself_0,
		     r_154,
		     r_155,
		     changed_1,
		     n_21,
		     r_157,
		     n_22,
		     r_158,
		     n_23,
		     r_159,
		     n_24,
		     r_160,
		     n_25,
		     r_161,
		     n_26,
		     r_162,
		     r_163,
		     n_27,
		     r_164,
		     n_28,
		     r_165,
		     n_29,
		     r_166,
		     n_30,
		     n_31,
		     r_167,
		     n_32,
		     r_168,
		     n_33,
		     r_169,
		     n_34,
		     r_170,
		     n_35,
		     r_171,
		     n_36,
		     r_172,
		     r_173,
		     n_37,
		     r_174,
		     n_38,
		     r_175,
		     r_178,
		     r_179,
		     r_180,
		     newBlocks_1,
		     n_39,
		     r_186,
		     n_40,
		     r_187,
		     n_41,
		     r_188,
		     n_42,
		     r_189,
		     n_43,
		     r_190,
		     r_191,
		     n_44,
		     r_192,
		     n_45,
		     r_193,
		     r_194,
		     n_46,
		     r_195,
		     n_47,
		     r_196,
		     n_48,
		     r_197,
		     n_49,
		     r_198,
		     n_50,
		     r_199,
		     n_51,
		     r_200,
		     n_52,
		     r_201,
		     n_53,
		     r_202)

main doesn't actually have that many:
main_0 Violations: (r_0,
		    r_1,
		    r_2,
		    r_3,
		    r_4,
		    r_5,
		    r_6,
		    r_7,
		    r_8,
		    r_9,
		    r_10,
		    x_4510)


If you've made it this far, here's a little more good news.  I tried
replacing the once pass in constantPropagation, by replacing
      val once = Once.once program
with
      val {varIsMultiDefed, ...} = Multi.multi program
      val once = not o varIsMultiDefed

No problems on any of the regressions.  I don't know if it's really doing
anything different.