[MLton-commit] r7212

Wesley Terpstra wesley at mlton.org
Fri Jul 10 09:01:10 PDT 2009


Added a new pass that eliminates or combines chains of type conversions.
It can eliminate costly conversions through LargeWord, but not through IntInf:
IntInf conversions tag variables before converting them, which blocks the analysis.

Once conversions have been combined, it becomes possible to identify redundant
Overflow tests. For example, consider this small program:

val a = ... something that can't be optimized away
val b = (Int16.fromInt o Int8.toInt) a
val () = print (Int16.toString b ^ "\n")

Without the pass, the SSA looks like this:
      L_942 (a_38: word8)
        x_31588: word32 = WordS8_extdToWord32 (a_38)
        x_33834: word16 = WordS32_extdToWord16 (x_31588)
        x_33833: word32 = WordS16_extdToWord32 (x_33834)
        x_34117: bool = Word32_equal (x_31588, x_33833)
        case x_34117 of
          true => L_3098 | false => L_3097
      L_3097 ()
        L_5 (x_31586, global_21)
      L_3098 ()
        Thread_atomicBegin ()
        x_33836: word32 = Thread_atomicState ()
        x_34116: bool = Word32_equal (x_33836, global_17)
        case x_34116 of
          true => L_3101 | false => L_3100

Just after combineConversions it compiles to:
      L_942 (a_38: word8)
        x_31586: word32 = WordS8_extdToWord32 (a_38)
        x_33832: word16 = WordS8_extdToWord16 (a_38)
        x_33831: word32 = WordS8_extdToWord32 (a_38)
        x_34115: bool = Word32_equal (x_31586, x_33831)
        case x_34115 of
          true => L_3098 | false => L_3097
      L_3097 ()
        L_5 (x_31584, global_21)
      L_3098 ()
        Thread_atomicBegin ()
        x_33834: word32 = Thread_atomicState ()
        x_34114: bool = Word32_equal (x_33834, global_17)
        case x_34114 of
          true => L_3101 | false => L_3100

The Overflow test can now be eliminated: x_33831 and x_31586 are the same
expression, so the Word32_equal comparison (x_34115) is always true:
      L_942 (a_38: word8)
        x_33832: word16 = WordS8_extdToWord16 (a_38)
        Thread_atomicBegin ()
        x_33834: word32 = Thread_atomicState ()
        x_34114: bool = Word32_equal (global_17, x_33834)
        case x_34114 of
          true => L_3101 | false => L_3100
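
To see why the test is redundant, the three conversions can be modelled
outside the compiler. The following is only a sketch, not MLton code: sext,
trunc16 and fitsInt16 are hypothetical helper names, and it assumes the
optional Basis structure Word32 is available (it is in MLton). It replays the
pre-pass SSA for every possible 8-bit input and checks that the Word32_equal
comparison never fails:

```sml
(* Sign-extend the low n bits of a 32-bit word (n is 8 or 16 here). *)
fun sext (n: word) (w: Word32.word) : Word32.word =
   let
      val k = 0w32 - n
   in
      Word32.~>> (Word32.<< (w, k), k)
   end

(* Keep only the low 16 bits, modelling WordS32_extdToWord16. *)
fun trunc16 (w: Word32.word) : Word32.word = Word32.andb (w, 0wxFFFF)

(* The pre-pass Overflow test for one input byte a in [0, 255]. *)
fun fitsInt16 (a: int) : bool =
   let
      val x1 = sext 0w8 (Word32.fromInt a)   (* WordS8_extdToWord32  *)
      val x2 = trunc16 x1                    (* WordS32_extdToWord16 *)
      val x3 = sext 0w16 x2                  (* WordS16_extdToWord32 *)
   in
      x1 = x3                                (* Word32_equal *)
   end

(* Check the comparison for all 256 possible values of a. *)
val neverOverflows = List.all fitsInt16 (List.tabulate (256, fn i => i))
```

neverOverflows evaluates to true: every sign-extended 8-bit value survives the
round trip through 16 bits, which is what lets the branch to L_3097 disappear.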

The algorithm implemented works as follows:
 * It processes each block in DFS order (to visit definitions before uses):
 *   If the statement is not a PrimApp Word_extdToWord, skip it.
 *   After processing a conversion, it tags the Var for subsequent use.
 *   When inspecting a conversion, check if the Var operated on is also the
 *   result of a conversion. If it is, try to combine the two operations.
 *   Repeatedly simplify until hitting either a non-conversion Var or a
 *   case where the conversions cause an effect.
 *
 * The optimization rules are very simple:
 *    x1 = ...
 *    x2 = Word_extdToWord (W1, W2, {signed=s1}) x1
 *    x3 = Word_extdToWord (W2, W3, {signed=s2}) x2
 *
 *    W1 = width(x1), W2 = width(x2), W3 = width(x3)
 *    
 *    If W1=W2, then there are no conversions before x1.
 *    This is guaranteed because W2=W3 will always trigger optimization.
 *    
 *    Case W1 <= W3 <= W2:
 *       x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
 *    Case W1 <  W2 <  W3 AND (NOT s1 OR s2):
 *       x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
 *    Case W1 =  W2 <  W3
 *       do nothing; there are no conversions past W1 and x2 = x1.
 *
 *    Case W3 <= W2 <= W1:                             ]
 *       x3 = Word_extdToWord (W1, W3, {whatever}) x1  ]  W3 <= W1 && W3 <= W2
 *    Case W3 <= W1 <= W2:                             ]  just clip x1
 *       x3 = Word_extdToWord (W1, W3, {whatever}) x1  ]
 *
 *    Case W2 < W1 <= W3: unoptimized   ] W2 < W1 && W2 < W3
 *    Case W2 < W3 <= W1: unoptimized   ] has side-effect: truncation
 *
 *    Case W1 < W2 < W3 AND s1 AND (NOT s2): unoptimized
 *       ... each conversion affects the result separately
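
The rules above can be checked by brute force on a toy model. This is a sketch
under stated assumptions, not the code of the pass: a W-bit word is modelled
as an int in [0, 2^W), widths are plain ints kept at 6 or below so the default
int type cannot overflow, and pow2, extd, combine and agrees are hypothetical
names introduced only for this illustration.

```sml
fun pow2 0 = 1
  | pow2 n = 2 * pow2 (n - 1)

(* extd (w1, w2, signed) x: convert the w1-bit value x to w2 bits. *)
fun extd (w1, w2, signed) x =
   if w2 <= w1
      then x mod pow2 w2                 (* truncate, or identity if w1 = w2 *)
   else if signed andalso x >= pow2 (w1 - 1)
      then x + pow2 w2 - pow2 w1         (* sign-extend: fill new high bits with 1s *)
   else x                                (* zero-extend *)

(* The rewrite rules: SOME c if the pair collapses to the single conversion c. *)
fun combine ((w1, w2, s1), (_, w3, s2)) =
   if (w1 <= w3 andalso w3 <= w2)
      orelse (w1 < w2 andalso w2 < w3 andalso (not s1 orelse s2))
      orelse (w3 <= w1 andalso w3 <= w2)
      then SOME (w1, w3, s1)
   else NONE

fun range (lo, hi) = if lo > hi then [] else lo :: range (lo + 1, hi)

(* true iff the combined conversion matches the two-step one on every input;
 * pairs that the rules leave alone are trivially fine. *)
fun agrees (c1 as (w1, _, _), c2) =
   case combine (c1, c2) of
      NONE => true
    | SOME c => List.all (fn x => extd c x = extd c2 (extd c1 x))
                         (range (0, pow2 w1 - 1))

val widths = range (1, 6)
val bools = [false, true]

(* Try every width triple and every signedness pair. *)
val allAgree =
   List.all (fn w1 => List.all (fn w2 => List.all (fn w3 =>
      List.all (fn s1 => List.all (fn s2 =>
         agrees ((w1, w2, s1), (w2, w3, s2))) bools) bools) widths) widths) widths
```

allAgree evaluates to true in this model. The unoptimized case is also visible:
combine ((2, 3, true), (3, 4, false)) is NONE, and for x = 3 the two-step
result is 7 while a naive signed 2-to-4 extension would give 15.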

I ran the benchmark suite three times and only 'checksum' has a significant 
and reproducible change:

	MLton0 -- mlton -drop-pass combineConversions
	MLton1 -- mlton
	run time ratio
	benchmark  MLton1
	checksum     0.45
	size
	benchmark          MLton0  MLton1
	checksum          187,726 186,254
	compile time
	benchmark         MLton0 MLton1
	checksum            4.52   4.61
	run time
	benchmark         MLton0 MLton1
	checksum           36.84  16.43
	

This is not terribly surprising: checksum and md5sum are the only benchmarks
that make use of type conversions, and md5sum's run time is dominated by the
md5 computation itself.


----------------------------------------------------------------------

A   mlton/trunk/mlton/ssa/combine-conversions.fun
A   mlton/trunk/mlton/ssa/combine-conversions.sig
U   mlton/trunk/mlton/ssa/simplify.fun
U   mlton/trunk/mlton/ssa/sources.cm
U   mlton/trunk/mlton/ssa/sources.mlb

----------------------------------------------------------------------

Added: mlton/trunk/mlton/ssa/combine-conversions.fun
===================================================================
--- mlton/trunk/mlton/ssa/combine-conversions.fun	2009-07-07 21:46:07 UTC (rev 7211)
+++ mlton/trunk/mlton/ssa/combine-conversions.fun	2009-07-10 16:01:09 UTC (rev 7212)
@@ -0,0 +1,150 @@
+(* Copyright (C) 2009 Wesley W. Tersptra.
+ *
+ * MLton is released under a BSD-style license.
+ * See the file MLton-LICENSE for details.
+ *)
+
+functor CombineConversions (S: COMBINE_CONVERSIONS_STRUCTS): COMBINE_CONVERSIONS =
+struct
+
+open S
+
+(*
+ * This pass looks for nested calls to (signed) extension/truncation.
+ *
+ * It processes each block in dfs order: (to visit definition before uses)
+ *   If the statement is not a PrimApp Word_extdToWords, skip it.
+ *   After processing a conversion, it tags the Var for subsequent use.
+ *   When inspecting a conversion, check if the Var operated on is also the
+ *   result of a conversion. If it is, try to combine the two operations.
+ *   Repeatedly simplify until hitting either a non-conversion Var or a
+ *   case where the conversions cause an effect.
+ *
+ * The optimization rules are very simple:
+ *    x1 = ...
+ *    x2 = Word_extdToWord (W1, W2, {signed=s1}) x1
+ *    x3 = Word_extdToWord (W2, W3, {signed=s2}) x2
+ *
+ *    W1 = width(x1), W2 = width(x2), W3 = width(x3)
+ *    
+ *    If W1=W2, then there is no conversions before x_1.
+ *    This is guaranteed because W2=W3 will always trigger optimization.
+ *    
+ *    Case W1 <= W3 <= W2:
+ *       x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
+ *    Case W1 <  W2 <  W3 AND (NOT signed1 OR signed2): 
+ *       x3 = Word_extdToWord (W1, W3, {signed=s1}) x1
+ *    Case W1 =  W2 <  W3
+ *       do nothing; there are no conversions past W1 and x2 = x1.
+ *
+ *    Case W3 <= W2 <= W1:                             ]
+ *       x_3 = Word_extdToWord (W1, W3, {whatever}) x1 ]  W3 <= W1 && W3 <= W2
+ *    Case W3 <= W1 <= W2:                             ]  just clip x1
+ *       x_3 = Word_extdToWord (W1, W3, {whatever}) x1 ]
+ *
+ *    Case W2 < W1 <= W3: unoptimized   ] W2 < W1 && W2 < W3
+ *    Case W2 < W3 <= W1: unoptimized   ] has side-effect: truncation
+ *
+ *    Case W1 < W2 < W3 AND signed1 AND (NOT signed2): unoptimized
+ *       ... each conversion affects the result separately
+ *)
+
+val { get, set, ... } = 
+   Property.getSetOnce (Var.plist, Property.initConst NONE)
+
+fun rules x3 (conversion as ((W2, W3, {signed=s2}), x2)) =
+   let
+      val { <, <=, ... } = Relation.compare WordSize.compare
+      
+      fun stop () = set (x3, SOME conversion)
+      fun loop ((W1, _, {signed=s1}), x1) = 
+         rules x3 ((W1, W3, {signed=s1}), x1)
+   in
+      case get x2 of
+         NONE => stop ()
+       | SOME (prev as ((W1, _, {signed=s1}), _)) =>
+            if W1 <= W3 andalso W3 <= W2 then loop prev else
+            if W1 <  W2 andalso W2 <  W3 andalso (not s1 orelse s2) 
+               then loop prev else
+            if W3 <= W1 andalso W3 <= W2 then loop prev else
+            (* If W2=W3, we never reach here *)
+            stop ()
+   end
+
+fun markStatement stmt =
+   case stmt of
+      Statement.T { exp = Exp.PrimApp { args, prim, targs=_ },
+                    ty = _,
+                    var = SOME v } =>
+        (case Prim.name prim of
+            Prim.Name.Word_extdToWord a => rules v (a, Vector.sub (args, 0))
+          | _ => ())
+    | _ => ()
+
+fun mapStatement stmt =
+   let
+      val Statement.T { exp, ty, var } = stmt
+      val exp =
+         case Option.map (var, get) of
+            SOME (SOME (prim as (W2, W3, _), x2)) =>
+               if WordSize.equals (W2, W3)
+               then Exp.Var x2
+               else Exp.PrimApp { args  = Vector.new1 x2, 
+                                  prim  = Prim.wordExtdToWord prim, 
+                                  targs = Vector.new0 () }
+          | _ => exp
+   in
+      Statement.T { exp = exp, ty = ty, var = var }
+   end
+
+fun combine program =
+   let
+      val Program.T { datatypes, functions, globals, main } = program
+      val shrink = shrinkFunction {globals = globals}
+      
+      val functions = 
+         List.revMap
+         (functions, fn f =>
+          let
+             (* Traverse blocks in dfs order, marking their statements *)
+             fun markBlock (Block.T {statements, ... }) =
+                (Vector.foreach (statements, markStatement); fn () => ())
+             val () = Function.dfs (f, markBlock)
+             
+             (* Map the statements using the marks *)
+             val {args, blocks, mayInline, name, raises, returns, start} =
+                Function.dest f
+             
+             fun mapBlock block =
+                let
+                   val Block.T {args, label, statements, transfer} = block
+                in
+                   Block.T {args = args,
+                            label = label,
+                            statements = Vector.map (statements, mapStatement),
+                            transfer = transfer}
+                end
+             
+             val f =
+                Function.new {args = args,
+                              blocks = Vector.map (blocks, mapBlock),
+                              mayInline = mayInline,
+                              name = name,
+                              raises = raises,
+                              returns = returns,
+                              start = start}
+             
+             val f = shrink f
+          in
+             f
+          end)
+      
+      val () = Vector.foreach (globals, Statement.clear)
+   in
+      Program.T { datatypes = datatypes, 
+                  functions = functions, 
+                  globals = globals, 
+                  main = main }
+   end
+
+end

Added: mlton/trunk/mlton/ssa/combine-conversions.sig
===================================================================
--- mlton/trunk/mlton/ssa/combine-conversions.sig	2009-07-07 21:46:07 UTC (rev 7211)
+++ mlton/trunk/mlton/ssa/combine-conversions.sig	2009-07-10 16:01:09 UTC (rev 7212)
@@ -0,0 +1,21 @@
+(* Copyright (C) 2009 Wesley W. Tersptra.
+ * Copyright (C) 1999-2008 Henry Cejtin, Matthew Fluet, Suresh
+ *    Jagannathan, and Stephen Weeks.
+ * Copyright (C) 1997-2000 NEC Research Institute.
+ *
+ * MLton is released under a BSD-style license.
+ * See the file MLton-LICENSE for details.
+ *)
+
+
+signature COMBINE_CONVERSIONS_STRUCTS = 
+   sig
+      include SHRINK
+   end
+
+signature COMBINE_CONVERSIONS = 
+   sig
+      include COMBINE_CONVERSIONS_STRUCTS
+
+      val combine: Program.t -> Program.t
+   end

Modified: mlton/trunk/mlton/ssa/simplify.fun
===================================================================
--- mlton/trunk/mlton/ssa/simplify.fun	2009-07-07 21:46:07 UTC (rev 7211)
+++ mlton/trunk/mlton/ssa/simplify.fun	2009-07-10 16:01:09 UTC (rev 7212)
@@ -15,6 +15,7 @@
 structure CommonArg = CommonArg (S)
 structure CommonBlock = CommonBlock (S)
 structure CommonSubexp = CommonSubexp (S)
+structure CombineConversions = CombineConversions (S)
 structure ConstantPropagation = ConstantPropagation (S)
 structure Contify = Contify (S)
 structure Flatten = Flatten (S)
@@ -77,6 +78,7 @@
    {name = "localRef", doit = LocalRef.eliminate} ::
    {name = "flatten", doit = Flatten.flatten} ::
    {name = "localFlatten3", doit = LocalFlatten.flatten} ::
+   {name = "combineConversions", doit = CombineConversions.combine} ::
    {name = "commonArg", doit = CommonArg.eliminate} ::
    {name = "commonSubexp", doit = CommonSubexp.eliminate} ::
    {name = "commonBlock", doit = CommonBlock.eliminate} ::
@@ -183,6 +185,7 @@
    val passGens = 
       inlinePassGen ::
       (List.map([("addProfile", Profile.addProfile),
+                 ("combineConversions",  CombineConversions.combine),
                  ("commonArg", CommonArg.eliminate),
                  ("commonBlock", CommonBlock.eliminate),
                  ("commonSubexp", CommonSubexp.eliminate),

Modified: mlton/trunk/mlton/ssa/sources.cm
===================================================================
--- mlton/trunk/mlton/ssa/sources.cm	2009-07-07 21:46:07 UTC (rev 7211)
+++ mlton/trunk/mlton/ssa/sources.cm	2009-07-10 16:01:09 UTC (rev 7212)
@@ -67,6 +67,8 @@
 global.fun
 multi.sig
 multi.fun
+combine-conversions.sig
+combine-conversions.fun
 constant-propagation.sig
 constant-propagation.fun
 contify.sig

Modified: mlton/trunk/mlton/ssa/sources.mlb
===================================================================
--- mlton/trunk/mlton/ssa/sources.mlb	2009-07-07 21:46:07 UTC (rev 7211)
+++ mlton/trunk/mlton/ssa/sources.mlb	2009-07-10 16:01:09 UTC (rev 7212)
@@ -54,6 +54,8 @@
    global.fun
    multi.sig
    multi.fun
+   combine-conversions.sig
+   combine-conversions.fun
    constant-propagation.sig
    constant-propagation.fun
    contify.sig



