[MLton] Re: PackWord to/from nonsense

Matthew Fluet fluet at tti-c.org
Wed Jul 8 10:00:15 PDT 2009


On Tue, 7 Jul 2009, Wesley W. Terpstra wrote:
> On Tue, Jul 7, 2009 at 2:16 PM, Wesley W. Terpstra <wesley at terpstra.ca>wrote:
>
>> I'm implementing
>>
>>>   x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)
>>>   x_1226: word64 = WordU32_extdToWord64 (x_1227)
>>>   x_1225: word32 = WordU64_extdToWord32 (x_1226)
>>> into
>>>   x_1225:Word = x_1227
>>>
>> at the moment. Since working on the LLVM codegen, I now have the tools to
>> get the job done.
>
> I've attached my optimization pass. As it's my first, I'd appreciate
> feedback. I'm now working on a gigantic test-conversion.sml regression to
> make certain this optimization has no effect on output.

Commenting on the third conversion.patch; overall, it looks very good.  A 
nice pass with clear benefits and a simple implementation.

Index: mlton/ssa/combine-conversions.fun
===================================================================
--- mlton/ssa/combine-conversions.fun	(revision 0)
+++ mlton/ssa/combine-conversions.fun	(revision 0)
@@ -0,0 +1,149 @@
+(* Copyright (C) 2009 Wesley W. Tersptra.
+ * Copyright (C) 1999-2008 Henry Cejtin, Matthew Fluet, Suresh
+ *    Jagannathan, and Stephen Weeks.
+ * Copyright (C) 1997-2000 NEC Research Institute.

No need to give outdated copyright on a new file unless it is a derivative 
work of an existing file (in which case it inherits the existing file's 
plus the current year).  I don't think this file is a derivative of any 
existing optimization pass (except in the most obvious manner of coding 
style), so I think 2009 suffices.

+(*
+ * This pass looks for nested calls to (signed) extension/truncation.
+ *
+ * It processes each block in order:

in depth-first order  (only because the default order of blocks in a 
function is arbitrary, and this optimization requires visiting defs before 
uses)

+ *   If the statement is not a PrimApp Word_extdToWords, skip it.
+ *   After processing a conversion, it tags the Var for subsequent use.
+ *   When inspecting a conversion, check if the Var operated on is also the
+ *   result of a conversion. If it is, try to combine the two operations.
+ *   Repeatedly simplify until hitting either a non-conversion Var or a
+ *   case where the conversions cause an effect.
+ *
+ * The optimization rules are very simple:
+ *    x_1 = ...
+ *    x_2 = Word_extdToWord (x_1, {signed1})
+ *    x_3 = Word_extdToWord (x_2, {signed2})
+ *
+ *    W1 = width(x_1), W2 = width(x_2), W3 = width(x_3)

I think the detailed comments are great.  I might suggest matching the 
prim-app data constructors:

   x_1 = ...
   x_2 = Word_extdToWord (W1, W2, {signed1}) x_1
   x_3 = Word_extdToWord (W2, W3, {signed2}) x_2

+ *    If W1=W2, then there is no conversions before x_1.
+ *    This is guaranteed because W2=W3 will always trigger optimization.
+ * 
+ *    Case W1 <= W3 <= W2:
+ *       x_3 = Word_extdToWord (x_1, {signed1})
+ *    Case W1 <  W2 <  W3 AND (NOT signed1 OR signed2): 
+ *       x_3 = Word_extdToWord (x_1, {signed1})
+ *    Case W1 =  W2 <  W3
+ *       do nothing; there are no conversions past W1 and x_2 = x_1.
+ *
+ *    Case W3 <= W2 <= W1:                        ]
+ *       x_3 = Word_extdToWord (x_1, {whatever})  ]  W3 <= W1 && W3 <= W2
+ *    Case W3 <= W1 <= W2:                        ]  just clip x1
+ *       x_3 = Word_extdToWord (x_1, {whatever})  ]
+ *
+ *    Case W2 < W1 <= W3: unoptimized   ] W2 < W1 && W2 < W3
+ *    Case W2 < W3 <= W1: unoptimized   ] has side-effect: truncation
+ *
+ *    Case W1 < W2 < W3 AND signed1 AND (NOT signed2): unoptimized
+ *       ... each conversion affects the result separately
+ *)
+
+val { get, set, ... } = 
+   Property.getSet (Var.plist, Property.initConst NONE)

You might use Property.getSetOnce, since the property doesn't change after 
being set.  (It will also assert if you attempt to set a property more 
than once, which would indicate a bug).

+fun rules x3 (conversion as ((W2, W3, {signed=s2}), args, targs)) =

Since the rules only apply to conversions, args must be a singleton and 
targs must be empty.  Might be clearer to use the single arg and omit 
targs.  (This does require a little more work at the point where the 
primapp is reconstructed.)

+   let
+      val x2 = Vector.sub (args, 0)
+      val { equals, <, <=, ... } = Relation.compare WordSize.compare
+ 
+      fun stop () = set (x3, SOME conversion)
+      fun loop ((W1, W2, {signed=s1}), args, targs) = 
+         rules x3 ((W1, W3, {signed=s1}), args, targs)
+   in
+      case get x2 of
+         NONE => stop ()
+       | SOME (prev as ((W1, _, {signed=s1}), _, _)) =>
+            if W1 <= W3 andalso W3 <= W2 then loop prev else
+            if W1 <  W2 andalso W2 <  W3 andalso (not s1 orelse s2) 
+               then loop prev else
+            if W3 <= W1 andalso W3 <= W2 then loop prev else
+            (* If W2=W3, we never reach here *)
+            stop ()
+   end
+
+val markStatement = fn
+   Statement.T { exp as Exp.PrimApp { args, prim, targs },
+                 ty = ty,
+                 var = SOME v } =>
+     (case Prim.name prim of
+         Prim.Name.Word_extdToWord a => rules v (a, args, targs)
+       | _ => ())
+ | stmt => ()

See http://mlton.org/SyntacticConventions for some general coding style 
rules.  I always find a fn with multiple branches with multiple lines very 
hard to read.  I would use

  fun markStatement stmt =
     case stmt of
        Statement.T {exp as Exp.PrimApp {args, prim, targs},
                     ty = ty,
                     var = SOME v} =>
           (case Prim.name prim of
               Prim.Name.Word_extdToWord a => rules v (a, args, targs)
             | _ => ())
      | _ => ()

+fun combine program =
+   let
+      val Program.T { datatypes, functions, globals, main } = program

We always run the SSA shrinker at the end of optimizing each function, to 
perform the "clean-up" optimizations discussed in the previous e-mail.  To 
prep the shrinker, seed it with the global variables:

        val shrink = shrinkFunction {globals = globals}

+      val functions = 
+         List.map

Because the order of functions in an SSA program doesn't matter, 
List.revMap avoids an extra reversal.

+         (functions, fn f =>
+          let
+             (* Traverse blocks in dfs order, marking their statements *)
+             fun markBlock (Block.T {statements, ... }) =
+                (Vector.foreach (statements, markStatement); fn () => ())
+             val () = Function.dfs (f, markBlock)
+ 
+             (* Map the statements using the marks *)
+             val {args, blocks, mayInline, name, raises, returns, start} =
+                Function.dest f
+             fun mapBlock block =
+                let
+                   val Block.T {args, label, statements, transfer} = block
+                in
+                   Block.T {args = args,
+                            label = label,
+                            statements = Vector.map (statements, mapStatement),
+                            transfer = transfer}
+                end
+             val f =
+                Function.new {args = args,
+                              blocks = Vector.map (blocks, mapBlock),
+                              mayInline = mayInline,
+                              name = name,
+                              raises = raises,
+                              returns = returns,
+                              start = start}

Now that we have a new function, run the shrinker.  The shrinker will 
clear the function:

               val f = shrink f
+ 
+             (* Clear the mess we've made on the variables *)
+             val () = Function.clear f
+          in
+             f
+          end)

However, the shrinker will not clear globals and the property will 
automatically be set for any global to which get was applied.

        val () = Vector.foreach (globals, Statement.clear)

Alternatively, use Program.clearTop, though you haven't added properties 
to datatypes or function names, so it is a little overkill.

+   in
+      Program.T { datatypes = datatypes, 
+                  functions = functions, 
+                  globals = globals, 
+                  main = main }
+   end
+
+end

I think your current Function.dfs (for the analysis) followed by 
Vector.map (for the transformation) is probably sufficiently efficient 
(and certainly the clearest), but here are some alternative structures:

  * Because you have sufficient information for the transformation when
    visiting a block in DFS order, you could accumulate transformed blocks
    into a Block.t list ref during the traversal.  This probably isn't more
    efficient, since you are not changing the number of blocks; a
    Vector.map is likely to be more efficient than a bunch of ref updates
    followd by a Vector.fromList.

  * In the mapBlock and mapStatement, if the transformation is sufficiently
    rare, then it can sometimes be better to first check if the block needs
    to be transformed at all.  If not, then simply return the existing
    block, avoiding the allocation (and delayed GC cost of reclaiming the
    discarded block).  Of course, the SSA shrinker does completely rewrite
    the program, so it is probably a wash.

Index: mlton/ssa/simplify.fun
===================================================================
--- mlton/ssa/simplify.fun	(revision 7211)
+++ mlton/ssa/simplify.fun	(working copy)
@@ -15,6 +15,7 @@
  structure CommonArg = CommonArg (S)
  structure CommonBlock = CommonBlock (S)
  structure CommonSubexp = CommonSubexp (S)
+structure CombineConversions = CombineConversions (S)
  structure ConstantPropagation = ConstantPropagation (S)
  structure Contify = Contify (S)
  structure Flatten = Flatten (S)
@@ -77,6 +78,7 @@
     {name = "localRef", doit = LocalRef.eliminate} ::
     {name = "flatten", doit = Flatten.flatten} ::
     {name = "localFlatten3", doit = LocalFlatten.flatten} ::
+   {name = "combineConversions", doit = CombineConversions.combine} ::
     {name = "commonArg", doit = CommonArg.eliminate} ::
     {name = "commonSubexp", doit = CommonSubexp.eliminate} ::
     {name = "commonBlock", doit = CommonBlock.eliminate} ::
@@ -183,6 +185,7 @@
     val passGens =
        inlinePassGen ::
        (List.map([("addProfile", Profile.addProfile),
+                 ("combineConversions",  CombineConversions.combine),
                   ("commonArg", CommonArg.eliminate),
                   ("commonBlock", CommonBlock.eliminate),
                   ("commonSubexp", CommonSubexp.eliminate),

As mentioned in the previous e-mail, commonArg doesn't really apply to the 
result of the combineConversions optimization.  combineConversions should 
come before commonSubexp for the reasons you've noted earlier.  I would 
have liked to see combineConversions come before some inlining, since it 
has the effect of shrinking the size of functions using conversions.  I 
wonder if we shouldn't have an earlier round of commonSubexp (and thereby 
able to move combineConversions earlier) after contify2 and before 
inlineNonRecursive.



More information about the MLton mailing list