[MLton] Re: PackWord to/from nonsense

Matthew Fluet fluet at tti-c.org
Thu Jul 9 15:53:34 PDT 2009


On Wed, 8 Jul 2009, Wesley W. Terpstra wrote:
> Thank you very much for the detailed suggestions and feedback!
>
> On Wed, Jul 8, 2009 at 7:00 PM, Matthew Fluet <fluet at tti-c.org> wrote:
>> +fun rules x3 (conversion as ((W2, W3, {signed=s2}), args, targs)) =
>>
>> Since the rules only apply to conversions, args must be a singleton and
>> targs must be empty.  Might be clearer to use the single arg and omit targs.
>>  (This does require a little more work at the point where the primapp is
>> reconstructed.)
>
> I thought the targs was the type of the things in args? So targs must have
> one element in it.

The "targs" element of the Exp.PrimApp constructor is "type arguments" 
(not "type of arguments").  It specifies the type at which a polymorphic 
primitive is being used.  E.g., the Array_sub primitive requires a type 
argument in order to determine the type of the array and the type of the 
result.  In contrast, the Word_extdToWord primitive (technically, a family 
of primitives instantiated by the src/dst word sizes and signedness) 
is not polymorphic and hence has no type arguments.
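As a rough illustration, here is a small Python model of the distinction 
(the representation is illustrative only, not MLton's actual Exp.PrimApp; 
the element type shown for Array_sub is just an example):

```python
from dataclasses import dataclass

@dataclass
class PrimApp:
    prim: str
    args: tuple   # value arguments (SSA variables)
    targs: tuple  # type arguments; empty for a monomorphic primitive

# Array_sub is polymorphic: it needs a type argument to fix the
# element type of the array (and hence the type of the result).
sub = PrimApp(prim="Array_sub", args=("a", "i"), targs=("word32",))

# Word_extdToWord is a family of monomorphic primitives, indexed by
# the src/dst sizes and signedness, so it carries no type arguments.
ext = PrimApp(prim="Word8_extdToWord32", args=("x",), targs=())
```

So for the conversion primitives, args is a singleton and targs is empty.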

>> +      val functions =
>> +         List.map
>> Because the order of functions in an SSA program doesn't matter,
>> List.revMap avoids an extra reversal.
>
> I know, but it's much easier to read a 'diff -u' of the pre.ssa and post.ssa
> this way. However, I guess I'm done debugging it this way now.

Good point.  A lot of optimizations do sufficient "damage" to the 
control-flow graph that a diff isn't particularly helpful.

>> However, the shrinker will not clear globals and the property will
>> automatically be set for any global to which get was applied.
>>       val () = Vector.foreach (globals, Statement.clear)
>
> The shrinker is marking the globals? I don't think I am doing so. Should I
> also have run this pass on the globals? I wasn't clear on what computations
> (if any) globals can do.

The shrinker is adding properties to globals, so they should be cleared. 
The combineConversions property is also being added to globals, since the 
default value is added to any property list when you "get" it without 
first "set"ing it.  In your case, if there is a Word_extdToWord applied 
to a global, then you will "get" the property from the global (adding and 
returning the default NONE).

Globals can perform any pure computation.  On the other hand, most globals 
are constants, so there are probably few applications of Word_extdToWord 
to a global that aren't immediately constant folded.
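The "get without set installs the default" behavior can be sketched like 
this (a Python analogue of MLton's property lists; the class and method 
names are illustrative, not MLton's API):

```python
class Property:
    """A per-object property whose get installs a default value on
    first access, mimicking a property list with an init value."""

    def __init__(self, default):
        self.default = default
        self.values = {}  # object id -> property value

    def get(self, obj):
        # Getting an unset property adds the default to the object's
        # property list and returns it -- so a mere inspection leaves
        # an entry behind.
        return self.values.setdefault(id(obj), self.default)

    def set(self, obj, value):
        self.values[id(obj)] = value

    def clear(self, obj):
        # Remove the property, e.g. at the end of an optimization pass.
        self.values.pop(id(obj), None)
```

This is why the pass must clear globals as well: even a "get" that only 
returned the default NONE has left a property on the global.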

>> I think your current Function.dfs (for the analysis) followed by Vector.map
>> (for the transformation) is probably sufficiently efficient (and certainly
>> the clearest), but here are some alternative structures:
>
> Yes, I saw those tricks already when reading other optimization passes for
> tips.
>
> How expensive is Function.dominatorTree? I had a much simpler version of the
> code where I used Tree.map to pre-order rewrite variables and then folded
> the tree into a list. I decided that the algorithm involved was probably
> prohibitively expensive. It would be very nice if there was a dfsMap
> function or something that let you rewrite blocks at finish.

Function.dominatorTree is more expensive than Function.dfs; both are more 
expensive than a simple Vector.map.  I don't think that the dominator tree 
gives you any more useful structure than the dfs; you just need to visit 
defs before uses.
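The "defs before uses" guarantee falls out of any DFS from the entry 
block: the DFS tree path to a block is a CFG path, so every dominator of 
a block (in particular, the block defining each variable it uses) is 
visited first.  A small sketch, with the CFG as a plain dict from block 
label to successor labels (illustrative, not MLton's Function.dfs):

```python
def dfs_preorder(cfg, entry):
    """Return the blocks of a CFG in DFS pre-order from the entry.

    In SSA form a variable's definition dominates its (non-phi) uses,
    and every dominator of a block appears earlier in any DFS
    pre-order, so one pass in this order sees defs before uses.
    """
    seen, order = set(), []

    def visit(block):
        if block in seen:
            return
        seen.add(block)
        order.append(block)
        for succ in cfg[block]:
            visit(succ)

    visit(entry)
    return order

# A diamond: entry defines x; both branches and the join use it.
cfg = {"entry": ["left", "right"], "left": ["join"],
       "right": ["join"], "join": []}
```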

>> As mentioned in the previous e-mail, commonArg doesn't really apply to the
>> result of the combineConversions optimization.  combineConversions should
>> come before commonSubexp for the reasons you've noted earlier.
>
>> I would have liked to see combineConversions come before some inlining,
>> since it has the effect of shrinking the size of functions using
>> conversions.
>
> It doesn't shrink the code that much, and if you don't inline first you
> might miss an opportunity to simplify the code.

Sure; there is always a phase ordering problem.  But I think that 
combineConversions would be a fairly cheap optimization and could be 
applied more than once.

>>  I wonder if we shouldn't have an earlier round of commonSubexp (and
>> thereby able to move combineConversions earlier) after contify2 and before
>> inlineNonRecursive.
>
> When I compiled MLton with this optimization it affected a grand total of
> five conversions. So I think you're overestimating the impact on code size.
> The optimization has the most impact on tight loops involving conversions
> (typical for serialization code like I'm working on). Even there the
> code-size isn't greatly reduced.

Interesting.  I guess it is true that most of the conversions would be 
quite explicit: WordN.fromLargeWord o WordM.toLargeWord.
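The combining rules themselves can be sketched as follows.  This is a 
Python model, not MLton's implementation: a conversion is a (src, dst, 
signed) triple of bit widths, signedness only matters for extension, and 
combine returns the single equivalent conversion or None when the pair 
cannot be collapsed:

```python
def convert(x, src, dst, signed):
    """Apply one src-to-dst word conversion to the unsigned
    representation x of a src-bit word."""
    if dst <= src:                   # truncation: keep the low dst bits
        return x % (1 << dst)
    if signed and x >> (src - 1):    # sign-extend a negative value
        return (x - (1 << src)) % (1 << dst)
    return x                         # zero-extend

def combine(c1, c2):
    """Collapse conversion c1 followed by c2 into one, if possible."""
    (w1, w2, s1), (w2b, w3, s2) = c1, c2
    assert w2 == w2b, "conversions must be composable"
    if w2 >= w1 and w2 >= w3:
        # Extend then truncate through a widest-intermediate: one
        # conversion suffices; the first signedness governs any
        # remaining extension.
        return (w1, w3, s1)
    if w1 >= w2 >= w3:
        return (w1, w3, s1)          # truncate twice = truncate once
    if w1 <= w2 <= w3:               # extend twice
        if w1 == w2:
            return (w1, w3, s2)      # first step is the identity
        if w2 == w3:
            return (w1, w3, s1)      # second step is the identity
        if not s1:
            return (w1, w3, False)   # zero- then any-extend = zero-extend
        if s2:
            return (w1, w3, True)    # sign- then sign-extend
        return None                  # sign- then zero-extend: no single rule
    return None                      # truncate then extend: bits were lost
```

For example, Word8 zero-extended to Word32 and then truncated to Word16 
collapses to a single Word8-to-Word16 zero-extension, while truncating 
and then extending cannot be collapsed because the middle bits are gone.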

BTW, what were the results from the benchmarks?  That is good data to 
include in the initial commit of the combineConversions optimization.

>>> I meant PackWord versions without bounds checking. I would still really
>>> like this exposed somewhere. The Unsafe library, for example.
>>>
>> O.k.  Alternatively, someone could write a bounds check elimination pass.
>> ;-)
>>
> How hard is this to do? For it to be useful it would have to work with
> loops.

There is a rich literature on bounds check elimination.  Actually, the 
redundantTests optimization is able to eliminate some array bounds checks.
