[MLton-user] Tracking down allocations

Matthew Fluet fluet at tti-c.org
Mon Jul 16 08:08:02 PDT 2007


On Mon, 16 Jul 2007, Joe Hurd wrote:
>>  If tuple_32 is passed as an argument to
>>  another SSA function, then it may be a case of the input programs being
>>  just different enough that the flattening optimization behaves differently
>>  on them.
>
> tuple_32 is indeed passed into another function. To make things
> crystal clear to the compiler I've stuck with the manually
> lambda-lifted version.

I'd still be interested in seeing the source program, to see if there is a 
clear reason why the tuple is being eliminated in one version but not the 
other.

> One last question. I have a big function with 81 patterns like this:
>
> datatype t = A | B | C;
>
> fun f A A A A = ...
> |  f A A A B = ...
> |  ...
> |  f C C C C = ...
>
> where each of the right hand sides are just a single function call. It
> is called precisely once, and the call-site is accounting for 50% of
> the execution time of the whole program. Am I right in assuming that
> this is reflecting the cost of the pattern match, and is there
> anything that can be done to speed it up?

Time profiling will include the time for the pattern match.  If this 
function is called exactly once in the program, and accounts for 50% of 
the execution time, then I don't think the program is executing long 
enough to be meaningfully profiled.  The match compiler will turn that 
into something like:

   fun f w x y z =
     case w of
        A => (case x of
                 A => (case y of
                          A => (case z of
                                   A => ...
                                 | B => ...
                                 ...)
                        ...)
               ...)
      ...

Even so, it shouldn't be more than 24 instructions to discriminate (a 
comparision and conditional jump for each of the three variants, times 
four values to match).



More information about the MLton-user mailing list