[MLton-user] timing anomaly

Tue Dec 4 07:28:12 PST 2007

On Mon, 3 Dec 2007, Sean McLaughlin wrote:
>  I found some very strange behavior of mlton while running some
> floating point experiments.
> The attached file evaluates some polynomials and returns them.  If you
> multiply a value by a single argument, it multiplies by 10 the
> performance time of the entire function.  I'd very much like to have
> this kind of code run fast.  When the line is commented, it runs
> faster than C++ (hurray!).  When uncommented, 10X slower :(

As Florian discovered, your 'doit' function (with the additional
multiplication) just crosses an inlining threshold.  You can see this by
using the '-keep ssa' and '-keep ssa2' options to look at the program at
the end of the main optimization passes.

The default value for the '-inline <N>' compiler option is 60, and using
'-inline 65' is enough for the test program to inline the slightly larger
function.  You probably don't need to go all the way to '-inline 500' or
'-inline 1000'.

However, be very careful extrapolating from your timing.sml program to 
your real application.  I don't believe that timing.sml is measuring quite 
what you think it is measuring.  Recall the 'repeat' function:

   fun repeat_fun f n =
       let
         val msg = n div 10
         fun repeat_fun' f 0 = ()
           | repeat_fun' f n =
             let in
               if n mod msg = 0 then print ("iter: " ^ Int.toString n ^ "\n") else ();
               ignore (f ());
               repeat_fun' f (n-1)
             end
       in
         repeat_fun' f (n-1);
         f ()
       end

This ignores the result of each call to 'f' in 'repeat_fun'', and only
returns the result of the final call to 'f'.  When the 'doit' function is
inlined, MLton quite happily determines that all the fp arithmetic
inserted into the 'repeat_fun'' loop can be discarded, since it has no
side-effects and its result is never used.  So, with the "fast" version,
you are executing a nearly empty loop that just occasionally prints a
message.  With the "slow" version, you are executing the 'doit'
arithmetic on every iteration of the 'repeat_fun'' loop, and that is the
source of the 10X slowdown.  The actual assembly sequence for the
arithmetic is identical (modulo the extra multiplication).
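One way to make the benchmark measure the arithmetic is to make every
result observable.  The following is a minimal sketch, assuming 'doit'
returns a real; 'repeat_fold', 'poly', and 'add' are hypothetical names
that are not in timing.sml.  Every result is folded into an accumulator
that is printed at the end, so the calls can no longer be treated as dead
code.  It also guards against n < 10, where 'n div 10' is 0 and
'n mod msg' in the original would raise Div.

```sml
(* Hypothetical benchmark loop: each result of f is combined into an
 * accumulator that is returned, so the work feeds an observable value. *)
fun repeat_fold (f : unit -> 'a) (combine : 'a * 'b -> 'b)
                (init : 'b) (n : int) : 'b =
  let
    (* Avoid msg = 0 (and hence Div) when n < 10. *)
    val msg = Int.max (1, n div 10)
    fun loop (i, acc) =
      if i <= 0 then acc
      else (if i mod msg = 0
              then print ("iter: " ^ Int.toString i ^ "\n")
              else ();
            loop (i - 1, combine (f (), acc)))
  in
    loop (n, init)
  end

(* Hypothetical stand-in for the polynomial arithmetic in 'doit'. *)
fun poly () : real = 1.5 * 1.5 * 1.5 + 2.0 * 1.5 + 1.0

fun add (x : real, acc : real) : real = x + acc

(* Printing the total keeps the accumulated work live. *)
val total = repeat_fold poly add 0.0 1000000
val () = print ("checksum: " ^ Real.toString total ^ "\n")
```

With a constant stand-in like 'poly', a compiler may still fold or hoist
the arithmetic out of the loop; a real benchmark would likely want inputs
that vary from iteration to iteration.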

Even when the 'doit' function is not inlined, MLton could discard the call
to 'doit' in 'repeat_fun''.  Since the result of the call is unused, we
can discard the call when the called function has no side-effects, only
returns normally (i.e., doesn't raise exceptions), and terminates.  The
'removeUnused' optimization pass computes maySideEffect and mayRaise
predicates for each function, but does not currently compute a
mustTerminate predicate.  Thus, the call stays, and you see the longer
execution time when 'doit' is not inlined.
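The termination condition is the subtle one, since a pure function can
still loop forever.  The sketch below is a hypothetical illustration, not
MLton's internal representation: the 'summary' record and
'canDiscardUnusedCall' simply restate the three conditions above, and
'spinUntil' is an invented example of a function that has no side-effects
and never raises, yet terminates only for some arguments.

```sml
(* Hypothetical summary of what an optimizer knows about a function;
 * the field names mirror the predicates discussed above. *)
type summary = {maySideEffect : bool, mayRaise : bool, mustTerminate : bool}

(* A call whose result is unused may be deleted only if all three hold. *)
fun canDiscardUnusedCall ({maySideEffect, mayRaise, mustTerminate} : summary)
    : bool =
  not maySideEffect andalso not mayRaise andalso mustTerminate

(* Pure and exception-free, but it terminates only when stop is true.
 * Deleting 'ignore (spinUntil false)' would turn a program that loops
 * forever into one that finishes. *)
fun spinUntil (stop : bool) : unit = if stop then () else spinUntil stop

(* Without a termination analysis, the optimizer must assume the worst. *)
val spinSummary : summary =
  {maySideEffect = false, mayRaise = false, mustTerminate = false}

val () =
  print (if canDiscardUnusedCall spinSummary
           then "call may be removed\n"
           else "call must stay\n")
(* prints "call must stay" *)
```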