# [MLton-user] timing anomaly

Sean McLaughlin seanmcl at gmail.com
Tue Dec 4 08:48:24 PST 2007

```
Fair enough :)

However, is it possible to make this code run faster?  It currently runs
over 3X slower than the equivalent C program:

(* BEGIN SML *)

fun eval_real_poly n =
  let
    val msg = n div 10
    val x1 = 4.0
    val x2 = x1
    val x3 = x1
    val x4 = x1
    val x5 = x1
    val x6 = x1
    fun eval (0,store) = store
      | eval (n,store) =
        let in
          if n mod msg = 0 then print ("iter: " ^ Int.toString n ^ "\n") else ();
          eval (n-1,store + (~x1)*x4 + x2*x5 +(~x3)*x5 +(~x2)*x6 + x3*x6 +
                    x4*(x2 + x3 + x5 + x6) + x4*(~x1)+x4*(~x4))
        end
  in
    eval (n,0.0)
  end

val _ = print (Real.toString (eval_real_poly 500000000) ^ "\n")

(* END SML *)

// BEGIN C

#include <stdio.h>

double eval_real_poly(int n){
  int msg = n / 10;
  double x1 = 4.0;
  double x2 = x1;
  double x3 = x1;
  double x4 = x1;
  double x5 = x1;
  double x6 = x1;

  double store = 0.0;
  int i;
  for(i=0;i<n;i++){
    if(i % msg == 0) printf("iter: %d\n",i);
    store += (-x1)*x4 + x2*x5 +(-x3)*x5 +(-x2)*x6 + x3*x6 +
             x4*(x2 + x3 + x5 + x6) + x4*(-x1)+x4*(-x4);
  }
  return store;
}

int main(){
  printf("res: %f,\n",eval_real_poly(500000000));
  return 0;
}

// END C

On Dec 4, 2007 10:39 AM, Frank Pfenning <fp at cs.cmu.edu> wrote:
> That's funny :-)  I will have to tell the students in my compiler
> class about this later today...  - Frank  (Sorry, Sean, couldn't resist...)
>
> On Dec 4, 2007 10:28 AM, Matthew Fluet <fluet at tti-c.org> wrote:
> >
> > On Mon, 3 Dec 2007, Sean McLaughlin wrote:
> > > I found some very strange behavior in MLton while running some
> > > floating point experiments.  The attached file evaluates some
> > > polynomials and returns the results.  Multiplying one value by a
> > > single additional argument makes the entire function run 10X
> > > slower.  I'd very much like to have this kind of code run fast.
> > > When the line is commented out, it runs faster than C++ (hurray!).
> > > When uncommented, 10X slower :(
> >
> > As Florian discovered, your 'doit' function (with the additional
> > multiplication) just crosses an inlining threshold.  You can discover this
> > by using the '-keep ssa' and '-keep ssa2' options to look at the
> > intermediate code at the end of the main optimization passes.
> >
> > The default value for the '-inline <N>' compiler option is 60, and using
> > '-inline 65' gets the test program to inline the slightly larger function.
> > You probably don't need to go all the way to '-inline 500' or
> > '-inline 1000'.
> >
> > However, be very careful extrapolating from your timing.sml program to
> > your real application.  I don't believe that timing.sml is measuring quite
> > what you think it is measuring.  Recall the 'repeat_fun' function:
> >
> >   fun repeat_fun f n =
> >       let
> >         val msg = n div 10
> >         fun repeat_fun' f 0 = ()
> >           | repeat_fun' f n =
> >             let in
> >               if n mod msg = 0 then print ("iter: " ^ Int.toString n ^ "\n") else ();
> >               ignore (f ());
> >               repeat_fun' f (n-1)
> >             end
> >       in
> >         repeat_fun' f (n-1);
> >         f ()
> >       end
> >
> > This ignores the result of the call to 'f' in 'repeat_fun'', and only
> > returns the result of the final call to 'f'.  When the 'doit' function is
> > inlined, MLton quite happily determines that all the fp arithmetic (since
> > it has no side-effects) that is inserted into the 'repeat_fun'' loop can be
> > discarded.  So, with the "fast" version, you are executing a nearly empty
> > loop that just occasionally prints a message.  With the "slow" version,
> > you are executing the 'doit' arithmetic every iteration of the
> > 'repeat_fun'' loop, so that is the reason you see a 10X slowdown.  The
> > actual assembly sequence for the arithmetic is identical (modulo the extra
> > multiplication).
> >
> > Even when the 'doit' function is not inlined, MLton could discard the call
> > to 'doit' in 'repeat_fun''.  Since the result of the call is unused, we
> > can discard the call when the called function has no side-effects, only
> > returns normally (i.e., doesn't raise exceptions), and terminates.  The
> > 'removeUnused' optimization pass computes a maySideEffect and mayRaise
> > predicate for each function, but does not currently compute a
> > mustTerminate predicate.  Thus, the call stays, and you see the longer
> > execution time when 'doit' is not inlined.

```
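
The pitfall described in the quoted reply (a timing loop that throws away the result of each call, so the inlined arithmetic becomes dead code) can be avoided by making every result feed into the value the loop returns. The following is only a minimal sketch of that idea, not code from the thread: `repeat_sum` is a hypothetical replacement for `repeat_fun`, it assumes `f` returns a `real`, and it assumes `n` is non-negative. A sufficiently aggressive optimizer could still hoist or constant-fold a pure `f ()`, so the generated code is worth checking (for example with the '-keep ssa' option mentioned in the thread) before trusting the timings.

```sml
(* Hypothetical variant of repeat_fun: every call to f contributes to the
   returned sum, so the calls are not trivially unused.
   Assumes n >= 0; a negative n would never reach the base case. *)
fun repeat_sum (f : unit -> real) (n : int) : real =
  let
    fun loop (0, acc) = acc
      | loop (k, acc) = loop (k - 1, acc + f ())
  in
    loop (n, 0.0)
  end

(* Usage: print the sum so the final result is observable as well. *)
val _ = print (Real.toString (repeat_sum (fn () => 4.0 * 4.0) 10) ^ "\n")
```

This has the same shape as the `eval` loop in the SML program at the top of this message, where `store` threads the accumulated value through every iteration.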