[MLton] Constant folding and simplifying FP ops

Vesa Karvonen vesa.a.j.k at gmail.com
Fri Dec 21 13:20:01 PST 2007


The pessimistic compilation of FP has been the topic of several recent
posts.  I definitely agree that, at least by default, MLton should opt for
correctness at the potential expense of performance.  However, I think
that there are many minor optimizations that could be done on FP code
without breaking correctness.  Attached is an experimental patch (it is
not yet finished) that implements some of the ideas I will briefly describe
here.

AFAIU, the main reasons behind the pessimistic compilation of FP
computations are the dependence on the rounding mode and the inability,
in general, to reorder FP computations due to rounding.  So, the question
is: how do we avoid those problems effectively enough?

Personally, I think that one of the main benefits of using an optimizing
compiler, like MLton, is that it frees the programmer from many trivial
optimization tweaks.  So, although an optimization might not speed up
every program, it can be useful if it frees the programmer from having to
perform the same optimization manually.

So, back to the topic.  Suppose we wish to constant fold FP operations.
How do we ensure that it is safe?  IOW, how do we ensure that changing the
rounding mode in the program does not render the optimization invalid?
Perhaps the simplest approach is to just ensure that the operation
produces the same result regardless of the rounding mode.  This is fairly
easy to implement (there are a few special cases to consider, e.g. cross
compilation), and while it doesn't apply in all cases, I believe it does
what the user would expect.  For example, simple add, sub, and mul with FP
constants that are integers (and more) will typically be constant folded.
I have implemented an experimental version of this optimization and
verified that it works on some test cases.  As an example, after properly
parenthesizing the SML version (to correspond to the C version; see the
code at the end), this optimization is effective enough to eliminate the
arithmetic inside the eval loop in Sean McLaughlin's post

  http://mlton.org/pipermail/mlton-user/2007-December/001294.html

and MLton gets much closer to gcc.

What if only some of the arguments to an FP operation are known?  In some
special cases, it is still possible to improve the code, because one can,
for example, use faster FP operations.  A particular case in point is FP
divide by a power of two.  It is not totally uncommon to see something
like "x / 2.0".  In cases where an FP number has an exact inverse, AFAIU,
it is a valid optimization to replace division with multiplication by the
reciprocal.  A simple variant of this optimization is to only optimize the
cases where the FP number is of the form 2^n, because in those cases the
number trivially has the exact inverse 2^(-n) (when representable).  I
almost implemented this (see the XXX in the patch), but then I noticed
that the current spec for Prim.apply does not allow it: only variables are
allowed in the result.  This may be a fundamental restriction, but if it
isn't, then, since there are only 3 uses of Prim.apply, it probably
wouldn't be too difficult to change, and I'll probably try that soon.

Multiplication by a power of two has the special property that it is exact
modulo underflow/overflow.  This could be used to reassociate FP
expressions to expose further opportunities for optimization.  For
example, an expression such as

  (x * Math.pi) * 2.0

could be rewritten safely to

  x * (Math.pi * 2.0)

because it always gives the same result (even when x is nan or when one of
the multiplications overflows).  I have not implemented this optimization.

Other valid, AFAICT, special cases include the following (which is by no
means an exhaustive list):

    x (+|-|*|/) nan => nan
  nan (+|-|*|/) x   => nan

  x (+|-) +0.0 => x
  +0.0 + x => x
  +0.0 - x => neg x

  x (*|/) +1.0 => x
  +1.0 * x => x

  x (*|/) ~1.0 => neg x
  ~1.0 * x => neg x

Some of these might result from other optimizations or from naively
written code (e.g. written by a "Fortran programmer" :-)).  One caveat:
the x + +0.0 => x and +0.0 - x => neg x rules are subtly unsafe when x
is a zero, because ~0.0 + +0.0 and +0.0 - +0.0 both give +0.0 (in the
default rounding mode), so the fold could flip the sign of a zero.  I
think I implemented all of the above identities.

Anyway, there are still many more optimizations for FP primitives that
could be done in Prim.apply.  I plan to look into implementing some
of them, disable the constant folding during cross compilation (in
real-x.fun), and then commit the patch unless there are objections (or
corrections).  Currently all regressions pass with my patch on x86.  After
that I could look into making Prim.apply flexible enough to express a few
more optimizations (notably Real_div => Real_mul and perhaps Word_mul =>
Word_lshift and Word_quot => Word_rshift and Word_mod => Word_andb).

-Vesa Karvonen

<--- sean.sml --->
fun eval_real_poly n = let
   val msg = n div 10
   val x1 = 4.0
   val x2 = x1
   val x3 = x1
   val x4 = x1
   val x5 = x1
   val x6 = x1
   fun eval (0,store) = store
     | eval (n,store) =
       let in
          if n mod msg = 0 then print ("iter: " ^ Int.toString n ^
                                       "\n") else ();
          eval (n-1,
                store + ((~x1)*x4 + x2*x5 +(~x3)*x5 +(~x2)*x6 + x3*x6 +
                         x4*(x2 + x3 + x5 + x6) + x4*(~x1)+x4*(~x4)))
       end
in
   eval (n,0.0)
end

val _ = print (Real.toString (eval_real_poly 500000000) ^ "\n")
<--- sean.sml --->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fp-opts.patch
Type: text/x-diff
Size: 9925 bytes
Desc: not available
Url : http://mlton.org/pipermail/mlton/attachments/20071221/5f78f890/fp-opts.bin

