benchmarking Poly/ML & floating point

Mon, 11 Sep 2000 17:46:53 -0400 (EDT)

> > No, I haven't seen merge.  Is it in the latest tar.gz that you posted?
> 
> No.  But it is available at http://www.sourcelight.com/MLton/benchmarks/

Actually, it was.  Here are some results:

Compile time (x86 G1):  1.73 (C) vs. 1.63  (x86)
Executable size:       36794 (C) vs. 35874 (x86)
Run time:              82.73 (C) vs. 76.40 (x86)  0.92 (x86/C)

Which might be just about equal to Poly/ML.  I'm running on a 550MHz PIII. 

> > In other news, I have a mostly complete floating point backend set up.  It
> > natively handles all of the prim's except: cosh, sinh, tanh, exp, pow,
> > tan, which are done as ffi calls.  It also does ffi calls for copysign,
> > frexp, and modf, but does not have any inline assembly for those
> > operations.  (On the other hand, for the first group of prims, gcc does
> > have inline assembly, but I haven't been able to completely grok it all;
> > particularly something like pow which has lots of branching and special
> > cases to consider. 
> 
> My guess would be that the right thing to do is to move these cases into SML
> basis library code and make the primitive as simple as possible.  I don't mind
> if some of the tests are duplicated in the SML code and the C backend.

O.K.  I definitely want to get exp and tan into assembly.  The only issue
with tan is that the native instruction automatically pushes 1.0 onto the
floating-point stack after computing the result, which unfortunately makes
it different from every other class of instruction.  Looking at the output
of gcc, it inlines the exp code pretty much as you would expect:

e^x = 2^(x * log_2 e)

except that Intel has two sorts of 2^ instructions: one scales by
2^(integer part of its operand) and the other computes 2^(st(0)) - 1, but
only for st(0) between -1 and 1.  So there is some trickery with getting
the fractional and integral portions of the exponent.  This is also
apparent in the hyperbolic math functions.  On the other hand, pow is all
over the place, with lots of branching.
Ideally, I'd want to push stuff into the basis library so that any
instance of the prim Real_pow could be calculated as

x^y = 2^(y * log_2 x)

using the same trickery as above.

> 
> > I'm not sure that the semantics of Real_nequal and Real_qequal as given in
> > mlton-lib.h are correct. For one thing, gcc produces the same assembly
> > sequence for both functions. 
> 
> How can this be?  Looking at mlton-lib.h I see that Real_qequal is the negation
> of Real_nequal -- this may or may not be wrong, but I don't see how it could
> lead to the same code.

I'm sorry.  I wrote that wrong.  I'm still not sure that the semantics of
Real_qequal are correct.  It's the fact that gcc produces the same
assembly for Real_equal (not Real_nequal) as Real_qequal that made me
start thinking about this.  And these do in fact return the boolean
negation of Real_nequal.

So, I think it comes down to the fact that 
#define Real_qequal(x,y) (!((x) != (y)))
isn't the right semantics.

#include <stdio.h>                       |
                                         |
int main(int argc, char* argv[]) {       | output:
  double x = 1.0/0.0 + -1.0/0.0;         |
                                         |
  fprintf(stderr, "%f\n", x);            | nan
  fprintf(stderr, "%d\n", x == x);       | 0
  fprintf(stderr, "%d\n", (x != x));     | 1
  fprintf(stderr, "%d\n", !(x != x));    | 0
                                         |
  return 0;                              |
}                                        |

But the spec states:
?= (x, y)
    returns true if either argument is a NaN or if the arguments are
    bitwise equal, ignoring signs on zeros. It is equivalent to the IEEE ?=
    operator.

The short end of the story is that I found the assembly for islessgreater
in /usr/include/bits/mathinline.h, and it seems to jibe with what the Intel
spec has on the floating-point status word, so I'm inclined to go with it.
(I can't figure out how to get gcc to include it, though; just #define
__USE_ISOC9X causes parse errors in mathinline.h, so obviously no one is
that concerned about following the spec that closely.)
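
For reference, a minimal sketch of what a C definition matching the quoted
?= semantics could look like, assuming the C99 comparison macros from
<math.h> (isunordered, islessgreater) are usable.  The _sketch names are
hypothetical and this is not the definition currently in mlton-lib.h.

```c
#include <math.h>

/* Hypothetical: true if either argument is a NaN, or if the
   arguments compare equal (so 0.0 and -0.0 count as equal). */
#define Real_qequal_sketch(x, y) \
  (isunordered((x), (y)) || (x) == (y))

/* Equivalent form: islessgreater(x, y) is true only when
   x < y or x > y, and is false whenever a NaN is involved. */
#define Real_qequal_sketch2(x, y) \
  (!islessgreater((x), (y)))
```

Unlike !((x) != (y)), both forms yield 1 when x is a NaN, which is what the
spec asks for.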