[MLton-user] Mostly MLton performance.

Stephen Weeks MLton-user@mlton.org
Tue, 28 Mar 2006 18:38:22 -0800


> If I run the MLton compiled program several times, running times
> increase significantly
...
> My first guess was that the GC may behaves differently in different
> runs, but fixed-heap did not eliminate the difference, and using
> gc-summary I can see that the GC only takes slightly more time in
> the longer runs (the differences in "total GC time" are less than
> one second). When the program is run on a computer that has "rested"
> (low load average last minute or so) it seems to always take roughly
> the same time.

I am curious to hear if you have made any progress on this.  I am
sceptical that MLton's GC is responsible.  The difference from one run
to the next (almost a factor of two in running time) seems too large
-- I have never seen this kind of behavior.

Are you using anywhere near the amount of available RAM on the
machine?  Is anything else running?

In the worst case, if you can make the source code available, we can
try to duplicate what you see on other machines.

> The second point is that when removing checks for nan-ness on reals,
> I get a significant speedup. I would like to use nan-ness as a kind
> of flag (instead of real option) as I expected it not to give me a
> large penalty. In C++ using gcc I feel there is a much smaller
> penalty (around 4 seconds) compared to MLton (around 15 seconds),
> and my gcc compiled program produces reasonable results.

I looked into the MLton implementation of Real.class and Real.isNan.
The costs of Real.class are the following.

 1. A function call to Real64_class, a C function defined in the MLton
    runtime library.  On most platforms, this function is a wrapper
    around the fpclassify macro.
 2. Actually doing fpclassify.
 3. Converting the integer returned by Real64_class to the
    IEEEReal.float_class datatype.

Doing some micro(nano?)benchmarking on my machine, I see the following
times, in nanoseconds, for each of these.

 (1) 11 ns
 (2) 14 ns
 (3)  1 ns

That is, an SML call to Real.class takes about 26 ns, with most of the
work evenly split between fpclassify and C call overhead.

Real.isNan is implemented with
 
  fun isNan r = class r = NAN

So, the cost of Real.isNan is going to be about the same as
Real.class, since the SML call to Real.class will be inlined and the
subsequent test costs little, relatively speaking.  Microbenchmarking
confirms this, with a call to Real.class taking about 28 ns.

C obviously implements the isnan macro better that simply calling
fpclassify, since it only takes about 11 ns on my machine.  MLton
could do better by implementing Real.isNan as a call to a C wrapper
around the C isnan macro, but unfortunately that won't save us a
significant amount, since it doesn't affect the overhead from (1) and
(3).

So, unfortunately, it looks like an SML Real.isNan, at 28ns, is about
2.5 times slower than a C isnan, at about 11ns.  That roughly
corresponds with the 15s vs 4s penalty you saw.

The only way to speed this up is to eliminate the C call.  Since MLton
doesn't support calling C macros, you would also have to implement the
isnan functionality in SML.  That would be fine, except the only
operations in the SML basis for getting at the bits of a real are
provided by the PackReal structures.  And, you guessed it, MLton
implements those with C calls too.  So, I don't see any way to improve
things with a stock MLton.  The right way to go is to put a primitive
in MLton that lets one get at the bits of a real.  Since Real.class is
used all over the place in the implementation of the Real structure,
this would speed up a number of functions (including, of course, isNan).
If you're interested in doing this, it would probably be a couple days
work, and you can ask for guidance on the MLton developer list
(MLton@mlton.org).

> Ps. In the "fold" page on the MLton homepage (liked from the for-loop
> page), the term "eta expand" is used without any reference. Is this the
> same as the eta-reduction mentioned in the wikipedia article on lambda
> calculus? In that case I think that a link to the wikipedia article
> might be helpful for beginners like me.

I added a new "EtaExpansion" page, linked from "Fold".  BTW, the MLton
website is also a wiki, and we welcome contributions large and small.