[MLton-devel] nucleic benchmark times

Wed, 13 Nov 2002 11:56:21 -0800

> In an attempt to put some hard numbers with these opinions, I ran
> benchmarks with the following options:
...

Thanks for the excellent analysis.  It shows there is a lot to be
learned just by playing around with MLton, without even implementing
new stuff.

> Here are the partial orders I would expect.
...
> MLton0 <= MLton1  -- aliasing the stack and heap hurts
> MLton2 <= MLton3
> MLton0 <= {MLton4,MLton5} -- and C-codegen hurts
> MLton2 <= MLton6
...

It would be nice to automate the benchmark script to check partial
orders and print anomalies.
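
One way to automate this is to record each configuration's time per benchmark, list the expected orders, and flag every benchmark where the supposedly faster configuration is slower.  The sketch below is hypothetical, because the real script's data format isn't shown here.  Its sample data are the MLton1/MLton5 numbers quoted later in this message, treated only as comparable lower-is-better values.  The `tolerance` parameter is an added assumption, so that differences like 1.00 vs 0.99 can be treated as noise.

```python
from typing import Dict, List, Tuple

# times[config][benchmark] -> running time; lower is better.
Times = Dict[str, Dict[str, float]]

# Numbers copied from the MLton1/MLton5 table in this message.
TIMES: Times = {
    "MLton1": {"DLXSim": 1.50, "fft": 1.09, "tsp": 1.36, "s-n-f": 1.00},
    "MLton5": {"DLXSim": 1.21, "fft": 1.06, "tsp": 1.13, "s-n-f": 0.99},
}

# Expected orders "A <= B": A should never be slower than B.
EXPECTED: List[Tuple[str, str]] = [("MLton1", "MLton5")]


def violations(times: Times, faster: str, slower: str,
               tolerance: float = 0.0) -> List[Tuple[str, float]]:
    """Return (benchmark, ratio) for every benchmark where the expected
    order times[faster] <= times[slower] fails by more than `tolerance`,
    a relative margin meant to absorb timing noise (an assumption)."""
    found = []
    for bench in sorted(times[faster]):
        if bench not in times[slower]:
            continue  # only compare benchmarks both configurations ran
        ratio = times[faster][bench] / times[slower][bench]
        if ratio > 1.0 + tolerance:
            found.append((bench, ratio))
    return found


def report(times: Times, expected: List[Tuple[str, str]],
           tolerance: float = 0.0) -> None:
    """Print one line per expected order, listing the anomalies."""
    for faster, slower in expected:
        bad = violations(times, faster, slower, tolerance)
        names = [bench for bench, _ in bad]
        print(f"{faster} <= {slower} -- {names}")


if __name__ == "__main__":
    report(TIMES, EXPECTED)                  # all four benchmarks flagged
    report(TIMES, EXPECTED, tolerance=0.02)  # s-n-f (about 1%) is dropped
```

With a 2% tolerance the s-n-f difference is ignored, while DLXSim, fft and tsp are still reported.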

> MLton0 <= MLton1 -- [], 0.28
> MLton2 <= MLton3 -- ["fft"], 0.25

Very impressive wins from aliasing information.  It's gonna be tough
for gcc (or any other codegen missing aliasing information) to make up
for that kind of slowdown with other optimizations.  Maybe there's a
PLDI submission comparing C and native codegens hiding in here?

> MLton1 <= MLton4 -- ["DLXSimulator","tsp"],3.47925
> MLton1 <= MLton5 -- ["DLXSimulator","fft","smith-normal-form","tsp"],0.56575
> MLton3 <= MLton6 -- ["DLXSimulator","tsp"],0.6385
> 
> "tsp" and "fft" violations suggest that floating-point could still be a
> little better in the native codegen, but no violations in nucleic and
> raytrace suggest we're not doing too badly.  

benchmark  MLton1  MLton5
DLXSim       1.50    1.21
fft          1.09    1.06
tsp          1.36    1.13
s-n-f        1.00    0.99

I'd say something even stronger.  The floating-point stuff in MLton's
native codegen is good enough that we shouldn't even consider spending
time improving it right now -- there's just not enough room for
improvement.  

DLXSim isn't floating-point intensive, right?  Maybe there are
register-allocation problems?  One thing that turning off overflow
detection could do is cause more inlining and much bigger basic
blocks.
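
To see why overflow checks shrink basic blocks: each checked operation becomes the arithmetic instruction plus a conditional branch to an overflow handler, and a branch ends a basic block.  Below is a toy model of just that effect.  The instruction format and `basic_blocks` are hypothetical, not MLton's IR, and the model says nothing about inlining.

```python
from typing import List, Tuple

# A toy instruction: (opcode, operands...).  Hypothetical, not MLton's IR.
Instr = Tuple[str, ...]

# Arithmetic that, with overflow detection on, is followed by a conditional
# jump to an overflow handler (e.g. "jo" on x86), which ends a basic block.
CHECKED_OPS = {"add", "sub", "mul"}


def basic_blocks(instrs: List[Instr], detect_overflow: bool) -> List[List[Instr]]:
    """Split straight-line code into basic blocks.  The only branches in
    this toy model are the implicit overflow checks."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in instrs:
        current.append(ins)
        if detect_overflow and ins[0] in CHECKED_OPS:
            blocks.append(current)  # the check's branch terminates the block
            current = []
    if current:
        blocks.append(current)
    return blocks


CODE: List[Instr] = [
    ("add", "t1", "a", "b"),
    ("add", "t2", "t1", "c"),
    ("store", "x", "t2"),
    ("sub", "t3", "t2", "d"),
]

if __name__ == "__main__":
    print(len(basic_blocks(CODE, detect_overflow=True)))   # 3
    print(len(basic_blocks(CODE, detect_overflow=False)))  # 1
```

Bigger blocks give an optimizer more room to schedule and allocate registers within a block, which is one plausible reason turning checks off might change DLXSim's timings so much.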

> My only guess with smith-normal-form is that since virtually
> everything will be FFI calls to gmp, the C-codegen is just better at
> organizing around C-calls.

Maybe, but you're comparing 0.99 to 1.00, so it could very well be
noise.  As a profile shows, all of the time in smith-normal-form is
spent in the C code.

> But obviously, the majority of the time the native codegen is beating
> the C-codegen by a non-trivial margin.  I'm not sure where to attribute
> the speedup.

Yeah, I dunno.

> Comparing -detect-overflow {true,false} yields more anomalies.
...
> DLXSim is again an unexplained outlier, and I think it skews the MLton2 <=
> MLton0 average quite a bit.  Looking through the benchmarks above, the
> violations (other than DLXSim) are all < 0.07 (though it is still a little
> odd that there are any violations at all).

Yeah.  The only worrying one is DLXSim.

> I think it would be worthwhile to implement an SSA pass that
> eliminates integer overflow checks
...

I agree.  It's been on my todo list since I sent out mail back in July
2001 describing an analysis to do so. :-(

Having seen your numbers, I am now sufficiently motivated to do at
least a simple one.
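
The analysis from the July 2001 mail isn't reproduced here, so the following is only a generic illustration of the idea.  It tracks an interval for every integer variable and drops a check whenever the exact result provably fits.  The statement format and the `removable_checks` name are hypothetical.  The sketch assumes 32-bit integers and handles only straight-line `+`/`-`: no loops (which would need widening), no multiplication, and nothing about MLton's actual SSA IR.

```python
from typing import Dict, List, Tuple, Union

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumes 32-bit default integers

Interval = Tuple[int, int]
Operand = Union[str, int]                 # variable name or integer constant
Stmt = Tuple[str, str, Operand, Operand]  # (dest, op, lhs, rhs), op in {"+", "-"}


def interval_of(x: Operand, env: Dict[str, Interval]) -> Interval:
    if isinstance(x, int):
        return (x, x)
    return env.get(x, (INT_MIN, INT_MAX))  # nothing known: any 32-bit value


def removable_checks(stmts: List[Stmt], known: Dict[str, Interval]) -> List[str]:
    """Return the destinations whose overflow check can never fire.
    Handles straight-line code only; loops would need widening."""
    env = dict(known)
    removable = []
    for dest, op, lhs, rhs in stmts:
        alo, ahi = interval_of(lhs, env)
        blo, bhi = interval_of(rhs, env)
        if op == "+":
            lo, hi = alo + blo, ahi + bhi
        elif op == "-":
            lo, hi = alo - bhi, ahi - blo
        else:
            raise ValueError(f"unsupported op {op!r}")
        if INT_MIN <= lo and hi <= INT_MAX:
            removable.append(dest)  # exact result always fits: drop the check
        # If the check stays, execution only continues when the result
        # fits, so either way dest lies within the 32-bit range.
        env[dest] = (max(lo, INT_MIN), min(hi, INT_MAX))
    return removable


if __name__ == "__main__":
    prog = [
        ("j", "+", "i", 1),   # i is in [0, 999], so j is in [1, 1000]
        ("k", "-", "j", 1),   # k is in [0, 999]
        ("w", "+", "n", 1),   # n is unconstrained: n + 1 may overflow
    ]
    print(removable_checks(prog, {"i": (0, 999)}))  # ['j', 'k']
```

Here the range for `i` stands in for whatever facts a real pass would derive, e.g. from a loop bound or an earlier comparison; where such facts come from is the hard part this sketch leaves out.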

> Very surprisingly, -detect-overflow true -DFAST_INT vs
> -detect-overflow false has almost no effect in the C-codegen on
> average.

It's at least good to know that one can use the former, which is
semantically less harmful, instead of the latter without a performance
hit.
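
A minimal sketch of the semantic difference.  Standard ML requires integer arithmetic to raise `Overflow` when a result is out of range.  The sketch assumes that with `-detect-overflow false` the generated code simply wraps around (32-bit two's complement); that is an assumption about the codegen, not something stated here.  Since `-detect-overflow true -DFAST_INT` keeps detection on, it presumably corresponds to `checked_add`.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumes 32-bit integers


class Overflow(Exception):
    """Stands in for SML's Overflow exception."""


def checked_add(a: int, b: int) -> int:
    """Overflow-detecting addition, as SML semantics require."""
    r = a + b
    if r < INT_MIN or r > INT_MAX:
        raise Overflow()
    return r


def wrapping_add(a: int, b: int) -> int:
    """Unchecked addition, modeled as two's-complement wraparound."""
    return (a + b - INT_MIN) % 2**32 + INT_MIN


if __name__ == "__main__":
    print(wrapping_add(INT_MAX, 1))  # -2147483648: silently wrong
    try:
        checked_add(INT_MAX, 1)
    except Overflow:
        print("Overflow raised")
```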

_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel