[MLton-devel] nucleic benchmark times

Brad Lucier lucier@math.purdue.edu
Fri, 8 Nov 2002 14:40:22 -0500 (EST)


> > The times for the corrected nucleic benchmark on my machine ended up as
> > 
> > MLton: 		.235 seconds
> > Gambit 4.0b1:	.250 seconds (after inlining all calls to tfo_apply)
> > C (gcc -O2):	.126 seconds
> > 
> > So we (the FP community) still have a way to go.
> 
> Yep.  I am sad to say that MLton probably does worse on nucleic than
> any of our other benchmarks.  It is the only benchmark that we have
> where SML/NJ does significantly better than MLton.  Here are the
> normalized running times on my machine.
> 
> MLton:		1.72
> SML/NJ:		1.21
> C (gcc -O2):	1
> 
> I took a look again at profiling nucleic and trying to figure out
> anything we're doing obviously wrong, and I don't see it.
> 
> It is some small consolation that at least one functional language
> compiler compares pretty well with C.

Here's my situation (he says after mailbombing the MLton development list ...).

I don't live and die by the nucleic benchmark.  Gambit doesn't do so well
on it either for various reasons that are specific to Gambit and that I
probably don't need to go into.

On the other hand, I do use Scheme for high-performance scientific computing.
My numerical PDE code for elliptic and parabolic equations, compiled using
Gambit+Meroon, is pretty damn fast.  At the lowest level (sparse matrix-vector
multiply), it's as fast as C.  I like that; that's important to me.  It's
also important to me that I, and/or some students, can actually *complete*
the program: I tried it in Fortran as a graduate student around 1978 and
failed; I could just never get it debugged.  In object-oriented Scheme,
it's almost transparent.
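
For concreteness, the kernel I'm talking about is nothing exotic.  Here is a
generic compressed-sparse-row matrix-vector multiply in C (a sketch with
made-up names, not my actual Gambit+Meroon code); this is the kind of inner
loop where the Scheme version matches C:

    /* Generic CSR (compressed sparse row) matrix-vector multiply: y = A*x.
       Names and layout are illustrative only. */
    void csr_matvec(int nrows,
                    const int *row_start,   /* nrows + 1 entries */
                    const int *col_index,   /* one entry per nonzero */
                    const double *value,    /* one entry per nonzero */
                    const double *x,        /* input vector */
                    double *y)              /* output vector, length nrows */
    {
        int i, k;

        for (i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (k = row_start[i]; k < row_start[i + 1]; k++)
                sum += value[k] * x[col_index[k]];
            y[i] = sum;
        }
    }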

64-bit implementations are important to me (or at least, having arrays
larger than 16 MB is important to me, and with the type tags in Gambit's
vector headers, arrays that big are possible only in 64-bit implementations).
I've been working on a project for functional MRI processing with wavelet
transforms, etc., and in Gambit I get about 200 Mflops on a 500 MHz
Alpha.  That's important to me, too, when I have 80 MB single-precision
floating-point arrays as data sets.

It's been a while since I'd looked at nucleic, so it struck me as possible
that MLton was 2.5 times as fast as Gambit.  For a 2.5-times speed improvement,
I might learn ML; hell, I might even work to port it to other (64-bit) machines.
For a few percentage points' improvement on nucleic, I won't do that.  Suresh
indicated he might be interested, sometime, in taking the central part
of my numerical PDE code and translating it to ML to see what the performance
is on MLton (that code doesn't need a 64-bit implementation).  I don't
think ML will do significantly better than Gambit on that code, but, hey,
I like surprises as much as the next guy :-).

Now, some personal comments on how I see your approach.  It is not clear
to me whether you want MLton to be a production compiler or just a
research project.  That's fine, but it makes it less appealing to an
applications guy like me.  My approach has been to bug Feeley to improve
Gambit, bug Queinnec to improve Meroon, and, later, bug the gcc developers
to improve gcc.  And, to some extent, I've been successful in all three
instances.  Sometimes I had to wait a long time; but with Feeley reporting
a 50% improvement on some benchmarks from using computed gotos instead of
switches, it was worth my while to send the gcc folks an example of code using
computed gotos that was clearly miscompiled, and that got fixed in gcc 2.95.  I
bugged them about miscompilations on 64-bit sparc, and that finally worked
to my satisfaction in gcc 3.1.  And there are side improvements along the
way---when Apple switched from gcc 2.95 to gcc 3.1 going from MacOS X 10.1.5
to 10.2, there was a general 8 to 15% improvement in code speeds.  A new,
modern register allocator will be included in gcc 3.3; it is not planned to
make it generally usable until 3.4, but at least it's there to test and
to provide examples where it has difficulties.  And now gcc takes into account
certain memory aliasing rules; perhaps MLton and Gambit can take advantage of
that for higher performance.  (I find it kind of ironic that MLton is using
C more or less as a typeless language, much as one might naively translate
Scheme into ML using a single unified type.)
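
An aside on the computed-goto point, for anyone who hasn't seen it: the gcc
extension in question is "labels as values" (&&label plus an indirect goto),
which lets generated C jump straight from one handler to the next instead of
going back through a switch.  A minimal sketch follows; the opcodes are made
up, and this is not Gambit's actual generated code.

    /* Dispatch through gcc's labels-as-values extension instead of a switch.
       This is a GNU C extension; the opcode names here are hypothetical. */
    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_HALT };

    static int run(const int *code)
    {
        /* Table of label addresses, indexed by opcode. */
        static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int acc = 0;

        goto *dispatch[*code];          /* jump to the first handler */

    op_inc:
        acc++;
        goto *dispatch[*++code];        /* straight to the next handler */
    op_dec:
        acc--;
        goto *dispatch[*++code];
    op_halt:
        return acc;
    }

    int main(void)
    {
        int program[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        printf("%d\n", run(program));   /* prints 1 */
        return 0;
    }

And the aliasing remark is, I presume, about gcc's type-based "strict
aliasing" analysis (-fstrict-aliasing); a compiler that emits C, like MLton
or Gambit, has to make sure its generated loads and stores respect those
rules before it can let gcc exploit them.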

I just happened to have lunch with Suresh and Jeff Siskind, who's also at
Purdue now.  There was talk about using different back-end toolkits for
people, like the MLton team, who want to concentrate on front-end issues,
including high- and mid-level optimization strategies.  That's fine, but will
these back-end teams continue to support their code? Will it be ported to new
architectures that come out?  (Suresh mentioned interest in a native MLton
back end for x86-64; where will MLton be in the unlikely event that the Itanic
is the winner of the 64-bit processor battle? Do you guys really want to write
a VLIW back end for MLton?)

Anyway, thank you for indulging me in this little bit of cross-language
performance comparison.  My only x86 machine is now too slow (350 MHz
Pentium II) and has too little memory (384 MB) to make further tests of
MLton tolerable.  Perhaps if I upgrade to a more modern x86 processor (which
seems unlikely; I think my emphasis beginning next year will be on
commodity 64-bit systems if I can get them) I'll come back to it.

Brad

