[MLton-devel] Fwd: C back end for MLton

Stephen Weeks MLton@mlton.org
Wed, 6 Nov 2002 18:01:55 -0800


> If this isn't enough to figure out what's going on, then I put nucleic.scm
> and prefix-gambit.scm at
> 
> http://www.math.purdue.edu/~lucier/{nucleic,prefix-gambit}.scm
> 
> > Actually, it occurs to me that it is easier for a quick comparison for
> > you to delete code than for me to add it.
> 
> I'm not concerned about the extra run time, I'd just like to make sure
> that the codes are computing the same thing.

Indeed, they were not.  Not even close, as a simple inspection of the
code showed.  I patched our version of the benchmark (you can get it
from our CVS) so it matches your Scheme version.  Now it computes
33.7975948908, which passes the result test.

Here is the data for the old and new patched versions, both compiled
by mlton -native true on my 1.6 GHz P4:

        ms/loop   bytes/loop
old     26.8      4,907,956
new     57.3      8,874,283

Extrapolating a little, it looks like MLton will be slightly faster
than Gambit on your machine, but nowhere near a factor of 2.5.
Please send the comparison once you run it.  Thanks.

> If the conclusions will always be "C back ends are worse than native back
> ends" and/or "gcc is buggy", then tell me now and we can stop wasting
> time discussing it.  My guess is that MLton doesn't have deterministic finite
> automata models of x86 processor execution in its native back end,
> or any number of other optimizations, that gcc has.

Sorry if we appeared a bit resistant :-).  I think it is fair to
conclude that we believe that it would be difficult to impossible to
make the quality of the MLton C codegen exceed the quality of the
MLton native codegen on x86.  One of the things that surprised me the
most about the native codegen project, which was started in the summer
of 2000, is how quickly it was able to catch up to and surpass the C
codegen.  This, despite the fact that we had been using and improving
the C codegen for over 2 years, were highly motivated to make it good,
and had had quite a bit of success with it compared to other SML
compilers.  This also despite the fact that the native codegen was
very simple (compared to gcc), as Matthew mentioned.

The conclusion that I drew from this is the same as Matthew's (I hope
I'm not putting too many words in his mouth) -- that the information
lost by translating to C that gcc is unable to recover dwarfs the many
advantages of gcc's optimizer.  There are surely some programs with
some very hot loops that will violate this rule.  But those are
probably also the ones that are easy to rewrite with a C FFI for the
hot stuff.

I'm still happy to see improvements to the C codegen.  But, given my
beliefs, it's unlikely I'll spend significant time trying to improve
it on x86.  I would much rather see improvements to the C codegen for
another architecture, where there is no native codegen to beat and
hence the benefits are much greater.  I also think it would be
interesting to repeat the experiment on another architecture of
building a C codegen, improving it, and then building a native codegen
to see how difficult it is to beat the C codegen.

A bit more about the C codegen.  Like I said, we used to worry about
its performance a lot, but almost not at all lately.  So it's fair to
say that the current spread between the native and C codegens is
larger than it could be.  One very relevant example is the case of
integer overflow.  With the C codegen, we couldn't figure out any way
to implement overflow as required by the language without a huge
performance hit (lots of extra tests per arithmetic op).  So for the C
codegen, we abandoned correctness: we tested for overflow, but didn't
allow the program to handle it, as the language requires.
Once the native codegen, in which it is easy to implement overflow
correctly, was working well enough and we no longer cared as much about
the performance of the C codegen, we accepted some slowdown in the C
codegen so that we could implement overflow detection correctly.  This
is now enabled by default.  If you want to use the older, faster, but
incorrect code with the C codegen, you need to compile with
-DFAST_INT.
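
To give a feel for the "extra tests per arithmetic op", here is a
minimal sketch of an overflow-checked addition in portable C.  This is
not the code MLton actually emits; the name checked_add and its
signature are made up for illustration.  The point is only that each
add needs a couple of comparisons before it, and that the caller needs
a way to learn about the failure so it can raise Overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, NOT MLton's actual generated or runtime code.
 * Adds a and b.  Returns true and stores the sum in *result when the
 * mathematical sum fits in an int; returns false and leaves *result
 * untouched otherwise, so the caller can raise Overflow. */
static bool checked_add(int a, int b, int *result) {
    /* The tests run before the add, because signed overflow is
     * undefined behavior in C and cannot be detected afterwards. */
    if (b > 0 && a > INT_MAX - b)
        return false;               /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;               /* sum would fall below INT_MIN */
    *result = a + b;
    return true;
}
```

A caller would write something like "if (!checked_add(x, y, &z))" and
then branch to whatever code raises the exception.  On x86 a native
codegen can instead do the add and branch on the processor's overflow
flag, which is presumably why it is so much cheaper there.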

Anyways, this is a good example of how decisions that we have made
lately have slowed down the C codegen.

> Playing with command-line switches is, of course, one or two orders of
> magnitude easier than implementing computed gotos.

Agreed.  In fact, to ease your experiments, and to make it easier for
people to improve the quality of C codegen on their own machine
without needing to modify the compiler, we have added two new
command-line switches to MLton.

-cc /path/to/c/compiler
-ccopt <options>

The first will cause MLton to use the specified C compiler and will
reset the list of switches normally passed to the C compiler to be
empty.  The second adds switches to be passed to the C compiler.  So,
you can now do the following:

benchmark -mlton "mlton -native {true,false -cc gcc{, -ccopt -fomit-frame-pointer}{, -ccopt -fschedule-insns}}" nucleic

This will benchmark nucleic in the following five ways:

MLton0 -- mlton -native true
MLton1 -- mlton -native false -cc gcc
MLton2 -- mlton -native false -cc gcc -ccopt -fschedule-insns
MLton3 -- mlton -native false -cc gcc -ccopt -fomit-frame-pointer
MLton4 -- mlton -native false -cc gcc -ccopt -fomit-frame-pointer -ccopt -fschedule-insns

All of this is checked in to our CVS.  There are instructions on how
to access our CVS at http://www.mlton.org/download.html.  Let us know
if you need help.

BTW, -cc and -ccopt are so-called "expert" options.  This means that
they are not documented in the manual, and are much more likely to
change than other options.  They also do not appear in the regular
usage message.  However, if you do "mlton -v -z" you can get a usage
message that displays the expert options in addition to the normal
ones.

