[MLton] native vs. C code generation

Stephen Weeks MLton@mlton.org
Thu, 2 Jun 2005 11:57:12 -0700


> "How much effort would it be?"  The x86 native codegen is the product of 
> one full-time summer internship (2000), one half-time summer internship 
> (2001), and steady part-time work (since fall 2000).  And the x86 native 
> codegen is by no means a highly tuned beast.  We've revised the lowest 
> level ILs since the x86 native codegen was started (for the better, making 
> future native codegens likely to be slightly easier), but I would still 
> estimate a good 4 to 6 months of full-time work to get something in the 
> ballpark (i.e., with the C codegen about 25% slower).
> 
> So, probably the best answer depends on what one thinks of that trade-off:
> say, 6 months of work for a 1.3X speedup.

My guess is that the time estimate is a little pessimistic, given the
changes we have made to the low-level ILs and the stuff we have moved
from the non-portable codegens to the portable IL, as well as the many
design issues that we've already faced and hence have experience with.
But not too far off.

A couple of points that weren't mentioned.

1. The C/native ratio may be very different on platforms other than
x86.  In my experience, the x86 is very forgiving and tends to smooth
out the effects of code-generation decisions.  On other platforms, the additional
information that a native codegen has might make for more improvement,
and might make it more easily achievable.

2. A native codegen offers significantly faster compile times.  This
can be important in large projects, such as MLton itself.  We would be
in a pretty unpleasant development situation if we only had the C
codegen on x86.

Other than that, I agree with Matthew's analysis.  Unless one has a
very specific need, or wants to do it for pedagogical purposes, there
are many better places to spend one's time improving MLton than
writing a native codegen.