[MLton] Project Proposal: an MLton "port" to LLVM

Duraid Madina duraid@kinoko.c.u-tokyo.ac.jp
Thu, 17 Nov 2005 12:32:03 +0900


On Tue, Nov 15, 2005 at 11:17:33PM -0500, Matthew Fluet wrote:
> 
> I, at least, have been aware of LLVM for a while, though I haven't studied 
> it in much detail.  I note that it has just recently had a new public 
> release.

I'm very glad to hear this! I used to think LLVM was something of a
well-kept secret in the functional realm but now I know otherwise. :-)
 
> It is certainly an appealing idea, but one that is shared by a number of 
> other projects (MLRISC, C--, etc.).  I'd be happy to see MLton get out of 
> the native-codegen game and be able to focus on higher-level 
> optimizations.  In fact, a few months ago, I undertook an experiment 
> writing a C-- backend for MLton.

Right - it must be nice to forget about codegen, and I hope LLVM will make
2006 a happy year for MLton users on other architectures. ;-) Thanks for the
references to your experience with C--: LLVM will doubtless have issues of
its own, but I'm sure they can be worked out one way or another.

> The big issue is that there is a lot of semantic meaning that we can't 
> convey to gcc through the C language.  The question for any new target is 
> whether or not that semantic meaning can be conveyed through the language 
> interface.

Exactly. LLVM certainly exposes more than C: you can do tail calls and
exceptions efficiently, for example. From time to time LLVM does need to be
extended, but so far that has never been a problem. For example, LLVM
previously did tail call optimization on a "best effort" basis, but now it
has explicit tail calls for which the optimization is guaranteed.
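
To make the tail call point concrete, here is a minimal C sketch (all names
are hypothetical, and this is not MLton's actual C output). The "direct"
functions call each other in tail position, but C gives no guarantee that
the frame is reused, so a compiler targeting C typically has to fall back on
something like the trampoline shown after them. A target with guaranteed
tail calls would let the direct form be used as-is.

```c
#include <stdbool.h>

/* Hypothetical names; an illustration only, not MLton's actual C output. */

/* Direct style: every call is in tail position, but C does not guarantee
   that the compiler reuses the stack frame, so deep recursion may overflow. */
bool is_odd_direct(unsigned long n);

bool is_even_direct(unsigned long n) {
    return n == 0 ? true : is_odd_direct(n - 1);
}

bool is_odd_direct(unsigned long n) {
    return n == 0 ? false : is_even_direct(n - 1);
}

/* Trampolined style: instead of calling, each step returns the label of
   the step to run next, and a driver loop dispatches on that label.
   Stack use stays constant no matter how large n is. */
typedef enum { DONE, EVEN, ODD } label;

typedef struct {
    unsigned long n;  /* remaining count */
    bool result;      /* valid once the step returns DONE */
} state;

label step_even(state *s) {
    if (s->n == 0) { s->result = true; return DONE; }
    s->n--;
    return ODD;
}

label step_odd(state *s) {
    if (s->n == 0) { s->result = false; return DONE; }
    s->n--;
    return EVEN;
}

bool is_even_trampolined(unsigned long n) {
    state s = { n, false };
    label next = EVEN;
    while (next != DONE)
        next = (next == EVEN) ? step_even(&s) : step_odd(&s);
    return s.result;
}
```

The trampoline works, but every "call" now goes through the dispatch loop
and all live values have to sit in memory (the state struct) rather than in
registers, which is exactly the sort of overhead guaranteed tail calls avoid.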
 
> >( http://llvm.cs.uiuc.edu/pubs/2005-05-04-LattnerPHDThesis.html ) can
> >perform minor miracles on the performance of pointer-intensive code, even
> >that coming from weakly typed source languages (C or C++).
> 
> I looked all over the LLVM website and couldn't find benchmark results 
> anywhere.  A major theme of MLton is producing high-performance code, so 
> I'd want to know that LLVM can really do better than gcc.

It's true, there aren't that many benchmarks on the LLVM site. That will
hopefully be fixed one day soon, most likely after LLVM brings its C/C++
front-end up to date with GCC 4's. Anecdotally, LLVM's codegen is roughly on
a par with GCC for x86, generally ahead on PowerPC, and generally behind on
Alpha and Itanium (the backends for these last two are still rather young
and have had very little performance work so far). I just grabbed the
"pidigits.sml" benchmark, and compiling MLton's C output with llvm-gcc
instead of gcc leads to more or less identical performance on x86 (+/-
0.02sec out of 5sec - noise, really). Having said that, LLVM offers
interprocedural optimization, tail calls, exceptions, accurate garbage
collection and so on, which you just can't get from GCC.
 
> Not necessarily.  If you look at the MLton/C-- experiment thread, I 
> describe the "abstract machine" that feeds into all MLton codegens.  The 
> major issue is that to C or C-- (and presumably to LLVM) most of the code 
> looks to be manipulating heap data, since MLton allocates ML stacks on the 
> heap to support very deep call stacks, to support multiple ML stacks for 
> light-weight concurrency, and to support accurate garbage collection.  So, 
> most of the optimizations you mention above won't be effective at this 
> level, since it is generally unsound to rearrange (what would to LLVM 
> appear to be) arbitrary heap reads and writes.

Supporting this well sounds like a *great* extension to LLVM, and the
developers would be happy to help with this.
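
For anyone following along, the aliasing problem Matthew describes can be
sketched in a few lines of C. The frame layout and function names here are
made up for illustration and are not MLton's actual runtime structures. In
the first function the operands are ordinary C locals, so the compiler knows
the store cannot touch them. In the second, the "locals" are slots in a
heap-allocated frame, and the store through `out` might overwrite one of
them, so the load of `frame[1]` cannot be moved ahead of the store.

```c
#include <stdint.h>

/* Hypothetical layout and names; not MLton's actual runtime structures. */

/* Native-style: a and b are C locals. The store through out cannot alias
   them, so the compiler is free to keep both in registers across it. */
int64_t sum_native(int64_t a, int64_t b, int64_t *out) {
    *out = a;
    return a + b;
}

/* Heap-stack style: the operands live in frame slots reached through a
   pointer. The store through out may alias frame[1], so frame[1] has to
   be loaded after the store rather than before it. */
int64_t sum_heap_frame(int64_t *frame, int64_t *out) {
    *out = frame[0];
    return frame[0] + frame[1];
}
```

If `out` happens to point at `frame[1]`, the second function returns twice
`frame[0]` rather than the sum of the two original slots, which is why
reordering the load and the store would be unsound without extra
information about where the pointers can point.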
 
> You really need to write a new codegen pass, translating from Machine. 
> If you don't go through Machine, then you would need to rewrite much of 
> the runtime system (i.e., garbage collector), since the Machine IL is 
> computing all the info needed there.  It probably does make sense to tweak 
> some of the translation into Machine depending on the final codegen.  For 
> example, we could keep things looking more SSA-ish, which would probably 
> be a benefit for something like LLVM that wants SSA form.

OK, that's fair enough.
 
> We're always interested in getting more people involved with MLton 
> development.  We'd certainly encourage the experiment, though (speaking 
> for myself), I'm not sure how much effort we could devote.  We're very 
> happy to answer questions.

Well, I guess we'll see! If any non-x86 MLton users are longing for more
performance, whipping up LLVM codegen might be the quickest way to get it.
I fall into that category myself, so will probably give it a shot early next
year if nobody beats me to it!

	Duraid