[MLton] MLton calling convention and closure conversion

Wesley W. Terpstra terpstra at gkec.informatik.tu-darmstadt.de
Mon Jan 22 16:08:15 PST 2007


I wanted to make sure I understand these issues before I spread  
misinformation:

As I understand it, gcc doesn't do tail recursion very well, because  
it is the caller's responsibility to pop the stack; the callee does  
not know the number of arguments passed to the function. This is  
different in the code output by MLton since tail recursion is  
required by SML. I am fairly certain that the MLton stack for ML  
threads is heap allocated from when I looked at that code last.

 From what I've seen of the C codegen, MLton uses goto (or switch)  
within some sort of 'chunk' files (I still don't understand what  
these are) and when you need to branch between these files, you  
return the function pointer of the destination from the current  
function. This then gets processed by a 'trampoline' that invokes the  
returned pointer. So, the ML stack lives on the heap and the C stack  
is never more than one-level deep due to the trampoline. When a C  
call is needed, this is simply invoked on the C stack.

However, recent progress in gcc now allows functions with the same  
number and type of arguments to be invoked tail-recursively.  
Therefore, isn't this whole goto/switch/trampoline infrastructure  
unnecessary and possibly slow? Since MLton is tied to gcc for its  
assembler anyways, why not use these extensions?

I am guessing that the x86 codegen does something smarter.  
Presumably, the x86 codegen uses its own calling convention (what is  
it?) that supports tail recursion. This is easy(?) since MLton knows  
the number of arguments to a function. The C stack doesn't even need  
to be kept in a register within SML code. When the x86 codegen needs  
to call C, it uses an alternate stack (probably the original stack  
that the program was started with to appease glibc's obsession with  
thread-specific values at the bottom of the stack?) and manually  
pushes the necessary arguments, calls the method, and then pops those  
arguments.

Finally, I tried to read your paper on closure conversion in MLton. I  
didn't understand any of the type equations or transformation rules  
as it uses notation I've never seen (I've never studied compiler  
theory). However, the example SML code with the freshly introduced  
datatypes made sense to me. I was wondering, however, how deeply the  
flow analysis goes: can it always eliminate the need for a function  
pointer? (for example if I had an event heap with a callback function  
for each event, where many different events are registered all over  
the program)

Finally, it seems to me that the functor and polymorphism elimination  
via code replication and MLton's closure conversion are probably the  
main reasons that MLton is so fast(?).  F# and SML.NET can not  
benefit from these tricks, because they must output CLR object code  
that is linked by microsoft's linker. That CLR includes polymorphism  
and closures in unconverted form. These optimizations would have to  
be performed at link time when all the flow&use information is  
available. As MS's linker does not do this, F# and SML.NET are slow.

In conclusion: I recently had to write a large piece of software in C+ 
+ for the first time since I've learned SML. It was horrible! I've  
become addicted to the expectation that I can write in a high-level  
language and expect good performance. I only did it as I needed 64GB  
of RAM. Please hurry with 64 bit support and multi-threading. :-)




More information about the MLton mailing list