[MLton] MLton calling convention and closure conversion

Tue Jan 23 10:17:55 PST 2007

> Part of the reason why I thought it would be an improvement is that I  
> still don't understand what a chunk is. How can MLton possibly know  
> the branch frequency of the code before runtime?

It doesn't.  It simply groups parts of the control-flow graph and call
graph that jump to each other into the same chunk, using a
not-very-clever heuristic that I don't recall at the moment.  Because
chunks are big enough, not much cleverness is required to do well.

> Also, users (including me) complain about the C-codegen's compile
> time and gcc memory usage. I believe this stems from large
> functions?

Yes.

> So, if you didn't need the switch&trampoline infrastructure, then
> you could have smaller functions and it should be faster compiling
> at the least.

True.  Although I think you will pay a noticeable run-time performance
cost.  That tradeoff goes against our usual philosophy (namely, use
another compiler if you want fast compile times and not fast run
times).  You can use -coalesce (with the C codegen) to make chunks
smaller, which can lessen compile time (and increase run time).
Although, there are other constraints that sometimes leave big C
functions around.

> Furthermore, I don't know how smart gcc is with that switch  
> statement. At every trampoline call, doesn't the switch need to be  
> evaluated? With so many possible entry points it can't be quick.  
> Finally, calling through a function pointer blows away all the branch  
> prediction of the CPU; the trampoline is an opaque wall the CPU can't  
> see beyond. Certainly if interchunk jumps are very rare, then I'll  
> agree this is all no big deal.

Agreed on all counts.
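
For readers who haven't seen the switch-and-trampoline scheme under
discussion, here is a minimal sketch in C.  It is not what the C
codegen actually emits; the labels, the `state` struct, and the
`owner` table are all made up for illustration.  The point is only
that a jump within a chunk is a plain goto, while a transfer between
chunks returns to the trampoline, which makes an indirect call and
re-enters a switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels: one per block that can be entered from outside. */
enum label { L_LOOP, L_BODY, L_DONE, L_HALT };

/* Hypothetical machine state shared by all chunks. */
struct state { long n; long acc; };

/* A chunk runs until control leaves it, then returns the next label. */
typedef enum label (*chunk_fn)(enum label entry, struct state *s);

/* Chunk 0 holds the whole loop. */
static enum label chunk0(enum label entry, struct state *s) {
    switch (entry) {                /* evaluated once per entry into the chunk */
    case L_LOOP: goto loop;
    case L_BODY: goto body;
    default: assert(0 && "label not in chunk0"); return L_HALT;
    }
loop:
    if (s->n == 0) return L_DONE;   /* inter-chunk: go back to the trampoline */
    goto body;                      /* intra-chunk: plain jump */
body:
    s->acc += s->n;
    s->n -= 1;
    goto loop;
}

/* Chunk 1 holds only the exit block. */
static enum label chunk1(enum label entry, struct state *s) {
    (void)s;
    assert(entry == L_DONE);
    return L_HALT;
}

/* Which chunk owns each label (L_HALT is never dispatched). */
static const chunk_fn owner[] = { chunk0, chunk0, chunk1 };

/* The trampoline: one indirect call per inter-chunk transfer. */
static long run(struct state *s, long *bounces) {
    enum label next = L_LOOP;
    *bounces = 0;
    while (next != L_HALT) {
        next = owner[next](next, s);
        ++*bounces;
    }
    return s->acc;
}
```

Summing 1..100 with this sketch goes through the trampoline only
twice, because all hundred loop iterations stay inside chunk0.  That
is the sense in which the cost is small when interchunk jumps are
rare.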

> However, I didn't believe it would take this to the extreme that
> Henry mentioned (all lambdas) since that would make a huge number of
> call-sites which would need to be chosen among (heavy branching,
> large code-size). It seemed to me that MLton would (obviously, but
> wrong) keep the function pointer in this case.

Nope.  We've never found the need to go back and revise that (extreme,
I agree) design choice.
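
To make the contrast with a function pointer concrete, here is a
rough sketch in C of the "choose among the lambdas at the call site"
style.  The three lambdas, the tag names, and `apply` are invented for
the example, and it says nothing about how MLton lays closures out
internally.  A closure carries a tag and its free variables but no
code pointer, and the call site branches on the tag to a direct call.

```c
#include <assert.h>

/* One tag per lambda in the (hypothetical) source program. */
enum lam_tag { LAM_ADD_K, LAM_MUL_K, LAM_NEGATE };

/* A closure is a tag plus the lambda's free variables; no code pointer. */
struct closure {
    enum lam_tag tag;
    int k;   /* free variable of "fn x => x + k" and "fn x => x * k";
                unused by "fn x => ~x" */
};

/* Each lambda body becomes an ordinary first-order function. */
static int add_k_code(int k, int x) { return x + k; }
static int mul_k_code(int k, int x) { return x * k; }
static int negate_code(int x) { return -x; }

/* Constructors for the three closures. */
static struct closure make_add_k(int k) {
    struct closure c = { LAM_ADD_K, k };
    return c;
}
static struct closure make_mul_k(int k) {
    struct closure c = { LAM_MUL_K, k };
    return c;
}
static struct closure make_negate(void) {
    struct closure c = { LAM_NEGATE, 0 };
    return c;
}

/* The call site: branch on the tag, then make a direct call. */
static int apply(const struct closure *f, int x) {
    switch (f->tag) {
    case LAM_ADD_K:  return add_k_code(f->k, x);
    case LAM_MUL_K:  return mul_k_code(f->k, x);
    case LAM_NEGATE: return negate_code(x);
    }
    assert(0 && "unknown lambda tag");
    return 0;
}
```

Every branch ends in a direct call that the C compiler can see and
inline, at the price of a branch per candidate lambda.  The switch
only needs to list the lambdas that can actually reach a given call
site, so in practice it can be much smaller than "all lambdas in the
program".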

> I'm still wondering about the MLton calling convention, in  
> particular, after Matthew said alloca() would be difficult. In the C  
> codegen, I would've thought that alloca() is easy. I am guessing that  
> the x86 codegen doesn't have a frame pointer or it should be easy  
> there too?
> 
> He did mention that the GC needs to walk the stack, but since you  
> have local variables there already, I presume there is a way to mark  
> some words as pointers.

Sure.

Two problems with allocating on the stack that I see are that the
runtime doesn't support variable sized stack frames or pointers into
stack frames.  The former seems pretty easy, but the latter seems hard
(and would have other costs).

A possibly even bigger problem is space safety.  How do you reclaim
the space used by a variable-sized object allocated on a stack frame
when the object is dead but the frame is live, and buried somewhere on
the stack?
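
A toy model in C of the space-safety problem, in case it helps.  This
is not the MLton runtime; `toy_stack` and its functions are
hypothetical, and the "stack" is just a bump counter.  It shows that
an alloca-style object in a frame stays reserved for as long as that
frame is live, even after the object itself is dead, because frames
pushed later sit on top of it.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 4096
#define MAX_FRAMES  64

/* Hypothetical bump-allocated stack: only the top can grow or shrink. */
struct toy_stack {
    size_t top;                     /* words currently in use */
    size_t frame_base[MAX_FRAMES];  /* where each live frame starts */
    size_t depth;                   /* number of live frames */
};

/* Enter a function: reserve its fixed-size part. */
static void push_frame(struct toy_stack *s, size_t fixed_words) {
    assert(s->depth < MAX_FRAMES);
    assert(s->top + fixed_words <= STACK_WORDS);
    s->frame_base[s->depth++] = s->top;
    s->top += fixed_words;
}

/* alloca-style allocation: extend the current (top) frame. */
static size_t alloc_in_top_frame(struct toy_stack *s, size_t words) {
    assert(s->depth > 0);
    assert(s->top + words <= STACK_WORDS);
    size_t at = s->top;
    s->top += words;
    return at;
}

/* Return from a function: release everything the frame held. */
static void pop_frame(struct toy_stack *s) {
    assert(s->depth > 0);
    s->top = s->frame_base[--s->depth];
}
```

If an outer frame of 4 words allocates a 1000-word object and then
calls a function with a 4-word frame, the stack holds 1008 words
during the call and 1004 after it returns, whether or not the object
is still used.  The 1000 words only come back when the outer frame
itself is popped.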