[MLton-user] Some numbers on SML/NJ & MLton runs and also a few questions

Matthew Fluet fluet@cs.cornell.edu
Tue, 7 Oct 2003 08:09:30 -0400 (EDT)


> The MLton runs are compiled with
>     mlton -static true -verbose 2 -detect-overflow false -safe false  \
>       -inline 100000 weeks.cm
> (But I don't think any of these fancy options really buy much.)

The overflow and safe flags buy you a little, but probably not that much.
The inline flag sets the threshold over 300X the default; that is probably
trading a bit of space for speed.  I can't imagine SML/NJ's cross-module
inliner moving anything of that size.

> I am wondering how good is MLton's heap allocation? SML/NJ can
> alloc&initialise an n-word block in about n+2 insns. How about
> MLton?

Ideally, n+3 instructions:
  *(frontier) = header
  dst = frontier + objectHeaderSize
  dst[0] = word0
  ...
  dst[n] = wordn
  frontier += size
The frontier is almost always in a register (but the codegen is free to
spill it).

> I guess the issue is that SML/NJ's implementation of the "stack" algo
> doesn't actually use a stack -- it allocates the proc frames in the heap.
> Total heap allocation is *greater* than if we iteratively allocated
> temporary conses (as each proc frame is probably bigger than a 2-word
> cons cell), *plus* you have to load target addresses in from memory when
> you return up through the "stack."
>
> I think I sort of understand why the iterative algo does better in SML/NJ.
> But I *don't* understand how MLton makes the recursive algo work so well.

Stack frames are pretty cheap in MLton.  The stack as a whole is heap
allocated, but individual frames are pushed and popped by adjusting the
stack top pointer (with stack limit checks on pushes).  But I don't know
if that explains why the recursive algo works so much better than the
iterative algorithm.

-Matthew