performance page

Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 9 Oct 2001 18:31:31 -0400 (EDT)


> I just did a quick comparison of fib in C (which always passes all arguments
> on the stack, but returns results in a register) with MLton, and MLton is
> still 20% slower than C (and this with overflow checking turned off).  In
> this case the extra overhead is two extra compares of registers (%ebp and
> %esp) to the values in memory locations.  One of these is for the heap.
> Note, there is no allocation in the loop.  Is the other some interrupt check?

O.k.  The comparisons I see are:

fib_0:
statementLimitCheckLoop_1:
	cmpl ((gcState+48)+(0*4)),%ebp
	jae doGC_5
checkFrontier_1:
	cmpl ((gcState+8)+(0*4)),%esp
	jnbe doGC_4
skipGC_1:

The first one is the stack limit check.  The second one is the heap limit
check.  We need the stack limit check; we (may) need the heap limit check
for interrupt/thread handling -- it's needed for the same reason, whatever
it was, that we decided the LimitCheck macro in ccodegen.h has to put the
invocation of GC_gc inside a loop:
do {
  // invoke GC_gc
} while (frontier + (b) > gcState.limit)

I think the issue is the following:

...
thread A checks for 100 bytes; fails and invokes GC_gc
GC_gc invoked; fiddles with heap; gets 200 free bytes; switches to thread B
thread B consumes 175 bytes;
thread B checks for 50 bytes; fails and invokes GC_gc
GC_gc invoked; fiddles with heap; gets 75 free bytes; switches to thread A
thread A continues, thinking it has 100 bytes, when there are really only 75
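
To make the trace concrete, here is a toy C model of it.  This is only a
sketch and not MLton code: free_bytes, toy_gc, check_once and check_loop are
hypothetical names, the "heap" is a single counter, and the thread switch is
scripted inside the stand-in collector rather than performed by a scheduler.
It also assumes that every collection after the first finds 200 free bytes.

```c
#include <assert.h>

/* Toy heap: only the number of free bytes matters.  All names here are
   hypothetical; none of this is MLton's runtime. */
static int free_bytes;
static int gc_calls;   /* GC invocations made by thread A */

static void reset_heap(void) { free_bytes = 0; gc_calls = 0; }

/* Stand-in for GC_gc as seen from thread A.  The first call replays the
   trace above; later calls are assumed to return with 200 bytes free. */
static void toy_gc(void) {
    gc_calls++;
    if (gc_calls == 1) {
        free_bytes = 200;   /* GC gets 200 free bytes, switches to thread B */
        free_bytes -= 175;  /* thread B consumes 175 bytes */
        free_bytes = 75;    /* B's check for 50 fails; its GC gets 75 free
                               bytes and switches back to thread A */
    } else {
        free_bytes = 200;
    }
}

/* Single check: assumes one GC always leaves enough room.
   Returns 1 if `need` bytes really are available afterwards. */
static int check_once(int need) {
    assert(need > 0);
    if (free_bytes < need)
        toy_gc();
    return free_bytes >= need;
}

/* Looping check, same shape as the do/while above: test again after
   every return from the collector. */
static int check_loop(int need) {
    assert(need > 0);
    if (free_bytes < need) {
        do {
            toy_gc();
        } while (free_bytes < need);
    }
    return free_bytes >= need;
}
```

Starting from an empty heap (reset_heap), check_once(100) returns 0 with only
75 bytes free, which is the bad outcome in the trace; check_loop(100) calls
the collector a second time and returns 1.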

This example requires threads.  There is something in insert-limit checks
that avoids putting limit checks in loops when there aren't threads.
There isn't anything in either codegen that produces different types of
limit checks depending on whether or not there are threads.

Anyways, looking at limit-checks should probably go on the "SSA todo"
list; I think that insert limit-checks will sometimes hoist limit checks
into non-allocating loops, which is bad, given the comparisons above.  It
may also make sense to make limit-check types more fine-grained -- i.e.,
decide when we really need the checkFrontier loop, as above.
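
One possible shape for "more fine-grained", sketched in C.  Everything here
is hypothetical: heap_check_kind and choose_heap_check are made-up names, and
the rule assumes that without threads nothing else can allocate between the
collector returning and the caller resuming.  It stays conservative whenever
threads are present, since the check may also be needed for interrupt/thread
handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification; these names do not exist in MLton. */
typedef enum {
    HEAP_CHECK_NONE,  /* emit no frontier comparison at all */
    HEAP_CHECK_ONCE,  /* compare, call the collector at most once */
    HEAP_CHECK_LOOP   /* compare again after every return from the collector */
} heap_check_kind;

/* Pick the cheapest check that is still safe under the assumption stated
   in the text above. */
static heap_check_kind choose_heap_check(bool uses_threads, int bytes_needed) {
    assert(bytes_needed >= 0);
    if (uses_threads)
        return HEAP_CHECK_LOOP;   /* a GC may switch threads: recheck */
    if (bytes_needed == 0)
        return HEAP_CHECK_NONE;   /* non-allocating code, no threads */
    return HEAP_CHECK_ONCE;
}
```

Under this rule a non-allocating loop like fib in a program without threads
would get HEAP_CHECK_NONE, leaving only the stack limit comparison.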