x86 performance

Matthew Fluet fluet@research.nj.nec.com
Mon, 7 Aug 2000 16:54:53 -0400 (EDT)

Here's a comparison of the C-codegen and the x86-codegen that has me
a little puzzled about where I'm losing performance.  It's a very
simple, non-allocating loop, so I would have thought the two codegens
would be very similar.

The input is the standard even-odd recursion:
fun even 0 = true
  | even n = odd  (n - 1)
and odd  1 = true
  | odd  n = even (n - 1)
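
For readers who want to see the shape of the loop without the SML, here is a
rough C sketch of the same recursion written as a single loop. It is not the
output of either codegen; the name even_loop and the trips counter are
hypothetical, added only for illustration. Each trip through the loop performs
one even step and one odd step, which is consistent with the two decrements in
the listings below.

```c
#include <assert.h>
#include <stdbool.h>

/* Loop form of the mutually recursive even/odd pair.
 * Each trip through the loop performs one `even` step and one `odd` step.
 * `trips` is a hypothetical counter added for illustration only. */
bool even_loop(int n, unsigned long *trips)
{
    /* Precondition (assumption of this sketch): n is non-negative and even.
     * For any other input the definitions never reach a base case and
     * would keep decrementing until integer overflow. */
    assert(n >= 0 && n % 2 == 0);

    for (;;) {
        if (n == 0)        /* even 0 = true */
            return true;
        n = n - 1;         /* even n = odd (n - 1) */
        if (n == 1)        /* odd 1 = true */
            return true;
        n = n - 1;         /* odd n = even (n - 1) */
        ++*trips;          /* one full even/odd trip completed */
    }
}
```

Calling even_loop(10, &t) with t initialised to 0 returns true and leaves t at
4: four full trips, then the odd 1 base case is hit on the fifth.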

Letting that loop run for 750 million iterations, I got the following results:

          C-codegen                            x86-codegen
          time: 16.99                          time: 22.65

And here's what the spy program showed for the loop.  I've filled in
reasonable labels and marked with * the conditional jumps which are
not taken.

          gcState.frontier->%esi               gcState.frontier->%esp
          gcState.stackTop->%ebx               gcState.stackTop->%ebp

even:     leal   0x18(%esi),%eax               movl   %esp,%esi          1
                                               addl   $0x18,%esi
          cmpl   0x8053888,%eax                cmpl   0x8054288,%esi
          jbe    0x804cd47                     jle    0x804bef8
skip_GC:  cmpl   $0x0,0x18(%ebx)               cmp    $0x0,0x18(%ebp)
          jne    0x804cd94                     *je    0x804bf28
                                               jmp    0x804bf00
even_n:   movl   0x18(%ebx),%ebp               movl   0x18(%ebp),%esi
          decl   %ebp                          subl   $0x1,%esi
          cmpl   $0x1,%ebp                     cmpl   $0x1,%esi
                                               movl   %esi,0x80541c4     2
          *je    0x804cd50                     *je    0x804bf28
                                               jmp    0x804bf14
odd_n:                                         movl   0x80541c4,%esi     3
          decl   %ebp                          subl   $0x1,%esi              
          movl   %ebp,0x18(%ebx)               movl   %esi,%edi          4
                                               movl   %edi,0x18(%ebp)
          jmp    0x804ccd4                     jmp    0x804be9c

1. calculate gcState.frontier + 24
2. %esi -> RI(1)
3. RI(1) -> %esi
4. SI(24) = RI(3)

Does the time difference between the programs seem reasonable?  The
essential differences seem to be the two unconditional jumps and the
save and restore of RI(1).
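
To put the question in per-iteration terms, here is a small C sketch of the
arithmetic. It assumes the reported times are in seconds and that "750 million
iterations" counts trips through the loop shown above; neither is stated
explicitly. The function names are hypothetical.

```c
/* Convert a total run time into nanoseconds per iteration.
 * Assumption: `seconds` is the reported time, measured in seconds. */
double ns_per_iteration(double seconds, double iterations)
{
    return seconds / iterations * 1e9;
}

/* Ratio of the slower run time to the faster one. */
double slowdown(double slow, double fast)
{
    return slow / fast;
}
```

With the figures above, ns_per_iteration(16.99, 750e6) is about 22.7 and
ns_per_iteration(22.65, 750e6) is about 30.2, so the x86-codegen costs roughly
7.5 ns more per iteration, a slowdown of about 1.33x. For comparison, the
listing shows 12 instructions in the C-codegen column and 18 in the
x86-codegen column, assuming every listed instruction executes on each trip.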