x86 performance

Matthew Fluet fluet@research.nj.nec.com
Wed, 9 Aug 2000 12:57:59 -0400 (EDT)

> The C compiler uses leal as cheap 3-address arithmetic while the x86
> version uses a move followed by an add constant.  Note, the C compiler
> way is 1 instruction, and it is only 3 bytes long.  The x86 version is
> 2 instructions and 5 bytes long.  Also C code is absolutely filled with
> loads and stores at short offsets from a register (either because the
> register is a pointer to a struct or the stack) so I am sure that this
> addressing mode is very fast.  This could be a big difference.

Turns out this is even trickier than one might first imagine.  It was very
easy to set up the limit check points to use leal instead of movl/addl
when the number of requested bytes is a non-zero constant.  (For a zero
constant, we just compare the frontier and the limit with no intermediate
calculation.
Which raises an interesting point -- in that non-allocating loop, there is
a check for 24 additional bytes at each entry; probably for the
continuation where I print out the result, but it looks like it got pushed
too far back into the loop.)  Now the tops of the loops look like:

x86-codegen:    spy-ed                               .s
0x804be60:      leal   0x18(%esp,1),%esi             leal (24*1)(%esp),%esi
0x804be64:      cmpl   0x8054288,%esi                cmpl (gcState+8),%esi

c-codegen:
0x804ccd4:      leal   0x18(%esi),%eax               leal 24(%esi),%eax
0x804ccd7:      cmpl   0x8053888,%eax                cmpl gcState+8,%eax
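
The instruction lengths implied by the addresses above (4 bytes for the
%esp-based leal, 3 bytes for the %esi-based one) can be checked by
hand-encoding the instructions.  What follows is only a rough sketch, not
anything taken from either codegen: the helper names (modrm, lea_disp8,
mov_add_imm8) are made up for illustration, it handles only 8-bit
displacements and immediates, and the registers chosen for the old
movl/addl sequence are an assumption.  It relies on the standard 32-bit
ModRM rule that r/m = 100 selects a SIB byte rather than %esp.

```python
# Register numbers used in the 32-bit x86 ModRM and SIB fields.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack three bit fields (2, 3 and 3 bits) into a ModRM or SIB byte."""
    return (mod << 6) | (reg << 3) | rm


def lea_disp8(disp, base, dst):
    """Encode `leal disp(%base),%dst` for a displacement that fits in 8 bits."""
    assert -128 <= disp <= 127
    out = [0x8D]                                      # LEA r32, m
    if base == "esp":
        # r/m = 100 does not name %esp in a memory operand; it announces
        # a SIB byte, which then carries %esp as the base.
        out.append(modrm(0b01, REG[dst], 0b100))
        out.append(modrm(0b00, 0b100, REG["esp"]))    # scale 1, no index
    else:
        out.append(modrm(0b01, REG[dst], REG[base]))  # mod = 01: disp8 follows
    out.append(disp & 0xFF)
    return bytes(out)


def mov_add_imm8(imm, src, dst):
    """Encode `movl %src,%dst` followed by `addl $imm,%dst` (8-bit immediate)."""
    assert -128 <= imm <= 127
    mov = [0x89, modrm(0b11, REG[src], REG[dst])]        # MOV r/m32, r32
    add = [0x83, modrm(0b11, 0, REG[dst]), imm & 0xFF]   # ADD r/m32, imm8
    return bytes(mov + add)


def show(label, code):
    print(f"{label:<36}{' '.join(f'{b:02x}' for b in code):<16}{len(code)} bytes")


show("leal 24(%esi),%eax", lea_disp8(24, "esi", "eax"))      # 8d 46 18
show("leal 24(%esp),%esi", lea_disp8(24, "esp", "esi"))      # 8d 74 24 18
show("movl %esp,%esi; addl $24,%esi", mov_add_imm8(24, "esp", "esi"))
```

Under these assumptions the %esp form still beats the two-instruction
sequence by a byte (4 versus 5), but it cannot match the 3-byte form that
any other base register gets.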

Notice that the x86-codegen leal is 4 bytes and the c-codegen leal is 3
bytes.  It appears (although I can't find this in the documentation for
either x86 addressing or the GNU assembler) that using %esp as the base
register automatically incurs a scale value and an additional byte of
instruction.  Sort of annoying,
particularly since it's nice to be able to use %esp as a general purpose