self-compile

Thu, 17 Aug 2000 17:02:13 -0400 (EDT)

> In other news, with some other optimizations from yesterday, the
> x86-codegen is winning in all benchmarks except checksum.  (Yes, this
> includes life and wc.)  I'll send out some hard numbers later today -- I'm
> trying to track down why I lost some performance on fib and tak, which
> were previously the best improvements.  My guess is that it is an artifact
> of only committing live pseudo-regs down their respective branches.  I
> lift out all pseudo-regs that are live down both, and then make the fall
> through case be the branch with the most remaining live pseudo-regs.  This
> might reverse some branches and screw up the branch prediction.

I tracked down the source of that performance drop.  Nothing to do with
the saving of live pseudo-regs.  It had to do with the translation of the
MachineOutput.Move statement.  I've gone back and forth on whether or not
that move should force the destination to a register or to an address at
register allocation.  For a while I thought register, but then I noticed a
lot of SX()'s as the destinations, so I switched it to address.  Yesterday
I noticed a lot of RX()'s as the destinations so I switched it back to
register.  Turns out the best solution is to make the decision based on the
type of operand -- register for RX()'s and address for everything else.
That regained the time on fib and tak, and they also benefited from the
new peephole optimizations.
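
As a rough illustration of that rule (not the actual codegen, which is
not shown here), the choice can be sketched in Python.  The function
name and the string form of the operands are hypothetical; only the
RX()/SX() operand kinds come from the discussion above.

```python
def move_destination_hint(dst: str) -> str:
    """Pick how a Move's destination should be treated at register
    allocation: 'register' for RX() operands, 'address' for all others.

    Assumption: operands are written as strings such as "RX(3)" or
    "SX(2)"; the real representation is not shown in this message.
    """
    return "register" if dst.startswith("RX(") else "address"


if __name__ == "__main__":
    for operand in ("RX(3)", "SX(2)", "SI(4)"):
        print(operand, "->", move_destination_hint(operand))
```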

The peephole optimization that I think got the big win was the following:

RI(0) = Int_add(SI(4), X)
SI(4) = RI(0)

 |
 | translates to
 V

movl SI(4), RI(0)
addl X, RI(0)
movl RI(0), SI(4)

 |
 | the new peephole optimization (also works similarly on unary and
 |  shift/rotate instructions); not dependent on the equality of the first 
 |  movl's source and the second movl's destination, but this is a common
 |  case.
 V

movl SI(4), SI(4)
addl X, SI(4)

 |
 | the self-move elimination optimization (now more important than it was
 |  before)
 V

addl X, SI(4)

(And I'll switch addl to incl if X is 1)

This really helps loop index variables and probably also tail-recursive
functions.
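
For anyone who wants to play with the rewrite chain above, here is a
minimal Python sketch of the three steps.  It is a toy model, not the
x86-codegen itself: the tuple encoding of instructions, the function
names, the "$1" spelling of the constant, and the dead_temps set are all
assumptions made for illustration.  It also assumes the temporary is
dead after the sequence and that nothing depends on the carry flag when
addl becomes incl (incl leaves the carry flag untouched).

```python
# Instructions are tuples in AT&T order: (op, src, dst) or ("incl", dst).
BINARY_OPS = {"addl", "subl"}  # illustrative subset only


def fold_through_temp(code, dead_temps):
    """Rewrite  movl A,T ; op X,T ; movl T,B  as  movl A,B ; op X,B."""
    out, i = [], 0
    while i < len(code):
        window = code[i:i + 3]
        if len(window) == 3:
            a, b, c = window
            if a[0] == "movl" and b[0] in BINARY_OPS and c[0] == "movl":
                temp = a[2]
                safe = (
                    b[2] == temp and c[1] == temp
                    and temp in dead_temps      # assumed: T unused afterwards
                    and b[1] != temp            # X must not read T
                    # X must not be clobbered by the early write to B
                    and (b[1] != c[2] or a[1] == c[2])
                )
                if safe:
                    out.append(("movl", a[1], c[2]))
                    out.append((b[0], b[1], c[2]))
                    i += 3
                    continue
        out.append(code[i])
        i += 1
    return out


def drop_self_moves(code):
    """Remove movl X,X instructions."""
    return [ins for ins in code
            if not (ins[0] == "movl" and ins[1] == ins[2])]


def add_one_to_inc(code):
    """Replace addl $1,D with incl D (assumes the carry flag is unused)."""
    return [("incl", ins[2]) if ins[0] == "addl" and ins[1] == "$1" else ins
            for ins in code]


def peephole(code, dead_temps):
    return add_one_to_inc(drop_self_moves(fold_through_temp(code, dead_temps)))


if __name__ == "__main__":
    seq = [("movl", "SI(4)", "RI(0)"),
           ("addl", "X", "RI(0)"),
           ("movl", "RI(0)", "SI(4)")]
    print(peephole(seq, {"RI(0)"}))
```

Note that this ignores real x86 operand constraints (for example, no
memory-to-memory forms); it only models the shape of the rewrite.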