x86 update -- raytrace benchmark

Tue, 17 Oct 2000 18:12:16 -0400 (EDT)

> Wow! I'm impressed with these numbers -- significantly
> better than the benchmarks we tried over the summer.
> Is it essentially all due to register allocation and
> better floating-point management?

I think register allocation contributes a lot to the improved performance,
although overall I've seen only incremental improvement on the integer
benchmarks since the end of the summer.  As I said in my previous message,
copy propagation seems promising, but it still needs some fine-tuning:
I've seen it both add and subtract 0.5s from a benchmark's running time.
Picking up on one of Henry's previous posts, I know that the native backend
would benefit from delaying tuple deconstruction until control is in the
branch of a conditional that needs the components.
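
To make the tuple-deconstruction point concrete, here is a minimal sketch
in Python.  It is not the backend's code: the tiny IR (Select, Branch) and
the function sink_selects are hypothetical names invented for the example.
It assumes the selects are independent of each other and that "uses"
covers every later read of a variable.  A select is pushed into a branch
only when exactly one branch reads its result and the condition itself
does not.

```python
from dataclasses import dataclass, field


@dataclass
class Select:
    """dst = component number `index` of the tuple held in `tuple_var`."""
    dst: str
    tuple_var: str
    index: int


@dataclass
class Branch:
    name: str
    uses: set                                   # variables read in this branch
    prologue: list = field(default_factory=list)  # selects sunk into the branch


def sink_selects(selects, cond_uses, branches):
    """Move each select into the single branch that reads its result.

    Returns the selects that have to stay ahead of the conditional.
    """
    kept = []
    for sel in selects:
        users = [b for b in branches if sel.dst in b.uses]
        if sel.dst in cond_uses or len(users) != 1:
            # Needed by the test itself, by several branches, or by none:
            # leave it where it is (always safe).
            kept.append(sel)
        else:
            users[0].prologue.append(sel)
    return kept


# Hypothetical example: (x, y, z) = t; if x then ...y... else ...z...
then_b = Branch("then", {"y"})
else_b = Branch("else", {"z"})
selects = [Select("x", "t", 0), Select("y", "t", 1), Select("z", "t", 2)]
kept = sink_selects(selects, cond_uses={"x"}, branches=[then_b, else_b])
```

Here only the select for x stays before the test; each branch pays for
just the one component it reads.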

On the other hand, much of the floating-point improvement comes from the
improved liveness information in the register allocator.  It lets me
replace a register-register move by simply associating the register with a
new memory location, and it lets me use the auto-popping instructions.
For example, on mandelbrot, before this version of the register allocator
I was seeing x86/C ratios of 2.2, and now I've got it at 0.97.  Also, I'm
a little more careful about floating-point stack
management.  At appropriate points in register allocation, I commit and/or
remove registers; that is, I ensure that the value in the register is
written to memory and/or disassociate the register from its memory
location.  For floating point, there are also trycommit and tryremove,
which are triggered only if the floating-point register is already at the
top of the stack; hence, we save a lot of extraneous xchgs.
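
The floating-point bookkeeping described above can be modelled with a toy
stack.  This is only a sketch in Python, not the allocator itself: the
class FloatStack and its methods are hypothetical, the emitted strings are
illustrative pseudo-assembly, and it assumes that removing a dirty
register also writes it back (commit and remove fused into one popping
store).  It shows a move turning into a rename when liveness says the
source is dead, and the try* variants declining to act unless the register
is already at the top.

```python
class FloatStack:
    """Toy model of the x87 register stack; index 0 is the top, st(0)."""

    def __init__(self):
        self.slots = []    # each slot is [memory_location, dirty]
        self.emitted = []  # pseudo-assembly emitted so far

    def _index(self, loc):
        for i, (slot_loc, _) in enumerate(self.slots):
            if slot_loc == loc:
                return i
        raise KeyError(loc)

    def _to_top(self, loc):
        i = self._index(loc)
        if i != 0:
            # Costs an exchange to bring the register to st(0).
            self.slots[0], self.slots[i] = self.slots[i], self.slots[0]
            self.emitted.append(f"fxch st({i})")

    def load(self, loc):
        # The pushed value matches memory, so the slot starts out clean.
        self.slots.insert(0, [loc, False])
        self.emitted.append(f"fld {loc}")

    def move(self, src, dst, src_dead):
        i = self._index(src)
        if src_dead:
            # src is never read again: re-associate the register, emit nothing.
            self.slots[i] = [dst, True]
        else:
            # Otherwise push a copy of st(i) to stand for dst.
            self.slots.insert(0, [dst, True])
            self.emitted.append(f"fld st({i})")

    def commit(self, loc):
        """Make sure memory holds the register's value."""
        self._to_top(loc)
        if self.slots[0][1]:
            self.emitted.append(f"fst {loc}")
            self.slots[0][1] = False

    def remove(self, loc):
        """Disassociate the register, writing it back first if dirty."""
        self._to_top(loc)
        _, dirty = self.slots.pop(0)
        # The popping store does both jobs; a clean value is just popped.
        self.emitted.append(f"fstp {loc}" if dirty else "fstp st(0)")

    def trycommit(self, loc):
        if self.slots and self.slots[0][0] == loc:
            self.commit(loc)
            return True
        return False  # not at the top: skip rather than pay for an exchange

    def tryremove(self, loc):
        if self.slots and self.slots[0][0] == loc:
            self.remove(loc)
            return True
        return False


fs = FloatStack()
fs.load("x")
fs.load("y")                       # stack is now: y (top), x
fs.move("y", "z", src_dead=True)   # rename only, no instruction
fs.tryremove("x")                  # x is not at the top, so nothing happens
fs.remove("z")                     # popping store to z
fs.tryremove("x")                  # x is at the top now, so it is popped
```

In this run no exchange is ever emitted; forcing remove("x") while z was
still on top would have cost one.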