x86 update

Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 17 Oct 2000 10:13:50 -0400 (EDT)


> I think it would be a good idea.  It would be nice to
> brainstorm a bit about what the focus should be, but I
> think the effort would be well worth it.
> 
>    -- Suresh

I agree that picking a particular focus would be necessary.  One thing
that I've been thinking about recently is the degree to which we can
exploit more precise mayAlias information.  Another is how much register
allocation matters.  For example, look at how many lines of the output
mention each register:

[fluet@lennon tests]$ grep "%eax" test_s0001.s | wc -l
    586
[fluet@lennon tests]$ grep "%ebx" test_s0001.s | wc -l
    310
[fluet@lennon tests]$ grep "%ecx" test_s0001.s | wc -l
    116
[fluet@lennon tests]$ grep "%edx" test_s0001.s | wc -l
     48
[fluet@lennon tests]$ grep "%edi" test_s0001.s | wc -l
    742
[fluet@lennon tests]$ grep "%esi" test_s0001.s | wc -l
    305
[fluet@lennon tests]$ grep "%ebp" test_s0001.s | wc -l
     20
[fluet@lennon tests]$ grep "%esp" test_s0001.s | wc -l
     89
[fluet@lennon tests]$ grep "local" test_s0001.s | wc -l
     56
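
The counts above come from one grep per register.  A small Python
equivalent for rerunning them on another .s file is sketched below; the
script name and command-line usage are hypothetical.  Like
`grep PATTERN FILE | wc -l`, it counts lines that mention a register, not
individual occurrences, so "movl %eax,%eax" counts once.

```python
import sys

# Patterns counted in the transcripts above; "local" marks spill slots.
PATTERNS = ["%eax", "%ebx", "%ecx", "%edx", "%edi", "%esi", "%ebp", "%esp", "local"]


def count_lines(lines, pattern):
    """Number of lines containing `pattern`, like `grep PATTERN FILE | wc -l`.

    Counts lines, not occurrences: "movl %eax,%eax" contributes 1.
    """
    return sum(1 for line in lines if pattern in line)


def register_histogram(lines):
    """Map each pattern in PATTERNS to the number of lines that mention it."""
    return {pattern: count_lines(lines, pattern) for pattern in PATTERNS}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_regs.py test_s0001.s
    with open(sys.argv[1]) as f:
        lines = f.readlines()
    for pattern, n in register_histogram(lines).items():
        print(f"{pattern:6} {n:7}")
```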

The test program was just a smart-fib function and a print of fib(20).
Register %edi caches gcState.stackTop and %esi caches gcState.frontier, so
it isn't surprising that they have a high number of uses.  Register %esp
would probably have fewer than 20 uses if it weren't always used for FFI
calls.  The remaining registers are ordered by use as follows:

%eax > %ebx > %ecx > %edx > %ebp ~> %esp

To an extent, this is probably due to the nature of my register allocator.
In particular, the list of registers is ordered [%eax,%ebx,...,%ebp,%esp],
so the allocator picks %eax whenever it is available.  Because I track the
last uses and defs, I can mark a register available as soon as its value
is dead, so %eax is often reused.
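
A minimal sketch of that behavior: a first-fit allocator over straight-line
code that always takes the most-preferred free register and releases a
register at its pseudo-register's last use.  The four-register subset, the
(defs, uses) encoding of instructions, and the assumption that each
pseudo-register is defined once before it is used are all illustrative,
not taken from the actual allocator.

```python
# Hypothetical preference order: an illustrative subset, not MLton's full list.
REGISTERS = ["%eax", "%ebx", "%ecx", "%edx"]


def last_uses(block):
    """Map each pseudo-register to the index of the last instruction using it."""
    last = {}
    for i, (_defs, uses) in enumerate(block):
        for p in uses:
            last[p] = i
    return last


def allocate(block):
    """First-fit register allocation over one straight-line block.

    `block` is a list of (defs, uses) pairs of pseudo-register names; each
    pseudo-register is assumed to be defined once, before any use.  Returns a
    dict from each defined pseudo-register to a register, or to "local" when
    no register is free (a spill).
    """
    last = last_uses(block)
    free = list(REGISTERS)  # kept sorted in preference order
    where = {}
    for i, (defs, uses) in enumerate(block):
        # Release registers whose pseudo-register dies at this instruction,
        # so this instruction's defs can reuse them right away.
        for p in set(uses):
            if last[p] == i and where.get(p) in REGISTERS:
                free.append(where[p])
        free.sort(key=REGISTERS.index)
        for p in defs:
            # Always take the most-preferred free register.
            where[p] = free.pop(0) if free else "local"
            if p not in last and where[p] != "local":
                free.append(where[p])  # never used: release immediately
                free.sort(key=REGISTERS.index)
    return where
```

On a chain like t1 = ...; t2 = f(t1); t3 = f(t2), every temporary lands in
%eax, because each one dies exactly where the next is defined; registers
further down the list are touched only when several values are live at once.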

The same distribution appears in larger benchmarks, for example the
raytracer:

[fluet@lennon tests]$ grep "%eax" raytrace_s0001.s | wc -l
  11320
[fluet@lennon tests]$ grep "%ebx" raytrace_s0001.s | wc -l
   5554
[fluet@lennon tests]$ grep "%ecx" raytrace_s0001.s | wc -l
   1516
[fluet@lennon tests]$ grep "%edx" raytrace_s0001.s | wc -l
    436
[fluet@lennon tests]$ grep "%edi" raytrace_s0001.s | wc -l
  11512
[fluet@lennon tests]$ grep "%esi" raytrace_s0001.s | wc -l
  10367
[fluet@lennon tests]$ grep "%ebp" raytrace_s0001.s | wc -l
    178
[fluet@lennon tests]$ grep "%esp" raytrace_s0001.s | wc -l
   1008
[fluet@lennon tests]$ grep "local" raytrace_s0001.s | wc -l
   1937

I don't know if any big conclusions can be drawn, although I'm tempted to
remark that for all of our complaining about the sparsity of registers on
the x86, register pressure may not be that big an issue (at least for
MLton-generated code).  If there were a lot of register pressure, I would
expect a more even distribution of uses among the registers.  The count
of "local" approximates the number of times I needed to spill a
pseudo-register.  It is an overapproximation, though, because I don't
carry pseudo-regs in registers across jumps, which gcc will try to do.

For comparison, here are the gcc distributions on the same two programs:

[fluet@lennon tests]$ grep "%eax" test.s | wc -l
    659
[fluet@lennon tests]$ grep "%ebx" test.s | wc -l
    783
[fluet@lennon tests]$ grep "%ecx" test.s | wc -l
     92
[fluet@lennon tests]$ grep "%edx" test.s | wc -l
    217
[fluet@lennon tests]$ grep "%edi" test.s | wc -l
     60
[fluet@lennon tests]$ grep "%esi" test.s | wc -l
    282
[fluet@lennon tests]$ grep "%ebp" test.s | wc -l
     23
[fluet@lennon tests]$ grep "%esp" test.s | wc -l
    120

[fluet@lennon tests]$ grep "%eax" raytrace.s | wc -l
   9152
[fluet@lennon tests]$ grep "%ebx" raytrace.s | wc -l
   6117
[fluet@lennon tests]$ grep "%ecx" raytrace.s | wc -l
   1614
[fluet@lennon tests]$ grep "%edx" raytrace.s | wc -l
   4363
[fluet@lennon tests]$ grep "%edi" raytrace.s | wc -l
   2629
[fluet@lennon tests]$ grep "%esi" raytrace.s | wc -l
   6883
[fluet@lennon tests]$ grep "%ebp" raytrace.s | wc -l
    860
[fluet@lennon tests]$ grep "%esp" raytrace.s | wc -l
   2454

The gcc distribution is a little more even, although gcc uses different
registers for different roles.
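
One way to put a number on "more even", not something measured in this
message, is the normalized Shannon entropy of each register's share of the
counts: 1.0 when all registers are used equally, lower when one dominates.
The sketch below applies it to the raytrace counts above, restricted to
%eax, %ebx, %ecx, %edx, and %ebp so that MLton's dedicated %edi/%esi and
the FFI-heavy %esp are left out; that choice of registers is an assumption
of the sketch.

```python
import math

# Line counts for raytrace, copied from the grep transcripts above.
# %edi/%esi (dedicated in MLton's x86 output) and %esp are left out,
# so both code generators are compared on the same five registers.
MLTON_RAYTRACE = {"%eax": 11320, "%ebx": 5554, "%ecx": 1516, "%edx": 436, "%ebp": 178}
GCC_RAYTRACE = {"%eax": 9152, "%ebx": 6117, "%ecx": 1614, "%edx": 4363, "%ebp": 860}


def evenness(counts):
    """Shannon entropy of the usage shares divided by its maximum, log(n).

    1.0 means every register is used equally; values near 0 mean one
    register dominates.
    """
    if len(counts) < 2:
        return 1.0  # a single register is trivially "even"
    total = sum(counts.values())
    shares = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(counts))


if __name__ == "__main__":
    print(f"MLton x86: {evenness(MLTON_RAYTRACE):.2f}")
    print(f"gcc:       {evenness(GCC_RAYTRACE):.2f}")
```

On these numbers the gcc score comes out higher, matching the impression
above, though a single score hides the role differences between the two
code generators.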

These are just some thoughts; they were primarily prompted by my
considering load hoisting after the register-allocation phase.  However,
when I looked at some places where I thought it would help, I discovered
that it wouldn't really be possible without some analysis to rename
registers, because %eax was used too often to simply move a
movl (address),reg
up a few instructions.
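
The obstacle can be made concrete with a small sketch of the legality
check a post-allocation load hoister might use.  Everything here is
hypothetical: the Instr summary, the example block, and the stack_vs_heap
oracle standing in for more precise mayAlias information.  A load can move
up only if no instruction it skips touches its destination register,
changes its address registers, or may store to the location it reads.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical, simplified summary of one x86 instruction."""
    text: str
    reads: frozenset = frozenset()   # registers read (including address registers)
    writes: frozenset = frozenset()  # registers written
    loads: frozenset = frozenset()   # memory operands read
    stores: frozenset = frozenset()  # memory operands written


def conservative(a, b):
    """Without alias information, any two memory operands may overlap."""
    return True


def can_hoist(block, j, i, may_alias=conservative):
    """Can the load block[j] (a `movl (address),reg`) move to just before block[i]?"""
    load = block[j]
    for other in block[i:j]:
        if other.reads & load.writes or other.writes & load.writes:
            return False  # destination register is in use in between
        if other.writes & load.reads:
            return False  # an address register changes in between
        if any(may_alias(s, a) for s in other.stores for a in load.loads):
            return False  # an intervening store might change the loaded value
    return True


def example_block(dest):
    """Illustrative block whose last instruction loads into `dest`."""
    return [
        Instr("movl 4(%esi),%eax", reads={"%esi"}, writes={"%eax"}, loads={"4(%esi)"}),
        Instr("addl $1,%eax", reads={"%eax"}, writes={"%eax"}),
        Instr("movl %eax,(%edi)", reads={"%eax", "%edi"}, stores={"(%edi)"}),
        Instr(f"movl 8(%esi),{dest}", reads={"%esi"}, writes={dest}, loads={"8(%esi)"}),
    ]


def stack_vs_heap(a, b):
    """Hypothetical mayAlias oracle: %edi-based and other operands never overlap."""
    return ("%edi" in a) == ("%edi" in b)
```

With %eax as the destination, the addl in between blocks the move no
matter what alias information is available.  Renaming the destination to
%ecx removes that conflict, and then only the alias question about the
store to (%edi) remains, which is where register renaming and more precise
mayAlias information would both come in.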