x86 Update

Matthew Fluet fluet@CS.Cornell.EDU
Fri, 18 Aug 2000 17:08:24 -0400 (EDT)


Here are the latest performance numbers for x86-G1 MLton.  This uses
the may-alias information that I outlined earlier.

benchmark                        compile-time        
                               C      x86  x86/C   
checksum                    3.19     3.14   0.98
count-graphs                9.68     7.28   0.75
fib                         2.84     2.88   1.01
knuth-bendix               14.36     9.68   0.67
life                        7.19     5.79   0.81
logic                      43.69    24.62   0.56
mlyacc                    402.72   323.45   0.80
mpuz                        4.48     3.89   0.87
ratio-regions              16.73    12.31   0.74
smith-normal-form         220.72    88.72   0.40
tak                         2.89     2.88   1.00
wc                         7.29      6.01   0.82

benchmark                        executable-size
                               C      x86  x86/C   
checksum                   33319    32791   0.98      
count-graphs               54343    52207   0.96
fib                        33223    32559   0.98
knuth-bendix               82807    75623   0.91
life                       50887    47823   0.94
logic                     175935   181559   1.03
mlyacc                    627295   574463   0.92
mpuz                       38695    37719   0.97
ratio-regions              63511    67439   1.06
smith-normal-form         168974   161422   0.96
tak                        33247    32655   0.98
wc                         49543    47831   0.97

benchmark                        run-time
                               C      x86  x86/C
checksum                   11.63    12.40   1.07
count-graphs               18.95    18.99   1.00
fib                        22.24    16.50   0.74
knuth-bendix               37.50    33.87   0.90
life                      103.32    96.83   0.94
logic                      92.15    70.00   0.76
mlyacc                     41.18    30.46   0.74
mpuz                       76.54    74.63   0.98
ratio-regions              41.70    31.82   0.76
smith-normal-form           4.03     4.00   0.99
tak                        48.45    40.02   0.83
wc                         24.85    22.21   0.89
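
For reference, the x86/C column is just the x86 figure divided by the C
figure, rounded to two decimals, so values below 1.00 favor the x86
backend.  A minimal sketch that recomputes the run-time column from the
rows above (the names RUN_TIME, ratio, and mismatches are made up for
this illustration):

```python
# Run-time rows copied from the table above:
# (benchmark, C, x86, printed x86/C ratio).
RUN_TIME = [
    ("checksum",          11.63,  12.40, 1.07),
    ("count-graphs",      18.95,  18.99, 1.00),
    ("fib",               22.24,  16.50, 0.74),
    ("knuth-bendix",      37.50,  33.87, 0.90),
    ("life",             103.32,  96.83, 0.94),
    ("logic",             92.15,  70.00, 0.76),
    ("mlyacc",            41.18,  30.46, 0.74),
    ("mpuz",              76.54,  74.63, 0.98),
    ("ratio-regions",     41.70,  31.82, 0.76),
    ("smith-normal-form",  4.03,   4.00, 0.99),
    ("tak",               48.45,  40.02, 0.83),
    ("wc",                24.85,  22.21, 0.89),
]


def ratio(c_value, x86_value):
    """x86/C ratio rounded to two decimals, as printed in the tables."""
    return round(x86_value / c_value, 2)


def mismatches(rows):
    """Benchmarks whose printed ratio differs from the recomputed one."""
    return [name for name, c, x86, printed in rows if ratio(c, x86) != printed]


if __name__ == "__main__":
    for name, c, x86, _printed in RUN_TIME:
        print(f"{name:20s} {ratio(c, x86):.2f}")
    print("mismatches:", mismatches(RUN_TIME))
```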


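All in all, I think this is a successful result: on run time, the x86
backend is within 7% of the C backend on every benchmark and takes as
much as 26% less time (fib, mlyacc).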

Here's what's on my short-term/mid-term todo list:
1. Floating-point support
2. Full front-end support for the x86 codegen.
   Steve and I spoke a little bit about this.  It might make sense for
   each back end to return either a list of object files or a list of
   source files to the front end, which would then either link the
   object files directly or compile the sources and then link.  There
   are a couple of design decisions to be made about how the -S, -C,
   and -c options interact with the x86 back end, how multiple
   assembly files should be handled, etc.
3. Inline frameSize and frameLayout pointers in the code segment.
   This follows a suggestion from Suresh: since we have the return
   address for the frame we're interested in, we can place the
   relevant size and pointer at negative offsets from the return
   address.  This gives constant-time lookup of these values, rather
   than going through the hash table I'm currently using.  The hooks
   are there in the simplifier to add pre-label assembly, although it
   would take some minor changes to the GC.  It probably wouldn't be a
   win on the benchmarks, but I imagine that it might pay off for a
   self-compile, where GCs occur often.
4. Use the liveness information to keep pseudo-regs in registers
   across block boundaries.
5. Investigate jump tables for large switch transfers.
6. Consider additional peephole optimizations after register
   allocation; this could clean up some spurious register-register
   moves or multiple saves to the same address.
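
To make item 3 concrete, here is a rough simulation of the idea, not the
actual MLton runtime or codegen.  It assumes a hypothetical layout in
which each return point is preceded by two 4-byte little-endian words
(a frame-layout index, then the frame size); the word size, the ordering,
and every function name below are assumptions made for this sketch.

```python
import struct

WORD = 4  # assumed metadata word size in bytes


def emit_return_point(code, frame_size, layout_index):
    """Append [layout_index][frame_size] and placeholder code to `code`.

    Returns the return address, i.e. the offset of the first byte after
    the two metadata words.
    """
    code.extend(struct.pack("<I", layout_index))
    code.extend(struct.pack("<I", frame_size))
    return_address = len(code)
    code.extend(b"\x90" * 8)  # stand-in for the instructions after the call
    return return_address


def frame_size_at(code, return_address):
    """Read the frame size stored one word before the return address."""
    return struct.unpack_from("<I", code, return_address - WORD)[0]


def frame_layout_at(code, return_address):
    """Read the layout index stored two words before the return address."""
    return struct.unpack_from("<I", code, return_address - 2 * WORD)[0]


if __name__ == "__main__":
    code = bytearray()
    ra1 = emit_return_point(code, frame_size=24, layout_index=3)
    ra2 = emit_return_point(code, frame_size=48, layout_index=7)
    # No table keyed by return address is needed: the values sit at a
    # fixed negative offset from the return address itself.
    print(frame_size_at(code, ra1), frame_layout_at(code, ra1))
    print(frame_size_at(code, ra2), frame_layout_at(code, ra2))
```

The GC side would only need the return address found in the frame; the
trade-off is a few extra bytes in the code segment per return point.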

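The kind of cleanup meant in item 6 can be sketched as a single pass
over a straight-line instruction sequence.  This is only an illustration
under stated assumptions, not the existing simplifier: instructions are
modeled as (opcode, dst, src) tuples, memory operands are strings that
start with "[", all stores have the same width, and no label falls
between adjacent instructions.

```python
def is_mem(operand):
    """Memory operands are written like "[ebp-8]" in this toy encoding."""
    return operand.startswith("[")


def peephole(instrs):
    """Drop self-moves, register copy-backs, and overwritten stores."""
    out = []
    for ins in instrs:
        op, dst, src = ins
        if op == "mov":
            if dst == src:
                continue  # mov r, r does nothing
            if out and out[-1][0] == "mov":
                _, prev_dst, prev_src = out[-1]
                both_regs = not is_mem(dst) and not is_mem(src)
                if both_regs and prev_dst == src and prev_src == dst:
                    continue  # mov a, b ; mov b, a: second move is redundant
                if is_mem(dst) and prev_dst == dst:
                    out.pop()  # earlier store to the same address is dead
        out.append(ins)
    return out


if __name__ == "__main__":
    before = [
        ("mov", "eax", "eax"),
        ("mov", "ebx", "eax"),
        ("mov", "eax", "ebx"),
        ("mov", "[ebp-8]", "ecx"),
        ("mov", "[ebp-8]", "edx"),
        ("add", "eax", "edx"),
    ]
    for ins in peephole(before):
        print(ins)
```

The copy-back rule is restricted to register operands on purpose: with a
memory operand, the first move may change the register used to form the
address, so the two operand strings need not name the same location.
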
After getting 1 and 2 done, I think the backend will be robust enough
for us to live with for a few months.  I'd like to add 3 and 6,
because I don't think they will be particularly difficult.  On the
other hand, 4 and 5 are probably going to take some work, especially
when they are both in effect (i.e., coordinating all of the jump table
destinations to have the same pseudo-reg to register mappings), but I
could see them paying off.
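
To illustrate why 4 and 5 interact, here is a small sketch, with all
names hypothetical and no claim about how the backend would actually do
it.  It builds a dense jump table for a switch and then checks whether
every destination expects the same pseudo-reg to register mapping on
entry, which is the condition a single indirect jump would need.

```python
def build_jump_table(cases, default):
    """cases maps integer case values to labels.

    Returns (low, table), where table[v - low] is the label for value v
    and gaps in the range are filled with the default label.
    """
    low, high = min(cases), max(cases)
    table = [cases.get(v, default) for v in range(low, high + 1)]
    return low, table


def dispatch(value, low, table, default):
    """Bounds check, then indexed lookup, as a jump table would do."""
    index = value - low
    if 0 <= index < len(table):
        return table[index]
    return default


def common_entry_mapping(entry_mappings, targets):
    """Return the mapping shared by all targets, or None if they differ.

    entry_mappings maps a label to the pseudo-reg -> register mapping
    that the block at that label expects on entry.
    """
    mappings = [entry_mappings[label] for label in set(targets)]
    first = mappings[0]
    return first if all(m == first for m in mappings) else None


if __name__ == "__main__":
    cases = {10: "L10", 11: "L11", 13: "L13"}
    low, table = build_jump_table(cases, "Ldefault")
    print(low, table)
    print(dispatch(13, low, table, "Ldefault"))

    entry = {
        "L10": {"p1": "eax", "p2": "ebx"},
        "L11": {"p1": "eax", "p2": "ebx"},
        "L13": {"p1": "eax", "p2": "ecx"},  # disagrees on p2
        "Ldefault": {"p1": "eax", "p2": "ebx"},
    }
    print(common_entry_mapping(entry, table))
```

When the destinations disagree, as L13 does here, the options are to
force a common mapping on all of them or to spill before the jump, which
is where the extra work comes from.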