benchmarks

Stephen Weeks MLton@sourcelight.com
Tue, 15 Aug 2000 11:51:48 -0700 (PDT)


> Compile Times: user+sys
> benchmark       c-codegen  x86-codegen  
> checksum             3.15         3.15
> count-graphs         9.60         7.34
> fib                  2.81         2.88
> knuth-bendix        15.70        10.63
> life                 7.30         5.75
> logic               47.36        30.60
> mlyacc             443.73       387.54
> mpuz                 4.65         4.03
> ratio-regions       17.32        13.03
> smith-normal-form  221.73        94.04
> tak                  2.88         2.90
> wc                   7.50         6.03
> 
> This seems to confirm that we can do better than gcc, particularly for
> large programs.  

Absolutely.  I think there's still a win to be had in speeding up your pass,
which must be taking a sizeable portion of compile time.
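
To put a number on the "particularly for large programs" remark, here is a small sketch in Python that divides the x86-codegen compile time by the c-codegen compile time for each benchmark.  The figures are copied from the compile-time table quoted above; the names compile_times and compile_ratio are made up for this illustration and are not part of MLton.

```python
# Compile times (user+sys) copied from the quoted table:
# benchmark -> (c-codegen, x86-codegen)
compile_times = {
    "checksum": (3.15, 3.15),
    "count-graphs": (9.60, 7.34),
    "fib": (2.81, 2.88),
    "knuth-bendix": (15.70, 10.63),
    "life": (7.30, 5.75),
    "logic": (47.36, 30.60),
    "mlyacc": (443.73, 387.54),
    "mpuz": (4.65, 4.03),
    "ratio-regions": (17.32, 13.03),
    "smith-normal-form": (221.73, 94.04),
    "tak": (2.88, 2.90),
    "wc": (7.50, 6.03),
}


def compile_ratio(name):
    """Return x86-codegen compile time divided by c-codegen compile time."""
    c_time, x86_time = compile_times[name]
    return x86_time / c_time


# List benchmarks from slowest to fastest c-codegen compile.
for name in sorted(compile_times, key=lambda n: compile_times[n][0], reverse=True):
    print(f"{name:<18} {compile_ratio(name):.2f}")
```

On these numbers the ratio is about 0.42 for smith-normal-form and about 0.87 for mlyacc, while fib and tak sit marginally above 1.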

> I don't quite know what's up with smith-normal-form,
> especially considering that the x86-codegen's executable is just as fast.

All the time in smith-normal-form is spent in the gmp IntInf libraries, so
MLton optimization doesn't matter.

> Running Times: user+sys
> benchmark       c-codegen  x86-codegen   x86/c
> checksum            11.58        13.20    1.14
> count-graphs        18.87        21.20    1.12
> fib                 21.26        17.28    0.81
> knuth-bendix        37.32        39.27    1.05
> life               103.25       116.52    1.13
> logic               91.48        77.36    0.79
> mlyacc              41.10        34.70    0.84
> mpuz                76.60        90.85    1.19
> ratio-regions       41.40        39.79    0.96
> smith-normal-form    4.04         4.02    1.00
> tak                 48.46        37.86    0.78
> wc                  24.72        36.34    1.47
> 
> We're closing the gap here.

I'll say.  The only really disappointing one is life, because c-codegen is
already (slightly) slower than SML/NJ.

> collapsing if-s whose branches are the same label (yes, this really does
> occur after eliminating jumps to jumps)

Tell me the program and I'll look into why CPS optimization isn't getting it.
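
For anyone following along, the situation being described can be sketched on a toy control-flow graph.  This is only an illustration in Python, not the x86-codegen's actual data structures: the block representation, the terminator tuples, and the function names resolve and simplify are all assumptions made for the example.  It also assumes the if condition has no side effects, so dropping the test is safe.

```python
# Toy IR (hypothetical): label -> (statements, terminator).
# Terminators: ("goto", L), ("if", cond, L_true, L_false), ("return",)


def resolve(blocks, label):
    """Follow chains of empty goto-only blocks to their final target."""
    seen = set()
    while label not in seen:
        seen.add(label)
        stmts, term = blocks[label]
        if stmts or term[0] != "goto":
            break  # a real block: stop here
        label = term[1]  # a jump to a jump: keep following
    return label


def simplify(blocks):
    """Eliminate jumps to jumps, then collapse ifs with identical targets."""
    out = {}
    for label, (stmts, term) in blocks.items():
        if term[0] == "goto":
            term = ("goto", resolve(blocks, term[1]))
        elif term[0] == "if":
            _, cond, t, f = term
            t, f = resolve(blocks, t), resolve(blocks, f)
            # Both arms now reach the same label, so the test is redundant.
            term = ("goto", t) if t == f else ("if", cond, t, f)
        out[label] = (stmts, term)
    return out


blocks = {
    "entry": (["x = f()"], ("if", "x", "L1", "L2")),
    "L1": ([], ("goto", "join")),
    "L2": ([], ("goto", "join")),
    "join": (["y = g()"], ("return",)),
}
simplified = simplify(blocks)
print(simplified["entry"])
```

Here the two arms of the if only become the same label once L1 and L2 have been skipped over, which is why the case shows up after jump-to-jump elimination rather than before.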

> There are two other simplifications that I would like to try:
> (1) currently, before a transfer, all pseudo-regs are flushed. 

All?  You are only flushing live ones, right? :-)
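
The distinction can be made concrete with a small Python sketch.  It assumes a hypothetical model in which each pseudo-reg currently held in a machine register is recorded in a dict, and in which the set of pseudo-regs live after the transfer is already known; the names flush_all and flush_live and the ("store", ...) tuples are invented for the example and do not reflect the real backend.

```python
def flush_all(cached):
    """Store every cached pseudo-reg back to memory before a transfer."""
    return [("store", mreg, preg) for preg, mreg in sorted(cached.items())]


def flush_live(cached, live):
    """Store back only the pseudo-regs that are live after the transfer."""
    return [
        ("store", mreg, preg)
        for preg, mreg in sorted(cached.items())
        if preg in live
    ]


# Hypothetical state: three pseudo-regs held in machine registers,
# of which only p1 is needed after the transfer.
cached = {"p1": "eax", "p2": "ebx", "p3": "ecx"}
live = {"p1"}

print(len(flush_all(cached)), "stores when flushing everything")
print(len(flush_live(cached, live)), "stores when flushing live pseudo-regs only")
```

Any pseudo-reg that is dead at the transfer costs a store in the first version and nothing in the second.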

> Adding (1) shouldn't be difficult at all.  Adding (2) will be a little bit
> more difficult, probably not by the end of this week.

Cool.  I'd vote for doubles pretty soon too.  My current thinking is we shoot
for a release by January.  I think we're well on the way, but I'd like to live
with the backend for as long as possible.

> Finally, there are some other tweaks I'd like to try to avoid some memory
> references that I'm seeing.  I don't expect anything spectacular, but
> every little bit might help.

I've been thinking a little bit about all of the must-not alias information
that's available on the CPS and Machine ILs.  For example, we know that stack
slots and heap objects must not alias.  We know that heap objects of different
types must not alias.  Would getting this information down to your backend help?
I'm thinking at least of the kinds of things Henry mentioned a while back in the 
FFT code.
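
As a rough sketch of how a backend might consume those two rules, here is a conservative may-alias test in Python, together with one possible use: deciding which values cached from memory must be discarded after a store.  Everything here is an assumption for illustration (the Loc record, its fields, the type names, and the helper names); it is not how the CPS or Machine ILs actually represent locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    kind: str       # "stack" or "heap"
    name: str       # label used only to tell locations apart
    ty: str = ""    # heap object type; unused for stack slots


def may_alias(a, b):
    """Return False only when one of the two must-not-alias rules applies."""
    if a.kind != b.kind:
        return False  # a stack slot and a heap object must not alias
    if a.kind == "heap" and a.ty != b.ty:
        return False  # heap objects of different types must not alias
    return True       # otherwise stay conservative


def loads_killed_by_store(cached_loads, store):
    """Cached loads that can no longer be trusted after writing to store."""
    return [loc for loc in cached_loads if may_alias(loc, store)]


cached_loads = [
    Loc("stack", "slot0"),
    Loc("heap", "a", ty="int array"),
    Loc("heap", "b", ty="real array"),
]
store = Loc("heap", "c", ty="real array")
print(loads_killed_by_store(cached_loads, store))
```

In this example only the cached load from b has to be redone after the store to c; the stack slot and the int array survive, which is the kind of memory reference the extra information could save.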