benchmarks

Matthew Fluet fluet@CS.Cornell.EDU
Thu, 8 Nov 2001 18:18:34 -0500 (EST)


Latest benchmarks with the SSA IL.
MLton0 is the 20011006 release.
MLton1 is full CPS simplify and SSA simplify.
MLton2 is just SSA simplify.


MLton2 is still pretty bad: a slowdown of more than 281X on tailfib
before running out of memory!

MLton1 generally has a little bit of a slowdown; I think this makes sense
-- the second rounds of flatten and local-flatten, run without the
shrinker, just produce redundant tuple allocations and selects, plus
gazillions of gotos to gotos (the x86-codegen cleans up some of those, in
the sense that the jmp instruction is eliminated, but the shuffling of
values in stack slots is still going on).  Strangely, there are some
decent speedups in ratio-regions and wc-input1, a 2X speedup in
matrix-multiply, and an amazing 9X speedup in md5!


MLton0 -- mlton-stable 
MLton1 -- mlton 
MLton2 -- mlton -drop-pass removeUnused1CPS -drop-pass leafInlineCPS
-drop-pass raiseToJump1CPS -drop-pass contify1CPS -drop-pass
localFlatten1CPS -drop-pass constantPropagationCPS -drop-pass uselessCPS
-drop-pass removeUnused2CPS -drop-pass unusedArgs1CPS -drop-pass
simplifyTypesCPS -drop-pass polyEqualCPS -drop-pass contify2CPS -drop-pass
inlineCPS -drop-pass localFlatten2CPS -drop-pass removeUnused3CPS
-drop-pass raiseToJump2CPS -drop-pass contify3CPS -drop-pass
unusedArgs2CPS -drop-pass introduceLoopsCPS -drop-pass loopInvariantCPS
-drop-pass flattenCPS -drop-pass localFlatten3CPS -drop-pass
commonSubexpCPS -drop-pass commonBlockCPS -drop-pass redundantTestsCPS
-drop-pass redundantCPS -drop-pass unusedArgs3CPS -drop-pass
removeUnused4CPS
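
The MLton2 command line above is just one -drop-pass flag per CPS pass.
As a minimal sketch (the helper name mlton2_command and the idea of
building the argument list in Python are assumptions for illustration,
not part of the benchmark scripts), it could be assembled like this:

```python
# Pass names copied from the MLton2 command line above, in the order listed.
CPS_PASSES = [
    "removeUnused1CPS", "leafInlineCPS", "raiseToJump1CPS", "contify1CPS",
    "localFlatten1CPS", "constantPropagationCPS", "uselessCPS",
    "removeUnused2CPS", "unusedArgs1CPS", "simplifyTypesCPS", "polyEqualCPS",
    "contify2CPS", "inlineCPS", "localFlatten2CPS", "removeUnused3CPS",
    "raiseToJump2CPS", "contify3CPS", "unusedArgs2CPS", "introduceLoopsCPS",
    "loopInvariantCPS", "flattenCPS", "localFlatten3CPS", "commonSubexpCPS",
    "commonBlockCPS", "redundantTestsCPS", "redundantCPS", "unusedArgs3CPS",
    "removeUnused4CPS",
]


def mlton2_command(compiler="mlton"):
    """Return the argument list: the compiler plus one -drop-pass per pass.

    Hypothetical helper; it only reproduces the flags shown above.
    """
    args = [compiler]
    for name in CPS_PASSES:
        args += ["-drop-pass", name]
    return args


if __name__ == "__main__":
    # Print the command as a single shell-style line.
    print(" ".join(mlton2_command()))
```

The source file to compile would still have to be appended to the list;
it is left out here because the post does not show it.
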

compile time
benchmark         MLton0 MLton1 MLton2
barnes-hut           2.2    2.6    7.0
checksum             0.6    0.6    1.2
count-graphs         1.6    2.0    4.5
DLXSimulator         3.8    4.8   13.3
fft                  1.1    1.3    3.3
fib                  0.7    0.6    0.9
hamlet              41.5   77.8  128.8
knuth-bendix         2.1    2.7    5.0
lexgen               4.7    6.9   13.5
life                 1.2    1.6    2.4
logic                5.5   14.9   19.7
mandelbrot           0.6    0.6    1.1
matrix-multiply      0.6    0.7    1.3
md5                  1.5    1.4    3.2
merge                0.8    0.7    1.1
mlyacc              19.1   29.5   41.7
mpuz                 0.8    1.0    1.6
nucleic              2.9    3.2    4.9
peek                 0.9    1.2    2.5
psdes-random         0.6    0.7    1.1
ratio-regions        2.4    3.0    7.2
ray                  3.0    3.9    9.7
raytrace             8.2   10.5   27.8
simple               5.8    9.4   18.5
smith-normal-form    7.4    8.0   11.1
tailfib              0.6    0.6    1.0
tak                  0.6    0.6    0.9
tensor               2.6    3.5    7.6
tsp                  1.4    1.9    3.8
tyan                 3.4    5.0   11.2
vector-concat        0.6    0.6    1.2
vector-rev           0.6    0.6    1.3
vliw                10.9   19.8   33.9
wc-input1            1.5    1.9    3.6
wc-scanStream        1.6    2.1    3.9
zebra                8.2    8.1    9.7
zern                 1.0    1.1    2.6

run time
benchmark         MLton0 MLton1 MLton2
barnes-hut           3.9    4.8   55.2
checksum             3.2    3.1      *  -- Out of memory
count-graphs         4.9    4.7   62.3
DLXSimulator        15.1   15.7   77.7
fft                  7.7    9.1  119.3
fib                  3.4    3.4    5.1
hamlet               8.1   14.8   87.2
knuth-bendix         6.5   10.5   59.4
lexgen              10.5   12.6  237.9
life                 7.8   18.3   44.1
logic               25.7   34.8   58.4
mandelbrot           6.7    7.0  325.1
matrix-multiply      5.2    2.8   85.9
md5                  3.3    0.4  167.2
merge               48.9   48.7  178.8
mlyacc               9.4   13.1   83.2
mpuz                 4.6    6.3   32.8
nucleic              6.8   12.6  154.4
peek                 3.4    3.6      *  -- Out of memory
psdes-random         3.4    3.4      *  -- Out of memory
ratio-regions        8.2    7.4  331.4
ray                  3.8    4.0   68.9
raytrace             4.6    6.6  178.6
simple               6.0    7.0   63.2
smith-normal-form    0.9    0.9    1.3
tailfib             16.3   16.3      *  -- Out of memory (> 4500.0)
tak                  7.9    8.9   59.0
tensor               7.1    7.0   42.2
tsp                  9.0    8.9  215.7
tyan                19.5   26.3  117.5
vector-concat        5.7    6.4  255.3
vector-rev           4.1    4.3  150.3
vliw                 6.2    7.6   44.2
wc-input1            2.2    2.1  108.3
wc-scanStream        3.6    3.4   96.0
zebra                2.2    5.5   14.8
zern                33.9   33.0  917.3

run time ratio (relative to MLton0)
benchmark         MLton1 MLton2
barnes-hut           1.2   14.0
checksum             1.0      *
count-graphs         1.0   12.8
DLXSimulator         1.0    5.2
fft                  1.2   15.6
fib                  1.0    1.5
hamlet               1.8   10.8
knuth-bendix         1.6    9.1
lexgen               1.2   22.7
life                 2.3    5.6
logic                1.4    2.3
mandelbrot           1.0   48.6
matrix-multiply      0.5   16.6
md5                  0.1   50.7
merge                1.0    3.7
mlyacc               1.4    8.8
mpuz                 1.4    7.1
nucleic              1.9   22.8
peek                 1.0      *
psdes-random         1.0      *
ratio-regions        0.9   40.3
ray                  1.0   18.0
raytrace             1.5   39.2
simple               1.2   10.5
smith-normal-form    1.0    1.4
tailfib              1.0      *  (> 281.3)
tak                  1.1    7.5
tensor               1.0    5.9
tsp                  1.0   23.9
tyan                 1.4    6.0
vector-concat        1.1   45.1
vector-rev           1.0   36.7
vliw                 1.2    7.1
wc-input1            0.9   48.6
wc-scanStream        1.0   26.8
zebra                2.5    6.9
zern                 1.0   27.1
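
The ratios above look like each run time divided by the MLton0 run time,
rounded to one decimal place.  A minimal sketch of that arithmetic,
assuming exactly that definition (the function name run_time_ratio is
made up for illustration), using a few rows copied from the run time
table:

```python
def run_time_ratio(time, baseline):
    """Return time / baseline rounded to one decimal, or None if no time."""
    if time is None:  # '*' in the tables: the benchmark ran out of memory
        return None
    return round(time / baseline, 1)


# (MLton0, MLton1, MLton2) run times copied from the run time table.
run_times = {
    "barnes-hut": (3.9, 4.8, 55.2),
    "checksum": (3.2, 3.1, None),
    "matrix-multiply": (5.2, 2.8, 85.9),
    "md5": (3.3, 0.4, 167.2),
}

# Map each benchmark to its (MLton1, MLton2) ratios against MLton0.
ratios = {
    name: (run_time_ratio(t1, t0), run_time_ratio(t2, t0))
    for name, (t0, t1, t2) in run_times.items()
}

if __name__ == "__main__":
    for name, (r1, r2) in ratios.items():
        print(name, r1, "*" if r2 is None else r2)
```

Recomputing from the rounded times can differ from the listed ratios in
the last digit (55.2 / 3.9 rounds to 14.2 for barnes-hut, where the table
says 14.0), presumably because the table was computed from unrounded
timings.
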

size
benchmark          MLton0    MLton1    MLton2
barnes-hut         59,793    66,272   208,240
checksum           20,917    21,576    36,312
count-graphs       40,461    44,816   127,288
DLXSimulator       78,237    92,432   342,184
fft                29,441    31,564    91,660
fib                20,909    21,568    31,824
hamlet            945,328 1,941,459 3,149,723
knuth-bendix       59,710    70,881   152,049
lexgen            122,061   173,224   356,888
life               38,565    48,536    70,712
logic             147,501   349,384   642,936
mandelbrot         20,901    21,536    36,304
matrix-multiply    21,309    21,760    40,792
md5                34,038    30,249    92,209
merge              21,885    22,784    35,376
mlyacc            409,501   684,472 1,213,080
mpuz               26,645    29,560    50,328
nucleic            60,653    65,680   131,968
peek               28,542    32,345    71,169
psdes-random       21,901    22,640    38,560
ratio-regions      41,893    51,528   192,608
ray                66,688    83,851   268,899
raytrace          159,381   216,552   851,304
simple            146,913   232,164   526,948
smith-normal-form 141,053   146,348   248,140
tailfib            20,637    21,240    32,144
tak                20,957    21,680    32,208
tensor             62,516    74,163   184,987
tsp                33,774    37,481   118,313
tyan               77,054   109,513   313,577
vector-concat      21,557    22,264    39,216
vector-rev         21,389    22,056    40,456
vliw              261,417   496,868   988,180
wc-input1          39,222    47,585   100,529
wc-scanStream      41,614    51,297   112,049
zebra             103,502   195,489   238,961
zern               26,504    28,139    72,155