shrinker checked in

Stephen Weeks MLton@sourcelight.com
Wed, 14 Nov 2001 09:45:51 -0800


> I'm running benchmarks tonight to compare the CPS and SSA optimizers
> and will report on them tomorrow.

Here are the results comparing the 20011006 release and the internal
versions with ssa and cps optimization.  Compile times for the
internal version are of course horrible, because it is the NJ compiled
version.   The runtime ratios are all fine except for knuth-bendix.

Once that is fixed, I think we're ready to drop CPS.  Oh yeah, of
course we need to add enough to SSA.DirectExp and port the closure
converter first.

MLton0 -- /usr/local/bin/mlton  (20011006 release)
MLton1 -- mlton -optimize-ssa false
MLton2 -- mlton -optimize-ssa true
compile time
benchmark         MLton0 MLton1 MLton2
barnes-hut           2.5   16.8   23.2
checksum             0.7    2.8    3.3
count-graphs         1.9   11.3   14.2
DLXSimulator         4.2   28.7   41.4
fft                  1.3    7.6   10.8
fib                  0.7    2.4    2.6
hamlet              44.6  358.0  517.4
knuth-bendix         2.4   13.0   19.0
lexgen               5.2   35.7   50.5
life                 1.4    7.2    8.9
logic                6.0   36.5   27.1
mandelbrot           0.7    2.7    3.4
matrix-multiply      0.8    3.0    3.5
md5                  1.7    9.7   10.3
merge                0.8    2.9    3.3
mlyacc              20.9  142.3  196.0
mpuz                 1.0    4.4    5.0
nucleic              3.2   19.2   19.3
peek                 1.1    5.7    7.6
psdes-random         0.8    2.9    3.3
ratio-regions        2.6   17.1   27.5
ray                  3.4   22.1   29.8
raytrace             8.9   62.9   95.0
simple               6.4   39.2   56.3
smith-normal-form    7.5   38.3   42.9
tailfib              0.7    2.4    2.5
tak                  0.7    2.5    2.6
tensor               3.0   19.9   27.3
tsp                  1.6    9.4   12.9
tyan                 3.7   26.7   38.2
vector-concat        0.8    2.9    3.3
vector-rev           0.7    2.7    3.1
vliw                12.0   82.2  108.2
wc-input1            1.7    9.2   11.6
wc-scanStream        1.7    9.7   12.6
zebra                9.3   37.6   56.7
zern                 1.1    5.7    8.1

run time
benchmark         MLton0 MLton1 MLton2
barnes-hut           5.0    5.2    5.4
checksum             4.0    3.7    3.7
count-graphs         5.6    5.7    5.7
DLXSimulator        13.3   15.0   14.2
fft                  7.1    7.3    7.2
fib                  4.6    4.6    4.6
hamlet               9.1    9.3    9.2
knuth-bendix         8.4    8.4   17.7
lexgen              12.7   12.3   13.2
life                10.6   10.7    9.3
logic               26.2   26.2   22.0
mandelbrot           9.1    9.4    9.2
matrix-multiply      6.8    6.7    7.3
md5                  4.4    4.3    2.7
merge               38.7   39.5   39.3
mlyacc              10.7   10.7   10.8
mpuz                 6.3    6.3    5.8
nucleic              8.3    9.1    9.4
peek                 4.7    4.8    4.7
psdes-random         4.6    4.6    4.6
ratio-regions        9.0    9.2    9.2
ray                  5.0    5.2    5.0
raytrace             5.8    6.0    6.2
simple               6.8    7.0    6.9
smith-normal-form    1.1    1.1    1.1
tailfib             22.2   22.1   22.3
tak                 10.7   10.5   12.1
tensor               9.7    9.5    7.9
tsp                 11.9   11.8   12.0
tyan                19.8   20.0   20.5
vector-concat        7.6    8.0    7.8
vector-rev           3.3    3.1    3.1
vliw                 6.7    6.9    7.0
wc-input1            2.8    2.6    2.7
wc-scanStream        4.5    3.8    3.8
zebra                2.6    2.7    2.9
zern                37.7   38.4   38.8

run time ratio
benchmark         MLton1 MLton2
barnes-hut           1.1    1.1
checksum             0.9    0.9
count-graphs         1.0    1.0
DLXSimulator         1.1    1.1
fft                  1.0    1.0
fib                  1.0    1.0
hamlet               1.0    1.0
knuth-bendix         1.0    2.1
lexgen               1.0    1.0
life                 1.0    0.9
logic                1.0    0.8
mandelbrot           1.0    1.0
matrix-multiply      1.0    1.1
md5                  1.0    0.6
merge                1.0    1.0
mlyacc               1.0    1.0
mpuz                 1.0    0.9
nucleic              1.1    1.1
peek                 1.0    1.0
psdes-random         1.0    1.0
ratio-regions        1.0    1.0
ray                  1.0    1.0
raytrace             1.0    1.1
simple               1.0    1.0
smith-normal-form    1.0    1.0
tailfib              1.0    1.0
tak                  1.0    1.1
tensor               1.0    0.8
tsp                  1.0    1.0
tyan                 1.0    1.0
vector-concat        1.1    1.0
vector-rev           1.0    1.0
vliw                 1.0    1.0
wc-input1            0.9    0.9
wc-scanStream        0.9    0.8
zebra                1.0    1.1
zern                 1.0    1.0

size
benchmark          MLton0    MLton1    MLton2
barnes-hut         59,793    64,809    71,929
checksum           20,917    21,117    20,973
count-graphs       40,461    44,925    45,597
DLXSimulator       78,237    90,389   102,277
fft                29,441    30,665    33,641
fib                20,909    21,109    20,989
hamlet            945,328 1,289,200 1,643,672
knuth-bendix       59,710    66,878    83,230
lexgen            122,061   148,829   161,517
life               38,565    42,957    41,821
logic             147,501   253,949    92,301
mandelbrot         20,901    21,077    20,965
matrix-multiply    21,309    21,485    21,517
md5                34,038    36,390    32,278
merge              21,885    22,309    22,277
mlyacc            409,501   528,397   586,141
mpuz               26,645    27,773    27,085
nucleic            60,653    63,517    63,613
peek               28,542    31,686    33,286
psdes-random       21,901    22,165    22,069
ratio-regions      41,893    45,109    60,709
ray                66,688    80,976    88,384
raytrace          159,381   199,725   235,949
simple            146,913   182,497   190,769
smith-normal-form 141,053   145,445   151,797
tailfib            20,637    20,781    20,637
tak                20,957    21,133    21,133
tensor             62,516    68,332    72,892
tsp                33,774    36,126    42,670
tyan               77,054    91,326   106,334
vector-concat      21,557    21,789    21,749
vector-rev         21,389    21,613    21,541
vliw              261,417   318,009   360,201
wc-input1          39,222    46,030    48,110
wc-scanStream      41,614    48,590    50,806
zebra             103,502   130,086   144,326
zern               26,504    27,328    30,400

> 
> Here's the log.
> 
> --------------------------------------------------------------------------------
> The SSA shrinker now works.  There are also a lot of other changes to the SSA IL
> and corresponding changes to the SSA optimizations and backend.  Here's an
> overview.
> 
> * There is a new switch, -optimize-ssa {true|false}, which determines whether
>   the SSA or CPS optimizer is used.  They are no longer run in sequence.
> 
> * Implement handlers has been moved from the SSA simplify pipeline to backend,
>   and also implements other invariants that the backend needs.
> 
> * SSA IL changes
>   - changed Raise transfer back to taking Var.t vector instead of Var.t.
>     This simplified some of the optimizations and made it easier since it
>     can often be treated exactly as return.
>   - Change the return type of functions from Type.t vector to
>     Type.t vector option.  NONE indicates that a function does not return.
>   - Added mayRaise: bool to functions to record whether or not a function may
>     raise.
>   - Changed returns to the following datatype
> 
> 	    datatype t =
> 	       Dead
> 	     | HandleOnly
> 	     | NonTail of {cont: Label.t, handler: Handler.t}
> 	     | Tail
> 
>     These correspond to 6 of the possible 9 combinations of continuation and
>     handler each being one of {None, Caller, Some l}.  None means that it
>     doesn't matter what the continuation (handler) is since the caller never
>     returns (raises).  Caller means to keep the continuation (handler) the same
>     as in the caller.  Some l means a nontail call in the case of continuations
>     and an installed handler in the case of handlers.
> 
>     3 of the 9 possibilities are disallowed, and the correspondence is as below.
> 
> 	Cont	Handler		equivalent
> 	------	-------		---------------------------------------
> 	None	None		Dead
> 	None	Caller		HandleOnly
> 	None 	Some h		*disallowed*
> 	Caller	None		*disallowed*
> 	Caller	Caller		Tail
> 	Caller	Some h		*disallowed*
> 	Some l	None		Nontail {cont = l, handler = None}
> 	Some l	Caller		Nontail {cont = l, handler = Caller}
> 	Some l	Some h		Nontail {cont = l, handler = Handle l}
> 
>     We could have allowed the (None, Some h) and (Caller, Some h) cases, and
>     put some code in the backend to generate stubs, since if there is a handler
>     there must be some continuation.  But I decided it was easier to just rule
>     them out, essentially meaning that remove-unused, or any other optimization
>     pass, needs to make stubs itself.
> 
> The matchup between returns and what is now expressed in functions types
> (mayRaise and mayReturn) is checked in the type checker.
> 
> The checked in compiler passes all regressions, but is not able to self-compile,
> or even be compiled, at least with 500m.  The bloat due to having all of the CPS
> and SSA optimization passes is too much.  Hopefully we will soon decide to pitch
> the CPS and will be able to self-compile again.