[MLton] Constant Folding of FP Operations

Matthew Fluet fluet at tti-c.org
Sun Jun 1 18:03:26 PDT 2008


On Sun, 1 Jun 2008, Matthew Fluet wrote:
> On Fri, 23 May 2008, Vesa Karvonen wrote:
>>  Attached is an experimental patch that improves MLton's constant folding
>>  of floating point operations.  The problem with constant folding FP
>>  operations in SML is that the FP ops are subject to rounding mode
>>  settings:
>>
>>   http://www.standardml.org/Basis/ieee-float.html#SIG:IEEE_REAL.setRoundingMode:VAL
>>
>>  The workaround used in the patch is to evaluate the operations in all
>>  rounding modes (in practice only TO_NEGINF and TO_POSINF, since TO_NEAREST
>>  and TO_ZERO always produce one of those two results) and check that the
>>  results agree.  This ensures that constant folding is correct in all
>>  rounding modes.
>
> Seems like a good optimization, and a sound one.  Did you run the benchmarks 
> and observe any speedups?

On amd64-linux, I get the following:

MLton0 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen amd64
MLton1 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen c
MLton2 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen amd64
MLton3 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen c
run time ratio (relative to MLton0; lower is faster)
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut          1.00   1.04   0.84   0.88
boyer               1.00   1.12   1.03   1.12
checksum            1.00   5.72   1.01   5.72
count-graphs        1.00   0.92   1.01   0.92
DLXSimulator        1.00   1.08   0.98   1.06
fft                 1.00   1.05   1.00   1.06
fib                 1.00   1.40   1.04   1.40
flat-array          1.00   1.60   0.99   1.61
hamlet              1.00   1.65   1.00   1.68
imp-for             1.00   1.46   0.99   1.47
knuth-bendix        1.00   1.36   1.00   1.36
lexgen              1.00   1.05   1.00   1.04
life                1.00   1.00   1.00   1.00
logic               1.00   1.08   1.00   1.09
mandelbrot          1.00   1.36   0.96   1.35
matrix-multiply     1.00   0.91   1.00   0.91
md5                 1.00   6.94   1.00   6.94
merge               1.00   1.05   1.00   1.05
mlyacc              1.00   1.08   1.00   1.08
model-elimination   1.00   1.25   0.99   1.26
mpuz                1.00   1.93   1.00   1.93
nucleic             1.00   1.00   0.99   0.99
output1             1.00   1.23   1.00   1.24
peek                1.00   0.91   1.05   0.92
psdes-random        1.00   0.83   0.99   0.83
ratio-regions       1.00   1.21   1.00   1.33
ray                 1.00   1.06   0.94   1.05
raytrace            1.00   1.13   0.96   1.05
simple              1.00   1.40   1.06   1.42
smith-normal-form   1.00   1.01   1.01   1.01
tailfib             1.00   1.87   1.06   1.87
tak                 1.00   1.14   1.00   1.13
tensor              1.00   1.57   1.01   1.57
tsp                 1.00   1.01   1.00   1.01
tyan                1.00   1.18   1.02   1.18
vector-concat       1.00   1.08   1.00   1.07
vector-rev          1.00   1.40   1.00   1.47
vliw                1.00   1.35   1.01   1.38
wc-input1           1.00   1.00   1.00   1.00
wc-scanStream       1.00   1.21   0.99   1.21
zebra               1.00   0.79   1.00   0.79
zern                1.00   1.62   1.00   1.64

This seems to show a speedup beyond noise only on barnes-hut (and maybe ray).
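One way to make "beyond noise" concrete is sketched below in Python. It recomputes the MLton2/MLton0 ratios from the raw run times in the "run time" table further down, for the benchmarks whose MLton2 ratio is noticeably below 1.00 plus fib and tailfib. As a crude assumption (not a statistical test), it treats the largest deviation from 1.0 among fib and tailfib, which contain no FP operations, as the noise band. The names `RUN_TIMES`, `ratio`, `noise_band` and `beyond_noise` are illustrative only.

```python
# Run times in seconds, copied from the "run time" table:
# MLton0 = trunk build, MLton2 = build.real-cf (patched) build, both -codegen amd64.
RUN_TIMES = {
    "barnes-hut": (18.06, 15.15),
    "fib":        (37.68, 39.24),
    "mandelbrot": (21.64, 20.71),
    "ray":        (14.96, 14.09),
    "raytrace":   (17.23, 16.49),
    "tailfib":    (22.48, 23.79),
}

# No FP operations: both builds emit identical code for these,
# so any change in their ratio is measurement noise.
CONTROLS = ("fib", "tailfib")


def ratio(name: str) -> float:
    """Patched run time relative to trunk, as in the "run time ratio" table."""
    trunk, patched = RUN_TIMES[name]
    return patched / trunk


def noise_band() -> float:
    """Crude noise estimate: largest deviation from 1.0 among the controls."""
    return max(abs(ratio(name) - 1.0) for name in CONTROLS)


def beyond_noise() -> list[str]:
    """Non-control benchmarks whose ratio deviates from 1.0 by more than the band."""
    band = noise_band()
    return sorted(name for name in RUN_TIMES
                  if name not in CONTROLS and abs(ratio(name) - 1.0) > band)
```

With these numbers the band comes out at about 0.058 (set by tailfib), and only barnes-hut (ratio about 0.84) falls outside it; ray, at about 0.942, lands just inside, which fits the "maybe".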

I'd really like to know a good way of cutting down the noise in the
benchmarks; consider that fib and tailfib (which use no FP operations, and
so yield identical assembly code) show slowdowns of 1.04 and 1.06,
respectively.
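One common way to damp this kind of noise, offered as a suggestion rather than something established in this thread, is to run each executable several times and compare a robust statistic such as the minimum or the median instead of a single timing. The Python sketch below assumes each benchmark is a standalone executable; `time_command`, `summarize` and `compare` are hypothetical helpers, not part of MLton's benchmark tooling.

```python
import statistics
import subprocess
import time


def time_command(argv: list[str], repeats: int = 5) -> list[float]:
    """Run argv `repeats` times; return each run's wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> tuple[float, float]:
    """Minimum and median: with several runs, one unusually slow run moves neither much."""
    return min(samples), statistics.median(samples)


def compare(trunk: list[float], patched: list[float]) -> float:
    """Run time ratio computed from the fastest of each side's repeated runs."""
    return summarize(patched)[0] / summarize(trunk)[0]
```

Taking the minimum assumes interference only ever makes a run slower; the median is the safer choice if that assumption is in doubt.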

The rest of the benchmark data follows:

size
benchmark            MLton0    MLton1    MLton2    MLton3
barnes-hut          165,614   170,237   167,663   172,286
boyer               218,529   220,105   218,529   220,105
checksum             98,257   105,473    98,257   105,473
count-graphs        124,401   127,073   124,401   127,073
DLXSimulator        201,324   210,004   201,324   210,004
fft                 120,687   127,772   120,655   127,708
fib                  98,225    97,321    98,225    97,321
flat-array           97,681    96,913    97,681    96,913
hamlet            1,509,177 1,542,601 1,508,809 1,545,585
imp-for              97,969    97,105    97,969    97,105
knuth-bendix        177,004   186,044   177,004   186,044
lexgen              291,003   318,683   291,003   318,683
life                122,257   118,777   122,257   118,777
logic               182,497   182,665   182,497   182,665
mandelbrot           97,857   100,545    97,841   100,529
matrix-multiply      99,969   102,225    99,969   102,225
md5                 132,252   142,588   132,252   142,588
merge                99,601   106,417    99,601   106,417
mlyacc              663,259   704,187   663,259   704,187
model-elimination   865,986   953,682   866,002   953,666
mpuz                104,241   112,273   104,241   112,273
nucleic             273,760   256,196   273,760   256,196
output1             141,056   148,688   141,056   148,688
peek                137,804   143,212   137,804   143,212
psdes-random        101,169    99,665   101,169    99,665
ratio-regions       125,649   135,905   125,649   135,905
ray                 249,400   257,839   248,728   258,223
raytrace            378,114   397,078   374,530   392,054
simple              347,593   377,012   346,777   376,548
smith-normal-form   276,332   292,884   276,332   292,884
tailfib              97,713    96,897    97,713    96,897
tak                  98,273    97,273    98,273    97,273
tensor              167,507   174,971   167,507   174,971
tsp                 144,827   151,658   144,363   151,354
tyan                217,644   229,268   217,644   229,268
vector-concat        99,617    98,457    99,617    98,457
vector-rev           99,217    98,281    99,217    98,281
vliw                528,426   616,042   526,874   614,490
wc-input1           164,522   169,826   164,522   169,826
wc-scanStream       175,258   184,914   175,258   184,914
zebra               217,196   219,948   217,196   219,948
zern                135,302   140,707   135,318   140,403

compile time (seconds)
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut          9.78  12.15  10.89  12.87
boyer              10.80  23.17  10.32  22.40
checksum            7.70   7.94   7.66   8.25
count-graphs        8.29   9.43   8.93   9.58
DLXSimulator       10.81  15.09  10.92  15.61
fft                 8.19   9.04   8.53   9.21
fib                 7.74   7.89   8.22   8.56
flat-array          7.56   7.58   8.18   8.13
hamlet             44.43 111.25  48.09 112.88
imp-for             7.75   7.79   7.71   7.96
knuth-bendix        9.51  13.01   9.86  13.41
lexgen             12.25  18.86  12.61  19.02
life                8.31   9.84   8.27   9.29
logic              10.09  13.89   9.90  14.41
mandelbrot          7.77   7.80   8.04   8.39
matrix-multiply     8.33   8.52   7.82   8.13
md5                 8.64  10.02   8.74  10.50
merge               7.60   7.82   7.86   8.20
mlyacc             27.15  44.67  26.66  44.90
model-elimination  25.60  57.05  25.57  57.73
mpuz                7.92   8.23   8.05   8.34
nucleic            11.75  23.32  11.74  23.16
output1             8.57  10.34   8.72  10.81
peek                8.45  10.05   8.94  10.31
psdes-random        7.68   8.16   8.04   8.34
ratio-regions       8.97  10.33   9.42  10.98
ray                11.82  17.10  11.64  17.38
raytrace           15.24  26.74  14.96  26.36
simple             13.48  22.12  13.04  21.70
smith-normal-form  11.55  54.59  12.06  52.01
tailfib             7.63   7.94   7.85   8.10
tak                 7.64   8.04   7.88   8.17
tensor             10.38  13.41  10.48  13.56
tsp                 9.25  10.70   9.55  11.00
tyan               10.77  15.96  10.89  16.04
vector-concat       7.44   7.76   7.92   8.20
vector-rev          7.59   7.80   8.10   7.89
vliw               19.42  35.39  19.30  35.81
wc-input1           9.85  12.02  10.13  12.06
wc-scanStream       9.60  12.95   9.90  12.64
zebra              11.04  15.44  11.02  15.17
zern                8.28   9.75   8.85   9.81

run time (seconds)
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut         18.06  18.74  15.15  15.94
boyer              53.59  59.96  55.25  60.11
checksum           18.73 107.23  18.83 107.20
count-graphs       26.00  24.04  26.24  24.05
DLXSimulator       28.01  30.34  27.39  29.75
fft                14.61  15.34  14.54  15.45
fib                37.68  52.67  39.24  52.68
flat-array         29.27  46.93  29.02  47.00
hamlet             50.41  82.93  50.26  84.74
imp-for            26.97  39.41  26.73  39.56
knuth-bendix       24.96  34.07  24.90  34.01
lexgen             22.16  23.22  22.11  23.01
life               26.40  26.39  26.45  26.39
logic              23.22  25.07  23.31  25.41
mandelbrot         21.64  29.39  20.71  29.21
matrix-multiply    35.93  32.61  35.81  32.74
md5                33.22 230.39  33.24 230.60
merge              48.85  51.19  49.00  51.41
mlyacc             26.21  28.28  26.19  28.23
model-elimination  38.80  48.56  38.28  48.76
mpuz               23.48  45.41  23.44  45.35
nucleic            18.37  18.41  18.18  18.25
output1            37.22  45.94  37.34  46.09
peek               21.91  19.99  23.06  20.05
psdes-random       16.05  13.26  15.88  13.28
ratio-regions     124.21 150.12 124.00 164.76
ray                14.96  15.84  14.09  15.66
raytrace           17.23  19.49  16.49  18.15
simple             27.70  38.85  29.36  39.24
smith-normal-form   8.48   8.59   8.59   8.59
tailfib            22.48  41.96  23.79  41.95
tak                32.33  36.78  32.45  36.44
tensor             22.72  35.64  22.98  35.59
tsp                25.33  25.69  25.28  25.59
tyan               27.55  32.38  28.06  32.51
vector-concat      27.96  30.27  27.97  29.98
vector-rev         37.32  52.38  37.26  54.74
vliw               23.96  32.38  24.31  33.13
wc-input1          34.72  34.87  34.85  34.61
wc-scanStream      28.70  34.74  28.31  34.84
zebra              30.52  24.01  30.54  24.12
zern               22.54  36.43  22.59  36.90


