sigaltstack and cygwin

Matthew Fluet <fluet@CS.Cornell.EDU>
Wed, 6 Mar 2002 12:03:06 -0500 (EST)


> OK.  No problem.  I just checked in the changes to put the assumes in
> and all the tests passed, with one exception.  For testing, I turned
> on "reserveEsp" (defined in x86-codegen.fun) for all compiles instead
> of just for Cygwin programs that use signals.  With that, the "slower"
> regression test gives a segfault.  I'm not sure whether I missed an
> assume or there is a register allocator bug.  Matthew, if you could
> take a look, that would be great.  Just change reserveEsp in
> x86-codegen.fun to true.

Fixed.  It was really easy and I should have remembered it earlier.  There
is a hack in the register allocator to _not_ adjust c_stackP after the
last C call in a basic block.  The reasoning is that after the final C
call, popping all the arguments by adding to %esp simply makes %esp equal
to c_stackP, which we'll just load in the next block that makes a C call.
So, I was saving one instruction per C call.  In Cygwin (or with
reserveEsp = true) this is very bad, because we never refetch c_stackP
from memory; we just assume it's up to date in %esp.  So, in slower.sml,
there is a loop that executes 4304967296 times, making two C calls each
time through the loop -- the unpopped arguments piled up and we simply ran
out of stack space.
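
To make the failure mode concrete, here is a toy model of the bookkeeping, not the actual register allocator code.  Everything in it is an assumption for illustration: the function name esp_drift, the starting address, the argument size, and the idea that each loop iteration is a single basic block containing both C calls.

```python
def esp_drift(blocks, calls_per_block, arg_bytes, reserve_esp, skip_last_pop):
    """Return how many bytes %esp sits below c_stackP once the loop is done.

    Toy model only: c_stackP is a fixed value held in memory, and every
    C call pushes arg_bytes of arguments onto the C stack.
    """
    c_stack_p = 0x100000  # hypothetical top-of-C-stack address
    esp = c_stack_p
    for _ in range(blocks):
        if not reserve_esp:
            # Without reserveEsp, a block that makes a C call reloads
            # %esp from c_stackP in memory before using it.
            esp = c_stack_p
        for call in range(calls_per_block):
            esp -= arg_bytes  # push the arguments
            is_last = call == calls_per_block - 1
            if not (skip_last_pop and is_last):
                esp += arg_bytes  # pop the arguments (the addl)
    if not reserve_esp:
        # The next block that makes a C call would reload %esp anyway.
        esp = c_stack_p
    return c_stack_p - esp


# Hack on, %esp reloaded per block: the skipped pop is harmless.
print(esp_drift(1000, 2, 8, reserve_esp=False, skip_last_pop=True))
# Hack on, %esp reserved: every block leaks one call's worth of arguments.
print(esp_drift(1000, 2, 8, reserve_esp=True, skip_last_pop=True))
# Hack removed, %esp reserved: nothing leaks.
print(esp_drift(1000, 2, 8, reserve_esp=True, skip_last_pop=False))
```

In this model the leak grows linearly with the number of blocks executed, so a loop with billions of iterations exhausts any realistic C stack.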

Anyways, I eliminated that hack.  Here are the benchmark results:

MLton0 -- mlton-stable 
MLton1 -- mlton 
MLton2 -- mlton -native-reserve-esp true
compile time
benchmark         MLton0 MLton1 MLton2
barnes-hut          2.09   2.04   2.01
checksum            0.52   0.48   0.48
count-graphs        1.38   1.38   1.37
DLXSimulator        3.76   3.71   3.71
fft                 1.04   1.03   1.05
fib                 0.46   0.44   0.44
hamlet             45.11  44.86  44.75
imp-for             0.48   0.46   0.47
knuth-bendix        1.82   1.79   1.79
lexgen              4.85   4.83   4.83
life                1.05   1.02   1.02
logic               2.36   2.33   2.33
mandelbrot          0.49   0.47   0.46
matrix-multiply     0.54   0.52   0.53
md5                 1.01   0.99   0.98
merge               0.49   0.49   0.48
mlyacc             18.20  18.15  18.15
mpuz                0.68   0.66   0.64
nucleic             2.26   2.21   2.21
peek                0.83   0.78   0.80
psdes-random        0.53   0.51   0.51
ratio-regions       1.98   1.95   1.98
ray                 3.02   3.02   2.95
raytrace            9.07   9.03   9.02
simple              6.13   6.02   6.05
smith-normal-form   7.01   6.95   6.97
tailfib             0.44   0.45   0.45
tak                 0.45   0.46   0.45
tensor              2.51   2.50   2.49
tsp                 1.22   1.18   1.21
tyan                3.21   3.18   3.15
vector-concat       0.51   0.51   0.52
vector-rev          0.52   0.49   0.50
vliw               10.96  10.81  10.76
wc-input1           1.38   1.34   1.36
wc-scanStream       1.45   1.42   1.38
zebra               4.90   5.10   5.11
zern                0.90   0.85   0.87
run time
benchmark         MLton0 MLton1 MLton2
barnes-hut          3.73   3.74   3.75
checksum            3.18   3.18   3.31
count-graphs        3.54   3.54   3.76
DLXSimulator       14.58  14.58  14.58
fft                 8.76   8.77   8.82
fib                 3.37   3.37   3.37
hamlet              7.20   7.15   7.20
imp-for             7.33   6.61   6.61
knuth-bendix        5.64   5.64   5.52
lexgen              9.33   9.35   9.37
life                5.04   5.11   4.84
logic              17.65  17.70  17.58
mandelbrot          6.06   6.06   6.06
matrix-multiply     2.42   2.42   2.40
md5                 1.76   1.76   1.80
merge              48.12  48.12  48.31
mlyacc              8.65   8.65   8.72
mpuz                4.26   4.26   4.34
nucleic             8.00   8.00   8.00
peek                0.82   0.92   0.82
psdes-random        2.78   2.78   3.14
ratio-regions       8.13   8.12   8.19
ray                 3.36   3.34   3.26
raytrace            4.86   4.86   4.90
simple              5.84   5.84   5.98
smith-normal-form   0.67   0.67   0.67
tailfib            10.96  10.96  10.95
tak                 7.74   7.74   7.74
tsp                 7.51   7.51   7.52
tyan               16.04  16.08  16.06
vector-concat       2.56   2.56   3.16
vector-rev          4.27   4.26   4.30
vliw                5.68   5.68   5.65
wc-input1           1.92   1.92   1.67
wc-scanStream       1.96   1.96   2.20
zebra               1.77   1.76   1.68
zern               32.07  32.10  32.01
run time ratio
benchmark         MLton1 MLton2
barnes-hut          1.00   1.01
checksum            1.00   1.04
count-graphs        1.00   1.06
DLXSimulator        1.00   1.00
fft                 1.00   1.01
fib                 1.00   1.00
hamlet              0.99   1.00
imp-for             0.90   0.90
knuth-bendix        1.00   0.98
lexgen              1.00   1.00
life                1.01   0.96
logic               1.00   1.00
mandelbrot          1.00   1.00
matrix-multiply     1.00   0.99
md5                 1.00   1.02
merge               1.00   1.00
mlyacc              1.00   1.01
mpuz                1.00   1.02
nucleic             1.00   1.00
peek                1.12   1.00
psdes-random        1.00   1.13
ratio-regions       1.00   1.01
ray                 1.00   0.97
raytrace            1.00   1.01
simple              1.00   1.02
smith-normal-form   1.00   1.00
tailfib             1.00   1.00
tak                 1.00   1.00
tsp                 1.00   1.00
tyan                1.00   1.00
vector-concat       1.00   1.23
vector-rev          1.00   1.01
vliw                1.00   0.99
wc-input1           1.00   0.87
wc-scanStream       1.00   1.12
zebra               1.00   0.95
zern                1.00   1.00
size
benchmark            MLton0    MLton1    MLton2
barnes-hut           57,275    57,499    55,195
checksum             23,537    23,569    23,505
count-graphs         45,009    45,073    44,081
DLXSimulator         88,569    88,697    87,193
fft                  33,569    33,601    33,153
fib                  23,569    23,569    23,505
hamlet            1,101,560 1,103,640 1,098,744
imp-for              23,569    23,569    23,505
knuth-bendix         64,994    65,122    63,586
lexgen              149,569   149,825   146,497
life                 40,273    40,273    39,217
logic                80,657    80,657    80,049
mandelbrot           23,633    23,633    23,601
matrix-multiply      24,113    24,145    24,113
md5                  33,218    33,346    32,930
merge                24,785    24,817    24,593
mlyacc              464,577   465,697   458,369
mpuz                 28,145    28,145    27,953
nucleic              62,545    62,545    61,809
peek                 32,194    32,290    31,778
psdes-random         25,009    25,041    24,977
ratio-regions        43,281    43,313    43,153
ray                  84,312    84,632    81,688
raytrace            237,349   237,669   235,173
simple              180,537   180,825   178,233
smith-normal-form   138,667   138,731   136,363
tailfib              23,281    23,281    23,217
tak                  23,697    23,697    23,633
tensor               56,970    57,002    56,074
tsp                  38,594    38,690    38,466
tyan                 85,666    85,954    83,266
vector-concat        24,497    24,497    24,369
vector-rev           24,465    24,497    24,337
vliw                295,665   296,945   286,769
wc-input1            48,666    48,730    47,386
wc-scanStream        49,370    49,434    48,090
zebra               110,178   110,242   106,242
zern                 31,168    31,232    30,720

tensor is raising a runtime exception (so it has no run time entries
above); is it a known problem?

mlton-stable is code from yesterday.  mlton is the checked in code (i.e.,
with the %esp add hack removed).  mlton -native-reserve-esp true is the
checked in code with reserveEsp forced to true in the codegen.

Results are mixed.  Reserving esp can either hurt or help; it hurts purely
through register pressure.  It can help when there are a lot of C calls
(wc-input1) and the cost of fetching c_stackP is the bottleneck.
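
For reading the run time ratio table: the ratios appear to be each compiler's run time divided by the MLton0 run time, rounded to two places.  A quick check under that assumption, using three rows copied from the run time table (the helper name ratios is made up for this sketch):

```python
# (MLton0, MLton1, MLton2) run times copied from the run time table.
run_time = {
    "peek": (0.82, 0.92, 0.82),
    "vector-concat": (2.56, 2.56, 3.16),
    "wc-input1": (1.92, 1.92, 1.67),
}


def ratios(times):
    """Divide each later column by the first (MLton0) and round to 2 places."""
    base = times[0]
    return tuple(round(t / base, 2) for t in times[1:])


for name, times in run_time.items():
    print(name, ratios(times))
```

A ratio above 1.00 means slower than mlton-stable and below 1.00 means faster, which matches the rows for these three benchmarks in the ratio table.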

The esp hack seems not to have had much effect, except on peek.  Nothing
obvious is going on there; the assembly for mlton-stable and mlton is
identical except for 49 addl instructions that are dropped in
mlton-stable.