[MLton] Question on profile.fun

Matthew Fluet fluet@cs.cornell.edu
Mon, 6 Jun 2005 09:37:15 -0400 (EDT)


> > So, this seems to suggest that the slowdown due to missed SSA 
> > optimizations is fairly low, though it is the cause of the insane behavior 
> > of wc-scanStream.  Knowing that, it is probably worth adding to a TODO to 
> > investigate.
> 
> A quick investigation turned up this surprising result.  I wrote a little
> SSA pass to drop profiling expressions from an SSA IL program (and set
> Control.profile to ProfileNone) and inserted it after every pass in the 
> SSA optimization sequence.
> 
>           dropProfileQ
>           localRef
>           dropProfileR
>           flatten
>           dropProfileS
>           localFlatten3
>           dropProfileT
>           commonArg
>           dropProfileU
>           commonSubexp
> 
> I don't know if this actually lays the blame squarely at 
> the feet of Flatten.flatten (or possibly Shrink.shrinkFunction).  
> Flatten.flatten doesn't appear to be sensitive to the presence of 
> profiling statements in the program.  But, there is something definitely 
> going on there.

Fortunately (or unfortunately, since it sheds no more light on what's going
on), wc-scanStream is truly an outlier: no other benchmark shows a
shift at flatten (maybe tsp, but I suspect that's just noise; tyan, vliw,
and zern show just as much variance in the other direction).

MLton0 -- mlton -profile no
MLton1 -- mlton -profile drop -drop-pass 'dropProfile[A-Q]'
MLton2 -- mlton -profile drop -drop-pass 'dropProfile[A-R]'
MLton3 -- mlton -profile drop -drop-pass 'dropProfile[A-S]'
MLton4 -- mlton -profile drop -drop-pass 'dropProfile[A-T]'

run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4
barnes-hut          1.00   1.04   1.04   1.04   1.04
boyer               1.00   1.03   1.02   0.84   0.84
checksum            1.00   1.00   1.00   1.00   1.00
count-graphs        1.00   1.00   1.00   1.00   1.03
DLXSimulator        1.00   0.99   1.00   1.27   1.00
fft                 1.00   1.00   1.00   0.99   1.00
fib                 1.00   1.10   1.10   1.10   1.10
flat-array          1.00   1.00   1.00   1.00   1.00
hamlet              1.00   1.02   1.04   1.03   1.03
imp-for             1.00   1.00   1.00   1.00   1.00
knuth-bendix        1.00   1.10   1.10   1.10   1.10
lexgen              1.00   0.97   0.96   0.96   0.96
life                1.00   1.03   1.03   1.03   1.03
logic               1.00   1.00   1.00   1.00   1.00
mandelbrot          1.00   1.04   1.04   1.04   1.04
matrix-multiply     1.00   1.00   1.00   1.00   1.00
md5                 1.00   1.00   1.00   1.00   1.00
merge               1.00   1.00   1.00   1.00   1.00
mlyacc              1.00   1.01   1.01   1.01   1.01
model-elimination   1.00   1.00   1.00   1.00   1.01
mpuz                1.00   1.02   1.02   1.02   1.02
nucleic             1.00   0.98   0.98   0.98   0.98
output1             1.00   0.94   0.94   0.94   0.94
peek                1.00   1.00   1.00   1.00   1.00
psdes-random        1.00   1.10   1.10   1.10   1.10
ratio-regions       1.00   1.05   1.04   1.05   1.04
ray                 1.00   1.05   1.05   1.05   1.05
raytrace            1.00   1.03   1.02   1.03   1.02
simple              1.00   1.00   1.00   1.00   1.00
smith-normal-form   1.00   1.00   1.00   1.00   1.00
tailfib             1.00   1.00   1.00   1.00   1.00
tak                 1.00   1.31   1.31   1.31   1.31
tensor              1.00   1.00   1.00   1.04   1.01
tsp                 1.00   1.00   1.07   1.09   1.09
tyan                1.00   1.06   0.99   1.05   1.03
vector-concat       1.00   1.00   0.99   0.99   0.97
vector-rev          1.00   1.00   1.00   1.02   1.00
vliw                1.00   1.06   1.00   1.06   1.04
wc-input1           1.00   1.01   1.00   1.02   1.03
wc-scanStream       1.00   1.40   7.40   7.32   7.30
zebra               1.00   1.05   1.01   1.07   1.10
zern                1.00   1.00   1.02   1.04   0.99
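
To make the "shift at flatten" comparison concrete, here is a small sketch
(not part of the benchmark scripts) that takes the MLton1 and MLton2 ratios
for the five benchmarks mentioned above, copied from the table, and computes
the change between the two columns.  The 0.10 noise cutoff is an assumption
chosen only for illustration, not a measured variance.

```python
# (MLton1, MLton2) run-time ratios copied from the table above.
ratios = {
    "tsp": (1.00, 1.07),
    "tyan": (1.06, 0.99),
    "vliw": (1.06, 1.00),
    "zern": (1.00, 1.02),
    "wc-scanStream": (1.40, 7.40),
}

# Hypothetical cutoff: treat changes of at most 0.10 as run-to-run noise.
NOISE = 0.10


def shift(pair):
    """Change in run-time ratio from the MLton1 column to the MLton2 column."""
    before, after = pair
    return round(after - before, 2)


shifts = {name: shift(pair) for name, pair in ratios.items()}
outliers = [name for name, delta in shifts.items() if abs(delta) > NOISE]

# Print the largest changes first.
for name, delta in sorted(shifts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:<15} {delta:+.2f}")
print("outliers:", outliers)
```

With that cutoff, only wc-scanStream is flagged; tsp moves up by about as much
as tyan and vliw move down.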