local refs

Matthew Fluet <fluet@CS.Cornell.EDU>
Fri, 30 Nov 2001 18:46:14 -0500 (EST)


> > Compile times still disappointing; run times unchanged.
> 
> I don't see anything noticeably bad compile-time wise, and the
> runtimes look on the whole OK.  Code sizes are even a percent or two
> better.  

Yeah, I knew that top level handler stuff was killing us. ;)


Here's how wc-input1 sped up:

[fluet@lennon wc-input1]$ mlprof -d 1 wc-input1.old mlmon.old.out
4.16 seconds of CPU time
main_0                      79.57%
     loop_21         51.66%       
     L_163           10.27%       
     L_164            7.25%       
     L_91             5.74%       
     L_92             5.14%       
     L_44             4.83%       
     L_162            4.53%       
     L_36             2.72%       
     L_47             2.42%       
     L_45             2.42%       
     L_41             2.11%       
     L_37             0.30%       
     loop_10          0.30%       
     L_42             0.30%       
<unknown>                   11.30%
Thread_atomicEnd (C)         6.97%
GC_doGC (C)                  2.16%

  L_165 ()
    x_315 = Ref_ref (global_26)
    x_316 = Ref_ref (global_26)
    x_313 = Ref_ref (global_2)
    x_312 = Ref_ref (global_2)
    x_317 = Array_array (global_25)
    x_309 = (x_317, x_316, x_315, x_314, x_313, x_312)
    x_311 = Ref_deref (openIns_0)
    x_310 = ::_5 (x_311, x_309)
    Ref_assign (openIns_0, x_310)
    x_233 = Ref_ref (x_309)
    loop_21 (global_2)
  loop_21 (x_152)
    x_300 = Ref_deref (x_233)
    x_162 = #6 x_300
    x_160 = #5 x_300
    x_157 = #1 x_300
    x_303 = Ref_deref (x_160)
    x_308 = Ref_deref (x_162)
    x_307 = Int_lt (x_303, x_308)
    case x_307 of
      false => L_161 | true => L_164


[fluet@lennon wc-input1]$ mlprof -d 1 wc-input1.new mlmon.new.out
3.55 seconds of CPU time
main_0                      75.77%
     loop_21         31.23%       
     L_90            16.36%       
     L_159           16.36%       
     L_89             9.29%       
     L_160            7.43%       
     L_42             4.83%       
     L_34             4.09%       
     L_45             2.97%       
     L_43             2.97%       
     L_39             2.23%       
     loop_8           0.74%       
     L_161            0.37%       
     L_37             0.37%       
     loop_9           0.37%       
     L_44             0.37%       
<unknown>                   13.52%
Thread_atomicEnd (C)         9.30%
GC_doGC (C)                  1.41%

  L_161 ()
    x_268 = Ref_ref (global_26)
    x_197 = Ref_ref (global_26)
    x_156 = Ref_ref (global_2)
    x_158 = Ref_ref (global_2)
    x_155 = Array_array (global_25)
    x_296 = (x_155, x_197, x_268, x_220, x_156, x_158)
    x_297 = Ref_deref (openIns_0)
    x_295 = ::_5 (x_297, x_296)
    Ref_assign (openIns_0, x_295)
    loop_21 (global_2)
  loop_21 (x_150)
    x_290 = Ref_deref (x_156)
    x_294 = Ref_deref (x_158)
    x_293 = Int_lt (x_290, x_294)
    case x_293 of
      false => L_157 | true => L_160


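So it wasn't an accumulator ref. The old code wrapped the tuple in a
ref (x_233), so every iteration of loop_21 had to dereference it and
select components #5 and #6 before reading them; the new code reads
those known components (x_156, x_158) directly, which avoids an extra
level of indirection.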
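As a rough illustration of that difference (a sketch only, not MLton
output), the two versions of the loop test can be modeled as below. The
Ref class and the sample values are assumptions made for the example;
the variable names are borrowed from the IL dumps above.

```python
class Ref:
    """A mutable cell, standing in for an SML ref (illustrative only)."""
    def __init__(self, value):
        self.value = value


def old_test(x_233):
    # Old code: the tuple sits behind a ref, so each iteration
    # dereferences it and selects components before reading them.
    x_300 = x_233.value               # Ref_deref (x_233)
    x_162 = x_300[5]                  # #6 x_300
    x_160 = x_300[4]                  # #5 x_300
    return x_160.value < x_162.value  # Int_lt of the two derefs


def new_test(x_156, x_158):
    # New code: the components are known, so they are read directly.
    return x_156.value < x_158.value  # Int_lt of the two derefs


# Hypothetical values, chosen only to exercise both versions.
pos, limit = Ref(0), Ref(10)
record = ("buf", Ref(None), Ref(None), "x_220", pos, limit)
assert old_test(Ref(record)) == new_test(pos, limit)
```

Both functions compute the same comparison; the old one just does one
more dereference and two selects per iteration to get there.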



Here's wc-scanStream:

[fluet@lennon wc-scanStream]$ mlprof -d 1 wc-scanStream.old mlmon.old.out 
6.38 seconds of CPU time
main_0                      81.19%
     loop_40         25.48%       
     input1_0        24.90%       
     L_182           21.04%       
     L_186            5.60%       
     L_185            4.83%       
     L_184            4.63%       
     L_183            3.47%       
     L_44             2.70%       
     L_45             1.54%       
     L_36             1.35%       
     L_47             1.16%       
     L_42             0.97%       
     L_41             0.77%       
     L_46             0.58%       
     L_178            0.19%       
     loop_9           0.19%       
     loop_8           0.19%       
     L_43             0.19%       
     L_165            0.19%       
<unknown>                   12.54%
Thread_atomicEnd (C)         5.33%
GC_doGC (C)                  0.94%

  loop_40 (x_245, x_349, x_348, x_347, x_346, x_345)
    input1_0 (x_349, x_348, x_347, x_346, x_345)
  input1_0 (x_342, x_336, x_305, x_306, x_304)
    x_344 = Array_length (x_336)
    x_343 = Int_geu (x_342, x_344)
    case x_343 of
      false => L_186 | true => L_187
  L_182 ()
    loop_40 (x_245, x_337, x_336, x_305, x_306, x_304)

[fluet@lennon wc-scanStream]$ nm wc-scanStream.old | grep loop_40
0804d44a t MLtonProfile916$$0.main_0$$1.loop_40$$2.loop_40$$Begin
0804d44a t loop_40
[fluet@lennon wc-scanStream]$ nm wc-scanStream.old | grep input1_0
0804d486 t MLtonProfile917$$0.main_0$$1.input1_0$$2.input1_0$$Begin
0804d486 t input1_0



[fluet@lennon wc-scanStream]$ mlprof -d 1 wc-scanStream.new mlmon.new.out 
7.23 seconds of CPU time
main_0                      81.47%
     loop_37         32.94%       
     L_165           22.75%       
     input1_0        22.41%       
     L_167            3.90%       
     L_168            3.57%       
     L_166            3.06%       
     L_169            2.72%       
     L_42             2.55%       
     L_34             1.53%       
     L_45             1.53%       
     L_39             1.36%       
     L_43             1.19%       
     L_36             0.17%       
     loop_9           0.17%       
     L_44             0.17%       
<unknown>                   13.00%
Thread_atomicEnd (C)         4.70%
GC_doGC (C)                  0.69%
GC_gc (C)                    0.14%

  loop_37 (x_248, x_317, x_316, x_315, x_314, x_313)
    input1_0 (x_317, x_316, x_315, x_314, x_313)
  input1_0 (x_310, x_304, x_273, x_274, x_272)
    x_312 = Array_length (x_304)
    x_311 = Int_geu (x_310, x_312)
    case x_311 of
      false => L_169 | true => L_170
  L_165 ()
    loop_37 (x_248, x_305, x_304, x_273, x_274, x_272)

[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep loop_37
0804d104 t MLtonProfile893$$0.main_0$$1.loop_37$$2.loop_37$$Begin
0804d104 t loop_37
[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep input_1
[fluet@lennon wc-scanStream]$ nm wc-scanStream.new | grep input1_0
0804d140 t MLtonProfile894$$0.main_0$$1.input1_0$$2.input1_0$$Begin
0804d140 t input1_0


No idea here: new is slower (7.23 vs. 6.38 seconds), yet the two look
pretty much the same, and new even looks like it has better loop
alignment.
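The alignment remark can be checked from the nm addresses above. The
following is a small sketch; the helper name is made up for the
example, and the addresses are copied from the nm output.

```python
def alignment(addr: int) -> int:
    """Largest power of two dividing addr (addr must be nonzero)."""
    return addr & -addr


# Addresses copied from the nm output for the old and new binaries.
symbols = {
    "old loop_40": 0x0804d44a,
    "old input1_0": 0x0804d486,
    "new loop_37": 0x0804d104,
    "new input1_0": 0x0804d140,
}

for name, addr in symbols.items():
    print(f"{name:13s} {addr:#010x}  aligned to {alignment(addr)} bytes")
```

By this measure the old labels are only 2-byte aligned, while the new
loop_37 is 4-byte aligned and the new input1_0 is 64-byte aligned, so
alignment does not explain the slowdown.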


I'm going to go ahead and check in what I have.