new liveness pass

Stephen Weeks sweeks@intertrust.com
Fri, 1 Dec 2000 17:06:46 -0800 (PST)


Most excellent.  The new liveness pass that I wrote sped up allocate registers
by quite a bit.  For a self compile, it cut down the time from 84 seconds to 11
seconds.  The pass is based on the liveness algorithm described in section 4.13,
page 132, of Morgan's "Building and Optimizing Compiler".  BTW, the Dragon book
and Muchnick's book provided no help at all on speeding up liveness.  They
suggest using bit-vectors, which is infeasible for MLton due to the large size
of CPS functions.

Anyways, here is a description of the new algorithm.

Walk over the whole program and
 1. Build the predecessor graph of basic blocks.  Each basic block records the
    set of its predecessors and the set of variables live at the beginning of
    the block.
 2. For each variable record the block in which is defined and the list of
    blocks where it is used.

Now, for each variable, propagate the liveness information backwards from uses
along basic blocks until the definition block is reached.

That's it.  The reason why it's so fast is that it processes one variable at a
time, and hence the operation to determine if that variable is in the live list
for a particular block is constant time -- the variable is either at the head of 
the list or it's not there.

Matthew, maybe you can use some variant of this algorithm, if we don't figure
out how to avoid doing liveness twice altogether.

Also, here is the log for a self compile with the new pass.  The self compile
time is down to 765 seconds.  This is the first time we're under 800!  The funny 
thing is, the last self compile was at 948 seconds, so the difference can not be 
explained by this pass alone.  The (unchanged) x86 codegen sped up by 59 seconds!
My only conjecture is that the backend is leaving around some property lists on
labels, which is slowing down the codegen.  The new liveness pass doesn't
leave quite as many around, so maybe that was helping.  I'll look into making
sure the backend cleans up after itself.

% cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 3
cpu MHz		: 731.480093
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
sep_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips	: 729.09

Compiling mlton (takes a while)
time mlton -v -no-polyvariance mlton.cm
MLton 20001130 (built Fri Dec  1 15:47:44 2000 on starlinux.epr.com)
  created this file on Fri Dec  1 15:49:53 2000.
Do not edit this file.
Flag settings: 
   aux: false
   chunk: coalesce 2000
   contify strategy: Both
   defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE]
   fixed heap: None
   indentation: 3
   includes: [mlton.h]
   inline: NonRecursive {product = 320,small = 60}
   input file: mlton.cm
   instrument: false
   instrument Sxml: false
   keep Cps: false
   match: left to right
   messages: true
   native: true
   native-commented: 0
   native-copy-prop: true
   native-ieee-fp: false
   native-move-hoist: true
   native-optimize: 1
   native-split: Some(100000)
   polyvariance: None
   print at fun entry: false
   profile: false
   show types: false
compile starting
   parse and elaborate starting
   parse and elaborate finished in 54.500
   core-ml size is 89,907,652 bytes
   numPeeks = 14
   average position in property list = 0.0
   numPeeks = 2450808
   average position in bucket = 0.221
   dead starting
   dead finished in 0.090
   basis size is 867,352 bytes
   numPeeks = 72743
   average position in property list = 0.0
   numPeeks = 2450808
   average position in bucket = 0.221
   size = 187411
   gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileuOZ0Ia /tmp/fileuiNTnF.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
   /tmp/fileuOZ0Ia /tmp/fileeaXAYV
   infer starting
      unification starting
      unification finished in 1.570
      finish infer starting
      finish infer finished in 20.950
   infer finished in 23.290
   xml.unsimplified size is 37,952,500 bytes
   numPeeks = 1077855
   average position in property list = 0.000
   numPeeks = 2589647
   average position in bucket = 0.271
   infer simplify starting
   infer simplify finished in 3.510
   xml size is 22,331,948 bytes
   numPeeks = 4570806
   average position in property list = 0.546
   numPeeks = 2589647
   average position in bucket = 0.271
   size = 122292
   num types in program = 20501
   num types in table = 35249
   hash table size is 0 bytes
   mono starting
   mono finished in 5.540
   mono.unsimplified size is 42,350,988 bytes
   numPeeks = 9074433
   average position in property list = 0.275
   numPeeks = 3306823
   average position in bucket = 0.601
   mono simplify starting
   mono simplify finished in 5.740
   mono size is 34,543,836 bytes
   numPeeks = 13789800
   average position in property list = 0.428
   numPeeks = 3306823
   average position in bucket = 0.601
   size = 191119
   num types in program = 12946
   num types in table = 64373
   hash table size is 0 bytes
   implement exceptions starting
   implement exceptions finished in 0.320
   sxml.unsimplified size is 35,101,760 bytes
   numPeeks = 14086561
   average position in property list = 0.419
   numPeeks = 3309224
   average position in bucket = 0.602
   implement exceptions simplify starting
   implement exceptions simplify finished in 2.970
   sxml size is 33,016,448 bytes
   numPeeks = 17578537
   average position in property list = 0.470
   numPeeks = 3309224
   average position in bucket = 0.602
   polyvariance starting
   polyvariance finished in 0.0
   sxml.poly size is 33,016,448 bytes
   numPeeks = 17578537
   average position in property list = 0.470
   numPeeks = 3309224
   average position in bucket = 0.602
   size = 178729
   num types in program = 12513
   num types in table = 64653
   hash table size is 0 bytes
   closure convert starting
      flow analysis starting
      flow analysis finished in 0.760
      flow size is 4,252 bytes
      numPeeks = 18621855
      average position in property list = 0.443
      numPeeks = 3325768
      average position in bucket = 0.605
      free variables starting
      free variables finished in 0.380
      globalize starting
      globalize finished in 0.290
      convert starting
      convert finished in 32.310
   closure convert finished in 34.180
   cps.unsimplified size is 69,048,712 bytes
   numPeeks = 25297174
   average position in property list = 1.679
   numPeeks = 3685461
   average position in bucket = 0.672
   closure convert simplify starting
      simplify starting
	 num functions 12069
	 num local functions 138606
	 num primExps 156489
	 numPeeks = 25548537
	 average position in property list = 1.662
	 numPeeks = 3685461
	 average position in bucket = 0.672
	 remove-unused starting
	 remove-unused finished in 4.440
	 num functions 10422
	 num local functions 80482
	 num primExps 139488
	 numPeeks = 28693549
	 average position in property list = 1.483
	 numPeeks = 3685786
	 average position in bucket = 0.672
	 leaf-inline starting
	    inline starting
	    inline finished in 4.020
	 leaf-inline finished in 4.020
	 num functions 7864
	 num local functions 57906
	 num primExps 138322
	 numPeeks = 30304546
	 average position in property list = 1.411
	 numPeeks = 3685786
	 average position in bucket = 0.672
	 raise-to-jump starting
	    inferHandlers starting
	    inferHandlers finished in 0.160
	 raise-to-jump finished in 4.650
	 num functions 7864
	 num local functions 57524
	 num primExps 138295
	 numPeeks = 31724934
	 average position in property list = 1.357
	 numPeeks = 3685786
	 average position in bucket = 0.672
	 contify starting
	 contify finished in 3.110
	 num functions 3668
	 num local functions 54020
	 num primExps 130444
	 numPeeks = 32978353
	 average position in property list = 1.312
	 numPeeks = 3689431
	 average position in bucket = 0.674
	 constant propagation starting
	    inferHandlers starting
	    inferHandlers finished in 0.140
	    fixed point starting
	    fixed point finished in 3.620
	 constant propagation finished in 7.940
	 num functions 3668
	 num local functions 53397
	 num primExps 95704
	 numPeeks = 35346094
	 average position in property list = 1.244
	 numPeeks = 3823941
	 average position in bucket = 0.823
	 useless starting
	    analyze starting
	    analyze finished in 5.060
	 useless finished in 9.970
	 num functions 3668
	 num local functions 50915
	 num primExps 87502
	 numPeeks = 37873441
	 average position in property list = 1.186
	 numPeeks = 3961408
	 average position in bucket = 0.838
	 remove-unused starting
	 remove-unused finished in 2.550
	 num functions 3606
	 num local functions 49804
	 num primExps 85347
	 numPeeks = 39588129
	 average position in property list = 1.136
	 numPeeks = 3961500
	 average position in bucket = 0.838
	 simplify-types starting
	    fixed point starting
	    fixed point finished in 0.060
	 simplify-types finished in 2.950
	 num functions 3606
	 num local functions 41655
	 num primExps 82150
	 numPeeks = 41543485
	 average position in property list = 1.095
	 numPeeks = 3966951
	 average position in bucket = 0.838
	 poly-equal starting
	 poly-equal finished in 0.170
	 num functions 3618
	 num local functions 42299
	 num primExps 82663
	 numPeeks = 41667937
	 average position in property list = 1.091
	 numPeeks = 3967620
	 average position in bucket = 0.838
	 contify starting
	 contify finished in 2.140
	 num functions 3520
	 num local functions 42285
	 num primExps 82537
	 numPeeks = 42529380
	 average position in property list = 1.072
	 numPeeks = 3967718
	 average position in bucket = 0.838
	 inline starting
	 inline finished in 4.390
	 num functions 978
	 num local functions 67568
	 num primExps 134281
	 numPeeks = 44670916
	 average position in property list = 1.026
	 numPeeks = 3967718
	 average position in bucket = 0.838
	 remove-unused starting
	 remove-unused finished in 0.860
	 num functions 978
	 num local functions 64888
	 num primExps 133203
	 numPeeks = 47282825
	 average position in property list = 0.971
	 numPeeks = 3967731
	 average position in bucket = 0.838
	 raise-to-jump starting
	    inferHandlers starting
	    inferHandlers finished in 0.170
	 raise-to-jump finished in 3.440
	 num functions 978
	 num local functions 64832
	 num primExps 133178
	 numPeeks = 48835397
	 average position in property list = 0.944
	 numPeeks = 3967731
	 average position in bucket = 0.838
	 contify starting
	 contify finished in 3.080
	 num functions 977
	 num local functions 64830
	 num primExps 133176
	 numPeeks = 50122736
	 average position in property list = 0.923
	 numPeeks = 3967732
	 average position in bucket = 0.838
	 introduce-loops starting
	 introduce-loops finished in 0.050
	 num functions 977
	 num local functions 64854
	 num primExps 133176
	 numPeeks = 50297035
	 average position in property list = 0.920
	 numPeeks = 3967756
	 average position in bucket = 0.838
	 loop-invariant starting
	 loop-invariant finished in 3.010
	 num functions 977
	 num local functions 62335
	 num primExps 126505
	 numPeeks = 51676987
	 average position in property list = 0.899
	 numPeeks = 3967756
	 average position in bucket = 0.838
	 flatten starting
	    analyze starting
	    analyze finished in 0.150
	 flatten finished in 3.700
	 num functions 977
	 num local functions 62413
	 num primExps 84082
	 numPeeks = 53677400
	 average position in property list = 0.871
	 numPeeks = 3970447
	 average position in bucket = 0.838
	 redundant starting
	 redundant finished in 0.770
	 num functions 977
	 num local functions 62413
	 num primExps 84082
	 numPeeks = 54394802
	 average position in property list = 0.861
	 numPeeks = 3970447
	 average position in bucket = 0.838
	 remove-unused starting
	 remove-unused finished in 2.900
	 num functions 977
	 num local functions 62119
	 num primExps 82433
	 numPeeks = 56609014
	 average position in property list = 0.829
	 numPeeks = 3970460
	 average position in bucket = 0.838
      simplify finished in 74.090
   closure convert simplify finished in 74.090
   cps size is 49,689,064 bytes
   numPeeks = 56609014
   average position in property list = 0.829
   numPeeks = 3970460
   average position in bucket = 0.838
   backend starting
      compute representations starting
      compute representations finished in 0.030
      inferHandlers starting
      inferHandlers finished in 0.140
      chunkify starting
      chunkify finished in 2.810
      allocate registers starting
      allocate registers finished in 10.570
      reg size is 4,420 bytes
      numPeeks = 67004509
      average position in property list = 0.916
      numPeeks = 3970460
      average position in bucket = 0.838
   backend finished in 15.630
    size is 59,136,316 bytes
   numPeeks = 68683322
   average position in property list = 0.923
   numPeeks = 3986796
   average position in bucket = 0.838
   x86 code gen starting
      outputC starting
      outputC finished in 0.440
      outputAssembly starting
	 translateChunk totals 18.250
	 simplify totals 103.800
	    verifyLiveInfo totals 31.100
	    computeJumpInfo totals 2.680
	    elimGoto totals 13.200
	       elimIff: 2
	       elimSwitch: 17
	       elimSimpleGoto totals 2.660
	       elimComplexGoto totals 3.980
	    verifyJumpInfo totals 1.350
	    peepholeBlock_pre totals 2.030
	       commuteBinALMD: 476
	       elimBinAL0L: 0
	       elimBinAL0R: 0
	       elimAddSub1: 1787
	       elimMDPow2: 183
	    toLivenessBlock totals 9.820
	    moveHoist totals 14.080
	    peepholeLivenessBlock totals 9.840
	       elimALCopy: 17039
	       elimFltACopy: 20
	       elimDeadDsts: 92
	       elimSelfMove: 1076
	       elimFltSelfMove: 0
	       commuteBinALMD: 1059
	       commuteFltBinA: 17
	       conditionalJump: 3249
	    copyPropagate totals 5.190
	    peepholeLivenessBlock_minor totals 4.920
	       elimDeadDsts_minor: 0
	       elimSelfMove_minor: 0
	       elimFltSelfMove_minor: 0
	    verifyLivenessBlock totals 0.0
	    toBlock totals 0.530
	    peepholeBlock_post totals 4.930
	       elimBinALMDouble: 33
	       elimFltBinADouble: 0
	       elimCMPTST: 0
	    generateTransfers totals 2.290
	 allocateRegisters totals 322.390
	    toLiveness totals 133.150
	    toNoLiveness totals 0.010
	    Assembly.allocateRegisters totals 188.560
	       Instruction.allocateRegisters totals 98.020
		  pre totals 22.920
		  post totals 34.130
		  allocateOperand totals 24.600
		  allocateFltOperand totals 0.0
		  allocateFltStackOperands totals 0.0
	       Directive.allocateRegisters totals 13.560
	 validate totals 0.0
      outputAssembly finished in 452.100
   x86 code gen finished in 496.480
   numPeeks = 73619953
   average position in property list = 0.925
   numPeeks = 4048821
   average position in bucket = 0.841
compile finished in 737.110
gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebOiKTi.s /tmp/fileW6a8tX.c
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileVuChSy.o /tmp/filebOiKTi.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filesavxTO.o /tmp/filemC95zY.9.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexn25xG.o /tmp/fileBXQlhQ.8.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUFuxn.o /tmp/fileDYOgyZ.7.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexEGwPJ.o /tmp/fileRJxRHU.6.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWxcryO.o /tmp/fileDhBbVB.5.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileOq5dTS.o /tmp/filey4L0Da.4.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebdLNzJ.o /tmp/filelifDPe.3.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileABQjvN.o /tmp/fileYyPBMd.2.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filer5RlPz.o /tmp/filemVttax.1.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileIn50AH.o /tmp/file4EVnXu.0.s
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileVuChSy.o /tmp/fileIn50AH.o /tmp/filer5RlPz.o /tmp/fileABQjvN.o /tmp/filebdLNzJ.o /tmp/fileOq5dTS.o /tmp/fileWxcryO.o /tmp/filexEGwPJ.o /tmp/fileNUFuxn.o /tmp/filexn25xG.o /tmp/filesavxTO.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
size mlton
757.02user 7.94system 12:47.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19269major+397606minor)pagefaults 0swaps
   text	   data	    bss	    dec	    hex	filename
3703961	 646120	  23272	4373353	 42bb69	mlton