x86 update

Stephen Weeks MLton@sourcelight.com
Tue, 12 Dec 2000 10:04:05 -0800 (PST)


> Here's my latest version of the x86 backend.  Changelog looks like:

Integration was successful.  I put the latest snapshot at
	http://www.star-lab.com/sweeks/mlton.tgz

> - improved verifyLiveInfo pass;
>   it's about 5X - 8X faster than before;

On the self compile, this sped up from about 36s (on 12/7) to 23s.

> - eliminated code related to supporting MachineOutput.Operand.Void variant

I actually managed to fix the backend so that Void is no longer in the datatype.

> - modified the signatures for outputC and outputS (file generators)

I went ahead and modified them a bit more -- I made the results record types.

> Steve, I think you should be able to connect the backend's Switch support
> to your revised MachineOutput Cases with a minimum of difficulty.
> Hopefully, you should just need to uncomment the appropriate lines in
> x86-translate.fun; look for 
> 	      (* differentiate between the types of MachineOutput.cases *)
> for the appropriate lines.

That went smoothly as well.

Here are the changes to the x86 stuff in the current snapshot.

* List.flattenX has been renamed to List.concatX for consistency.  All calls
	should be changed accordingly.
* x86-codegen.{sig,fun}
	record type for outputC and outputS
* x86-translate
	took out Void operands
	took out Operand.toString (it now appears in MACHINE_OUTPUT)
* x86-allocate-registers
	Gave one example of using folds instead of maps and flattens to save
	allocation.  Look for (* added by sweeks *).
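
To illustrate the kind of change made in x86-allocate-registers, here is a
minimal sketch of replacing a map followed by a flatten with one fold.  It uses
the Standard ML Basis Library's List.foldr rather than the MLton library's own
List.fold, and the function expand and the sample data are hypothetical, not
taken from the backend.

```sml
(* Hypothetical per-item expansion: each input yields a short list of outputs. *)
fun expand (x : int) : int list = [x, x * 10]

val xs = [1, 2, 3]

(* Map-then-flatten: allocates the list of lists, then copies it into the result. *)
val viaMapConcat = List.concat (List.map expand xs)

(* Single right fold: each expansion is prepended straight onto the accumulator,
   so no intermediate list of lists is built. *)
val viaFold = List.foldr (fn (x, acc) => expand x @ acc) [] xs
```

Both bindings evaluate to the same list; the fold version just skips the
intermediate structure.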

Matthew, I noticed a lot of uses of List.concat and List.map in your code, and a
lot of other opportunities for "deforestation" or more efficient uses of list
operations.  In general, List.fold should be used where possible, since it makes
a single pass over the list and avoids building intermediate lists.  For
example, in x86-peephole, there is a call
	(List.concat l) @ l'
which could be more efficiently implemented as
	List.fold (rev l, l', op @)
There are lots of other examples.  It should be possible to cut down allocation
quite a bit in the backend by going through the code with an eye for such
things.
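
A small self-contained check of the x86-peephole rewrite above, as a sketch.
The helper fold is a hypothetical stand-in that assumes the MLton library's
List.fold is a left fold over a (list, base, function) triple; the sample lists
are made up.

```sml
(* Hypothetical stand-in for the MLton library's List.fold, assumed to be a
   left fold taking a (list, base, function) triple. *)
fun fold (l, b, f) = List.foldl f b l

val l  = [[1, 2], [3], [4, 5]]
val l' = [6, 7]

(* Original form: List.concat builds an intermediate list that @ copies again. *)
val viaConcat = (List.concat l) @ l'

(* Suggested form: each inner list is copied once, directly onto the accumulator.
   The rev is needed so that the last inner list is appended to l' first. *)
val viaFold = fold (rev l, l', op @)

(* Same idea with the Basis Library's right fold, which needs no rev. *)
val viaFoldr = List.foldr (op @) l' l
```

All three bindings produce the same list.  Dropping the rev from the left-fold
version would reverse the order of the inner lists in the result.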

In any case, here is the latest self compile log.  Overall, things seem to have
slowed down a bit, but nothing insane.  I should also point out that we should
occasionally run self compiles without -v, because they will run quite a bit
faster (the calls to MLton.size will be avoided).  For example, on the current
snapshot with -v the self compile time is 838s, but without -v it is 761s.
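
For scale, the difference between the two timings works out as follows.  This
is just arithmetic on the two numbers quoted above; the binding names are
illustrative.

```sml
val withV    = 838.0   (* seconds, self compile with -v *)
val withoutV = 761.0   (* seconds, self compile without -v *)

(* Time saved by dropping -v, in seconds and as a share of the -v run. *)
val saved   = withV - withoutV
val percent = 100.0 * saved / withV

val () = print (Real.fmt (StringCvt.FIX (SOME 1)) percent ^ "% of the -v run\n")
```

That is 77s, a bit over 9% of the run with -v.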

--------------------------------------------------------------------------------

time mlton -v -no-polyvariance mlton.cm
MLton internal (built Tue Dec 12 09:17:44 2000 on starlinux.epr.com)
  created this file on Tue Dec 12 09:19:55 2000.
Do not edit this file.
Flag settings: 
   aux: false
   chunk: chunk per function
   contify strategy: Both
   defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE]
   fixed heap: None
   indentation: 3
   includes: [mlton.h]
   inline: NonRecursive {product = 320,small = 60}
   input file: mlton.cm
   instrument: false
   instrument Sxml: false
   keep Cps: false
   match: left to right
   messages: true
   native: true
   native-commented: 0
   native-copy-prop: true
   future: 64
   native-ieee-fp: false
   native-move-hoist: true
   native-optimize: 1
   native-split: Some(100000)
   polyvariance: None
   print at fun entry: false
   profile: false
   show types: false
compile starting
   parse and elaborate starting
   parse and elaborate finished in 62.920
   core-ml size is 89,950,008 bytes
   numPeeks = 14
   average position in property list = 0.0
   numPeeks = 2441584
   average position in bucket = 0.177
   lexAndParse totals 11.400
   elaborate totals 51.480
   dead starting
   dead finished in 0.090
   basis size is 825,068 bytes
   numPeeks = 73995
   average position in property list = 0.0
   numPeeks = 2441584
   average position in bucket = 0.177
   size = 189848
   gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileiZ0ADv /tmp/filetN3NUk.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
   /tmp/fileiZ0ADv /tmp/fileleCdKY
   infer starting
      unification starting
      unification finished in 2.580
      finish infer starting
      finish infer finished in 19.460
   infer finished in 22.380
   xml.unsimplified size is 36,842,408 bytes
   numPeeks = 1107165
   average position in property list = 0.000
   numPeeks = 2582402
   average position in bucket = 0.231
   infer simplify starting
   infer simplify finished in 2.980
   xml size is 20,490,044 bytes
   numPeeks = 3350460
   average position in property list = 0.100
   numPeeks = 2582402
   average position in bucket = 0.231
   size = 122793
   num types in program = 21354
   num types in table = 36561
   hash table size is 0 bytes
   mono starting
   mono finished in 6.810
   mono.unsimplified size is 43,832,764 bytes
   numPeeks = 8047380
   average position in property list = 0.042
   numPeeks = 3329919
   average position in bucket = 0.635
   mono simplify starting
   mono simplify finished in 3.790
   mono size is 35,129,612 bytes
   numPeeks = 11257104
   average position in property list = 0.079
   numPeeks = 3329919
   average position in bucket = 0.635
   size = 198308
   num types in program = 13484
   num types in table = 67408
   hash table size is 0 bytes
   implement exceptions starting
   implement exceptions finished in 0.330
   sxml.unsimplified size is 35,692,220 bytes
   numPeeks = 11564742
   average position in property list = 0.077
   numPeeks = 3331607
   average position in bucket = 0.635
   implement exceptions simplify starting
   implement exceptions simplify finished in 4.290
   sxml size is 33,512,940 bytes
   numPeeks = 14032337
   average position in property list = 0.088
   numPeeks = 3331607
   average position in bucket = 0.635
   polyvariance starting
   polyvariance finished in 0.0
   sxml.poly size is 33,512,940 bytes
   numPeeks = 14032337
   average position in property list = 0.088
   numPeeks = 3331607
   average position in bucket = 0.635
   size = 184786
   num types in program = 13043
   num types in table = 67690
   hash table size is 0 bytes
   closure convert starting
      flow analysis starting
      flow analysis finished in 2.630
      flow size is 4,608 bytes
      numPeeks = 15112408
      average position in property list = 0.082
      numPeeks = 3348816
      average position in bucket = 0.638
      free variables starting
      free variables finished in 0.370
      globalize starting
      globalize finished in 0.330
      convert starting
      convert finished in 28.630
   closure convert finished in 32.450
   cps.unsimplified size is 69,407,292 bytes
   numPeeks = 22040929
   average position in property list = 1.730
   numPeeks = 3395663
   average position in bucket = 0.637
   closure convert simplify starting
      simplify starting
	 num functions 12564
	 num local functions 143841
	 num primExps 161589
	 removeUnused starting
	 removeUnused finished in 3.340
	 num functions 10831
	 num local functions 83706
	 num primExps 143790
	 leaf-inline starting
	    inline starting
	    inline finished in 4.010
	 leaf-inline finished in 4.010
	 num functions 8195
	 num local functions 59130
	 num primExps 141922
	 raise-to-jump starting
	    inferHandlers starting
	    inferHandlers finished in 0.170
	 raise-to-jump finished in 4.630
	 num functions 8195
	 num local functions 58747
	 num primExps 141896
	 contify starting
	 contify finished in 2.760
	 num functions 3742
	 num local functions 54823
	 num primExps 133167
	 constantPropagation starting
	    inferHandlers starting
	    inferHandlers finished in 0.140
	    fixed point starting
	    fixed point finished in 3.100
	 constantPropagation finished in 7.650
	 num functions 3742
	 num local functions 54243
	 num primExps 97735
	 useless starting
	    analyze starting
	    analyze finished in 4.860
	 useless finished in 9.960
	 num functions 3742
	 num local functions 51799
	 num primExps 89011
	 removeUnused starting
	 removeUnused finished in 0.610
	 num functions 3672
	 num local functions 50505
	 num primExps 86692
	 simplifyTypes starting
	    fixed point starting
	    fixed point finished in 0.050
	 simplifyTypes finished in 2.950
	 num functions 3672
	 num local functions 42038
	 num primExps 83443
	 poly-equal starting
	 poly-equal finished in 0.160
	 num functions 3684
	 num local functions 42675
	 num primExps 83942
	 contify starting
	 contify finished in 2.160
	 num functions 3584
	 num local functions 42657
	 num primExps 83822
	 inline starting
	 inline finished in 4.320
	 num functions 999
	 num local functions 67665
	 num primExps 136619
	 removeUnused starting
	 removeUnused finished in 4.540
	 num functions 999
	 num local functions 65381
	 num primExps 135578
	 raise-to-jump starting
	    inferHandlers starting
	    inferHandlers finished in 0.180
	 raise-to-jump finished in 1.510
	 num functions 999
	 num local functions 65325
	 num primExps 135553
	 contify starting
	 contify finished in 3.060
	 num functions 998
	 num local functions 65323
	 num primExps 135551
	 introduce-loops starting
	 introduce-loops finished in 0.050
	 num functions 998
	 num local functions 65349
	 num primExps 135551
	 loop-invariant starting
	 loop-invariant finished in 2.990
	 num functions 998
	 num local functions 62428
	 num primExps 127775
	 flatten starting
	    analyze starting
	    analyze finished in 0.160
	 flatten finished in 3.800
	 num functions 998
	 num local functions 62510
	 num primExps 86735
	 redundant starting
	 redundant finished in 2.950
	 num functions 998
	 num local functions 62510
	 num primExps 86735
	 removeUnused starting
	 removeUnused finished in 0.730
	 num functions 998
	 num local functions 62208
	 num primExps 85089
      simplify finished in 75.430
   closure convert simplify finished in 75.430
   cps size is 50,671,716 bytes
   numPeeks = 53906183
   average position in property list = 0.792
   numPeeks = 3662642
   average position in bucket = 0.810
   backend starting
      compute representations starting
      compute representations finished in 0.020
      inferHandlers starting
      inferHandlers finished in 0.160
      chunkify starting
      chunkify finished in 0.060
      allocate registers starting
      allocate registers finished in 8.690
   backend finished in 10.550
    size is 58,777,972 bytes
   numPeeks = 62188646
   average position in property list = 0.772
   numPeeks = 3663640
   average position in bucket = 0.810
   x86 code gen starting
      outputC starting
      outputC finished in 0.370
      outputAssembly starting
	 translateChunk totals 16.640
	 simplify totals 93.230
	    verifyLiveInfo totals 22.870
	    computeJumpInfo totals 1.190
	    elimGoto totals 6.490
	       elimIff: 3
	       elimSwitch: 37
	       elimSimpleGoto totals 0.990
	       elimComplexGoto totals 0.850
	    verifyJumpInfo totals 0.0
	    peepholeBlock_pre totals 3.840
	       commuteBinALMD: 508
	       elimAddSub1: 1790
	       elimMDPow2: 180
	    toLivenessBlock totals 17.750
	    moveHoist totals 12.390
	    peepholeLivenessBlock totals 8.330
	       elimALCopy: 17476
	       elimFltACopy: 23
	       elimDeadDsts: 102
	       elimSelfMove: 1072
	       elimFltSelfMove: 0
	       commuteBinALMD: 1037
	       commuteFltBinA: 17
	       conditionalJump: 2930
	    copyPropagate totals 8.360
	    peepholeLivenessBlock_minor totals 2.130
	       elimDeadDsts_minor: 0
	       elimSelfMove_minor: 0
	       elimFltSelfMove_minor: 0
	    verifyLivenessBlock totals 0.0
	    toBlock totals 0.560
	    peepholeBlock_post totals 5.090
	       elimBinALMDDouble: 33
	       elimFltBinADouble: 0
	       elimCMPTST: 0
	    generateTransfers totals 3.600
	 allocateRegisters totals 398.270
	    toLiveness totals 209.460
	    toNoLiveness totals 0.0
	    Assembly.allocateRegisters totals 187.900
	       Instruction.allocateRegisters totals 102.470
		  pre totals 23.720
		  post totals 35.890
		  allocateOperand totals 21.870
		  allocateFltOperand totals 0.0
		  allocateFltStackOperands totals 0.0
	       Directive.allocateRegisters totals 24.360
	 validate totals 0.0
      outputAssembly finished in 522.050
   x86 code gen finished in 563.870
   numPeeks = 69437503
   average position in property list = 0.906
   numPeeks = 3741207
   average position in bucket = 0.831
compile finished in 806.380
gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileLvc9HL.s /tmp/fileOREyEX.c
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileXIzt7u.o /tmp/fileLvc9HL.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileY1iOe3.o /tmp/fileV704d8.9.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebiDCQs.o /tmp/fileYbt8Uv.8.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUjWDs.o /tmp/fileFE54Yu.7.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filewXXwnS.o /tmp/filexqdwsb.6.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filefoLEfe.o /tmp/file46MCcC.5.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWGGyfJ.o /tmp/fileV8si3W.4.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileYm50ZP.o /tmp/filejNaHtE.3.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileL3bKKZ.o /tmp/filemasFEK.2.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileQaVcxu.o /tmp/fileQjVzGR.1.S
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filerN3oiz.o /tmp/fileansbzc.0.S
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileXIzt7u.o /tmp/filerN3oiz.o /tmp/fileQaVcxu.o /tmp/fileL3bKKZ.o /tmp/fileYm50ZP.o /tmp/fileWGGyfJ.o /tmp/filefoLEfe.o /tmp/filewXXwnS.o /tmp/fileNUjWDs.o /tmp/filebiDCQs.o /tmp/fileY1iOe3.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
max semispace size(bytes): 226,492,416
max stack size(bytes): 3,776,512
GC time(ms): 350,280 (43.4%)
maxPause(ms): 4,970
number of GCs: 242
bytes allocated: 40,433,081,240
bytes copied: 11,657,694,308
max bytes live: 153,467,768
size mlton
829.63user 8.44system 14:01.44elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (25522major+406338minor)pagefaults 0swaps
   text	   data	    bss	    dec	    hex	filename
3837534	 547056	  27336	4411926	 435216	mlton