[MLton-devel] SPARC self-compiles and benchmarks

Stephen Weeks MLton@mlton.org
Mon, 14 Apr 2003 16:02:27 -0700


This is all on my 500 MHz UltraSPARC-IIe with 640M RAM.

First, the self compiles.

A two-round bootstrap takes about 10 hours.  The first round takes
about 1 1/2 hours, of which the first 20 minutes is the generation
of C code and the remaining 70 minutes is the compilation of the C code.
The second round (which uses the slower mlton-stubs libraries)
takes about 8 1/2 hours, of which the first 7 1/2 hours is the
generation of C code and the last hour is the compilation of the C
code.

A fixpoint self-compile with the resulting compiler takes about 75
minutes, the first 17 being generation of C code and the remaining
hour the compilation of the C code.  

Using my 1.6 GHz Pentium 4 with 512M RAM, cross-compiling the compiler
using a non-natively built MLton (to be fair to the SPARC) takes about
24 minutes, of which the first 6 minutes is generation of C and the
remaining 18 is the C compile.

In summary, here are the times in minutes.

                        gen C   compile C
                        -----   ---------
compiling the stubs        20          70
compiling with stubs      450          60
fixpoint compile           17          60
cross compile from P4       6          18

So, it looks like the SPARC is about 3 times slower than the P4, and
bootstrapping on the SPARC is prohibitively slow.  My packaging
script will be based on cross compiles.
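
As a quick sanity check on the figures above, the following sketch totals
the rows of the summary table and compares the fixpoint self-compile on
the SPARC with the cross compile on the P4.  The dictionary and variable
names are chosen here for illustration only; the numbers are the ones in
the table.  Comparing those two rows is presumably the basis for the
"about 3 times" estimate, but that is an assumption.

```python
# Times in minutes, copied from the summary table: (gen C, compile C).
times = {
    "compiling the stubs": (20, 70),
    "compiling with stubs": (450, 60),
    "fixpoint compile": (17, 60),
    "cross compile from P4": (6, 18),
}

# Total for the two-round bootstrap (both rounds, both phases).
bootstrap_minutes = (sum(times["compiling the stubs"])
                     + sum(times["compiling with stubs"]))

# SPARC fixpoint compile versus P4 cross compile, per phase and overall.
sparc = times["fixpoint compile"]
p4 = times["cross compile from P4"]
gen_ratio = sparc[0] / p4[0]
cc_ratio = sparc[1] / p4[1]
total_ratio = sum(sparc) / sum(p4)

print(f"bootstrap: {bootstrap_minutes / 60:.1f} hours")
print(f"gen C: {gen_ratio:.1f}x, compile C: {cc_ratio:.1f}x, "
      f"total: {total_ratio:.1f}x")
```

The bootstrap total comes to 600 minutes (10 hours), and the SPARC/P4
ratios come out at roughly 2.8 for generating C, 3.3 for compiling C, and
3.2 overall.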

The text+data size of the fixpoint compiler is 10,262,388.

Now for all the usual benchmarks, comparing SML/NJ 110.42 and MLton on
the SPARC.  The numbers are below.  The runtime ratios are a lot
better than I would have expected given that this is the nonnative
backend and there hasn't been any tuning.  The only benchmarks where
MLton is worse are barnes-hut, fft, fib, life, ray, tailfib, and zern.
On the P4, MLton native is worse with barnes-hut, logic, nucleic, and
tyan.  So the only overlap is barnes-hut.  We're still using the
simple C code to handle all possibly-misaligned memory accesses for
doubles, so that's probably hurting some.  It's surprising it doesn't
hurt more (unless of course SML/NJ does the same thing).
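
For readers unfamiliar with the alignment issue: a double that does not
start on an 8-byte boundary generally cannot be fetched with a single
aligned load on SPARC, so one simple approach is to copy its bytes
individually and reinterpret the copy.  The sketch below models that idea
in Python.  It is not MLton's runtime code, which is not shown in this
message, and the function name load_double_bytewise is hypothetical.

```python
import struct

def load_double_bytewise(buf, offset):
    """Read an 8-byte double starting at any byte offset in buf."""
    # Copy the bytes out one at a time, so no 8-byte-aligned load is
    # needed, then reinterpret the copy as a native-endian double.
    raw = bytes(buf[offset + i] for i in range(8))
    return struct.unpack("=d", raw)[0]

# One padding byte in front, so the double sits at an odd offset.
buf = bytearray(1) + struct.pack("=d", 3.25)
value = load_double_bytewise(buf, 1)
print(value)
```

The extra per-byte work on every access is the kind of overhead the
paragraph above suspects is costing some run time.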

Glancing at the raw running times compared with a P4, we see that the
times are roughly 3X-5X slower, with checksum, md5, and
smith-normal-form notably worse.

There are a few benchmarks that SML/NJ fails to compile: DLXSimulator,
nucleic, and tensor.  DLXSimulator is due to an internal bug in SML/NJ
(uncaught exception RecoverLty).  Nucleic was killed due to excessive
paging -- I let it compile for over 12 hours and reach 769M before I
killed it.  Tensor fails to compile due to a type error.  I see that
the program assumes that Array.appi has spec

	val appi: (int * 'a -> unit) -> 'a array -> unit 

which is the 2002 spec, which MLton agrees with.  Unfortunately, all
the other compilers still seem to support the 1997 spec, with 

	val appi : (int * 'a -> unit) -> 'a array * int * int option -> unit  

So, it is correct that SML/NJ fails, although the benchmark program
really should put a line for tensor in the run time ratio table.
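
The difference between the two specs may be easier to see in a small
model.  The sketch below is Python, not SML, and the names appi_2002 and
appi_1997 are hypothetical; it only mimics the shapes of the two
signatures quoted above.  Treating None as "to the end of the array" is
an assumed reading of the int option argument.

```python
def appi_2002(f, arr):
    """Model of the 2002 spec: apply f to every (index, element) pair."""
    for i, x in enumerate(arr):
        f(i, x)

def appi_1997(f, slice_):
    """Model of the 1997 spec: the array comes bundled with a start index
    and an optional length (None stands in for SML's NONE)."""
    arr, start, length = slice_
    stop = len(arr) if length is None else start + length
    for i in range(start, stop):
        f(i, arr[i])

seen_2002, seen_1997 = [], []
data = ["a", "b", "c", "d"]

# 2002 style: pass the bare array.
appi_2002(lambda i, x: seen_2002.append((i, x)), data)

# 1997 style: the same whole-array traversal needs an explicit triple.
appi_1997(lambda i, x: seen_1997.append((i, x)), (data, 0, None))

print(seen_2002 == seen_1997)
```

A program written for the 2002 form passes the bare array where the 1997
form expects the triple.  In SML that mismatch is caught by the type
checker, which matches the compile failure reported for tensor.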

I killed smith-normal-form with SML/NJ for the usual reason: it runs
too slowly.

One benchmark, vliw, failed to run with MLton.  I've tried it since
and it works fine.  I am investigating.

run time ratio (SML/NJ run time divided by MLton run time)
benchmark         SML/NJ
barnes-hut          0.71
boyer               2.93
checksum            3.60
count-graphs        2.46
fft                 0.94
fib                 0.80
hamlet              2.25
imp-for            13.11
knuth-bendix        4.05
lexgen              1.80
life                0.76
logic               1.38
mandelbrot          1.16
matrix-multiply     5.03
md5                 8.65
merge               3.09
mlyacc              1.69
model-elimination   2.21
mpuz                3.18
peek               11.01
psdes-random        5.32
ratio-regions       7.66
ray                 0.71
raytrace            1.64
simple              1.19
tailfib             0.80
tak                 2.04
tsp                 2.39
tyan                1.01
vector-concat      13.47
vector-rev         51.29
wc-input1          25.75
wc-scanStream      10.52
zebra               9.43
zern                0.64
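
The entries in this table appear to be the SML/NJ run time divided by the
MLton run time, using the raw times in the run time table later in this
message.  The sketch below recomputes a few of them as a check; the row
selection and the variable names are arbitrary choices made here.

```python
# (MLton, SML/NJ) run times for a handful of rows of the run time table.
run_times = {
    "barnes-hut": (207.21, 147.05),
    "boyer": (157.37, 461.19),
    "fft": (125.41, 117.44),
    "tyan": (201.03, 203.10),
    "vector-rev": (219.93, 11281.07),
    "zern": (269.14, 171.74),
}

# A ratio above 1 means SML/NJ took longer than MLton on that benchmark.
ratios = {name: round(nj / mlton, 2)
          for name, (mlton, nj) in run_times.items()}

# Benchmarks in this sample where MLton is the slower of the two.
mlton_worse = sorted(name for name, r in ratios.items() if r < 1)

for name, r in ratios.items():
    print(f"{name:<12} {r:6.2f}")
print("MLton slower on:", ", ".join(mlton_worse))
```

For these rows the recomputed values agree with the table (0.71, 2.93,
0.94, 1.01, 51.29, and 0.64), and the ones below 1 are exactly the rows
from this sample that appear in the list of benchmarks where MLton is
worse.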

compile time
benchmark         MLton0 SML/NJ
barnes-hut         14.25   4.64
boyer              63.05  13.15
checksum            3.09   0.79
count-graphs        9.10   2.80
DLXSimulator       22.11      *
fft                 6.81   2.35
fib                 2.85   0.75
hamlet            409.34 184.83
imp-for             3.03   0.79
knuth-bendix       20.56   4.98
lexgen             37.32  11.41
life                8.87   2.20
logic              19.98   5.23
mandelbrot          3.19   0.94
matrix-multiply     3.38   1.09
md5                 6.01   3.15
merge               6.27   0.81
mlyacc            173.35  61.75
model-elimination 168.05 100.53
mpuz                4.57   1.46
nucleic           109.93      *
peek                5.25   0.86
psdes-random        3.22   1.01
ratio-regions      13.57   4.84
ray                22.86   3.09
raytrace           70.83  17.47
simple             62.49  11.12
smith-normal-form 255.41  13.99
tailfib             2.85   0.71
tak                 2.90   0.71
tensor             15.07      *
tsp                 8.17   2.06
tyan               23.92   7.56
vector-concat       3.33   0.77
vector-rev          3.05   0.79
vliw               97.39  45.02
wc-input1           8.56   0.85
wc-scanStream       8.71   0.89
zebra              25.66   2.18
zern                5.36   2.09

run time
benchmark         MLton0   SML/NJ
barnes-hut        207.21   147.05
boyer             157.37   461.19
checksum          447.13  1608.18
count-graphs      192.84   473.69
DLXSimulator      197.18        *
fft               125.41   117.44
fib               220.12   177.02
hamlet            221.12   498.51
imp-for           102.83  1347.82
knuth-bendix      169.64   686.27
lexgen            197.37   354.51
life              266.13   202.97
logic             188.26   259.00
mandelbrot        179.51   208.91
matrix-multiply   263.59  1324.55
md5               906.38  7843.23
merge             217.45   672.13
mlyacc            164.42   277.58
model-elimination 321.04   708.03
mpuz              187.72   597.09
nucleic           265.22        *
peek              140.25  1544.40
psdes-random      135.71   722.52
ratio-regions     133.23  1020.24
ray               143.04   102.11
raytrace          243.68   399.93
simple            297.28   353.90
smith-normal-form 545.43        *
tailfib           186.47   149.94
tak               324.48   663.17
tensor            143.39        *
tsp               193.34   461.72
tyan              201.03   203.10
vector-concat     245.54  3306.75
vector-rev        219.93 11281.07
wc-input1         132.57  3413.07
wc-scanStream     158.84  1671.39
zebra             283.46  2671.89
zern              269.14   171.74

size
benchmark            MLton0    SML/NJ
barnes-hut          152,032   350,196
boyer               171,290   432,092
checksum             57,423   381,656
count-graphs         77,767   370,732
DLXSimulator        122,758         *
fft                  68,051   359,444
fib                  57,375   312,284
hamlet            1,365,132 1,317,060
imp-for              57,207   345,784
knuth-bendix        105,774   350,172
lexgen              185,083   413,684
life                 79,967   330,716
logic               111,890   354,268
mandelbrot           57,279   321,500
matrix-multiply      57,935   356,332
md5                  67,899   343,036
merge                59,007   345,792
mlyacc              514,263   714,788
model-elimination   741,392   901,252
mpuz                 63,775   325,596
nucleic             206,260         *
peek                 65,923   316,428
psdes-random         58,047   322,524
ratio-regions        83,695   356,332
ray                 129,001   411,716
raytrace            267,834   528,484
simple              257,210   679,980
smith-normal-form   283,276   554,052
tailfib              57,111   344,760
tak                  57,439   336,568
tensor              146,083         *
tsp                  75,612   342,004
tyan                130,718   395,276
vector-concat        58,719   353,992
vector-rev           57,823   353,992
vliw                393,099   657,484
wc-input1            80,275   345,784
wc-scanStream        81,243   346,808
zebra               146,937   335,852
zern                 64,773   365,604

