[MLton] s->alignment considered harmful

Matthew Fluet fluet at tti-c.org
Sat Nov 3 08:13:24 PST 2007


Here are the results of benchmarks investigating the impact of dynamic vs. 
fixed alignment and div vs. bitop implementations of align.

MLton0 -- svn HEAD, -align 4
MLton1 -- svn HEAD, -align 8
MLton2 -- svn HEAD + align-h-opt.patch, -align 4
MLton3 -- svn HEAD + align-h-opt.patch, -align 8
MLton4 -- svn HEAD + sed s/s->alignment/4/, -align 4
MLton5 -- svn HEAD + sed s/s->alignment/4/ + align-h-opt.patch, -align 4
MLton6 -- svn HEAD + sed s/s->alignment/8/, -align 8
MLton7 -- svn HEAD + sed s/s->alignment/8/ + align-h-opt.patch, -align 8

On SHADOW (with 8GB physical memory), the benchmarks are compiled with 
-runtime 'ram-slop 0.125', so that there is some GC pressure.  I believe, 
though, that for most benchmarks the default heuristics will never try to 
grow the heap to the point where there is a difference between (the 
default) ram-slop 0.5 and ram-slop 0.125.

I also added a thread-switch benchmark to the collection, since that was 
the program where Florian saw an appreciable difference with the fixed 
alignment.

The brief conclusion is that for the majority of benchmarks, neither 
fixing an alignment nor using a bitop implementation of align makes a 
difference.  For the few benchmarks where there is a slight runtime 
improvement, we see most of the improvement just with the bitop 
implementation of align, and little to no additional improvement due to 
fixing an alignment.

To focus on the behavior of the bitop implementation of align with a 
dynamic s->alignment, I ran the benchmarks a second time for any test for 
which |MLton0 - MLton2| > 0.03 or for which |MLton1 - MLton3| > 0.03 in the 
run time ratios of the first benchmark run.  There was quite a bit of 
variability between the two 
runs, but I think there is a positive effect of the bitop implementation 
for lexgen and vliw (and possibly a negative effect on flat-array) on 
SHADOW and a positive effect for vliw (and maybe md5) on FENRIR.
And, intuitively, it seems that the bitop implementation of align would be 
more efficient than the division implementation.  So, I'll commit that 
patch shortly.
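
As a rough illustration of the two approaches being compared, a 
division-based and a bit-operation round-up might look like the sketch 
below.  This is not the text of align-h-opt.patch; the names alignDiv, 
alignBitop, and isAlignedBitop are hypothetical.  The bitop form assumes 
the alignment is a power of two, which holds for -align 4 and -align 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Division-based round-up (hypothetical name): works for any nonzero
 * alignment b, but performs an integer modulo on every call. */
static inline size_t alignDiv (size_t a, size_t b) {
  assert (b >= 1);
  a += b - 1;
  a -= a % b;
  return a;
}

/* Bit-operation round-up (hypothetical name): valid only when b is a
 * power of two, in which case (b - 1) is a mask of the low bits. */
static inline size_t alignBitop (size_t a, size_t b) {
  assert (b >= 1 && (b & (b - 1)) == 0);
  return (a + (b - 1)) & ~(b - 1);
}

/* Alignment test without division, under the same power-of-two
 * assumption. */
static inline bool isAlignedBitop (size_t a, size_t b) {
  return (a & (b - 1)) == 0;
}
```

For power-of-two alignments the two functions agree; for example, both 
alignDiv (13, 4) and alignBitop (13, 4) return 16.  The bitop form simply 
replaces the modulo with an add and a mask.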

As for thread-switch, there is a significant speedup with the bitop 
implementation of align (and only a slight additional speedup due to a 
fixed alignment).  The explanation for this is not that thread-switch 
does a lot of allocation or garbage collection (it does neither), but 
rather the implementation of 
<src>/runtime/gc/switch-thread.c:GC_switchToThread, which makes 
multiple calls to <src>/runtime/gc/current.c:getThreadCurrent:

objptr getThreadCurrentObjptr (GC_state s) {
   return s->currentThread;
}

GC_thread getThreadCurrent (GC_state s) {
   pointer p = objptrToPointer(getThreadCurrentObjptr(s), s->heap.start);
   return (GC_thread)(p + offsetofThread (s));
}

which calls <src>/runtime/gc/thread.c:offsetofThread (itself defined via sizeofThread):

size_t sizeofThread (GC_state s) {
   size_t res;

   res = GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread);
   res = align (res, s->alignment);
   if (DEBUG) { ... }
   assert (isAligned (res, s->alignment));
   return res;
}

size_t offsetofThread (GC_state s) {
   return (sizeofThread (s)) - (GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread));
}

The offsetofThread function coordinates between the ML object pointer 
(which points to the word immediately after the ML object header) and a 
struct GC_thread pointer, which has a variable amount of padding for 
alignment purposes; see <src>/runtime/gc/thread.h:

/*
  * Thread objects are normal objects with the following layout:
  *
  * header ::
  * padding ::
  * bytesNeeded (size_t) ::
  * exnStack (size_t) ::
  * stack (object-pointer)
  *
  * There may be zero or more bytes of padding for alignment purposes.
  ...
  */
typedef struct GC_thread {
   size_t bytesNeeded;
   size_t exnStack;
   objptr stack;
} __attribute__ ((packed)) *GC_thread;


On a 32-bit platform, struct GC_thread is 12 bytes, and needs no bytes of 
padding for -align 4 and 4 bytes of padding for -align 8.  On a 64-bit 
platform, struct GC_thread is 24 bytes, and needs no bytes of padding for 
-align 4 or -align 8.  Hence, we need to dynamically determine the padding 
at runtime, when the alignment of the program is known.

Note, this dynamic padding for struct GC_thread was introduced with the 
64-bit port, so thread-switch did take a big performance hit with the 
port.

For thread-switch, the runtime is dominated by the GC_switchToThread 
calls, which do a number of align calls.  As Florian observed, this ends 
up stressing integer division.  The thread-switch benchmark shows that 
the bitop implementation of align is much faster than the division 
implementation, though the other benchmarks show that align rarely 
dominates a benchmark's runtime.

Interestingly, the align call in sizeofThread is:
   res = GC_NORMAL_HEADER_SIZE + sizeof (struct GC_thread);
   res = align (res, s->alignment);
where the value being aligned is a compile time constant.  Nonetheless, 
fixing an alignment (which would make sizeofThread and offsetofThread 
evaluate to compile time constants) does not significantly improve the 
performance of the thread-switch benchmark (over using the bitop 
implementation of align with a dynamic alignment).  So, I don't think that 
there is much to be gained from compiling the runtime multiple times, once 
for each fixed alignment.

The benchmark runtime ratios are below; the full benchmark results are 
attached.
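
The ratio tables appear to be each configuration's run time divided by 
the MLton0 run time for the same benchmark, shown to two decimal places.  
A small sketch of that arithmetic follows; the function names are 
hypothetical, and rounding to the nearest hundredth is an assumption 
about how the tables were produced.

```c
/* Run time of one configuration relative to the MLton0 baseline. */
static double runTimeRatio (double seconds, double baselineSeconds) {
  return seconds / baselineSeconds;
}

/* The same ratio in hundredths, rounded to nearest (assumed rounding),
 * which avoids comparing floating-point values for equality. */
static long ratioHundredths (double seconds, double baselineSeconds) {
  return (long) (100.0 * runTimeRatio (seconds, baselineSeconds) + 0.5);
}
```

For example, the attached SHADOW results list thread-switch at 78.82 
seconds for MLton0 and 42.84 seconds for MLton2; ratioHundredths (42.84, 
78.82) gives 54, matching the 0.54 in the ratio table.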


SHADOW (Dual-processor single-core AMD Opteron 2.00GHz, 8GB Memory, Fedora Core 7)
Linux shadow 2.6.23.1-10.fc7 #1 SMP Fri Oct 19 14:35:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut          1.00   0.86   0.99   0.83   0.99   0.99   0.82   0.83
boyer               1.00   1.09   1.05   1.17   1.01   1.01   1.08   1.08
checksum            1.00   1.00   1.00   1.00   1.01   1.00   1.00   1.00
count-graphs        1.00   0.70   0.99   0.69   0.99   0.98   0.69   0.69
DLXSimulator        1.00   1.24   1.07   1.24   1.08   1.00   1.23   1.24
fft                 1.00   0.90   1.00   0.90   1.00   1.00   0.90   0.90
fib                 1.00   0.92   1.00   0.90   0.91   1.00   0.92   0.92
flat-array          1.00   1.08   1.23   1.17   0.99   1.24   1.08   1.06
hamlet              1.00   0.91   1.02   0.93   0.99   1.00   0.94   0.96
imp-for             1.00   1.30   1.00   1.30   1.00   1.00   1.30   1.30
knuth-bendix        1.00   0.77   1.00   0.78   1.01   1.00   0.79   0.78
lexgen              1.00   0.79   0.90   0.73   0.91   0.92   0.72   0.74
life                1.00   0.92   0.99   0.91   0.99   0.99   0.90   0.90
logic               1.00   0.79   0.94   0.69   0.92   0.93   0.69   0.69
mandelbrot          1.00   0.98   0.98   1.00   0.98   0.98   0.98   0.98
matrix-multiply     1.00   0.98   1.00   0.99   1.00   1.01   1.02   0.98
md5                 1.00   0.96   0.98   0.95   0.98   0.98   0.95   0.95
merge               1.00   0.99   1.01   0.99   1.00   1.00   1.06   0.99
mlyacc              1.00   0.91   0.98   0.89   0.98   0.97   0.89   0.90
model-elimination   1.00   0.88   0.97   0.86   0.97   0.97   0.87   0.86
mpuz                1.00   0.77   1.00   0.77   1.00   1.00   0.77   0.77
nucleic             1.00   1.03   1.00   1.02   1.01   1.00   1.03   1.03
output1             1.00   0.56   1.00   0.56   1.00   1.00   0.56   0.56
peek                1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
psdes-random        1.00   0.88   1.00   0.89   1.00   1.00   0.93   0.93
ratio-regions       1.00   0.81   0.90   0.81   1.00   0.99   0.81   0.72
ray                 1.00   0.87   1.00   0.86   1.00   1.00   0.86   0.85
raytrace            1.00   0.72   1.00   0.72   1.00   1.00   0.71   0.72
simple              1.00   0.92   1.00   0.91   1.00   1.00   0.92   0.92
smith-normal-form   1.00   1.01   1.00   1.00   0.99   0.99   1.00   1.01
tailfib             1.00   1.27   0.99   1.27   1.00   1.00   1.27   1.27
tak                 1.00   0.88   1.00   0.88   1.01   1.00   0.88   0.88
tensor              1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
thread-switch       1.00   0.88   0.54   0.42   0.50   0.51   0.37   0.37
tsp                 1.00   1.01   1.00   1.04   1.03   1.03   1.04   1.01
tyan                1.00   0.87   1.00   0.87   0.99   0.99   0.85   0.86
vector-concat       1.00   0.98   1.00   1.00   1.00   1.01   0.99   0.99
vector-rev          1.00   0.80   1.00   0.80   1.03   1.06   0.80   0.80
vliw                1.00   0.77   0.80   0.67   0.80   0.78   0.67   0.66
wc-input1           1.00   0.80   1.00   0.80   1.00   1.00   0.80   0.80
wc-scanStream       1.00   0.97   0.98   0.96   0.99   0.99   0.97   0.97
zebra               1.00   0.65   0.99   0.64   0.99   0.99   0.64   0.64
zern                1.00   0.79   1.00   0.79   0.98   1.01   0.79   0.79

run time ratio (second run)
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer               1.00   1.10   0.92   0.98   0.93   0.93   0.99   0.99
DLXSimulator        1.00   1.09   0.94   1.09   0.94   0.88   1.08   1.09
flat-array          1.00   1.00   1.00   1.00   1.00   1.17   1.12   1.00
lexgen              1.00   0.78   0.89   0.72   0.88   0.90   0.72   0.72
logic               1.00   0.77   1.01   0.75   1.00   1.00   0.77   0.75
ratio-regions       1.00   0.87   0.96   0.82   1.07   0.96   0.77   0.86
thread-switch       1.00   0.86   0.53   0.41   0.50   0.50   0.36   0.37
vliw                1.00   0.77   0.80   0.67   0.80   0.78   0.67   0.67


FENRIR (Dual-processor dual-core Intel Xeon 2.66GHz, 2GB Memory, Mac OS X 10.4)
Darwin fenrir.uchicago.edu 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367)

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8

run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut          1.00   0.89   1.00   0.89   1.00   1.00   0.89   0.89
boyer               1.00   1.24   1.01   1.24   1.01   1.01   1.25   1.25
checksum            1.00   1.00   1.00   1.00   1.00   0.98   1.00   1.00
count-graphs        1.00   1.08   1.00   1.04   0.99   1.00   1.04   1.04
DLXSimulator        1.00   1.26   1.01   1.27   1.00   1.00   1.26   1.27
fft                 1.00   0.94   1.00   0.94   1.00   1.00   0.94   0.94
fib                 1.00   0.98   1.00   0.97   1.00   1.02   0.98   0.97
flat-array          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
hamlet              1.00   1.19   1.01   1.21   1.03   1.01   1.20   1.21
imp-for             1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
knuth-bendix        1.00   0.99   1.00   0.99   1.00   1.00   0.98   0.99
lexgen              1.00   1.06   0.97   1.04   0.94   0.93   1.00   1.00
life                1.00   1.13   1.00   1.13   1.00   1.00   1.13   1.13
logic               1.00   1.07   1.00   1.08   1.00   1.00   1.09   1.09
mandelbrot          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
matrix-multiply     1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
md5                 1.00   1.00   0.89   0.95   0.88   0.90   0.94   0.94
merge               1.00   1.36   1.00   1.37   1.00   1.00   1.37   1.37
mlyacc              1.00   1.21   0.99   1.21   0.99   0.99   1.20   1.20
model-elimination   1.00   1.04   1.00   1.05   1.00   1.00   1.03   1.04
mpuz                1.00   0.97   0.99   1.03   0.99   0.98   0.97   0.97
nucleic             1.00   0.86   1.00   0.86   1.00   1.00   0.86   0.86
output1             1.00   1.02   1.00   1.02   1.00   1.00   1.02   1.02
peek                1.00   1.00   0.99   1.00   0.98   0.97   1.00   1.00
psdes-random        1.00   1.01   1.00   1.00   1.00   1.00   1.00   1.00
ratio-regions       1.00   1.00   1.05   1.00   1.00   1.00   1.00   1.00
ray                 1.00   0.96   1.02   0.98   1.02   0.99   0.98   0.96
raytrace            1.00   0.83   1.00   0.83   1.00   1.00   0.83   0.83
simple              1.00   1.03   1.00   1.02   1.00   1.00   1.02   1.04
smith-normal-form   1.00   0.99   1.00   0.99   1.00   1.00   0.99   1.00
tailfib             1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
tak                 1.00   0.99   0.98   0.96   0.98   0.99   0.95   1.02
tensor              1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
thread-switch       1.00   1.06   0.81   0.87   0.80   0.80   0.84   0.85
tsp                 1.00   0.97   1.00   0.97   1.00   1.00   1.00   0.97
tyan                1.00   1.10   1.01   1.12   1.01   1.01   1.10   1.11
vector-concat       1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
vector-rev          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
vliw                1.00   1.12   0.91   1.07   0.91   0.90   1.01   1.02
wc-input1           1.00   1.02   1.01   1.01   1.00   1.00   1.02   1.01
wc-scanStream       1.00   1.01   1.00   1.01   1.00   1.00   1.01   1.00
zebra               1.00   0.98   0.99   0.97   0.99   0.99   0.97   0.97
zern                1.00   0.88   1.03   0.91   1.03   1.03   0.91   0.91

run time ratio (second run)
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs        1.00   1.08   1.00   1.04   0.99   1.00   1.04   1.04
md5                 1.00   1.02   0.96   1.02   0.95   0.97   1.02   1.01
mpuz                1.00   0.97   0.99   0.97   0.99   0.99   0.97   0.97
ratio-regions       1.00   1.00   1.03   1.00   1.00   1.00   1.00   1.00
thread-switch       1.00   1.06   0.81   0.87   0.80   0.80   0.84   0.85
vliw                1.00   1.12   0.91   1.06   0.91   0.90   1.01   1.01
-------------- next part --------------
SHADOW (Dual-processor single-core AMD Opteron 2.00GHz, 8GB Memory, Fedora Core 7)
Linux shadow 2.6.23.1-10.fc7 #1 SMP Fri Oct 19 14:35:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
size
benchmark            MLton0    MLton1    MLton2    MLton3    MLton4    MLton5    MLton6    MLton7
barnes-hut          167,542   168,070   167,062   167,590   165,502   165,446   166,030   165,974
boyer               213,513   213,577   213,033   213,097   211,473   211,417   211,537   211,481
checksum             93,529    93,561    93,049    93,081    91,489    91,433    91,521    91,465
count-graphs        119,561   119,785   119,081   119,305   117,521   117,465   117,745   117,689
DLXSimulator        195,972   196,580   195,492   196,100   193,932   193,876   194,540   194,484
fft                 117,319   117,335   116,839   116,855   115,279   115,223   115,295   115,239
fib                  93,417    93,401    92,937    92,921    91,377    91,321    91,361    91,305
flat-array           92,953    92,985    92,473    92,505    90,913    90,857    90,945    90,889
hamlet            1,503,617 1,516,881 1,503,137 1,516,401 1,501,577 1,501,521 1,514,841 1,514,785
imp-for              93,241    93,257    92,761    92,777    91,201    91,145    91,217    91,161
knuth-bendix        171,716   172,532   171,236   172,052   169,676   169,620   170,492   170,436
lexgen              285,107   286,611   284,627   286,131   283,067   283,011   284,571   284,515
life                117,881   117,865   117,401   117,385   115,841   115,785   115,825   115,769
logic               177,673   177,561   177,193   177,081   175,633   175,577   175,521   175,465
mandelbrot           93,129    93,145    92,649    92,665    91,089    91,033    91,105    91,049
matrix-multiply      95,113    95,129    94,633    94,649    93,073    93,017    93,089    93,033
md5                 126,804   127,364   126,324   126,884   124,764   124,708   125,324   125,268
merge                94,745    94,793    94,265    94,313    92,705    92,649    92,753    92,697
mlyacc              657,203   661,411   656,723   660,931   655,163   655,107   659,371   659,315
model-elimination   849,466   851,850   848,986   851,370   847,426   847,370   849,810   849,754
mpuz                 99,353    99,417    98,873    98,937    97,313    97,257    97,377    97,321
nucleic             269,048   269,176   268,568   268,696   267,008   266,952   267,136   267,080
output1             136,184   136,824   135,704   136,344   134,144   134,088   134,784   134,728
peek                132,340   132,884   131,860   132,404   130,300   130,244   130,844   130,788
psdes-random         96,313    96,377    95,833    95,897    94,273    94,217    94,337    94,281
ratio-regions       120,857   120,873   120,377   120,393   118,817   118,761   118,833   118,777
ray                 244,704   245,648   244,224   245,168   242,664   242,608   243,608   243,552
raytrace            372,714   373,738   372,234   373,258   370,674   370,618   371,698   371,642
simple              343,377   344,161   342,897   343,681   341,337   341,281   342,121   342,065
smith-normal-form   271,668   284,692   271,188   284,212   269,628   269,572   282,652   282,596
tailfib              92,985    93,001    92,505    92,521    90,945    90,889    90,961    90,905
tak                  93,465    93,417    92,985    92,937    91,425    91,369    91,377    91,321
tensor              162,251   162,891   161,771   162,411   160,211   160,155   160,851   160,795
thread-switch       141,476   142,100   140,996   141,620   139,436   139,380   140,060   140,004
tsp                 139,347   139,955   138,867   139,475   137,307   137,251   137,915   137,859
tyan                212,212   213,236   211,732   212,756   210,172   210,116   211,196   211,140
vector-concat        94,761    94,793    94,281    94,313    92,721    92,665    92,753    92,697
vector-rev           94,521    94,553    94,041    94,073    92,481    92,425    92,513    92,457
vliw                518,946   520,418   518,466   519,938   516,906   516,850   518,378   518,322
wc-input1           158,850   159,570   158,370   159,090   156,810   156,754   157,530   157,474
wc-scanStream       169,666   170,370   169,186   169,890   167,626   167,570   168,330   168,274
zebra               212,436   213,396   211,956   212,916   210,396   210,340   211,356   211,300
zern                132,174   132,190   131,694   131,710   130,134   130,078   130,150   130,094
compile time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut          9.77  10.07  10.45  11.05  11.14   9.79  11.26   9.82
boyer              11.43   9.89   9.83   9.85   9.96   9.72  11.70  11.74
checksum            8.58   7.50   8.36   8.47   7.45   8.51   7.23   8.62
count-graphs        9.46   9.59   9.39   9.56   8.22   9.59   8.21   9.57
DLXSimulator       12.27  12.17  10.53  11.90  10.47  12.08  12.09  12.23
fft                 9.15   7.91   7.87   7.75   7.89   9.37   9.14   7.82
fib                 8.40   7.30   8.35   7.24   8.40   7.37   8.64   7.39
flat-array          8.42   7.32   8.37   8.48   8.54   7.35   7.25   8.70
hamlet             52.31  52.22  54.46  47.50  54.29  45.89  54.60  46.20
imp-for             8.72   8.72   8.50   8.65   7.38   8.90   7.36   8.88
knuth-bendix       10.78  10.80   9.12   9.26   9.18  10.63   9.02   8.93
lexgen             13.77  13.83  13.70  11.94  12.16  13.76  14.12  13.26
life                9.25   7.93   8.89   9.10   8.13   9.20   9.38   9.47
logic              10.50  11.04  10.65  10.65  10.56   9.27  10.69   9.25
mandelbrot          7.39   7.34   8.28   7.41   7.40   8.50   8.39   8.62
matrix-multiply     8.23   8.32   7.42   7.52   8.36   8.47   8.46   8.56
md5                 9.25   9.28   9.16   9.28   8.28   8.33   8.17   9.59
merge               7.38   8.22   8.23   7.39   7.31   8.32   8.39   7.96
mlyacc             29.83  29.57  26.78  26.54  30.07  29.82  30.17  26.75
model-elimination  28.19  25.40  27.54  24.99  28.51  24.39  24.95  27.22
mpuz                8.45   7.47   8.41   8.41   7.45   7.45   7.34   7.43
nucleic            12.52  12.53  11.26  11.20  11.16  11.22  11.02  12.82
output1             9.47   9.49   8.39   9.39   9.34   8.25   9.52   8.24
peek                9.20   9.22   8.39   9.42   8.18   9.46   8.19   8.17
psdes-random        7.57   8.45   7.55   7.47   8.47   8.51   8.60   7.38
ratio-regions       9.83   9.82   9.76   8.69   9.93   8.78   8.73  10.09
ray                11.24  12.43  12.36  11.99  12.32  11.51  12.69  11.13
raytrace           14.87  14.95  14.48  15.45  15.41  16.39  16.75  14.36
simple             13.00  13.11  14.50  12.98  14.62  12.71  13.14  14.60
smith-normal-form  11.18  11.31  11.22  11.53  12.62  12.77  13.02  11.63
tailfib             8.34   7.57   7.34   8.38   7.32   7.32   8.37   8.59
tak                 7.30   7.37   8.22   8.03   8.28   8.45   7.25   8.44
tensor              9.91  11.33  11.39  11.25  10.31  11.22  10.20   9.70
thread-switch       9.46   8.43   9.37   9.39   9.53   8.74   9.80   8.37
tsp                 8.58   9.63   9.54   8.49   8.65   8.53   8.43   8.56
tyan               10.62  10.61  10.61  10.76  11.92  10.64  10.34  12.21
vector-concat       8.40   8.30   8.31   8.33   7.46   7.36   7.46   7.46
vector-rev          8.25   7.91   7.28   7.30   7.48   7.35   7.24   7.38
vliw               21.40  19.38  21.08  18.74  19.27  21.76  21.76  21.44
wc-input1           8.77  10.06  10.08  10.06  10.03   8.96   8.66   8.84
wc-scanStream       9.28  10.30  10.31  10.34   9.40  10.69   9.26  10.76
zebra              10.66  10.67  11.78  10.61  10.65  11.89  12.19  11.85
zern                9.40   9.36   8.40   8.28   9.35   9.63   8.14   8.30
run time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut         15.80  13.62  15.66  13.16  15.69  15.64  13.03  13.18
boyer              38.36  41.92  40.14  44.90  38.73  38.80  41.33  41.55
checksum           18.49  18.54  18.55  18.54  18.66  18.54  18.55  18.54
count-graphs       32.82  23.09  32.39  22.55  32.36  32.32  22.53  22.53
DLXSimulator       27.28  33.74  29.31  33.74  29.38  27.27  33.67  33.82
fft                15.50  13.97  15.50  13.97  15.50  15.48  13.97  13.97
fib                41.14  37.81  41.14  37.22  37.41  41.15  37.80  37.81
flat-array         28.32  30.45  34.87  33.01  28.05  35.12  30.45  30.08
hamlet             42.23  38.26  43.23  39.46  41.97  42.21  39.54  40.65
imp-for            26.67  34.64  26.67  34.65  26.67  26.67  34.64  34.64
knuth-bendix       23.79  18.42  23.88  18.58  23.97  23.85  18.89  18.51
lexgen             24.42  19.21  22.01  17.78  22.26  22.46  17.69  18.12
life               19.29  17.72  19.08  17.55  19.10  19.07  17.40  17.41
logic              29.55  23.28  27.72  20.48  27.33  27.59  20.44  20.54
mandelbrot         21.18  20.78  20.74  21.18  20.74  20.74  20.74  20.74
matrix-multiply    27.35  26.92  27.29  26.94  27.28  27.70  27.96  26.83
md5                33.74  32.48  33.20  32.02  33.20  33.20  32.05  31.99
merge              52.00  51.39  52.38  51.25  52.10  52.00  55.28  51.28
mlyacc             25.85  23.52  25.24  23.08  25.28  25.13  23.08  23.18
model-elimination  36.54  32.18  35.31  31.59  35.55  35.49  31.65  31.49
mpuz               27.20  20.85  27.20  20.86  27.21  27.21  20.85  20.85
nucleic            15.44  15.93  15.43  15.82  15.60  15.39  15.87  15.90
output1            41.55  23.35  41.54  23.33  41.53  41.53  23.31  23.30
peek               34.89  34.89  34.90  34.89  34.91  34.89  34.89  34.89
psdes-random       18.01  15.91  18.01  15.99  18.01  18.01  16.72  16.72
ratio-regions     142.15 114.67 127.38 115.31 141.84 141.09 115.23 102.65
ray                16.99  14.73  16.94  14.66  17.02  16.96  14.55  14.49
raytrace           20.50  14.71  20.52  14.71  20.53  20.48  14.65  14.69
simple             23.57  21.57  23.49  21.55  23.59  23.53  21.63  21.59
smith-normal-form   8.38   8.43   8.35   8.39   8.33   8.33   8.39   8.43
tailfib            23.67  30.14  23.41  30.14  23.68  23.71  30.14  30.14
tak                31.81  28.14  31.82  28.15  32.03  31.82  28.14  28.15
tensor             22.70  22.70  22.70  22.70  22.69  22.70  22.70  22.70
thread-switch      78.82  69.11  42.84  33.20  39.22  40.23  29.29  29.40
tsp                21.78  21.91  21.77  22.76  22.35  22.35  22.73  21.90
tyan               26.91  23.41  26.81  23.34  26.68  26.75  22.92  23.27
vector-concat      28.64  28.18  28.69  28.61  28.78  28.80  28.29  28.31
vector-rev         45.20  36.31  45.26  36.31  46.77  47.73  36.17  36.29
vliw               32.21  24.65  25.68  21.67  25.83  25.21  21.66  21.34
wc-input1          34.72  27.85  34.60  27.68  34.72  34.76  27.67  27.78
wc-scanStream      28.80  27.89  28.27  27.73  28.66  28.47  27.91  27.87
zebra              40.92  26.56  40.60  26.21  40.61  40.61  26.21  26.22
zern               24.90  19.74  24.91  19.69  24.50  25.03  19.64  19.62

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4 -runtime 'ram-slop 0.125'
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8 -runtime 'ram-slop 0.125'
size
benchmark      MLton0  MLton1  MLton2  MLton3  MLton4  MLton5  MLton6  MLton7
boyer         213,513 213,577 213,033 213,097 211,473 211,417 211,537 211,481
DLXSimulator  195,972 196,580 195,492 196,100 193,932 193,876 194,540 194,484
flat-array     92,953  92,985  92,473  92,505  90,913  90,857  90,945  90,889
lexgen        285,107 286,611 284,627 286,131 283,067 283,011 284,571 284,515
logic         177,673 177,561 177,193 177,081 175,633 175,577 175,521 175,465
ratio-regions 120,857 120,873 120,377 120,393 118,817 118,761 118,833 118,777
thread-switch 141,476 142,100 140,996 141,620 139,436 139,380 140,060 140,004
vliw          518,946 520,418 518,466 519,938 516,906 516,850 518,378 518,322
compile time
benchmark     MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer          10.42  10.66   9.85  11.31   9.85   9.85  11.47  11.48
DLXSimulator   12.20  10.57  10.39  11.85  10.39  12.15  11.98  10.36
flat-array      8.47   8.46   8.30   7.35   7.44   7.36   8.63   7.33
lexgen         11.79  11.85  13.71  13.74  13.04  11.62  12.08  11.59
logic          10.06   9.40  10.71  10.69  10.90   9.12   9.41   9.13
ratio-regions  10.21   8.79   8.68  10.08  10.14   8.92  10.43  10.43
thread-switch   8.42   8.43   8.34   8.33   9.72   9.81  10.00   9.98
vliw           22.18  19.82  18.56  21.47  18.98  19.07  22.06  21.77
run time
benchmark     MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
boyer          40.66  44.66  37.55  39.89  37.65  37.72  40.22  40.14
DLXSimulator   30.34  32.94  28.52  32.98  28.60  26.81  32.91  33.03
flat-array     25.07  25.16  25.08  25.17  25.07  29.24  28.12  25.17
lexgen         23.99  18.81  21.26  17.35  21.17  21.50  17.27  17.33
logic          27.69  21.26  27.87  20.71  27.62  27.62  21.25  20.72
ratio-regions 132.97 115.51 127.43 109.44 142.49 127.32 102.73 114.90
thread-switch  80.24  69.05  42.87  33.04  40.22  40.18  29.18  29.42
vliw           32.44  24.85  25.81  21.64  26.11  25.45  21.63  21.61
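
The "run time ratio" rows above appear to be each configuration's run time divided by the MLton0 run time for the same benchmark, rounded to two decimals (for example, boyer under MLton1: 44.66 / 40.66 is roughly 1.10). A minimal C sketch of that arithmetic follows; the helper name is hypothetical and is not part of the benchmark scripts.

```c
#include <assert.h>

/* Hypothetical helper: ratio of one configuration's run time to the
 * MLton0 baseline run time for the same benchmark.
 * Values below 1.00 mean faster than the baseline. */
static inline double run_time_ratio(double run_time, double baseline) {
  assert(baseline > 0.0);
  return run_time / baseline;
}
```

With the thread-switch row above, run_time_ratio(42.87, 80.24) is about 0.53, matching the MLton2 entry in the ratio table.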


FENRIR (Dual-processor dual-core Intel Xeon 2.66GHz, 2GB Memory, Mac OS X 10.4)
Darwin fenrir.uchicago.edu 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367)

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8
run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut          1.00   0.89   1.00   0.89   1.00   1.00   0.89   0.89
boyer               1.00   1.24   1.01   1.24   1.01   1.01   1.25   1.25
checksum            1.00   1.00   1.00   1.00   1.00   0.98   1.00   1.00
count-graphs        1.00   1.08   1.00   1.04   0.99   1.00   1.04   1.04
DLXSimulator        1.00   1.26   1.01   1.27   1.00   1.00   1.26   1.27
fft                 1.00   0.94   1.00   0.94   1.00   1.00   0.94   0.94
fib                 1.00   0.98   1.00   0.97   1.00   1.02   0.98   0.97
flat-array          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
hamlet              1.00   1.19   1.01   1.21   1.03   1.01   1.20   1.21
imp-for             1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
knuth-bendix        1.00   0.99   1.00   0.99   1.00   1.00   0.98   0.99
lexgen              1.00   1.06   0.97   1.04   0.94   0.93   1.00   1.00
life                1.00   1.13   1.00   1.13   1.00   1.00   1.13   1.13
logic               1.00   1.07   1.00   1.08   1.00   1.00   1.09   1.09
mandelbrot          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
matrix-multiply     1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
md5                 1.00   1.00   0.89   0.95   0.88   0.90   0.94   0.94
merge               1.00   1.36   1.00   1.37   1.00   1.00   1.37   1.37
mlyacc              1.00   1.21   0.99   1.21   0.99   0.99   1.20   1.20
model-elimination   1.00   1.04   1.00   1.05   1.00   1.00   1.03   1.04
mpuz                1.00   0.97   0.99   1.03   0.99   0.98   0.97   0.97
nucleic             1.00   0.86   1.00   0.86   1.00   1.00   0.86   0.86
output1             1.00   1.02   1.00   1.02   1.00   1.00   1.02   1.02
peek                1.00   1.00   0.99   1.00   0.98   0.97   1.00   1.00
psdes-random        1.00   1.01   1.00   1.00   1.00   1.00   1.00   1.00
ratio-regions       1.00   1.00   1.05   1.00   1.00   1.00   1.00   1.00
ray                 1.00   0.96   1.02   0.98   1.02   0.99   0.98   0.96
raytrace            1.00   0.83   1.00   0.83   1.00   1.00   0.83   0.83
simple              1.00   1.03   1.00   1.02   1.00   1.00   1.02   1.04
smith-normal-form   1.00   0.99   1.00   0.99   1.00   1.00   0.99   1.00
tailfib             1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
tak                 1.00   0.99   0.98   0.96   0.98   0.99   0.95   1.02
tensor              1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
thread-switch       1.00   1.06   0.81   0.87   0.80   0.80   0.84   0.85
tsp                 1.00   0.97   1.00   0.97   1.00   1.00   1.00   0.97
tyan                1.00   1.10   1.01   1.12   1.01   1.01   1.10   1.11
vector-concat       1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
vector-rev          1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
vliw                1.00   1.12   0.91   1.07   0.91   0.90   1.01   1.02
wc-input1           1.00   1.02   1.01   1.01   1.00   1.00   1.02   1.01
wc-scanStream       1.00   1.01   1.00   1.01   1.00   1.00   1.01   1.00
zebra               1.00   0.98   0.99   0.97   0.99   0.99   0.97   0.97
zern                1.00   0.88   1.03   0.91   1.03   1.03   0.91   0.91
size
benchmark            MLton0    MLton1    MLton2    MLton3    MLton4    MLton5    MLton6    MLton7
barnes-hut          167,936   167,936   163,840   167,936   163,840   163,840   163,840   163,840
boyer               204,800   212,992   204,800   208,896   200,704   200,704   208,896   208,896
checksum            106,496   106,496   106,496   106,496   102,400   102,400   102,400   102,400
count-graphs        126,976   126,976   126,976   126,976   122,880   122,880   122,880   122,880
DLXSimulator        196,608   196,608   192,512   196,608   192,512   192,512   196,608   196,608
fft                 126,976   126,976   126,976   126,976   122,880   122,880   122,880   122,880
fib                 106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
flat-array          102,400   106,496   102,400   102,400   102,400   102,400   102,400   102,400
hamlet            1,335,296 1,363,968 1,335,296 1,359,872 1,335,296 1,335,296 1,359,872 1,359,872
imp-for             106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
knuth-bendix        176,128   176,128   176,128   176,128   172,032   172,032   176,128   176,128
lexgen              274,432   278,528   274,432   274,432   270,336   270,336   274,432   274,432
life                126,976   131,072   126,976   126,976   126,976   126,976   126,976   126,976
logic               172,032   176,128   172,032   176,128   172,032   172,032   176,128   176,128
mandelbrot          106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
matrix-multiply     106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
md5                 139,264   139,264   135,168   139,264   135,168   135,168   135,168   135,168
merge               106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
mlyacc              606,208   614,400   606,208   614,400   602,112   602,112   610,304   610,304
model-elimination   729,088   741,376   729,088   741,376   729,088   724,992   741,376   741,376
mpuz                114,688   114,688   110,592   110,592   110,592   110,592   110,592   110,592
nucleic             278,528   282,624   278,528   282,624   278,528   278,528   282,624   282,624
output1             143,360   143,360   143,360   143,360   139,264   139,264   143,360   143,360
peek                143,360   143,360   139,264   143,360   139,264   139,264   139,264   139,264
psdes-random        106,496   106,496   106,496   106,496   102,400   102,400   102,400   102,400
ratio-regions       131,072   131,072   126,976   126,976   126,976   126,976   126,976   126,976
ray                 237,568   237,568   233,472   237,568   233,472   233,472   237,568   237,568
raytrace            331,776   339,968   331,776   335,872   327,680   327,680   335,872   335,872
simple              307,200   311,296   303,104   311,296   303,104   303,104   307,200   307,200
smith-normal-form   262,144   278,528   262,144   274,432   262,144   262,144   274,432   274,432
tailfib             102,400   106,496   102,400   102,400   102,400   102,400   102,400   102,400
tak                 106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
tensor              167,936   172,032   167,936   167,936   163,840   163,840   167,936   167,936
thread-switch       151,552   155,648   151,552   151,552   151,552   151,552   151,552   151,552
tsp                 143,360   147,456   143,360   143,360   143,360   143,360   143,360   143,360
tyan                208,896   212,992   208,896   208,896   204,800   204,800   208,896   208,896
vector-concat       106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
vector-rev          106,496   106,496   102,400   102,400   102,400   102,400   102,400   102,400
vliw                466,944   475,136   462,848   471,040   462,848   462,848   471,040   471,040
wc-input1           167,936   167,936   163,840   167,936   163,840   163,840   163,840   163,840
wc-scanStream       176,128   176,128   172,032   176,128   172,032   172,032   172,032   172,032
zebra               212,992   212,992   208,896   212,992   208,896   208,896   208,896   208,896
zern                135,168   135,168   131,072   131,072   131,072   131,072   131,072   131,072
compile time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut          5.32   5.20   5.38   5.25   5.39   5.38   5.40   5.38
boyer               5.31   5.38   5.37   5.41   5.37   5.38   5.44   5.43
checksum            4.22   4.22   4.24   4.23   4.26   4.26   4.26   4.25
count-graphs        4.56   4.57   4.58   4.58   4.59   4.57   4.59   4.58
DLXSimulator        5.62   5.61   5.66   5.67   5.67   5.66   5.75   5.67
fft                 4.50   4.50   4.50   4.49   4.52   4.52   4.53   4.54
fib                 4.24   4.24   4.25   4.25   4.27   4.27   4.27   4.27
flat-array          4.26   4.24   4.28   4.27   4.29   4.26   4.28   4.26
hamlet             21.60  21.72  23.09  22.62  21.73  23.14  21.89  23.30
imp-for             4.26   4.24   4.27   4.27   4.28   4.29   4.28   4.28
knuth-bendix        5.08   5.10   5.12   5.12   5.12   5.11   5.15   5.13
lexgen              6.41   6.42   6.47   6.46   6.50   6.47   6.50   6.49
life                4.53   4.53   4.53   4.53   4.55   4.57   4.56   4.56
logic               5.23   5.25   5.26   5.28   5.29   5.27   5.31   5.30
mandelbrot          4.30   4.31   4.31   4.32   4.32   4.31   4.33   4.30
matrix-multiply     4.31   4.31   4.34   4.32   4.35   4.34   4.36   4.34
md5                 4.67   4.67   4.67   4.68   4.69   4.69   4.70   4.70
merge               4.29   4.28   4.31   4.29   4.39   4.32   4.33   4.31
mlyacc             14.03  14.20  14.35  14.38  14.30  14.36  14.37  14.41
model-elimination  12.74  12.81  12.62  12.66  12.93  12.42  13.02  12.52
mpuz                4.36   4.35   4.36   4.37   4.38   4.39   4.38   4.38
nucleic             6.36   6.40   6.40   6.44   6.39   6.40   6.47   6.46
output1             4.66   4.66   4.68   4.68   4.68   4.68   4.70   4.69
peek                4.68   4.67   4.67   4.69   4.69   4.70   4.70   4.69
psdes-random        4.34   4.32   4.36   4.34   4.35   4.34   4.35   4.35
ratio-regions       4.95   4.94   4.96   4.95   4.95   4.94   4.96   4.93
ray                 6.09   6.12   6.14   6.15   6.18   6.16   6.19   6.16
raytrace            7.86   7.95   8.18   8.01   8.04   8.04   8.11   8.12
simple              6.78   6.81   6.84   6.84   6.86   6.84   6.88   6.85
smith-normal-form   5.95   6.07   6.02   6.13   6.07   6.03   6.19   6.13
tailfib             4.29   4.28   4.31   4.29   4.32   4.32   4.33   4.31
tak                 4.29   4.28   4.28   4.28   4.30   4.31   4.31   4.32
tensor              5.50   5.49   5.54   5.54   5.55   5.56   5.56   5.55
thread-switch       4.76   4.75   4.76   4.77   4.79   4.76   4.78   4.78
tsp                 4.86   4.87   4.86   4.86   4.89   4.88   4.89   4.90
tyan                5.81   5.82   5.89   5.89   5.87   5.89   5.90   5.91
vector-concat       4.33   4.33   4.36   4.34   4.35   4.37   4.35   4.36
vector-rev          4.33   4.30   4.35   4.32   4.34   4.32   4.34   4.32
vliw                9.74   9.77  10.11  10.12   9.90   9.87   9.93   9.92
wc-input1           5.00   5.00   5.02   5.04   5.03   5.02   5.04   5.02
wc-scanStream       5.14   5.12   5.16   5.15   5.16   5.14   5.17   5.15
zebra               6.02   5.98   6.03   6.04   6.03   6.03   6.04   6.04
zern                4.70   4.68   4.69   4.68   4.71   4.70   4.71   4.72
run time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
barnes-hut         11.81  10.47  11.87  10.52  11.86  11.85  10.54  10.51
boyer              17.32  21.42  17.43  21.47  17.53  17.50  21.67  21.73
checksum           30.86  30.84  30.85  30.84  30.85  30.36  30.84  30.83
count-graphs       12.33  13.36  12.32  12.86  12.25  12.27  12.78  12.76
DLXSimulator       11.05  13.93  11.13  14.03  11.08  11.07  13.97  14.00
fft                12.40  11.69  12.40  11.70  12.41  12.41  11.70  11.70
fib                21.36  20.95  21.36  20.81  21.36  21.88  20.90  20.80
flat-array         13.88  13.86  13.87  13.86  13.85  13.86  13.87  13.85
hamlet             19.62  23.27  19.77  23.75  20.12  19.74  23.58  23.83
imp-for            13.37  13.37  13.37  13.37  13.37  13.37  13.37  13.37
knuth-bendix       11.89  11.72  11.92  11.77  11.93  11.86  11.71  11.79
lexgen              9.97  10.57   9.72  10.36   9.34   9.32  10.02  10.02
life               11.20  12.63  11.17  12.68  11.20  11.18  12.64  12.64
logic              11.53  12.31  11.59  12.40  11.55  11.53  12.53  12.52
mandelbrot         18.99  18.99  18.99  18.99  18.99  18.99  18.99  18.99
matrix-multiply    12.36  12.38  12.35  12.42  12.35  12.39  12.38  12.37
md5                20.48  20.51  18.24  19.41  18.10  18.35  19.31  19.22
merge              16.98  23.15  16.98  23.21  16.97  16.95  23.23  23.24
mlyacc             13.08  15.88  13.00  15.84  12.96  12.95  15.71  15.73
model-elimination  22.69  23.49  22.72  23.91  22.69  22.60  23.37  23.59
mpuz               13.34  12.95  13.17  13.69  13.17  13.13  12.95  12.95
nucleic            10.88   9.41  10.90   9.38  10.89  10.89   9.38   9.40
output1            14.15  14.44  14.15  14.43  14.15  14.16  14.43  14.43
peek               20.62  20.65  20.32  20.58  20.16  20.07  20.59  20.63
psdes-random       12.81  12.97  12.83  12.78  12.80  12.77  12.87  12.84
ratio-regions      47.96  47.90  50.25  47.93  48.03  48.09  47.91  47.94
ray                13.95  13.44  14.17  13.69  14.17  13.79  13.68  13.44
raytrace           11.47   9.53  11.49   9.56  11.48  11.49   9.50   9.53
simple             11.40  11.73  11.43  11.59  11.42  11.41  11.63  11.81
smith-normal-form  14.21  14.14  14.22  14.12  14.21  14.19  14.13  14.14
tailfib            13.62  13.62  13.62  13.62  13.62  13.62  13.62  13.62
tak                14.26  14.09  14.01  13.72  14.02  14.15  13.54  14.56
tensor             20.93  20.93  20.93  20.93  20.93  20.93  20.93  20.93
thread-switch      22.14  23.47  17.92  19.25  17.61  17.62  18.68  18.82
tsp                23.06  22.41  23.07  22.38  23.07  23.07  22.97  22.38
tyan               12.12  13.34  12.22  13.58  12.21  12.21  13.37  13.41
vector-concat      18.55  18.53  18.54  18.51  18.51  18.54  18.53  18.54
vector-rev         19.49  19.45  19.48  19.46  19.49  19.47  19.48  19.45
vliw               12.03  13.47  10.91  12.84  10.94  10.86  12.18  12.21
wc-input1          14.48  14.70  14.57  14.67  14.53  14.55  14.70  14.68
wc-scanStream      18.51  18.62  18.49  18.60  18.53  18.54  18.60  18.50
zebra              16.17  15.80  16.03  15.70  15.98  16.03  15.66  15.72
zern               14.56  12.81  15.02  13.26  15.01  14.99  13.29  13.30

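FENRIR, second run (only the benchmarks for which |MLton0 - MLton2| > 0.3 or |MLton1 - MLton3| > 0.3 on the first run)

MLton0 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 4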
MLton1 -- ~/devel/mlton/mlton.svn.trunk.align/build/bin/mlton -align 8
MLton2 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 4
MLton3 -- ~/devel/mlton/mlton.svn.trunk.align-bitop/build/bin/mlton -align 8
MLton4 -- ~/devel/mlton/mlton.svn.trunk.align4/build/bin/mlton -align 4
MLton5 -- ~/devel/mlton/mlton.svn.trunk.align4-bitop/build/bin/mlton -align 4
MLton6 -- ~/devel/mlton/mlton.svn.trunk.align8/build/bin/mlton -align 8
MLton7 -- ~/devel/mlton/mlton.svn.trunk.align8-bitop/build/bin/mlton -align 8
run time ratio
benchmark     MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs    1.00   1.08   1.00   1.04   0.99   1.00   1.04   1.04
md5             1.00   1.02   0.96   1.02   0.95   0.97   1.02   1.01
mpuz            1.00   0.97   0.99   0.97   0.99   0.99   0.97   0.97
ratio-regions   1.00   1.00   1.03   1.00   1.00   1.00   1.00   1.00
thread-switch   1.00   1.06   0.81   0.87   0.80   0.80   0.84   0.85
vliw            1.00   1.12   0.91   1.06   0.91   0.90   1.01   1.01
size
benchmark      MLton0  MLton1  MLton2  MLton3  MLton4  MLton5  MLton6  MLton7
count-graphs  126,976 126,976 126,976 126,976 122,880 122,880 122,880 122,880
md5           139,264 139,264 135,168 139,264 135,168 135,168 135,168 135,168
mpuz          114,688 114,688 110,592 110,592 110,592 110,592 110,592 110,592
ratio-regions 131,072 131,072 126,976 126,976 126,976 126,976 126,976 126,976
thread-switch 151,552 155,648 151,552 151,552 151,552 151,552 151,552 151,552
vliw          466,944 475,136 462,848 471,040 462,848 462,848 471,040 471,040
compile time
benchmark     MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs    4.78   4.64   4.79   4.68   4.80   4.81   4.83   4.82
md5             4.81   4.70   4.82   4.71   4.77   4.72   4.75   4.77
mpuz            4.41   4.38   4.41   4.38   4.41   4.41   4.43   4.41
ratio-regions   4.98   4.95   4.98   4.96   4.99   4.99   4.98   5.00
thread-switch   4.74   4.73   4.75   4.76   4.77   4.75   4.78   4.76
vliw            9.71   9.72  10.08  10.09   9.90   9.89   9.91   9.90
run time
benchmark     MLton0 MLton1 MLton2 MLton3 MLton4 MLton5 MLton6 MLton7
count-graphs   12.33  13.36  12.32  12.86  12.25  12.27  12.78  12.76
md5            18.95  19.35  18.24  19.41  18.10  18.35  19.31  19.22
mpuz           13.33  12.95  13.17  12.95  13.17  13.13  12.96  12.95
ratio-regions  48.07  47.99  49.49  47.99  48.10  48.14  47.95  47.90
thread-switch  22.12  23.46  17.93  19.25  17.61  17.61  18.68  18.82
vliw           12.04  13.45  10.91  12.81  10.92  10.84  12.13  12.19
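
For readers without the patch at hand, the difference between the "div" and "bitop" implementations of align can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names are hypothetical, and the code is not copied from the MLton runtime or from align-h-opt.patch. The bitop form assumes the alignment is a power of two, which holds for -align 4 and -align 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not taken from the MLton runtime sources. */

/* Round a up to the next multiple of b using division and multiplication.
 * When b is only known at run time (e.g. read from s->alignment), the
 * compiler generally has to emit a real divide instruction here. */
static inline uintptr_t align_div(uintptr_t a, uintptr_t b) {
  assert(b >= 1);
  return ((a + b - 1) / b) * b;
}

/* Same result using an add and a mask; valid only when b is a power of two.
 * No divide is needed even when b is a run-time value. */
static inline uintptr_t align_bitop(uintptr_t a, uintptr_t b) {
  assert(b >= 1 && (b & (b - 1)) == 0);
  return (a + (b - 1)) & ~(b - 1);
}
```

If b is a compile-time constant (the sed s/s->alignment/N/ variants), a C compiler can usually reduce the division in align_div to the same mask on its own, which would be consistent with the fixed-alignment builds showing little additional gain over the bitop builds.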

