SML numerical benchmark and MLton

Juan Jose Garcia Ripoll jjgarcia@ind-cr.uclm.es
26 Oct 1999 10:22:13 +0200


"Stephen Weeks" <sweeks@intertrust.com> writes:

> > Secondly, I must admit that none of the previous optimizations did any better
> > for the performance of the MLton-compiled code.
> 
> To be clear, I was interested in a comparison of
>   * your original code compiled by MLton
>   * your semi-automatically optimized code compiled by MLton

I have improved the timer: it now uses 'times' from the Posix.ProcEnv
structure, just as the reference C code does. These are the results:
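
A minimal sketch of such a timer, assuming the Standard ML Basis Library's
Posix.ProcEnv.times (available in both MLton and SML/NJ). The names cpuTime
and timeIt are hypothetical, not the benchmark's actual code, and counting
user plus system CPU time is an assumption about what the tests measure:

```sml
(* CPU time consumed so far by this process: user + system time,
   taken from Posix.ProcEnv.times (the Basis wrapper around times(2)). *)
fun cpuTime () : Time.time =
  let
    val {utime, stime, ...} = Posix.ProcEnv.times ()
  in
    Time.+ (utime, stime)
  end

(* Run f once and return its result together with the CPU time it
   consumed, in milliseconds. *)
fun timeIt (f : unit -> 'a) : 'a * LargeInt.int =
  let
    val t0 = cpuTime ()
    val result = f ()
    val t1 = cpuTime ()
  in
    (result, Time.toMilliseconds (Time.- (t1, t0)))
  end

(* Usage: time the sum of a million reals. *)
val (total, ms) =
  timeIt (fn () => Vector.foldl op+ 0.0 (Vector.tabulate (1000000, Real.fromInt)))
```

Because times reports CPU time rather than wall-clock time, other processes
on the machine affect the measurement much less than a wall-clock timer would.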

(* MLton, with hand-coded optimizations *)
$ ./tests
Real tensors: (+, *, /, +*, *+)
100      1     2     1    50    67
200      8     7     8   632   607
300     17    17    19  2334  2715


Complex tensors: (+, *, /, +*, *+)
100      4     4     4    77   122
200     19    22    22  1325  1437
300     48    46    51  4437  5730

(* MLton, optimizations left to the compiler. Extensive use of functors *)
$ ../smlapl3/tests
Real tensors: (+, *, /, +*, *+)
100      2     2     1    72    60
200      8     6     9   482   490
300     16    15    20  3225  2399


Complex tensors: (+, *, /, +*, *+)
100      4     4     4   199   162
200     19    20    23  2035  1902
300     46    49    52  8047  7735

(* Hand-coded optimizations. SML/NJ 110.17 *)
Real tensors: (+, *, /, +*, *+)
100      2     2     1    80    95
200     10     9    10   639   845
300     23    21    21  3022  3527


Complex tensors: (+, *, /, +*, *+)
100      4     7     6   160   199
200     26    27    32  1679  2130
300     60    71    75  6627  7820

(* Reference C code *)
$ yorick -batch tests.i 
 Real tensors (add, mult, div, +*, *+)
      100        0        0        0       25       22
      200        3        4        5      227      232
      300        8        8       13     1225     1182
 Complex tensors (add, mult, div, +*, *+)
      100        1        1        4       62       87
      200        7       11       19      789     1212
      300       17       25       42     3059     4819

I am posting all the benchmarks because I ran these tests on a newer
machine. Also, now that I use 'times', the difference between single
and multiple passes has vanished.

You can find a version which uses no #inline tags at this address:

http://est202.sub37.uclm.es/jjgarcia/smlapl-noinline.tgz

> On a related note, I was curious about your comments about the speed
> of Ocaml.  I was also interested if you had any code where you had
> both an SML and an Ocaml version.  I recently did some benchmarking of
> Ocaml and it did quite well on a few small benchmarks, and I am
> looking for more code to try.  Thanks.

There was a serious problem with the Ocaml version which forced me to
abandon it early in development: something as simple as this (I'll use
SML notation):

fun a + b = MonoTensor.map2 RNumber.+ a b

will always involve a call to RNumber.+, no matter what value you pass
to the inline option. In other words, ocamlopt cannot inline functions
passed as arguments to higher-order functions, which means I would have
to code everything by hand to get reasonable speed and to avoid
consing. There is also the problem that complex numbers passed as
parameters to functions always cons, and the severe limit on the size
of arrays on the x86 architecture. Finally, the interpreted code cannot
call the native-compiled code, so the interpreter is not usable as an
interactive environment.
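
To make the inlining issue concrete, here is a minimal sketch in SML. The
names map2, addGeneric and addByHand are hypothetical stand-ins for
MonoTensor.map2 and RNumber.+, and plain real arrays replace the actual
tensor types. The first form passes the addition as a function value; the
second is what one has to write by hand when the compiler cannot inline that
argument:

```sml
(* Generic element-wise combination of two real arrays. The operation f
   is a function value; unless the compiler inlines both map2 and f,
   every element costs an indirect call. Assumes length b >= length a
   (otherwise Array.sub raises Subscript). *)
fun map2 (f : real * real -> real) (a : real Array.array) (b : real Array.array) =
  Array.tabulate (Array.length a, fn i => f (Array.sub (a, i), Array.sub (b, i)))

(* Addition written the high-level way, as in the line above. *)
fun addGeneric (a, b) = map2 Real.+ a b

(* The same operation specialized by hand: the addition appears
   directly in the loop body, so no function value is involved. *)
fun addByHand (a : real Array.array, b : real Array.array) =
  let
    val n = Array.length a
    val c = Array.array (n, 0.0)
    fun loop i =
      if i >= n then c
      else (Array.update (c, i, Array.sub (a, i) + Array.sub (b, i)); loop (i + 1))
  in
    loop 0
  end
```

A compiler that inlines higher-order arguments can, in principle, reduce
addGeneric to the code of addByHand; without that, one ends up writing the
specialized loop for every operation and number type.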

	Juanjo

-----
Universidad de Castilla-La Mancha
Departamento de Matematicas
ETSI Industriales
c/Camilo Jose Cela, 3,			Phone:	+34-926-295300 (ext 3085)
Ciudad Real, E-13071 (Spain)		Fax:	+34-926-295369
-----
Our group:	http://www.uclm.es/dep/matematicas/nolineal/index.html
Temporary page:	http://est202.sub37.uclm.es/jjgarcia/index.html