Team PLClub ICFP entry -- comparing the performance of OCAML and SML

Stephen Weeks MLton@sourcelight.com
Thu, 12 Oct 2000 15:42:36 -0700 (PDT)


> I've done a quick de-camlification of this program, which seems to improve
> its performance under SML/NJ by a factor of 2 or more on the chess.gml
> example (733MHz PIII; SML/NJ 110.29).  My changes were:
...

John sent me a copy of his modified program, and I reran it on all of the
benchmarks.  There was a significant performance improvement under SML/NJ, with
chess showing the most improvement.  I include all of the numbers below.
 
> I think that this example illustrates the danger of making casual
> performance comparisons.  There are clearly stylistic differences
> in the most efficient style of SML programming vs the most efficient
> style of CAML programming.

I agree, although I would generalize along different lines.  The important
difference is between compilers, not languages.  I do not see any reason why an
SML compiler couldn't produce code for my original translation as good as the
code produced by the OCAML compiler (except for maybe the unsafe subscripts,
which I doubt matter much).  As a concrete example, MLton did not penalize the
programming style as much as SML/NJ did, although MLton did penalize it some --
see below.

> The CAML version of this program is still faster than the SML/NJ version,
> but the difference is 30% (instead of a factor of 3).

For the modified SML version, the ratio of SML/NJ to OCAML running time varied
between 1.35 and 3.72 across the benchmarks.

Here are the running times for all of the benchmarks, comparing the original
OCAML program, my original translation, and John Reppy's improved version.

                      my original    with jhr
		      translation   modifications
			    110.29         110.29
              OCAML  MLton  SML/NJ  MLton  SML/NJ
holes           1.8    3.5     5.0    3.2     3.9
fov             1.5    2.1     4.4    1.8     3.2
intercyl        1.6    2.4     6.0    2.1     4.3
snowgoon        2.9    4.0     8.4    3.3     5.1
dice            3.9    5.7    10.8    4.9     8.8
golf            1.5    2.5     4.2    2.4     3.1
cone-fractal    3.7    4.9     8.4    4.3     6.5
large           4.3    3.1     7.6    3.0     6.7
pipe            5.4    5.3    11.3    4.6     7.9
chess          16.0   17.8    38.6   15.5    21.6
fractal        12.2    8.7    41.1    8.5    45.4

geom-mean       3.6    4.4     9.5    4.0     7.1

> I suspect that
> there are other changes to the coding style that could be made to improve
> the performance under SML (for example, I'd use a record to represent the
> transformation matrix, instead of an array).

I agree.  Although your statement applies to the original OCAML program as well.
For example, had the original authors used records for their points and
matrices, the OCAML version would speed up.

> W.r.t. the revised numbers below, the numbers for SML/NJ 110.25 on fractal
> and large look suspect.

I reran the fractal and large benchmarks for the original translation under
SML/NJ 110.25 and got roughly the same results.  Here are the numbers.
	
         110.25
         SML/NJ
large       3.9
fractal    26.0