Team PLClub ICFP entry -- comparing the performance of OCAML and SML

Fri, 13 Oct 2000 19:28:16 -0700 (PDT)

> The question is, what is special about fractal that causes SML/NJ to be
> so much slower on it than say chess?

Here are the profile numbers for MLton on chess and fractal showing the top five
functions and the percent of time spent in them (warning: MLton's profile
information refers to one of the compiler ILs and not directly to the source).
The numbers show that chess and fractal are indeed very different (no surprise),
and that most of the time in fractal is spent in optimize_rec, which is a
recursive function that does not do any floating point math.

chess.gml
find_all                                23.80%
intersect                               18.53%
vmul                                    14.43%
eval                                     8.13%
cone                                     7.02%

fractal.gml
optimize_rec                            69.92%
intersect                                9.87%
find_all                                 4.88%
union_radius                             4.76%
trace                                    2.67%

Profiling fractal at the basic block level shows that most of the time is spent
in the opt_union function (MLton inlined it in optimze_rec so it didn't show up
at the function level of profiling).  Assuming that SML/NJ is also spending a
good bit of time there, this is another candidate for decamlification, since it
has lots of loops, ref cells, arrays, and array2s.  Based on earlier speedups
from demcalification, measurements showing the penalty for bounds checks in
SML/NJ, and Matthias's comments about the poor performance of array subscripts,
I would imagine this function causes a significant part of the slowdown for
SML/NJ.  It's not trivial for me to confirm this, since I don't know how to get
at an unsafe version of Array2 in SML/NJ.