[MLton] N-Body benchmark and the refFlatten and the deepFlatten optimizations

Tue Jul 1 16:53:15 PDT 2008

While changing my n-body benchmark implementation in mltonlib

  http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/org/mlton/vesak/toys/n-body/

to use a separate library for 3d vector arithmetic, I realized that there
was basically no reason why the benchmark should be allocating memory.
Yet, gc-summary showed that it allocated a total of about 3G bytes.

So, I spent some time reading the generated assembly code (actually, I
first tried allocation profiling, but, in retrospect, failed to interpret
the results properly to find the exact place of unwanted allocation) and
noticed that the pos field of a body (see the benchmark code) seemed to be
using the traditional "pointer to mutable pointer to value"
representation.  This meant that the line

   ; #pos a := pos a |+| vel a |* dt

allocated one vector per iteration of the advance function.  This also
meant that neither the refFlatten (http://mlton.org/RefFlatten) nor the
deepFlatten (http://mlton.org/DeepFlatten) optimizations were applied.
Either one of these optimizations should have eliminated the allocation,
AFAIU.

I'm not sure what exactly causes the optimizations to be missed.  I would
assume that refFlatten is trickier to apply and requires a more complex
analysis.  The deepFlatten page does not explain the associated analysis
in any way (and I haven't looked at the implementation of deepFlatten).  I
find it slightly surprising that the deepFlatten optimization is not
applied in this case.  The objects are finite sized and not very large (3
reals (+ overhead)).  Shouldn't this make it safe (for space) to apply
deepFlatten?  Of course there are various trade-offs here (between
allocation, copying, sharing), and it is impossible to make optimal
decisions in every case.

The comments on the refFlatten page led me to think that the problem (not
applying either optimization) might be caused by the code that creates the
system (see the benchmark code) by mapping over a list during which ref
cells are being allocated.  So, I manually eliminated the map call.  This
obviously changed the results of some analysis, eliminated the allocation
and improved performance considerably from about 32 to about 26
seconds.  If I interpret the generated code correctly, then the
deepFlatten optimization was now applied, but not refFlatten, which
should give even better results by avoiding one more indirection.

It would seem that improving the analyses of the refFlatten and deepFlatten
optimizations, so that they could be applied in more cases, could have
significant performance benefits.

Below is a patch to replace the system definition in the benchmark, which
improves performance significantly by elimination allocation as discussed.

-Vesa Karvonen


Index: n-body.sml
===================================================================

--- n-body.sml	(revision 6665)
+++ n-body.sml	(working copy)
@@ -36,41 +36,37 @@
 fun vel (b : body) = ! (#vel b)

 val system =
-    map (fn {pos, vel, mass} =>
-            {pos = ref pos,
-             vel = ref (vel |* daysPerYear),
-             mass = mass * solarMass})
-        [{pos = {x = 0.0, y = 0.0, z = 0.0},
-          vel = {x = 0.0, y = 0.0, z = 0.0},
-          mass = 1.0},
-         {pos = {x = 4.84143144246472090,
+    [{pos = ref {x = 0.0, y = 0.0, z = 0.0},
+      vel = ref {x = 0.0, y = 0.0, z = 0.0},
+      mass = solarMass},
+     {pos = ref {x = 4.84143144246472090,
                  y = ~1.16032004402742839,
                  z = ~1.03622044471123109e~1},
-          vel = {x = 1.66007664274403694e~3,
-                 y = 7.69901118419740425e~3,
-                 z = ~6.90460016972063023e~5},
-          mass = 9.54791938424326609e~4},
-         {pos = {x = 8.34336671824457987,
+      vel = ref ({x = 1.66007664274403694e~3,
+                  y = 7.69901118419740425e~3,
+                  z = ~6.90460016972063023e~5} |* daysPerYear),
+      mass = 9.54791938424326609e~4 * solarMass},
+     {pos = ref {x = 8.34336671824457987,
                  y = 4.12479856412430479,
                  z = ~4.03523417114321381e~1},
-          vel = {x = ~2.76742510726862411e~3,
-                 y = 4.99852801234917238e~3,
-                 z = 2.30417297573763929e~5},
-          mass = 2.85885980666130812e~4},
-         {pos = {x = 1.28943695621391310e1,
+      vel = ref ({x = ~2.76742510726862411e~3,
+                  y = 4.99852801234917238e~3,
+                  z = 2.30417297573763929e~5} |* daysPerYear),
+      mass = 2.85885980666130812e~4 * solarMass},
+     {pos = ref {x = 1.28943695621391310e1,
                  y = ~1.51111514016986312e1,
                  z = ~2.23307578892655734e~1},
-          vel = {x = 2.96460137564761618e~3,
-                 y = 2.37847173959480950e~3,
-                 z = ~2.96589568540237556e~5},
-          mass = 4.36624404335156298e~5},
-         {pos = {x = 1.53796971148509165e1,
+      vel = ref ({x = 2.96460137564761618e~3,
+                  y = 2.37847173959480950e~3,
+                  z = ~2.96589568540237556e~5} |* daysPerYear),
+      mass = 4.36624404335156298e~5 * solarMass},
+     {pos = ref {x = 1.53796971148509165e1,
                  y = ~2.59193146099879641e1,
                  z = 1.79258772950371181e~1},
-          vel = {x = 2.68067772490389322e~3,
-                 y = 1.62824170038242295e~3,
-                 z = ~9.51592254519715870e~5},
-          mass = 5.15138902046611451e~5}]
+      vel = ref ({x = 2.68067772490389322e~3,
+                  y = 1.62824170038242295e~3,
+                  z = ~9.51592254519715870e~5} |* daysPerYear),
+      mass = 5.15138902046611451e~5 * solarMass}]

 fun advance dt =
  fn []    => ()