[MLton-user] more optimization questions

brian denheyer briand@aracnet.com
Fri, 18 Nov 2005 22:44:11 -0800


Howdy,

(All under OS X)

I'm running triply-nested loops with lots of FP and I'm seeing about
a 3:1 slowdown relative to an equivalent C program.  I'm pretty sure that
this number is real, because a rough estimate of flops puts the ML
version way off.

I can't get time profiling to work, so I was hoping I could get some  
optimization advice from those in the know.

I'm implementing my nested loops using something like :

fun for(n, m, proc) =
     let
         fun loop(i) =
             if i >= m then
                 ()
             else
                 (
                  proc(i);
                  loop(i+1)
                  )
     in
         loop(n)
     end

So the loops are implemented as for(1, n, (fn (i) => (for (1, m, (fn
(j) =>   etc...
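
To make the nesting concrete, here is a minimal self-contained sketch of the same loop combinator driving a three-deep loop nest. The bounds 2, 3 and 4 and the counter are hypothetical, chosen only for illustration; note that the upper bound m is exclusive.

```sml
(* for (n, m, proc) calls proc i for i = n, n+1, ..., m-1 *)
fun for (n, m, proc) =
    let
        fun loop i =
            if i >= m then ()
            else (proc i; loop (i + 1))
    in
        loop n
    end

(* Hypothetical example: count the iterations of a 2 x 3 x 4 loop nest *)
val count = ref 0
val () =
    for (0, 2, fn i =>
        for (0, 3, fn j =>
            for (0, 4, fn k =>
                count := !count + 1)))
```

Each level passes a fresh closure to for, so the innermost body runs through two layers of higher-order calls unless the compiler inlines them.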

Any reason why this might cause slowdowns ?

Surprise, surprise, my counting profiling says I'm spending a LOT of  
time in my array sub and index routines :

     fun index (A3{n1, n2, n3, ...}, i, j, k) =
         (* 2 mults/2 adds for each index count *)
         k + n3 * (j + n2 * i)

     fun sub (arr as A3{elems, ...}, i, j, k) =
         Array.sub(elems, index(arr, i, j, k))


And elems is simply a linear (one-dimensional) real array.
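
For reference, a self-contained sketch of that layout. The datatype definition, make and update are assumptions (only the field names n1, n2, n3 and elems appear above); index and sub follow the code shown.

```sml
(* Assumed definition; only the field names are known from the post *)
datatype array3 = A3 of {n1 : int, n2 : int, n3 : int, elems : real array}

(* Hypothetical constructor: an n1 x n2 x n3 array filled with init *)
fun make (n1, n2, n3, init) =
    A3 {n1 = n1, n2 = n2, n3 = n3,
        elems = Array.array (n1 * n2 * n3, init)}

(* Row-major offset: k varies fastest, then j, then i *)
fun index (A3 {n2, n3, ...}, i, j, k) =
    k + n3 * (j + n2 * i)

fun sub (arr as A3 {elems, ...}, i, j, k) =
    Array.sub (elems, index (arr, i, j, k))

(* Hypothetical companion to sub *)
fun update (arr as A3 {elems, ...}, i, j, k, x) =
    Array.update (elems, index (arr, i, j, k), x)

val a = make (2, 3, 4, 0.0)
val off = index (a, 1, 2, 3)   (* 3 + 4 * (2 + 3 * 1) = 23, the last slot *)
val () = update (a, 1, 2, 3, 1.5)
val x = sub (a, 1, 2, 3)
```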

2 things come to mind here:

Destructuring of the record parameters being done on every call.
BOXING !  How do I know if these array accesses are unboxed ?
Also, is it possible that the integer calculations (for the index)  
might cause any problems ?  Like maybe they are boxed or converting  
back and forth to tagged and un-tagged or something similar.
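
One alternative worth trying, as a sketch only: destructure the record once outside the loops and compute the row offset outside the innermost loop, so the inner body is a single add plus Array.sub. The datatype definition and the name sumAll are assumptions for illustration, and whether this changes anything after MLton's own optimizations is not something measured here.

```sml
(* Assumed definition; only the field names are known from the post *)
datatype array3 = A3 of {n1 : int, n2 : int, n3 : int, elems : real array}

(* Sum all elements. The record is destructured once, and the offset
   n3 * (j + n2 * i) is computed once per (i, j) pair rather than once
   per element. *)
fun sumAll (A3 {n1, n2, n3, elems}) =
    let
        fun loopK (base, k, acc) =
            if k >= n3 then acc
            else loopK (base, k + 1, acc + Array.sub (elems, base + k))
        fun loopJ (i, j, acc) =
            if j >= n2 then acc
            else loopJ (i, j + 1, loopK (n3 * (j + n2 * i), 0, acc))
        fun loopI (i, acc) =
            if i >= n1 then acc
            else loopI (i + 1, loopJ (i, 0, acc))
    in
        loopI (0, 0.0)
    end

(* Hypothetical test data: elems holds 0.0, 1.0, ..., 23.0 *)
val a = A3 {n1 = 2, n2 = 3, n3 = 4, elems = Array.tabulate (24, real)}
val total = sumAll a
```

The loops here are plain first-order tail-recursive functions, so this version also avoids passing closures to a generic for.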

I tried using -keep g to get the .c intermediate files, but I'm
having a hard time deciphering them.  I was hoping someone could provide
some advice on what to look for, or alternative implementation ideas.
Also I'm happy to put appropriate snippets of the intermediate files
up, or if someone can point me to docs on how to interpret them, I'll go
through those.  I found nothing obvious on the web site.

My simplistic testing (and that of John H on c.l.f) before I started  
this effort showed that mlton should be doing much better than this.

Thank You

Brian

P.S. -O2 vs -O1 basically makes no difference.