[MLton-user] more optimization questions

Stephen Weeks sweeks@sweeks.com
Fri, 18 Nov 2005 23:46:30 -0800


> I'm implementing my nested loops using something like :
...
> So the loops are implemented as for (1, n, (fn (i) => (for (1, m, (fn
> (j) => etc...
> 
> Any reason why this might cause slowdowns ?

No.  That looks like a fine way to implement for loops to me.
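
For concreteness, here is a minimal sketch of such a combinator.  The
original definition of "for" was not posted, so the inclusive upper
bound and the int -> unit body type are assumptions.

```sml
(* Hypothetical helper: calls f on lo, lo+1, ..., hi (inclusive). *)
fun for (lo : int, hi : int, f : int -> unit) : unit =
  let
    (* Tail-recursive loop, so it runs in constant stack space. *)
    fun loop i = if i > hi then () else (f i; loop (i + 1))
  in
    loop lo
  end

(* Nested use: accumulate i * j over i in 1..3 and j in 1..4. *)
val total = ref 0
val () = for (1, 3, fn i => for (1, 4, fn j => total := !total + i * j))
```

Written this way, the closures passed to "for" are known at each call
site, which is the kind of code an inlining compiler handles well.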

> Surprise, surprise, my counting profiling says I'm spending a LOT of  
> time in my array sub and index routines :
> 
>      fun index (A3{n1, n2, n3, ...}, i, j, k) =
>          (* 2 mults/2 adds for each index count *)
>          k + n3 * (j + n2 * i)
> 
>      fun sub (arr as A3{elems, ...}, i, j, k) =
>          Array.sub(elems, index(arr, i, j, k))

These also look fine.

> Destructuring of the record parameters being done on every call.

Both sub and index will be inlined, and there will be no record
allocation or destructuring.

> BOXING !  How do I know if these array accesses are unboxed ?

Integer and real arrays are always unboxed with MLton.

To confirm that there is no record construction and no boxing, you
could do allocation profiling.  Even though time profiling fails,
allocation profiling should still work because it uses a different
mechanism.

If there's not much expected allocation other than some arrays, you
could run tests with various sizes and measure total program
allocation by passing the runtime option @MLton gc-summary -- to your
executable, to verify that the amount of memory allocated is roughly
what you would expect.
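
As a rough sketch of that size experiment: the element type of the
array was not posted, so the real array below, the "make" constructor,
and the "expectedBytes" helper are all assumptions made for
illustration, as is the figure of 8 bytes per real.

```sml
(* Assumed layout: a flat real array plus the three dimensions. *)
datatype a3 = A3 of {elems : real array, n1 : int, n2 : int, n3 : int}

(* Hypothetical constructor: allocates one flat array of n1*n2*n3 reals. *)
fun make (n1 : int, n2 : int, n3 : int) : a3 =
  A3 {elems = Array.array (n1 * n2 * n3, 0.0), n1 = n1, n2 = n2, n3 = n3}

(* Expected payload in bytes, assuming 8-byte reals and ignoring the
   small per-array header. *)
fun expectedBytes (n1 : int, n2 : int, n3 : int) : int =
  8 * n1 * n2 * n3

val arr = make (10, 20, 30)
val expected = expectedBytes (10, 20, 30)
```

If the total reported by the gc-summary grows much faster than
expectedBytes as the dimensions grow, something other than the arrays
is being allocated in the loops.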

> Also, is it possible that the integer calculations (for the index)  
> might cause any problems ?  Like maybe they are boxed or converting  
> back and forth to tagged and un-tagged or something similar.

Integer and real computations are never boxed with MLton.  However,
SML does require integer arithmetic to detect overflow.  It is quite
possible that this is hurting your performance.  MLton only has a C
codegen on PowerPC, and overflow detection is much more costly with
the C codegen than the native codegen.  MLton detects overflow by
doing a 64-bit multiply, casting to a 32-bit result, and testing to
make sure that nothing was lost in the cast.
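
The widen, multiply, and check idea can be modelled in SML itself.
This is only an illustration of the technique, not the C that MLton
emits; "checkedMul" is a hypothetical name, and it uses LargeInt in
place of a fixed 64-bit type.

```sml
(* Multiply in a wider type, then test that narrowing loses nothing. *)
fun checkedMul (a : int, b : int) : int option =
  let
    val wide = LargeInt.* (Int.toLarge a, Int.toLarge b)
    val fits =
      case (Int.minInt, Int.maxInt) of
        (SOME lo, SOME hi) =>
          LargeInt.<= (Int.toLarge lo, wide)
          andalso LargeInt.<= (wide, Int.toLarge hi)
      | _ => true  (* int has no bounds here, so nothing can be lost *)
  in
    if fits then SOME (Int.fromLarge wide) else NONE
  end

val ok = checkedMul (6, 7)
```

The extra widening and comparison on every multiply is the cost being
described; in an index computation it is paid on each array access.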

Because you are using the C codegen and because you're using PowerPC,
which hasn't received anything like the amount of tuning we've done on
x86, you may see different results there than what you or others have
seen on x86.

It is possible to turn off overflow detection in the entire program by
compiling with -detect-overflow false.  If this shows that the index
computation is your bottleneck, and you don't rely on overflow
detection there (presumably because you do your own bounds checking on
each index), then you could replace the index computation's Int
arithmetic with Word arithmetic, because that doesn't detect
overflow.
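
A minimal sketch of that replacement, applied to the index function
quoted earlier.  The record is assumed to hold a real array and only
the fields shown; the real type may have more.

```sml
(* Assumed layout: a flat real array plus the three dimensions. *)
datatype a3 = A3 of {elems : real array, n1 : int, n2 : int, n3 : int}

(* Same formula as before, k + n3 * (j + n2 * i), but computed on
   words, so the adds and multiplies wrap instead of raising Overflow. *)
fun index (A3 {n2, n3, ...}, i : int, j : int, k : int) : int =
  let
    val w = Word.fromInt
  in
    Word.toIntX (w k + w n3 * (w j + w n2 * w i))
  end

(* Array.sub still bounds-checks the final flattened index. *)
fun sub (arr as A3 {elems, ...}, i, j, k) =
  Array.sub (elems, index (arr, i, j, k))

val arr = A3 {elems = Array.tabulate (2 * 3 * 4, real), n1 = 2, n2 = 3, n3 = 4}
val pos = index (arr, 1, 2, 3)
```

Note that a wrapped result could in principle land back inside the
array, so this is only safe if i, j, and k are known to be in range
before the call.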

> I tried using -keep g to get the .c intermediate files, but I'm
> having a hard time deciphering them.

Yeah, there are no good docs for that stuff.  It would be great to see
a question-and-answer discussion (this probably makes more sense on
the MLton list than on MLton-user) that explains the format to a
newbie, with the intent of cleaning up the discussion and putting it
on the MLton wiki.