[MLton-user] more optimization questions

Matthew Fluet fluet@cs.cornell.edu
Sun, 20 Nov 2005 18:29:03 -0500 (EST)


> I coded up a simple but very useful 2D finite difference code.  I did it in 
> 2D to eliminate my 3D array implementation from the equation.  It makes for a 
> very nice test case.  The code is quite simple.  It has a "correct answer" to 
> test for correctness of operation.  It scales easily, i.e. you can simply 
> increase the size of the arrays to make it take longer and the answer remains 
> the same (although the iteration count goes up).
>
> The results are somewhat depressing:
>
> gcc -O2
> real    0m4.001s
> user    0m3.908s
> sys     0m0.028s
>
>
> mlton (-cc-opt -O2)
> real    0m14.784s
> user    0m14.664s
> sys     0m0.058s

MLton is still probably doing a lot of overflow checks and bounds checks.

It would be interesting to see the effect of -detect-overflow false.

Those checks, by the way, are a good thing.  I know everyone means well, 
but it isn't (always) a meaningful comparison to transliterate a C program 
into SML and expect the same performance out of MLton as out of GCC.  I'd 
like to see someone transliterate Henry's count-graphs benchmark, which 
makes heavy use of higher-order functions and exceptions, into C and report 
back on mlton's vs gcc's performance.

Another thread along these lines starts here:
   http://mlton.org/pipermail/mlton/2005-March/026874.html

>>> As for power-pc optimization, I'm really interested in helping with that. 
>>> Although with the mac bonehead decision to go to intel I can't see that 
>>> anyone is going to be very motivated to optimize anything for power pc.
>> 
>> Well, since a native code power-pc backend is unlikely, any improvement to 
>> the C-codegen would benefit other platforms as well.
>
> Given that the C-compiler performance is quite good on the power-pc that 
> would probably help a lot.  I'm definitely willing to invest in the time to 
> help increase the performance.  It would save me the effort of writing my own 
> compiler for a numerical computation oriented functional language (SISAL 
> anyone ?) ;-)

I seem to recall that at one point in time, we had inline assembly for 
overflow-checking arithmetic in the (support code for the) C-codegen.  When 
we had the native x86-codegen, we simplified that away, but it might be 
worthwhile to see what inline PowerPC assembly for overflow-checking 
arithmetic gives you.
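
To make the cost concrete, here is a minimal C sketch of overflow-checking 
addition.  It is not MLton's actual support code, and every function name 
in it is made up for illustration.  The first variant is a portable check 
written with plain comparisons.  The second assumes a compiler that 
provides __builtin_add_overflow (GCC 5 or later, or Clang), which lets the 
compiler pick the instruction sequence itself, roughly the job hand-written 
inline assembly would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable check: decide before adding whether a + b would leave the
 * int32_t range.  Returns true on overflow and leaves *res untouched. */
static bool add_overflows_portable(int32_t a, int32_t b, int32_t *res)
{
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return true;
    *res = a + b;
    return false;
}

/* Compiler-assisted check (assumes GCC >= 5 or Clang).  Returns true on
 * overflow; *res then holds the wrapped result. */
static bool add_overflows_builtin(int32_t a, int32_t b, int32_t *res)
{
    return __builtin_add_overflow(a, b, res);
}

/* Sanity helper: both variants must agree on the overflow flag, and on
 * the sum whenever there is no overflow. */
static void check_agree(int32_t a, int32_t b)
{
    int32_t r1 = 0, r2 = 0;
    bool o1 = add_overflows_portable(a, b, &r1);
    bool o2 = add_overflows_builtin(a, b, &r2);
    assert(o1 == o2);
    if (!o1)
        assert(r1 == r2);
}
```

Timing a tight loop over each variant on the target machine would give a 
rough idea of how much a cheaper overflow check could buy.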

> Also I'm just plain curious as to what is going on.  It's not obvious to me 
> that any of the optimizations being discussed are worth a factor of 3.5 in 
> performance, are they ?

It's hard to say.  There is an additional issue that, to GCC, all the 
C-code that MLton produces looks as though it is doing a lot of heap reads 
and writes, since MLton puts the ML stack on the heap.  This means that 
GCC is probably being a bit conservative in its alias analysis, and won't 
be able to do any of the loop optimizations for us.
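
A small C sketch of the aliasing problem, under stated assumptions: the 
struct below is a hypothetical stand-in for a heap-allocated ML stack 
frame, and the code is not actual MLton output.  In the first function the 
accumulator lives in memory reached through a pointer, so the compiler has 
to allow for f->sum overlapping some a[k] (both are doubles) and may end up 
storing and reloading it on every iteration.  In the second function the 
accumulator is a local whose address is never taken, so it can stay in a 
register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ML stack frame kept on the heap. */
struct frame {
    double sum;
    size_t i;
};

/* Every live value is read and written through the frame pointer. */
double sum_via_frame(struct frame *f, const double *a, size_t n)
{
    assert(f != NULL && a != NULL);
    f->sum = 0.0;
    for (f->i = 0; f->i < n; f->i++)
        f->sum += a[f->i];
    return f->sum;
}

/* Same computation with locals, as one would write it by hand in C. */
double sum_local(const double *a, size_t n)
{
    assert(a != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Comparing the assembly GCC emits for the two functions at -O2 is a quick 
way to see whether this effect shows up on a given platform.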