slow matrix multiply

Tue, 10 Jul 2001 20:24:59 -0500

What is going on with the register dance in the code MLton generates:
    movl (188*1)(%edi),%edx         # %edx = i
    movl %edx,%ecx                  # %ecx = i
    movl %ecx,%eax                  # %eax = i
    movl $30,%ecx                   # %ecx = 30
The middle two moves are clearly silly: the last instruction overwrites %ecx,
so routing i through it accomplishes nothing.  Also, if you look at the code,
%edx is dead at this point.  Thus the above could have been just
    movl (188*1)(%edi),%eax
    movl $30,%ecx

Also, didn't we conclude that the cltd before each imull served no purpose?
(imull never reads %edx as an input, so sign-extending into it first is wasted.)

More importantly, and this is probably only doable if overflow checking is
off, converting multiplies by constants into lea instructions is a big win
on Intel chips.
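
To make the lea point concrete, here is a rough C model of what the rewrite
computes.  It is only a sketch: the function names and the constants 5, 9 and
10 are made up for illustration and do not come from the MLton code generator.
Each line is commented with the lea it stands for.  The point is that lea does
wrapping address arithmetic and sets no flags, so there is nothing for a
following jo to test, whereas imull sets the overflow flag.  That is why the
trick seems tied to turning overflow checking off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each models one lea-based multiply by a constant.
 * Unsigned arithmetic wraps mod 2^32, like lea's address computation. */

/* leal (%eax,%eax,4),%eax  : x + x*4 = 5x */
static uint32_t mul5_lea(uint32_t x) { return x + (x << 2); }

/* leal (%eax,%eax,8),%eax  : x + x*8 = 9x */
static uint32_t mul9_lea(uint32_t x) { return x + (x << 3); }

/* leal (%eax,%eax,4),%eax ; addl %eax,%eax  : (5x) + (5x) = 10x */
static uint32_t mul10_lea(uint32_t x) {
    uint32_t t = x + (x << 2);
    return t + t;
}

/* What imull followed by jo detects: the signed 32-bit product
 * does not fit.  Modeled here by widening to 64 bits. */
static int mul_overflows(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a * (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Small self-check: the lea forms agree with a plain wrapping multiply. */
static void self_check(void) {
    for (uint32_t x = 0; x < 1000; x++) {
        assert(mul5_lea(x) == x * 5u);
        assert(mul9_lea(x) == x * 9u);
        assert(mul10_lea(x) == x * 10u);
    }
}
```

For example, mul5_lea(0x40000000) silently wraps, while
mul_overflows(0x40000000, 5) reports the overflow that the lea version has no
way to signal.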

The last of these is a big deal, but it requires overflow detection to go
away.  The register shuffle, on the other hand, is what surprises me.