good in general, but bad nested loops

Matthew Fluet mfluet@intertrust.com
Tue, 10 Jul 2001 15:20:14 -0700 (PDT)


> So, it has changed the loop to count down from n to 0 instead of up from 0 to n.
> It has also kept five of the loop variables in registers, plus the counter.
> Most importantly, the hot loop, L_60, is only three instructions.
>
> We can get potentially the same as gcc, because we have the loop nest.  We
> just(!) need to do the right register allocation and instruction reordering.  Oh
> yeah, and figuring out the Overflows can't happen.

I agree that we can probably get close to gcc.  Another aspect of the gcc
code is that it's able to get the 0 test for free by looking at the result
of the decrement.  Part of that does need to be done in the backend; i.e.,
the peephole optimization that would turn
  decl %eax         dec %eax               dec %eax
  cmpl $0,%eax  or  testl %eax,%eax  into  je L_123
  je L_123          jz L_123
(and this would be very easy to add to my peephole framework).  But,
another aspect of this is the transformation that does the branch at the
end of the loop, rather than at the top.  I think(?) that MLton generally
puts these branches at the top; CPS would seem to be the right place to
unroll and reorder these loops.