good in general, but bad nested loops

Matthew Fluet mfluet@intertrust.com
Mon, 9 Jul 2001 22:10:19 -0700 (PDT)


> I see that we are comparing registers against 0 instead of doing testl of the
> register with itself.  The compare is 3 bytes vs. 2 bytes for the testl (for
> the %ebp register), but it doesn't seem to make any speed differences in the
> tests I performed.  Still, it is something to put in.

That's easy enough to add.  It's a trivial peephole optimization to write;
or, if the majority of the cases are coming straight from Machine IL, I
can just emit the right instruction in the translation.  I'll add it to
my todo list.
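
As a rough sketch of what such a peephole pass might look like (Python
here purely for illustration; the function name and the string-based
instruction format are made up, and this is not the actual MLton codegen):

```python
import re

# Hypothetical: instructions are AT&T-syntax strings, one per list entry.
# Matches "cmp $0,%reg" / "cmpl $0x0,%reg" where %reg is a 32-bit register.
# Memory operands such as 0xdc(%edi) deliberately do not match.
_CMP_ZERO = re.compile(r"^(\s*)cmpl?(\s+)\$(?:0x)?0+,\s*(%e[a-z]{2})\s*$")

def peephole_cmp_zero(instructions):
    """Rewrite register-vs-zero compares into test reg,reg."""
    out = []
    for insn in instructions:
        m = _CMP_ZERO.match(insn)
        if m:
            indent, gap, reg = m.groups()
            # test %r,%r sets ZF/SF/PF from %r and clears CF/OF, the same
            # as cmp $0,%r, so a following conditional jump is unaffected.
            out.append(f"{indent}test{gap}{reg},{reg}")
        else:
            out.append(insn)
    return out

loop_head = [
    "mov    0xdc(%edi),%esp",
    "cmp    $0x0,%esp",
    "je     0x804b730",
]
rewritten = peephole_cmp_zero(loop_head)
```

Only the compare line changes; the load and the jump pass through
untouched.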

> I also saw that we were loading a register from memory, incrementing the
> register, and then storing it back into the same memory location.  (This is
> with overflow detection off, otherwise there is a test for overflow between
> these two.)

I see the following loop with overflow detection off:

0x804b48a:      mov    0xdc(%edi),%esp
0x804b490:      cmp    $0x0,%esp
0x804b493:      je     0x804b730
0x804b499:      dec    %esp
0x804b49a:      mov    %esp,0xdc(%edi)
0x804b4a0:      incl   0xd8(%edi)
0x804b4a6:      jmp    0x804b48a

This is what I expected: the increment of 0xd8(%edi) should happen in
memory.  The decrement of 0xdc(%edi) could happen in memory, since we
don't modify %esp.  But, since the contents of that memory location are
already in a register for the compare, I do the dec there and then move
the result back.

> Perhaps for these cases it really is just a matter of not having the relevant
> variables in real registers.  Or maybe it is the 2 adjacent stores into
> memory.  (I seem to recall that this was bad for the CPU to schedule.)

I'm working on carrying stack slots around loops in registers.  For that
tight loop (even with overflow checking), I'm hoping that no memory
accesses will be necessary.  But I'm still a little ways away from that.
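
To give a rough idea of the difference, here is a toy model of that loop
(Python purely for illustration).  The dict stands in for the two stack
slots at 0xdc(%edi) and 0xd8(%edi), and the access counting is my own
simplification (a read-modify-write incl counts as two accesses, and
32-bit wraparound is ignored); none of this is actual codegen output.

```python
def loop_through_memory(mem):
    """Mirror the listing above: touch the stack slots on every iteration."""
    accesses = 0
    while True:
        reg = mem[0xdc]; accesses += 1      # mov  0xdc(%edi),%esp
        if reg == 0:                        # cmp  $0x0,%esp ; je <exit>
            break
        reg -= 1                            # dec  %esp
        mem[0xdc] = reg; accesses += 1      # mov  %esp,0xdc(%edi)
        mem[0xd8] += 1; accesses += 2       # incl 0xd8(%edi): read + write
    return accesses

def loop_in_registers(mem):
    """Hypothetical target: load both slots once, loop, store both once."""
    counter = mem[0xdc]; total = mem[0xd8]; accesses = 2
    while counter != 0:                     # no memory traffic in the loop
        counter -= 1
        total += 1
    mem[0xdc] = counter; mem[0xd8] = total; accesses += 2
    return accesses

slots_a = {0xdc: 5, 0xd8: 10}
slots_b = dict(slots_a)
through_memory = loop_through_memory(slots_a)
in_registers = loop_in_registers(slots_b)
```

Both versions leave the slots in the same final state; in this model the
first costs four memory accesses per iteration plus the final load, while
the second costs four in total regardless of the trip count.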