multiplies by powers of two

Matthew Fluet mfluet@intertrust.com
Thu, 12 Jul 2001 19:52:26 -0700 (PDT)


I added the peephole optimization that turns a cmp against 0 into a test;
it was pretty trivial.  Here's the tight loop on nestedloop:

0x804ae6b:      test   %ebp,%ebp
0x804ae6d:      je     0x804b080
0x804ae73:      dec    %ebp
0x804ae74:      jo     0x804b098
0x804ae7a:      inc    %ebx
0x804ae7b:      jo     0x804b0a0
0x804ae81:      jmp    0x804ae6b
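
For illustration only, here is a minimal sketch of that kind of peephole rewrite. It is not the actual code generator: the tuple encoding of instructions (AT&T operand order) and the function name cmp_zero_to_test are assumptions made up for this example. It only rewrites the case where the compared operand is a register, since a test needs a register operand here.

```python
def cmp_zero_to_test(instrs):
    """Rewrite `cmp $0, %reg` into `test %reg, %reg` (hypothetical IR).

    Each instruction is a tuple such as ("cmp", "$0", "%ebp").
    Both forms set ZF and SF identically and clear CF and OF, so a
    following je/jne/js-style branch behaves the same.
    """
    out = []
    for ins in instrs:
        is_cmp_zero_reg = (
            len(ins) == 3
            and ins[0] == "cmp"
            and ins[1] == "$0"
            and ins[2].startswith("%")  # register operand only
        )
        if is_cmp_zero_reg:
            out.append(("test", ins[2], ins[2]))
        else:
            out.append(ins)
    return out


loop_head = [("cmp", "$0", "%ebp"), ("je", "0x804b080")]
rewritten = cmp_zero_to_test(loop_head)
```

The test form is shorter to encode because it carries no immediate operand.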

I also updated the mul-by-pow-2 optimization to handle multiplication by
1, ~1, 2, and ~2 in the presence of overflow detection.  (Multiplication by
1 and ~1 should be caught by the CPS simplifier, but if not, I catch them
again.)  I think I convinced myself that negation followed by a shift by 1
for the case of ~2 will always result in the correct overflow: when the src
is minInt, the negation overflows and results in minInt, which also
overflows on the shift; maxInt doesn't overflow on the negation, but does
overflow on the shift.  In either case, the overflow flag is correct at the
end.

Is a shift the best choice in the case of 2 and ~2, or is a self add
faster?  How about 4: two shifts (or adds) with intervening jo's?  But that
will probably cause just as much Icache pressure as a mask and test.
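
The overflow argument for ~2 can be spot-checked with a small model. This is only a sketch under assumptions: 32-bit two's complement ints, the overflow flag tested once after the shift, and a 1-bit left shift setting OF to the new sign bit XOR the bit shifted out (the x86 rule for shl by 1). The helper names are made up for the example.

```python
MASK = 0xFFFFFFFF
MIN_INT = -(1 << 31)
MAX_INT = (1 << 31) - 1


def neg32(u):
    """Two's complement negation of a 32-bit pattern (wraps on minInt)."""
    return (-u) & MASK


def shl1(u):
    """Shift a 32-bit pattern left by 1; return (result, overflow flag).

    OF for a 1-bit shift is the new sign bit XOR the bit shifted out.
    """
    carry = (u >> 31) & 1
    result = (u << 1) & MASK
    overflow = ((result >> 31) & 1) ^ carry
    return result, bool(overflow)


def mul_neg2_overflows(x):
    """Model multiply by ~2 as negate then shift; report OF after the shift."""
    _, overflow = shl1(neg32(x & MASK))
    return overflow


def true_overflow(x):
    """Reference answer: does the exact product -2*x fall outside 32 bits?"""
    return not (MIN_INT <= -2 * x <= MAX_INT)


edge_cases = [MIN_INT, MIN_INT + 1, -(1 << 30) - 1, -(1 << 30), -(1 << 30) + 1,
              -1, 0, 1, (1 << 30) - 1, 1 << 30, (1 << 30) + 1, MAX_INT]
all_agree = all(mul_neg2_overflows(x) == true_overflow(x) for x in edge_cases)
```

Note that a src of 2^30 gives exactly minInt with no overflow, and the model agrees with that boundary as well.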