good in general, but bad nested loops

Henry Cejtin henry@sourcelight.com
Mon, 9 Jul 2001 23:32:45 -0500


I have a new version of my spy program.  This one takes a `-l' option, which
means don't look for a loop, just dump what the program is doing.  The default
window is still 1000 instructions, so if you just run
    spy -l pid
it will show you the next 1000 instructions that process pid executes.  You
can combine this with the -w option:
    spy -l -w 10000 pid
will show you the next 10,000 instructions.
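The message does not say how spy finds a loop in its default mode.  As a rough
sketch only, assuming the tracer has already collected a window of program
counter values (for example 1000 of them), one simple test is to look for the
smallest period p at which the window repeats itself.  The function
find_loop_period and the array nestedloop_pcs are hypothetical names, not
spy's code; the addresses are those of the hot nestedloop code quoted later in
this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCs of the 11-instruction hot loop in nestedloop (from the listing in this message). */
static const uint32_t nestedloop_pcs[] = {
    0x8049519, 0x804951f, 0x8049522, 0x8049528, 0x8049529, 0x804952f,
    0x8049535, 0x8049536, 0x804953c, 0x8049542, 0x8049548,
};

/* Hypothetical helper (not spy's actual method): return the smallest period p,
   1 <= p <= n/2, such that pcs[i] == pcs[i + p] for every i < n - p,
   or 0 if the window holds no complete repetition. */
static size_t find_loop_period(const uint32_t *pcs, size_t n)
{
    assert(pcs != NULL || n == 0);
    for (size_t p = 1; p <= n / 2; p++) {
        size_t i = 0;
        while (i + p < n && pcs[i] == pcs[i + p])
            i++;
        if (i + p == n)   /* every sample matched the one p steps later */
            return p;
    }
    return 0;
}
```

Filling a 1000-entry window with nestedloop_pcs[(i + 3) % 11], i.e. a window
that starts in the middle of the loop, gives a period of 11; a window with no
complete repetition gives 0.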

Doing this on the nested loop benchmark is quite illustrative.  Note that
compiling with `-detect-overflow false' makes very little difference to the
running time.

I see that we are comparing registers against 0 instead of doing a testl of
the register with itself.  The compare is 3 bytes vs. 2 bytes for the testl
(for the %ebp register), but it didn't make any speed difference in the tests
I performed.  Still, it is something to put in.
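For reference, the two encodings can be spelled out.  This is a hand-written
sketch covering only the register-direct forms, not MLton's code generator:
cmp $0x0,%r uses opcode 0x83 with /7 in the ModRM reg field plus an 8-bit
immediate, while test %r,%r uses opcode 0x85 with the register in both ModRM
fields.  Both set ZF exactly when the register is zero and clear CF and OF, so
a following je behaves the same.  The names reg32, modrm_reg, encode_cmp_zero
and encode_test_self are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit x86 register numbers as used in the ModRM byte. */
enum reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Build a register-direct ModRM byte: mod = 11, then the reg and r/m fields. */
static uint8_t modrm_reg(unsigned reg_field, enum reg32 rm)
{
    assert(reg_field < 8 && (unsigned)rm < 8);
    return (uint8_t)(0xC0 | (reg_field << 3) | (unsigned)rm);
}

/* cmp $0x0,%r  ->  83 /7 ib  (3 bytes).  Returns the number of bytes written. */
static size_t encode_cmp_zero(enum reg32 r, uint8_t out[3])
{
    out[0] = 0x83;
    out[1] = modrm_reg(7, r);
    out[2] = 0x00;            /* the 8-bit immediate 0 */
    return 3;
}

/* test %r,%r  ->  85 /r  (2 bytes).  Returns the number of bytes written. */
static size_t encode_test_self(enum reg32 r, uint8_t out[2])
{
    out[0] = 0x85;
    out[1] = modrm_reg((unsigned)r, r);
    return 2;
}
```

For %ebp this gives 83 fd 00 versus 85 ed; for %esp, as in the listing of the
hot code, it gives 83 fc 00 versus 85 e4.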

I also saw that we were loading a register from memory, incrementing the
register, and then storing it back into the same memory location.  (This is
with overflow detection off; otherwise there is a test for overflow between
the increment and the store.)  I thought that perhaps just incrementing the
memory location would be faster, but it doesn't seem to be.  Again, it is
smaller, but with overflow detection you would have to be clever enough to
know that on an exception the value in memory is dead, so it is safe to do
the increment in place.  (Or else have patch-up code in the overflow case.)
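The safety issue can be modelled in C with the GCC/Clang builtin
__builtin_add_overflow, which stores the wrapped sum and returns true on
overflow.  In this sketch (the function names are hypothetical),
incr_load_store corresponds to load, inc, jo, store: memory is written only
once the increment is known not to have overflowed.  incr_in_memory
corresponds to incl on the memory operand followed by jo: the wrapped value is
already in memory when the overflow is detected, which is harmless only if that
value is dead on the overflow path.  In encoded form a single
incl 0xd8(%edi) is 6 bytes (ff 87 d8 00 00 00), against 13 bytes for the
6-byte load, 1-byte inc and 6-byte store in the listing of the hot code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Load/increment/store, mirroring
       mov 0xd8(%edi),%ebp ; inc %ebp ; jo overflow ; mov %ebp,0xd8(%edi)
   The slot is written only if the increment did not overflow.
   Returns false on overflow. */
static bool incr_load_store(int *slot)
{
    int tmp;
    if (__builtin_add_overflow(*slot, 1, &tmp))
        return false;          /* overflow path: *slot still holds its old value */
    *slot = tmp;
    return true;
}

/* Increment in memory, mirroring  incl 0xd8(%edi) ; jo overflow
   The wrapped sum is stored before the overflow is noticed, so this is only
   safe if the slot is dead on the overflow path (or patch-up code restores it).
   Returns false on overflow. */
static bool incr_in_memory(int *slot)
{
    return !__builtin_add_overflow(*slot, 1, slot);
}
```

Starting from INT_MAX, both functions report the overflow, but incr_load_store
leaves the slot at INT_MAX while incr_in_memory leaves INT_MIN behind.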

Perhaps for these cases it really is just a matter of not having the relevant
variables in real registers.  Or maybe it is the two adjacent stores to
memory.  (I seem to recall that back-to-back stores were bad for the CPU to
schedule.)

Anyway, here is a sample of the hot code in nestedloop:
    0x8049519:      mov    0xdc(%edi),%esp
    0x804951f:      cmp    $0x0,%esp
    0x8049522:      je     0x804a598
    0x8049528:      dec    %esp
    0x8049529:      jo     0x804a5b8
    0x804952f:      mov    0xd8(%edi),%ebp
    0x8049535:      inc    %ebp
    0x8049536:      jo     0x804a5c4
    0x804953c:      mov    %esp,0xdc(%edi)
    0x8049542:      mov    %ebp,0xd8(%edi)
    0x8049548:      jmp    0x8049519
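Read back into C, the loop looks roughly like the sketch below.  This is an
interpretation, not MLton output: remaining and count are invented names for
the words at 0xdc(%edi) and 0xd8(%edi), it assumes the je leaves the loop and
the two jo targets are overflow handlers, and it uses the GCC/Clang overflow
builtins for the checks.  Written this way, it is easy to see that both
variables make a round trip through memory on every iteration and that each
iteration ends with two back-to-back stores.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical C reading of the hot loop.  `remaining' stands for the word at
   0xdc(%edi) and `count' for the word at 0xd8(%edi); the names are invented.
   Returns true when remaining reaches 0 (the je), false if either overflow
   check (one of the two jo's) fires. */
static bool hot_loop(int *remaining, int *count)
{
    for (;;) {
        int r = *remaining;                     /* mov 0xdc(%edi),%esp      */
        if (r == 0)                             /* cmp $0x0,%esp ; je exit  */
            return true;
        if (__builtin_sub_overflow(r, 1, &r))   /* dec %esp ; jo            */
            return false;
        int c = *count;                         /* mov 0xd8(%edi),%ebp      */
        if (__builtin_add_overflow(c, 1, &c))   /* inc %ebp ; jo            */
            return false;
        *remaining = r;                         /* mov %esp,0xdc(%edi)      */
        *count = c;                             /* mov %ebp,0xd8(%edi)      */
    }                                           /* jmp back to the top      */
}
```

With remaining = 5 and count = 10, the loop exits normally with remaining = 0
and count = 15.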

And here is the latest spy program:


[attachment: spy.tgz]