limit check insertion via loop forests

Thu, 20 Dec 2001 18:22:50 -0500 (EST)

> Matthew raised some comments that I also had, namely is there some notion of
> minimality on (a) the number of labels where limit checks are inserted
> (while still preserving safety), 

This formulation is not minimal in the number of labels that get limit
checks.  Consider the following CFG:

Labels = {A, B, C, D, E}
S(A) = {B, C}
S(B) = {D}
S(C) = {D}
S(D) = {B,C,E}
S(E) = {E}
E = {A}
A(A) = A(B) = A(C) = A(E) = 0
A(D) = 12 bytes

Conceptually, its two loops (B->D->B,C->D->C) but sharing D, which either
exits or re-enters a loop at either B or C.

Anyways, this algorithm will put a limit check of 12 bytes at both B and
C, because they are loop headers (at least as identified by Steensgaard; I
don't have the loop forest paper in front of me to know whether or not any
of the other loop forest algorithms choose D as the loop header).  But, we
could really get by with just a single limit check of 12 bytes at D. 

> and
> (b) the size of required space demanded by V, again that is safe?  

Interestingly, (a) and (b) aren't independent; depending on how much slop
you want to allow at the end, you can get by with fewere limit checks.

> I think it's a neat formulation, and is a cool application of
> loop forests in that limit checks are inserted at the beginning of loop
> headers.  I wonder if this idea could be further refined to coalesce limit
> checks.  Do loop forests help here in any way?

It is automatically doing a form of coalescing.  The loop headers give a
minimal (in some sense) set of program points that require limit checks;
the algorithm then coalsces everything between one limit check and the
next.  Steve adds in those extra program points to keep limit checks from
being pushed back into loops.  Acutally, what we don't want to do is to
push limit checks back into non-allocating loops.  If the loop is already
allocating, then it would actually be better to do the limit check in the
loop, rather than doing another one after it, because the cost (in
instructions and code size) of a limit check is (almost) independent of
the limit check amount.  Almost, because we've got 512 bytes of limit
slop, which let us do a limit check for <= 512 bytes without needing a
constant.  But, the size of that constant is miniscule compared to the
code needed for a second limit check, so even if removing the limit check
from the nodes after the exit of an allocating loop increases the limit
check at the loop header to > 512 bytes, I think it would be a win.