limit check insertion via loop forests

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Thu, 20 Dec 2001 17:16:27 -0500 (EST)


> Here's my latest attempt at formalizing a framework for limit check
> insertion, leading to an approach for inserting limit checks based on
> loop forests.  Let me know your thoughts (and please check my
> proofs!).

I think it looks pretty good.  

> Input:
> A four tuple
>       Label                           set of blocks
>       S       Label -> P(Label)       successors of a block
>       E       P(Label)                entry labels
>       A       Label -> Nat            number of bytes allocated in block

I think it is worth pointing out that E includes both the function entry
label and all conts and handlers at non-tail calls, correct?  This makes
sense with the requirement for F below, because upon return from a
non-tail call, we can't have any assumptions about space available.

How do you propose handling of array allocations?  Note, I don't suggest
trying to incorporate them directly into this framework right now, but I
was just wondering how you intend to define A when the block has an array
allocation in it; eg. 

L(x,y,z)
 a = (x,y)
 b = (y,z)
 c = Array_array(x)
 d = (z,x)
 K()

Assuming x,y,z, are ints, then A(L) = 36 bytes seems right, but after
doing the forced limit check for x bytes to do the array allocation, there
may not be 12 bytes left to allocate d.

One solution might be to force a block split at array allocations and make
the introduced label an entry label.

> Definition [safe]:
> The insertion is safe if 
> 1. For all l in E, F(l) = 0.
> 2. For all l in Label, max(F(l), C(l)) - A(l) >= 0.
> 3. For all l in Label, for all l' in S(l), max(F(l), C(l)) - A(l) >= F(l'). 

If you want to simplify it even more, you can combine 2 and 3:

2'. For all l in Label, max(F(l), C(l)) - A(l) >= max {F(l') | l' in S(l)}

where the maximum of an empty set of natural numbers is 0.

> Theorem: (C, F) is safe.
> Proof:
> 1. For l in E, F(l) = 0 by the definition of F.
> 2. For all l in Label
> 	max(F(l), C(l)) - A(l) 
> 	= R(l) - A(l)
> 	= max { R(l') | l' in S(l) - (E U L)}
> 	>= 0
> 3. For all l in Label, for all l' in S(l),
>    	max(F(l), C(l)) - A(l) 
>         = R(l) - A(l)
>         = max { R(l'') | l'' in S(l) - (E U L)}
>    If l' in E U L, then F(l') = 0
>         >= 0
>    otherwise F(l') = R(l')
> 	>= R(l')
2'. For all l in Label
       max(F(l), C(l)) - A(l)
       = R(l) - A(l)
       = max { R(l') | l' in S(l) - (E U L)}
       = max { F(l') | l' in S(l) - (E U L)}
       = max { F(l') | l' in S(l) - (E U L)} + 0
       = max { F(l') | l' in S(l) - (E U L)} +
         max { 0 | l' in S(l) /\ (E U L)}
       = max { F(l') | l' in S(l) - (E U L)} +
         max { F(l') | l' in S(l) /\ (E U L)}
       = max { F(l') | l' in S(L)}
> QED

Nice thing here is that it's equalities all the way through, which shows
that the limit checks inserted are "tight" in some appropriate sense.

> So, all we need is an L that decycles the input, and we automatically
> get a sound insertion.  One can use loop forests to build such an L.
> I won't repeat the definitions from page 3 of the Ramalingam paper,
> but there he defines loop forests, and shows that taking L to be the
> set of loop headers decycles the input.  That gives one way of doing
> limit check insertion.  It's not great because it still moves limit
> checks forward into loops.  

Really, it moves limit checks back into loops; the limit check will get
everything in the loop and everything after the loop until the next loop.
Hmm, that seems to raise an issue -- more below.

Using headers, we'll always have limit checks at the top of loops, which
is kind of nice, because that's exactly where we need limit checks for
threads.  It might be worthwhile to have the limit check insertion also
return a  N : Label -> {No|Yes|Stack}  to indicate that a limit check is
needed even if C(l) = 0.  N(l) = Stack for l in E, to do the stack check
upon function entry (do we need this at conts?).  N(l) = Yes (or Stack)
for l in Headers if the program uses threads; cool, here's another place
where we can use multi, I think.  We only need N(l) = Yes for l in Headers
if the label l is multi-threaded; if we figure out a way to do a new,
empty thread (i.e., without the thread copy at the very beginning of the
program) then we might not need to penalize non-allocating loops in the
initialization of a threaded program (i.e., before the first thread is
copied).

> Define two nodes to be in the same loop if they appear in the same
> notInLoop vector in our representation of loop forests.  Basically,
> that means that deepest loop that each of the nodes appears in is the
> same.  Then define L by
> 
> 	L = Headers U { l | exists l' not in the same loop as l with l in S(l') }
> 
> This will place limit checks at loop headers and post-exits, i.e. at
> nodes that will prevent limit checks being pushed across loop
> boundaries.

This is very nice.  At first I was concerned that we would need to add
blocks at loop exits, but no.  I was confusing allocation occuring in
blocks to allocation occuring on control-flow-edges.  And this extra set
of nodes is really easy to calculate from the predecessor list and the
equivalence classes of notInLoop.


O.k.  The issue raised above is that we might ask for too much space when
it's not going to be used:

A(x)
 b = x = 0
 case b of true => B | false => c
B()
 ... 4 bytes allocated ...
 y
C()
 ... 1MB allocated ...
 z

Coming from something like

fun f(x,y) = if x = 0 then nil else y::y::y::...::y::nil

The algorithm above will make A do a check for 1MB, even if x = 0.  That
seems wasteful to me, but maybe it's o.k.  I guess I want some condition
like 
forall l in E.
forall p starting at l and ending at l' s.t. S(l') = {}.
N >= V(p,0) >= 0

That says we don't finish a function with too much excess space.
Likewise, we might make S(l') = {} for any l' that makes a non-tail call.