limit check insertion via loop forests

Stephen Weeks MLton@sourcelight.com
Thu, 20 Dec 2001 12:54:40 -0800


Here's my latest attempt at formalizing a framework for limit check
insertion, leading to an approach for inserting limit checks based on
loop forests.  Let me know your thoughts (and please check my
proofs!).

Input:
A four tuple
	Label				set of blocks
	S	Label -> P(Label)	successors of a block
	E	P(Label)		entry labels
	A	Label -> Nat		number of bytes allocated in block

Output (limit check insertion):
A two tuple
	C	Label -> Nat		limit check amount (beginning of block)
	F	Label -> Nat		free amount required before block

Definition [path, starts]:
A path in (L, S) is a sequence of labels l0, l1, ... such that for all
i >= 0, l_i+1 in S(li).  p starts at l0.

Definition [V]:
V: Path x Int -> Int, defined by induction on the length of the path.

  V ([], n) = n
  V (l :: p, n) = V (p, max (n, C(l)) - A(l))

Intuitively, V (p, n) gives the amount free at the end of the path,
given that n is free at the beginning.

Definition [sound]:
The insertion is sound if for all l in E, for all p starting at l, 
V (p, 0) >= 0.

Soundness depends on C, but not on F.

Definition [safe]:
The insertion is safe if 
1. For all l in E, F(l) = 0.
2. For all l in Label, max(F(l), C(l)) - A(l) >= 0.
3. For all l in Label, for all l' in S(l), max(F(l), C(l)) - A(l) >= F(l'). 

Safety depends on both C and F.

Theorem:
If the insertion is safe, then for all l, for all p starting at l,
if n >= F(l), then V (p, n) >= 0.

Proof: by induction on the length of the path.
Base:
 V ([], n) = n >= F(l) >= 0.
   where the first inequality follows by assumption and the second by
   the range of F.
 V ([l], n) = max (n, C(l)) - A(l) >= max (F(l), C(l)) - A(l) >= 0
   where the first inequality follows by the assumption n >= F(l) and
   the second follows from condition 2 in the definition of safety.
Ind:
  V (l :: l' :: p, n) = V (l' :: p, max (n, C(l)) - A(l))
Notice that
     max (n, C(l)) - A(l) 
  >= max (F(l), C(l)) - A(l)
  >= F(l')
where the first inequality follows from the assumption that n >= F(l)
and the second follows from condition 3 in the definition of safety.
Hence we can apply the induction hypothesis to conclude V (l' :: p,
max (n, C(l)) - A(l)) >= 0.  
QED

Corrollary [safe => sound]:
If the insertion is safe, then it is sound.

Proof: 
Let l in E and p starting at l be given. By condition 1 in the
definition of safety F(l) = 0.  Since 0 >= F(l), the above theorem
applies.  Hence, V (p, 0) >= 0.

The rest of this document is about producing a safe insertion.

Definition [cycle]:
A cycle is a path whose start and end labels are the same.

Definition [decycles]:
A set of labels L decycles the input if every cycle contains a label
in L.

Henceforth, assume L decycles the input.  We will place limit checks
at labels in E U L that need them.

Define the following function inductively

	R: Label -> Nat
	R(l) = A(l) + max { R(l') | l' in S(l) - (E U L)}

R is well defined because L decycles the input.

Define C and F by
	C(l) = if l in E U L
	       then R(l)
	       else 0
	F(l) = if l in E U L
               then 0
               else R(l)

Notice that by definition, for all l in Label, max(F(l), C(l)) = R(l).

Theorem: (C, F) is safe.
Proof:
1. For l in E, F(l) = 0 by the definition of F.
2. For all l in Label
	max(F(l), C(l)) - A(l) 
	= R(l) - A(l)
	= max { R(l') | l' in S(l) - (E U L)}
	>= 0
3. For all l in Label, for all l' in S(l),
   	max(F(l), C(l)) - A(l) 
        = R(l) - A(l)
        = max { R(l'') | l'' in S(l) - (E U L)}
   If l' in E U L, then F(l') = 0
        >= 0
   otherwise F(l') = R(l')
	>= R(l')
QED

So, all we need is an L that decycles the input, and we automatically
get a sound insertion.  One can use loop forests to build such an L.
I won't repeat the definitions from page 3 of the Ramalingam paper,
but there he defines loop forests, and shows that taking L to be the
set of loop headers decycles the input.  That gives one way of doing
limit check insertion.  It's not great because it still moves limit
checks forward into loops.  I propose to do the following.

Define two nodes to be in the same loop if they appear in the same
notInLoop vector in our representation of loop forests.  Basically,
that means that deepest loop that each of the nodes appears in is the
same.  Then define L by

	L = Headers U { l | exists l' not in the same loop as l with l in S(l') }

This will place limit checks at loop headers and post-exits, i.e. at
nodes that will prevent limit checks being pushed across loop
boundaries.