limit check insertion via loop forests

Stephen Weeks MLton@sourcelight.com
Thu, 20 Dec 2001 15:12:26 -0800


> > Input:
> > A four tuple
> >       Label                           set of blocks
> >       S       Label -> P(Label)       successors of a block
> >       E       P(Label)                entry labels
> >       A       Label -> Nat            number of bytes allocated in block
> 
> I think it is worth pointing out that E includes both the function entry
> label and all conts and handlers at non-tail calls, correct?

Yes.  In fact, the graph (Label, S) is not the control flow graph - it
is a subset.  All edges into nodes in E are removed.  Since we can
make no assumptions about the available space at those nodes, there's
no point including edges into them, which would just make the loop
forest stuff identify more loops than it needs to.

> How do you propose handling of array allocations?
...
> One solution might be to force a block split at array allocations and make
> the introduced label an entry label.

This seems fine to me (it's essentially what we do now with
-limit-check-per-block false).  It forces a limit check right before
the allocation that includes enough for the allocation and up to any
subsequent limit checks.  In your example,

> L(x,y,z)
>  a = (x,y)
>  b = (y,z)
>  c = Array_array(x)
>  d = (z,x)
>  K()

this would introduce block L' starting at c and make A(L) = 24, A(L')
= 12, and L' in E.  Of course, we then patch up the actual limit check
amount at L' to include x.  Also, there is no edge from L to L'
because L' in E, as I explained above.

> If you want to simplify it even more, you can combine 2 and 3:
> 
> 2'. For all l in Label, max(F(l), C(l)) - A(l) >= max {F(l') | l' in S(l)}
> 
> where the maximum of an empty set of natural numbers is 0.

Thanks, that's much better.

> 2'. For all l in Label
>        max(F(l), C(l)) - A(l)
>        = R(l) - A(l)
>        = max { R(l') | l' in S(l) - (E U L)}
>        = max { F(l') | l' in S(l) - (E U L)}
>        = max { F(l') | l' in S(l) - (E U L)} + 0
>        = max { F(l') | l' in S(l) - (E U L)} +
>          max { 0 | l' in S(l) /\ (E U L)}
>        = max { F(l') | l' in S(l) - (E U L)} +
>          max { F(l') | l' in S(l) /\ (E U L)}
>        = max { F(l') | l' in S(L)}
                                 l
> > QED
> 
> Nice thing here is that it's equalities all the way through, which shows
> that the limit checks inserted are "tight" in some appropriate sense.

Yep.  Cool.

> Really, it moves limit checks back into loops; the limit check will get
> everything in the loop and everything after the loop until the next loop.

Agreed.  I was thinking forward in time during program execution,
which of course equals backward in the graph.

> Using headers, we'll always have limit checks at the top of loops, which
> is kind of nice, because that's exactly where we need limit checks for
> threads.

Sort of.  Except that as I argue above, it's not loops in the control
flow graph that matter.  It's loops in the modified control flow graph
with edges into nodes in E removed.  This is a mistake that MLton
currently makes and causes it to add to many signal limit checks.

> It might be worthwhile to have the limit check insertion also
> return a  N : Label -> {No|Yes|Stack}  to indicate that a limit check is
> needed even if C(l) = 0.  N(l) = Stack for l in E, to do the stack check
> upon function entry (do we need this at conts?). 

Stack limit checks are only needed at function entry.

I would like do signal check insertion after limit check insertion on
a control flow graph with all edges into limit checks removed.
Because limit check insertion may add limit checks at nodes other than
headers, removing edges into limit checks (which implicitly do a
signal check) will eliminate some loops, thus reducing the number of
places where signal checks are pleced.

> N(l) = Yes (or Stack) for l in Headers if the program uses threads; cool,
> here's another place where we can use multi, I think.  We only need N(l) = Yes
> for l in Headers if the label l is multi-threaded; if we figure out a way to
> do a new, empty thread (i.e., without the thread copy at the very beginning of
> the program) then we might not need to penalize non-allocating loops in the
> initialization of a threaded program (i.e., before the first thread is
> copied).

It's not multi-threaded that matters.  What matters is whether or not
signal handler has been installed.  But I guess since signal handlers
are threads, that amounts to the same thing.  OK.  This seems fine
(but very very minor :-).

> O.k.  The issue raised above is that we might ask for too much space when
> it's not going to be used:
> 
> A(x)
>  b = x = 0
>  case b of true => B | false => c
> B()
>  ... 4 bytes allocated ...
>  y
> C()
>  ... 1MB allocated ...
>  z
...
> The algorithm above will make A do a check for 1MB, even if x = 0.  That
> seems wasteful to me, but maybe it's o.k. 

I think it's OK as long as we don't move array limit checks backward.
All other limit checks are for amounts proportional to program size, I
think.  Those don't bother me, because of the way I order quantifiers
in the definition of safe-for-space.

> I guess I want some condition
> like 
> forall l in E.
> forall p starting at l and ending at l' s.t. S(l') = {}.
> N >= V(p,0) >= 0
> 
> That says we don't finish a function with too much excess space.

I am happy with the following minimality notion, which I think we
achieve: given our choice of points for placement of checks, none of
the checks could be for a smaller amount and still yield a safe
insertion.

> Likewise, we might make S(l') = {} for any l' that makes a non-tail call.

I definitely think we should do this from the beginning, since all
limit info is lost across the control-flow edge.