cps & contification

Mon, 29 Jan 2001 11:06:49 -0500 (EST)

> It also makes it even
> clearer to me why dominators is the right notion.
> 
> Are you working through the proofs of safety and minimality?

Yup, and they all work out without any problems.  They are also a little
more uniform, I think: the pattern of reasoning for the lemmas supporting
the proof of minimality is repeated both when B(f) = j in Jump and B(f) =
h in Fun (for some arbitrary safe analysis B).  Also, without taking a
fixed point, we can actually drop the ordering on RCont; I guess there's
no harm keeping it, but because it can't be incorporated into the
minimality proof, I don't think it buys us much.

Here are the proofs and formal description of the analysis.  Note that the
proof of minimality proves that this analysis subsumes the continuation
based analysis (because A' from the old dominator based analysis was a
safe analysis).

*****************************************************************************
Dominator based contification
*****************************************************************************

The new analysis ("dominator based contification") determines the set
of continuations with which a function is called or the set of
functions to which a function returns.  If that set is a singleton,
then the function can either be prefixed onto the continuation with
which it is called or made local to the function to which it returns.

Added Jan. 2001

Notes:
Abstractly, a program consists of a main function and a set of tail and nontail
calls.  A tail call consists of a caller and a callee.  A nontail call consists
of a caller, callee, and jump.

        j in    Jump
        f in    Fun
                TailCall = Fun x Fun
                NonTailCall = Fun x Fun x Jump
        p in    Program = Fun x P(TailCall) x P (NonTailCall)

An analysis is a map from functions to the continuation that they return to.

                RCont = {Uncalled, Unknown} U Jump U Fun
        A in    Analysis = Fun -> RCont

Here is the intended meaning of A.

        A(f)            meaning
        --------        --------------------------------------------------
        Uncalled        f is never called during evaluation of the program
        j in Jump       f always returns to continuation j
        g in Fun        f always returns to function g
        Unknown         f returns to many places

An analysis A is safe for program p = (fm, T, N) if 
        *   A(fm) = Unknown
        **  For all nontail calls (f, g, j) in N,
                A(f) = Uncalled 
                or A(g) in {j, Unknown}
        *** For all tail calls (f, g) in T,
                A(f) = Uncalled
                or A(g) in {f, A(f), Unknown}

Theorem 1: Let B be a safe analysis.
           If there exists a path of calls from fm to f, 
            then B(f) <> Uncalled.
Proof:
Suppose there exists a path of calls from fm to f.
 Proceed by induction on the length of the path.
 n = 0: Then f = fm.
        B(f) = Unknown (by *, because B is safe). 
        Hence B(f) <> Uncalled.
 n > 0: Then there exists g 
         such that there exists a path of n-1 calls from fm to g
         and (g, f, j) in N or (g, f) in T.
        By induction hypothesis, B(g) <> Uncalled.
        Suppose (g, f, j) in N.
         B(f) in {j, Unknown} (by **, because B is safe).
         Hence B(f) <> Uncalled.
        Suppose (g, f) in T.
         B(f) in {g, B(g), Unknown} (by ***, because B is safe).
         Hence B(f) <> Uncalled.

Define A'(f) = Unknown   if there exists a path of calls from fm to f
             = Uncalled  otherwise

Theorem 2: A' is a safe analysis.
Proof:
* A'(fm) = Unknown.
** Let (F, G, J) be an arbitrary element of N.
   Suppose A'(F) = Uncalled.
    Then done.
   Suppose A'(F) = Unknown.
    Then there exists a path of calls from fm to F (by definition of A').
    Then there exists a path of calls from fm to G.
    Hence, A'(G) = Unknown in {J, Unknown}.
*** Let (F, G) be an arbitrary element of T.
    Suppose A'(F) = Uncalled.
     Then done.
    Suppose A'(F) = Unknown.
     Then there exists a path of calls from fm to F (by definition of A').
     Then there exists a path of calls from fm to G.
     Hence, A'(G) = Unknown in {F, A'(F), Unknown}.

Now define a directed graph G = (Node, Edge) where
        Node = Jump U Fun U {Root}
        Edge = {(Root, fm)}
               U {(Root, j) | j in Jump}
               U {(Root, f) | A'(f) <> Unknown}
               U {(f, g) | (f, g) in T and A'(f) <> Uncalled
                                       and A'(g) = Unknown}
               U {(j, g) | (f, g, j) in N and A'(f) <> Uncalled
	                                  and A'(g) = Unknown}

Theorem 3: For all f in Fun, there is a path from Root to f in G.
Proof:
If A'(f) <> Unknown, then (Root, f) in Edge (by third clause).
If A'(f) = Unknown,
 then there is a path of calls from fm to f (by definition of A').
 Consider the longest (possibly empty) suffix of this path that consists
  of all tail calls: (f0, f1) (f1, f2) .... (fn-1, f).
 Then there is a path from f0 to f in G (by fourth clause),
  because forall i, A'(fi) = Unknown 
  because fi is on a path of calls from fm to f.
 If f0 = fm,
  then (Root, f0) in Edge (by first clause)
  and there is a path from Root to f in G.
 If f0 <> fm,
  then there exists (g, f0, j) in N
  and A'(g) = Unknown (because (g, f0, j) is on a path of calls from fm to f)
  and (Root, j) (by second clause)
  and (j, f0) in Edge (by fifth clause)
  and there is a path from Root to f in G.

Let D be the dominator tree of G rooted at Root.
For f in Fun, let parent(f) be the parent of f in D.

Define an analysis, A, based on A' and D as follows:
    A(f) = 
         if parent(f) = Root
           then A'(f)
         else the ancestor l of f in D such that parent(l) = Root

Lemma 4: If A'(f) = Uncalled, then A(f) = Uncalled.
Proof:
Suppose A'(f) = Uncalled.
Then (Root, f) in Edge (by third clause).
Then A(f) = A'(f) = Uncalled.

Lemma 5: If A(f) <> Uncalled, then A'(f) = Unknown.
Proof:
Suppose A(f) <> Uncalled.
Then A'(f) <> Uncalled (by contrapositive of Lemma 5).
Hence A'(f) = Unknown.

Theorem 6: A is a safe analysis.
Proof:
* A(fm) = if parent(fm) = Root
            then A'(f)
          else ...
        = A'(fm)
        (because (Root, fm) in Edge (by first clause))
        = Unknown
        (because A' is safe).
** Let (F, G, J) be an arbitrary element of N.
   Suppose A(F) = Uncalled.
    Then done.
   Suppose A(F) <> Uncalled.
    Then A'(F) = Unknown (by Lemma 5).
    Then there exists a path of calls from fm to F (by definition of A').
    Then there exists a path of calls from fm to G.
    Then A'(G) = Unknown.
    Then (Root, J) in Edge (by second clause)
     and (J, G) in Edge (be fifth clause).
    Then parent(G) in {Root, J} and parent(J) = Root.
    Hence A(G) in {J, A'(G)} = {J, Unknown}.
*** Let (F, G) be an arbitrary element of T.
    Suppose A(F) = Uncalled.
     Then done.
    Suppose A(F) <> Uncalled.
     Then A'(F) = Unknown (by Lemma 5).
     Then there exists a path of calls from fm to F (by definition of A').
     Then there exists a path of calls from fm to G.
     Then A'(G) = Unknown.
     Then (F, G) in Edge (by fourth clause).
     Suppose parent(G) = Root.
      Then A(G) = A'(G) = Unknown.
      Hence A(G) in {F, A(F), Unknown}.
     Suppose parent(G) <> Root.
      Then A(G) = L, the ancestor of G in D with parent(L) = Root.
      Suppose L = F.
       Then A(G) = L = F.
       Hence A(G) in {F, A(F), Unknown}.
      Suppose L <> F.
       Then A(F) = L (because L dominates F).
       Then A(G) = L = A(F).
       Hence A(G) in {F, A(F), Unknown}.

Lemma 7: Let B be a safe analysis.
         Let Root -> l -> f0 ... -> fn be a path in G.
         Then B(fn) in {fn-1, ..., f0, l, Unknown}.
Proof:
Let B be a safe analysis.
Let Root -> l -> f0 -> ... -> fn be a path in G.
Proceed by induction on n.
 n = 0: Suppose l in Fun.
         Then (Root, l) in Edge (by first or third clause)
          and (l, f) in Edge (by fourth clause).
         Then (l, f) in T and A'(l) <> Uncalled.
         Then there exists a path of calls from fm to l
          (by definition of A').
         Then B(l) <> Uncalled (by Theorem, because B is safe).
         Then B(f) in {l, B(l), Unknown}.
         Note (Root, l) in Edge implies l = fm
                                     or A'(l) <> Unknown.
         Suppose l = fm.
          Then B(l) = Unknown (by *, because B is safe).
         Suppose A'(l) <> Unknown.
          Then A'(l) = Uncalled.
          Then (l, f0) notin Edge (by fourth clause). ==><==
         Hence, B(l) = Unknown.
         Therefore, B(f) in {l, Unknown}.
        Suppose l in Jump.
         Then (Root, l) in Edge (by second clause)
          and (l, f) in Edge (by fifth clause).
         Then (g, f, l) in N and A'(g) <> Uncalled.
         Then there exists a path of calls from fm to g
          (by definition of A').
         Then B(g) <> Uncalled (by Theorem 1, because B is safe).
         Then B(f) in {l, Unknown}.
 n > 0: Then (fn-1, fn) in Edge (by fourth clause).
        Then (fn-1, fn) in T and A'(fn-1) <> Uncalled.
        Then there exists a path of calls from fm to fn-1 
         (by definition of A').
        Then B(fn-1) <> Uncalled (by Theorem 1, because B is safe).
        Then B(fn) in {fn-1, B(fn-1), Unknown}.
        Then B(fn-1) in {fn-2, ..., f0, l, B(l), Unknown} 
         (by induction hypothesis).
        Hence, B(fn) in {fn-1, ..., f0, l, B(l), Unknown}.

Lemma 8: Let B be a safe analysis.
         If B(f) = Uncalled, then A(f) = Uncalled.
Proof:
Let B be a safe analysis.
Suppose B(f) = Uncalled.
 Then there does not exist a path of calls from fm to f 
  (by contrapositive of Theorem 1, because B is safe).
 Then A'(f) = Uncalled (by definition of A').
 Hence A(f) = Uncalled (by Lemma 4).

Lemma 9: Let B be a safe analysis.
         If B(f) = j in Jump, then A(f) in {Uncalled, j}.
Proof:
Let B be a safe analysis.
Suppose B(f) = j in Jump.
 Suppose A'(f) = Uncalled.
  Then done.
 Suppose A'(f) = Unknown.
  Then (Root, f) notin Edge (by third clause).
  Suppose f = fm.
   Then B(f) = Unknown (by *, because B is safe).
   But B(f) = j in Jump. ==><==
  Hence f <> fm and (Root, f) notin Edge (by first clause).
  Thus, every path from Root to f in G has length greater than 1.
  Suppose Root -> l -> f0 -> ... -> fn-1 -> f is a path from Root to f in G.
   Then B(f) = h in {fn-1, ..., f0, l, Unknown} (by Lemma 7).
   Thus, l = j (because fi in Fun).
  Hence, any path from Root to f in G passes through j.
  Therefore, j dominates f in G.
  Then, parent(f) <> Root.
  Hence, A(f) = j (because (Root, j) in Edge (by second clause)).

Lemma 10: Let B be a safe analysis.
          If B(f) = h in Fun, then A(f) in {Uncalled} U Jump U Fun.
Proof:
Let B be a safe analysis.
Suppose B(f) = h in Fun.
 Suppose A'(f) = Uncalled.
  Hence A(f) = Uncalled (by Lemma 4).
 Suppose A'(f) = Unknown.
  Then (Root, f) notin Edge (by third clause).
  Suppose f = fm.
   Then B(f) = Unknown (by *, because B is safe).
   But B(f) = h in Fun. ==><==
  Hence f <> fm and (Root, f) notin Edge (by first clause).
  Thus, every path from Root to f in G has length greater than 1.
  Suppose Root -> l -> f0 -> ... -> fn-1 -> f is a path from Root to f in G.
   Then B(f) = h in {fn-1, ..., f0, l, Unknown} (by Lemma 7).
   Thus, l = h or there exists i such that fi = h.
  Hence, any path from Root to f in G passes through h.
  Therefore, h dominates f in G.
  Then, parent(f) <> Root.
  Hence, A(f) in Jump U Fun.

Theorem 11: Let B be a safe analysis.
            If B(f) = Uncalled, then A(f) = Uncalled.
            If B(f) = j in Jump, then A(f) in {Uncalled, j}.
            If B(f) = h in Fun, then A(f) in {Uncalled} U Jump U Fun.
Proof:
Let B be a safe analysis.
Suppose B(f) = Uncalled.
 Then A(f) = Uncalled (by Lemma 8).
Suppose B(f) = j in Jump.
 Then A(f) <= j (by Lemma 9).
Suppose B(f) = h in Fun.
 Then A(f) in {Uncalled} U Jump U Fun (by Lemma 10).

Corollary 12: Let B be a safe analysis.
              If B(f) <> Unknown, then A(f) <> Unknown.
Proof: Immediate from Theorem 11.

*****************************************************************************
Examples
*****************************************************************************

fun f () = let ... in loop () end
fun loop () = let ... in loop () end
fun g () = ... K1 (f ()) ... K2 (f ()) ...

(g, 
 {(g, f, K1), (g, f, K2)}, 
 {(f, loop), (loop, loop)})

A'(f) = Unknown
A'(loop) = Unknown
A'(g) = Unknown

Edge = 
{(Root, g)}
U {(Root, K1), (Root, K2)}
U {}
U {(f, loop), (loop, loop)}
U {(K1, f), (K2, f)}

A(f) = Unknown
A(loop) = f
A(g) = Unknown

*****************************************************************************

fun f () = ... g () ... g () ...
fun g () = ... f () ...
fun h () = ... K (f ()) ...

(h,
 {(h, f, K)},
 {(f, g), (g, f)})

A'(f) = Unknown
A'(g) = Unknown
A'(h) = Unknown

Edge =
{(Root, h)} 
U {(Root, K)} 
U {}
U {(f, g), (g, f)}
U {(K, f)}

A(f) = K
A(g) = K
A(h) = Unknown

*****************************************************************************

(Note: this is essentially the concat_1 example that is always missed
by the present call-based and continuation-based analyses.)

fun f () = ... g () ... g () ...
fun g () = ... f () ....
fun h () = ... K1 (f ()) ... K2 (f ()) ...

(h,
 {(h, f, K1), (h, f, K2)},
 {(f, g), (g, f)})

A'(f) = Unknown
A'(g) = Unknown
A'(h) = Unknown

Edge =
{(Root, h)} 
U {(Root, K1), (Root, K2)}
U {}
U {(f, g), (g, f)}
U {(K1, h), (K2, h)}

A(f) = Unknown
A(g) = f
A(h) = Unknown

*****************************************************************************

fun f () = ... h () ... g () ...
fun g () = ... h () ... f () ...
fun h () = ...
fun i () = ... K (f ()) ...

(i,
 {(i, f, K)},
 {(f, h), (f, g), (g, h), (g, f)})

A'(f) = Unknown
A'(g) = Unknown
A'(h) = Unknown
A'(i) = Unknown

Edge =
{(Root, i)}
U {(Root, K)}
U {}
U {(f, h), (f, g), (g, h), (g, f)}
U {(K, f)}

A(f) = K
A(g) = K
A(h) = K
A(i) = Unknown

*****************************************************************************

fun h () = ... K (i ()) ... L (f ()) ...
fun f () = ... g () ...
fun g () = ...
fun i () = ... f () ...

(h,
 {(h, i, K), (h, f, L)},
 {(f, g), (i, f)})

A'(h) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(i) = Unknown

Edge =
{(Root, h)}
U {(Root, K), (Root, L)}
U {}
U {(f, g), (i, f)}
U {(K, i), (L, f)}

A(h) = Unknown
A(f) = Unknown
A(g) = f
A(i) = K

*****************************************************************************

fun u () = ... K (g ()) ...
fun g () = ...
fun f () = ... g() ...
fun fm () = ... L1 (f ()) ... L2 (f ()) ...

(fm,
 {(u, g, K), (fm, f, L1), (fm, f, L2)},
 {(f, g)})

A'(fm) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(u) = Uncalled

Edge =
{(Root, fm)}
U {(Root, K), (Root, L1), (Root, L2)}
U {(Root, u)}
U {(f, g)}
U {(L1, f), (L2, f)}

A(fm) = Unknown
A(f) = Unknown
A(g) = f
A(u) = Uncalled

*****************************************************************************

fun u () = ... g () ...
fun g () = ...
fun f () = ... g () ...
fun fm () = ... L1 (f ()) ... L2 (f ()) ...

(fm,
 {(fm, f, L1), (fm, f, L2)},
 {(u, g), (f, g)})

A'(fm) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(u) = Uncalled

Edge =
{(Root, fm)}
U {(Root, L1), (Root, L2)}
U {(Root, u)}
U {(f, g)}
U {(L1, f), (L2, f)}

A(fm) = Unknown
A(f) = Unknown
A(g) = f
A(u) = Uncalled

*****************************************************************************

fun fm () = ... h () ...
fun h () = ... f () ... g () ...
fun f () = ... g () ...
fun g () = ... f () ...

(fm,
 {},
 {(fm, h), (h, f), (h, g), (f, g), (g, f)})

A'(fm) = Unknown
A'(h) = Unknown
A'(f) = Unknown
A'(g) = Unknown

Edge =
{(Root, fm)
U {}
U {}
U {(fm, h), (h, f), (h, g), (f, g), (g, f)}
U {}

A(fm) = Unknown
A(h) = fm
A(f) = fm
A(g) = fm

But no way of nesting h, f, and g to mimic mutual recursion.