cps & contification

Matthew Fluet fluet@CS.Cornell.EDU
Thu, 18 Jan 2001 11:31:17 -0500 (EST)


Don't worry, I didn't find any new bugs.  Here's an updated description of
the whole analysis.  I updated the proof that any fixed point of C is safe
with the new definition of C.  I also added a proof of the claim that
A'(f) = Uncalled iff there does not exist a path from fm to f.  Finally, I
ran through all of the problematic contification examples from the past
emails, and they all work out the way we would like.

*****************************************************************************
*****************************************************************************

Abstractly, a program consists of a main function and a set of tail and nontail
calls.  A tail call consists of a caller and a callee.  A nontail call consists
of a caller, callee, and jump.

        j in    Jump
        f in    Fun
                TailCall = Fun x Fun
                NonTailCall = Fun x Fun x Jump
        p in    Program = Fun x P(TailCall) x P (NonTailCall)

An analysis is a map from functions to the continuation that they return to.

                RCont = {Uncalled, Unknown} U Jump U Fun
        A in    Analysis = Fun -> RCont

Here is the intended meaning of A.

        A(f)            meaning
        --------        ------------------------------------------------
        Uncalled        f is never called during evaluation of the program
        j in Jump       f always returns to continuation j
        g in Fun        f always returns to function g
        Unknown         f returns to many places

An analysis A is safe for program p = (fm, T, N) if 
        *   A(fm) = Unknown
        **  For all nontail calls (f, g, j) in N,
                A(f) = Uncalled 
                or A(g) in {j, Unknown}
        *** For all tail calls (f, g) in T,
                A(f) = Uncalled
                or A(g) in {f, A(f), Unknown}

Turn RCont into a lattice with the following order
        Uncalled <= x
        x <= x
        x <= Unknown
where x is an arbitrary element of Fun U Jump.

Extend <= pointwise to make A into a lattice.

Define the function C: Analysis -> Analysis by
    C(A)(g) =
	 if g = fm
           then Unknown
         else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
                   U {A(f) | (f, g) in T})

Claim: C is monotone and continuous.
Proof: Clear.

Claim: if A = C(A) then A is safe.
Proof:
*  A(fm) = C(A)(fm) 
          = if fm = fm     
              then Unknown
            else ...
          = Unknown
** Let (F, G, J) be an arbitrary element of N
   Suppose A(F) = Uncalled.
    Then done.
   Suppose A(F) <> Uncalled.
    Then A(G) = C(A)(G)
              = if G = fm
                  then Unknown
                else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
                          U {A(f) | (f, g) in T})
              = if G = fm
                  then Unknown
                else J \/ lub ({j | (f, g, j) in N and A(f) <> Uncalled}
                               U {A(f) | (f, g) in T})
              in {J, Unknown} 
              subset {J, Unknown}
*** Let (F, G) be an arbitrary element of T
    Suppose A(F) = Uncalled.  
     Then done.
    Suppose A(F) <> Uncalled.
     Then A(G) = C(A)(G)
               = if G = fm
                   then Unknown
                 else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
                           U {A(f) | (f, g) in T})
               = if G = fm
                   then Unknown
                 else A(F) \/ lub ({j | (f, g, j) in N and A(f) <> Uncalled}
                                   U {A(f) | (f, g) in T})
               in {A(F), Unknown}
               subset {F, A(F), Unknown}

Corollary: if A = C(A) then A satisfies:
           ***' for all tail calls (f, g) in T,
                    A(f) = Uncalled
                    or A(g) in {A(f), Unknown}
Proof: By examination of case *** of previous proof.

Let A' = lfp(C).

Fact: A' is safe (because A' = C(A')).

Claim: A'(f) = Uncalled iff 
       there does not exist a path of calls from fm to f.
Proof:
==> (prove contrapositive: If there exists a path of calls from fm to f,
                            then A'(f) <> Uncalled.)
Suppose there exists a path of calls from fm to f.
Proceed by induction on the length of the path.
n = 0: Then f = fm.
       A'(f) = Unknown (because A' is safe). 
       Hence A'(f) <> Uncalled.
n > 0: Then there exists g 
        such that there exists a path of n-1 calls from fm to g
        and (g, f) in T or (g, f, j) in N.
       By induction hypothesis, A'(g) <> Uncalled.
       Suppose (g, f) in T.
        A'(f) in {g, A'(g), Unknown} because A' is safe.
        Hence A'(f) <> Uncalled.
       Suppose (g, f, j) in N.
        A'(f) in {j, Unknown} because A' is safe.
        Hence A'(f) <> Uncalled.
<==
Suppose there does not exist a path of calls from fm to f.
 Let Caller*(f) be the transitive closure of the set of callers of f.
 Then forall g in Caller*(f).
  there does not exist a path of calls from fm to g
  and
  forall (h, g, j) in N. h in Caller*(f)
  and
  forall (h, g) in T. h in Caller*(f).
 Let B(g) = if g in Caller*(f) then Uncalled else Unknown.
 Suppose g in Caller*(f).
 Then C(B)(g) = if g = fm
                 then Unknown
                else lub ({j | (h, g, j) in N and B(h) = Uncalled}
                          U {A(h) | (h, g) in T})
              = lub ({j | (h, g, j) in N and B(h) = Uncalled}
                     U {A(h) | (h, g) in T})
              (because there does not exist a path of calls from fm to g)
              = Uncalled
              (because forall (h, g, j) in N. h in Caller*(f) 
                                          and B(h) = Uncalled
                   and forall (h, g) in T. h in Caller*(f) 
                                       and B(h) = Uncalled)
              = B(g)
              (because g in Caller*(f)).
 Suppose g notin Caller*(f).
 Then C(B)(g) <= Unknown
              (upper bound) 
              = B(g)
              (because g notin Caller*(f)).
 Hence, C(B) <= B.
 Therfore, lfb(C) <= B (by Fixed-Point Theorem).
 Hence, forall g in Caller*(f). A'(f) = lfb(C)(g) = Uncalled.
 Hence, A'(f) = Uncalled.

Now define a directed graph G = (Node, Edge) where 
        Node = Fun U {Root}
        Edge = {(f, g) | (f, g) in T and A'(g) = Unknown}
                U {(Root, f) | A'(f) <> Unknown}
                U {(Root, f) | A'(f) = Unknown and exists (g, f, j) in N}
                U {(Root, fm)}

Claim: For all f in Fun, there is a path from Root to f in G.
Proof: 
If A'(f) = <> Unknown, then (Root, f) in Edge (by second clause).
If A'(f) = Unknown,
 then there is a path of calls from fm to f 
 (because A' = lfp(C) and A'(f) <> Uncalled).
 If the last call on this path is a nontail call (g, f, j),
  then (Root, f) in Edge (by third clause).
 Otherwise, consider the longest suffix of this path that consists
 of all tail calls: (f0, f1) (f1, f2) ... (fn-1, f)
 If forall i, A'(fi) = Unknown,
  then there is a path from f0 to f in G (by first clause).
  If f0 = fm,
   then (Root, f0) in Edge (by fourth clause)
   and there is a path from Root to f in G.
  If f0 <> fm,
   then there exists (g, f0, j) in N
   and (Root, f0) in Edge (by third clause)
   and there is a path from Root to f in G.
 If exists k, A'(fk) <> Unknown and forall i > k, A'(fi) = Unknown
  then there is a path from fk+1 to f in G (by first clause)
  and (Root, fk) in Edge (by second clause)
  and there is a path from Root to f in G.

Let D be the dominator tree of G rooted at Root.
For f in Fun, let parent(f) be the parent of f in D.

Define an analysis, A, based on A' and D as follows:
    A(f) = 
         if A'(f) <> Unknown
           then A'(f)
          else if parent(f) = Root
                 then Unknown
               else the ancestor g of f in D such that parent(g) = Root

Claim: A is safe.
Proof:
* A(fm) = if A'(fm) <> Unknown
            then A'(fm)
          else if parent(fm) = Root
                 then Unknown
               else the ancestor g of f in D such that parent(g) =  Root
        = if parent(fm) = Root
            then Unknown
            else the ancestor g of f in D such that parent(g) = Root
        = Unknown
** Let (f, g, j) be an arbitrary element of N
   if A'(g) = Unknown
     then A(g) = Unknown because (Root, g) in Edge (by third clause)
       hence A(g) in {j, Unknown}
   else A(g) = A'(g)
     if A'(f) = Uncalled
       then A(f) = Uncalled 
       else A'(g) = j (because A' is safe and A'(f) <> Uncalled)
         hence A(g) = A'(g)
         hence A(g) in {j, Unknown}
*** Let (f, g) be an arbitrary element of T
    if A'(g) = Unknown
      then (f, g) in Edge (by first clause)
        if parent(g) = Root
          then A(g) = Unknown 
            hence A(g) in {f, A(f), Unknown}
          else A(g) = h, the ancestor of g in D with  parent(h) = Root.
            if h = f
              then A(g) = h = f
                hence A(g) in {f, A(f), Unknown}
              else A'(f) = Unknown (because h dominates f)
                hence A(f) = h
                hence A(g) = h = A(f)
                hence A(g) in {f, A(f), Unknown}
      else A'(g) <> Unknown
        hence A'(g) in {A'(f)} (by corollary)
        hence A(g) = A'(g) = A'(f) = A(f)
        hence A(g) in {f, A(f), Unknown}


*****************************************************************************
*****************************************************************************

fun f () = let ... in loop () end
fun loop () = let ... in loop () end
fun g () = ... K1 (f ()) ... K2 (f ()) ...

(g, 
 {(g, f, K1), (g, f, K2)}, 
 {(f, loop), (loop, loop)})

A'(f) = Unknown
A'(loop) = Unknown
A'(g) = Unknown

Edge = 
{(f, loop), (loop, loop)}
U {}
U {(Root, f)}
U {(Root, g)}

A(f) = Unknown
A(loop) = f
A(g) = Unknown

*****************************************************************************

fun f () = ... g () ... g () ...
fun g () = ... f () ...
fun h () = ... K (f ()) ...

(h,
 {(h, f, K)},
 {(f, g), (g, f)})

A'(f) = K
A'(g) = K
A'(h) = Unknown

Edge =
{} U
{(Root, f), (Root, g)} U
{}
{(Root, h)}

A(f) = K
A(g) = K
A(h) = Unknown

*****************************************************************************

(Note: this is essentially the concat_1 example that is always missed
by the present call-based and continuation-based analyses.)

fun f () = ... g () ... g () ...
fun g () = ... f () ....
fun h () = ... K1 (f ()) ... K2 (f ()) ...

(h,
 {(h, K1, f), (h, K2, f)},
 {(f, g), (g, f)})

A'(f) = Unknown
A'(g) = Unknown
A'(h) = Unknown

Edge =
{(f, g), (g, f)} 
U {}
U {(Root, f)} 
U {(Root, h)}

A(f) = Unknown
A(g) = f
A(h) = Unknown

*****************************************************************************

fun f () = ... h () ... g () ...
fun g () = ... h () ... f () ...
fun h () = ...
fun i () = ... K (f ()) ...

(i,
 {(i, f, K)},
 {(f, h), (f, g), (g, h), (g, f)})

A'(f) = K
A'(g) = K
A'(h) = K
A'(i) = Unknown

Edge =
{} 
U {(Root, f), (Root, g), (Root, h)} 
U {}
U {(Root, i)}

A(f) = K
A(g) = K
A(h) = K
A(i) = Unknown

*****************************************************************************

fun h () = ... K (i ()) ... L (f ()) ...
fun f () = ... g () ...
fun g () = ...
fun i () = ... f () ...

(h,
 {(h, i, K), (h, f, L)},
 {(f, g), (i, f)})

A'(h) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(i) = K

Edge =
{(f, g), (i, f)} 
U {(Root, i)}
U {(Root, f)}
U {(Root, h)}

A(h) = Unknown
A(f) = Unknown
A(g) = f
A(i) = K

*****************************************************************************