cps & contification

Stephen Weeks MLton@sourcelight.com
Mon, 29 Jan 2001 11:52:38 -0800 (PST)


I read through the proofs.  Very nice.  Things are getting simpler and simpler.
I have one proposal for further simplification.  I think it is easier to avoid
defining A' entirely.  Instead, define a predicate Reach: Fun -> Bool so that
Reach(f) iff there is a path of calls from fm to f.  Then, instead of writing
A'(f) = Unknown, write Reach(f) and instead of writing A'(f) = Uncalled, write
not(Reach(f)). 

I find it easier to reason directly in terms of reach than going through A'.
Following is a reworking of the framework.  I also simplified the proof of
Theorem 3 and combined Lemmas 4 and 5.

I guess you're working on the implementation as we speak?

I'm still observing the weird space problems with the old implementation using
dominators.  I will investigate those further, becuase they might be due to the
compiler being not safe-for-space.

--------------------------------------------------------------------------------

Abstractly, a program consists of a main function and a set of tail and nontail
calls.  A tail call consists of a caller and a callee.  A nontail call consists
of a caller, callee, and jump.

        j in    Jump
        f in    Fun
                TailCall = Fun x Fun
                NonTailCall = Fun x Fun x Jump
        p in    Program = Fun x P(TailCall) x P (NonTailCall)

An analysis is a map from functions to the continuation that they return to.

        A in    Analysis = Fun -> {Uncalled, Unknown} U Jump U Fun

Here is the intended meaning of A.

        A(f)            meaning
        --------        --------------------------------------------------
        Uncalled        f is never called during evaluation of the program
        j in Jump       f always returns to continuation j
        g in Fun        f always returns to function g
        Unknown         f returns to many places

An analysis A is safe for program p = (fm, T, N) if 
        *   A(fm) = Unknown
        **  For all nontail calls (f, g, j) in N,
                A(f) = Uncalled 
                or A(g) in {j, Unknown}
        *** For all tail calls (f, g) in T,
                A(f) = Uncalled
                or A(g) in {f, A(f), Unknown}

Define Reach: Fun -> Bool by Reach(f) iff there is a path of calls from fm to f.

Theorem 1: Let B be a safe analysis.
           If Reach(f) then B(f) <> Uncalled.
Proof:
If Reach(f), then there exists a path of calls from fm to f.
 Proceed by induction on the length of the path.
 n = 0: Then f = fm.
        B(f) = Unknown (by *, because B is safe). 
        Hence B(f) <> Uncalled.
 n > 0: Then there exists g 
         such that there exists a path of n-1 calls from fm to g
         and (g, f, j) in N or (g, f) in T.
        By induction hypothesis, B(g) <> Uncalled.
        Suppose (g, f, j) in N.
         B(f) in {j, Unknown} (by **, because B is safe).
         Hence B(f) <> Uncalled.
        Suppose (g, f) in T.
         B(f) in {g, B(g), Unknown} (by ***, because B is safe).
         Hence B(f) <> Uncalled.

Now define a directed graph G = (Node, Edge) where
        Node = Jump U Fun U {Root}
        Edge = {(Root, fm)}
               U {(Root, j) | j in Jump}
               U {(Root, f) | not(Reach(f))}
               U {(f, g) | (f, g) in T and Reach(f) and Reach(g)}
               U {(j, g) | (f, g, j) in N and Reach(f) and Reach(g)}

Theorem 3: For all f in Fun, there is a path from Root to f in G.
Proof:
Either Reach(f) or not(Reach(f).

If Reach(f), then there is a path of calls from fm to f.  Every g in Fun on
this path also has Reach(g).  Hence every call on the path corresponds to an
edge in Edge by fourth or fifth clause.  Since (Root, fm) in Edge, there is a
path from Root to f in G. 

If not(Reach(f)), then (Root, f) in Edge by third clause.  Hence there is a path
from Root to f in G.

Let D be the dominator tree of G rooted at Root.
For f in Fun, let parent(f) be the parent of f in D.

Define an analysis, A, based on D as follows:
    A(f) = 
         if parent(f) = Root
           then if Reach(f) then Unknown else Uncalled
         else the ancestor l of f in D such that parent(l) = Root

Lemma 4: Reach(f) iff A(f) <> Uncalled.
  (equivalently, not(Reach(f)) iff A(f) = Uncalled)
Proof:
By definition of A, A(f) = Uncalled iff parent(f) = Root and not(Reach(f)).
Hence, if Reach(f) then either A(f) = Unknown or A(f) in Fun U Jump.
Also, if not(Reach(f)), then (Root, f) in Edge (by third clause).
Then parent(f) = Root and A(f) = Uncalled.

Theorem 6: A is a safe analysis.
Proof:
* A(fm) = if parent(fm) = Root
            then if Reach(f) then Unknown else Uncalled
          else ...
        = if Reach(f) then Unknown else Uncalled
          (because (Root, fm) in Edge by first clause)
        = Unknown
          (because Reach(fm))
** Let (f, g, j) be an arbitrary element of N.
   Suppose A(f) = Uncalled.
    Then done.
   Suppose A(f) <> Uncalled.
    Then Reach(f), by Lemma 4.
    Then Reach(g), by definition of Reach and (f, g, j) in N.
    Then (Root, j) in Edge (by second clause)
     and (j, g) in Edge (be fifth clause).
    Then parent(g) in {Root, j} and parent(j) = Root.
    Hence A(g) in {j, Unknown}.
*** Let (f, g) be an arbitrary element of T.
   Suppose A(f) = Uncalled.
    Then done.
   Suppose A(f) <> Uncalled.
    Then Reach(f), by Lemma 4.
    Then Reach(g), by definition of Reach and (f, g) in T.
    Then (f, g) in Edge (by fourth clause).
    Suppose parent(g) = Root.
     Then A(g) = Unknown in {f, A(f), Unknown}.
    Suppose parent(g) <> Root.
     Then A(g) = L, the ancestor of g in D with parent(L) = Root.
     Suppose L = f.
      Then A(g) = L = f in {f, A(f), Unknown}.
     Suppose L <> f.
      Then A(f) = L (because L dominates f).
      Then A(g) = L = A(f) in {f, A(f), Unknown}.

Lemma 7: Let B be a safe analysis.
         Let Root -> l -> f0 ... -> fn be a path in G.
         Then B(fn) in {fn-1, ..., f0, l, Unknown}.
Proof:
Let B be a safe analysis.
Let Root -> l -> f0 -> ... -> fn be a path in G.
Proceed by induction on n.
 n = 0: Suppose l in Fun.
         Since (Root, l) in Edge, either by the first clause l = fm or 
           by the third clause not(Reach(l)).
         Since (l, f) in Edge, by the fourth clause (l, f) in T and Reach(l) and
           Reach(f) .
	 Hence l = fm.
	 Hence B(l) = Unknown (because B is safe).
	 Hence B(f) in {l, Unknown} (because B is safe and (l, f) in T).
        Suppose l in Jump.
         Then (Root, l) in Edge (by second clause)
          and (l, f) in Edge (by fifth clause).
         Then (g, f, l) in N and Reach(g) and Reach(f).
         Then B(g) <> Uncalled (by Theorem 1, because B is safe).
         Then B(f) in {l, Unknown} (by ** because B is safe).
 n > 0: Since (fn-1, fn) in Edge, by thge fourth clause, (fn-1, f) in T and
	  Reach(fn-1) and Reach(f).
        Then B(fn-1) <> Uncalled (by Theorem 1, because B is safe).
        Then B(fn) in {fn-1, B(fn-1), Unknown} (by *** because B is safe)
        Then B(fn-1) in {fn-2, ..., f0, l, B(l), Unknown} 
         (by induction hypothesis).
        Hence, B(fn) in {fn-1, ..., f0, l, B(l), Unknown}.

Lemma 8: Let B be a safe analysis.
         If B(f) = Uncalled, then A(f) = Uncalled.
Proof:
Let B be a safe analysis.
Suppose B(f) = Uncalled.
 Then not(Reach(f)), by the contrapositive of Theorem 1, because B is safe.
 Then A(f) = Uncalled, byt Lemma 4.

Lemma 9: Let B be a safe analysis.
         If B(f) = j in Jump, then A(f) in {Uncalled, j}.
Proof:
Let B be a safe analysis.
Suppose B(f) = j in Jump.
 Suppose not(Reach(f))
  Then A(f) = Uncalled by Lemma 4.
 Suppose Reach(f).
  Since B(f) <> Unknown, we have f <> fm (by * and B is safe).
  Then (Root, f) notin Edge (by third clause and first clause (f <> fm)).
  Thus, every path from Root to f in G has length greater than 1.
  Suppose Root -> l -> f0 -> ... -> fn-1 -> f is a path from Root to f in G.
   Then B(f) in {fn-1, ..., f0, l, Unknown} (by Lemma 7).
   Thus, l = j (because fi in Fun).
  Hence, any path from Root to f in G passes through j.
  Therefore, j dominates f in G.
  Then, parent(f) <> Root.
  Also, parent(j) = Root (because (Root, j) in Edge by second clause).
  Hence, A(f) = j.
  
Lemma 10: Let B be a safe analysis.
          If B(f) in Fun, then A(f) in {Uncalled} U Jump U Fun.
Proof:
Let B be a safe analysis.
Suppose B(f) = h in Fun.
 Suppose not(Reach(f))
  Hence A(f) = Uncalled by Lemma 4.
 Suppose Reach(f)
  Since B(f) <> Unknown, we have f <> fm (by * and B is safe).
  Then (Root, f) notin Edge (by third clause and first clause (f <> fm)).
  Thus, every path from Root to f in G has length greater than 1.
  Suppose Root -> l -> f0 -> ... -> fn-1 -> f is a path from Root to f in G.
   Then B(f) = h in {fn-1, ..., f0, l, Unknown} (by Lemma 7).
   Thus, l = h or there exists i such that fi = h.
  Hence, any path from Root to f in G passes through h.
  Therefore, h dominates f in G.
  Then, parent(f) <> Root.
  Hence, A(f) in Jump U Fun.

Theorem 11: Let B be a safe analysis.
            If B(f) = Uncalled, then A(f) = Uncalled.
            If B(f) = j in Jump, then A(f) in {Uncalled, j}.
            If B(f) = h in Fun, then A(f) in {Uncalled} U Jump U Fun.
Proof:
Let B be a safe analysis.
Suppose B(f) = Uncalled.
 Then A(f) = Uncalled (by Lemma 8).
Suppose B(f) = j in Jump.
 Then A(f) <= j (by Lemma 9).
Suppose B(f) in Fun.
 Then A(f) in {Uncalled} U Jump U Fun (by Lemma 10).

Corollary 12: Let B be a safe analysis.
              If B(f) <> Unknown, then A(f) <> Unknown.
Proof: Immediate from Theorem 11.