cps & contification

Wed, 31 Jan 2001 10:31:52 -0500 (EST)

> > Given an analysis A, define A* : Fun -> 2^RCont as follows:
> >     A*(f) = 
> >          {f} U (if A(f) in Fun
> >                   then A*(A(f))
> >                   else {A(f)})
> 
> By this defn I assume you mean the least such A* under the obvious ordering on
> Fun -> P(RCont).

Yes.  

> 
> > Lexical scoping will still be possible (although maybe harder to compute).
> ...
> 
> I didn't really follow this argument.

I didn't write that up very well.  I was trying to approach it from a
different direction, but I haven't quite worked it out yet.  I hadn't
thought of it before, but it seems that we require a "minimal" analysis to
be able to do the transformation.  For example,

fun fm () = ...
fun f () = ... g () ...
fun g () = ... f () ...
(i.e., no path of calls from fm to f or g).

Then 
A(fm) = Unknown
A(f) = g
A(g) = f
is a safe analysis, but the transformation doesn't make sense (can't both
contify f in g and g in f).  What's the approach for arguing that the
intended transformation can be applied?  It seems that a "minimal"
analysis somehow ensures that there are no "cycles" in A; this is
certainly the case in the dominator approach.

Well, if A has no cycles, then it should be clear that A* is inductive(?).
By this I mean that if A(f) = h in Fun, then A*(f) = {f} U A*(h) and f
notin A*(h).  This means we can do induction on |A*(f)| and I would argue
that an analyis A which is safe (under the new def) and acyclic will allow
the intended transformation:

Assume that A is an acyclic, safe analysis. 

Also, assume that {f | A(f) = l}, the set of functions to be contified at
l in Fun U Jump, are contified as a mutual recursive set.

Contification occurs as follows:
 If l in Fun, then {f | A(f) = l} are contified as a mutual recursive set
  as the first declaration in the body of l.
 If j in Jump, then {f | A(f) = l} are contified as a mutual recursive set
  as the first declaration after the declaration of j.

Claim:
if l in Jump U Fun, l in A*(f), and A(g) = l,
 then f and g are contified such that (the body of) f is in the lexical
 scope of g.

Proof: by induction on |A*(f)|
 |A*(f)| = 2: Then A*(f) = {f, A(f)} and A(f) notin Fun.
              If A(g) = f, g is contified as the first dec in the body of
               f; now the original body of f will be in the scope of g.
              If A(g) = A(f), then f and g are contified as a mutual
               recursive set.
 |A*(f)| > 2: Then A*(f) = {f} U A*(A(f)) and A(f) = h in Fun.
              If A(g) = f, then like previous case.
              If A(g) in A*(A(f)) = A*(h),
               then h and g can be contified such that the body of h is in
               the lexical scope of g (by induction hypothesis, because
               |A*(h)| = |A*(f)| - 1). Further, f will be contified in
               (the body of h), so the body of f is in the lexical scope 
               of g.

Now the safety constraints guarantee that all contification is done
lexically (ignoring mutual recursion).

If (f, g) in T and A(f) <> Uncalled, then A(g) in A*(f) U {Unknown}.
 By the previous argument, f and g will be contified such that the body of   
 f is in the lexical scope of g.  Therefore the instance of the call 
 (f, g) will be in the scope of g.

If (f, g, j) in N and A(f) <> Uncalled, then A(g) in {j, Unknown}.
 If A(g) = j, then g is contified in f (immediately) after the declaration
  of j.  The instance of the call (f, g, j) must have occured after
  the declaration of j; hence the instance of the call (f, g, j) will be
  in the scope of g.

I know there is a little more to contification than what I mentioned above
(in particular, when A(f) = j in Jump, we need to change transfers from
tail-calls to non-tail calls, and returns to jumps).  But I think for the
purposes of arguing that lexical scoping is retained, the description
above is sufficient.  The remaining transformations will simply add
occurences of calls to the continuation j in the functions which were
contified immediately after the declaration of j, so scoping works out
there.

> It does this because it doesn't introduce as many sccs, right?

Well, it splits up the functions which into different return
continuations.  Here are a few examples:

fun fm () = ... h () ...
fun h () = ... f () ... g () ...
fun f () = ... g () ...
fun g () = ... f () ...

(fm,
 {},
 {(fm, h), (h, f), (h, g), (f, g), (g, f)})

Edge =
{(Root, fm)
U {}
U {}
U {(fm, h), (h, f), (h, g), (f, g), (g, f)}
U {}

Using the "ancestor" version of A, we have
A(fm) = Unknown
A(h) = fm
A(f) = fm
A(g) = fm

There is no way of nesting f and g to mimic mutual recursion, so nothing
is contified in fm.

Using the "parent" version of A, we have
A(fm) = Unknown
A(h) = fm
A(f) = h
A(g) = h

Although there is no way of nesting f and g to mimic mutual recursion, we
can still contify h in fm.

*****************************************************************************

This is the essential portion of the "logic" benchmark problem.

fun fm () = ... h () ...
fun h () = ... f () ... g () ...
fun f () = ... g () ... i () ...
fun g () = ... f () ...
fun i () = ...

(fm,
 {},
 {(fm, h), (h, f), (h, g), (f, g), (f, i), (g, f)})

Edge =
{(Root, fm)
U {}
U {}
U {(fm, h), (h, f), (h, g), (f, g), (f, i), (g, f)}
U {}

Using the "ancestor" version of A, we have
A(fm) = Unknown
A(h) = fm
A(f) = fm
A(g) = fm
A(i) = fm

This shows why we can't just "throw out" sccs with more than one outside
caller; we would throw out f and g, but then we can't contify i in fm,
because i won't be in the scope of f. 

Using the "parent" version of A, we have
A(fm) = Unknown
A(h) = fm
A(f) = h
A(g) = h
A(i) = f

Same situation as above, but with i now contifiable in f.  

This problem can get nastier:

fun fm () = ... h () ... i () ...
fun h () = ... f () ... g () ...
fun f () = ... g () ... i () ...
fun g () = ... f () ...
fun i () = ...

(fm,
 {},
 {(fm, h), (h, f), (h, g), (f, g), (f, i), (g, f)})

Edge =
{(Root, fm)
U {}
U {}
U {(fm, h), (h, f), (h, g), (f, g), (f, i), (g, f)}
U {}

Using the "parent" version of A, we have
A(fm) = Unknown
A(h) = fm
A(f) = h
A(g) = h
A(i) = fm

Mutual recursion prevents us from contifying f and g in h.  Since f and g
remain top level functions, we also need to revise our decision to contify
i in fm, otherwise i won't be in the scope of f.  You're right that
contifying into different functions is going to get complicated.

Another issue is with contifying into a continuation.  Consider something
simple like:

fun fm () = ... K (f ()) ...
fun f () = ... g () ...
fun g () = ...

Using the "parent" version of A, we have
A(fm) = Unknown
A(f) = K
A(g) = f

(Note: this is also a safe analysis using the original definition of
safety.)

Conceptually, no problem.  But for an implementation, there is an issue. 
Because A(f) = K, in f, we replace all tail calls h(...) with non-tail
calls K(h(...)) and all returns (...) with jumps K(...).  But replacing
g() with K(g()) doesn't make sense after g is made a local function in f. 
Instead, in f, we need to turn tail calls to g() into jumps to g() and, in
g, turn tail calls into non-tail calls and returns into jumps.  It's not
clear to me if this is being handled by the current transformation
implementation.