contification paper

Matthew Fluet <fluet@CS.Cornell.EDU>
Thu, 15 Feb 2001 15:25:10 -0500 (EST)


I've started trying to gather my thoughts for the contification paper.
Some general thoughts and questions:

1. presentation of the CPS IL

- I assume that sticking to the level of detail that Steve used in the
messages to John Reppy would be good: i.e., drop the HandlerPush and
HandlerPop declarations, don't worry about the globals and datatypes, etc. 
Also, introduce mutual recursion among continuation declarations. 

- If we'd like to use the paper as a vehicle to introduce the MLton CPS IL, 
someone will need to fill in some of the motivation for the IL, e.g. what
(if anything) it's based on, why it developed the way it did, what
benefits it provides, etc.

2. Analysis and proofs

- I'd like to make the following formal definitions:

A safe analysis A is minimal if
 for all safe analyses B and all f in Fun,
  B(f) = Uncalled => A(f) = Uncalled
  and B(f) <> Unknown => A(f) <> Unknown.

An analysis A is cyclic if
 there exist f0, ..., fn in Fun, with n >= 1, such that A(fi) = f(i+1)
 for all 0 <= i < n and f0 = fn.
An analysis A is acyclic if A is not cyclic.

I've also got a proof that any minimal safe analysis is acyclic.  This
proof motivates the additional minimality requirement on Uncalled
functions: any cycle in an analysis should really have all of those
functions marked Uncalled.  (A small sketch of these definitions as code
follows this item.)
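
To make the definitions concrete for myself, here's a minimal SML
sketch of the codomain of an analysis and the cyclicity check (the
names below -- ret, isCyclic, string-named functions -- are just for
illustration, not from the MLton sources):

  (* An analysis maps each function to Uncalled, Unknown, a function,
     or a jump.  Keeping Func and Jump in separate constructors mirrors
     their disjointness (which, as point 7 notes, is used in the
     minimality proof). *)
  datatype ret =
      Uncalled
    | Unknown
    | InFunc of string   (* contify in the body of this function *)
    | InJump of string   (* contify after this jump's declaration *)

  (* A is cyclic iff following InFunc edges from some function leads
     back to a function already seen. *)
  fun isCyclic (funcs: string list, A: string -> ret): bool =
    let
      fun follow (visited, f) =
        case A f of
          InFunc g =>
            List.exists (fn h => h = g) visited
            orelse follow (g :: visited, g)
        | _ => false
    in
      List.exists (fn f => follow ([f], f)) funcs
    end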

- It will probably be good to develop an intuitive explanation for the
definition of safety.  Obviously this is tied in with the transformation
and the IL.  What I'm wondering is if allowing mutual recursion among
local functions makes it more or less clear that we want a set of
functions to all be contified at the same location.

- Next, it's a question of what level of detail to provide.  Certainly the
proofs of safety and minimality should be included.  Dropping one or two
of the lemmas and supporting theorems would probably be fine.

3. Transformation

- What is the best way to present the transformation?  An informal
description like this
 If l in Fun, then {f | A(f) = l} are contified as a mutually recursive set
  as the first declaration in the body of l.
 If j in Jump, then {f | A(f) = j} are contified as a mutually recursive set
  as the first declaration after the declaration of j.
would seem to capture the intended transformation.  Can we be more formal?
Also, there is the rewriting of the calls.
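
For instance, rendered at the source level (names made up; in the IL
the rewriting turns tail calls into jumps): if A(loop) = f, then

  fun loop (i, acc) = if i = 0 then acc else loop (i - 1, acc + i)
  fun f n = loop (n, 0)

becomes

  fun f n =
    let
      (* loop is now the first declaration in f's body *)
      fun loop (i, acc) = if i = 0 then acc else loop (i - 1, acc + i)
    in
      loop (n, 0)
    end

with the call loop (n, 0) rewritten from an inter-procedural tail call
into a local jump.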

- There is still a question of ordering: once I contify a function f in
another function g, what if I later find a function h that is to be
contified in f?  The acyclic nature of the analysis will give us the right
order to perform the contifications, although we need to be a little more
careful about contification in a jump j, so that functions are contified in
j before j's parent function is itself contified somewhere else.  But maybe
these are just implementation details (a sketch of one ordering follows).
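
A sketch of that ordering, using the ret datatype from above (nestDepth
and the parent map are hypothetical helpers, not the implementation):

  (* parent j = the function whose body declares jump j.  Processing
     functions in decreasing nestDepth order contifies h into f before
     f itself is moved, and contifies into a jump j before j's parent
     function is moved.  Acyclicity of the analysis (including through
     jumps' parents) makes nestDepth well-defined. *)
  fun nestDepth (A: string -> ret, parent: string -> string)
                (f: string): int =
    case A f of
      InFunc g => 1 + nestDepth (A, parent) g
    | InJump j => 1 + nestDepth (A, parent) (parent j)
    | _        => 0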

- I think it is worth arguing that the resulting program is still
lexically scoped.  Is there anything else short of a full proof of
correctness that would be worth mentioning?

4. Motivation & Examples

- The general motivation for contification is the intended code
generation: each top-level function will require one stack frame for
itself and all of its local functions.  By contifying functions (i.e.,
by turning top-level functions into local functions), we
save stack frames and reduce the overhead for calls.  Are there more
specific motivations?  Steve notes that it was the first optimization
added to MLton's CPS IL, so are any other optimizations enabled by, or
significantly benefited by, contification?  (Unfortunately, the only direct
correlation between optimizations that I know of is that additional
contification inhibits inlining.)

- Personally, I like to see examples worked out in full detail (probably
evident from some of my past posts).  Something like a small finite-state
machine would be easy, though I would like to see some mixing of
contification in jumps and funcs.  But I don't quite have a real
understanding of how tail calls and non-tail calls are generated from
initial programs.
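
As a strawman, the sort of finite-state machine I mean (a made-up
recognizer, not from any benchmark):

  (* Three mutually recursive states over a list of bits; every call
     is a tail call, so s0, s1, and s2 are the sort of candidates that
     contification into run would turn into a single loop. *)
  fun s0 []        = true
    | s0 (b :: bs) = if b = 0 then s0 bs else s1 bs
  and s1 []        = false
    | s1 (b :: bs) = if b = 0 then s2 bs else s0 bs
  and s2 []        = false
    | s2 (_ :: bs) = s2 bs
  fun run bits = s0 bits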

5. Related analyses

- Should we mention the call-based analysis?  It might be a good lead-in /
starting point for discussion.  It's very easy to explain and makes a lot of
intuitive sense.  We can give a short example of where the call-based
analysis works (e.g., the nested loops), which motivates the
transformation and gives an intuitive analysis.  Then present the even-odd
example where the call-based analysis fails, but where it's fairly obvious
that the same "trick" could be pulled if only our analysis were better.  The
call-based analysis is also a good sanity check: does our definition of
safety make sense?
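
Concretely, the even-odd example I have in mind is something like (my
rendering):

  fun even n = if n = 0 then true  else odd (n - 1)
  and odd  n = if n = 0 then false else even (n - 1)
  fun main () = if even 13 then print "even\n" else print "odd\n"

Every call to odd is a tail call in even, so the call-based analysis can
contify odd in even; but even is called from both odd and main, so the
call-based analysis gives up on it, even though the even/odd pair could
clearly be contified together at the single call in main.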

- Obviously the Local CPS Conversion paper is mentioned in the related
work.  Does it belong elsewhere as well?  It was a prime motivation for
the development of this analysis.  In any event, one comment I think is
worth making is that this analysis is helped by the first-order nature of
the CPS IL; without escaping functions, we can be much more accurate about
the call graph and hence get a more accurate analysis.

- There is also the A*/parent analysis.  We could present this, but some
of the proofs are a little trickier.  It might just make a good closing
remark: "Suppose we were interested in getting the "best" nesting
possible.  Here's an example [...] where this analysis says one set of
contifications, but intutitively, this other set of contifications would
yield better nesting.  The oroblem is that this other set of
contifications is not safe with respect to our definition of safety.
However, it is safe with respect to this new definition of safety [...].
Likewise, if we change our analysis to the following [...], we get an
analysis that yields the better set of contifications.  All of the
previous results, can be proved with these definition of safety and
analysis.  On the other hand, the original analysis is also minimal with
respect to this definition of safety."  What we're missing is a definition
of "best nesting" -- which is going to be program dependent.  Since that
looks to be like another arduous road (but maybe not), it's probably best
left as a brief note.

6. Results and Reality

- Compile vs. run-time: We have good counts of the number of functions
contifiable by all the analyses.  For mlyacc, 472 out of 817 functions are
marked as contifiable (and 2 are marked Uncalled).  That seems pretty
impressive, but I don't know if a lot of that arises from the fact that
mlyacc itself largely consists of machine-generated lexing/parsing code, which
might be prone to contification.  The last numbers I have for a
self-compile are 3733 out of 9180 marked contifiable, with 968 Uncalled
(still, just over 50%).

- The runtimes aren't as impressive (absolute times under none, call,
both, and dom, followed by dom's time as a ratio of each of the others):
                                                dom/    dom/    dom/
                none    call    both    dom     none    call    both
barnes-hut      9.97    7.46    7.43    8.52    0.85    1.14    1.15
checksum        15.53   7.81    7.81    7.81    0.50    1.00    1.00
count-graphs    14.63   11.02   10.69   10.70   0.73    0.97    1.00
fft             32.12   31.07   31.08   29.60   0.92    0.95    0.95
fib             6.52    6.52    6.52    6.52    1.00    1.00    1.00
knuth-bendix    11.39   12.19   12.31   12.29   1.08    1.01    1.00
lexgen          27.60   21.50   21.37   21.63   0.78    1.01    1.01
life            44.78   34.77   34.77   38.42   0.86    1.10    1.11
logic           35.95   35.57   35.59   35.56   0.99    1.00    1.00
mandelbrot      56.24   13.02   13.02   13.01   0.23    1.00    1.00
matrix-multiply 14.14   6.81    6.81    6.74    0.48    0.99    0.99
(had to stop it here, I'll rerun these overnight)

Relative to no contification, pretty good.  I think I can explain
barnes-hut, but life (and I think tensor shows the same effect) I don't
understand.  If dom always did better than call, I'd say that's another
reason to introduce the call-based analysis as a stepping stone.

- Should we mention the fact that the real CPS IL doesn't support mutual
recursion among local functions?  The only advantage would be to point out
that it's not harmful in practice.  I think it's interesting that
mlyacc and raytrace have no problems, since those benchmarks feel like
they involve a lot of mutual recursion.

7. Misc.

- Title suggestions

- terminology: Any thoughts on how to introduce/use "contification"?
Informally, I tend to use it as a verb ("to contify") and as an adjective.

- also, how important is the distinction between the Func and Jump
(alternatively Cont) sets?  It's actually important in some of the proofs:
knowing that Func and Jump are disjoint is used in the proof of
minimality.

- Anything else that I seem to be missing?