contification paper

Stephen Weeks MLton@sourcelight.com
Fri, 16 Feb 2001 13:10:23 -0800 (PST)


> 1. presentation of the CPS IL
> 
> - I assume that sticking to the level of detail that Steve used in the
> messages to John Reppy would be good: i.e., drop the HandlerPush and
> HandlerPop declarations, don't worry about the globals and datatypes, etc. 
> Also, introduce mutual recursion among continuation declarations. 

Sounds good to me.  If it's not too bad, I wouldn't mind seeing handler push and 
pop, because I think it's neat how MLton handles handlers.  I've not seen it
done that way before.  But that probably belongs in a separate paper that
describes the CPS IL, handler inference, and the raise-to-jump optimization.

> - If we'd like to use the paper as a vehicle to introduce the MLton CPS IL,
> someone will need to fill in some of the motivation for the IL, e.g. what
> (if anything) it's based on, why it developed the way it did, what
> benefits it provides, etc.

No problem.  I can write that.

> 2. Analysis and proofs
> 
> - I'd like to make the following formal definitions:

Would you like to use the definition of safety using A* from the start?  I'm
guessing, based on what you said later in your mail, that you were thinking of
putting that off to the end.  We only need A* because of the mutual recursion
problem, right?  Assuming so, I agree A* should be put off.

> A safe analysis A is minimal if
>  for all safe analyses B, B(f) = Uncalled => A(f) = Uncalled 
>                       and B(f) <> Unknown => A(f) <> Unknown.

Fine.

> An analysis A is cyclic if
>  there exist f0, ..., fn in Fun such that fi+1 = A(fi) for 0 <= i < n
>  and f0 = fn.
> An analysis A is acyclic if A is not cyclic.
> 
> I've also got a proof that any minimal safe analysis is acyclic.  This
> proof motivates the additional minimality requirement on Uncalled
> functions: any cycle in an analysis should really have all those functions
> as Uncalled.

Since you have that proof, perhaps the framework would be simpler to understand
if we just defined acyclicity up front and imposed it as a condition of safety.
After all, the whole idea is that safety is supposed to imply that the
transformation can work.
So I'm thinking the presentation of the framework goes like this:
	A in Analysis = ...
	A is cyclic if ...
	A is acyclic if A is not cyclic
	A is safe if
		* A is acyclic
		**, ***, **** as before
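
In SML terms, I picture the analysis domain as something like the following
sketch (strings stand in for function and jump names; this is not the real
implementation):

	(* Where the analysis sends each source function. *)
	datatype target =
	   Uncalled            (* f is never called *)
	 | Unknown             (* f remains a top-level function *)
	 | InFun of string     (* contify f at the start of function g *)
	 | InJump of string    (* contify f after the declaration of jump j *)

	type analysis = string -> target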

> - It will probably be good to develop an intuitive explanation for the
> definition of safety.  Obviously this is tied in with the transformation
> and the IL.  What I'm wondering is if allowing mutual recursion among
> local functions makes it more or less clear that we want a set of
> functions to all be contified at the same location.

I don't understand the question.  Allowing mutual recursion among functions in
the target language makes the transformation easy.  Mutual recursion among
functions in the source is irrelevant.

> 3. Transformation
> 
> - What is the best way to present the transformation?  An informal
> description like this
>  If l in Fun, then {f | A(f) = l} are contified as a mutually recursive set
>   as the first declaration in the body of l.
>  If j in Jump, then {f | A(f) = j} are contified as a mutually recursive set
>   as the first declaration after the declaration of j.
> would seem to capture the intended transformation.  Can we be more formal?
> Also, there is the rewriting of the calls.
> 
> - There is still a question of ordering: once I contify a function f in
> another function g, what if I later find a function h that is to be
> contified in f?  The acyclic nature of the analysis will give us the right
> order to perform the contification, although we need to be a little more
> careful about contification in a jump j in order to contify functions in j
> before j's parent function is contified somewhere else.  But maybe
> these are just implementation details. 

I'd like to see the transformation spelled out in more detail, including the
call rewriting and the ordering.  Since the acyclicity gives us the ordering, I
think we should be able to write a one-pass recursive translation in a
mathematical enough style.
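
As a strawman for the ordering, process functions in decreasing depth along
the A-edges, something like the sketch below (it assumes the target datatype
from the earlier sketch; parent, mapping a jump to its enclosing function, is
hypothetical):

	(* depth f counts the A-edges from f to a function that stays put,
	 * going through a jump's enclosing function so that functions
	 * contified in a jump j move before j's function is itself moved.
	 * Acyclicity of A is what makes this terminate. *)
	fun depth (A: string -> target) (parent: string -> string) (f: string): int =
	   case A f of
	      InFun g => 1 + depth A parent g
	    | InJump j => 1 + depth A parent (parent j)
	    | _ => 0

Contification would then proceed in order of decreasing depth, rewriting calls
as it goes: calls to a contified f become jumps to f, and when A(f) is a jump
j, f's returns become jumps to j.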

> - I think it is worth arguing that the resulting program is still
> lexically scoped.  Is there anything else short of a full proof of
> correctness that would be worth mentioning?

Maybe type correctness of the resulting program.

> 4. Motivation & Examples
> 
> - The general motivation for contification is that the intended
> code generation is that each top-level function will require one
> stack frame for itself and all of its local functions.  By contifying
> functions (i.e., by making top-level functions into local functions), we
> save stack frames and reduce the overhead for calls.  Are there more
> specific motivations?  Steve notes that it was the first optimization
> added to MLton's CPS IL, so are any other optimizations enabled, or
> benefit significantly, by contification?  (Unfortunately, the only direct
> correlation between optimizations that I know of is that additional
> contification inhibits inlining.)

Sure.  Contification lets you recognize loops, because it turns tail recursive
functions into loops.  Once you have loops, you can do all the usual dragon book 
things like loop invariant code removal and induction variable elimination.
MLton currently does only a very simple kind of the former.  Another thing
that putting things in the same function does is expose them to the local shrink 
reductions, e.g. #1 (a, b) --> a.  This can already be a big win when the
contified function communicates return results to its caller.  One other thing
I can think of is that contification exposes more opportunities for the
raise-to-jump transformation.  
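
For the loop point, here's a tiny example in source terms, writing the IL's
jumps as local funs:

	(* Before: f is a separate top-level function, tail-called by g. *)
	fun f (i, acc) = if i = 0 then acc else f (i - 1, acc + i)
	fun g n = f (n, 0)

	(* After contifying f in g, the tail recursion is a local loop that
	 * loop-invariant code motion and friends can now see. *)
	fun g n =
	   let
	      fun f (i, acc) = if i = 0 then acc else f (i - 1, acc + i)
	   in
	      f (n, 0)
	   end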

> 5. Related analyses
> 
> - Should we mention the call-based analysis?  It might be a good lead in /
> base point for discussion.  It's very easy to explain and makes a lot of
> intuitive sense.  We can give a short example of where the call-based
> analysis works (e.g., the nested loops), which motivates the
> transformation and gives an intuitive analysis.  Then present the even-odd
> example where the call-based analysis fails, but it's fairly obvious that
> the same "trick" could be pulled if only our analysis were better.  The
> call-based analysis is also a good sanity check: does our definition of
> safety make sense?

This is a good idea.  It's also historically accurate since that's what was in
MLton for quite a while.
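
For concreteness, I take the even-odd example to be something like

	fun even n = if n = 0 then true else odd (n - 1)
	and odd n = if n = 0 then false else even (n - 1)
	fun main n = if even n then 0 else 1

Calls to even come from two different functions (main and odd), so the
call-based analysis gives up on it, even though even and odd can clearly be
contified together at the continuation of the call in main.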

> - Obviously the Local CPS Conversion paper is mentioned in the related
> work.  Does it belong elsewhere as well? 

We probably will need to explain it in the benchmark section, since we are going 
to compare to it.  In fact, it's simple enough to formalize on the CPS IL that
maybe we should just do that.  I think it makes a nice illustration of how easy
it is to define analyses on the IL.
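
If we do formalize it, I'd guess it comes to only a few lines.  Roughly, and
this is my paraphrase (glossing over a function's own recursive calls):

	(* Call-based analysis sketch: contify f in g iff every call to f is
	 * a tail call and all of the calls occur in the single function g. *)
	fun callAnalysis (calls: string -> {caller: string, tail: bool} list)
	                 (f: string): target =
	   case calls f of
	      [] => Uncalled
	    | all as {caller = g, ...} :: _ =>
	         if List.all (fn {caller, tail} => tail andalso caller = g) all
	            then InFun g
	         else Unknown

Note that on a pair of functions that call only each other this yields a
cycle, which is exactly the point of the minimality condition above: such
functions are really Uncalled.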

In fact, if we start with the definition of safety and the transformation, then
we can present all our analyses as safe analyses within the framework.  This is
even consistent with the implementation, right?  They all use the same
transformation.  Doing it this way also makes the minimality theorem more
understandable.

> In any event, one comment I think is
> worth making is that this analysis is helped by the first-order nature of
> the CPS IL; without escaping functions, we can be much more accurate about
> the call graph and hence get a more accurate analysis.

We should pound on this point several times.  It was one of the motivations for
the IL in the first place -- do control flow analysis (0CFA) once, and then
expose the information (directly in the program, not via annotations) to
subsequent optimizations so that all optimizations can use control-flow
information without having to recompute it and without having to punt on
escaping functions (which is what's done with a lot of analyses out there).

> - There is also the A*/parent analysis.  We could present this, but some
> of the proofs are a little trickier.  It might just make a good closing
> remark: "Suppose we were interested in getting the "best" nesting
> possible.  Here's an example [...] where this analysis says one set of
> contifications, but intuitively, this other set of contifications would
> yield better nesting.  The problem is that this other set of
> contifications is not safe with respect to our definition of safety.
> However, it is safe with respect to this new definition of safety [...].
> Likewise, if we change our analysis to the following [...], we get an
> analysis that yields the better set of contifications.  All of the
> previous results can be proved with the new definitions of safety and
> analysis.  On the other hand, the original analysis is also minimal with
> respect to this definition of safety."  What we're missing is a definition
> of "best nesting" -- which is going to be program dependent.  Since that
> looks to be another arduous road (but maybe not), it's probably best
> left as a brief note.

I agree.  Let's wait and see where we are space-wise before saying too much
about A*/parent.

> 6. Results and Reality

Maybe the right thing to compare is the five analyses

	none	call	cont	call+cont	dom

I'm wavering on including call+cont.

> - The runtimes aren't as impressive:
>                                                 dom/    dom/    dom/
>                 none    call    both    dom     none    call    both
> barnes-hut      9.97    7.46    7.43    8.52    0.85    1.14    1.15
> checksum        15.53   7.81    7.81    7.81    0.50    1.00    1.00
> count-graphs    14.63   11.02   10.69   10.70   0.73    0.97    1.00
> fft             32.12   31.07   31.08   29.60   0.92    0.95    0.95
> fib             6.52    6.52    6.52    6.52    1.00    1.00    1.00
> knuth-bendix    11.39   12.19   12.31   12.29   1.08    1.01    1.00
> lexgen          27.60   21.50   21.37   21.63   0.78    1.01    1.01
> life            44.78   34.77   34.77   38.42   0.86    1.10    1.11
> logic           35.95   35.57   35.59   35.56   0.99    1.00    1.00
> mandelbrot      56.24   13.02   13.02   13.01   0.23    1.00    1.00
> matrix-multiply 14.14   6.81    6.81    6.74    0.48    0.99    0.99
> (had to stop it here, I'll rerun these overnight)

This is for the dom based on A*/parent, right?  Should we instead report the
dom based on ancestor, with a note that it's actually weaker due to the mutual
recursion problem?

> Relative to no contification, pretty good.  I think I can explain
> barnes-hut, but life (and I think tensor shows the same effect) I don't
> know.  If dom always did better than call, I'd say that's another reason
> to introduce the call-based analysis as a stepping stone.

It's weird that dom isn't consistently better than call.  We need to talk
somewhere about the three rounds of contification and the fact that although
dom will contify more than call on any given input, after three rounds it
doesn't matter.

> - Should we mention the fact that the real CPS IL doesn't support mutual
> recursion among local functions?  The only advantage would be to point out
> that it's not harmful in practice.

We should mention this somewhere.