Your ICFP 2001 submission #48 (fwd)

Matthew Fluet fluet@CS.Cornell.EDU
Fri, 18 May 2001 08:58:19 -0400 (EDT)


Here are the reviews of the contification paper.  Major points seem to be:
say a little more about how Moby really does its transformation, be
careful about using "CPS", and discuss some other related work.  Reviewer
#3 found some typos in the definition of the contification transformation
(which I also came across a few weeks ago when I was rereading the paper).

Here are the scores in a nutshell:

                  #1   #2   #3   #4  tot.  avg.
appropriateness:   9    9    9    9    36  9.00
originality:       6    6    7    7    26  6.50
tech_strength:     8    5    9    6    28  7.00
presentation:      7    6    9    8    30  7.50
overall:           7    5    9    5    26  6.50
confidence:        1  0.9  0.8  0.7   3.4  0.85

BTW, I tried the "tighter alternate style" suggested and it does give us a
little more room (modulo all the spacing hacks we made), so we shouldn't
have too much problem incorporating a little more text to handle some of
the criticisms.

---------- Forwarded message ----------
Date: Fri, 18 May 2001 13:40:03 +0200 (MET DST)
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: fluet@CS.Cornell.EDU
Subject: Your ICFP 2001 submission #48

Dear Matthew Fluet:

You will find below the detailed comments by the reviewers of your paper:

     Contification using Dominators

Please follow closely the suggestions of the reviewers in revising
your paper.

The final, camera-ready version of your paper is due on June 29th, 2001.
It must be prepared using one of the ACM SIG proceedings templates
available at

     http://www.acm.org/sigs/pubs/proceed/template.html

We recommend the use of LaTeX2e with the tighter alternate style,
since this is the one that looks best.

The final paper is limited to 12 (twelve) pages.  We regret that we
cannot offer any page extension.

You will also need to fill out, sign, and send the ACM copyright form,
available at

        http://www.acm.org/pubs/copyright_form.html

Instructions on where and how to send the final manuscript and the 
copyright form will be e-mailed to you no later than June 18th.
(We are waiting for instructions from ACM concerning this.)

If you have any additional questions, please feel free to get in
touch.

Thanks again for your contribution to ICFP 2001, and I'm looking
forward to seeing you in Florence.

- Xavier Leroy
  ICFP 2001 program chairman

============================================================================ 
ICFP 2001 Reviews for Paper #48
============================================================================ 

Title: Contification using Dominators

Authors: Matthew Fluet
Stephen Weeks

============================================================================ 

REVIEWER #1
============================================================================ 

Numerical Scores:

     appropriateness: 9
     originality: 6
     tech_strength: 8
     presentation: 7
     overall: 7
     confidence: 1


Comments:

In Section 1: "the the Moby" ==> "the Moby".

Is it the case that the sets Cont and Var are disjoint?  I think that it
would clarify the presentation to explain that when you say "continuation"
you mean "label."  Your IR seems related in this aspect to the RML compiler's
IR (Oliva & Tolmach; JFP).

In Section 3.1: I do not understand why jumps do not create a new environment.
The only explanation that I can think of is to support local imperative variables.
If that is the case, you should say something about it in Section 3, since it
goes against expectations.

In Section 4.2, the *1 property seems backwards.  Later we learn that this is
for important technical reasons (Theorem 2), but a word of explanation would
help.

Strictly speaking, the analysis of 5.2 is only half the picture.  The approach
of [20] also involved a subsequent analysis of the transformed program
(after LCPS and closure conversion) to group functions into larger clusters.
This latter phase subsumes the of the A_{Call} analysis.  I would
prefer to use the term "contification" to refer to the idea of converting
tail-calls into gotos (as in [13]) and "LCPS" to refer to the introduction
of explicit return continuations, which are then candidates for contification.

In the actual Moby implementation, the analysis that determines when LCPS
can be applied (i.e., A(f) in Cont) and the analysis that determines when
a function f can be merged into the cluster of a function g (i.e., A(f) = g)
are both performed before any transformation.  The actual implementation in
the Moby compiler is not well described in [20], but is described in a more
recent version of the paper.

I suspect that the analyses being used by the Moby compiler, which work on the
call graph, are fairly similar to the A_Dom analysis of this paper.  The main
differences are that MLton is a whole-program compiler, it works on a first-order
representation without nested function definitions, and MLton uses more sophisticated
CFA to construct the call graph.

The discussion of related work is pretty thin.

============================================================================
REVIEWER #2
============================================================================ 

Numerical Scores:

     appropriateness: 9
     originality: 6
     tech_strength: 5
     presentation: 6
     overall: 5
     confidence: 0.9


Comments:

This paper describes an optimization dubbed "contification" used
within the MLton ML compiler.  This compiler's intermediate language
has a construct for locally-scoped continuations, which can be
compiled into a local jump that preserves the environment, rather than
into a full procedure call.  Contification converts top-level
functions meeting certain criteria into such local continuations,
which can improve the runtime of programs by avoiding procedure call
overhead and enabling further intraprocedural optimizations.  The
paper defines three different analyses, of varying power, for
identifying "contifiable" functions.  One of these analyses, based on
computing dominators, is proven to find as many contifiable functions
as any safe analysis possibly could.  An experimental comparison of
the three analyses in the context of the MLton compiler is presented,
which shows that contification speeds up programs, but that a lot of
contification is not always as good as a little.

The paper strikes me as a very nice, clean piece of work -- *but* on a
very minor topic, which may be scarcely relevant outside of MLton.
The problem is that contification is essentially a "clean-up"
optimization; it is needed to undo the bad effects of early
lambda-lifting by the compiler -- typically of functions representing
inner loops, since it is exactly these functions that are likely to
have sufficiently restricted sets of calling contexts to make
contification legal. (Of course, users *might* write source programs
to which contification applies directly, but probably not very often.)
So contification is essentially a solution to a problem that the
compiler writers have brought on themselves.  This is nicely
illustrated by the first three versions of <sum> on p. 3: the
contified version has the same structure as the original
(pre-lambda-lifted) version and could have been obtained from it
*directly* by observing that <loop> is tail-recursive and
non-escaping.
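
To make the point concrete, here is a rough reconstruction (not the
actual code from p. 3) of what I have in mind.  A directly-written
<sum> keeps its loop local:

fun sum l =
   let
      fun loop (l, acc) =
         case l of
            [] => acc
          | x :: r => loop (r, x + acc)
   in
      loop (l, 0)
   end

After lambda-lifting, <loop> becomes a top-level function:

fun loop (l, acc) =
   case l of
      [] => acc
    | x :: r => loop (r, x + acc)
fun sum l = loop (l, 0)

Contification then observes that <loop> is tail-called only from <sum>
and from itself and does not escape, and turns it back into a local
jump inside <sum> -- recovering essentially the original structure.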

Now it may well be that lambda-lifting first and contifying afterwards
makes for a cleaner and more powerful optimizer structure -- but the
authors don't really attempt to make this point.  On its own,
contification just doesn't seem that interesting.  Moreover, the
recent publication by Reppy (cited and discussed by these authors) has
already exposed the basic idea in a similar context -- though by no
means so elaborately.

It can also be quite misleading to discuss individual small
optimizations like this outside of the context of the optimizer they
live in.  This point is implicit in the experimental numbers in the
paper.  The authors' new dominator-based algorithm does indeed contify
more functions than previous algorithms, but the resulting code often
runs *slower*, probably -- the authors hypothesize -- because the
inlining framework is not properly tuned for the larger functions that
result from contification.  So again, this work is really only
meaningful if explained in the full context of the optimizer. I would
encourage the authors to make an attempt to tell a larger story about
their (very impressive) compiler, preferably in the form of a journal
paper or monograph.

Smaller points for the authors:

p.1 Introduction.  You should give the source code for the running
example involving f and g, presumably something like:

fun g y = y - 1
fun f b = (if b then g 13 else g 15) - 1
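
It might also help to show, in source-level terms, what contification
does to this example (only an illustration, not MLton's actual IL):

fun f b =
   let
      fun k r = r - 1        (* the single return point shared by both calls *)
      fun g y = k (y - 1)    (* g now jumps to k rather than returning *)
   in
      if b then g 13 else g 15
   end

Since both calls to g return to the same place in f, g can be moved
into f and its return becomes a direct jump.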

p. 2, col. 1, line -5: "each monotype at which it is used [5]".  Do you
really use the algorithm from [5]? 

p. 2, col. 2, para. 3: "MLton's structure differs from..."  Yes, but
it is very similar to that of RML (Tolmach & Oliva, "From ML to Ada",
J. Func. Prog.  8(4),367-412, July 1998) which you might appropriately
cite here.  That paper also describes a direct transformation from
local functions to "jump points" (essentially the same as your
continuations) that would do the Right Thing on your page 3 example.

Section 3: Doubtless some people, including Kelsey, would agree that
what you've defined here should be called "CPS", but it makes no sense
to me.  It's bad enough that you corrupt the term "continuation" by
using it for a watered-down "locally scoped" or "static" continuation.
(Hence, the term "contification" grates too -- something like
"localisation" would be better.)  But in any case, these continuations
aren't *passed* anywhere! So you confuse the reader with "traditional
CPS" vs. your "CPS," etc., etc. Enough said.

p. 5. It would help the reader if you explained how A(f) is used to
guide transformation (the first para. of 4.2) *before* giving the detailed
definition of safety in 4.1.  

Section 4: One thing you don't prove formally is that the result of the
transformation obeys scoping rules -- or, more generally, is well typed.
This is not all that obvious --- it depends critically on safety property 
*_4.  In particular, one cannot just take a safe analysis and arbitrarily
change some values to Unknown before doing the transformation -- a point
you might make explicit.

p. 8. Section 5.3. The description of the dominator analysis would benefit
greatly from an example of a G and associated D.
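
Even a tiny example would help.  For instance (purely hypothetical, and
using the textbook iterative fixpoint rather than whatever the paper's
implementation does): suppose root calls main, main tail-calls f and g,
and both f and g tail-call loop.

(* Hypothetical call graph; nodes are named by strings. *)
val nodes = ["root", "main", "f", "g", "loop"]
fun preds "main" = ["root"]
  | preds "f"    = ["main"]
  | preds "g"    = ["main"]
  | preds "loop" = ["f", "g"]
  | preds _      = []

(* Simple set operations on string lists. *)
fun member (x, ys) = List.exists (fn y => y = x) ys
fun inter (xs, ys) = List.filter (fn x => member (x, ys)) xs
fun eq (xs, ys) =
   List.all (fn x => member (x, ys)) xs
   andalso List.all (fn y => member (y, xs)) ys

(* Dom(root) = {root}; Dom(n) = {n} plus the intersection of Dom(p)
   over all predecessors p of n; iterate to a fixpoint. *)
fun lookup dom n =
   case List.find (fn (m, _) => m = n) dom of
      SOME (_, s) => s
    | NONE => nodes
fun step dom =
   map (fn n =>
           (n, if n = "root"
                  then ["root"]
               else n :: foldl (fn (p, s) => inter (lookup dom p, s))
                               nodes (preds n)))
       nodes
fun fix dom =
   let
      val dom' = step dom
   in
      if List.all (fn n => eq (lookup dom n, lookup dom' n)) nodes
         then dom'
      else fix dom'
   end
val doms = fix (map (fn n => (n, nodes)) nodes)

Here Dom(loop) = {loop, main, root}: loop's immediate dominator is main
rather than f or g, so the dominator tree identifies a single enclosing
context for loop even though loop has two distinct callers.  Something
of this flavor, drawn as G and D, would make Section 5.3 much easier to
follow.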

p. 10. Which backend (C or X86 assembler) did you use for these numbers?

p. 10. It would be interesting to know the measured cost of function call
overhead.  


Trivia:

p. 1 Abstract and Intro.  It would be simpler, if less precise, to talk of
turning individual functions, rather than "collections" or "sets" of
functions, into continuations.

p. 1 col. 2, line -12: "Since g always returns to the continuation k'..."
This is pretty confusing; syntactically g *calls* k, after all.

p. 2, para.2, line -2: "to into" --> "into"

p. 5, column 2, program fragment: Here and at similar spots later, the
reader needs reminding that "g ()" is a *non-tail* call, because the
syntax doesn't really make this clear (there might be a surrounding
k()).

p. 6, column 1, para. 2: "*_4" -> "*_3".

p. 10, column 1, para. 1, line -2: "Taran" -> "Tarjan".

p. 11, column 1, para 2, line -1: "functions of an uncontified
program".  Reiterate that you mean "in these examples" not "for all
programs."

============================================================================
REVIEWER #3
============================================================================ 

Numerical Scores:

     appropriateness: 9
     originality: 7
     tech_strength: 9
     presentation: 9
     overall: 9
     confidence: 0.8


Comments:

The paper presents a framework for `contification', splitting the process
into analysis and transformation phases.  Notions of safe and of maximal
contification analysis are defined in terms of an abstract call-graph
representation of a program where an analysis maps functions to abstract
return points (a function, a continuation, or Unknown).
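
In SML-like terms, an analysis is roughly a map into an abstract
return-point type such as (my notation, not the paper's):

datatype ('f, 'c) return = Unknown | ToFun of 'f | ToCont of 'c
type ('f, 'c) analysis = 'f -> ('f, 'c) return

with safety restricting which return points may legitimately be assigned.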

A contification transformation parameterized by an analysis is defined 
and shown to be well-defined for safe analyses by showing that 
safe analyses are `acyclic'.  

Three analyses (Call, Cont, and Dom) are then defined and
shown to be safe.  Call and Cont are incomparable and correspond to
analyses used in existing ML compilers (MLton and Moby).  Dom is maximal
and new.  

Experiments comparing the three analyses on a number of ML benchmarks
are summarized.  

The paper begins with a brief summary of the MLton compiler and its IL.  The
concepts and issues are well illustrated by examples.  The definitions and
results are clearly stated and the proofs are easy to follow.

In short, an excellent paper.  Very well written and (to a non-compiler expert)
a very nice piece of work.  

PS: some minor typos:

p2: "optimzations"
S4: the definitions of Tail and NonTail are interchanged.
Figure 3: the defn of calF contains calF(f_i,e_i)underbar{k}, which is not
  well formed.  I'm guessing it should be calF(f_i,underbar{k}).
In the let clause of calE, I think there is a missing underbar on k.
In a number of places the font for the A (analysis) is roman, not cal.

============================================================================
REVIEWER #4
============================================================================ 

Numerical Scores:

     appropriateness: 9
     originality: 7
     tech_strength: 6
     presentation: 8
     overall: 5
     confidence: 0.7


Comments:

The paper presents a code transformation called "contification" which
looks for functions whose continuation is "known" and rewrites the
code to make this knowledge syntactically obvious.

First, let me say that I find the presentation mostly good, except
for the fact that I just can't stand the pseudo-CPS form used here.
This variant of CPS makes several continuations completely implicit,
making the presentation difficult to follow.

An example of the lack of clarity that derives from this is in
section 4.1, where the authors talk about things like "function f returns
to function g".  Such statements don't make much sense in a CPS setting,
and, as far as I can tell, what is really meant is that f returns to
the continuation of g (i.e., f was tail-called from g).

I understand that such a form of CPS might be convenient in some cases,
but when we're talking about analysing code to detect that some
continuation is constant, I think it makes a lot more sense to make
all the continuation arguments completely explicit.

I believe that making all the continuation arguments explicit
will show that the optimization can be generalized to eliminating
constant arguments, whether continuations or not.

I found it very odd that the paper does not mention anything about
lambda-dropping, although the two transformations share a lot of
characteristics.  At least, it seems that lambda-dropping should
deal with both examples and would even eliminate the v' argument
to `loop' in the second example (after all, this v' is passed around
in exactly the same way as the continuation).
I feel like lambda-dropping might even subsume contification.
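
Roughly (my reconstruction, not the paper's code), lambda-dropping would
rewrite something like

fun loop (v', i, acc) =
   if i >= Vector.length v'
      then acc
   else loop (v', i + 1, acc + Vector.sub (v', i))
fun f v = loop (v, 0, 0)

into

fun f v =
   let
      fun loop (i, acc) =
         if i >= Vector.length v
            then acc
         else loop (i + 1, acc + Vector.sub (v, i))
   in
      loop (0, 0)
   end

so that v is captured from the enclosing scope instead of being passed
on every iteration -- exactly the treatment the continuation gets.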

As for this `loop' example, the paper seems to claim that contification
ends up enabling the loop-invariant code motion of `length', whereas that
is actually enabled by the elimination of v', which is not done by
contification and is already made possible by introducing loop preheaders.
(I assume here that loop preheaders are introduced even if you use
contification, the reason being that loop preheaders will allow code motion
of `length' even if `loop' is called from more than one place.)

The paper could also mention Appel's loop headers (which are really loop
preheaders) whose purpose was mostly to do the kind of optimization
shown in the second example (i.e. elimination of the continuation argument
in tail-recursive loops).
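
Applied to the same hypothetical loop as above, the header transformation
is roughly the following (a sketch of the idea, not Appel's exact
presentation): the loop function gets an inner copy used for the recursive
calls, so invariants can be computed once per entry no matter how many
call sites the loop has.

fun loop (v, i, acc) =              (* header: entered by every caller *)
   let
      val n = Vector.length v       (* invariant, computed once per entry *)
      fun loop' (i, acc) =          (* inner copy used for recursive calls *)
         if i >= n
            then acc
         else loop' (i + 1, acc + Vector.sub (v, i))
   in
      loop' (i, acc)
   end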

============================================================================