Scope rules for CPS and other questions...

Stephen Weeks sweeks@wasabi.epr.com
Wed, 1 Dec 1999 18:10:42 -0800 (PST)


> In particular CPS seems to have
> nested functions, are these lexically scoped i.e. can a nested function
> contain refrences to variables in the enclosing scope? 

Yes.  Furthermore the scope rules ensure that local functions (Jump)
can only be referenced within the scope of their definition.  All
toplevel functions (Func) are mutually recursive.  The precise scope
rules are implemented in about 100 lines of code in the "checkScopes"
function in cps/type-check.fun.

> Is there a phase that lifts everything out?

No.  backend/backend.fun translates directly to the language in
backend/machine.sig.

> Another
> question I have is that I'll need to really turn CPS into  honest to
> goodness CPS (i.e. not the current ANF version).

Why?

> I'll be doing
> this externally and feeding the results back to the MLton to use MLton
> as a backend.

Notice that Cps has *no* unknown procedures.  That is all Call and
Jump Transfer's explicitly specify their destination.

> Is this going to produce decent C code? i.e. do you guys
> really handle tail calls right? 

Yes.  A tail call is represented as a Call Transfer (see cps-tree.sig)
where the cont = NONE.  In backend.fun, this is translated into a jump
(after setting up the arguments), and will not push a stack frame.

> > The paper simplifies things a little bit from the current implementation,
> > but I don't think there's any significant deviation.  CPS local functions
> > have limited scope (namely the enclosing top-level function), but because
> > they are continuations they don't escape fro the function in which they
> > are defined.  There's no explicit lambda lifting phase.  MLton handles
> > tail-calls properly.

I agree with all this.  I think maybe that the confusion arose partly
because in cps-tree.sig local functions are declared by "Fun" and in
the paper they are declared by "Jump" (IIRC).

> e.g.
> 
>  fun f (x) =
>   let
>     fun g(y)  = (x,y)
>   in g(1) 
>   end	
> 
> should I interpert the above equivalent to
> 
>  fun f (x) = g(x,1)
>  and g (x,y)  = (x,y)
>  
> or the *bogus* declaration
> 
>  fun f(x) = g(x)
>  and g(y) = (x,y) 

I don't think either of these is the right interpretation.  The right
interpretation is to think of Cps as a handy notation for SSA, where
local functions just define labels on basic blocks that one can jump
to (from within the current stack frame) and pass args.  Hence local
functions can refer to any register or stack slot for the currently
executing function.  It is fairly straightforward to turn a toplevel
Cps function into a control flow graph, where there is one basic block
per local Fun dec.  Then, the scoping rules correspond exactly to the
constraint that the definition of a variable must dominate all of its 
uses.

> Also how are the jumps implemented in the C code produced by gcc?

Both Jump and Call transfers are implemented by C goto's, possibly
with some munging of the MLton stack (for nontail calls) and possibly
with an intervening trampoline (if the destination is in a different C 
procedure).

> There seems to be some magic involved otherwise you'd end up with one very
> big C procedure that held the entire program, which would make gcc run dog
> slow. (Not that I care about gcc's compile time or anything..)

The magic is in backend/chunkify.fun, which takes the Cps program and
partitions all of the Funcs and Jumps into disjoint sets, each of
which corresponds to a C procedure.  There is an internal control (not 
exposed as a command line switch) called Control.chunk which specifies 
how aggressive the chukifier should be.

> I just want to
> know if I'll be able to compile big programs if I CPS convert everything in
> a resonable time.

I don't understand what it means to CPS convert the Cps language.  Can 
you tell me how the result wouldn't be isomorphic to Cps?