[MLton] leaf inlining

Mon Jun 25 21:08:51 PDT 2007

On Mon, 2007-06-25 at 14:37 -0500, Matthew Fluet wrote:

> In the SSA IL, the program is represented by a (mutually-recursive) 
> collection of first-order (and second-class) functions.  So, no function 
> has child functions.

Ok .. so the display is passed as explicit arguments, each
a closure of the parent function? 

> A leaf function is simply a function with no calls in its body.

Ok.

> > During inlining, many other rewrite rules are applied,
> > for example self-tail-call-to-loop optimisation *must* be done
> > during inlining. See the example above.
> 
> MLton takes the opposite approach.  Each optimization pass does one 
> specific optimization, without trying to integrate multiple 
> optimizations at once.  The inlining optimization just inlines; latter 
> passes take care of contracting the resulting code.
> 
> Yes, combining optimizations in one pass can sometimes result in more 
> optimization opportunities than applying the optimizations in sequence, but
> 1) combining optimizations can often be difficult to do correctly
> 2) it often isn't clear that the combined optimizations will be 
> significantly more effective.

On (b):

It's quite clear to me that this is very slow and potentially
very ineffective. If the number of passes is fixed, then 
some optimisations will be missed for sure, 
depending on depth. If passes are applied until a fixpoint
is reached, you can be analysing the whole program just
to finish optimising one fragment.

For example if you have a call chain depth 8, two functional 
passes will really make a mess inlining and still won't 
complete the job. If the passes are mutable and ordered
on functions in reverse of the call order, all 8 steps
can be done in one pass.

However, as in my example, this will make it impossible
to properly handle the tail-rec-to-loop optimisation.

In Felix this is absolutely mandatory, because for
'almost' any functional code, there cannot be any self-tail
calls. The reason is a non-infinite recursion requires a
conditional .. and conditionals are represented as matches
embedded in a nested child function. If I didn't apply the
tail optimisation algorithm 'in the middle' of inlining
a recursive child it could never be done at all.

Of course Felix uses a different IL representation, and
this might not impact Mlton as badly.

Most optimisations CAN be done in phases .. it's just
extremely inefficient.

However point (a) is correct: it took me a year to
get the Felix inliner to work and it still isn't clear
that it does. The latest bug forced me to abandon
some crucial rescanning and I haven't got around to
replacing that with recursion (although it is definitely
possible).

My study reveals that some optimisations work well
in separate passes functional passes over the set
of functions, some work with separate passes but
must both be ordered over the functions AND modify
them (imperative), and some must be applied recursively
and repeatedly at the fragment level during inlining.

This suggests Mlton might change its optimisation strategy,
by creating a new category of fragment rewriting rules.

You then construct a pass as a higher order function,
passing a fragment rewrite rule function as an argument.
This fragment rewrite rule is a convolution of several
pre-defined rules, created by defining a function which
combines those rules.

This retains the existing structure of multiple passes,
but provides a new way to construct the semantics of
a pass from a combination of lower level primitives.

Here is one combinator: given a list of rewrite rules

	pattern -> replacement

for each term and subterm of an expression, going down
AND up the tree once, apply every rewrite rule in the
list in order. After this is done on expression e
producing expression e', compare e and e'. If they're
equal return e, otherwise repeat using e'.

This combinator has the invariant "no pattern of
any rewrite rule matches any subterm of the result".
That's a very important invariant.

This combinator IS USED in Felix .. for user defined
term rewriting rules. It is of course extremely expensive,
but it, or some variation of it, is the ONLY way to combine 
arbitrary rewrite rules.

User term rewrite rules are applied in a single pass,
and the pass is called twice, once before inlining,
and once afterwards. It isn't done inside the inliner
because it could cause radical changes to the call graph,
which the inliner depends on.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net