flattening

Wed, 12 May 1999 16:08:02 -0700 (PDT)

> We don't want to do that, so we need some  notion  of  why.   One  notion  is
> sharing.  This coming from, for instance, the classic
> 
>         let val x0 = 3
>             val x1 = (x0, x0)
>             val x2 = (x1, x1)
>             ...
...
> So, does any of this revive any thoughts?  I remember us talking about  these
> things, but not any final conclusion.

I remember the idea (of the 1998-8-26 version, src/fol/flatten.fun)
being something like the following.

Label every tuple type (including ones appearing nested within
other tuple types) with a flag, Flat or Tuple.  Write a labeled tuple 
type as t1 *F t2 or t1 *T t2.  For simplicity, I will only deal with
twoples :-).  Define a (reflexive, transitive) "subtype" relation on
labeled types as follows.

t1 <= t1'  t2 <= t2'  X in {T,F}
--------------------------------
    t1 *X t2 <= t1' *F t2'

This relation reflects the idea that it is cheap to coerce from flat
or tupled to flat, but not vice versa.  Also, note that there is no
rule that allows subtyping underneath Tuple'd tuples like the
following.

t1 <= t1'  t2 <= t2'  X in {T,F}
--------------------------------
    t1 *T t2 <= t1' *T t2'

The reason for the absence of such a rule is that to implementing would
imply selecting the components of a tuple, coercing them, and then
*reallocating* the tuple.  This is bad.  The whole idea of the system
is to cut down on allocation.

Now, set up a system of constraints on the whole program, where each
variable x is assigned a labeled type t_x (corresponding to its
actual type) and there are constraints wherever there is data flow in
the program.  For example, if the program contains
  fun f(x, y) = ...
  ... 
  f(a, b) 
  ... 
then there are constraints a <= x and b <= y.  Which constraints to
add is pretty obvious -- see the calls to Rep.coerce: there are
constraints for applications, cases and handles (to join the
branches).  There also needs to be a labeled type for each constructor
so that you can set up equations for constructor application and
selection.  The only other constraints are for types that must be
tuplized.  For example, Fol did not allow multiple value return, so
the type of any variable appearing in a return position had to be a *T
at the outermost level.  The places for these constraints are also
pretty obvious -- see the calls to Rep.tuplize.

Now, solve the constraints for a minimal solution, where *F is
smaller than *T.  There is obviously a solution, where every label is
*T.  I'm not sure if there is a minimum solution.

Now, define a translation || from labeled types to lists of unlabeled
types (see Rep.toTypes).

  |t1 *F t2| = |t1| @ |t2|		(append)
  |t1 *T t2| = [|t1| * |t2|]		(singleton)

Use the constraint solution and || to walk over the program and
convert every type to the translated || version of the solution type.
In order for the translated program to be well typed, you will also
need to insert coercions along data flow paths wherever the constraint 
solution uses <= and not =.  The coercions are fairly
straightforward.

The code pretty much follows this line.  The first pass traverses the
program and sets up the constraints.  The constraints are actually
solved on the fly during this traversal, but underneath the hood,
unbeknownst to the traverser.  The second pass then walks the program
and translates the types, expressions, and introduces coercions.

Much of the complexity comes from nested tuples, some components of
which may be flattened, and keeping all of the indexes straight so
that selects and tuple expressions can be translated.  The complexity
is also due to the fact that the translator does not just insert the
simplest possible coercions and then rely on a shrinker to sort things
out.  An earlier version did that, but encountered a 5x blowup in IL
size, which was death for self compiles.  Nevertheless, if I were to
write a new flattener, I would go back to the relying on the shrinker.
This should be feasible in the Cps world, because the shrinker
operates on a per expression, rather than per program basis.  Thus,
the translator can be arranged to translate and shrink a function at a 
time.  This should prevent undo IL bloat.

At some point during the development of the Fol flattener, I noticed
that if one went to a Cps IL (or allowed multiple value returns and a
few other things) then the need for all of the "tuplize" constraints
went away, and there was always a solution where every tuple was *F.
This was surprising.  After a bit of discussion, we discovered the
case you mentioned above.  So, I added a hack to solve the problem.  I
put in a counter in every labeled tuple that was bumped whenever the
tuple appeared as an argument in a Tuple, Select, Known, Con, or Prim
application expression (see the call to Rep.incCount).  This counter
took on the values Zero, One, Many.  If the count ever reached Many,
then the label was forced to be *T (possibly cascading to force a
bunch of other labels to be *T in order to meet the constraints).