[MLton] ref flattening and backpatching

Matthew Fluet fluet@cs.cornell.edu
Fri, 17 Sep 2004 15:46:00 -0400 (EDT)


One not so uncommon use of ref's in SML is to backpatch circular
data-structures.  My understanding of the ref flattening analysis is that
it will not capture this case, because the constructed ref is both put
into a tuple and assigned to directly.  Consider the following program:

structure CList =
   struct
      datatype 'a clist' = Cons of 'a * 'a clist ref
      withtype 'a clist = 'a clist' option

      fun cnil () = NONE
      fun ccons (h, t) = SOME (Cons (h, ref t))

      fun match cl nilCase consCase =
	 case cl of
	    NONE => nilCase ()
	  | SOME (Cons (h, t)) => consCase (h, !t)

      fun fromList l =
	 case l of
	    [] => cnil ()
	  | h::t => ccons (h, fromList t)

      fun repeat x =
	 let
	    val r = ref NONE
	    val cl = SOME (Cons (x, r))
	    val () = r := cl
	 in
	    cl
	 end

      local
	 val max = 1000
	 fun length' (cl, n) =
	    if n >= max
	       then NONE
	       else match cl
		          (fn () => SOME n)
			  (fn (_,t) => length' (t, n + 1))
      in
	 fun length cl = length' (cl, 0)
      end
   end

val cl = CList.repeat #"x"
val n = CList.length cl
val () =
   case n of
      NONE => print "NONE\n"
    | SOME n => print (concat ["SOME ", Int.toString n, "\n"])

val cl = CList.fromList [1,2,3,4,5,6,7,8,9]
val n = CList.length cl
val () =
   case n of
      NONE => print "NONE\n"
    | SOME n => print (concat ["SOME ", Int.toString n, "\n"])



Looking at refFlatten.pre and refFlatten.post, I see the following:

refFlatten.pre:
option_0 = NONE_2 of (unit) | SOME_0 of ((option_0 ref))
option_1 = NONE_0 of (unit) | SOME_1 of ((option_1 ref))

refFlatten.post:
option_0 = NONE_2 of (unit) | SOME_0 of (option_0 ref)
option_1 = NONE_0 of (unit) | SOME_1 of ((option_1 ref))

This makes sense -- monomorphisation duplicated the code for the
char CList.clist and the int CList.clist, but since the list elements are
unused, they are both simplified.  (Interestingly, while one might
consider it advantageous to unify isomorphic types, by keeping them
separate, the refFlatten analysis that works per datatype is more
accurate.)  The int CList.clist doesn't do anything interesting with the
ref component, so no surprise that it is flattened.

It should also be the case that the int CList.clist never mutates the ref
component.  I suspect it's fairly trivial to extend the refFlatten
analysis to also determine when the ref can be flattened as an immutable
field (and, less interesting, the converse, when the ref is only written
and never read, in which case the field is essentially useless).
Although, I'm not sure whether this buys much -- as far as I know, the
only additional cost we pay for a flattened ref is the card mark when it
is mutated, which would be zero in the case I'm describing.

Unfortunately, the char CList.clist is not flattened.  As I understand it,
this arises because of the  repeat  function, which creates the ref, puts
it in the datatype, and also directly updates the ref.  The ssa2 for this
sequence looks like:

    r_0: (option_1 ref) = (global_24)
    x_671: SOME_1 = SOME_1 (r_0)
    cl_0: option_1 = x_671: option_1
    r_0 := cl_0
    length'_1 (global_4, cl_0) NonTail {cont = L_317, handler = Dead}

Note that cl_0 (hence, x_671) is live after the update, so we'd be well
within space safety to update the (flattened) ref component of x_671.

I suspect that it would also be possible to accomodate this in the
refFlatten analysis.  A simple option would be to be a little more liberal
with a ref in the block in which it is created -- i.e., as in the case
above, if the ref were flattened, we could still "get at it" in the
data-structure.