[MLton-devel] Minamide's POPL '98 paper

Thu, 17 Apr 2003 18:37:26 -0700

I was having a discussion with Brian Rogoff about implementation
strategies for List.@ in ML.  There is the usual nontail recursive
implementation, the double-reversing tail recursive implementation,
and the C-style imperative version (also known as destination-passing
style or tail recursion modulo cons) that pre-allocates the cons cell
before the recursive call and mutates the cdr later.  Anyways, he
pointed me to Minamide's paper "A Functional Representation of Data
Structures with a Hole"
	http://www.score.is.tsukuba.ac.jp/~minamide/hole.popl98.ps
which shows how to achieve the effect of the imperative version in a
functional language while hiding the mutation.  I am thinking about
what kind of support we might add for this style of programming in
MLton -- should we do an optimization or add a user-level library?

This email describes my attempts to work through Minamide's paper and
to begin to see if there is a way to do a user-lever library.  It also
raises a question about defunctionalization.

Minamide's paper introduces a new type that represents a data
structure with a hole.  Here's a signature capturing the idea.

------------------------------------------------------------
signature HOLE =
   sig
      (* 'a is the type of the hole.
       * 'b is the type of the resulting data structure.
       *)
      type ('a, 'b) t

      val compose: ('a, 'b) t * ('c, 'a) t -> ('c, 'b) t
      val fill: ('a, 'b) t * 'a -> 'b
      val new: ('a -> 'b) -> ('a, 'b) t
   end
------------------------------------------------------------

The idea is that one can fill in the hole in a data structure with a
value to get a complete value, or that one can fill in the whole (via
compose) with another data structure with a hole to get yet a third
data structure with a whole.

The semantics is simple.

------------------------------------------------------------
functor Hole (): HOLE =
   struct
      type ('a, 'b) t = 'a -> 'b

      fun compose (h, h') = h o h'
      fun fill (h, a) = h a
      fun new f = f
   end
------------------------------------------------------------

The goal is to get a much more efficient implementation, so he
represents a hole as a pair of a pointer to the data structure and a
pointer to the hole.  Then "fill" or "compose" is just a couple of
pointer assignments.  He places restrictions on the function passed to
new and uses a linear type system to ensure that holes aren't filled
more than once.

He also presents a compilation strategy from a high-level language
with cons cells and holes to a low level language with only a few
primitives for constructing holes as well as compose.  Here's a
signature for the low level IL.

------------------------------------------------------------
signature HOLE_PRIM =
   sig
      (* 'a is the type of the hole.
       * 'b is the type of the resulting data structure.
       *)
      type ('a, 'b) t

      val compose: ('a, 'b) t * ('c, 'a) t -> ('c, 'b) t
      val fill: ('a, 'b) t * 'a -> 'b
      val head: 'a list -> ('a, 'a list) t
      val id: unit -> ('a, 'a) t
      val tail: 'a -> ('a list, 'a list) t
   end
------------------------------------------------------------

Again, the semantics is simple.

------------------------------------------------------------
functor HolePrim1 (): HOLE_PRIM =
   struct
      datatype ('a, 'b) t = T of 'a -> 'b

      val fill: ('a, 'b) t * 'a -> 'b =
	 fn (T h, a) => h a

      val compose: ('a, 'b) t * ('c, 'a) t -> ('c, 'b) t =
	 fn (h, h') =>
	 T (fn x => fill (h, fill (h', x)))

      fun head l = T (fn x => x :: l)
      fun id () = T (fn x => x)
      fun tail x = T (fn l => x :: l)
   end
------------------------------------------------------------

To be closer to his two-pointer implementation, here is a more
reasonable implementation of HOLE_PRIM.

------------------------------------------------------------
functor HolePrim2 (): HOLE_PRIM =
   struct
      datatype ('a, 'b) t = T of {hole: (unit -> 'a) option ref,
				  res: unit -> 'b}

      fun fill (T {hole, res}, a) =
	 (hole := SOME (fn () => a)
	  ; res ())

      fun compose (T {hole = h, res = r}, T {hole = h', res = r'}) =
	 (h := SOME r'
	  ; T {hole = h', res = r})

      fun get hole = valOf (!hole) ()

      fun head l =
	 let
	    val hole = ref NONE
	    fun res () = get hole :: l
	 in
	    T {hole = hole,
	       res = res}
	 end

      fun id () =
	 let
	    val hole = ref NONE
	    fun res () = get hole
	 in
	    T {hole = hole,
	       res = res}
	 end

      fun tail x =
	 let
	    val hole = ref NONE
	    fun res () = x :: get hole
	 in
	    T {hole = hole,
	       res = res}
	 end
   end
------------------------------------------------------------

At this point, I decided that I would gain further understanding by
defunctionalizing HolePrim2.  After many failed attempts (try it
yourself if you want), I finally made some progress.  I was a bit
confused why I had so much trouble defunctionalizing, since after all
that's how MLton compiles, so it must be possible.  The problem came
from trying to defunctionalize a polymorphic program, while MLton only
defunctionalizes monomorphic ones.  I had difficulty with type
variables that had been hidden in closure existential types being
exposed.  I also had trouble with the (manual) flow analysis confusing
types.  In the end, the solution that I came up with is this.  I
analyzed HOLE_PRIM and found that there were only two possible kinds
of type ('a, 'b) t.  I split those into two types in the following
modification of HOLE_PRIM.

------------------------------------------------------------
signature HOLE_PRIM2 =
   sig
      (* 'a is the type of the hole.
       * 'b is the type of the resulting data structure.
       *)
      type 'a u (* = ('a, 'a list) t *)
      type 'a v (* = ('a list, 'a list) t *)

      val compose1: 'a v * 'a u -> 'a u
      val compose2: 'a v * 'a v -> 'a v
      val fill1: 'a u * 'a -> 'a list
      val fill2: 'a v * 'a list -> 'a list
      val head: 'a list -> 'a u
      val tail: 'a -> 'a v
   end
------------------------------------------------------------

Here is the modification of HolePrim2 to meet this signature.

------------------------------------------------------------
functor HolePrim3 (): HOLE_PRIM2 =
   struct
      datatype 'a u = U of {hole: (unit -> 'a) option ref,
			    res: unit -> 'a list}
      datatype 'a v = V of {hole: (unit -> 'a list) option ref,
			    res: unit -> 'a list}

      fun fill1 (U {hole, res}, x) =
	 (hole := SOME (fn () => x)
	  ; res ())

      fun fill2 (V {hole, res}, l) =
	 (hole := SOME (fn () => l)
	  ; res ())

      fun compose1 (V {hole = h, res = r}, U {hole = h', res = r'}) =
	 (h := SOME r'
	  ; U {hole = h',
	       res = r})

      fun compose2 (V {hole = h, res = r}, V {hole = h', res = r'}) =
	 (h := SOME r'
	  ; V {hole = h',
	       res = r})

      fun get r = valOf (!r) ()

      fun head l =
	 let
	    val hole = ref NONE
	    fun res () = get hole :: l
	 in
	    U {hole = hole,
	       res = res}
	 end

      fun tail x =
	 let
	    val hole = ref NONE
	    fun res () = x :: get hole
	 in
	    V {hole = hole,
	       res = res}
	 end
   end
------------------------------------------------------------

Now, a flow analysis of HolePrim3 doesn't confuse any types.  Here is
the defunctionalized version after a little cleanup.

------------------------------------------------------------
functor HolePrim4 (): HOLE_PRIM2 =
   struct
      structure Thunk =
	 struct
	    datatype 'a t = (* unit -> 'a list *)
	       Head of 'a option ref * 'a list
	     | Tail of 'a * 'a t option ref
	     | Value of 'a list

	    fun thaw (th: 'a t) =
	       case th of
		  Head (r, l) => valOf (!r) :: l
		| Tail (x, r) => x :: thaw (valOf (!r))
		| Value l => l
	 end

      datatype 'a u = U of {hole: 'a option ref,
			    res: 'a Thunk.t}
      datatype 'a v = V of {hole: 'a Thunk.t option ref,
			    res: 'a Thunk.t}

      fun fill1 (U {hole, res}, x) =
	 (hole := SOME x
	  ; Thunk.thaw res)

      fun fill2 (V {hole, res}, l) =
	 (hole := SOME (Thunk.Value l)
	  ; Thunk.thaw res)

      fun compose1 (V {hole = h, res = r}, U {hole = h', res = r'}) =
	 (h := SOME r'
	  ; U {hole = h',
	       res = r})

      fun compose2 (V {hole = h, res = r}, V {hole = h', res = r'}) =
	 (h := SOME r'
	  ; V {hole = h',
	       res = r})

      fun get r = valOf (!r) ()

      fun head l =
	 let
	    val hole = ref NONE
	 in
	    U {hole = hole,
	       res = Thunk.Head (hole, l)}
	 end

      fun tail x =
	 let
	    val hole = ref NONE
	 in
	    V {hole = hole,
	       res = Thunk.Tail (x, hole)}
	 end
   end
------------------------------------------------------------

This is pretty much the data structure that you'd expect for "lists
with a single hole".  Now that the flow analysis is done, we can
combine back the u and v types in the implementation.

------------------------------------------------------------
functor HolePrim6 (): HOLE_PRIM2 =
   struct
      structure Thunk =
	 struct
	    datatype 'a t = (* unit -> 'a list *)
	       Head of 'a option ref * 'a list
	     | Tail of 'a * 'a t option ref
	     | Value of 'a list

	    fun thaw (th: 'a t) =
	       case th of
		  Head (r, l) => valOf (!r) :: l
		| Tail (x, r) => x :: thaw (valOf (!r))
		| Value l => l
	 end

      datatype ('a, 'b) t = T of {hole: 'a option ref,
				  res: 'b Thunk.t}
      type 'a u = ('a, 'a) t
      type 'a v = ('a Thunk.t, 'a) t

      fun fill1 (T {hole, res}, x) =
	 (hole := SOME x
	  ; Thunk.thaw res)

      fun fill2 (T {hole, res}, l) =
	 (hole := SOME (Thunk.Value l)
	  ; Thunk.thaw res)

      fun compose (T {hole = h, res = r}, T {hole = h', res = r'}) =
	 (h := SOME r'
	  ; T {hole = h',
	       res = r})
      val compose1 = compose
      val compose2 = compose

      fun get r = valOf (!r) ()

      fun head l =
	 let
	    val hole = ref NONE
	 in
	    T {hole = hole,
	       res = Thunk.Head (hole, l)}
	 end

      fun tail x =
	 let
	    val hole = ref NONE
	 in
	    T {hole = hole,
	       res = Thunk.Tail (x, hole)}
	 end
   end
------------------------------------------------------------

Well, that's all I have for now.  The most interesting question to me
was if the defunctionalization that I did could be automated --
something like doing as little duplication of types and splitting of
code as possible so that the flow analysis is type respecting and the
resulting defunctionalization is type correct.

As to how to support this style in MLton, I'm leaning towards a pure
compiler optimization hidden from users, but I'm still thinking about
it.

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel