[MLton-user] Pickling typed programs written in HOAS

Vesa Karvonen vesa.a.j.k at gmail.com
Thu Oct 11 08:16:00 PDT 2007


The desire to send programs (or just closures) over a wire seems to pop up
every now and then.  The problem is, of course, that such a thing is not
supported directly in Standard ML.  Now, I'm not claiming that the
approach described in this post would be the ultimate solution to the
problem, but it might actually serve some people's needs and some other
people might find this otherwise interesting (or even amusing).

Here is what I'll briefly describe in this post.  I'll first describe a
simple toy interpreter for a simple untyped language using a simple
first-order AST.  The significance of using a first-order AST is that it
makes it possible to pickle programs.  Then I'll show how to implement a
simple HOAS [HOAS] "skin" on top of the first-order AST.  The significance
of the HOAS skin is that it gives bindings that are easier to use
correctly as well as static typing.  A program written using the HOAS
combinators can then be "reified" to the first-order representation, which
can be pickled.  So, you can write your program in SML (well, using
combinators), have it type checked, send it over the wire, and even
evaluate it.

[HOAS] http://en.wikipedia.org/wiki/Higher-order_abstract_syntax

I'll make use of a number of libraries from mltonlib.  Here is a SML/NJ
"script" for getting the used library stuff:

val mltonlib = "PATH-TO-MLTONLIB-ROOT" ; (* <-- You'll have to fix this *)

use (mltonlib^"/com/ssh/extended-basis/unstable/public/export/infixes.sml") ;
CM.make (mltonlib^"/com/ssh/extended-basis/unstable/basis.cm") andalso
CM.make (mltonlib^"/com/ssh/generic/unstable/lib.cm") ;
val w = mltonlib^"/com/ssh/generic/unstable/with" ;
use (w^"/generic.sml") ;
use (w^"/type-info.sml") ;
use (w^"/type-hash.sml") ;
use (w^"/hash.sml") ;
use (w^"/eq.sml") ;
use (w^"/some.sml") ;
use (w^"/data-rec-info.sml") ;
use (w^"/pickle.sml") ;
use (w^"/close.sml") ;
use (w^"/extra.sml") ;
open TopLevel ;
open Generic ;

Here is the signature of the simple toy interpreter:

signature TOY_CORE = sig
   structure Id : sig
      eqtype t
      val t : t Rep.t
      val new : t Thunk.t
   end
   structure Lit : sig
      datatype t = BOOL of Bool.t
                 | INT of IntInf.t
      val t : t Rep.t
   end
   structure Val : sig
      datatype t = BOOL of Bool.t
                 | FUN of t UnOp.t
                 | INT of IntInf.t
   end
   structure Prim : sig
      type t = String.t
      val t : t Rep.t
      type f = Val.t List.t -> Val.t
      val add : (t * f) Effect.t
      val find : t -> f
   end
   structure Term : sig
      datatype 't f = APP of 't * 't
                    | ERR of String.t
                    | FUN of Id.t * 't
                    | IF of 't * 't * 't
                    | IND of 't Ref.t
                    | LIT of Lit.t
                    | PAP of Prim.t * 't List.t
                    | VAR of Id.t
      datatype t = IN of t f
      val t : t Rep.t
   end
   structure Env : sig
      type t
      val empty : t
      val add : (Id.t * Val.t) * t -> t
   end
   val eval : Env.t -> Term.t -> Val.t
end

It is fairly straightforward except for one unusual thing, which is the
IND constructor.  We'll use it for fixpoints.  One issue worth pointing
out is that the representation of terms has been carefully made
first-order.  In particular, primitive functions are stored in a primitive
environment.  Another detail worth pointing out is that the signature
specifies type representation constructors (the Rep.t values) for terms.
We'll later use those for pickling terms.

Below is a simple implementation of the toy interpreter:

structure ToyCore :> TOY_CORE = struct
   fun assoc i = #2 o valOf o List.find (fn (i', _) => i' = i)

   structure Id = struct
      type t = Unit.t Ref.t
      val t = refc unit
      val new = ref
   end
   structure Lit = struct
      datatype t = BOOL of Bool.t
                 | INT of IntInf.t
      val t = iso (data (C1' "BOOL" bool +` C1' "INT" largeInt))
                  (fn BOOL ? => INL ? | INT ? => INR ?,
                   fn INL ? => BOOL ? | INR ? => INT ?)
   end
   structure Val = struct
      datatype t = BOOL of Bool.t
                 | FUN of t UnOp.t
                 | INT of IntInf.t
   end
   structure Prim = struct
      type t = String.t
      val t = string
      type f = Val.t List.t -> Val.t
      local
         val prims : (t * f) List.t Ref.t = ref []
      in
         val add  = List.push prims
         fun find x = assoc x (!prims)
      end
   end
   structure Term = struct
      datatype 't f = APP of 't * 't
                    | ERR of String.t
                    | FUN of Id.t * 't
                    | IND of 't Ref.t
                    | IF of 't * 't * 't
                    | LIT of Lit.t
                    | PAP of Prim.t * 't List.t
                    | VAR of Id.t
      fun f t =
          iso (data (C1' "APP" (sq t)
                  +` C1' "ERR" string
                  +` C1' "FUN" (tuple2 (Id.t, t))
                  +` C1' "IND" (refc t)
                  +` C1' "IF" (tuple3 (t, t, t))
                  +` C1' "LIT" Lit.t
                  +` C1' "PAP" (tuple2 (Prim.t, list t))
                  +` C1' "VAR" Id.t))
              (fn APP ? => INL (INL (INL (INL (INL (INL (INL ?))))))
                | ERR ? => INL (INL (INL (INL (INL (INL (INR ?))))))
                | FUN ? => INL (INL (INL (INL (INL (INR ?)))))
                | IND ? => INL (INL (INL (INL (INR ?))))
                | IF  ? => INL (INL (INL (INR ?)))
                | LIT ? => INL (INL (INR ?))
                | PAP ? => INL (INR ?)
                | VAR ? => INR ?,
               fn INL (INL (INL (INL (INL (INL (INL ?)))))) => APP ?
                | INL (INL (INL (INL (INL (INL (INR ?)))))) => ERR ?
                | INL (INL (INL (INL (INL (INR ?))))) => FUN ?
                | INL (INL (INL (INL (INR ?)))) => IND ?
                | INL (INL (INL (INR ?))) => IF ?
                | INL (INL (INR ?)) => LIT ?
                | INL (INR ?) => PAP ?
                | INR ? => VAR ?)
      datatype t = IN of t f
      val t = Tie.fix Y (fn t =>
                            iso (data (C1' "IN" (f t)))
                                (fn IN ? => ?, IN))
   end
   structure Env = struct
      type t = (Id.t * Val.t) List.t
      val empty = []
      val add = op ::
      val find = assoc
   end
   local
      open Term
   in
      fun eval e (IN t) =
          case t
           of APP (f, x)   => (case eval e f
                                of Val.FUN f => f (eval e x)
                                 | _         =>
                                   fail "Not a function")
            | ERR m        => fail m
            | FUN (x, b)   => Val.FUN (fn v => eval (Env.add ((x, v), e)) b)
            | IF (b, c, a) => (case eval e b
                                of Val.BOOL true  => eval e c
                                 | Val.BOOL false => eval e a
                                 | _              => fail "Condition not bool")
            | IND r        => eval e (!r)
            | LIT x        => (case x
                                of Lit.INT i  => Val.INT i
                                 | Lit.BOOL b => Val.BOOL b)
            | PAP (p, a)   => Prim.find p (map (eval e) a)
            | VAR x        => Env.find x e
   end
end

Now, one could write programs as first-order AST terms, but that would be
quite error prone.  One would have to make sure that bindings are used
properly and one wouldn't have the benefit of static typing.  Let's fix
that by implementing a HOAS skin on top of the interpreter.  First the
signature of the HOAS skin:

signature TOY = sig
   type 'a t
   val F : ('a t -> 'b t) -> ('a -> 'b) t
   val \> : ('a -> 'b) t * 'a t -> 'b t
   val I : IntInf.t -> IntInf.t t
   val B : Bool.t -> Bool.t t
   val reify : 'a t -> ToyCore.Term.t
   val iff : Bool.t t * 'a t * 'a t -> 'a t
   val Y : ('a -> 'b) t Tie.t
   val function : String.t
                  -> ('dom -> 'cod)
                  -> (ToyCore.Val.t -> 'dom)
                  -> ('cod -> ToyCore.Val.t)
                  -> 'dom t -> 'cod t
   val binOp : String.t
               -> 'a BinOp.t
               -> ('a, ToyCore.Val.t) Iso.t
               -> 'a t BinOp.t
end

A few words about the signature are perhaps in order.  First of all, the
type constructor t is for HOAS terms.  The type parameter is a phantom
type variable and serves to make HOAS terms statically typed.  The F
combinator is for introducing anonymous functions.  The \> combinator is
the function application operator.  The I and B combinators are used for
introducing constant integers and booleans to programs.  The iff
combinator is for conditionals.  Y is for computing fixpoints.  You can
use it to define mutually recursive functions.  The reify function
converts a HOAS value into a first-order term.  The function and binOp
procedures are for registering primitive operations.

Here is an implementation of the HOAS skin:

structure Toy :> TOY = struct
   open ToyCore
   open Term
   type 'a t = Term.t Thunk.t
   fun reify t = t ()
   fun Y ? = Tie.pure (fn () => let
          val r = ref (IN (ERR "fix"))
          val t = IN (IND r)
       in
          (const t,
           fn t' => let val t' = reify t' in r := t' ; const t end)
       end) ?
   fun F f () =
       let val i = Id.new () in IN (FUN (i, f (const (IN (VAR i))) ())) end
   fun f \> x = const (IN (APP (f (), x ())))
   fun I i = const (IN (LIT (Lit.INT i)))
   fun B b = const (IN (LIT (Lit.BOOL b)))
   fun iff (b, c, a) = const (IN (IF (reify b, reify c, reify a)))
   fun function name f from to =
       (Prim.add (name, fn [x] => to (f (from x)) | _ => fail "Impossible")
      ; fn x => const (IN (PAP (name, [reify x]))))
   fun binOp name binOp (to, from) =
       (Prim.add (name, fn [x, y] => to (binOp (from x, from y))
                         | _      => fail "Impossible")
      ; fn (x, y) => const (IN (PAP (name, [reify x, reify y]))))
end

The implementation isn't actually very complicated.  An interesting detail
is the conversion of binders to first-order terms in the F combinator.
This is done by allocating a new identifier and then instantiating the
body of the HOAS term with the identifier.  Another interesting detail is
the fixpoint witness Y, which makes use of the previously mentioned IND
constructor.  Admittedly this is a somewhat unusual technique, but it is easy
to implement and gives what we want.

Now, let's register a few primitives:

structure ToyPrims = struct
   local
      open ToyCore Toy
      val int = (Val.INT, fn Val.INT i => i)
   in
      val op +   = Toy.binOp "+"   op +   int
      val op -   = Toy.binOp "-"   op -   int
      val op *   = Toy.binOp "*"   op *   int
      val op div = Toy.binOp "div" op div int
      val op mod = Toy.binOp "mod" op mod int
   end
   local
      open ToyCore Toy
   in
      val isZero = Toy.function "isZero" (op = /> 0) (fn Val.INT i =>
i) Val.BOOL
   end
end

And then write a simple program:

local
   open ToyCore Toy ToyPrims
in
   val term = let
      val fact =
          (Tie.fix Y)
             (fn fact =>
                 F (fn n =>
                       iff (isZero n,
                            I 1,
                            n * (fact \> (n - I 1)))))
   in
      fact \> I 4
   end
end

The above term, when evaluated, computes the factorial of 4.

In order to pickle the above term, we'll need to convert it to the
first-order representation:

val firstOrderTerm = Toy.reify term

Let's then pickle it:

val pickledTerm = pickle ToyCore.Term.t firstOrderTerm

Unpickle it:

val unpickledTerm = unpickle ToyCore.Term.t pickledTerm

And evaluate it:

val result = ToyCore.eval ToyCore.Env.empty unpickledTerm

And the result is:

val result = INT 24 : ToyCore.Val.t

Voilà!

Now, you'll probably find the syntax of the above factorial program ugly.
I agree.  However, I think that in some settings it could be improved upon
substantially --- perhaps to an acceptable level.  In particular, this
technique would probably make most sense when you have lots of primitives
and your programs are simple --- rarely using new recursive functions.
Notice that the syntax for calling primitives, like isZero, *, and - is
actually quite acceptable.  Also, instead of I (and B), we could use
symbolic combinators like ` for literals, which might feel less obtrusive.

-Vesa Karvonen


More information about the MLton-user mailing list