[MLton] implement _address and _symbol

Wesley W. Terpstra wesley@terpstra.ca
Mon, 18 Jul 2005 15:16:52 +0200


So, I've started poking at the MLton source to implement this.
While trying to work out how it will all fit together, I hit some snags.

It seems there was consensus on these points:
  FFI that uses MLton.Pointer.t should be pointer-type transparent
  a function is preferred for 'getting' values of a symbol
  _import and _export should not be changed wrt. functions

Ok. However, there seems to be a contradiction here wrt _import *.

'_import *: int -> int;' right now gives MLton.Pointer.t -> int -> int.
Where should the type of the pointer go?
'_import *: MLton.Pointer.t -> int -> int;' ?
That would break compatibility.

Ditto for _symbol *. It seems the right types are:

_symbol "x": int;  ==>                    (unit -> int) * (int -> unit)
_symbol *: int;    ==> MLton.Pointer.t -> (unit -> int) * (int -> unit)
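
To make those two shapes concrete, here is a rough model in plain SML with no
FFI involved. An int ref stands in for the C variable, and the names pointer,
x, symbolAt and symbolX are made up for illustration only; none of this is
MLton syntax or MLton's implementation.

```sml
(* Toy model only: an int ref stands in for the C variable "x". *)
type pointer = int ref                 (* stand-in for MLton.Pointer.t *)
val x : pointer = ref 0

(* Shape of  _symbol *: int;  a pointer goes in, a getter/setter pair comes out. *)
fun symbolAt (p : pointer) : (unit -> int) * (int -> unit) =
   (fn () => !p, fn v => p := v)

(* Shape of  _symbol "x": int;  the same pair with the address already fixed. *)
val symbolX : (unit -> int) * (int -> unit) = symbolAt x

val (get, set) = symbolX
val () = set 42
val () = print (Int.toString (get ()) ^ "\n")   (* prints 42 *)
```

The only difference between the two forms in this model is whether the
address is supplied by the caller or baked in.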

However, where does the pointer get specified?
'_symbol *: MLton.Pointer.t * int;' seems bogus to me: it looks like it
describes the resulting type, but is completely unrelated to it.

In fact, all of the ': ....;' syntax seems bogus to me.
What's the point of specifying all of this?
It's not even the actual type! It's just confusing (to me).

One approach might be to deprecate ': ...;'. Continue to accept it for
_export, _import, and _import *, but issue a warning. Then, using type
inference, determine the pointer type used for _import *.

If I am going for the type-inference route, then the _symbol keyword becomes
less appealing. This is because you have to write a big mess for the actual
type signature if you want to document the val's type (a good habit IMO).

In fact, if you are going to keep the ': ...;' syntax, this documentation
becomes all the more important because it is (non-trivially) related to the
':...;' you have to specify, and is therefore quite confusing to read.

e.g.:
(* What is the type of out? Only hard-core MLton hackers know. *)
val out = _symbol *: MLton.Pointer.t * int;
(* a function?:  MLton.Pointer.t -> (unit -> int) * (int -> unit) *)
(* a hybrid?:    (MLton.Pointer.t -> unit) * (MLton.Pointer.t * int -> unit) *)
(* what I said?: MLton.Pointer.t * int *)
... eek!

So, I think I am swinging back to advocating Matthew's original
_fetch/_store notation... However, the names _fetch/_store might suggest
that the access happens immediately, when in fact they yield functions.

val ptr   : MLton.Pointer.t = _address "x"
val getat : MLton.Pointer.t -> unit -> int = _fetch *
val setat : MLton.Pointer.t -> int -> unit = _store *

val get : unit -> int = _fetch "x"
val set : int -> unit = _store "x"
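
A rough plain-SML model of the "function nature" point, again with an int ref
standing in for C memory. The pointer type and the bodies below are
assumptions for illustration, not how _fetch/_store would be implemented.
Binding get does not read anything; the read happens at each call.

```sml
(* Toy model only: an int ref stands in for C memory. *)
type pointer = int ref                 (* stand-in for MLton.Pointer.t *)
val x : pointer = ref 1                (* stand-in for the C variable "x" *)

(* Models of  _fetch *  and  _store *  with the curried types above. *)
val getat : pointer -> unit -> int = fn p => fn () => !p
val setat : pointer -> int -> unit = fn p => fn v => p := v

(* Models of  _fetch "x"  and  _store "x".  Nothing is read or written yet. *)
val get : unit -> int = getat x
val set : int -> unit = setat x

val first = get ()                     (* the read happens here *)
val () = set 7
val second = get ()                    (* reads again and sees the store *)
val () = print (Int.toString first ^ " " ^ Int.toString second ^ "\n")
(* prints: 1 7 *)
```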

Also, I don't like how the 'define' option of _symbol leaves the symbol in
an undefined state even after it becomes available to SML. That C can see
it prematurely we have to live with, but in SML, that's bad. How about
_symbol instead being used like the _export keyword, but for symbols?

val myptr : MLton.Pointer.t = _symbol ("x", 88:int)

Stephan's _symbol suggestion is perhaps shorter if you need both:
	val (get, set) = _symbol "x"
vs
	val (get, set) = (_fetch "x", _store "x")
However, it's not a lot shorter.

When you (sanely) document the type, the _symbol method is hard to read:
	val (get : unit -> int, set : int -> unit) = _symbol "x"
vs
	val get : unit -> int = _fetch "x"
	val set : int -> unit = _store "x"

Another frightening aspect no one has brought up: what about pointers?
val set : int vector -> unit = _store "x"

This is extremely frightening (to me) since it seems the exported pointer
can never be assumed to contain valid information. For _import this works,
because the GC does not run during the C function call.

And what about 
val get : unit -> int vector = _fetch "x"
Where does the length information come from?

I just compiled foo.sml:
val ex = _export "test": int vector -> unit;
fun out x = print (Int.toString x ^ "\n")
fun app x = Vector.app out x
val () = ex app
... this actually works, yikes.

I can only assume that the programmer is required to pass only SML arrays
back to SML functions, never arrays coming from C. After the C call
which set the symbol, on return to SML the GC might run. Thus, _fetch
doesn't make sense either.

So, _fetch/_store of heap types should fail to compile, right?
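
As a sketch of what such a compile-time rejection could look like, here is a
toy check in plain SML. The datatype ty, its constructors, isHeap and
checkSymbolType are all hypothetical names invented for this example; MLton's
real type representation and elaborator look nothing like this.

```sml
(* Toy type language; NOT MLton's internal representation. *)
datatype ty =
   TInt | TReal | TPointer
 | TRef of ty | TArray of ty | TVector of ty

(* Heap-allocated values can be moved by the GC, so C cannot hold on to them. *)
fun isHeap (TRef _) = true
  | isHeap (TArray _) = true
  | isHeap (TVector _) = true
  | isHeap _ = false

(* Hypothetical check on the value type of a _fetch or _store:
   NONE means the type is accepted, SOME msg is the compile error. *)
fun checkSymbolType (t : ty) : string option =
   if isHeap t
      then SOME "cannot fetch/store heap objects"
   else NONE

fun report t =
   case checkSymbolType t of
      NONE => "ok"
    | SOME msg => "error: " ^ msg

val () = print (report TInt ^ "\n")
val () = print (report (TVector TInt) ^ "\n")
```

With this model, int and MLton.Pointer.t pass, while int ref and int vector
are rejected.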

Basically, right now I have the following in mind:

(* The preferred _import style: *)
val somefn    :                    int -> int = _import "somefn"
val somefnptr : MLton.Pointer.t -> int -> int = _import *

(* These generate deprecation warnings (with suggested change): *)
val somefn  : int -> int = _import "somefn": int -> int;
val someval : int = _import "someval": int;

val somefnptr  : MLton.Pointer.t -> int -> int = _import *: int -> int;
val somevalptr : MLton.Pointer.t -> int = _import *: int;

(* This fails to compile (can only import functions): *)
val someval : int = _import "someval"

(* The preferred _export style: *)
fun add (x, y) = x + y
val setAdder : (int * int -> int) -> unit = _export "addr"
val () = _export "addr2" (fn (x, y) => x + y)

(* These generate deprecation warnings (with suggested change): *)
val setAddr : (int * int -> int) -> unit = _export "addr": int * int -> int;
val () = _export "addr2": int * int -> int; (fn (x, y) => x + y)

(* Fails to compile (cannot export polymorphism): *)
val () = _export "id" (fn x => x)

(* These work: *)
val ptr   : MLton.Pointer.t = _address "x"
val myptr : MLton.Pointer.t = _symbol ("y", 88:int)

val getat : MLton.Pointer.t -> unit -> int = _fetch *
val setat : MLton.Pointer.t -> int -> unit = _store *
val get   : unit -> int = _fetch "x"
val set   : int -> unit = _store "x"

(* These fail to compile (cannot export heap objects): *)
val get : unit -> int ref = _fetch "z"
val set : int ref -> unit = _store "z"
val get : unit -> int vector = _fetch "z"
val set : int vector -> unit = _store "z"

Comments?

-- 
Wesley W. Terpstra