[MLton] better support for C pointers

Stephen Weeks MLton@mlton.org
Fri, 28 Nov 2003 15:52:55 -0800


I've thought some about better support for C pointers in MLton.  Here
are some of the things we'd like to be able to do.

1. Phantom typed pointers that can be used in _import.

      type 'a t

      _import "f": int t -> unit t;

2. Pointer arithmetic.

      val add: 'a t * word -> 'a t
      val diff: 'a t * 'a t -> word
      val isNull: 'a t -> bool
      val null: 'a t
      val sub: 'a t * word -> 'a t

3. Memory fetch and store for various primitive types.

      val getInt: 'a t -> int
      val getWord: 'a t -> word
      ...
      val setInt: 'a t * int -> unit
      val setWord: 'a t * word -> unit
      ...

Broadly speaking, I see two ways that we could go to implement all
this.

A. Keep pointer as a primitive type, and implement all operations via
   _prim or _ffi.

B. Implement pointers in terms of words, and rely on :> to hide stuff.

If we go with (A), then to implement (1) we need to change the
primitive pointer type in the compiler to add the phantom type
argument, and change some early pass to drop the argument.  So, it's
not too hard, and won't affect the optimizer, codegen, or backends.
To implement (2), we will have to add primitives or use _ffi to
implement pointer arithmetic.  This is a bit annoying, since the _ffi
is slow, and adding prims requires compiler mods all the way back to
where the prims are implemented.  This could be mitigated somewhat by
translating the pointer prims into word prims at some early pass.  

If we go with (B), then to implement (1) we need to change :> so that
it propagates the "ffi aspect" of types through :>, i.e., so that the
following program now works.

	structure S:> sig type t end = struct type t = word end
	val _ = _import "foo": S.t -> S.t;

From a language perspective, this doesn't seem so bad to me, and adds
what Vesa originally wanted.  Compared to what we currently have,
where :> hides the ffi aspect of types, it seems more powerful, since
one can always keep something from being usable by the FFI by adding a
datatype wrapper.  From an implementation perspective, this is a
pretty easy change to the type checker to tell it to look inside
opaque types when elaborating _{ex,im}port expressions.
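
To make the datatype-wrapper point concrete, here is a minimal sketch.
The structure W and the names fromWord and toWord are hypothetical,
chosen only for illustration.  Under the proposed meaning of :>, S.t
would be usable in _import because the type checker would look through
to word, while W.t would not, because what it finds is a datatype
rather than a primitive FFI type.

```sml
(* With the proposed change, the type checker would see that S.t is
   really word and accept it in _import. *)
structure S:> sig type t end = struct type t = word end

(* Hypothetical wrapper: the datatype creates a fresh type, so looking
   inside W.t finds a datatype, not word, and the FFI cannot use it. *)
structure W:>
   sig
      type t
      val fromWord: word -> t
      val toWord: t -> word
   end =
   struct
      datatype t = T of word
      fun fromWord w = T w
      fun toWord (T w) = w
   end
```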

If we go with (B), then implementing (2) is easy, since these are just
word ops.
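
As a rough sketch of what (B) might look like for the operations in
(2), assuming the default word type is as wide as a C pointer and that
offsets are raw word offsets with no scaling by element size.  The
signature name POINTER and the structure name Pointer are placeholders,
not existing MLton names.

```sml
signature POINTER =
   sig
      type 'a t
      val add: 'a t * word -> 'a t
      val diff: 'a t * 'a t -> word
      val isNull: 'a t -> bool
      val null: 'a t
      val sub: 'a t * word -> 'a t
   end

(* Assumption: a pointer fits in a word.  The phantom type argument is
   ignored by the representation and hidden by :>. *)
structure Pointer:> POINTER =
   struct
      type 'a t = word
      val null: word = 0w0
      fun isNull (p: word): bool = p = null
      (* Word arithmetic wraps around and never raises Overflow. *)
      fun add (p: word, n: word): word = p + n
      fun sub (p: word, n: word): word = p - n
      fun diff (p: word, q: word): word = p - q
   end
```

For example, Pointer.diff (Pointer.add (p, n), p) gives back n for any
p and n, since the arithmetic is modular.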

To implement (3) for both (A) and (B), we need to add prims that are
implemented by the backend.

The real problem with (B) as I see it is that it exposes the knowledge
of the size of pointers in the basis library code.  I don't see any
way around this.  This will certainly create complexities when we move
to a platform with 64 bit pointers.  It will require some kind of
conditional code in the basis library.  What I don't know is if we
will need conditional code anyways.

Because of the potential future problems with (B), I lean slightly
towards (A).  Hopefully the set of _prims we will need won't be too
large and I will be able to eliminate them all in the front end.
Also, even with (A), I could still change the meaning of :> w.r.t.
the FFI.

I am interested to hear other people's thoughts.  Vesa, is there any
functionality that you needed that isn't here?  Could you show us the
code you already have?

There are also a couple of minor questions regarding pointer
arithmetic.  Should add, diff, sub check for overflow?  I lean towards
answering "no", but don't have good arguments.  Should add, diff, and
sub deal with ints instead of words?  Again, I lean towards no, this
time more strongly because using ints would cut down on the range of
representable values.
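
A small sketch of the range issue, using only the Basis Library.  It
does not assume any particular int or word size; whether the largest
word fits in an int depends on the implementation, so the result is
computed rather than stated.

```sml
(* The largest word value: all bits set.  Word subtraction wraps
   modulo 2^Word.wordSize, so this does not raise. *)
val maxWord: word = 0w0 - 0w1

(* Word.toInt raises Overflow when the value is not representable as
   an int, which is the case when int and word have the same width. *)
val fitsInInt: bool =
   (ignore (Word.toInt maxWord); true) handle Overflow => false
```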