[MLton] implement _address and _symbol

Stephen Weeks MLton@mlton.org
Tue, 19 Jul 2005 08:37:29 -0700


> > It is a (minor, as in relatively easily fixed) deficiency of the 
> > runtime system that there is no way to register ML pointers with the 
> > runtime to be treated as roots and updated at a GC.
> 
> Wouldn't this be a good thing to change?

Yes.

> Being able to guarantee that heap objects remain valid for the
> duration of a C function call seems like an important thing to
> me.

Pinning arbitrary heap objects isn't quite my picture of what we would
do.  I was thinking there would be a new kind of object, a "pinned"
value, that would be guaranteed not to move.  A pinned value points
into the SML heap at a normal heap object, but is not in the heap
itself.  Whenever a GC occurs, the GC updates the pointers held in the
pinned values, but does not move the pinned values themselves.
Probably pinned values would be malloc'd; possibly we would use our
own (very simple) memory manager outside the GC.

Then, the right way to pass heap objects between SML and C would be to
pass a pinned value pointing to the heap object.  MLton.Pinned would
have a signature something like

signature PINNED =
   sig
      type 'a t

      val free: 'a t -> unit
      val get: 'a t -> 'a
      val new: 'a -> 'a t
   end

where type 'a t opaquely expands to MLton.Pointer.t and 'a is required
to be a heap pointer.  Pinned.free is necessary because the GC does
not know who has pointers to pinned objects, and hence does not know
when to free them.
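
To make the idea concrete, here is a rough C model of what the runtime
side might look like.  It is only a sketch: every name in it (Pinned,
pinned_new, pinned_get, pinned_free, toy_gc_relocate, demo) is made up
for illustration and is not part of the MLton runtime, and the "GC" is
a toy that copies one flat buffer to another.  It assumes every pinned
cell points into the space being collected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A pinned cell lives outside the heap (malloc'd), so its own address
   never changes; only the pointer stored inside it is rewritten. */
typedef struct Pinned {
    void *target;         /* current address of the heap object */
    struct Pinned *next;  /* list of live cells, walked at each GC */
} Pinned;

static Pinned *live_pinned = NULL;

/* Models Pinned.new: register a heap pointer with the runtime. */
Pinned *pinned_new(void *heap_obj) {
    Pinned *p = malloc(sizeof *p);
    assert(p != NULL);
    p->target = heap_obj;
    p->next = live_pinned;
    live_pinned = p;
    return p;
}

/* Models Pinned.get: fetch the current address of the heap object. */
void *pinned_get(const Pinned *p) {
    return p->target;
}

/* Models Pinned.free: unlink the cell so the GC stops updating it. */
void pinned_free(Pinned *p) {
    for (Pinned **link = &live_pinned; *link != NULL; link = &(*link)->next) {
        if (*link == p) {
            *link = p->next;
            free(p);
            return;
        }
    }
}

/* Toy moving GC: copy the whole heap from `from` to `to`, then fix up
   each live pinned cell.  Assumes every target lies inside `from`. */
void toy_gc_relocate(char *from, char *to, size_t size) {
    memcpy(to, from, size);
    for (Pinned *p = live_pinned; p != NULL; p = p->next) {
        ptrdiff_t offset = (char *)p->target - from;
        p->target = to + offset;
    }
}

/* Usage: C holds `p` across a GC and still reaches the moved object. */
int demo(void) {
    char space_a[64] = {0}, space_b[64] = {0};
    int value = 42, seen = 0;
    memcpy(space_a + 8, &value, sizeof value);   /* "heap object" */
    Pinned *p = pinned_new(space_a + 8);
    toy_gc_relocate(space_a, space_b, sizeof space_a);
    memset(space_a, 0, sizeof space_a);          /* old space is dead */
    int moved = (pinned_get(p) == (void *)(space_b + 8));
    memcpy(&seen, pinned_get(p), sizeof seen);
    pinned_free(p);
    return moved ? seen : -1;
}
```

In this model the only GC work is the loop over live_pinned, so the
extra work grows with the number of cells that have been created and
not yet freed.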

> OTOH, if there are significant performance penalties...

I'm pretty sure you would only pay if you use it, and then a small
per-currently-live-pinned-object charge at each GC.

> > > Another frightening aspect no one has brought up: what about pointers?
> > > val set : int vector -> unit = _store "x"
> 
> It seems to me that being able to mark "x" as a GC root fixes that case.
> 
> However, _store * (or _symbol *) can't be fixed that easily....
> You would get a build up of roots with no way to clear them.

I think pinned objects solve this problem.  You don't mark the
symbol.  You create a new pinned object.  It will be cleaner both
conceptually and implementation-wise to separate pinning from the FFI
keywords.

> > > (* Fails to compile (cannot export polymorphism): *)
> > > val () = _export "id" (fn x => x)
> > 
> > This points out a problem with the type inference approach. 
> 
> True.
> 
> > MLton will quite happily infer the type "unit -> unit" for "fn x => x", in
> > which case the export may succeed.
> 
> I'm not sure what you're saying here?

Without the required annotation, there is not enough information for
MLton to deduce the type of "x".  In such situations MLton can either
choose a default type (e.g. unit), issue a warning, or report an
error.

> > > I don't mind that a 'define'-ed _symbol is not initialized; this is *C* 
> > >   and that behavior is allowed.
> > 
> > Not requiring initialization seems better to me.
> 
> I'm not talking about initializing it for C, but SML.
> Here the expectation is that the variables are always 'initialized'.
> 
> If the symbol came from C, then it is statically initialized before main().
> If the symbol came from SML, the best we can do is initialize it before
> it can be referenced inside the SML code.

I still don't get the point.  We're defining a symbol so it can be
accessed from C.  Perhaps C wants to initialize it as well.

Hmmm.  I guess we could make the semantics zero it out?
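
For comparison, this is what C itself does with a file-scope symbol
that has no initializer: it has static storage duration and starts out
as zero.  The sketch below uses a hypothetical symbol name standing in
for one that SML code would define; zeroing such a symbol would give
the C side the same starting value it would see for its own
definition.

```c
#include <assert.h>

/* Hypothetical symbol; stands in for one that SML code would define. */
int sml_defined_counter;   /* no initializer: zero-initialized by C */

/* C code that reads the symbol before anyone has stored to it. */
int read_before_any_store(void) {
    return sml_defined_counter;
}

/* C code that later updates it, relying on the zero starting value. */
void bump(void) {
    assert(sml_defined_counter >= 0);
    sml_defined_counter += 1;
}
```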