[MLton-devel] supporting weak pointers

Stephen Weeks MLton@mlton.org
Thu, 13 Mar 2003 15:05:50 -0800


The mGTK guys requested support for finalized values so that they can
do C-side memory management.  To my understanding, finalized values
consist of two things: a way to detect when a value becomes
unreachable (e.g. weak pointers), and a "post-gc-hook" that runs the
finalizers on the unreachable values.  I think that mGTK should work
just fine with support for only unreachability detection, without any
need for a post-gc-hook.  They can build their own finalization out of
weak pointers and do the memory management, e.g., once per callback.

So, I propose to add support for weak pointers to MLton.  Here is the
signature I propose to implement.

signature MLTON_WEAK =
   sig
      type 'a t

      val get: 'a t -> 'a option
      val new: 'a -> 'a t
   end

So, new creates a weak pointer to an object and get returns SOME of
the pointer value, unless it has become unreachable in which case get
returns NONE.

Here is a signature for what I need the compiler and runtime to
provide in order to implement MLTON_WEAK.

signature WEAK_PRIM =
   sig
      type 'a weak

      val canGet: 'a weak -> bool
      val get: 'a weak -> 'a
      val new: 'a -> 'a weak
   end

So, I need a new built in unary type constructor, weak, and primitives:
	_prim "MLton_Weak_canGet": 'a weak -> bool;
	_prim "MLton_Weak_get": 'a weak -> 'a;
	_prim "MLton_Weak_make" : 'a -> 'a weak;

With these, the implementation of MLTON_WEAK is simple.

functor Weak (S: WEAK_PRIM): MLTON_WEAK =
   struct
      type 'a t = 'a S.weak

      val new = S.new

      fun get (x: 'a t): 'a option =
	 if S.canGet x
	    then SOME (S.get x)
	 else NONE
   end

Here is how I propose to implement the primitives in the runtime.

A 'a weak is represented as a normal GC object with a header and two
words.  The first word is either a pointer to the 'a or is zero,
depending on whether the 'a is reachable or not.  The second word is a
link pointer to another weak or is zero.  Although both words in the
'a weak are pointers, from the perspective of the object header and
most of the GC, the weak object is treated as a pair of words.  Hence,
for example, forwarding the weak object does not cause either of the
pointers to be followed.

Each generation in the gc keeps a singly-linked list of all of the
weaks that are in that generation.  The links are through the second
words of the weak objects themselves.

Depending on the gc strategy, copying or mark-compact, the weak lists
are processed differently.

For a copying gc (major or minor), after the forward phase, walk the
weak list, and for each weak object do the following.  If the weak
object has not been forwarded, it is unreachable and useless, so do
nothing.  If the weak object has been forwarded, link the forwarded
object into the weak list of the to-generation and check the 'a
pointer in the forwarded object.  If the 'a points to an object that
has been forwarded, update the pointer.  If not, zero it out. 

For our Jonkers mark-compact gc, the pass that threads pointers
(updateForwardPointers) must treat each weak object that it sees
specially.  If the 'a pointer points to an unmarked object, it must
zero it out.  Otherwise, it must thread the 'a pointer just as it
would a strong pointer.  As to the link, it must follow links,
skipping unmarked weak objects, until it reaches the next marked weak
object and then thread the link pointer to that next marked weak
object.  With all this done, the next pass,
updateBackwardPointersAndSlide, should do the right thing with all the
weak objects.

With all this in place:

  MLton_Weak_canGet simply returns TRUE iff the first word of the weak
	object is nonzero.

  MLton_Weak_get returns the first word of the weak object.

  MLton_Weak_make x allocates a weak object, sets the first word to
	to x, and links it in to the weak list for the young generation.

The only other issue that I can think of is what to do when the 'a in
a 'a weak isn't a pointer.  The compiler will know this in the
SsaToRssa pass.  It seems to me there are two choices.  We can either
pretend that the 'a is always reachable or always unreachable.  The
latter is slightly easier, since we can represent the 'a weak as unit
and implement canGet as fn _ => false.  Threating the 'a as always
reachable is pretty easy too.  Just implement 'a weak as 'a, canGet as
fn _ => true, and get as fn x => x, again all within the SsaToRssa
pass.  If anyone has an argument for going one way or the other, I'd
be interested to hear it.

Well, that's about all I can think of.  It seems like a reasonably
efficient implementation without too much disruption of the runtime
or the compiler.  Comments appreciated.


-------------------------------------------------------
This SF.net email is sponsored by:Crypto Challenge is now open! 
Get cracking and register here for some mind boggling fun and 
the chance of winning an Apple iPod:
http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0031en
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel