[MLton-user] 'a ref for C programmers: a primer

Tom 7 twm at andrew.cmu.edu
Mon Feb 12 10:24:09 PST 2007


>From my experience teaching ML to C programmers it's a bad idea to use 
words like "pointer"; it's true that at run time many things are 
represented with pointers, but the notion of a C pointer doesn't have a 
direct analogue in ML. (For instance, datatypes are used to achieve many 
things that pointers would be used for in C.)

Instead I'd concentrate on the concept of a value. Variables are always 
bound to values. Some values are immutable and can never change, like 
integers and lists of immutable values. Some values can change, like 
arrays or open files. The same value may be bound to multiple variables, 
or appear in other aggregate values (lists, tuples, arrays). If a mutable 
value is changed and that same value is accessible in multiple ways, 
they'll all see the change because they're looking at the same value. 
Given that, I think it is an accurate analogy to think of refs as 
1-element arrays. (This is exactly how they're implemented in TILT, in 
fact.)

Of course, if you're interested in the run-time memory layout of your 
program, then you need to know about the many ways that MLton might choose 
to compile your program. It's true that this is complicated, so I think 
it's a good reason to not involve it in the understanding of the meaning 
of 'a ref and variable bindings, but only as a pragmatic issue of writing 
efficient programs when it is necessary to tune them at that level.

  Tom

> As a former C programmer, I've always been confused by 'a ref in SML.
> I wanted to state how I've come to think about them for anyone else who is 
> confused.
>
> Please let me know if there's a mistake anywhere!
>
> Normal variable bindings are by reference, or pointers to the value (in 
> principle).
> For example,
>> val x = SomeVector
>> val y = x
> will share space, both point to the same object in the heap. This applies (in 
> principle) to every object, even integers. Every object lives on the heap. 
> Bindings are just pointers to these objects.  Whenever you access a value by 
> a name, you are implicitly dereferencing the pointer to the object pointed 
> to.
>
> However, MLton is clever enough to sometimes recognize that it's cheaper to 
> just copy a variable, rather than store it on the heap. This is subject to 
> some safety criteria that ensure the program cannot distinguish the behaviour 
> from the 'every binding is a pointer' interpretation above. Another criteria 
> is that the object is not 'too big', which would waste memory.
>
> With an 'a vector, MLton may choose to pack the object directly into the 
> vector. For example, int vector will not include binding pointers for each 
> cell. If the contained object is too big, or too complicated, MLton may still 
> use a pointer in the vector if it chooses.
>
> When you write
>> val x = ref 0
>> val y = x
> this creates a special mutable pointer on the heap. The binding 'x' points to 
> the mutable pointer. The mutable pointer points to the integer 0. Thus, when 
> you copy x to y, both binding x and binding y point to the same mutable 
> pointer. When updating a ref, you first follow the binding's pointer to the 
> mutable pointer, then modify the mutable pointer to point at something else. 
> This is why 'y := 5' updates the value of '!x'.
>
> Again, MLton is clever about these pointers and can optimize them away 
> subject to safety criteria. In the above example, it could recognize that the 
> integer 0 could simply be stored where the mutable pointer should be. This is 
> analogous to how it can decide to eliminate the binding pointer for non-ref 
> values. However, in this case it is the mutable pointer saved, not the 
> binding pointer. If the ref'd object is too complicated, it may not be able 
> to apply this optimization.
>
> MLton may also be able to save the binding pointer. However, this savings 
> will never place the mutable pointer on the stack, as a reference from the 
> heap to the mutable pointer could outlive the current stack frame. So, when 
> you have a variable like 'x' above, you are pretty much guaranteed that there 
> is at least one level of indirection going on.
>
> When an 'a ref is contained in a larger structure, MLton may be clever enough 
> to embed the mutable pointer directly into the structure. It can only do this 
> if the structure lives on the heap. For example, if you have a datatype or 
> tuple that includes a mutable pointer, MLton might choose to put the mutable 
> pointer directly inside the object. MLton never copies an object with an 
> embedded mutable pointer, as this would be violate the 'every binding is a 
> pointer' interpretation. Unfortunately, if you ever need to make another 
> binding to these embedded refs, this optimization cannot be applied due to 
> MLton's GC (no pointers to the inside of an object).
>
> 'a array is similar, except that MLton can safely embed all the references in 
> the containing vector. This is safe, because there do not exist functions to 
> make an 'a ref from the cell of an array. This is not the case with 'a ref 
> vector. Here, the user might copy a ref binding out of the vector, which 
> forbids MLton from embedding the reference into the vector, due to the GC.
>
> In fact, MLton is so clever that it can often apply both optimizations at the 
> same time. For a int array, MLton knows it can use the array cells (the 
> mutable pointers) to just store the int in place of the mutable pointer. 
> Thus, int array works as you would like it to with the FFI. However, from a 
> conceptual point of view, there are still the two levels of pointers. 
> Similarly, with a datatype containing an int ref, MLton may just place the 
> int directly into the datatype. Modifying the ref cell modifies the value in 
> the datatype. This object will never be copied and the user cannot not make 
> new bindings to the embedded ref (or this optimization would not happen).
>
> So, there's a reason it's confusing: it is complicated! Conceptually, the 
> 'every binding is a pointer' and 'a ref introduces a mutable pointer' are 
> correct. However, in practice, both of these pointers can often be eliminated 
> completely by MLton.
>
>
> _______________________________________________
> MLton-user mailing list
> MLton-user at mlton.org
> http://mlton.org/mailman/listinfo/mlton-user
>
>


[ NEW! : http://tom7.org/       ]
[ OLD! : http://fonts.tom7.com/ ]



More information about the MLton-user mailing list