[MLton] Two questions about FFI types

Stephen Weeks MLton@mlton.org
Tue, 10 May 2005 07:44:21 -0700


> 1. Some of the mysql functions take a number of C strings (character 
> pointers), that behave differently if they are null. Unfortunately there 
> doesn't appear to be a way to give a type to these C functions for 
> _import; string doesn't work because I can't pass null, and 
> MLton.Pointer.t doesn't work because I can't pass string. The only two 
> solutions I have to this are (a) to import the function at multiple types, 
> and then call the one that matches my dynamic set of nulls--there are then 
> 2^n imports in general! or (b) write a C stub that takes, for each 
> argument, a char* and a bool to indicate if it is supposed to "be null". 

Of these two, obviously (b) is the better solution since it requires
only one import.

As an improvement, you could use the same idea as (b), but a different
encoding of NULL.  The idea is to pass SML strings, but use the fact
that some SML strings (such as "") are not valid C strings, which must
be null terminated.  So, you can encode NULL with the SML empty
string.  Then, your wrapper on the C side checks the SML string
length, and if it is zero, passes NULL.  Presumably all normal strings
will have been null terminated on the SML side (we find it useful in
the MLton basis library to enforce this via the type system).  I don't
think one can do much better than having a test on the C side, since
the GC treats zero as a pointer and would try to follow it, and hence
zero is not a valid SML string.
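
A minimal sketch of the C side of this encoding follows.  It assumes
the SML side passes each string together with its length as a separate
argument; library_call is a hypothetical stand-in for the real mysql
function, and every name here is illustrative rather than part of
MLton or mysql.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a C library function that behaves
 * differently when given NULL.  Returns -1 for NULL, otherwise the
 * length of the C string. */
int library_call(const char *s)
{
    return s == NULL ? -1 : (int)strlen(s);
}

/* Decode the convention: an SML string of length zero encodes NULL.
 * Any other string is assumed to have been NUL-terminated on the SML
 * side, so its SML length is at least 1. */
const char *decode_arg(const char *data, size_t len)
{
    return len == 0 ? NULL : data;
}

/* The single function the SML program would _import. */
int library_call_wrapper(const char *data, size_t len)
{
    return library_call(decode_arg(data, len));
}
```

With this convention the empty C string is still expressible: it is
the one-character SML string "\000", which has length 1.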

If you're willing to suffer copying the strings, you could use
MLton.Pointer.t.  Then, when you want to pass an SML string, you copy
it to a C string using malloc, pass it, and then free it upon return.
When you want to pass NULL, use MLton.Pointer.null.  But I like the
previous solution better.
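
For comparison, the copying approach could be sketched with a small C
helper like the one below.  The helper name and the idea of doing the
copy in C (rather than byte by byte from SML) are assumptions made for
illustration; the SML side would import it with a MLton.Pointer.t
result, pass that pointer to the real function, and then call free.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of SML string data into a freshly malloc'd,
 * NUL-terminated C string.  The caller must free the result.
 * Returns NULL if malloc fails, so a caller that also uses NULL to
 * mean "no string" has to check for failure separately. */
char *copy_to_c_string(const char *data, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);
    copy[len] = '\0';
    return copy;
}
```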

> I need to 
> make one copy to read data from rows returned from the server and generate 
> the mlton representation, and I'd like to limit it to that--but I want to 
> be manipulating strings in my program, not character arrays. According to 
> the FFI documentation, it appears that one way to do this would be to pass 
> a CharVector (=string) to the FFI, and have the C code modify it in place:
> 
> let
>    (* s would be allocated based on its target length *)
>    val s = "_______"
>    val f = _import "f" : string -> unit ;
> in
>    f s;
>    ... s ...
> end
> 
> But is this safe? Will the mlton optimizer, knowing that strings are 
> immutable, make them share space (hash consing?) or optimize subscripts on 
> constant strings?

This is not safe.  MLton's optimizer and runtime know that vectors and
strings are immutable and take advantage of this.  For example, the
optimizer will conflate different occurrences of the same string
constant.  So, another use of the string "_______" in the above
program would also see the modifications.  Also, the MLton runtime can
perform hash-consing, although it is disabled by default.

Looking at the SSA generated by a simple test program reveals that
MLton doesn't optimize constant subscripts on constant strings :-(.

  val () = if #"o" = String.sub ("foo", 1) then () else raise Fail "bad"

I could easily see this optimization going in someday.

> If it's not safe, is there some way to go from C->array->vector that 
> doesn't do two copies? I'm already doing FFI, so I don't mind if it's not 
> type-safe (but it obviously needs to be robust).

Yes.  You can allocate an SML array, pass it to the C side, and then
convert it to a vector using the built-in Array_toVector primitive:

  _prim "Array_toVector": 'a array -> 'a vector;

This of course requires the programmer to be convinced that he's
holding the only copy of the array and will not modify it after the
conversion, which should be easy to do in a local use of the FFI.
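
The C half of this flow might look like the sketch below.  The
function name and the (buffer, capacity) calling convention are
assumptions for illustration.  The SML side would allocate the char
array at the target length, import this function at an array type,
call it, and only then convert the array to a vector.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C-side helper: copy one field of a result row into a
 * buffer that the SML side allocated as a char array of cap bytes.
 * This is the single copy from C data into SML-managed memory.
 * Returns 0 on success, or -1 (leaving buf untouched) if the field
 * does not fit. */
int fill_from_field(char *buf, size_t cap,
                    const char *field, size_t field_len)
{
    if (field_len > cap)
        return -1;
    memcpy(buf, field, field_len);
    return 0;
}
```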

Array_toVector is not exported by the basis library.  There are a
couple of ways to get it.  You could tweak the basis library to expose
it in the ARRAY_EXTRA signature and the MLTON_ARRAY signature,
although I might put it somewhere else, like MLton.Array.Unsafe, just
as a reminder that it is unsafe.

Alternatively, you could access the primitive directly from your
program with

  val a2v = _prim "Array_toVector": 'a array -> 'a vector;

To do this, you must compile the file that uses _prim with the
annotation "allowPrim true", using either the MLB "ann" syntax or
-default-ann from the command line.
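
As a rough illustration, assuming the _prim declaration lives in a
hypothetical file a2v.sml, the MLB form of the annotation might look
like this (the file names are made up for the example):

```text
(* a2v.mlb *)
$(SML_LIB)/basis/basis.mlb
ann "allowPrim true" in
   a2v.sml
end
```

The command-line equivalent would be along the lines of

  mlton -default-ann 'allowPrim true' a2v.sml

which applies the annotation to every file in the compilation rather
than only to the one that needs it.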