[MLton] Two questions about FFI types

Matthew Fluet fluet@cs.cornell.edu
Tue, 10 May 2005 10:50:05 -0400 (EDT)


> 1. Some of the mysql functions take a number of C strings (character 
> pointers), that behave differently if they are null. Unfortunately there 
> doesn't appear to be a way to give a type to these C functions for 
> _import; string doesn't work because I can't pass null, and 
> MLton.Pointer.t doesn't work because I can't pass string. The only two 
> solutions I have to this are (a) to import the function at multiple types, 
> and then call the one that matches my dynamic set of nulls--there are then 
> 2^n imports in general! or (b) write a C stub that takes, for each 
> argument, a char* and a bool to indicate if it is supposed to "be null". 
> Neither of these is very nice to me. I'd be the first to argue that null 
> is an abomination, but it is very common in C libraries that mlton 
> programs would want to interface with, so are there any prospects of being 
> able to do this a cleaner way? (Or is there already a cleaner way?)

I don't know of any cleaner way given the way things are currently 
implemented.  I ran into a very small version of this issue with some of 
the networking functionality of the Basis Library:

http://mlton.org/pipermail/mlton/2002-December/022923.html

Our conclusion there was to have two different C functions.  But, that 
doesn't really scale to your situation.

Be careful importing the same C function at two different ML types.  The C 
codegen emits a C-prototype for each imported function, so you need to 
ensure that the ML type maps to the same C type.  This isn't a problem in 
this situation as the C type of a MLton.Pointer.t and the C type of an 
array are both (char*).

To be honest, I don't think that importing the function at multiple types 
and wrapping it in a single ML function that takes string options is that 
bad an option.
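
A minimal sketch of that wrapping for a single nullable argument.  The 
C function "lookup" (taken here to be  int lookup (char *name)) and the 
ML names are hypothetical, just for illustration; with n nullable 
arguments you would need 2^n imports and a case on the tuple of options:

(* Hypothetical C function: int lookup (char *name); NULL is allowed. *)
val lookupStr = _import "lookup": string -> int ;
val lookupPtr = _import "lookup": MLton.Pointer.t -> int ;

(* One ML entry point; NONE stands for the C null pointer. *)
fun lookup (name: string option): int =
  case name of
     NONE => lookupPtr MLton.Pointer.null
     (* ML strings are not null terminated, so append the \000 here. *)
   | SOME s => lookupStr (s ^ "\000")

Callers then write  lookup NONE  or  lookup (SOME "foo")  and never see 
the two imports.  Note that appending the terminator copies the string.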

> 2. Since this interfaces with a database (and in fact my task will be very 
> data-intensive), I want to avoid copying as much as possible. I need to 
> make one copy to read data from rows returned from the server and generate 
> the mlton representation, and I'd like to limit it to that--but I want to 
> be manipulating strings in my program, not character arrays. According to 
> the FFI documentation, it appears that one way to do this would be to pass 
> a CharVector (=string) to the FFI, and have the C code modify it in place:
> 
> let
>    (* s would be allocated based on its target length *)
>    val s = "_______"
>    val f = _import "f" : string -> unit ;
> in
>    f s;
>    ... s ...
> end
> 
> But is this safe? Will the mlton optimizer, knowing that strings are 
> immutable, make them share space (hash consing?) or optimize subscripts on 
> constant strings? I can't tell from the FFI docs.

It is not safe.  The optimizer (and hash-consing gc) assumes that vectors 
are in fact immutable.

> If it's not safe, is there some way to go from C->array->vector that 
> doesn't do two copies? I'm already doing FFI, so I don't mind if it's not 
> type-safe (but it obviously needs to be robust).

Yes, there is an unsafe array -> vector coercion, used internally, though 
not exported by the Basis implementation.  You would use it like:

let
  val arr = CharArray.tabulate (n, fn _ => #" ")
  val f = _import "f": CharArray.array -> unit ;
  val () = f arr
  val vec = Unsafe.CharVector.fromArray arr
in
  ... vec ...
end


If you want to play even faster and looser, you could do

let
  val arr = Unsafe.CharArray.create n
  val f = _import "f": CharArray.array -> unit ;
  val () = f arr
  val vec = Unsafe.CharVector.fromArray arr
in
  ... vec ...
end

which won't even bother initializing the array (which is fine for a 
character array, as it has no internal pointers).  If your C function 
needs the array to be null terminated, you may need to initialize it with 
a tabulate after all, so that no stray \000 chars appear in the array.