[MLton] mlton bug [String.fromCString ?]

Matthew Fluet fluet@cs.cornell.edu
Thu, 28 Jul 2005 09:04:03 -0400 (EDT)


> I strongly suspect the problem is with String.fromCString, unless
> I'm misusing ffi ?

You are misusing FFI.

val int2string = _import * : DynLink.fptr -> int -> string;
val get_name = int2string (DynLink.dlsym (hndl, "get_name"));

If you _import a function that accepts and/or returns "string", then the 
function must accept/return an ML string on the ML heap.  Among other 
things, an ML string has a valid header word and a length word, and needs 
no null termination.  The "get_name" function in names.c does not return 
an ML string on the ML heap; rather, it returns a C-string in static data.

String.fromCString does not convert a null-terminated C string into an ML 
string; it converts an ML (heap allocated) string to another ML (heap 
allocated) string, obeying C-style escape sequences.  However, 
String.fromCString is required to convert only the maximal prefix of the 
string that contains printable C-characters.  Hence, it stops converting 
upon encountering the null termination of the C-strings.
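
To make the distinction concrete, here is a small sketch using only the 
Basis Library (the names "unescaped" and "escaped" are just for 
illustration).  Both the argument and the result are ordinary ML strings; 
no C pointer is involved anywhere.

```sml
(* The argument is an ML string containing a backslash followed by 't'.
   fromCString interprets that C-style escape and yields SOME of an ML
   string containing a real tab character. *)
val unescaped : string option = String.fromCString "one\\ttwo"

(* toCString goes the other way: the tab in the ML string is rendered
   as the two characters backslash and 't'. *)
val escaped : string = String.toCString "one\ttwo"
```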

> The program retrieves a series of strings by calls to get_name in
> names.so, then prints them.
>
> The first two strings do not show up properly.

I am very surprised that the program terminates normally with only the two 
strings not printing correctly.  What appears to be happening is that the 
first two strings are somehow interpreted as ML strings of length zero; 
presumably, the padding in the static data section has put enough zeros in 
that when interpreting the pointer as an ML heap allocated string, we find 
zero where we expect to find the length.  After the first two strings, we 
start seeing the previous strings where we expect to find the length. 
Hence, the lengths are interpreted as "big-enough" to convert the C-string 
until encountering the null termination.  This is certainly not robust; 
it works only because there appear to be no intervening garbage 
collections, which would be thoroughly confused by your C-strings.

The right way to accomplish what you have in mind is the following:

val int2ptr = _import * : DynLink.fptr -> int -> MLton.Pointer.t;
val get_name = int2ptr (DynLink.dlsym (hndl, "get_name"));

fun fetchCString ptr =
   let
      fun loop (i, accum) =
         let
            val w = MLton.Pointer.getWord8 (ptr, i)
         in
            (* Search for explicit null termination. *)
            if w = 0wx0
               then String.implode (List.rev accum)
               else loop (i + 1, (Byte.byteToChar w) :: accum)
         end
   in
      loop (0, [])
   end

val v = Vector.tabulate(c,fn i => fetchCString (get_name i));
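
If the intermediate list bothers you, a possible variation is to find the 
length first and then copy the bytes in a single allocation.  This is only 
a sketch: the name fetchCString' is made up here, and it assumes, as 
above, that the pointer refers to a null-terminated C string that stays 
valid for the duration of the call.

```sml
(* Sketch: measure the C string first, then copy it into an ML string. *)
fun fetchCString' (ptr : MLton.Pointer.t) : string =
   let
      (* Index of the first zero byte, i.e. the length of the C string. *)
      fun len i =
         if MLton.Pointer.getWord8 (ptr, i) = 0w0
            then i
            else len (i + 1)
   in
      CharVector.tabulate
         (len 0, fn i => Byte.byteToChar (MLton.Pointer.getWord8 (ptr, i)))
   end
```

Either way, the result is a genuine ML string on the ML heap, copied out 
of the C data, so later garbage collections have nothing to be confused 
about.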


> Also, funnyly, mlton rejects file names.sml on Solaris, in which
> it finds a syntax error ...

I don't have an explanation for that.