[MLton] Representation of strings in the FFI

Jens Axel Søgaard jensaxel@soegaard.net
Sun, 01 May 2005 03:05:28 +0200


Stephen Weeks wrote:

>>Since exported functions can't return tuples nor pairs, the obvious
>>solution of letting intArrayLength return a pair of the length and
>>the array won't work.
> 
> True.  To return multiple results you could use the standard C
> approach of passing in pointers to the results as parameters and
> setting them on the SML side.  But at least in this case, perhaps
> GC_arrayNumElements is best.

The following worked fine:

; ml-array-length : pointer -> int
;   return the length of a (raw) array returned from MLton
(define (ml-array-length array)
    ; see GC_arrayNumElementsp in gc.h
    (ptr-ref array _uint -2))
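
The same read from the C side might look like the sketch below. It
assumes what the -2 index above implies: the element count is a
32-bit word stored two words before the first element. The name
ml_array_length is only illustrative and is not part of gc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the element count is a 32-bit word located two words
   before the first element, matching (ptr-ref array _uint -2). */
uint32_t ml_array_length(const void *array)
{
    assert(array != NULL);
    return ((const uint32_t *)array)[-2];
}
```

For a fake object laid out as {0, 3, 0, e0, e1, e2}, passing the
address of e0 yields 3. If the header layout changes in a later
runtime, the offset has to change with it.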

>>Is the location of globals fixed?
> 
> No.

Looking at the generated code, I can see that the values of
globals are in the heap. However one can find them using
the various global C arrays:

     Int8 globalInt8 [0];

Since globals are treated as roots by the garbage collector,
I can use global reference cells to prevent the garbage
collector from freeing needed values.

BTW - since the location of the value of a global can move,
one needs to write accessor functions like:

   val a = 42;
   val _ = _export "aGlobal": unit -> int; (fn x => a);

This isn't difficult, but it would be more convenient if
the compiler turned e.g.

   val _ = _exportGlobal "a";

into the above line.

(this is similar to, but different from Fluet's suggestion
<http://mlton.org/pipermail/mlton/2003-May/023593.html>)

> I'm not entirely sure what you're trying to do.  

The overall goal is to make it easy to use an MLton library
for a Scheme programmer knowing (almost) nothing about ML.
In an ideal world one would do:

   1. Give MLton the -shared-library flag
   2. Import the functions (and global variables) in Scheme
      using the same type annotation as used in ML

Since MLton needs to know which functions are exported, a step 0
is needed:

   0. Add _export declarations to the ML program for the needed functions

Given the type annotations in 2, the Scheme-ML FFI should automatically
convert Scheme values to ML values when a call to an ML function is
made, and convert the ML return value back to a Scheme value.


The conversion of return values from ML to Scheme turned out to be easy.

; types for atomic values:
(define Int8.int _int8)
...

(define (unsafe-ml-array-ref type array index)
   (ptr-ref array type index))

(define (ml-array-ref type array index)
   (assert-range 0 index (ml-array-length array) (error "index out of range"))
   (unsafe-ml-array-ref type array index))

(define (ml-array->vector type ml-array)
   (let* ((len (ml-array-length ml-array))
          (s   (make-vector len)))
     (do ([i 0 (+ i 1)])
         [(= i len) s]
       (vector-set! s i (ml-array-ref type ml-array i)))))

Similar code works for strings and vectors.

The other way, converting Scheme values to ML values, is tricky
due to the garbage collector.

> Here's an idea
> without a lot of thought behind it -- perhaps you could build your own
> registration system on the SML side to keep track of values that you
> don't want to be GC'ed.  You can't prevent values from being moved,
> but you can force them to remain alive until you explicitly unregister
> them.  Then, you could export from SML to C routines that allow to
> lookup values via the registration system so you can be sure to have
> their current location.

Yes - I think that might be necessary in order to keep values alive
between calls. Consider:

ex_array.sml

   (* element-wise sum; assumes a1 and a2 have the same length *)
   fun arraySum (a1, a2) =
      Array.tabulate (Array.length a1,
                      fn i => Array.sub (a1, i) + Array.sub (a2, i));
   val _ = _export "arraySum": int array * int array -> int array; arraySum;

   val _ = _export "array1": unit -> int array; (fn _ => Array.fromList [1,2,3]);
   val _ = _export "array2": unit -> int array; (fn _ => Array.fromList [4,5,6]);

ex_array.scm

   (get-ffi-ml array-sum lib "arraySum" (fun (int array) (int array) -> (int array)))
   (get-ffi-ml array1 lib "array1" (fun unit -> (int array)))
   (get-ffi-ml array2 lib "array2" (fun unit -> (int array)))

   (array-sum (array1) (array2))


This will be evaluated as

   1. array1 is called and returns an ML array a1
   2. array2 is called and returns an ML array a2
   3. arraySum is called with arguments a1 and a2

Without a way of registering roots, a garbage collection during
step 2 can move a1 (or reclaim it, since nothing on the ML side
still refers to it), so the pointer passed in step 3 may be stale
and the call can fail.
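
The registration idea quoted above amounts to handing out stable
handles instead of raw pointers. The sketch below models that in C
purely for illustration: in practice the table would live on the SML
side so that the collector sees it, and all names here (roots,
root_register, root_relocate, root_lookup, root_unregister) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 16

/* Hypothetical registry: slot i holds the current address of a
   registered value, or NULL if the slot is free. */
void *roots[MAX_ROOTS];

/* Register a value; returns a handle, or -1 if the table is full. */
int root_register(void *value)
{
    assert(value != NULL);
    for (int i = 0; i < MAX_ROOTS; i++) {
        if (roots[i] == NULL) {
            roots[i] = value;
            return i;
        }
    }
    return -1;
}

/* Stand-in for what a moving collector does to a root: the slot is
   updated, the handle stays the same. */
void root_relocate(int handle, void *new_address)
{
    assert(handle >= 0 && handle < MAX_ROOTS && roots[handle] != NULL);
    roots[handle] = new_address;
}

/* Look up the current address; NULL if the handle is not registered. */
void *root_lookup(int handle)
{
    assert(handle >= 0 && handle < MAX_ROOTS);
    return roots[handle];
}

/* Release the slot so the value may be collected again. */
void root_unregister(int handle)
{
    assert(handle >= 0 && handle < MAX_ROOTS);
    roots[handle] = NULL;
}
```

With such a table the Scheme side would hold the handle returned in
step 1 and look up the current address of a1 immediately before
step 3, rather than keeping the raw pointer across the call in step 2.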


A simple solution in the absence of a way of registering
garbage collection roots is to convert all ML values into
Scheme values as soon as possible:

   1a. array1 is called and returns an ML array a1
   1b. copy elements of a1 into a Scheme vector v1
   2a. array2 is called and returns an ML array a2
   2b. copy elements of a2 into a Scheme vector v2
   3a. allocate new ML arrays a3 and a4
   3b. copy elements of v1 and v2 into a3 and a4
   3c. call arraySum with arguments a3 and a4,
       an ML array a5 is returned
   3d. copy a5 into a Scheme vector v3

Step 3a reveals why I would like to be able to allocate
arrays from the C level.
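
Steps 1b, 2b and 3d are the easy half. A minimal C sketch of such a
copy-out follows, assuming 32-bit int elements and the same length
layout as in ml-array-length above; ml_length and ml_array_copy_out
are hypothetical names. The length comes back through a pointer
parameter, which is the standard C approach to multiple results
mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: element count stored two 32-bit words before the data. */
uint32_t ml_length(const int32_t *ml_array)
{
    return ((const uint32_t *)ml_array)[-2];
}

/* Copy the elements into malloc'd memory, which the ML collector
   neither moves nor frees. Stores the length in *len_out.
   Returns NULL if allocation fails; the caller frees the result. */
int32_t *ml_array_copy_out(const int32_t *ml_array, uint32_t *len_out)
{
    assert(ml_array != NULL && len_out != NULL);
    uint32_t len = ml_length(ml_array);
    /* Allocate at least one element so a zero-length array still
       yields a valid pointer. */
    int32_t *copy = malloc((len ? len : 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, ml_array, len * sizeof *copy);
    *len_out = len;
    return copy;
}
```

The copy has to happen before the next call into ML, since that call
may trigger a collection. The reverse direction (step 3a) has no such
sketch here because it needs an allocation routine on the ML heap.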

-- 
Jens Axel Søgaard