[MLton] Wide{Char,String}.toCString

skaller skaller@users.sourceforge.net
Tue, 22 Nov 2005 12:42:25 +1100


On Tue, 2005-11-22 at 01:29 +0100, Wesley W. Terpstra wrote:

> > val toCString : string -> String.string
> 
> Worse of a problem.
> 
> There is no way to express the unicode chars via C escapes.
> I suppose the most 'reasonable' thing I could do would be
> to dump the text in as UTF-8, escaped with \x32 codes...

Except that function should be called 'toUTF-8' :)

The thing is, C strings do not have any specific associated
code set, so NO conversion makes any sense -- unless
you consider BOTH systems as integer code points 
(and therefore entirely independent of any notion of
character set).

In this latter case, UTF-8 conversion is wrong.

In practice, C strings are 'anything you like' because
they're arrays of integers. So conversion to UTF-8
makes sense in that it is going to be the most commonly
needed recoding.
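
For concreteness, here is a rough sketch in C of what that recoding
amounts to for a single code point. The name utf8_encode is made up
for illustration; it is not part of MLton or the Basis Library, and
rejecting surrogates is my own assumption about what a caller wants.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MLton or Basis Library function):
 * encode one code point as UTF-8 into out, which must have room
 * for at least 4 bytes. Returns the number of bytes written, or 0
 * if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of Unicode range */
}
```

The output buffer is unsigned char on purpose, so the byte values
0x80..0xFF are stored without any sign surprises.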

> For now, I am following the spec and raising Chr if toCString
> can't fit a code point.

What if the underlying C string uses a signed char type?
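
To make the question concrete, here is a small sketch in C. The
function names are hypothetical, invented for this example. The
point is that "fits" depends on whether you mean plain char (whose
signedness is implementation-defined) or a raw byte.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical: does cp fit in a plain char without changing value?
 * Where char is signed, CHAR_MAX is typically 127, so 0x80..0xFF fail. */
int fits_plain_char(uint32_t cp)
{
    return cp <= (uint32_t)CHAR_MAX;
}

/* Hypothetical: does cp fit in one raw byte (unsigned char)? */
int fits_byte(uint32_t cp)
{
    return cp <= UCHAR_MAX;
}

/* Recover the byte value from a char, whatever its signedness:
 * converting to unsigned char reduces modulo UCHAR_MAX + 1. */
unsigned byte_value(char c)
{
    return (unsigned char)c;
}
```

So a code point such as 0xE9 passes fits_byte everywhere, but passes
fits_plain_char only on platforms where char is unsigned. Which of
the two tests should decide whether Chr gets raised?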

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net