[MLton] Unicode... again

Fri Feb 9 18:24:55 PST 2007

On Fri, 2007-02-09 at 15:05 -0600, Matthew Fluet wrote:

> As I understand the implementation of the latter in MLton, any string 
> that has \uXXXX will be inferred to have type String16.string = 
> Char16Vector.vector and any string that has \UXXXXXXXX will be inferred 
> to have type String32.string = Char32Vector.vector.  (Inference might 
> also force the type to a higher StringN.string type.)

I wouldn't do that: inference may seem in keeping with the ML
way, but for strings it could cause problems. I'd go with explicitly
annotating literals with their type:

	".."    8 bit
	u".."   32 bit

or, if you want 16 bits, you might make u".." 16 bit and U".." 32 bit,
though I admit to not liking that.
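For comparison, C11 spells its literals this way, so a small C sketch can show what "the prefix decides the type" looks like in practice. This is C, not MLton syntax; the macro and function names are made up for illustration, and the sizes in the comments assume a platform with 8-bit bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Bytes per element of a string literal: the prefix alone decides this. */
#define ELEM_SIZE(lit)  (sizeof((lit)[0]))
/* Number of elements, including the terminating zero element. */
#define ELEM_COUNT(lit) (sizeof(lit) / sizeof((lit)[0]))

/* No prefix: elements are char, which is always one byte. */
static_assert(ELEM_SIZE("abc") == 1, "plain literals have char elements");

static size_t narrow_elem_size(void) { return ELEM_SIZE("abc"); }   /* char */
static size_t utf16_elem_size(void)  { return ELEM_SIZE(u"abc"); }  /* char16_t */
static size_t utf32_elem_size(void)  { return ELEM_SIZE(U"abc"); }  /* char32_t */
static size_t utf32_elem_count(void) { return ELEM_COUNT(U"abc"); } /* a, b, c, 0 */
```

The contents of the three literals are identical; only the prefix differs, so a reader (or a compiler error message) never has to guess the width from the characters inside the quotes.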

Be aware in these considerations that GNU gettext functionality
generally requires a prefix for dictionary lookup too; in C
you write

	_("....")

I think, where the macro _ marks a catalogued message to be
translated into the user's language. This would stack on top
of any string type indicators.
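A minimal sketch of that convention, assuming a system that provides <libintl.h> (glibc does). The message text and the function name greeting are made up for illustration, and a real program would also call setlocale, bindtextdomain and textdomain to select a catalogue.

```c
#include <assert.h>
#include <libintl.h>  /* gettext() */
#include <string.h>

/* Conventional shorthand: mark a literal for catalogue lookup. */
#define _(msgid) gettext(msgid)

/* Returns the translation of a fixed greeting, or the greeting itself
   when no catalogue has a translation for it. */
static const char *greeting(void) {
    return _("Hello, world");
}
```

Note that gettext takes and returns narrow char strings, so in C a width-prefixed literal could not be handed to _ as is; how a lookup marker and a type prefix should combine is exactly the stacking question.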

The problem with inference in general is bad error handling.
For strings you might get an error you couldn't even see:
a bad character code in a string is likely to be hard to
find if your text editor can't display it (which is possible
if it is a bad character ... :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net