[MLton-user] SML unicode support

Alexandre Xlex0x835@rambler.ru
Wed, 5 Jan 2005 22:39:35 +0300


OK, I understand my mistake now.
So, if I have it right now, Unicode can be stored in a C/C++ wchar_t, 
can't it?
If so, would it be a problem to make the SML interpreter store characters 
in a wchar_t-like container (I'm sorry if the question is too lame)?


Regards,
/Alexandre.

P.S. To the mailing list administrator: perhaps this is inconvenient only 
for me, but usually (as far as I know) when I reply to a list message, my 
mailer picks up the list address (in this case, mlton-user@mlton.org), 
not the address of the user who posted the message. Or did you set it up 
this way deliberately, to "provoke" people into sending the message to 
both the list and the user (via "reply all")?

On Jan 5, 2005, at 22:18, Henry Cejtin wrote:

> There is no way to casually handle UTF-8 (or even Unicode) characters in C.
> The encodings UTF-8 and UTF-16 do not store one character in 8 or 16 bits.
> That would clearly not be possible because there are more than 256 and even
> more than 65,536 Unicode characters.  UTF-8 and UTF-16 are ways of encoding
> characters as COLLECTIONS of 8-bit bytes or 16-bit chunks.  Not all
> characters will take the same number of bytes/chunks.  UTF-32 lets all
> characters be the same size (32 bits or 4 bytes) but no one stores them that
> way externally (in files) because of the large waste of space.
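
To make the variable-length point concrete, here is a minimal sketch in C of 
how a single code point becomes 1 to 4 bytes in UTF-8. The function name 
utf8_encode is hypothetical (it is not part of any library mentioned in this 
thread), and the sketch only handles one code point at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 16 bits: 1110xxxx + 2 more */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 21 bits: 11110xxx + 3 more */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, 'A' (U+0041) takes one byte while the euro sign (U+20AC) 
takes three, so a UTF-8 string cannot be treated as an array of fixed-size 
characters.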
>
> The expectation is that files will be in UTF-8 or UTF-16 and on reading them
> they will be converted to something more convenient.  (Note, if you store a
> string in UTF-8 itself, then you can't go to the N-th character without
> walking through all the previous characters to see how long they are.)
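
The "walking" mentioned above can be sketched in C roughly as follows. The 
function name utf8_offset_of_nth is hypothetical, the input is assumed to be 
valid, NUL-terminated UTF-8, and "character" here means a code point. 
Finding the N-th character costs time proportional to N, unlike indexing 
into an array of fixed-size elements.

```c
#include <stddef.h>

/* Hypothetical helper: return the byte offset of the n-th (0-based)
 * code point in the NUL-terminated UTF-8 string s, or (size_t)-1 if the
 * string has n or fewer code points.  Assumes s is valid UTF-8. */
size_t utf8_offset_of_nth(const char *s, size_t n)
{
    size_t i = 0;
    while (s[i] != '\0') {
        if (n == 0)
            return i;
        /* Step over the lead byte, then over every continuation byte
         * (bit pattern 10xxxxxx) belonging to the same code point. */
        i++;
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return (size_t)-1;
}
```

There is no shortcut here: every earlier character has to be inspected to 
learn how many bytes it occupies.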
>
> _______________________________________________
> MLton-user mailing list
> MLton-user@mlton.org
> http://mlton.org/mailman/listinfo/mlton-user
>