[MLton-user] SML unicode support

Alexandre Xlex0x835@rambler.ru
Wed, 5 Jan 2005 21:34:03 +0300


Probably...
But if so, I still have two questions:
-1. How else can ordinary Unix programs handle UTF-8 in a char (using 
the locale hack)?
-2. Here (http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf), on 
page 18 of the PDF (page 27 as printed), I find the following sentence:
"The Unicode Standard provides three distinct encoding forms for 
Unicode characters, using 8-bit, 16-bit, and 32-bit units. These are 
correspondingly named UTF-8, UTF-16, and UTF-32."
So, if I understand it right, this means that UTF-8 stores one 
character using 8 bits, UTF-16 using 16 bits, and UTF-32 using 32 bits 
(maximum). If so, the C char type is OK for that... Or am I really 
confused? If so, excuse me, but we (Google and I) cannot find enough 
information about that... =/
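
One quick way to check this reading is to count how many units each 
encoding form needs for a few characters. The following is only a 
rough sketch in Python (not something from this thread); the helper 
name code_units and the sample characters are made up for illustration:

```python
def code_units(ch: str) -> dict:
    """Return how many code units one character needs in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }

# Sample characters (arbitrary choices): Latin A, Cyrillic ya,
# the euro sign, and a musical symbol outside the 16-bit range.
samples = ["A", "\u044f", "\u20ac", "\U0001d11e"]

for ch in samples:
    # ascii() keeps the output printable on any terminal
    print(ascii(ch), code_units(ch))
```

If the counts for UTF-8 come out larger than 1 for some characters, 
then "8-bit" describes the size of the unit, not the size of a whole 
character.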

Regards,
/Alexandre.

On Jan 5, 2005, at 21:20, Henry Cejtin wrote:

> You are confused.  A C char certainly cannot hold an arbitrary UTF-8
> encoded character.  The reason that your file copy worked is because
> at each stage the char variable had some PART of a UTF-8 character.
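
The point in the quoted reply can be illustrated with a small sketch. 
It is written in Python rather than C, and it assumes the file copy in 
question was a plain byte-at-a-time loop (as with getc/putc); the 
function name copy_bytewise and the sample text are invented for this 
example:

```python
import io

def copy_bytewise(src, dst):
    """Copy a stream one byte at a time, never looking at whole characters."""
    while True:
        b = src.read(1)   # one 8-bit unit, like a single C char
        if not b:
            break
        dst.write(b)

# Sample text (hypothetical): six Cyrillic letters, two bytes each in UTF-8.
text = "\u043f\u0440\u0438\u0432\u0435\u0442"
data = text.encode("utf-8")

src, dst = io.BytesIO(data), io.BytesIO()
copy_bytewise(src, dst)
copied = dst.getvalue()

# The copy is exact even though no step ever held a complete character.
same = copied == data

# A single byte taken from the middle of the stream is not a character.
try:
    data[:1].decode("utf-8")
    first_byte_is_char = True
except UnicodeDecodeError:
    first_byte_is_char = False

print(len(text), len(data), same, first_byte_is_char)
```

The copy succeeds because the bytes are passed through unchanged and 
in order, not because each byte is a character on its own.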