[MLton-user] SML unicode support
Wed, 5 Jan 2005 22:39:35 +0300
OK, I understand my mistake now.
So, if I understand it right, Unicode can be stored in C/C++ wchar_t, is that right?
If so, is it a problem to make the SML interpreter store characters in
a wchar_t-like container (I'm sorry if the question is too lame)?
P.S. To the mailing list administrator: maybe it is inconvenient only
for me, but usually (as far as I know) when I reply to a message, my
mailer takes the mailing list address (in this example, firstname.lastname@example.org),
not the user who posted the message. Or did you set it up this way on purpose, to
"provoke" people into sending the message to both the mailing list and the user (via "reply
On Jan 5, 2005, at 22:18, Henry Cejtin wrote:
> There is no way to casually handle UTF-8 (or even Unicode)
> characters in C.
> The encodings UTF-8 and UTF-16 do not store one character in 8 or 16
> bits. That would clearly not be possible because there are more than
> 256 and even more than 65,536 Unicode characters. UTF-8 and UTF-16 are
> ways of encoding characters as COLLECTIONS of 8-bit bytes or 16-bit
> chunks. Not all characters will take the same number of bytes/chunks.
> UTF-32 lets all characters be the same size (32 bits or 4 bytes) but
> no one stores them that way externally (in files) because of the large
> waste of space.
> The expectation is that files will be in UTF-8 or UTF-16 and on
> reading them they will be converted to something more convenient.
> (Note, if you store a string in UTF-8 itself, then you can't go to the
> N-th character without walking through all the previous characters to
> see how long they are.)
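
To illustrate the last point of the quoted note, here is a minimal
sketch in C of what "walking through all the previous characters"
might look like. It assumes the input is a NUL-terminated UTF-8 string.
The names utf8_seq_len and utf8_offset_of are made up for this
illustration, and the sketch inspects lead bytes only: it does not
validate continuation bytes or reject overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b.
   Returns 0 if b is a continuation byte or not a valid lead byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII          */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence */
    return 0;
}

/* Byte offset of the n-th character (0-based) of s, or -1 if s has
   fewer than n+1 characters or a bad lead byte is encountered.
   The cost is linear in n: every earlier character must be skipped. */
static long utf8_offset_of(const char *s, size_t n)
{
    assert(s != NULL);
    size_t len = strlen(s);
    size_t off = 0;

    while (off < len) {
        if (n == 0)
            return (long)off;
        size_t k = utf8_seq_len((unsigned char)s[off]);
        if (k == 0 || off + k > len)
            return -1;                 /* malformed or truncated input */
        off += k;                      /* skip one whole character     */
        n--;
    }
    return -1;                         /* ran out of characters        */
}
```

With a fixed-width form such as UTF-32 the same lookup is just an
array index, which is the convenience the note alludes to when it
talks about converting on input.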