[MLton] WideChar?

Wesley W. Terpstra terpstra@gkec.tu-darmstadt.de
Thu, 9 Dec 2004 16:31:57 +0100


On Thu, Dec 09, 2004 at 10:00:31AM -0500, Matthew Fluet wrote:
> > As you say, the fact that a Unicode code needs more than 4 nibbles is
> > really a problem.  You cannot make the number of hex characters in \u
> > variable because then it is ambiguous (because you can't tell where the
> > character code ends).  Always requiring 8 hex digits would really be
> > even more onerous than just the fact that you need to use \u at all.
> 
> Again, for expedience, one might (gasp) extend the lexical definition to
> allow \Uxxxxxxxx, which would let you write down any Unicode string.
> If you happen to fall in the low (plane / codepage / whatever terminology
> is correct), then you can use \uxxxx.

Sure, that's a good idea.

However, for the reasons I cited earlier, the \uxxxx and \Uxxxxxxxx methods
should only be used for non-typeable characters. Requiring people to look up
every letter of a string in a Unicode table is not acceptable.
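
To make the fixed-width point concrete, here is a minimal sketch of how the
two escapes could be decoded. It is written in Python purely for
illustration and is not MLton's lexer; the name decode_escapes is
hypothetical. It assumes \u always takes exactly 4 hex digits and \U exactly
8, which is what removes the "where does the code end" ambiguity.

```python
import re

# Assumed grammar (from the proposal in this thread, not an existing lexer):
#   \uXXXX      exactly 4 hex digits
#   \UXXXXXXXX  exactly 8 hex digits
# Because each width is fixed, any hex-looking characters that follow the
# escape are ordinary text and are never consumed by it.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def decode_escapes(text):
    """Replace \\uXXXX and \\UXXXXXXXX escapes in text with the characters they name."""

    def replace(match):
        # Exactly one of the two groups matched; both are non-empty strings.
        code = int(match.group(1) or match.group(2), 16)
        if code > 0x10FFFF:
            raise ValueError("code point out of Unicode range: %#x" % code)
        return chr(code)

    return _ESCAPE.sub(replace, text)
```

For example, in the input \u00e9ab the trailing "ab" looks like hex but is
left alone, because the escape stops after four digits.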

PS. If I make WideChar = Word21, will/could MLton pack arrays so they only
need 3 or fewer bytes per character? If so, then I see no need for a 2-byte
version of Unicode in memory. A single wide character type would provide a
simpler API.
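
As a rough illustration of the layout I am asking about, here is a sketch in
Python that stores each 21-bit value in 3 bytes. The names pack21 and
unpack21 and the big-endian byte order are assumptions made for the example;
whether MLton would or could lay out a Word21 array this way is exactly the
open question.

```python
def pack21(code_points):
    """Store each 21-bit value in 3 big-endian bytes (hypothetical layout)."""
    out = bytearray()
    for cp in code_points:
        # Anything that fits in 21 bits also fits in 3 bytes (24 bits).
        if not 0 <= cp < (1 << 21):
            raise ValueError("value does not fit in 21 bits: %r" % (cp,))
        out += cp.to_bytes(3, "big")
    return bytes(out)


def unpack21(data):
    """Inverse of pack21: read consecutive 3-byte groups back into integers."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data), 3)]
```

With this layout a string of n characters takes 3n bytes, against 4n for a
32-bit representation, and every character is still at a fixed offset.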

-- 
Wesley W. Terpstra