[MLton] WideChar

Stephen Weeks MLton@mlton.org
Mon, 13 Dec 2004 11:18:37 -0800


Wesley, the compiler and runtime support is in place for 2-byte and
4-byte characters and strings, as well as for \U escapes.  I added a
few definitions to misc/primitive.sml to show you how to access the
new types and primitives.  Feel free to start working on adding basis
library support based on this stuff.  There are surely some kinks
remaining, so let us know when you hit them.

I didn't add support for UTF-8 encoded strings, since we don't yet
have the decoding function.  But once we have that, it should be
easy to tweak ml.lex to call the decoder.

One thing to keep in mind is that we need to be able to run both the
SML/NJ-compiled MLton and the MLton-compiled MLton.  So, the
compiler sources should never contain any \U escapes.  The basis
library sources can use \U, since those would only be seen by MLton.

Also, it would be good for the SML/NJ-compiled MLton to be able to
handle UTF-8 encoded strings.  This means that the decoding function
should be written portably.  One possibility is to define a WideChar
stub in SML/NJ that uses int.  Another is to simply have the decoder
take a word8 vector and return an int vector.
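
As a rough illustration of the second option, here is a minimal sketch
of a portable decoder.  It uses only the standard basis library, so it
should compile under both SML/NJ and MLton.  The names decodeUtf8 and
BadUtf8 are made up for this example and are not existing MLton
definitions.  It assumes strict UTF-8: overlong encodings, surrogates,
and code points above 0x10FFFF are rejected.

```sml
(* Hypothetical helper, not part of MLton: decode a UTF-8 byte vector
 * into a vector of code points represented as plain ints. *)
exception BadUtf8

fun decodeUtf8 (bytes : Word8Vector.vector) : int vector =
   let
      val n = Word8Vector.length bytes
      fun byte i = Word8.toInt (Word8Vector.sub (bytes, i))
      (* A continuation byte has the form 10xxxxxx; return its low
       * six bits, or raise BadUtf8 if it is missing or malformed. *)
      fun cont i =
         if i >= n then raise BadUtf8
         else
            let val b = byte i
            in if b div 64 = 2 then b mod 64 else raise BadUtf8
            end
      fun loop (i, acc) =
         if i >= n then Vector.fromList (rev acc)
         else
            let val b = byte i
            in
               if b < 0x80 then
                  (* one byte: 0xxxxxxx *)
                  loop (i + 1, b :: acc)
               else if b >= 0xC2 andalso b < 0xE0 then
                  (* two bytes: 110xxxxx 10xxxxxx; 0xC0 and 0xC1 would
                   * be overlong, so they are rejected below *)
                  loop (i + 2, ((b mod 32) * 64 + cont (i + 1)) :: acc)
               else if b >= 0xE0 andalso b < 0xF0 then
                  (* three bytes: 1110xxxx 10xxxxxx 10xxxxxx *)
                  let
                     val c = ((b mod 16) * 64 + cont (i + 1)) * 64
                             + cont (i + 2)
                  in
                     if c < 0x800
                        orelse (c >= 0xD800 andalso c <= 0xDFFF)
                        then raise BadUtf8
                     else loop (i + 3, c :: acc)
                  end
               else if b >= 0xF0 andalso b < 0xF5 then
                  (* four bytes: 11110xxx followed by three
                   * continuation bytes *)
                  let
                     val c = (((b mod 8) * 64 + cont (i + 1)) * 64
                              + cont (i + 2)) * 64 + cont (i + 3)
                  in
                     if c < 0x10000 orelse c > 0x10FFFF
                        then raise BadUtf8
                     else loop (i + 4, c :: acc)
                  end
               else raise BadUtf8
            end
   in
      loop (0, [])
   end
```

For example, decodeUtf8 (Byte.stringToBytes "A\195\169") should give a
vector holding 65 and 233.  Since the result is an int vector, a
WideChar stub under SML/NJ could wrap the same ints without changing
the decoder.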