[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support

Aaron Turon adrassi@gmail.com
Tue, 29 Nov 2005 10:55:34 -0600


On 11/29/05, Geoffrey Alan Washburn <geoffw@cis.upenn.edu> wrote:

>     Though given that there isn't yet an agreed upon Basis module for
> Unicode what does your lexer generate in terms of strings?

Within the tool, REs are languages over Word32 "symbols" (with the
alphabet restricted to 0-127 for ASCII).

For the generated code, when lexing Unicode the plan is to have an
abstract notion of Unicode characters and a front-end translation from
a UTF-8-encoded character stream into those characters.  For now they
will probably just be Word32s.  If a better representation comes along
later, it should be easy to switch things over.
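
To make the front-end idea concrete, here is a minimal sketch of such
a translation in Standard ML.  It is only an illustration under stated
assumptions, not the code the lexer generator emits: the names UChar,
uchar, next, decode and Malformed are all hypothetical, input is taken
as an ordinary string of bytes, and overlong encodings and surrogate
code points are not rejected.

```sml
(* Hypothetical sketch: UChar, uchar, next, decode and Malformed are
   illustrative names, not part of the Basis or of any generated lexer. *)
structure UChar =
struct
  (* Unicode characters are plain Word32s for now; callers that only go
     through this structure would not notice a later change. *)
  type uchar = Word32.word

  exception Malformed

  (* Widen a byte to a Word32. *)
  fun w8to32 (b : Word8.word) : Word32.word =
    Word32.fromInt (Word8.toInt b)

  (* Continuation bytes have the form 10xxxxxx. *)
  fun isCont (b : Word8.word) = Word8.andb (b, 0wxC0) = 0wx80

  (* Fold n continuation bytes into acc, six payload bits at a time. *)
  fun cont (0, acc, rest) = (acc, rest)
    | cont (n, acc, b :: rest) =
        if isCont b
        then cont (n - 1,
                   Word32.orb (Word32.<< (acc, 0w6),
                               w8to32 (Word8.andb (b, 0wx3F))),
                   rest)
        else raise Malformed
    | cont (_, _, []) = raise Malformed

  (* Decode one character from the front of a byte list. *)
  fun next ([] : Word8.word list) : (uchar * Word8.word list) option = NONE
    | next (b :: rest) =
        if Word8.< (b, 0wx80)
          then SOME (w8to32 b, rest)                        (* 0xxxxxxx *)
        else if Word8.andb (b, 0wxE0) = 0wxC0
          then SOME (cont (1, w8to32 (Word8.andb (b, 0wx1F)), rest))
        else if Word8.andb (b, 0wxF0) = 0wxE0
          then SOME (cont (2, w8to32 (Word8.andb (b, 0wx0F)), rest))
        else if Word8.andb (b, 0wxF8) = 0wxF0
          then SOME (cont (3, w8to32 (Word8.andb (b, 0wx07)), rest))
        else raise Malformed    (* stray continuation or invalid lead byte *)

  (* Decode a whole UTF-8 encoded string into a list of characters. *)
  fun decode (s : string) : uchar list =
    let
      fun loop (bytes, acc) =
        case next bytes of
          NONE => rev acc
        | SOME (c, rest) => loop (rest, c :: acc)
    in
      loop (map Byte.charToByte (String.explode s), [])
    end
end
```

For example, UChar.decode "a\195\169" gives [0wx61, 0wxE9].  Since the
rest of a lexer would only see uchar values, swapping Word32 for a
better representation later would mean changing just this structure.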

Aaron