[MLton] Re: [Sml-basis-discuss] Unicode and WideChar support

Aaron Turon adrassi@gmail.com
Tue, 29 Nov 2005 11:18:44 -0600


On 11/29/05, John Reppy <jhr@cs.uchicago.edu> wrote:
> I think that we'll have
>
>        val yytext : unit -> substring
>
> where UTF-8 is used to encode unicode characters.  We use substrings to
> avoid unnecessary copying and a function to be lazy about substring
> creation (our assumption is that compilers are better at eliminating
> unused local functions than unused calls to external functions that
> happen to be pure).

That sounds right.  On the other hand, we already do the work of
decoding UTF-8 during lexical analysis, so we might be able to offer
an additional value (say, yyutext) that yields the decoded substring.
That feature could probably wait until a standard Unicode
representation is established.

Aaron