Fwd: [MLton-user] unicode again

Matthew Fluet matthew.fluet at gmail.com
Thu Apr 29 20:11:48 PDT 2010


On Thu, Apr 29, 2010 at 9:21 PM, Sean McLaughlin <seanmcl at gmail.com> wrote:
> Hi,
>   I can't figure out how to use the \U syntax to enter unicode strings.  For
> example
> from a former post about unicode:
>>
>> MLton supports \Uxxxxxxxx escape sequences for describing characters with
>> ordinal value greater than 2^16.  (The SML Definition allows \uxxxx.)
>> Note that the overload resolution depends on how the value is used, not
>> the constant that defines it; so, you might need a WideString.string
>> constraint.
>>
>
> Say I want to enter the unicode forall (∀).  The unicode hex value is
> E28880.  If I translate this to the 8 bit escape sequence I can print it:
> val _ = print "\226\136\128"
> prints ∀
> This conversion is a bit painful though.  The unicode point value is \U2200,
> but I can't get MLton to compile either of these:
> val _ = print "\U2200"
> val _ = print (WideString.toString ("\U2200" : WideString.string))
> What am I doing wrong?

The "\Uxxxxxxxx" syntax requires exactly 8 hex digits, so "\U2200" is
rejected by the lexer with the error "Illegal string escape."

However, neither
  val _ = print "\U00002200"
nor
  val _ = print (WideString.toString ("\U00002200" : WideString.string))
will print the forall glyph.

The former fails to compile, because print takes a String.string,
whose characters are only 8 bits wide:
  Error: z.sml 1.15.
    Character too big: #"\u2200".

The latter will compile, but when run it yields the escaped form
"\u2200"
rather than the glyph, because WideString.toString produces a
String.string that can be lexed back to the WideString.string.  (See
the STRING.toString specification in the Basis Library.)

MLton doesn't currently have any support for converting between
encodings.  A WideString.string corresponds to UTF-32, and, from your
example, you would like to print UTF-8.  It shouldn't be terribly
difficult to write a function that converts a UTF-32 WideString.string
to a UTF-8 String.string for printing.
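
As a rough illustration of the conversion described above, here is a
minimal sketch.  The names encodeCodePoint and wideToUtf8 are made up
for this example and are not part of MLton or the Basis Library.  It
assumes that WideChar.ord returns the Unicode code point of each
character and that the default int and word types are at least 21 bits
wide.

```sml
(* Hypothetical helper: encode one Unicode code point as UTF-8 bytes. *)
fun encodeCodePoint (cp : int) : string =
  let
    val w = Word.fromInt cp
    (* Turn a value in the range 0..255 into a one-character string. *)
    fun byte (x : word) = String.str (Char.chr (Word.toInt x))
    (* Continuation byte: 10xxxxxx, carrying the 6 bits at `shift`. *)
    fun cont shift =
      byte (Word.orb (0wx80, Word.andb (Word.>> (w, shift), 0wx3F)))
    (* Leading byte: length prefix plus the remaining high bits. *)
    fun lead (prefix, shift) = byte (Word.orb (prefix, Word.>> (w, shift)))
  in
    if cp < 0 orelse cp > 0x10FFFF
       orelse (0xD800 <= cp andalso cp <= 0xDFFF)
    then raise Fail "encodeCodePoint: not a Unicode scalar value"
    else if cp < 0x80 then byte w
    else if cp < 0x800 then lead (0wxC0, 0w6) ^ cont 0w0
    else if cp < 0x10000 then lead (0wxE0, 0w12) ^ cont 0w6 ^ cont 0w0
    else lead (0wxF0, 0w18) ^ cont 0w12 ^ cont 0w6 ^ cont 0w0
  end

(* Hypothetical wrapper: convert a whole WideString.string.
   Assumes WideChar.ord yields the code point of each character. *)
fun wideToUtf8 (ws : WideString.string) : string =
  String.concat
    (List.map (encodeCodePoint o WideChar.ord) (WideString.explode ws))

(* The parameter type of wideToUtf8 resolves the literal's overloading. *)
val _ = print (wideToUtf8 "\U00002200" ^ "\n")
```

For the code point 0x2200, encodeCodePoint produces the three bytes
226, 136, 128, which is the same sequence as the hand-written
"\226\136\128" in the original question.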
