[MLton] unicode

Matthew Fluet fluet@cs.cornell.edu
Fri, 9 Sep 2005 13:16:09 -0400 (EDT)


> I've found this by accident:
>
>   http://srfi.schemers.org/srfi-75/
>
> How is the state of the art for the support of Unicode in sml, 
> especially mlton?

There is no real support for Unicode in the Definition of Standard ML;
there are a few throw-away sentences stating things along the lines of 
"ASCII must be a subset of the character set in programs", but that hardly 
constitutes support.

Neither is there real support for Unicode in the Standard ML Basis 
Library.  The general consensus (which includes the opinions of the 
editors of the Basis Library) is that the LargeChar structure is 
insufficient for the purposes of Unicode.

MLton has some preliminary support for 16- and 32-bit characters and 
strings.  It is even possible to include arbitrary Unicode characters in 
32-bit strings using a \Uxxxxxxxx escape sequence.  (This longer escape 
sequence is a minor extension to the Definition, which only allows 
\uxxxx.)  This is by no means completely satisfactory in terms of support 
for Unicode, but it is what is currently available.
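
To make the difference between the two escape forms concrete, here is a
rough sketch.  It is written in Python rather than SML, purely as an
illustration: Python's string literals happen to use the same \uxxxx and
\Uxxxxxxxx spellings.  The helper name fits_in_bits is made up for this
example and has nothing to do with MLton's implementation.

```python
# Illustration only: Python, not SML.  Python string literals use the same
# two escape spellings, which makes the size argument easy to check.

def fits_in_bits(code_point: int, bits: int) -> bool:
    """Return True if code_point fits in an unsigned field of `bits` bits.

    Hypothetical helper for this example; not part of any library.
    """
    return 0 <= code_point < (1 << bits)

# Four hex digits can name code points up to U+FFFF only.
bmp_char = "\u00e9"          # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Code points above U+FFFF need the eight-digit form.
astral_char = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF

bmp_cp = ord(bmp_char)
astral_cp = ord(astral_char)

# The first character fits in a 16-bit character type; the second does
# not, but it does fit in a 32-bit one.
bmp_fits_16 = fits_in_bits(bmp_cp, 16)
astral_fits_16 = fits_in_bits(astral_cp, 16)
astral_fits_32 = fits_in_bits(astral_cp, 32)
```

The point is only that four hex digits stop at U+FFFF, so a 32-bit
character type needs a longer escape to reach the rest of Unicode, whose
largest code point is U+10FFFF.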

There are periodic flurries of questions and discussion about Unicode in 
SML/MLton.  The most recent, which did lead to some seemingly sound design 
decisions, was last December:

The discussion started at:
   http://mlton.org/pipermail/mlton/2004-December/026396.html

Stephen posted a good summary of points at:
   http://mlton.org/pipermail/mlton/2004-December/026440.html

and the discussion continued.