[MLton] Unicode / WideChar

Wesley W. Terpstra <terpstra@gkec.informatik.tu-darmstadt.de>
Sun, 20 Nov 2005 20:05:00 +0100



So, in response to Henry's talk of a new gtk library,
I have gotten off my ass wrt. Unicode (gtk requires
that all strings passed to it be in UTF-8).

Everything is going pretty smoothly, but I wanted to
get some feedback about some choices I have made.

First, the database of character properties is provided
in two files from unicode.org: UnicodeData.txt and
PropList.txt. They total about 1 MB, but compress to
about 150 KB. I think the right thing to do is to put
these files in the MLton svn repository, together with a
tool that parses them and generates an appropriate file
as part of the basis. The information we actually need
is much smaller, so the generated file won't be as large.

The why:
We can easily update to new versions of the
Unicode standard. 1 MB is not much. We already
generate some SML at build time via mlprof/mlyacc.
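A minimal sketch of the kind of generator proposed above, written in C++ to match the comparison program in this message rather than in SML. It assumes only the standard semicolon-separated UnicodeData.txt layout (field 0 the code point in hex, field 1 the name, field 2 the General_Category). The names `CategoryRange`, `splitFields`, `endsWith`, and `readCategoryRanges` are hypothetical and not part of MLton; a real tool would print SML source from these ranges instead of returning a vector.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One run of consecutive code points sharing a General_Category value.
struct CategoryRange {
    std::uint32_t first;
    std::uint32_t last;
    std::string category;  // two-letter value such as "Lu", "Ll", "Nd"
};

// Split a line on ';', keeping empty fields (UnicodeData.txt has many).
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = line.find(';', start);
        // When end == npos, substr simply takes the rest of the line.
        fields.push_back(line.substr(start, end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return fields;
}

// True if s ends with suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Merge adjacent code points with the same category into ranges.
// Large blocks such as CJK ideographs are stored as a "<..., First>" line
// followed by a "<..., Last>" line; every code point between them shares
// the category. Code points absent from the file are unassigned (Cn) and
// simply leave gaps between ranges.
std::vector<CategoryRange> readCategoryRanges(std::istream& in) {
    std::vector<CategoryRange> out;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> f = splitFields(line);
        if (f.size() < 3 || f[0].empty()) continue;  // skip blank/malformed lines
        auto cp = static_cast<std::uint32_t>(std::stoul(f[0], nullptr, 16));
        const std::string& cat = f[2];
        bool extendsPrev = !out.empty() && out.back().category == cat &&
                           (cp == out.back().last + 1 || endsWith(f[1], "Last>"));
        if (extendsPrev)
            out.back().last = cp;
        else
            out.push_back(CategoryRange{cp, cp, cat});
    }
    return out;
}
```

Merging adjacent entries is what keeps the generated output small: a First/Last block such as the CJK ideographs collapses to a single range.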

The next point of discussion is my take on the
is* functions of WideChar. A few of the relationships
guaranteed by the wording of the Basis specification
seem impossible to satisfy, so I comment on each
entry below.

I also compared against C++'s implementation, using
the following test program:

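#include <iostream>
#include <locale>
using namespace std;

// Print every code point the en_CA.UTF-8 locale classifies as space.
// Swap isspace for isdigit, isalpha, ispunct, ... to test other classes.
// Assumes a 32-bit wchar_t (as on Linux); the locale must be installed,
// otherwise the locale constructor throws std::runtime_error.
int main() {
   locale l("en_CA.UTF-8");
   for (unsigned long i = 0; i < 0x110000; ++i) {
     wchar_t x = static_cast<wchar_t>(i);
     if (isspace(x, l)) cout << i << ": yes\n";
   }
   return 0;
}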

> isAscii c
> returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord
> c <= 127. Note that this function is independent of locale.

Not a problem, implemented exactly as stated.

> toLower c
> toUpper c
> These return the lowercase (respectively, uppercase) letter
> corresponding to c if c is a letter; otherwise it returns c.

I found that there is a 'simple uppercase/lowercase'
mapping described in UnicodeData.txt. For characters
like the German ß, it simply provides no uppercase
mapping, since 'SS' would be two letters, not one, so
toUpper just returns ß unchanged.

So... more or less no problem.
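For reference, a sketch of reading those simple mappings, assuming the UnicodeData.txt convention that field 12 holds the simple uppercase mapping and field 13 the simple lowercase mapping (an empty field means no mapping). `splitFields`, `simpleMapping`, `simpleUpper`, and `simpleLower` are hypothetical helper names, and the code works on one raw line at a time purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Split a line on ';', keeping empty fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = line.find(';', start);
        fields.push_back(line.substr(start, end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return fields;
}

// Return the single code point in `field`, or the character itself when
// the field is empty. Multi-character mappings such as ß -> "SS" are kept
// separately in SpecialCasing.txt and are deliberately ignored here.
// Assumes a well-formed UnicodeData.txt line.
std::uint32_t simpleMapping(const std::string& line, std::size_t field) {
    std::vector<std::string> f = splitFields(line);
    auto self = static_cast<std::uint32_t>(std::stoul(f.at(0), nullptr, 16));
    if (field >= f.size() || f[field].empty()) return self;
    return static_cast<std::uint32_t>(std::stoul(f[field], nullptr, 16));
}

std::uint32_t simpleUpper(const std::string& line) { return simpleMapping(line, 12); }
std::uint32_t simpleLower(const std::string& line) { return simpleMapping(line, 13); }
```

With the ß entry (`00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;`) both mapping fields are empty, so `simpleUpper` returns U+00DF unchanged.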

> isLower c
> returns true if c is a lowercase letter.
> isUpper c
> returns true if c is an uppercase letter.

Easy; defined in Unicode.

> isAlpha c
> returns true if c is a letter (lowercase or uppercase).

Here we get into trouble. Unicode defines a letter
to be any of uppercase, lowercase, titlecase, modifier,
or other. 'Other' includes scripts like Japanese, which
have no concept of case. So my isAlpha returns true if
a code point is any kind of letter, which is a strictly
larger set than the union of isUpper and isLower.

This is one of those questionable decisions. :-)
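A sketch of the category tests this implies, assuming isUpper and isLower are keyed on the General_Category values Lu and Ll (Unicode also has derived Uppercase/Lowercase properties that add Other_Uppercase/Other_Lowercase from PropList.txt, which would be a different choice). The `*Cat` function names are hypothetical; they take the two-letter category string from field 2 of UnicodeData.txt.

```cpp
#include <string>

// Lu = uppercase letter, Ll = lowercase letter.
bool isUpperCat(const std::string& gc) { return gc == "Lu"; }
bool isLowerCat(const std::string& gc) { return gc == "Ll"; }

// Any category starting with 'L' is a letter: Lu, Ll, Lt (titlecase),
// Lm (modifier) and Lo (other, e.g. kana and CJK ideographs). This accepts
// strictly more than isUpperCat || isLowerCat.
bool isAlphaCat(const std::string& gc) { return !gc.empty() && gc[0] == 'L'; }
```

For example, a kana character (category Lo) is accepted by `isAlphaCat` but by neither case test.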

> isAlphaNum c
> returns true if c is alphanumeric (a letter or a decimal digit).
> isDigit c
> returns true if c is a decimal digit [0-9].

Here there are more problems.

To me, alphanumeric means a letter or a symbol
representing a number, and there are more number
symbols than just the decimal digits.

So I took isAlphaNum to include everything Unicode
classifies as a letter or a number.

I took isDigit to mean characters that are digits
in a decimal (0-9) counting system. But Unicode also
has number characters outside such a system, like the
Tibetan 'three and a half', which has its own symbol.

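The problem is that isAlphaNum then accepts more than
isAlpha || isDigit, although the specification defines
it as exactly 'a letter or a decimal digit'.

If I expanded isDigit to include non-decimal numbers,
this problem would go away, but that would ignore the
[0-9] part of the specification.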

Note that C++ does NOT count anything but
ASCII [0-9] as isdigit. However, it classifies
the Tibetan '9' as a letter. WTF?!

This might be because it is using an English
locale...
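The two readings of isAlphaNum, sketched over General_Category values: Nd is a decimal digit, Nl a letter number (e.g. Roman numerals), and No any other number (fractions, superscripts, the Tibetan half digits). The function names are hypothetical; the sketch only illustrates that the letter-or-number reading accepts Nl and No, which the letter-or-decimal-digit reading rejects.

```cpp
#include <string>

bool isDigitCat(const std::string& gc) { return gc == "Nd"; }
bool isAlphaCat(const std::string& gc) { return !gc.empty() && gc[0] == 'L'; }

// Reading used above: any letter (L*) or any number (Nd, Nl, No).
bool isAlphaNumCat(const std::string& gc) {
    return isAlphaCat(gc) || (!gc.empty() && gc[0] == 'N');
}

// Literal Basis reading: a letter or a decimal digit, nothing else.
bool isAlphaNumStrict(const std::string& gc) {
    return isAlphaCat(gc) || isDigitCat(gc);
}
```

Any code point with category Nl or No is exactly where the two readings disagree.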

> isHexDigit c
> returns true if c is a hexadecimal digit [0-9a-fA-F].

Unicode has two properties for this: Hex_Digit and
ASCII_Hex_Digit. I took it to mean Hex_Digit, which
also includes a few alternate characters (the fullwidth
forms of 0-9, A-F and a-f).

C++ with English locale only counts ASCII_Hex_Digit.
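A sketch of both properties as range tables, with the ranges copied from the Hex_Digit and ASCII_Hex_Digit entries of PropList.txt (they should be checked against, or better generated from, the version actually shipped). The table and function names are hypothetical, and the linear scan is only for brevity.

```cpp
#include <cstddef>
#include <cstdint>

struct Range { std::uint32_t first, last; };

// ASCII_Hex_Digit: 0-9, A-F, a-f.
const Range kAsciiHexDigit[] = {
    {0x0030, 0x0039}, {0x0041, 0x0046}, {0x0061, 0x0066}};

// Hex_Digit: the same, plus fullwidth 0-9, A-F, a-f.
const Range kHexDigit[] = {
    {0x0030, 0x0039}, {0x0041, 0x0046}, {0x0061, 0x0066},
    {0xFF10, 0xFF19}, {0xFF21, 0xFF26}, {0xFF41, 0xFF46}};

// True if cp falls inside any [first, last] range of the table.
template <std::size_t N>
bool inRanges(const Range (&table)[N], std::uint32_t cp) {
    for (const Range& r : table)
        if (r.first <= cp && cp <= r.last) return true;
    return false;
}

bool isAsciiHexDigit(std::uint32_t cp) { return inRanges(kAsciiHexDigit, cp); }
bool isHexDigit(std::uint32_t cp) { return inRanges(kHexDigit, cp); }
```

U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) is where the two choices differ: `isHexDigit` accepts it, `isAsciiHexDigit` does not.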

> isCntrl c
> returns true if c is a control character.

Not a problem; Unicode defines this.

> isPrint c
> returns true if c is a printable character (space or visible),
> i.e., not a control character.

Printable is not defined by Unicode, but control
character is. So isPrint is simply the opposite of isCntrl.

> isSpace c
> returns true if c is a whitespace character (space, newline, tab,
> carriage return, vertical tab, formfeed).

Not a problem; PropList.txt describes these.

> isGraph c
> returns true if c is a graphical character, that is, it is
> printable and not a whitespace character.

Again, ok, just follow the rules.
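Putting the last four entries together, a sketch assuming isCntrl means General_Category Cc and isSpace means the White_Space property from PropList.txt. The White_Space list below is hard-coded only for illustration and should be regenerated from the file in use; all names are hypothetical.

```cpp
#include <cstdint>

// Cc: C0 controls, DEL, and C1 controls.
bool isCntrl(std::uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// White_Space as listed in a recent PropList.txt (verify before use).
bool isSpace(std::uint32_t cp) {
    return (cp >= 0x09 && cp <= 0x0D) || cp == 0x20 || cp == 0x85 ||
           cp == 0xA0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F || cp == 0x205F ||
           cp == 0x3000;
}

// Printable is not a Unicode property, so it is defined as "not control".
bool isPrint(std::uint32_t cp) { return !isCntrl(cp); }

// Graphical: printable and not whitespace.
bool isGraph(std::uint32_t cp) { return isPrint(cp) && !isSpace(cp); }
```

Tab, newline, and NEL (U+0085) are both Cc and White_Space, so they are neither printable nor graphical, while U+3000 IDEOGRAPHIC SPACE is printable but not graphical.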

> isPunct c
> returns true if c is a punctuation character: graphical but not
> alphanumeric.

Here we get a problem.
Taken literally, the specification says that

  isPunct = isGraph && !isAlphaNum
          = (isPrint && !isSpace) && !isAlphaNum
          = !isCntrl && !isSpace && !isAlphaNum

... which would mean symbols, marks, and punctuation.
Punctuation is obvious.

Marks include characters that modify an adjacent
character, e.g. COMBINING DOT ABOVE (U+0307), which,
when placed after the letter 'g', produces a g with a
dot on top.

Is this punctuation?

And what about symbols, which include all the math
symbols, arrows, currency signs, etc.?

I lean towards declaring symbols as punctuation.
Marks, I am not so sure about.

For what it's worth, C++ calls math symbols
punctuation. Combining marks too.
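A sketch of isPunct with the two open questions made explicit as flags, assuming General_Category prefixes P (punctuation), S (symbols), and M (marks). `PunctPolicy` and `isPunctCat` are hypothetical names. Because it enumerates categories instead of computing the literal complement !isCntrl && !isSpace && !isAlphaNum, it also leaves out format (Cf), private-use (Co), surrogate (Cs), and unassigned (Cn) code points, which the literal complement would include.

```cpp
#include <string>

// Which non-punctuation categories should count as punctuation.
struct PunctPolicy {
    bool symbols;  // Sm, Sc, Sk, So: math, currency, modifier, other symbols
    bool marks;    // Mn, Mc, Me: combining marks such as U+0307
};

bool isPunctCat(const std::string& gc, PunctPolicy p) {
    if (gc.empty()) return false;
    switch (gc[0]) {
        case 'P': return true;       // punctuation proper always counts
        case 'S': return p.symbols;
        case 'M': return p.marks;
        default:  return false;      // letters, numbers, separators, C*
    }
}
```

`PunctPolicy{true, true}` mirrors what the C++ comparison reported (math symbols and combining marks both count), while `PunctPolicy{true, false}` is the "symbols yes, marks unsure" option.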

One last thing about the implementation:
I don't intend to use a lookup table like char0.sml.
With 0x110000 code points, that would be a big lookup table.
Also, lookup tables are only fast if they stay in the
cache, so I think a simple binary search over the
ranges makes the most sense. (Unicode groups characters
with the same properties into contiguous ranges for us.)
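A sketch of that lookup, assuming the generator emits each property as a sorted vector of non-overlapping [first, last] ranges. It uses `std::lower_bound` with a comparator on each range's upper end; `Range` and `inTable` are hypothetical names. For scale, a flat table over all 0x110000 code points has 1,114,112 entries per property, whereas Hex_Digit, for example, is just six ranges.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sorted, non-overlapping [first, last] runs, as a generator would emit them.
struct Range { std::uint32_t first, last; };

// Find the first range whose upper end is >= cp, then check that cp is
// not below its lower end. O(log n) in the number of ranges.
bool inTable(const std::vector<Range>& table, std::uint32_t cp) {
    auto it = std::lower_bound(
        table.begin(), table.end(), cp,
        [](const Range& r, std::uint32_t value) { return r.last < value; });
    return it != table.end() && it->first <= cp;
}
```

Code points that fall into a gap between ranges land on the next range up, fail the `first <= cp` check, and are correctly reported as absent.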


--Apple-Mail-5--412522420
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=ISO-8859-1

<HTML><BODY style=3D"word-wrap: break-word; -khtml-nbsp-mode: space; =
-khtml-line-break: after-white-space; ">So, in response to Henry's talk =
of a new gtk library,<DIV>I have gotten off my ass wrt. Unicode. (gtk =
requires</DIV><DIV>that all passed strings be in UTF-8).</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>Everything is going pretty =
smoothly, but I wanted to</DIV><DIV>get some feedback about some choices =
I have made.</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>First, the database of =
these properties is provided</DIV><DIV>in two files from unicode.org: =
UnicodeData.txt and</DIV><DIV>PropList.txt. They total about 1M, but =
compress to</DIV><DIV>about 150K. I think the right thing to do is to =
put</DIV><DIV>these files inside the mlton svn with a tool that =
can</DIV><DIV>parse them and output an appropriate file as =
part</DIV><DIV>of the basis. The actual information we need =
is</DIV><DIV>much smaller, so the output won't be as =
large.</DIV><DIV><BR class=3D"khtml-block-placeholder"></DIV><DIV>The =
why:</DIV><DIV>We can update easily to new versions of =
the</DIV><DIV>Unicode standard. 1M is not much. We =
already</DIV><DIV>build some dynamic sml via =
mlprof/mlyacc.</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>The next point of =
discussion is my take on the</DIV><DIV>is* fields of WideChar. A few of =
the relationships</DIV><DIV>guaranteed by the wording of the standard =
seem</DIV><DIV>to be impossible, so I will comment in each =
entry</DIV><DIV>below.</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>I also compare with C++'s =
implementation, with</DIV><DIV>the following test program:</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>#include =
&lt;locale&gt;</DIV><DIV>#include &lt;iostream&gt;</DIV><DIV>using =
namespace std;</DIV><DIV><BR =
class=3D"khtml-block-placeholder"></DIV><DIV>int main() {</DIV><DIV>=A0 =
locale l("en_CA.UTF-8");</DIV><DIV>=A0 for (wchar_t x =3D 0; x &lt; =
0x110000; ++x) {</DIV><DIV>=A0=A0 =A0if (isspace(x, l)) cout &lt;&lt; =
int(x) &lt;&lt; ": yes\n";</DIV><DIV>=A0 }</DIV><DIV>=A0 return =
0;</DIV><DIV>}</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; font: normal normal normal =
16px/normal Times; min-height: 19px; "><BR></DIV><DIV style=3D"margin-top:=
 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
"><BLOCKQUOTE type=3D"cite"><DIV style=3D"margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 13px;">isAscii =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a (seven-bit) ASCII character, =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>i.e.</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;">, 0 &lt;=3D </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;">ord</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> &lt;=3D 127. Note that this function is =
independent of locale.</SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><BR class=3D"khtml-block-placeholder"></DIV>Not a =
problem, implemented exactly as stated.</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BLOCKQUOTE =
type=3D"cite"><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">toLower </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 13px;">toUpper =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">These return the =
lowercase (respectively, uppercase) letter corresponding to =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> if </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a letter; otherwise it returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;">.</SPAN></FONT><FONT class=3D"Apple-style-span"=
 face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><BR class=3D"khtml-block-placeholder"></DIV>I found =
that there is a 'simple uppercase/lowercase'</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">mapping described in UnicodeData.txt. For =
things</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; ">like the German =DF, it simply =
provides no uppercase</DIV><DIV style=3D"margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; ">since 'SS' would be two =
letters, not one.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">So... more or =
less no problem.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR><BLOCKQUOTE type=3D"cite"><DIV=
 style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isLower </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a lowercase letter.</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><BLOCKQUOTE type=3D"cite"><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isUpper </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is an uppercase letter.</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>Easy; defined in =
Unicode.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR><BLOCKQUOTE type=3D"cite"><DIV=
 style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isAlpha </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a letter (lowercase or =
uppercase).</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Here we get =
into trouble. Letter is defined in Unicode</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">to be =
uppercase, lowercase, titlecase, modifier, other.</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">Other includes things like Japanese which have =
no</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: =
0px; margin-left: 0px; ">concept of case. So, isAlpha returns if a code =
point is</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; ">a letter, which is more than the =
union of isUpper/Lower.</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">This is one =
of those questionable decisions. :-)</DIV><BR><BLOCKQUOTE =
type=3D"cite"><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">isAlphaNum </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is alphanumeric (a letter or a decimal =
digit).</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><BLOCKQUOTE type=3D"cite"><FONT=
 class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 13px;">isDigit =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a decimal digit [</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;">0</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">-</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">9</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">].</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>Here there's more =
problems.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">alphanumeric =
to me means that it is a letter or a</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">symbol =
representing a number. There are more</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">number =
symbols than simply those in decimal.</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I=A0 took =
this to include all letters and numbers as</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">defined =
as letters and numbers in unicode.</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">isDigit I =
took to mean things that correspond to</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">a =
decimal (0-9) counting system. There are</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
">characters in Unicode like the Tibetan 3&amp;1/2,</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">which has it's own symbol.</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><BR class=3D"khtml-block-placeholder"></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">If I expand isDigit to include non-decimal</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">numbers, that this problem goes away, but =
it</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: =
0px; margin-left: 0px; ">ignores the [0-9] part of the =
standard.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">The problem =
is=A0alphanum &gt; digit + alpha</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I note that =
C++ does NOT count anything</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">but ascii =
[0-9] as isDigit. However, it includes</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">the =
Tibetan '9' as a letter. WTF?!</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">This might be =
because it is using an English</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
">locale...</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR><BLOCKQUOTE type=3D"cite"><DIV=
 style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isHexDigit </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a hexadecimal digit [</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;">0</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">-</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">9a</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">-</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">fA</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">-</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">F</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">].</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>There are two definitions of =
this: Hex_Digit and</DIV><DIV style=3D"margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; ">ASCII_Hex_Digit. I took it =
to mean Hex_Digit which</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">includes a =
few alternate characters.</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">C++ with =
English locale only counts ASCII_Hex_Digit.</DIV><DIV style=3D"margin-top:=
 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BLOCKQUOTE =
type=3D"cite"><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">isCntrl </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a control character.</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>Not a problem; Unicode defines =
this (General_Category Cc).</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR><BLOCKQUOTE type=3D"cite"><DIV=
 style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isPrint </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a printable character (space or visible), =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>i.e.</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;">, not a control character.</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>Printable is not defined by =
Unicode, but</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; ">control character is. So, the =
opposite.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR><BLOCKQUOTE type=3D"cite"><DIV=
 style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">isSpace </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;"><I>c</I></SPAN></FONT></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;">returns </SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">true</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;"> if =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a whitespace character (space, newline, =
tab, carriage return, vertical tab, formfeed).</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV>Not a problem; PropList.txt =
describes these (the White_Space property).</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BLOCKQUOTE =
type=3D"cite"><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><FONT class=3D"Apple-style-span" =
face=3D"Courier" size=3D"3"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 13px;">isGraph </SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a graphical character, that is, it is =
printable and not a whitespace character.</SPAN></FONT><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Again fine; =
just follow the definition.</DIV></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
"><BR><BLOCKQUOTE type=3D"cite"><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Courier" size=3D"3"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 13px;">isPunct =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;"><I>c</I></SPAN></FONT></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT =
class=3D"Apple-style-span" face=3D"Times" size=3D"4"><SPAN =
class=3D"Apple-style-span" style=3D"font-size: 16px;">returns =
</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Courier" =
size=3D"3"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
13px;">true</SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: 16px;"> =
if </SPAN></FONT><FONT class=3D"Apple-style-span" face=3D"Times" =
size=3D"4"><SPAN class=3D"Apple-style-span" style=3D"font-size: =
16px;"><I>c</I></SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: 16px;"> is a punctuation character: graphical but =
not alphanumeric.</SPAN></FONT><FONT class=3D"Apple-style-span" =
face=3D"Times" size=3D"4"><SPAN class=3D"Apple-style-span" =
style=3D"font-size: =
16px;">=A0</SPAN></FONT></DIV></BLOCKQUOTE><BR></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">Here we get a problem.</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Taken =
together, the definitions say that isPunct =3D</DIV><DIV =
style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
">graphical &amp;&amp; !alphanum =3D</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">(printable =
&amp;&amp; !space) &amp;&amp; !alphanum =3D</DIV><DIV style=3D"margin-top:=
 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
">!control &amp;&amp; !space &amp;&amp; !alphanum</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; "><BR class=3D"khtml-block-placeholder"></DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">... which would include symbols, =
marks, and punctuation.</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Punctuation =
obviously qualifies.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Marks include =
characters which modify the preceding</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =
">character, e.g. 'Combining dot above', which, placed</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">after a letter 'g', makes a letter g with a dot =
on top.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Is this =
punctuation?</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">What about =
symbols, which include the math operators,</DIV><DIV =
style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">arrows, =
currency signs, etc.</DIV><DIV style=3D"margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I lean =
towards declaring symbols as punctuation.</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Marks, I =
am not so sure about.</DIV><DIV style=3D"margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">For what it's =
worth, C++ classifies math symbols</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">as =
punctuation, and combining marks too.</DIV><DIV style=3D"margin-top: 0px; =
margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">One last =
thing about the implementation:</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I don't =
intend to use a lookup table like char0.sml.</DIV><DIV =
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; =
margin-left: 0px; ">With code points up to 10FFFF (over a =
million), that's a big lookup =
table.</DIV><DIV style=3D"margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; ">Also, lookup tables are only =
fast if they are in the</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">cache, so I =
think a simple binary search over the</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">sorted =
ranges makes most sense (Unicode groups</DIV><DIV style=3D"margin-top: =
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">these =
things into ranges for us).</DIV><DIV style=3D"margin-top: 0px; =
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><BR =
class=3D"khtml-block-placeholder"></DIV></BODY></HTML>=

--Apple-Mail-5--412522420--
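Below is a minimal C++ sketch of the range-table lookup proposed above, under stated assumptions. Each property is stored as a sorted array of closed code-point ranges and queried with std::upper_bound. isPrint, isGraph and isPunct are then derived by chaining the Basis definitions literally.

Several things are assumptions. The names Range, inRanges and the k* tables are hypothetical, and C++ stands in for the SML that would actually go into the basis. The White_Space and Cc tables reflect current Unicode versions. kAlphaNumStub is a deliberately truncated stand-in for data that would be generated from UnicodeData.txt.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// A closed interval [lo, hi] of code points that share one property.
struct Range {
  char32_t lo;
  char32_t hi;
};

// Every table must be sorted by lo, with non-overlapping ranges.

// White_Space as listed in PropList.txt of current Unicode versions.
constexpr Range kWhiteSpace[] = {
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085},
    {0x00A0, 0x00A0}, {0x1680, 0x1680}, {0x2000, 0x200A},
    {0x2028, 0x2029}, {0x202F, 0x202F}, {0x205F, 0x205F},
    {0x3000, 0x3000},
};

// General_Category Cc (control characters) from UnicodeData.txt.
constexpr Range kControl[] = {{0x0000, 0x001F}, {0x007F, 0x009F}};

// HYPOTHETICAL stand-in for letters and digits: ASCII plus the Latin-1
// letters only. A real table would be generated from UnicodeData.txt.
constexpr Range kAlphaNumStub[] = {
    {0x0030, 0x0039}, {0x0041, 0x005A}, {0x0061, 0x007A},
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
};

// Binary search over the ranges: O(log n) in the number of ranges,
// independent of how many code points each range covers.
template <std::size_t N>
bool inRanges(char32_t c, const Range (&table)[N]) {
  // First range whose lower bound is strictly greater than c.
  const Range* it = std::upper_bound(
      std::begin(table), std::end(table), c,
      [](char32_t value, const Range& r) { return value < r.lo; });
  if (it == std::begin(table)) return false;  // c lies below every range
  --it;                                       // last range with lo <= c
  return c <= it->hi;
}

bool isSpace(char32_t c) { return inRanges(c, kWhiteSpace); }
bool isCntrl(char32_t c) { return inRanges(c, kControl); }
bool isAlphaNum(char32_t c) { return inRanges(c, kAlphaNumStub); }

// Derived predicates, chained exactly as the Basis wording describes.
bool isPrint(char32_t c) { return !isCntrl(c); }
bool isGraph(char32_t c) { return isPrint(c) && !isSpace(c); }
bool isPunct(char32_t c) { return isGraph(c) && !isAlphaNum(c); }
```

Because isPunct reduces to !control && !space && !alphanum here, U+2200 FOR ALL (a math symbol) and U+0307 COMBINING DOT ABOVE both come out as punctuation. That matches the C++ behaviour described above. Excluding marks would mean subtracting a separate table of Mark ranges inside isPunct.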