[MLton] lexical curiosities

rossberg at mpi-sws.org
Wed May 25 02:46:52 PDT 2011


Matthew Fluet wrote:
>
> But, here's a curiosity.  Suppose that we were to define alphanumId
> and symId regular expressions that excluded the reserved words.  The
> behavior of a traditional lexer generator is to always choose the rule
> that matches the longest prefix of the current input.  So, the
> program:
>
>   val x = Int.val
>
> would be lexed as VAL, ID(x), EQUALS, LONGID(Int.va), ID(l); and the
> program:
>
>   val x = val.=>
>
> would be lexed as VAL, ID(x), EQUALS, ID(va), LONGID(l.=), ID(>).
> [The Definition allows "=" as a symbolic identifier, as a special
> exception to the "exclude reserved words" restriction.]
>
> Hard to say if there is a "best" way to handle reserved words in long
> identifiers.  (And, since such programs would be rejected for *some*
> reason, it isn't clear that it needs to be handled in the lexer.)  The
> Definition states "Comments and formatting characters separate items
> [of lexical analysis]", which might argue that "Int.val" cannot be
> lexed as LONGID(Int.va) followed by ID(l), though clearly we want
> "f(x)" to be lexed as ID(f), LPAREN, ID(x), RPAREN, despite there
> being no comments or formatting characters between items.  But,
> arguably, lexing "Int.val" as LONGID(Int.va) followed by ID(l) leads
> to even stranger static semantic errors than those presently reported
> by MLton and HaMLet.
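
The first example in the quoted text can be reproduced with a small longest-match lexer. The following is only a sketch in Python, not MLton's or HaMLet's lexer: the reserved-word table is a three-entry stand-in for the real list, and the names (RESERVED, classify, lex) are hypothetical. It tries every prefix from longest to shortest, which is slow but makes the "longest prefix wins" rule explicit.

```python
import re

# Assumption: a tiny stand-in for the Definition's full reserved-word list.
RESERVED = {"val": "VAL", "=": "EQUALS", "=>": "DARROW"}

ALNUM_RE = re.compile(r"[A-Za-z][A-Za-z0-9_']*")
SYM_RE = re.compile(r"[!%&$#+\-/:<=>?@\\~`^|*]+")


def is_alnum_id(s):
    # Alphanumeric identifier, with reserved words excluded.
    return bool(ALNUM_RE.fullmatch(s)) and s not in RESERVED


def is_sym_id(s):
    # Symbolic identifier; "=" is allowed despite being reserved.
    return bool(SYM_RE.fullmatch(s)) and (s == "=" or s not in RESERVED)


def is_long_id(s):
    # One or more structure identifiers, then a final identifier.
    *strids, last = s.split(".")
    return (bool(strids)
            and all(is_alnum_id(p) for p in strids)
            and (is_alnum_id(last) or is_sym_id(last)))


def classify(s):
    # Rule order breaks ties: reserved words first, then LONGID, then ID.
    if s in RESERVED:
        return RESERVED[s]
    if is_long_id(s):
        return "LONGID"
    if is_alnum_id(s) or is_sym_id(s):
        return "ID"
    return None


def lex(src):
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        # Longest match: try the longest candidate prefix first.
        for end in range(len(src), pos, -1):
            kind = classify(src[pos:end])
            if kind is not None:
                tokens.append((kind, src[pos:end]))
                pos = end
                break
        else:
            raise ValueError(f"no token matches at offset {pos}")
    return tokens
```

With these assumptions, lex("val x = Int.val") returns VAL, ID(x), EQUALS, LONGID(Int.va), ID(l): "Int.val" is not a LONGID because its last component is reserved, so the longest matching prefix is "Int.va".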

Interesting. I'll include that in the list of (pedantic) bugs in the
Definition.

My take on this is that the whole idea of treating longids as lexical
entities, as the Definition does, is stupid. I see no reason whatsoever to
do it that way(*). It is actually useful to be able to break up longids,
e.g. for line wrapping. IMHO, SML/NJ (and others who implement it similarly)
did the only sensible thing there.
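
To make the alternative concrete, here is a rough Python sketch of treating "." as a token of its own and assembling longids in the parser. It is an illustration under my own assumptions, not SML/NJ's actual lexer or grammar: the token set is cut down to alphanumeric identifiers and ".", "val" stands in for all reserved words, and the names lex and parse_longid are made up.

```python
import re

# Assumption: only alphanumeric identifiers and "." are tokens here,
# and "val" stands in for the whole reserved-word list.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z][A-Za-z0-9_']*)|(?P<DOT>\.))")
RESERVED = {"val"}


def lex(src):
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"cannot lex at offset {pos}")
        text = m.group(m.lastgroup)
        # Reserved words get their own token kind, e.g. ("VAL", "val").
        kind = text.upper() if text in RESERVED else m.lastgroup
        tokens.append((kind, text))
        pos = m.end()
    return tokens


def parse_longid(tokens):
    """Parse ID (DOT ID)* from the front of tokens; return the components."""
    parts, i = [], 0
    while True:
        if i >= len(tokens) or tokens[i][0] != "ID":
            found = tokens[i][1] if i < len(tokens) else "end of input"
            raise SyntaxError(f"expected identifier, found {found!r}")
        parts.append(tokens[i][1])
        i += 1
        if i < len(tokens) and tokens[i][0] == "DOT":
            i += 1  # consume the dot and expect another component
        else:
            return parts
```

In this sketch a longid may be wrapped across lines, since whitespace between tokens is skipped, and "Int.val" lexes as ID(Int), DOT, VAL, so the parser can report an unexpected reserved word instead of the lexer splitting it into "Int.va" and "l".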

/Andreas

(*) My guess is that the only reason they did it was so that they could
treat lookup of longids as a simple meta function over a lexical entity. But
that strikes me as a pretty lame reason.



