[MLton] type error messages

Stephen Weeks MLton@mlton.org
Wed, 15 Dec 2004 13:33:27 -0800


> John came up with a few signature/structure pairs where the error
> message reported could be improved.

I've thought about these, and in short, one was an easy bug to fix,
while the other two are hard.  First, the bug.

> Error: y.sml 6.15.
>   Type y admits equality in signature but not in structure.
>     not equality:

This spurious error was due to MLton treating primitive types with an
incorrect notion of equality (only affecting erroneous programs).  I
checked in a fix.

Now, for an explanation of how signature matching is implemented, and
why this makes the other two messages harder to fix.

Matching a structure against a signature ("fun transparentCut" in
elaborate-env.fun), is done in two stages.

 1. "Realize" all the flexible tycons in the signature with real types.
 2. Verify that each component of the structure matches the
    corresponding realized type in the signature.

For example, in 

structure S:
   sig
      type t
      val x: t
   end =
   struct
      type t = int
      val x = 13
   end

we choose that tycon t in the signature should correspond to int.  We
then verify that val x in the structure matches type t realized by
int, i.e. that the type of x is int.

This is all straightforward and leads to good error messages as long
as there is only one "name" for each flexible tycon in a signature.
However, sharing and where constraints as well as type definitions in
signatures can lead to multiple names for the same flexible tycon.
For example, in

signature SIG =
   sig
      type t
      type u = t
   end

there is only one flexible tycon, which has two names, t and u.  When
matching a structure S against SIG, MLton has to choose whether to use
the structure's definition of t or of u to realize this tycon.  Of
course, in a structure that matches, it won't matter, because t and u
will be equal.  But in a non-matching structure, different choices
will lead to different error messages.  MLton uses a simple rule in
deciding which names to use: chose the shortest name (in terms of
number of structure components), breaking ties alphabetically.  So,
when matching against the above signature, MLton will choose the
structure's definition of t to realize the flexible tycon.  Thus, for

structure S: SIG =
   struct
      type t = int
      type u = char
   end

we get the following error

  Type definition in structure disagrees with signature.
    type: u
    structure: [u]
    signature: [t]

This error message reflects MLton's choice of t for the flexible
tycon, as well as the other rule, which you correctly inferred, for
displaying tycon names: display the shortest name (in terms of number
of structure components), breaking ties by using the most recently
defined name.  

For type checking other than signature matching, most recently defined
is pretty easy to define.  For example, in

type t = int
val _ = "a": int

we will get the following error

Expression and constraint disagree.
    expects: [t]
    but got: [string]
    in: ("a"): int

This shows that the more recent definition of t is used instead of
int.

For signature matching, I decided that the most reasonable notion of
recent is to put the definitions made by the structure in scope -- see
lines 2459-2465 of elaborate-env.fun.  Going back to our earlier error

  Type definition in structure disagrees with signature.
    type: u
    structure: [u]
    signature: [t]

Type t in the structure is defined to be int, so MLton realizes type t
in the signature to be int.  Because type u = t, u is also realized
int.  Type u in the structure is defined to be char.  This doesn't
match type u in the signature, so we report an error.  To report the
types in the structure (char) and realized signature (int), we use the
shortest/closest definitions (u for char, t for int).

We could choose to not use the structure's definitions in naming
types, leading to the following error.

  Type definition in structure disagrees with signature.
    type: u
    structure: [int]
    signature: [char]

That looks great.  But it doesn't work so well when we define
datatypes in the signature.  For example

structure S: SIG =
   struct
      datatype t = T
      type u = int
   end

leads to the error

  Type definition in structure disagrees with signature.
    type: u
    structure: [int]
    signature: [?.S.t]

This is arguably less readable than

  Type definition in structure disagrees with signature.
    type: u
    structure: [u]
    signature: [t]

which says quite clearly that type u and t are not equal (remember
that brackets in type errors always indicate components that are not
equal).

Another nice thing about using the type names introduced in the
structure is that error messages regarding types in the signature
often look pretty much like the type in the signature.  For example,

structure S:
   sig
      type t
      val x: t
   end =
   struct
      type t = int
      val x = "a"
   end

gives the following error message

  Variable type in structure disagrees with signature.
    variable: x
    structure: [string]
    signature: [t]

Without introducing the definition for t, we would have

  Variable type in structure disagrees with signature.
    variable: x
    structure: [string]
    signature: [int]

Anyways, I don't really know which way is better.  You can comment out
the "openStructure" on line 2463 of elaborate-env.fun if you want to
try out a MLton that doesn't introduce the current structure's type
names into scope when reporting signature matching errors.  I'd be
interested to hear if people think that that produces better errors on
average.

As to your question

> When printing out an error about a particular type name, would it be
> possible to choose the scope not including the type name?

All the error messages for a structure/signature match use the same
scope, so this is tough.  It would be nice to use a per-error scope,
and go back and use the scope just before the definition of the type
name.  But I don't see an easy way to do that.  Checkpointing the
environment at each definition would probably be too costly.  We could
put in a special case hack to just avoid that one name by passing a
list of disallowed names to setTyconNames, but I dunno if that's a
good idea.

No matter what we do there will always be room for human explanation
of type errors.  I think it would be nice to add a TypeErrors page to
the wiki with a catalog of all type error messages along with example
programs that generate them.

It would also be nice to add some of the above explanation to the
Elaborate page on the wiki, as well as some of the more user-oriented
stuff to the TypeChecking page.  People shouldn't have to infer the
closest-definition rule.  If you understand how things work, the error
messages should be clearer.