[MLton-devel] cvs commit: new front end

Fri, 17 Oct 2003 16:28:33 -0700

> I know they are at the bottom of the list, but I find the use of "_"
> (dontCare) in error messages a little confusing,

Let me explain the motivation for _ and exactly what it means.  My
biggest gripe with SML/NJ type error messages is that I have
difficulty finding exactly what parts disagree of the two types that
fail to unify, and this gets worse as types get bigger.  I *hate*
digging through two large record types trying to find the one field
that differs, especially when it was just a typo in a field name.  The
fact that I use t for every type and that they don't print longtycons
very well makes this even worse.  Sometimes types that are different
are printed exactly the same.

The idea behind the use of _ is to show exactly where the two types
are different.  This is an if and only if.

1. If the two types differ in some component, then something other
than _ is shown.

2. If a _ is shown for two types in some component, then you know that
those components do unify.

I just want to make sure that you understand that if you see

	a -> _
 	b -> _

Then you know that the result types agree, not just that the unifier
never bothered to look at them.  This is unlike SML/NJ, which once it
finds an error doesn't compare the rest of the types.

Given this, I'd like to clarify what I think you mean by

> especially when fixing the immediate type error can simply require
> fixing another one down the line:

I think you're referring to things that don't unify like (_ * _) vs (_ *
_ * _), where you don't get any useful information about the
components and that once you get the right number of components, then
you will have to worry about their types.  The tradeoff is that once
you start filling the components it becomes difficult to see where the
problem is, possibly even difficult to see that it is a 2-tuple versus
a 3-tuple.

> [fluet@nonnel temp]$ cat z.sml
> datatype t = T of int -> int
> 
> val x = T ()
> val x = T ord
> 
> 
> datatype 'a u = U of 'a -> 'a
> 
> val x = U ()
> val x = U ord
> [fluet@nonnel temp]$ mlton -stop tc z.sml
> Error: z.sml 3.9: function applied to incorrect argument
>    expects: _ -> _
>    but got: unit
>    in: T ()

> It's especially the fact that the expects in the first two messages is
> definitely known to be  int -> int; 

Sure, but the point that is made very clear by using _ -> _ is that it
the argument wasn't even of arrow type.  We could display the error as

  Error: z.sml 3.9: function applied to incorrect argument
     expects: int -> int
     but got: unit
     in: T ()

I agree that in this case it isn't confusing to fill in the _.  The
tradeoff seems to change as types get bigger.

> Error: z.sml 4.9: function applied to incorrect argument
>    expects: int -> _
>    but got: char -> _
>    in: T ord

> likewise, with ord, the got is definitely char -> int.

Again, I agree that filling in the _ isn't confusing.

  Error: z.sml 4.9: function applied to incorrect argument
     expects: int -> int
     but got: char -> int
     in: T ord

> Error: z.sml 10.9: function applied to incorrect argument
>    expects: _ -> char
>    but got: _ -> int
>    in: U ord
> compilation aborted: elaborate reported errors

> The last message is especially confusing, since
> having expected  _ -> char means that 'a has been pinned as a char.

Yeah, this is also bad because the error depends on the order in which
unification was done.  It could just as easily be

  Error: z.sml 10.9: function applied to incorrect argument
     expects: int -> _
     but got: char -> _
     in: U ord
  compilation aborted: elaborate reported errors

So, overall I agree with you that in these cases filling in the types
is no worse, and might be better (although I've already internalized
the simple rule for _ and find it quite easy to jump to the
difference).  The tradeof is that using the _ makes it easier to spot
the difference but may require more reading of the code to find what
the _ are.

Here are some ideas that I take from your mail.

* There is sometimes an asymmetry between the two types where you may
want to see more of one than the other, e.g for the "expects" in
function application.

* It's weird to show a type variable instantiated in one component, but
not another.

* In small types, the _ doesn't buy you much.

We could put in some hacks to show more of the types to address these
issues.  For example, we could only use _ if the type it replaces
requires more than 10, say, characters to display.  Or we could always
display the entire type if it fits on a line.  Or we could always go 2
levels deep before inserting _ instead of just showing the 1 outermost
constructor.

I am a little worried about tweaking the messages too much for small
synthetic cases to the detriment of messages in larger programs.  I
have been using MLton it to type check itself for the last week or so
and I am much happier with the _ than I am with SML/NJ's messages,
even with the functor stuff not there yet.  MLton's messages are
especially better with the large record types, where it just shows the
fields that are different.  Hmmm, maybe different type constructors
should be treated differently.

My biggest worry is that tweaks will cloud the simple rule for _.
Right now it is dead simple to find where the disagreements are.
True, there may be disagreements at a deeper level once you fix the
outer ones.  But again, those will be easy to find.  Once we start
printing out parts that are the same, the programmer has to start
searching to find the parts that are different.  Maybe there is some
other way to display more of the types while still making it visually
easy to see the parts that are different.

Le me know if you have any thoughts on intermediate approaches between
MLton and SML/NJ that still have simple explanations to the programmer
(and hopefully simple implementations).  I'd summarize the current
approaches as

MLton: show every difference that you can find in the two types

SML/NJ: show everything you know about the two types

Hmmm.  After writing this, here is another idea.  _ in MLton is being
used in two ways.  Suppose we are unifying

	(real * real) * int * char
with	(real * int * bool) * int * string

Then we will see

	(_ * _) * _ * char
	(_ * _ * _) * _ * string

So the _'s in the inner tuple mean that we didn't even get a chance to
unify these types, since an outer type constructor was not unifiable,
while the _'s in the second component of the outer tuple mean that
these two types unified.  We could reserve _ to be used for types that
match, and always show types that we didn't get a chance to unify.
So, we would see

	(real * real) * _ * char
	(real * int * bool) * _ * string

This rule would at least get the first and third cases from your mail,
and would address the "fixing another one down the line" problem.  I
think it's arguable that with this simple meaning for _ that your
second example is fine too.  It doesn't address your fourth example
and the type variable problem, though.

-------------------------------------------------------
This SF.net email sponsored by: Enterprise Linux Forum Conference & Expo
The Event For Linux Datacenter Solutions & Strategies in The Enterprise 
Linux in the Boardroom; in the Front Office; & in the Server Room 
http://www.enterpriselinuxforum.com
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel