[MLton] Slightly more flexible _overload mechanism

Vesa Karvonen vesa.a.j.k at gmail.com
Sat Dec 15 09:45:30 PST 2007


Since the subject of overloading has been such a hot topic in c.l.f
lately, I thought that I'd take a look at how difficult it might be to
extend MLton's overloading mechanism a bit.  A patch with my extensions is
attached to this post.  While I think that my extensions make the
mechanism slightly more practical, I do not intend to suggest that the
simple approach taken here would particularly elegant.

As some of you already know, MLton has a simple (non-standard) mechanism
for introducing overloads.  I guess it has been designed for implementing
the very simple overloads

  http://mlton.org/basis/top-level-chapter.html#section:3

mandated by the Basis library spec (and the Definition).  You can see an
example of using the mechanism in MLton's implementation of the Basis
library.  First of all, the annotation "allowOverload true" must be
specified to be able to use the mechanism:

  http://mlton.org/cgi-bin/viewsvn.cgi/mlton/trunk/basis-library/overloads.mlb?view=auto

Overloads can then be introduced using the _overload construct:

  http://mlton.org/cgi-bin/viewsvn.cgi/mlton/trunk/basis-library/libs/basis-2002/top-level/overloads.sml?view=auto

In general, the current _overload construct has the form

  _overload priority id : type as longvar (and longvar)*

The first longvar after "as" will give the default type for the overload,
because the overload resolution algorithm will try each of the given
longvars in order until it finds one whose type unifies with the inferred
usage.

I don't fully understand the reasons behind the priority and haven't
examined the issue carefully.  From what I saw while quickly browsing the
code, it would seem that the priority is used as a priority for insertion
into a kind of work queue (sequence that will be sorted later) for
overload resolution.  Two overloads with different priority, unless they
are otherwise forced to be resolved earlier, will be resolved in order
that depends on the priority (one always before the other).  I briefly
looked at the Basis library book and the Definition and saw no mention of
such overload resolution priority.  AFAICT, it should mostly have an
effect when the resolution of several overloads depend on each other.
Also see the following thread from the mailing list:

  http://mlton.org/pipermail/mlton/2004-January/025014.html

Overload definitions respect scope.  So, one can, for example, introduce
local overloads in a (large) module for convenience.  OTOH, MLton does not
currently support specifying overloads in signatures.  While I haven't
done it, this might not be very difficult to implement in a "practical
form", although, if you want a nice general solution, functors will
complicate the matter.  Currently, MLton safely prevents one from matching
an overload to a value specification.

Now, while MLton's current overload mechanism is expressive enough for
implementing the overloads mandated by the Basis library spec, it has a
couple of limitations that make it somewhat cumbersome for more general
use.

One limitation of the current mechanism is that an overload must be
defined all at once.  IOW, while one can completely (re)define an
overload, there is no way to extend an existing overload.  Essentially,
overload definitions are unmodular.  For example, suppose you have defined
a new numeric type and would like to add an overload for it:

structure My = struct
   datatype t = Z | O
   val plus =
    fn (Z, Z) => Z
     | (Z, O) => O
     | (O, Z) => O
     | (O, O) => Z
end

_overload 2 + : 'a * 'a -> 'a as My.plus

Unfortunately the previously defined overloads for + are now forgotten
and, for example, 1+2, will no longer work.  To solve this problem, I
changed the mechanism to allow the identifiers in the overload definition
to include other overloaded identifiers as well as ordinary variables:

_overload priority id : type as longid (and longid)*

With my patch, one can write

_overload 2 + : 'a * 'a -> 'a as + and My.plus

which has the effect of adding My.plus to the set of overloads for +.  One
could also write

_overload 2 + : 'a * 'a -> 'a as My.plus and +

which differs from the previous in that it effectively makes My.plus the
default for +.

Another limitation with the current mechanism is that it only works for
monotypes.  Consider, for example, defining an overloaded sub for
accessing arbitrary sequences:

_overload 2 sub : 'a * int -> 'b
as        List.nth
and      Array.sub
and     Vector.sub
and  CharArray.sub
and CharVector.sub

Unfortunately this does not work.  While the above definition passes the
type checking stage, an attempt to use the overloaded sub on lists, arrays
and vectors fails with errors such as:

Error: mlton-overload.sml 23.9.
  Impossible use of overloaded var: sub.
    (int32 list * int32) -> int32

Also note the silly type, 'a * int -> 'b, given for the overload.  It is
perhaps a bit surprising that MLton accepts the above definition at all.
A more advanced mechanism would perhaps allow one to give a type such as

  forall t. 'a t * int -> 'a

quantifying over type constructors, or maybe allow giving an intersection
type such as

  'a list * int -> 'a andalso
  'a vector * int -> 'a andalso
  (* and so on *)

but these would be considerably more difficult to achieve.  At any rate,
it turns out that MLton currently does not check (at all) whether the
alternatives of an overload can actually be unified with the given type.
So, one could write, for example

_overload 2 funny : int -> int as Real.+ and CharVector.sub

and MLton will currently not give any error at the point of defining the
overload.  I changed this by adding a check to the implementation that
tries to unify each variant with the given type, so now the above gives
the following errors:

Error: /home/vk/work/sml/sandbox/mlton-overload.sml 23.35.
  Variant does not unify with overload.
    overload: [int] -> [int]
    variant:  [real * real] -> [real]
    in: _overload 2 funny: (int -> int) as Real.+ and CharVector.sub
Error: /home/vk/work/sml/sandbox/mlton-overload.sml 23.46.
  Variant does not unify with overload.
    overload: [int] -> [int]
    variant:  [string * int] -> [char]
    in: _overload 2 funny: (int -> int) as Real.+ and CharVector.sub

In addition, I made the typing of overloads more flexible allowing
polymorphic types for overload variants, so now the previous overload for
sub also works.  I'm not entirely sure whether my implementation is fully
correct with respect to the rest of MLton's type checking infrastructure,
but it seems to work (type checks and generates correct code) for the
simple tests I've tried.

I think that MLton's current mechanism should at least be changed to check
that the overload variants/alternatives match the type specified for the
overload.  I also don't see major harm in integrating the rest of my
improvements (with fixes if necessary).  In particular, the possibility of
extending overloads later could be useful in the Basis library
implementation.  Also, the additional complexity of allowing polymorphic
types in overloads is relatively small, IMO, but it is neither required
nor useful in the Basis library implementation.

BTW, developing these extensions didn't take a lot of time.  In terms of
uninterrupted working time, I would estimate roughly two hours (including
compile time).  Knowing that overloads are enabled via an MLB annotation,
I started from "mlton/control/control-flags.sml", searched for "overload",
found the "allowOverload" control, and then browsed the code with the
def-use mode from there (following the uses of the allowOverload flag).

-Vesa Karvonen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: more-flexible-_overload.patch
Type: text/x-diff
Size: 11975 bytes
Desc: not available
Url : http://mlton.org/pipermail/mlton/attachments/20071215/d5f270ac/more-flexible-_overload-0001.bin


More information about the MLton mailing list