[MLton-user] Re: RFC: Extended Basis Library

Sun Feb 25 13:00:41 PST 2007

Quoting Geoffrey Alan Washburn <geoffw at cis.upenn.edu>:
> Vesa Karvonen wrote:
[...]

> > [...] an operation like this can be performed with various
> > asymptotic time complexities:
> >
> >   divideByEq    ->  O(n^2)
> >   divideByCmp   ->  O(n log n)
> >   divideByToInt ->  O(n)
> >   
>     Splitting this into two (or three) different functions as proposed 
> makes sense.  Currently, my implementation is just »divideByEq«, but 
> I'll probably add »divideByCmp« soon as I can certainly use it in a 
> number of places. 

Ok.  Don't spend time implementing stuff you don't need (e.g. the ByToInt
versions).  New variations can be provided later when needed, as long as
the naming scheme allows it.

>     Providing »ByCmp« and »ByToInt« means that it would probably be 
> worthwhile to also provide some sorting functions
> 
>        val sortByCmp : 'a BinPr.t -> 'a t -> 'a t
>        val sortByToInt : ('a -> Int.t) -> 'a t -> 'a t
> 
> However, I'm not sure how to select which algorithms should be used
> here. Even if they are required to have time complexity of O(n log n)
> and O(n) respectively, that leaves a number of choices.  I'm not sure if
> instead it would be preferable to just name them by algorithm
> 
>        val sortMerge : 'a BinPr.t -> 'a t -> 'a t
>        val sortQuick : 'a BinPr.t -> 'a t -> 'a t
>        val sortRadix : ('a -> Int.t) -> 'a t -> 'a t
> 
> and so forth.

Yes, I've thought about the same thing.  I think that it would be best to
name functions by the properties that they are supposed to have (the
properties that the user depends on) rather than by the algorithms they
might employ (which one might want to change later as faster algorithms
are implemented).  This is like it is in the C++ STL, for example.  So,
one could use a name like

  sortByCmp

for the generally fastest comparison based sorting function and

  stableSortByCmp

for the sorting function that is as fast as possible while being stable
(O(n log n) and preserves relative order of equal elements).

[...]
>        val uniqueByEq : 'a BinPr.t -> 'a t UnPr.t
>        val uniqueByCmp : 'a Cmp.t -> 'a t UnPr.t
[...]
> > What about the name of unique?
> 
>     I'm not sure what you mean.  Do you mean an alternative to »unique« 
> or do you mean using the »ByFoo« convention?

I meant the "ByFoo" convention.

> >>                  val flattenOpt : ('a Option.t) List.t -> 'a List.t
[...]
>   Yes, it would seem that this is equivalent to just using 
> »mapPartial«/»mapPartial id«.

Just wondering, do you use flattenOpt often?  Would it be difficult to
rewrite the code to use mapPartial (or something else) so that options
would not be added to lists?  (I mean, if flattenOpt is particularly
convenient in some cases, then maybe we should just add it.)

> > [...] I assume you mean arbitrary precision natural numbers.
>
>     Hmm.  I hadn't actually thought about using it for natural numbers 
> that large, but that is probably the correct design choice.

The main reason I assumed arbitrary precision is that the Word types
already work like unsigned integers with modular arithmetic.

> It would probably also be useful to provide a kind of unary view of the
> natural numbers, but offhand I'm not sure if there is a standard idiom
> for faking »views« in Standard ML.

I recall one article with those keywords:

  http://mlton.org/References#WangMurphy .

> I have some signatures for monads that may be worth including.

Good.  As it happens, I was just working on writing concept signatures for
Monads (and Functors) as I received your e-mail.  See the draft signatures
with notes below.

Covariant functor:

  signature FUNC = sig
     type 'a func
     val map : ('a -> 'b) -> 'a func -> 'b func
  end

No need to use the name fmap as there is no overloading.  I abbreviate the
name as "functor" is a keyword.  I'm not sure if there are any useful
derived utility functions to provide with functors.  If there are, then
the FUNC signature should be split like is done with MONAD_CORE and MONAD
below.

Contravariant functor:

  signature CFUNC = sig
     type 'a func
     val map : ('b -> 'a) -> 'a func -> 'b func
  end

Contravariant functors are actually quite common.  I don't particularly
like the name CFUNC, but I would prefer a short name.

The Monad core signature:

  signature MONAD_CORE = sig
     type 'a monad
     val return : 'a -> 'a monad
     val >>= : 'a monad * ('a -> 'b monad) -> 'b monad
  end

The point here is that there is a functor

  functor MkMonad (Arg : MONAD_CORE) : MONAD

that makes a MONAD (with extra derived utility functions) given a
MONAD_CORE.  The MONAD signature is further split into two:

  signature MONAD_EX = sig
     type 'a monad_ex
     include FUNC where type 'a func = 'a monad_ex
     val >> : 'a monad_ex * 'b monad_ex -> 'b monad_ex
     val >>& : 'a monad_ex * 'b monad_ex -> ('a, 'b) Product.t monad_ex
     val >>* : 'a monad_ex * 'b monad_ex -> ('a * 'b) monad_ex
     val >>@ : ('a -> 'b) monad_ex * 'a monad_ex -> 'b monad_ex
     val seq : 'a monad_ex List.t -> 'a List.t monad_ex
  end

  signature MONAD = sig
     include MONAD_CORE
     include MONAD_EX where type 'a monad_ex = 'a monad
  end

The MONAD_EX signature, which one usually does not refer to explicitly,
specifies the derived Monad operations.  The MONAD signature is just the
conjunction of MONAD_CORE and MONAD_EX.  The reason for having MONAD_EX is
to avoid duplicating specifications when specifying MONADP (MonadPlus):

  signature MONADP_CORE = sig
     include MONAD_CORE
     val zero : 'a monad
     val plus : 'a monad BinOp.t
  end

I'm not sure if the names zero and plus are the best.

  signature MONADP_EX = sig
     type 'a monadp_ex
     val sum : 'a monadp_ex List.t -> 'a monadp_ex
  end

  signature MONADP = sig
     include MONADP_CORE
     include MONAD_EX where type 'a monad_ex = 'a monad
     include MONADP_EX where type 'a monadp_ex = 'a monad
  end

Without the EX signatures, one would have to duplicate specifications in
MONADP.  There is also a functor

  functor MkMonadP (Arg : MONADP_CORE) : MONADP

for making a MONADP out of a MONADP_CORE.

The set of derived operations in MONAD_EX and MONADP_EX is not complete -
I plan to add new derived operations as I see use for them.

In addition to the above, I also have versions of the above signatures
with an extra type parameter:

  signature FUNC' = sig
     type ('a, 'x) func
     val map : ('a -> 'b) -> ('a, 'x) func -> ('b, 'x) func
  end

  signature CFUNC' = sig
     type ('a, 'x) func
     val map : ('b -> 'a) -> ('a, 'x) func -> ('b, 'x) func
  end

  signature MONAD_CORE' = sig
     type ('a, 'x) monad
     val return : 'a -> ('a, 'x) monad
     val >>= : ('a, 'x) monad * ('a -> ('b, 'x) monad) -> ('b, 'x) monad
  end

  signature MONAD_EX' = sig
     type ('a, 'x) monad_ex
     include FUNC' where type ('a, 'x) func = ('a, 'x) monad_ex
     val >> : ('a, 'x) monad_ex * ('b, 'x) monad_ex -> ('b, 'x) monad_ex
     val >>* : ('a, 'x) monad_ex * ('b, 'x) monad_ex -> ('a * 'b, 'x) monad_ex
     val >>& : ('a, 'x) monad_ex * ('b, 'x) monad_ex -> (('a, 'b) Product.t, 'x) monad_ex
     val >>@ : ('a -> 'b, 'x) monad_ex * ('a, 'x) monad_ex -> ('b, 'x) monad_ex
     val seq : ('a, 'x) monad_ex List.t -> ('a List.t, 'x) monad_ex
  end

  signature MONAD' = sig
     include MONAD_CORE'
     include MONAD_EX' where type ('a, 'x) monad_ex = ('a, 'x) monad
  end

  signature MONADP_CORE' = sig
     include MONAD_CORE'
     val zero : ('a, 'x) monad
     val plus : ('a, 'x) monad BinOp.t
  end

  signature MONADP_EX' = sig
     type ('a, 'x) monadp_ex
     val sum : ('a, 'x) monadp_ex List.t -> ('a, 'x) monadp_ex
  end

  signature MONADP' = sig
     include MONADP_CORE'
     include MONAD_EX' where type ('a, 'x) monad_ex = ('a, 'x) monad
     include MONADP_EX' where type ('a, 'x) monadp_ex = ('a, 'x) monad
  end

  functor MkMonad' (Arg : MONAD_CORE') : MONAD'
  functor MkMonadP' (Arg : MONADP_CORE') : MONADP'

Having two sets of signatures is a nuisance, but the extra type parameter
is needed often enough.

I think that the above set of signatures feels a bit complicated, but I
think that they avoid as much duplication as possible and provide the
variations that are need most.  I'm open to suggestions for improvement.

> > [...] I assume that you are aware of the various design choices for
> > implementing lazy lists (like whether to make them odd or even,
> > whether to always memoize or not, etc...).
>
>     I am aware of some possible choices, but I don't know (or remember) 
> the tradeoffs.

The tradeoffs have to do with the degree of laziness, possibility of space
leaks, ease of programming and efficiency.  Even lists are lazier, but
also likely to be slightly slower and heavier to program with than odd
lists.  Wadler's article

  http://homepages.inf.ed.ac.uk/wadler/papers/lazyinstrict/lazyinstrict.ps

and SRFI-45

  http://srfi.schemers.org/srfi-45/srfi-45.html

are relevant here.  Memoization also has a cost and sometimes you might
want to avoid it.

> > Also, I think many algorithms on lists, lazy lists and other sequences
> > could be implemented as a functor to reduce duplication.  (But
> > refactorings like that can also done later.)
> 
>     It might actually be better to treat these as streams instead of 
> lists.  A common ancestor signature might be shared between them and the 
> various IO streams.

Yes, what I meant above is to have signature such as

  signature STREAM = sig
     type 'a t
     val getItem : 'a t -> ('a * 'a t) Option.t
  end

and express algorithms against it.

>     It may also be worth asking whether making lazy lists part of the 
> Extended Basis is reasonable.  Offhand I'm not sure if there was a 
> rationale behind not including sets and maps as part of the Standard
> Basis, and whether the same rationale applies to the Extended Basis.

Well, I think that there are too few SML libraries.  I think that lazy
lists are something that people use often enough that it makes sense to
provide a reasonable implementation in the Extended Basis library.

> > [...] Have you considered asking for commit access to mltonlib?
> 
>     I hadn't really thought about it, but I'm not sure whether my 
> throughput is going to be enough to warrant it.  Right now most of the 
> suggestions that I do have implemented were developed as part of my 
> dissertation project, which has been keeping me busy enough as it is.

If you don't mind the overhead of sending patches and waiting for them to
be applied, then you don't need commit access.  Personally, I feel that it
adds a bit of overhead and then there is the danger that small, but
useful, additions never get suggested.

> Still, I'll try to send along some patches for a few things soon.
> Should I send them to you directly, or also CC the list?

CC on the list.  I'm hoping that someone on the list might, for example,
be inspired to suggest further additions or comment on the library
design. :)  If the traffic starts bothering someone, then we can move it
to a new mailing list for discussing libraries (that Stephen Weeks has
promised to create if necessary).

-Vesa Karvonen