[MLton-user] Re: RFC: Extended Basis Library

Sun Mar 4 05:15:29 PST 2007

On 3/4/07, Geoffrey Alan Washburn <geoffw at cis.upenn.edu> wrote:
> Vesa Karvonen wrote:
>
> > Note that the misc-util library in mltonlib already has a merge sort implementation:
> >
> >   http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/misc-util/unstable/sorted-list.sml?view=auto
> >   http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/misc-util/unstable/sorted-list-test.sml?view=auto
>
> Are you recommending moving these into extended basis?

At least not in the current form of the SortedList module.  The merge sort
algorithm implemented in the SortedList module is fairly efficient (modulo
any inefficiency introduced by the use of selectors to specify
cardinality) and you might want to benchmark against it (possibly with the
cardinality policy stuff implemented using more conventional techniques).
The technique of specifying cardinality policy using a selector is
experimental.  The point is that client code remains concise:

  SortedList.insert #1 ...

vs

  SortedList.insert SortedList.noDups ...

One downside is that it is likely to be much harder for a compiler to
optimize, but I haven't investigated this.

> I've been thinking that it would make sense to develop a signature
> »SORTABLE« in »public/sequence«.

A good place for such concept signatures might be public/sequence/concept.
(Hmm... Maybe we should abbreviate sequence to seq.)  I think that at
least two separate concept signatures are needed: SORTABLE_IN_PLACE and
SORTABLE.  IOW, One for mutable sequences that can (and perhaps should) be
sorted in-place (e.g. arrays) and another for immutable sequences that can
not (e.g. vectors, strings, lists, ...).

> However, I am not sure how to best address the cardinality policy in
> your implementation above.  When sorting arrays (and perhaps vectors as
> well) I expect it would be undesirable to use a cardinality policy that
> alters the length of the array.

Depends.  One might indeed want to sort an array and discard duplicates.
In SML, the length of an array can't be mutated, so discarding duplicates
in an in-place sort would probably be a bit unnatural.  OTOH, with
resizable arrays, which might also be nice to provide, it would make
sense.

> I'm not sure if the correct thing to do here would be to instead of
> providing a policy argument to just provide two sets of functions if it
> makes sense for the underlying sequence.

Perhaps that could be expressed as a sortable concept for resizable
sequences.

> I almost have a patch for the »ByEq« and »ByCmp« functions ready, but I
> wanted to wait until we hashed out the sorting signature(s) as discussed
> above.

I think that it would be good to look at concrete signature proposals.

> > Instead of
> >
> >   case prjn x of
> >     ...
> >
> > you could then write
> >              __n__
> >             /     \
> >   case prjF S ... S $ x of
> >     ...
>
>         Okay, I'll have to look into giving this a try.

(BTW, after sending the message I noticed that there is an off-by-one
error above...)

> > Hmmm... The symbolic ++ and an infix declaration for ++ might be better.
> > What about associativity and precedence?  We should probably reconsider
> > the precedences of all monadic operators.
>
> Haskell uses
>
>    infix 1 >> >>=
>    infixr 1 =<<
>
> I haven't seen any infix uses of mplus other than via `mplus`.

I think that many Haskell libraries provide their own symbol in addition
to mplus.  For example, Parsec uses <|> and, IIRC, STM uses orElse.  Using
++ with a precedence lower than the precedence of >> and >>= might be
sufficiently convenient for many libraries.

> > Also, from my quick reading of the article, I get the impression that
> > once and ifte (and interleave and >>-) can be implemented in terms of
> > msplit + MonadPlus, so I'm not sure that having once and ifte in a
> > CORE signature makes sense.
>
> I had meant to mention that technically »msplit« is more primitive, but
> for my purposes I actually found it easier to write »once«, etc.
> directly.  But that

Here is what I think.  The simpler/simplest CORE signature should be given
the shortest/most convenient name.  Other alternative "CORE" signatures
(and corresponding functors to create utility functions) might also be
provided as an optimization/convenience if makes sense.

> > AFAIU, the fail function in Haskell's Monad type class is there mainly for
> > practical reasons related to the Haskell language and the same reasons do
> > not apply to SML.  In particular, in SML, you can just raise an exception
> > if you detect an *error* during a computation.
>
> Having »fail« in practice seems to be a little more convenient than
> raising an exception.  At least, my idiom tends to look something like
>
>         (some computation) >> (fail errorMsg)
>
> and simply replacing it with a
>
>         (some computation) >> (raise errorExn)
>
> would have the incorrect semantics, so additional thunking would be
> necessary everywhere
>
>         (some computation) >> (thunkM (fn () => raise errorExn))
>
> I suppose arguably a reasonable middle ground would be to give »fail«
> the type
>
>         val fail : exn -> 'a monad
>
> giving
>
>         (some computation) >> (fail errorExn)

Ok.  Couldn't you instead just have

  fun fail exn = fn _ => raise exn

with the spec

  val fail : exn -> 'a -> 'b

and then write, regardless of the underlying monad,

  some computation >>= fail exn
                   ^^^
                  not >>

where fail gets a type of the form

  exn -> 'a -> 'b monad .

Actually, the Extended Basis provides the function raising

  http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/extended-basis/unstable/public/basic.sig?view=auto

which is equivalent to the above fail function

So, with raising, you should already be able to just write

  some computation >>= raising exn

without having to extend any of the MONAD* signatures.

> > Something like that could also be provided in addition to other monad
> > signatures.  However, I would factor that into combinable signatures.
[...]
> >   signature MONAD_WS
> >   signature MONAD_STATE
> >   signature MONADP_STATE
> >   signature LOGICM_STATE
[...]
> Sure that sounds like a good plan.
>
> Though that turns out to be quite a few signatures.  It would be nice if
> something like »Ramsey, Fisher, Govereau«'s proposal were adopted.

I agree.  It would be nice to have a more expressive language of
signatures.

When writing a library containing a family of concept signatures, it
probably doesn't make sense to provide all the reasonable combinations
(powerset).  Indeed, the whole point is that given a set of combinable
signatures, one should be able to pick any reasonable subset of those
signatures and combine them easily using include and where constraits.

> My implementation has a number of functions that probably would be best
> put into the MONAD_EX signature (and the MkMonad functor)

Yes, MONAD_EX would be the right place for these.

>    (* Like "o" but for monadic functions *)
>    val oo : ('b -> 'c monad) * ('a -> 'b monad) -> ('a -> 'c monad)

The idea looks good.  Do you use both oo and o in the same scope a lot?
If not, then I think that it might be better to just use the same symbol
for both.  This way one can "reuse" the fixity declaration.
Alternatively, the Extended Basis currently provides the infix operator
<--> with the same fixity as o.  The <--> symbol is (currently) used for
isomorphism and embedding composition.  I'm not sure whether it would be
better to just use o for those as well.

>    (* Turn a "pure" function into a bindable function *)
>    val lift  : ('a -> 'b) -> 'a -> 'b monad

Do you use it in expressions like

  aM >>= lift a2b

I sometimes write

  aM >>= return o a2b

which is probably syntactically closests to the above kind of use of lift,
but there many many ways to express these things.  One way is to use map
(fmap in Haskell)

  map a2b aM

yet another is to use >>@ (ap in Haskell)

  return a2b >>@ aM

What is most natural (with respect to what you want to express) and/or
concise depends on the case.

Anyway, the type/idea of lift looks good, but I wonder about the name. In
Haskell (http://members.chello.nl/hjgtuyl/tourdemonad.html#liftM), the
name lift is used for functions having a type of the form

  ('a1 -> ... -> 'aN -> 'r) -> 'a1 monad -> ... -> 'aN monad -> 'r monad

>    (* Turn a "pure" thunk into a computation *)
>    val thunk : 'a Thunk.t -> 'a monad

Hmm... Does this mean that type constructor monad is always of the
following form?

  type 'a monad = 'a monad_dom -> 'a monad_cod

I'm not sure that it makes sense to restrict the MONAD concept signatures
to such monads only.  This should probably go into a separate concept
signature.

>    (* Ignore the result of a computation *)
>    val ignore : 'a monad -> unit monad
>
>    (* Perform some computation only if the guard is true *)
>    val when : bool -> unit monad -> unit monad
>
>    (* Perform some computation only if the guard is false *)
>    val unless : bool -> unit monad -> unit monad

These look good.

> then there are a few more that pretty much deal entirely with lists.  I
> don't know if it makes sense to put them in MONAD_EX or somewhere else.

I think that these should also go into MONAD_EX.

>   Right now I've given them them the names used in Haskell just for my
> benefit, but it would probably be better to give them more SML like names
>
>    val tabulateM : int -> (int -> 'a monad) -> 'a List.t monad
>    val mapM : ('a -> 'b monad) -> 'a List.t -> 'b List.t monad
>    val mapM_ : ('a -> 'b monad) -> 'a List.t -> unit monad
>    val appM : ('a -> unit monad) -> 'a List.t -> unit monad
>    val foldlM : ('a * 'b -> 'b monad) -> 'b -> 'a List.t -> 'b monad
>    val foldrM : ('a * 'b -> 'b monad) -> 'b -> 'a List.t -> 'b monad
>
> Let me know I what you think and I can prepare a patch for these.

I think that in SML we should just drop the M suffix.  In SML one uses the
module system for scoping.  In a signature, one would have,

  structure Monad : MONAD  (* or M : MONAD if you prefer concise names *)

and in code one would say

  Monad.foldl ...

Also, I don't particularly like the Haskell naming conventions.  Note that
the name map is already used in FUNC.  Here is what I propose.  Use the
name seq when you want to obtain the list of results and the name app when
you aren't interested in the results.  Use the suffix With when you want to
map over the list of monads.  These conventions would give us:

  val seq  : 'a monad_ex List.t -> 'a List.t monad_ex
  val seqWith : ('a -> 'b monad_ex) -> 'a List.t -> 'b List.t monad_ex
  val app : 'a monad_ex List.t -> Unit.t monad_ex
  val appWith : ('a -> 'b monad_ex) -> 'a List.t -> Unit.t monad_ex

or possibly

  val app : Unit.t monad_ex List.t -> Unit.t monad_ex
  val appWith : ('a -> Unit.t monad_ex) -> 'a List.t -> Unit.t monad_ex

> Another bit of code I have that might make sense in the extended basis
> is my »ListTriple« signature and structure.  Is it basically just »zip«
> and »unzip« for triples instead of pairs, but could be extended to match
> »ListPair«.  However, maybe it would be better to write a »ZIP«
> signature for sequences based upon »Fold« and the »PRODUCT« signature.

Hmm... I like the Zip : ZIP name.  Yes, I think that Zip : ZIP would be
better than ListTriple.  Stephen Weeks and I wrote several variations of
such a module (named ListProduct or SeqProduct) a while back and you can
find those on the MLton list.

-Vesa Karvonen