[MLton] extended basis library

Sat Oct 21 02:11:12 PDT 2006

Stephen Weeks wrote:
> Comment style
> ----------------------------------------
> I saw a couple of commenting conventions that I liked.  I was
> wondering if they were aimed at automatic extraction of documentation,
> and also if we should establish them as conventions.

Yes, the conventions are aimed at automatic extraction, but are also
intended to look readable on their own.  The conventions would obviously
be more useful if there were a tool that would check their syntax and
generate documentation from them.  We'll get there eventually :).  At any
rate, changing comment decorations to suit the tool that we'll adopt at
some point is a minor effort compared to writing the documentation
comments in the first place.

[...]
> Second was the convention of following specifications in signature
> with the comment.
>
>   val intIso : (char, Int.int) iso
>   (**
>    * The isomorphism between characters and character codes.  It
>    * always equals {(ord, chr)}.  Note that the projection part of the
>    * isomorphism, namely {chr}, is likely to be a partial function.
>    *)
>
> The MLton style has typically put the comment before the spec, which I
> think is a mistake.

Yes, I think that putting the comment after type and val specs has the
advantage that the spec itself essentially becomes a natural part of the
comment (or documentation).

> Formatting
> ----------------------------------------
> The code is indented/laid out in the MLton style, e.g.
>
>   signature ISO =
>      sig
>         type ('a, 'b) iso = ...
>         ...
>      end
>
> [...]  I prefer the C-brace style
>
>   signature ISO = sig
>      type ('a, 'b) iso = ...
>      ...
>   end
[...]
> The advantage of this style is that it saves vertical space (no big
> deal) and indentation nesting (a big win).

I kind of like the MLton-style better, because, IMO, it better reflects
the structure of the language.  (In C the braces in named definitions are
part of the named definition and there are no corresponding anonymous
forms.)  But I've also noticed the issue with excessive indentation, which
gets pretty bad for substructures.  OTOH, I also dislike the exception of
omitting indentation when a top-level module declaration is in its own file.
So, perhaps the C-brace style is a practical compromise.  (As you probably
noticed already, I switched to the C-brace style.)  At any rate, I don't
think that we should mandate any specific indentation style, but strongly
recommend using some consistent style within a library.  In the long
run, we should have a documentation extractor, which means that library
users don't necessarily have to browse the source code.

> Naming convention for functors
> ----------------------------------------
> I don't see the point of prefixing functors with "Mk".  It seems
> redundant, as of course a functor is making a new structure.  Dropping
> the "Mk" doesn't create a naming conflict, as functors have their own
> namespace.  Also, there is no conflict in filenames as functors get a
> different suffix (.fun).

Well, I think that having separate namespaces for functors and structures
is a mistake in SML.  In Alice ML functors and structures share the same
namespace.  So, using the same names for functors and structures makes the
code slightly more difficult to translate to Alice ML and potentially other
SML-like languages with higher-order functors.  At any rate, those
particular functors are not supposed to be exposed from the library, so it
should be possible to change their names at any time without breaking
compatibility.  So, I'll leave the names as they are for now.
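
To illustrate the namespace point with a small sketch (the names COUNTER
and Counter are made up for this example and are not from the library):
the following is legal SML because functor and structure identifiers live
in separate namespaces.  In a language where they share one namespace, the
two bindings of Counter could not coexist.

```sml
signature COUNTER = sig
   val next : unit -> int
end

(* A functor named Counter (hypothetical name). *)
functor Counter (val start : int) : COUNTER = struct
   val r = ref start
   (* Return the current value, then increment the counter. *)
   fun next () = !r before r := !r + 1
end

(* A structure with the same name as the functor.  SML accepts this
 * because functors and structures have separate namespaces. *)
structure Counter = Counter (val start = 0)
```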

> Avoiding copying in MonoVector.{from,to}Poly
> --------------------------------------------
> One could almost use the following implementation of toPoly.
>
>   fun toPoly v =
>     if MLton.isMLton then
>        v
>     else
>        Vector.tabulate (length v, fn i => sub (v, i))
>
> It doesn't quite work because MLton hides the equivalence between the
> two types.  It would be worth exporting something in the MLton
> structure to expose the equivalence so that one could avoid the copy.

Indeed.  Is there some reason why the equivalences between polymorphic and
monomorphic vectors (and likewise for arrays) aren't exposed?  I haven't
noticed anything in the basis library manual that would disallow it.  I
would guess that the main reason to not expose the equivalences is to avoid
exposing an implementation detail that might change.
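
For comparison, the portable fallback can be written against the standard
basis alone.  A minimal sketch, specialized to Word8Vector purely for
illustration (this is not the library's actual code); both directions copy
element by element, which is exactly the cost that an exposed equivalence
would avoid:

```sml
(* Monomorphic -> polymorphic: builds a fresh vector, copying each element. *)
fun toPoly (v : Word8Vector.vector) : Word8.word vector =
   Vector.tabulate (Word8Vector.length v, fn i => Word8Vector.sub (v, i))

(* Polymorphic -> monomorphic: likewise a copy. *)
fun fromPoly (v : Word8.word vector) : Word8Vector.vector =
   Word8Vector.tabulate (Vector.length v, fn i => Vector.sub (v, i))
```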

> One might worry that the use of the nonstandard "MLton.isMLton" makes
> the code less portable.  But I don't think it's any worse than the use
> of MLBs, Int64, etc.  In the end we will have to decide how much
> effort we're willing to spend to implement code for other SML
> compilers.  I'd like to do the best we can for MLton users, think of
> the extended basis as a spec, and leave to other compilers to figure
> out the best way to implement the spec in their compiler.

You've probably noticed that the names of some of the extended basis lib SML
files have '-mlton' in them.  The idea is that compiler-specific code
lives in separate files for each compiler.  When/if it becomes possible to
implement the optimization on MLton, I plan to restructure the code so
that the toPoly/fromPoly functions are implemented in their own file that
then becomes compiler-specific.  Porting to a new compiler should then
mean just adding new files to the library.

I've also thought about having an MLB path variable, say "COMPILER", to
name the compiler.  Then MLB file snippets like

   (* Extended real modules *)
   local
      mk-real-ext.fun
   in
      real.sig
      reals-mlton.sml
   end

would be changed to

   (* Extended real modules *)
   local
      mk-real-ext.fun
   in
      real.sig
      reals-$(COMPILER).sml
   end

and it might then be possible to use the same MLB file on all compilers
that support the ML Basis System.
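
With MLton, one way such a variable could be supplied is through an MLB
path map, a file passed with -mlb-path-map in which each line maps a path
variable to a value.  A sketch, assuming a hypothetical map file (its name
is up to the user) and assuming that path variables are expanded textually
inside file names:

```
COMPILER mlton
```

Other compilers would need their own mechanism for defining the variable.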

> MLB convention
> ----------------------------------------
> A useful convention in MLBs is to use an export filter to specify what
> the MLB exports.  This is analogous to using a signature to specify
> what a structure exports.  It nicely collects in one place all of the
> exports, making it easier for a reader to understand at a glance.  It
> is also easier than trying to carefully hide unwanted exports in a
> number of "local"s.  Finally, it serves as a check that the code
> really does export everything that you want.

I agree that using export filters has some nice properties, but I'm not
entirely sold on the idea just yet.  It means that to add new things one
has to change more lines.  OTOH, I guess you'll get more accurate compiler
warnings for unused things in return.  I'll have to think about this.
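
For concreteness, a sketch of an MLB with an export filter.  The file names
iso.sig and iso.sml and the exported names ISO and Iso are only
illustrative and are not the library's actual layout:

```
local
   $(SML_LIB)/basis/basis.mlb
   iso.sig
   iso.sml
in
   (* Export filter: only the names listed here leave this MLB. *)
   signature ISO
   structure Iso
end
```

Adding a new module then means touching both the file list and the filter,
which is the "more lines" cost mentioned above.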

In the meantime, I think that it would be nice if, in addition to
-show-basis, MLton had an option, say -show-basis-summary, that
would only show the kinds (type, val, signature, structure, ...) and names
of all exported top-level bindings (IOW, it would essentially output an
export filter).  I think that such an option would make it easier to maintain
export filters and would also help in understanding MLBs.  It might also
be useful to show the infix status of identifiers in -show-basis output.

Thanks for the comments!

-Vesa Karvonen