[MLton-devel] A CM replacement for MLton

Wed, 9 Jul 2003 18:24:55 -0700

I've done some research and some thinking about a system for MLton to
do programming in the large and to replace CM, and think I have a
simple, yet powerful, approach.  The timetable I have in mind is to
implement this (or whatever we come up with after discussion) after
the release this week but before the next release, hopefully in
conjunction with the new front end.  Then, we'll leave CM support in
for a while, but eventually drop it in six months to a year.

Comments appreciated.

--------------------------------------------------------------------------------
Introduction
--------------------------------------------------------------------------------

Here are the systems I looked at:

	SML/NJ's CM [7,8,9]
	ML Kit's Projects [1]
	Moscow ML's mosmake [2] and mosmldep [3]
	OCaml's ocamldep [4]
	Poly/ML's Make System [5]
	Portable library descriptions [6]

There is one system that I wanted to study but didn't -- I recall an
email sent to sml-implementers some time back from some people at CMU
describing some attempt to specify dependencies and handle separate
compilation.  I couldn't find it, but would be interested to look at
it again if someone digs it up.

Now for a brief overview of each system.

A CM group file specifies an unordered list of other group files and
SML files, and an export filter.  CM infers an order on the SML files
in a program.  An export filter describes the modules exported from a
group file and hides all other modules.

An ML Kit Project (.pm) file specifies a list of imports, which are
other .pm files, and a body, which is a sequence of SML files.  The
meaning is pretty clear: a .pm file denotes the basis computed by
first making available everything defined in the imports and then
elaborating the SML files in sequence starting in that basis.  The
body can also include "local" declarations for name hiding.

mosmldep is a very simple tool that produces makefile dependencies.
It has two modes of operation.  In the first mode, it takes an
ordered list of Moscow ML files and produces a simple makefile in
which each file depends on all the previous ones in the list.  In the
second mode, it processes all the Moscow ML files in the directory and
infers accurate dependencies.  However, it only works for a very
restricted subset of SML.

mosmake was built by a dissatisfied user of mosmldep.  With it, the
user explicitly writes down the dependencies between SML files in a
terse format.  mosmake then creates the makefile that will make the
appropriate calls to mosmlc to rebuild the project and do cutoff
recompilation.

ocamldep is like mosmldep in that it takes a list of source files and
produces makefile dependencies.  It works on the full OCaml language,
and presumably produces accurate results.  It also assumes a
connection between structure names and the files that define them.

Poly/ML's make system is very simple.  You can call PolyML.make with a
structure name and PolyML will recompile the files needed to rebuild
that structure.  PolyML will compute the dependencies based on free
references and assumes a convention for naming files that define the
free references.

Portable library descriptions are an attempt at a common interchange
format between SML implementations to describe collections of modules.
They provide a simple language for describing def-use graphs for
modules imported and exported by files.  There is also a mechanism
for hiding modules.

In summary, here are the various design choices.

  file order: is the order of files in the program implicit or explicit
  dependencies: are they explicitly listed or implicitly inferred
  scoping: is there a way to hide names
  cutoff recompilation: does the system avoid recompiling a module
    when its dependencies haven't changed in an essential way
  file name convention: does the system assume some convention for finding
    the file that defines a module

Here is where the various systems fit in the design space (imp =
implicit, exp = explicit).

             file                       cutoff   file name
             order   depends   scoping  recomp   conv
             -----   -------   -------  ------   ---------
CM           imp     imp       yes      yes      no
ML Kit .pm   exp     imp       yes      yes      no
mosmake      imp     exp       no       yes      no
mosmldep     exp     imp       no       no       yes
ocamldep     exp     imp       no       yes      yes
PLDs         imp     exp       yes      N/A      no
Poly/ML      imp     imp       no       yes      yes

--------------------------------------------------------------------------------
My Proposal
--------------------------------------------------------------------------------

The system I propose is closest to ML Kit .pm files, and answers the
five design questions in the same way (except of course cutoff
recompilation, which we don't have but the design allows).  It is a
generalization and simplification of .pm files that allows more mixing
of imports and exports and allows renaming of modules without dropping
down into SML files.

The idea is to have a new kind of file, an .mlb (ML Basis) file, that
describes a library or program.  An .mlb file contains a "basis
declaration", defined by the following grammar.

<bdec> ::= <file>.{fun|sig|sml}
         | <file>.mlb
         | functor (<fctid> [= <fctid>])*
         | local <bdec> in <bdec> end
         | signature (<sigid> [= <sigid>])*
         | structure (<strid> [= <strid>])*
         | <bdec> <bdec>

Comments are allowed in (* *).
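To make the grammar more tangible, here is a small recursive-descent parser for it, written in Python purely as an illustration.  The AST classes (File, Bind, Local) and the lexical rules are assumptions of this sketch, not part of the proposal: tokens are separated by whitespace, = is always its own token, comments nest as in SML, and identifiers are letters, digits, _ and '.  A binding written without = is recorded with None as its right-hand side; the abbreviations described below are left to a separate pass.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

KEYWORDS = {"functor", "signature", "structure", "local", "in", "end"}
FILE_RE = re.compile(r".+\.(fun|sig|sml|mlb)$")
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_']*$")   # assumed identifier syntax

@dataclass
class File:
    path: str

@dataclass
class Bind:
    kind: str                                 # "functor", "signature" or "structure"
    binds: List[Tuple[str, Optional[str]]]    # (new name, old name or None)

@dataclass
class Local:
    decs: list                                # bdecs between "local" and "in"
    body: list                                # bdecs between "in" and "end"

def strip_comments(text: str) -> str:
    """Replace (* ... *) comments, which may nest as in SML, by a space."""
    out, depth, i = [], 0, 0
    while i < len(text):
        if text.startswith("(*", i):
            depth, i = depth + 1, i + 2
        elif depth > 0 and text.startswith("*)", i):
            depth, i = depth - 1, i + 2
            if depth == 0:
                out.append(" ")
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth:
        raise SyntaxError("unterminated comment")
    return "".join(out)

def parse(text: str) -> list:
    """Parse a sequence of bdecs into a list of File/Bind/Local nodes."""
    toks = re.findall(r"=|[^\s=]+", strip_comments(text))
    pos = 0
    def peek() -> Optional[str]:
        return toks[pos] if pos < len(toks) else None
    def take() -> str:
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return toks[pos - 1]
    def is_ident(tok: Optional[str]) -> bool:
        return tok is not None and tok not in KEYWORDS and bool(IDENT_RE.match(tok))
    def expect(tok: str) -> None:
        if take() != tok:
            raise SyntaxError(f"expected {tok!r}")
    def seq(stop: Optional[str]) -> list:
        decs = []
        while peek() is not None and peek() != stop:
            decs.append(one())
        return decs
    def one():
        tok = take()
        if tok == "local":
            decs = seq("in")
            expect("in")
            body = seq("end")
            expect("end")
            return Local(decs, body)
        if tok in ("functor", "signature", "structure"):
            binds = []
            while is_ident(peek()):
                new, old = take(), None
                if peek() == "=":
                    take()
                    old = take()
                    if not is_ident(old):
                        raise SyntaxError(f"expected an identifier, got {old!r}")
                binds.append((new, old))
            return Bind(tok, binds)
        if FILE_RE.match(tok):
            return File(tok)
        raise SyntaxError(f"unexpected token {tok!r}")
    return seq(None)

prog = parse("""
local
   file1.sml   (* a (* nested *) comment *)
   file2.sml
in
   functor F
   structure S' = S T
end
""")
```

Here prog is a single Local node whose body holds Bind("functor", [("F", None)]) and Bind("structure", [("S'", "S"), ("T", None)]).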

Conceptually, a basis file is elaborated starting in an empty basis,
and each basis declaration adds to the basis, producing a basis as a
result.  References to SML files cause the file to be elaborated in
the "current" basis.  References to other ML basis files cause the
basis denoted by that ML basis file to be imported.  Functor,
signature, and structure declarations bind a module in the current
basis.  Local declarations are for name hiding.

Here is a more precise definition.  First, The Definition of SML
defines the following:

	E in Env = StrEnv x TyEnv x ValEnv		page 16
	B in Basis = FunEnv x SigEnv x Env		page 29
	B |- <topdec> => B'				page 36

Second, we assume we have a mapping for filesystem contents:

	C: file.{fun|sig|sml} --> <topdec>
	C: file.mlb --> <bdec>

Third, we expand abbreviations:

	functor F is an abbreviation for functor F = F
	similarly for signatures and structures
	functor F = F' ... is an abbreviation for functor F = F' functor ...
	similarly for signatures and structures
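The abbreviations amount to a simple rewrite of the binding list that follows a module keyword.  A minimal Python sketch, assuming the list has already been split into tokens; the function name expand and the (kind, new, old) triples are illustrative, not part of the proposal:

```python
from typing import List, Tuple


def expand(kind: str, tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Rewrite `kind X [= Y] X' [= Y'] ...` into single bindings (kind, new, old)."""
    out, i = [], 0
    while i < len(tokens):
        new = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "=":
            if i + 2 >= len(tokens):
                raise SyntaxError(f"missing right-hand side for {kind} {new}")
            old, i = tokens[i + 2], i + 3
        else:
            old, i = new, i + 1          # `functor F` abbreviates `functor F = F`
        out.append((kind, new, old))     # one keyword per binding after expansion
    return out
```

For example, expand("functor", ["F", "=", "F'", "G"]) gives the two bindings functor F = F' and functor G = G.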

We then define the static semantics of basis declarations as the
following relation:

	B |- <bdec> --> B'

	B |- C("file.{fun|sig|sml}") => B'
	----------------------------------
	B |- "file.{fun|sig|sml}" --> B'

	  |- C("file.mlb") --> B'
	-------------------------
	B |- "file.mlb" --> B'

	----------------------------------------------
	B |- functor F = F' --> [F |-> B(F')] in Basis

	------------------------------------------------
	B |- signature S = S' --> [S |-> B(S')] in Basis

	------------------------------------------------
	B |- structure S = S' --> [S |-> B(S')] in Basis

	B |- p1 --> B1   B + B1 |- p2 --> B2
	------------------------------------
	   B |- local p1 in p2 end --> B2

	B |- p1 --> B1   B + B1 |- p2 --> B2
	------------------------------------
	      B |- p1 p2 --> B1 + B2

The rule for SML files is to elaborate the topdec from the file in the
current basis.  If a file is listed multiple times, it will be
elaborated multiple times (with duplicate code).  The rule for an ML
basis (.mlb) file is to elaborate the bdec from the file in the empty
basis.  That is, all .mlb files are self-contained.  Functors,
signatures, and structures add a binding to the basis.  Local and
sequence basis declarations are analogous to the module and core
languages.
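As a cross-check on the rules, here is a direct transcription into a tiny Python interpreter.  It is only a sketch under simplifying assumptions: bdecs are tuples, sequencing is binary as in the rules, the contents map C is passed in as two dictionaries, a basis keeps only the three module namespaces (no types or values), and an SML file is modelled as an arbitrary function from the current basis to the basis it declares rather than by real elaboration.

```python
from typing import Callable, Dict, Tuple

# A basis maps (namespace, name) to what the name denotes.  Real bases
# (FunEnv x SigEnv x Env) also hold types and values; this sketch does not.
Basis = Dict[Tuple[str, str], object]


def plus(b1: Basis, b2: Basis) -> Basis:
    """B1 + B2: bindings in B2 shadow those in B1."""
    return {**b1, **b2}


def elab(b: Basis, dec: tuple, sml: Dict[str, Callable[[Basis], Basis]],
         mlb: Dict[str, tuple]) -> Basis:
    """B |- dec --> B'.  `sml` and `mlb` are hypothetical stand-ins for C."""
    tag = dec[0]
    if tag == "sml":                          # B |- C(file) => B'
        return sml[dec[1]](b)
    if tag == "mlb":                          # |- C(file.mlb) --> B' (empty basis)
        return elab({}, mlb[dec[1]], sml, mlb)
    if tag == "bind":                         # functor/signature/structure X = Y
        _, ns, new, old = dec
        if (ns, old) not in b:
            raise NameError(f"unbound {ns} {old}")
        return {(ns, new): b[(ns, old)]}      # [X |-> B(Y)]
    if tag in ("local", "seq"):
        _, d1, d2 = dec
        b1 = elab(b, d1, sml, mlb)
        b2 = elab(plus(b, b1), d2, sml, mlb)  # p2 sees B + B1 in both rules
        return b2 if tag == "local" else plus(b1, b2)
    raise ValueError(f"unknown bdec {tag!r}")


# Hypothetical contents: a.sml declares structure S and functor F;
# lib.mlb exports only S, renamed to T.
sml = {"a.sml": lambda b: {("structure", "S"): "S@a.sml", ("functor", "F"): "F@a.sml"}}
mlb = {"lib.mlb": ("local", ("sml", "a.sml"), ("bind", "structure", "T", "S"))}
prog = ("seq", ("mlb", "lib.mlb"), ("bind", "structure", "U", "T"))
result = elab({}, prog, sml, mlb)
# result == {("structure", "T"): "S@a.sml", ("structure", "U"): "S@a.sml"}
```

Note how the local rule discards B1, so S and F from a.sml never escape lib.mlb.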

--------------------------------------------------------------------------------
Observations and Examples
--------------------------------------------------------------------------------

----------------------------------------
Relative paths
----------------------------------------
One thing the above semantics omits is that pathnames in basis
declarations can be relative or absolute.  Relative pathnames (as
with .cm and .pm files) are relative to the directory containing the
.mlb file.
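The relative-path rule can be sketched in a few lines of Python.  The function name resolve is hypothetical, and POSIX-style paths are assumed:

```python
import posixpath


def resolve(mlb_file: str, ref: str) -> str:
    """Resolve a path written inside mlb_file (POSIX-style paths assumed).

    Absolute paths are used as-is; relative ones are taken relative to the
    directory that contains mlb_file, not to the current working directory.
    """
    if posixpath.isabs(ref):
        return posixpath.normpath(ref)
    return posixpath.normpath(posixpath.join(posixpath.dirname(mlb_file), ref))
```

For example, resolve("/home/u/proj/lib/lib.mlb", "../util/util.sml") yields "/home/u/proj/util/util.sml".  One design caveat: normpath collapses ".." purely textually, which can disagree with the filesystem when symbolic links are involved, so a real implementation might prefer not to normalize.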

----------------------------------------
Sharing and side effects
----------------------------------------
Since .mlb files are elaborated in the empty basis, they only need to
be elaborated once.  The intended semantics, not covered by the above
rules, is that the results of .mlb elaboration are cached.  Thus any
effects are not duplicated if the .mlb file is referred to multiple
times.  This is different from SML files, which are elaborated (and
duplicated) each time they are referred to.

----------------------------------------
List of files
----------------------------------------
The simplest kind of .mlb file is simply a list of files.  This means
that we can easily handle everything we do now and there is a very low
barrier to entry to using .mlb files.

----------------------------------------
Export filters
----------------------------------------
Suppose you only want to export certain functors, signatures, and
structures from a collection of files, a la CM export filters.  Here
is how that looks.

local
   file1.sml
   ...
   filen.sml
in
   (* export filter here *)
   functor F
   structure S
   ...
end

The abbreviation for module bindings makes "basis signatures"
unnecessary.  One simply defines the basis one wants in a very concise
way.

----------------------------------------
Export filters with renaming
----------------------------------------
Suppose you want an export filter, but want to rename one of the
modules.  Both .cm and .pm files require you to drop down into an SML
file to rename at the module level.  But with .mlb files it is easy.

local
   file1.sml
   ...
   filen.sml
in
   (* export filter here *)
   functor F
   structure S' = S
   ...
end

----------------------------------------
Import filters
----------------------------------------

Suppose you only want to import functor F from group1 and functor G
from group2.  That's easy:

local
  group1.mlb
in
  functor F
end
local
  group2.mlb
in
  functor G
end

CM requires "administrative groups", i.e. extra .cm files, to do
this, and .pm files require similar machinations.

----------------------------------------
Import filters with renaming
----------------------------------------
Suppose you want to use a structure S from group1 and a structure S
from group2.  That's easy:

local
  group1.mlb
in
  structure S1 = S
end
local
  group2.mlb
in
  structure S2 = S
end

Both CM and .pm files would require the creation of extra SML and
group files to do this.
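The import filter with renaming can be checked with a few lines of Python.  This is a sketch only: bases are modelled as dictionaries keyed by (namespace, name), the bases denoted by group1.mlb and group2.mlb are made up, and the helper local_in is hypothetical.  The point is that the two structures named S are never in scope together, so there is no clash.

```python
from typing import Dict, List, Tuple

Basis = Dict[Tuple[str, str], object]


def local_in(outer: Basis, inner: Basis,
             exports: List[Tuple[str, str, str]]) -> Basis:
    """Model `local <inner> in <kind new = old ...> end` elaborated in `outer`."""
    env = {**outer, **inner}              # B + B1
    out: Basis = {}
    for kind, new, old in exports:        # bindings are elaborated in sequence
        value = env[(kind, old)]          # KeyError if `old` is unbound
        env[(kind, new)] = value
        out[(kind, new)] = value
    return out                            # only the listed names survive


# Hypothetical bases for group1.mlb and group2.mlb, each with its own S.
group1 = {("structure", "S"): "S from group1"}
group2 = {("structure", "S"): "S from group2"}

b1 = local_in({}, group1, [("structure", "S1", "S")])
b2 = local_in(b1, group2, [("structure", "S2", "S")])
result = {**b1, **b2}                     # sequence: B1 + B2
```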

----------------------------------------
Types and values are exported too
----------------------------------------
Unlike CM group files, .mlb files elaborate to full bases including
toplevel types and values, not just functors, signatures, and
structures.  This means that they can solve the "type int = Int.t"
problem that we currently have using CM and MLton, and can be used to
build replacements for the basis library.

----------------------------------------
Renaming types and values
----------------------------------------
Although types and values are part of the meaning of an .mlb file, I
did not provide mechanisms to rename them at the .mlb level.  This
would certainly be possible, but given how rarely it would be needed,
the added complexity did not seem worth worrying about: what about
renaming datatypes, what about the tyvars in type functions, what
about exceptions, what about constructor status?  If it becomes clear
that it would be nice to have, we can add it.

----------------------------------------
Explicit file ordering
----------------------------------------
Because the files in an .mlb file are listed in order, we avoid the
indeterminateness in CM that leaves the order of effects unspecified.
We also avoid the complexities of dependency analysis to figure out
the order and the restrictions that CM imposes.

----------------------------------------
Dependency analysis
----------------------------------------

An .mlb file does precisely specify what definitions connect to what
uses.  Thus it is easy to build a module dependency graph while
elaborating.  This graph could be used by a compiler to do cutoff
recompilation, similar to how the ML Kit uses .pm files.
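MLton does not currently do cutoff recompilation, so the following is purely hypothetical: a Python sketch of how a dependency graph gathered while elaborating an .mlb file could drive it.  It works at the granularity of files, and compile_unit is an assumed stand-in for compiling a file and returning a fingerprint of its exported interface; computing such a fingerprint is the hard part and is not shown.

```python
from typing import Callable, Dict, Iterable, List, Set


def cutoff_rebuild(order: List[str],
                   deps: Dict[str, Iterable[str]],
                   source_changed: Set[str],
                   compile_unit: Callable[[str], str],
                   old_iface: Dict[str, str]) -> List[str]:
    """Return the units that get recompiled, in order.

    A unit is recompiled if its own source changed or if the interface of a
    unit it depends on changed.  If recompiling yields the same interface as
    before, its dependents are left alone (the "cutoff").
    """
    iface_changed: Set[str] = set()
    rebuilt: List[str] = []
    for unit in order:                    # .mlb order is explicit: deps come first
        if unit in source_changed or any(d in iface_changed for d in deps.get(unit, ())):
            rebuilt.append(unit)
            if compile_unit(unit) != old_iface.get(unit):
                iface_changed.add(unit)
    return rebuilt


# Hypothetical project: c.sml uses b.sml, which uses a.sml.
order = ["a.sml", "b.sml", "c.sml"]
deps = {"b.sml": ["a.sml"], "c.sml": ["b.sml"]}
old = {"a.sml": "A1", "b.sml": "B1", "c.sml": "C1"}

# Edit a.sml without changing its interface: only a.sml is recompiled.
same = cutoff_rebuild(order, deps, {"a.sml"}, lambda u: old[u], old)

# Change a.sml's interface: b.sml is recompiled too, but since b.sml's own
# interface is unchanged, c.sml is not.
new = {"a.sml": "A2", "b.sml": "B1", "c.sml": "C1"}
changed = cutoff_rebuild(order, deps, {"a.sml"}, lambda u: new[u], old)
```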

----------------------------------------
Dropping unused files
----------------------------------------
An important thing that CM libraries give you is that unreferenced
modules are dropped from the program.  We could do the same thing
after dependency analysis.  The only tricky part is determining the
"roots" of the dependency graph, i.e., which modules we initially
force to be kept, thereby requiring all the modules that they depend
on (recursively) to be kept as well.
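Once the roots are chosen, finding what to keep is plain graph reachability.  A minimal Python sketch, assuming the dependency graph is available as a dictionary from each module to the modules it uses; the graph and the function name live_modules are hypothetical:

```python
from typing import Dict, Iterable, List, Set


def live_modules(roots: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """All modules reachable from the roots through the dependency graph."""
    live: Set[str] = set()
    stack = list(roots)
    while stack:                          # iterative depth-first search
        m = stack.pop()
        if m not in live:
            live.add(m)
            stack.extend(deps.get(m, []))
    return live


# Hypothetical dependency graph; Unused is never referenced from Main.
deps = {"Main": ["List", "IO"], "IO": ["Posix"], "Unused": ["List"]}
keep = live_modules(["Main"], deps)
all_modules = set(deps) | {d for ds in deps.values() for d in ds}
dropped = all_modules - keep
```

Here keep is {"Main", "List", "IO", "Posix"} and only Unused is dropped, even though it depends on a live module.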

----------------------------------------
Constructing the Basis Library
----------------------------------------
.mlb files are sufficient to describe how we currently construct the
basis, with one modest extension.  As the basis-library README
describes, the way we construct the basis is to prefix the program
with 

	local
	  <concatenate files in libs/build>
	in
	  <concatenate files in libs/basis-*/bind>
	end

Well, now we can make that explicit.  First, create
basis-library/build.mlb with the contents of basis-library/build.

	misc/primitive.sml
	posix/primitive.sml
	...

Then, for each basis library we want to make available, create a new
.mlb file, e.g. for the 2002 basis create basis-library/2002.mlb with
the following contents:

	local
	   build.mlb
	in
	   libs/basis-2002/top-level/overloads.sml
	   ...
	   libs/basis-2002/top-level/top-level.sml
	end

The only addition necessary stems from the fact that the above
semantics elaborates an .mlb file in the empty basis, so there needs
to be some way to get at the primitive types.  The easiest way I can
think of to do that is to add a new keyword "prim" as a special kind
of basis dec that elaborates to the primitive basis.

----------------------------------------
Basis suffixes
----------------------------------------
I'm not entirely sure what to do with the basis library suffix that
handles cleanly exiting the program.  One idea would be to make it
more intimately tied with the compiler, like we do for the toplevel
handler.

----------------------------------------
Accessing the basis library
----------------------------------------
User programs can easily access the basis library of their choice,
possibly even using different versions of the basis at different parts
of their program.  All we need to do is deliver the basis sources
with the MLton installation, and have the user include the following
in any .mlb file where they want the basis available.

/usr/lib/mlton/basis-library/2002.mlb

----------------------------------------
Environment variables
----------------------------------------
Using a hardwired absolute path like /usr/lib/mlton is bad, since
MLton may be installed in a different location.  So, I
propose to allow environment variables to appear in path names in the
form $(VAR).  MLton will ensure that the following variable is
always set correctly:

SML_LIB = /usr/lib/mlton/basis-library

So, the user can access the basis library by
$(SML_LIB)/2002.mlb
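Expanding $(VAR) is a simple textual substitution on a path.  A minimal Python sketch; the function name expand_path, the accepted variable-name syntax, and the choice to reject undefined variables rather than leave them in place are all assumptions of the sketch:

```python
import os
import re
from typing import Mapping

# $(VAR), where VAR is a conventional environment-variable name (assumed syntax).
_VAR = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_path(path: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each $(VAR) in path by the value of VAR in env."""
    def lookup(m) -> str:
        name = m.group(1)
        if name not in env:
            raise KeyError(f"undefined path variable $({name})")
        return env[name]
    return _VAR.sub(lookup, path)


lib = expand_path("$(SML_LIB)/2002.mlb", {"SML_LIB": "/usr/lib/mlton/basis-library"})
# lib == "/usr/lib/mlton/basis-library/2002.mlb"
```

Presumably expansion happens before relative paths are resolved, so the expanded path above counts as absolute.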

----------------------------------------
Conditional compilation
----------------------------------------
This proposal does not support conditional compilation, but it is easy
to imagine adding an if-then-else bdec.

----------------------------------------
Other missing features
----------------------------------------
The only mechanism for naming a basis is to put the bdecs in a file
and use the filename as the name.  Perhaps there should be a bdec for
doing so (and other bdecs, and basis expressions, for manipulating
these).

The only mechanism for starting in a clean environment is to create a
new .mlb file.  Perhaps there should be a bdec "clean <bdec> end" for
doing so.

--------------------------------------------------------------------------------
References
--------------------------------------------------------------------------------

[1] Programming with Regions in the ML Kit (for Version 4.1.1)
    Modules and Projects, Chapter 15, pages 145-154
    http://www.it.edu/research/mlkit/dist/mlkit-4.1.1.pdf

[2] Mosmake
    http://www.diku.dk/~makholm/mosmake/

[3] Moscow ML Owner's Manual
    Recompilation management using mosmldep and make, Section 9, page 16
    http://www.dina.kvl.dk/~sestoft/mosml/manual.pdf

[4] The Objective Caml System
    Dependency Generator (ocamldep), Chapter 13
    http://caml.inria.fr/ocaml/htmlman/manual027.html

[5] The Poly/ML Make System, Chapter 7
    http://www.polyml.org/docs/Make7.html#MakeSystem7

[6] Portable library descriptions for Standard ML
    http://people.cs.uchicago.edu/~blume/pgraph/proposal.pdf

[7] CM: The SML/NJ Compilation and Library Manager
    http://smlnj.org/doc/CM/index.html

[8] Hierarchical Modularity
    http://people.cs.uchicago.edu/~blume/papers/cm-TOPLAS.ps.gz

[9] Dependency Analysis for Standard ML
    http://people.cs.uchicago.edu/~blume/papers/depend.ps.gz

_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel