[MLton] straw-man packaging proposal

Stephen Weeks MLton@mlton.org
Tue, 23 Aug 2005 17:09:13 -0700


I've been thinking a little about a packaging system for MLton using
MLBs and thought I'd throw out a straw man to see if we can make some
progress.  Perhaps this will also shed some light on MLB path
variables.

As I see it, the main point of a packaging system for MLton is to
allow separate development and delivery of libraries.  As soon as one
does this, there will be multiple versions of a library extant, and so
the packaging system must allow developers to express dependencies
between packages.  My proposal focuses on this aspect.

Intuitively, think of a package as a collection of code (SML + MLB
files), supporting files (documentation, ...), and auxiliary
information to support dependency checking.  All that is really
relevant from the point of view of dependencies is the MLB files, so I
won't mention SML code or supporting files further.  More precisely, a
package consists of 

  * a unique id
  * a name
  * exports
  * imports

The unique id is globally unique across all packages (think hash of
contents).  The name specifies how the package is referenced by other
packages, and is not globally unique, since different versions of the
same package will have the same name.  I propose to use as a package
name an MLB path variable together with a relative path.  So, for
example, the following are package names

  $(SML_LIB)/basis/
  $(SML_LIB)/smlnj-lib/
  $(SML_LIB)/ckit-lib/

Once MLB path variables are mapped to absolute file system paths, a
package name also specifies a location in the filesystem.
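
To illustrate that last point, here is a rough sketch (not part of the
proposal itself) of expanding a package name into a filesystem
location.  The function name expand, the association-list path map,
and the directory /usr/lib/mlton/sml are all assumptions made up for
the example.

```sml
(* Hypothetical path map: MLB path variable -> absolute directory.
 * The directory below is an assumed install location. *)
val pathMap = [("SML_LIB", "/usr/lib/mlton/sml")]

(* Expand a package name of the form $(VAR)/rel/dir/ into an absolute
 * path.  Returns NONE if the name is not rooted in a path variable or
 * the variable is not in the map. *)
fun expand (pathMap: (string * string) list, name: string): string option =
   if not (String.isPrefix "$(" name) then NONE
   else
      let
         val rest = String.extract (name, 2, NONE)
         val (var, tail) =
            Substring.splitl (fn c => c <> #")") (Substring.full rest)
      in
         if Substring.isEmpty tail then NONE (* no closing paren *)
         else
            case List.find (fn (v, _) => v = Substring.string var) pathMap of
               NONE => NONE
             | SOME (_, dir) =>
                  SOME (dir ^ Substring.string (Substring.triml 1 tail))
      end
```

With this map, expand (pathMap, "$(SML_LIB)/basis/") yields the
directory for the basis package under the assumed root.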

The exports of a package are the MLBs that the package intends to be
referenced by other packages.  So, each export is a relative MLB file
name.  For example, for $(SML_LIB)/basis/, the exports are

  basis-1997.mlb
  basis.mlb
  mlton.mlb
  sml-nj.mlb
  unsafe.mlb
  infixes.mlb
  pervasive.mlb
  equal.mlb
  overloads.mlb
  pervasive-exns.mlb
  pervasive-types.mlb
  pervasive-vals.mlb

It will be common that exports live in the toplevel of the package,
but there seems to be no reason to enforce this.

The imports of a package are the MLBs that the package needs from other
packages.  To name an import, we need the name of the package to
import from, plus the name of the MLB being exported.  So, to import
the standard basis, we use

  $(SML_LIB)/basis/ + basis.mlb

Now, for the tricky part.  How to handle package versions?  I propose
to keep assertions about the "meaning" of a package's exports separate
(conceptually) from the package itself.  Of course, if the
package has imports, then the meaning of its exports will depend on
the meaning of those, so assertions will refer to imports too.  Now, I
don't propose any kind of automated interpretation of the semantics of
code.  Rather, all the semantics of "meanings" lives at the packaging
level, and is governed by assertions written by humans.

For example, an assertion might say

  export <a/b/c.mlb> 
  of package <UID012404932132> 
  has meaning <MY_MEANING>
  provided that
     import <$(SML_LIB)/basis/basis.mlb> has meaning <M_BASIS_20050901>
     import <$(SML_LIB)/smlnj-lib/smlnj-lib.mlb> has meaning <M_NJ_100>

Again, meanings are just tokens (like version numbers).  We might
impose some rules so that one can do version-like things (ordering,
ranges), but let's leave that for when things become more concrete.
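
To make the shape of such an assertion concrete, here is a minimal
sketch, assuming purely for illustration that ids, meanings, and
import names are all plain strings.  The record fields and the
function meaningOf are hypothetical names; the spec at the end of this
message keeps these types abstract.

```sml
(* Hypothetical concrete representation: everything is a string token. *)
type meaning = string
type import = string  (* e.g. "$(SML_LIB)/basis/basis.mlb" *)

type assertion =
   {package: string,                   (* unique id of the package *)
    export: string,                    (* relative MLB path *)
    meaning: meaning,                  (* the meaning being asserted *)
    provided: (import * meaning) list} (* required import meanings *)

val example: assertion =
   {package = "UID012404932132",
    export = "a/b/c.mlb",
    meaning = "MY_MEANING",
    provided = [("$(SML_LIB)/basis/basis.mlb", "M_BASIS_20050901"),
                ("$(SML_LIB)/smlnj-lib/smlnj-lib.mlb", "M_NJ_100")]}

(* The assertion applies only if every required import appears in the
 * supplied list with exactly the required meaning. *)
fun meaningOf (a: assertion, have: (import * meaning) list): meaning option =
   if List.all
      (fn (i, m) => List.exists (fn (i', m') => i = i' andalso m = m') have)
      (#provided a)
      then SOME (#meaning a)
   else NONE
```

Comparing meanings with plain equality reflects the "just tokens"
view; ordering or ranges would need a richer comparison here.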

When one actually wants to include in a program a package with a
particular meaning, one must have an assertion on hand that says that
the package has that meaning, and the program must also therefore
include the imports of the package with the meanings required to
justify the assertion.  It's a pretty straightforward computation to
transitively include the needed imports.  I would like to keep
separate from the discussion of the semantics of packages the
mechanisms for finding and downloading packages.

The advantage of keeping assertions separate is that one can use the
same package source code with different imports, sometimes to get the
same meaning, sometimes to get different meanings.  This makes it easy
(and cheap) to express the fact that the meaning of a package has
changed, not because its source code has changed, but because its
imports have changed.

That's pretty much it.  An installation of the packaging system
consists of a set of packages and a set of assertions.  Those justify
the use of certain packages at certain meanings in a program.


Here's how I view packaging interacting with MLB path variables.
Right now, a reference to an MLB-rooted path, like
$(SML_LIB)/smlnj-lib/smlnj-lib.mlb, refers to a specific file in the
filesystem, determined by expanding the MLB path variable SML_LIB.
With the packaging system, elaboration of all MLBs will take place in
a context, determined by an assertion, that specifies the package IDs
of any imports that MLB elaboration will need.  So, when elaboration
reaches

  $(SML_LIB)/smlnj-lib/smlnj-lib.mlb

the context will specify that the import can be met by package ID

  <UID129834129083>

and furthermore that while elaborating that package, the context (for
its imports) should be <C>.

So, MLB path variables continue to look like they do now.  But they no
longer refer to paths in the filesystem.  Instead, they serve as
"roots" of a hierarchy of package names.  Absolute references to MLB
files (i.e. references rooted in some MLB path variable) no longer
refer directly to MLB files in the file system -- the actual MLB files
they refer to are only determined by an assertion context during
elaboration.
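
One way to picture such a context is as a tree: each import maps to
the package ID that satisfies it, plus the context to use while
elaborating that package.  The following is only a sketch under that
assumption.  The datatype, the functions resolve and packageIds, and
the id UID_BASIS are invented for the example.

```sml
(* Hypothetical sketch: a context maps each import name to the package
 * id that satisfies it, together with the context for that package's
 * own imports. *)
datatype context = Context of (string * (string * context)) list

(* Look up the package id and nested context chosen for an import. *)
fun resolve (Context bindings, import: string): (string * context) option =
   case List.find (fn (i, _) => i = import) bindings of
      NONE => NONE
    | SOME (_, target) => SOME target

(* Collect, without duplicates, every package id reachable from a
 * context, in order of first appearance. *)
fun packageIds (c: context): string list =
   let
      fun add (id, ids) =
         if List.exists (fn id' => id = id') ids then ids else id :: ids
      fun loop (Context bindings, ids) =
         List.foldl (fn ((_, (id, c')), ids) => loop (c', add (id, ids)))
         ids bindings
   in
      rev (loop (c, []))
   end

(* UID_BASIS is a made-up id; the basis is assumed to import nothing. *)
val basisContext = Context []
val smlnjContext =
   Context [("$(SML_LIB)/basis/basis.mlb", ("UID_BASIS", basisContext))]
val top =
   Context [("$(SML_LIB)/basis/basis.mlb", ("UID_BASIS", basisContext)),
            ("$(SML_LIB)/smlnj-lib/smlnj-lib.mlb",
             ("UID129834129083", smlnjContext))]
```

Here resolve plays the role of the lookup done when elaboration
reaches an MLB-rooted path, and packageIds gives the set of packages
a program built in that context would pull in.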


Below is a more precise spec of the system.  I've coded up the
semantics of assertions via a fixed point that starts with all the
packages that have no imports, and iteratively builds from them more
and more packages (with more and more meanings) using all available
assertions, until no more meanings can be found.

----------------------------------------------------------------------

signature S =
   sig
      structure Id:
         sig
            type t (* unique id *)

            val equals: t * t -> bool
         end
      structure Meaning:
         sig
            (* A package meaning is an abstraction of the semantics of a
             * package.  As far as the packaging system is concerned, there is
             * no internal structure to package meanings.  They are simply
             * tokens.
             *)
            type t (* opaque token *)

            val equals: t * t -> bool
         end
      structure PathVar:
         sig
            type t (* $(XYZ) *)
         end
      structure RelDir:
         sig
            type t (* a path of the form a/b/c/ *)
         end
      structure AbsDir:
         sig
            type t = PathVar.t * RelDir.t

            val equals: t * t -> bool
         end
      structure RelMLB:
         sig
            type t (* a path of the form a/b/c.mlb *)

            val equals: t * t -> bool
         end
      structure Export:
         sig
            type t = RelMLB.t

            val equals: t * t -> bool
         end
      structure Import:
         sig
            type t = AbsDir.t * Export.t
         end
      structure Package:
         sig
            (* A package has a unique id, a name, and a collection of imports
             * and exports
             *)
            type t

            val exports: t -> Export.t list
            val id: t -> Id.t
            val imports: t -> Import.t list
            val name: t -> AbsDir.t
         end
      structure Assertion:
         sig
            (* A package assertion gives information about a package export.
             * It says what the meaning of the export is, given the meanings of
             * the package's imports.  This is a partial function because under
             * some (many) collections of import meanings, the meaning of the
             * export will be unknown.
             *)
            type t

            val about: t -> Id.t * Export.t
            val meaning: t * (Import.t * Meaning.t) list -> Meaning.t option
         end
      structure Installation:
         sig
            (* An installation consists of a set of packages and a set of
             * assertions about the exports of those packages.
             *)
            type t

            val assertions: t -> Assertion.t list
            val packages: t -> Package.t list
         end
   end

functor F (S: S) =
struct

open S

structure List =
   struct
      open List

      fun exists (l, f) = List.exists f l
      fun has (l, x, equals) = exists (l, fn x' => equals (x, x'))
      fun map (l, f) = List.map f l
      fun fold (l, a, f) = List.foldl f a l
      fun peek (l, f) = List.find f l
   end

structure Installation =
   struct
      open Installation

      val allMeanings: t -> (Package.t * Export.t * Meaning.t) list =
         fn i =>
         let
            fun loop pems =
               let
                  val gotNew = ref false
                  val pems =
                     List.fold
                     (packages i, pems, fn (p, pems) =>
                      List.fold
                      (assertions i, pems, fn (a, pems) =>
                       if not (Id.equals (Package.id p,
                                          #1 (Assertion.about a))) then
                          pems
                       else
                          let
                             val (_, e) = Assertion.about a
                             fun try (is: Import.t list,
                                      ims: (Import.t * Meaning.t) list,
                                      pems) =
                                case is of
                                   [] =>
                                      (case Assertion.meaning (a, rev ims) of
                                          NONE => pems
                                        | SOME m =>
                                             if List.exists
                                                (pems, fn (p', e', m') =>
                                                 Id.equals (Package.id p,
                                                            Package.id p')
                                                 andalso Export.equals (e, e')
                                                 andalso Meaning.equals (m, m'))
                                                then pems
                                             else (gotNew := true
                                                   ; (p, e, m) :: pems))
                                 | (i as (dir, e)) :: is =>
                                      List.fold
                                      (pems, pems, fn ((p', e', m'), pems) =>
                                       if AbsDir.equals (dir,
                                                         Package.name p')
                                          andalso Export.equals (e, e') then
                                          try (is, (i, m') :: ims, pems)
                                       else pems)
                          in
                             try (Package.imports p, [], pems)
                          end))
               in
                  if !gotNew then loop pems else pems
               end
         in
            loop []
         end
   end

end