[MLton] MLton library project licensing

Sun Oct 8 12:11:10 PDT 2006

Quoting Stephen Weeks <sweeks at sweeks.com>:
[...]
> Great.  We never did settle on a place.  I think that it would be best
> to create a new toplevel project in the repository, parallel to the
> mlton project.  I proposed
> 
>   svn://mlton.org/mltonlib
> 
> Since there were no objections, I've gone ahead and created it, with
> the standard {branches,tags,trunk} subdirectories.

I have no objections regarding the place, but I think that the standard
repository layout is not ideal for a collection of libraries.  (Thanks
to SVN this shouldn't matter as we can reorganize the repository.)  Below
are some thoughts on flexible library development.

About requirements
------------------

If there is one thing that I've learned over the years about designing
libraries, it is that you can never get the interface just right the *first*
time.  You can only really begin to understand how good the interface of a
library is after you've seen a lot of code written by *others* that uses
your library.  Others often use and misuse your library in ways that you
simply couldn't imagine by yourself.  Others also often run into problems
that you didn't.

Some of the issues found by others indicate weaknesses in the library
interface and some in the library documentation.  Some issues may be best
solved within the library - perhaps by revising the library or
documentation.  Some issues may be best solved by writing an auxiliary
library.  All this really boils down to this: as a library author, you
want to be able to add value to your library by revising it, which may
mean changing its interface in backwards incompatible ways.  Otherwise
your library will gradually lose value until it becomes obsolete and is
replaced by another library.

However, you just can't keep constantly changing the interface of your
library.  If you do, your users will quickly become annoyed by the need to
continuously upgrade their code.  You want to be able to revise your
library *without imposing* a tight schedule upon the users of your library
to upgrade to the revised interface.

Library users would generally like to be able to use entirely new
libraries as they become available.  However, at the same time, they also want
to decide *when* to upgrade their code to use a new revision of a
library.  The schedules of library users and authors aren't likely to be
neatly synchronized.  Thus, if you are distributing a collection of
libraries, you shouldn't tie the launch of a new library to the revision
of another library.  That would impose an upgrade burden upon the users of
the revised library should they want to use the newly added library.

In summary, the two most important requirements for a library architecture
are that library authors need to be able to revise their libraries, but at
the same time, library users need to be able to decide when to upgrade
their code to use the revised library.  If something in the library
architecture conflicts with this, such as tying the release of new
libraries to the revision of old libraries, then there will be pain.

Per repository branching
------------------------

Probably the most common approach to version control is to do repository
wide branching.  Under SVN, the repository would be organized using the
"standard layout" and the trunk would look roughly like this:

  trunk/
    a/ ... lib a files ...
    b/ ... lib b files ...

The treatment of the trunk varies from approach to approach.  In one
approach the trunk is treated as a sort of "stable" branch that could, in
principle, be released at any point.  New features are developed in
"feature branches" before being integrated into the trunk.  In another
approach, new features are developed in the trunk and a "release branch"
is created for each release effectively selecting the set of new features
to be stabilized for the release.  At any rate, the branches duplicate the
structure of the trunk

  branches/
    b1/a/ ... lib a files ...
       b/ ... lib b files ...
    b2/a/ ... lib a files ...
       b/ ... lib b files ...
    ... more branches ...

and some merging is performed between branches.  In particular, changes
(either new features or bug fixes) are usually merged from branches to the
trunk.

As suggested above, a repository rarely contains just one library.  A big
library usually consists of many smaller libraries.  In a collaborative
library project each library has its own set of maintainers with their own
schedules.  Libraries are often developed in parallel.  This kind of
development is poorly supported by repository wide branching.

One problem is that the release schedules of all libraries are tied
together.  This happens whether stable features are merged to the trunk or
new features are stabilized in release branches.  Suppose you are already
using library "a" when a new release of the library collection is made
containing a new library "b".  Suppose further that the release also contains
a new backwards incompatible revision of library "a".  Now, in order to use
library "b" conveniently, you are effectively forced to upgrade to the
revised library "a".  While one can imagine working around this and
effectively using two releases of the library collection, it becomes less
and less attractive as the number of libraries increases.  Per repository
branching is fundamentally all or nothing.

As another example, consider a scenario where two libraries, "a" and "b",
are being developed.  Suppose that the developers of those libraries
decide to revise their libraries and create separate branches for
development (following the "stable trunk" model).  Let's further assume
that the implementation of "b" uses "a".  Now, suppose then that the
revision of library "a" is finished earlier and merged into the trunk.
First of all, this means that someone must have already upgraded the
revision of "b" in the "a" development branch to use the revised "a"
(otherwise the trunk would be broken).  Second, the developer of library
"b" is now in a bind.  He can't just directly merge the new revision of
"b" to the trunk as the revisions of "a" in the development branch and the
trunk differ.  The best course of action is to merge the changes to "a"
from trunk to the development branch of "b" so that the library "b" can be
tested with the revised "a" (maintaining the stability of the trunk).
Similar problems occur with other repository wide branching schemes.

The need to perform this kind of extra merging and upgrading is quite
unsatisfactory.  Both introduce costly synchronization into an otherwise
naturally parallel development process.  The less merging you need to do
to get your job done (without disturbing the work of others, of course) the
better.  Also, as discussed earlier, users of a library shouldn't be forced
to upgrade to a new library revision immediately.  In the above scenario,
the best person to upgrade library "b" to use the new revision of library
"a" is likely to be the developer of "b".  Neither forcing the developer of
"b" to do the upgrade on "a"'s schedule nor having the developer of "a" do
the upgrade is ideal.  It would be best to let the developer of "b" do the
upgrade when it fits into his schedule, but that is unattainable with
repository wide branching schemes.

Transparent per library branching
---------------------------------

Less commonly known and used, probably mostly due to the limitations of
early version control systems, but superior to per repository branching is
"transparent per library branching".  Under SVN, one wouldn't use the
standard layout for the repository, but would rather use a form of the
standard layout for each library.  The repository layout would look
roughly like this:

  a/
    r1/ ... library a files ...
    r2/ ... library a files ...
    ...
  b/
    r1/ ... library b files ...
    r2/ ... library b files ...
    ...

Each library has a subdirectory under which the supported revisions (and
temporary development branches) of the library are stored.  (The idea here
is that a new "revision" is started when incompatible interface changes are
needed.)  This looks similar to having a repository per library, but there
are a couple of important differences.  The main difference is that the
library revisions are supposed to be used as they are.  They are not supposed
to be "installed" (or copied) to another kind of directory structure where
library revisioning would not be reflected in the directory structure.  Instead,
the revisioning (or branching) is made visible to the users and they can
directly refer to any of the supported revisions of a library.  Releases and
snapshots of the library collection are supposed to include all the supported
revisions of all libraries.

In practice, this means that each library user uses some technique to
select the revisions of libraries used to build the user's application.
One way to do this is to specify the revisions in a build configuration
file.  Using the MLB system, one could use a set of path variables

  LIB_A $(LIB_COLLECTION)/a/r1/lib.mlb
  LIB_B $(LIB_COLLECTION)/b/r2/lib.mlb

and use those variables to refer to libraries

  local
     $(LIB_A)
  in
     some-of-my-code.sml
  end
  local
     $(LIB_B)
  in
     some-more-my-code.sml
  end

The point is that when a new revision of a library is created, the user
doesn't need to do anything.  Each revision of a library stays at the same
place as long as it is supported.  To upgrade to a new revision of a
library, the user will simply change the build configuration file to refer
to the new revision and then tweak the application code to make it work
with the new revision.  The latter step of tweaking the code may require a
non-trivial amount of work and it is best to allow the user to schedule
such work as flexibly as possible.
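
As a rough sketch of such an upgrade, assuming the path variables shown
above and assuming that a revision "r2" of library "a" exists in the
collection, only the definition of LIB_A would need to change:

  LIB_A $(LIB_COLLECTION)/a/r2/lib.mlb
  LIB_B $(LIB_COLLECTION)/b/r2/lib.mlb

The MLB files that refer to $(LIB_A) and $(LIB_B) would stay as they are.
With MLton, such definitions could be kept in a file passed to the compiler
with the -mlb-path-map option; the choice of file is up to the user.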

Let's reconsider the scenarios of the previous section under transparent
per library branching.  In the first scenario a new library is added to
the repository as well as a new revision of a previously existing library.
This requires no immediate action from the user.  To use the new library,
all the user needs to do is to refer to the newly added library.  The old
revision of the existing library will still be there (or, at least, will
be there long enough for most users to upgrade without any kind of hurry).
The user can then upgrade to the new revision at his leisure.

In the second scenario, new revisions of two libraries, "a" and "b", are
being constructed concurrently and the new revision of "a" is stabilized
first.  This just means that the new library revision of "a", say "r2", is
pronounced stable.  Pronouncing a library revision "stable" means that no
further incompatible interface changes will be made to the revision.  No
immediate actions are required from anyone.  The earlier revision of "a",
say "r1", will still be accessible.  The new revision of "b" can still
refer to the old revision of "a".  The new "b" revision can even be
pronounced stable without upgrading it to use the new "a" revision.  The
implementation of the new "b" revision can then be upgraded to use the new
"a" revision.  One could (in most cases) even upgrade the old "b" revision
to use the new "a" revision and perhaps reimplement the old "a" revision
in terms of the new "a" revision and so on.  All this can be done without
requiring any actions from the users of the libraries.
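
To make the second scenario a bit more concrete, here is a minimal sketch
of what the MLB file of the new "b" revision might look like while it still
depends on the old "a" revision.  The file names b.sig and b.sml are
hypothetical, as is the assumption that each revision exposes a lib.mlb
file at its root:

  (* b/r2/lib.mlb: hypothetical file names, for illustration only *)
  local
     $(SML_LIB)/basis/basis.mlb
     (* still built against the old revision of "a" *)
     ../../a/r1/lib.mlb
  in
     b.sig
     b.sml
  end

Upgrading "b" to the new "a" revision would then amount to changing the
relative path to ../../a/r2/lib.mlb and fixing whatever breaks in b.sml,
whenever that fits the schedule of the developer of "b".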

BTW, I didn't invent the idea of transparent per library branching while
writing this.  I developed a variation of the technique while working at
Housemarque and we used it there for years (at least until I left).  It worked
as intended.  It basically meant that people were empowered to improve the
libraries.  You could develop a new revision of a library without immediately
interfering with the work of anyone else.  In practice, we had about 1-3
revisions of a library.  After a couple of revisions you tend to have a
pretty good picture of how the interface should be.

--Vesa Karvonen