[MLton] mlb support

Tue, 22 Jun 2004 21:28:48 -0400 (EDT)

I've been working a little on supporting mlb files, as they were last
discussed (http://www.mlton.org/pipermail/mlton/2004-March/015645.html).
I've gotten the two "big pieces" working -- lexer/parser and elaborator.
The lexer/parser is interesting; I wanted to preserve the behavior that
Henry reported -- i.e., all lexing/parsing errors are reported before any
elaboration error.  So, there are a few games involved in getting the MLB
parser to invoke itself recursively and in handling relative file names
properly.  The elaborator is very straightforward.  Interestingly, the elaboration
of an MLB file yields no decs -- only the sub-elaboration of a .sml file
yields decs.

I'm now trying to figure out how to deal with the various MLton options.
I like the idea of making the Basis Library just another .mlb to be
included in a project; from a compilation standpoint, this isn't a problem
and I've been able to do just that.  However, we often treat the Basis
Library specially for various reasons, and now it just looks like more
user code.  So, I've identified a few issues and am looking for the best
way to deal with them.

1) With symbolic links, one can have multiple paths to the same file.
Should an .mlb file that is included through different paths be treated as
the same .mlb (i.e., elaborated exactly once)?  For now, I've gone with
the simpler solution and not attempted to determine if two paths really
correspond to the same file.
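If one did want to treat two paths to the same file as the same .mlb, a
minimal sketch in SML is below.  It assumes the cache of elaborated .mlb
files is keyed by the Basis Library's OS.FileSys.fileId, which identifies
the underlying file system object, so paths that differ only by symbolic
links or "." components map to the same key.  The structure MlbCache and
the elab callback are hypothetical, not part of MLton.

```sml
(* Hypothetical cache of elaborated .mlb files, keyed by file identity
   (OS.FileSys.file_id) rather than by the path string, so that two
   different paths to the same file hit the same entry. *)
structure MlbCache =
struct
   type 'a t = (OS.FileSys.file_id * 'a) list ref

   fun new () : 'a t = ref []

   (* Return the cached result for the file named by path, or run elab
      on it (once per underlying file) and remember the result.
      OS.FileSys.fileId raises OS.SysErr if the file does not exist. *)
   fun lookupOrElaborate (cache : 'a t, path : string,
                          elab : string -> 'a) : 'a =
      let
         val id = OS.FileSys.fileId path
         fun sameId (id', _) = OS.FileSys.compare (id, id') = EQUAL
      in
         case List.find sameId (!cache) of
            SOME (_, result) => result
          | NONE =>
               let
                  val result = elab path
               in
                  cache := (id, result) :: !cache
                ; result
               end
      end
end
```

A linear list keeps the sketch short; a real implementation would likely
want a hash table or map keyed on file_id.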

2) Dead code pass.  This is a prime example of where we treat the Basis
Library specially.  I believe that we have hinted at wanting to allow
other libraries to be handled by the dead code pass.  Here's a proposed
solution:

Change the grammar for basdec as follows:

basdec ::= ...
         | file.mlb
         | !(lib) file.mlb
         | !(user) file.mlb

The idea is to allow the inclusion of a .mlb file to be annotated for
special treatment.  All declarations that are included under a !(lib)
annotation can be dropped by the dead code pass.  All declarations that
are included under a !(user) annotation must be kept.  An un-annotated
.mlb file inherits the annotation of its parent.  The .mlb file passed on
the command line is implicitly !(user) annotated.  An .mlb file that is
annotated with both !(lib) and !(user) is treated as being annotated with
!(user).
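The inheritance rules above can be read as a small fixed-point computation
over the inclusion graph.  A minimal SML sketch follows, assuming the graph
is given as a list of hypothetical edge records (including file, optional
explicit annotation, included file) and that files are identified by name.
The names ann, join, edge, and annotate are illustrative only; join encodes
the rule that !(user) wins over !(lib), which matters because each .mlb is
elaborated once even when included from several places.

```sml
datatype ann = Lib | User

(* !(user) wins: a file reached under both annotations is kept. *)
fun join (Lib, Lib) = Lib
  | join _ = User

(* Hypothetical representation of one inclusion: file parent includes
   file child, with an optional explicit !(lib)/!(user) annotation. *)
type edge = {parent : string, explicit : ann option, child : string}

(* Compute the annotation of every reachable file.  The root (the .mlb
   named on the command line) is implicitly User; an un-annotated
   inclusion inherits the including file's annotation.  Annotations only
   move from Lib to User, so the iteration terminates. *)
fun annotate (root : string, edges : edge list) : (string * ann) list =
   let
      fun lookup (env, f) =
         Option.map #2 (List.find (fn (g, _) => g = f) env)

      (* Join annotation a into file f's entry; report whether it changed. *)
      fun update (env, f, a) =
         case lookup (env, f) of
            NONE => ((f, a) :: env, true)
          | SOME b =>
               let
                  val c = join (a, b)
               in
                  if c = b then (env, false)
                  else ((f, c) :: List.filter (fn (g, _) => g <> f) env, true)
               end

      (* One pass over all edges whose parent is already annotated. *)
      fun step env =
         List.foldl
            (fn ({parent, explicit, child}, (env, changed)) =>
                case lookup (env, parent) of
                   NONE => (env, changed)
                 | SOME pa =>
                      let
                         val a = getOpt (explicit, pa)
                         val (env', ch) = update (env, child, a)
                      in
                         (env', changed orelse ch)
                      end)
            (env, false) edges

      fun loop env =
         case step env of
            (env', true) => loop env'
          | (env', false) => env'
   in
      loop [(root, User)]
   end
```

Applied to the basis-2002.mlb encoding, the user's file and basis-2002.mlb
itself come out as User, while basis-2002-proxy.mlb and everything it
includes without annotation come out as Lib.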

Now, to keep syntactic clutter in user .mlb files to a minimum, we would
export the Basis Library as follows:

basis-2002.mlb:
  !(lib) basis-2002-proxy.mlb

basis-2002-proxy.mlb:
  local
    ../build.mlb
  in
    top-level/infixes.sml
    top-level/basis-funs.sml
    top-level/basis-sigs.sml
    top-level/top-level.sml
    top-level/overloads.sml
  end

Of course, this violates Stephen's mantra of being able to do everything
without extra/proxy files.  I don't know that being able to annotate
arbitrary basdecs is necessarily better.

3) Lookup constants.  I can't use lookupConstantError, because both
the Basis Library and user code are elaborated within the same .mlb.  Again,
I suggest annotations as a way of turning on constant lookup within the
basis and keeping it off within user code.  (Something similar might apply
to the rebinding of equals.)

4) Backwards compatibility.  How should we deal with no user file, a .sml
file, or a .cm file?  My suggestion is the following: each of these
situations induces a .mlb file to be compiled.  I'm using basis-2002, but
the actual choice would depend on the -basis flag:

<no file>  ==>  $(SML_BASIS)/basis-2002.mlb
f.sml      ==>  local $(SML_BASIS)/basis-2002.mlb in f.sml end
f.cm       ==>  local $(SML_BASIS)/basis-2002.mlb in f1.sml ... fn.sml end
                (* where f1.sml ... fn.sml are the files induced *)
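The three cases above amount to a simple source-to-source mapping.  A
minimal SML sketch, where the datatype legacy, the function inducedMlb,
and passing the -basis choice as a name such as "basis-2002" are all
assumptions for illustration; for a .cm file it takes the already-computed
list of induced .sml files rather than reading the .cm file itself.

```sml
(* Hypothetical description of what was given on the command line. *)
datatype legacy =
   NoFile                   (* no user file at all *)
 | Sml of string            (* a single f.sml *)
 | Cm of string list        (* the files f1.sml ... fn.sml induced by f.cm *)

(* Produce the text of the .mlb that each legacy input induces.
   basis names the library selected by -basis, e.g. "basis-2002". *)
fun inducedMlb (basis : string, src : legacy) : string =
   let
      val lib = "$(SML_BASIS)/" ^ basis ^ ".mlb"
      fun wrap files = "local " ^ lib ^ " in " ^ files ^ " end"
   in
      case src of
         NoFile => lib
       | Sml f => wrap f
       | Cm fs => wrap (String.concatWith " " fs)
   end
```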

5) -show-basis <file>
It seems that this simply prints the basis (as in, top-level bindings)
that are in scope.  So, that is a notion that is well-defined under the
.mlb model.  However, the way it currently works is that if there is no
user file, then the Basis Library basis is printed to the file; if there
is a user file, then the basis at the end of elaborating the user file is
printed minus the bindings from the Basis Library.  So, each of these
situations would be preserved by the "encoding" above.  Note that the
end-user will get very different results for

z.mlb:
  $(SML_BASIS)/basis-2002.mlb
  z.sml

a.mlb:
  local
    $(SML_BASIS)/basis-2002.mlb
  in
    a.sml
  end

I actually think this is a good thing, because it indicates very different
situations if they further include z.mlb or a.mlb in another mlb file.

6) -show-basis-used <file>
I don't know how to support this for backwards compatibility or what it
should do for a .mlb file.

7) -warn-unused
There's no problem with tracking use information when elaborating mlb
files.  Again, it's a question of what we intend it to mean.  Note that in
the a.mlb example above, a large portion of the Basis Library is considered
unused, because those bindings are _not_ in scope in the final basis of the
elaborated program.  This is in contrast with the z.mlb example, which
leaves all of the Basis Library in the final basis.  However, a.mlb seems
to be the more principled way to write mlb files.

8) -show-def-use
Similar to (7), the infrastructure works fine.  However, without a clear
distinction between Basis Library and user code, I never call
Env.clearDefUses after elaborating the Basis Library.  Hence, I always get
def-use information for the _whole_ program, Basis Library and user.

9) Empty programs
There is the question of what to do in completely empty programs.  If the
user doesn't include a Basis Library, then no top-level suffix and
top-level handler are installed.  Given the way in which top-level items
are incorporated into the program, there is no seg-fault, just program
termination with a "MLton bug: " error message.

Furthermore, the empty basis is really the completely empty basis -- not
even any of the primitive datatypes.  (These are added with a  _prim
basdec.)  An empty program compiled without the primitive datatypes
terminated with  "true has no mono property" in the mono pass.  I suspect
that other passes would also break.

We could change the elaboration rules for mlbs to make the "empty" basis
correspond to the primitive basis.  This wouldn't alter any of the
arguments that an mlb need only be elaborated once.  Another option would
be to _always_ include the decs corresponding to the primitive basis, but
still require the _prim basdec to add the bindings to a basis.  (This
would correspond to elaborating  local _prim in end  as a prefix to the
user's .mlb file.  The _prim basdec is "cached" in the same way as .mlb
files.)

Not sure what to do about the top-level things.