[MLton] mlb files and the ML Kit

Sun, 14 Mar 2004 10:28:32 -0800

> My motivation is that an mlb-language with support for basis bindings
> lets me (as a programmer) specify exactly what a program unit should
> depend on - to any degree I want!

That makes sense.

> I could also write the my2.mlb-file myself, if I want to be explicit
> about dependencies.  Without basis bindings, new mlb-files are needed
> to do such things.
>
> The tool could be useful, for example, to extract parts of a program
> for use in another software project. More importantly ;) the language
> allows me to limit the number of bases the ML Kit needs to deserialize
> when forming the elaboration and compilation bases for elaborating and
> compiling a program unit.

These all sound like good reasons.  I am becoming more convinced.

> We could also add macros (with call-by-name semantics), although I
> think that would be overkill:

I agree it is overkill.

> The syntax I'm using right now in the implementation is the
> following:
> 
>    <bexp> ::= <bexp> <bexp>
>             | let <bdec> in <bexp> end
>             | file.sml
>             | file.sig
>             | ( <bexp> )
>             | ()
>             | bid
> 
>    <bdec> ::= <bdec> <bdec>
>             | <empty>
>             | local <bdec> in <bdec> end
>             | bas bid = <bexp>
>             | open <bexp>
>             | open file.mlb

Looking at this and at the proposal I sent a day earlier

>    <bdec> ::= clean <bdec> end
>             | functor <fctid> = <fctid>
>             | local <bdec> in <bdec>
>             | open <bid>
>             | prog <program> end
>             | <bdec> <bdec>
>
>    <bmdec> ::= basis <bid> = bas <bdec> end
>              | <bdec>
>              | <bmdec> <bmdec>

I am getting a feeling of incompleteness with both that is leading me
back to (a variant of) my original mlb language from last summer.  It
seems to me that there are three kinds of bases we could consider.

	Basis0 = FunEnv x SigEnv x Env
	Basis1 = (Bid -> Basis0) x Basis0
	Basis2 = (Bid -> Basis2) x Basis0

An mlb language must decide which kind of basis a basis identifier
denotes and which kind of basis an mlb file denotes.  In my original
proposal from last summer

> <bdec> ::= <file>.sml
>          | <file>.mlb
>          | local <bdec> in <bdec> end
>          | <bdec> <bdec>

there were no basis identifiers, and mlb files denoted elements of
Basis0.  In my extension, which added basis identifiers, both mlb
files and basis identifiers denoted elements of Basis2.  In the
proposals that Martin and I recently sent, an mlb file denotes an
element of Basis1 and a basis identifier denotes an element of Basis0.

The problem with the proposals that Martin and I sent, as I see it, is
that as soon as you choose different sets for the two denotations then
you are faced with the argument that you can achieve a different level
of expressiveness by introducing an mlb file than can be achieved in
the mlb language.  And that violates a principle we are trying to
meet.  So, as soon as the denotation of mlb files includes maps from
basis identifiers to bases, we are led to conclude that the meaning of
basis identifiers should include those maps as well.  And that means
that basis expressions should include them.  And then, since one can
write "basis <bid> = <bexp>" in an mlb file, we are led to conclude
that the denotation of mlb files includes two-level basis maps.  And
so on.

The only way I see to achieve equality between the expressiveness of
mlb files and basis expressions is to allow arbitrarily deep bases as
their denotation.

	Basis2 = (Bid -> Basis2) x Basis0

This is completely analogous with the denotation of structures in SML	

	Env = (Strid -> Env) x TyEnv x ValEnv

Strictly keeping the analogy with SML structures, we have the
following language.

<bdec> ::= basis <bid> = <bexp>
         | local <bdec> in <bdec> end
	 | open <bid>*
         | <bdec> <bdec>
         | <empty>
<bexp> ::= <bid>
         | bas <bdec> end
         | let <bdec> in <bexp> end

Now, where to add .mlb and .sml files to the grammar?  Since an mlb
file denotes an element of Basis2, we might expect that it would
contain a single <bexp> and it should be allowed to refer to an mlb
file in the grammar for <bexp>'s.  However, for syntactic convenience,
I think it is easier to go the other way.  I would like the contents
of an mlb file to be a <bdec> and to refer to mlb files directly as a
<bdec>.

<bdec> ::= ... | file.mlb

mlb files still denote elements of basis2.  This syntactic convenience
is simply the implicit wrapping of a "bas ... end" around the contents
of an mlb file and the implicit prefixing of an "open" wherever an mlb
file is referenced.  

For similar reasons, I would like to include sml files as <bdec>'s.

<bdec> ::= ... | file.sml

The net result is that the common case of a list of sml and mlb files
is itself valid contents of an mlb file, requiring no additional
syntax.

With the above syntax, Martin's first two examples are written as

   my.mlb
     open A.sml B.sml C.sml
   becomes
     A.sml B.sml C.sml

   my2.mlb
     bas A = A.sml
     bas B = let open A in B.sml end
     bas C = let open A in C.sml end
     open A B C
   becomes
     bas A = bas A.sml end
     bas B = let open A in bas B.sml end end
     bas C = let open A in bas C.sml end end
     open A B C

Here's a static semantics for the language, including the handling of
caching of mlb files and elaboration of sml files.

    b in Bid
    d in Bdec
    e in Bexp
    C in (File.sml -> Topdec) x (File.mlb -> Bdec)
    B in Basis = (Bid -> Basis) x FunEnv x SigEnv x Env
    F in (File.mlb -> Basis + {NONE})

    Judgement: B, C, F |- e --> B', F'

    ------------------------
    B, C, F |- b --> B(b), F

    B, C, F |- d --> B', F'
    -------------------------------
    B, C, F |- bas d end --> B', F'

    B, C, F |- d --> B', F'  B + B', C, F' |- e --> B'', F''
    --------------------------------------------------------
    B, C, F |- let d in e end --> B'', F''

    Judgement: B, C, F |- d --> B', F'

    B |- C(file.sml) => B'
    -----------------------------
    B, C, F |- file.sml --> B', F

    F(file.mlb) = B' in Basis
    -----------------------------
    B, C, F |- file.mlb --> B', F

    F(file.mlb) = NONE    {}, C, F |- C(file.mlb) --> B', F'
    --------------------------------------------------------
    B, C, F |- file.mlb --> B', F'[file.mlb -> B']

    B, C, F |- e --> B', F'
    -----------------------------------------
    B, C, F |- basis b = e --> [b |-> B'], F'

    B, C, F |- d1 --> B', F'  B + B', C, F' |- d2 --> B'', F''
    ----------------------------------------------------------
    B, C, F |- local d1 in d2 end --> B'', F''

    ----------------------------------------------------
    B, C, F |- open b1 ... bn --> B(b1) + ... + B(bn), F

    B, C, F |- d1 --> B', F'  B + B', C, F' |- d2 --> B'', F''
    ----------------------------------------------------------
    B, C, F |- d1 d2 --> B' + B'', F''

    ----------------------
    B, C, F |-   --> {}, F

> BTW: should it be allowed to specify a source file twice in an
> mlb-file? Or in an entire project?

It seems OK to me.  The meaning I proposed for that a while back was
to duplicate (and re-elaborate) the file contents at each reference.
I've kept that meaning in the semantics above.

> Also, would it be bad to require mlb-file names to be unique

Yes, that would be bad :-).

> - or should we require only absolute paths to mlb-files to be
> unique.

I'm not sure what this means.  We will certainly traverse symbolic
links.  How could absolute paths not be unique?

> To obtain a simple mechanism for generating unique machine code
> labels (and new type names, for that matter), it would be great if
> the concatenation of an mlb-file name and a source file name
> uniquely determines the source file.

I think with the above semantics it will uniquely determine an
instance of the source file, which I would think is enough.