[MLton] mlb files and the ML Kit

Stephen Weeks MLton@mlton.org
Thu, 11 Mar 2004 18:23:00 -0800


> I'm just referring to the problem of identifying identifier status of
> value identifiers in patterns. Checking for recompilation upon
> modification of source code requires tracking of identifier status of
> identifiers being bound in patterns.

Thanks for the explanation.  That makes sense.

> In my Ph.D. thesis, the concept of elaboration dependence was
> developed for a subset of Standard ML, including a simple form of
> datatypes. See [1, Chapter 4] for some formal properties.

This notion makes sense as well.

> Notice also that your -show-def-use flag for MLton does not say that
> B.sml depends on the value identifier a having identifier status
> 'v'...

True.  

My thinking on what we will do in MLton is that we will not solve the
problem of determining what to re-elaborate when a source file
changes.  If we think of a program as a list of source files, then
whenever a file is changed, we can simply re-elaborate all files after
it in the list.  That is surely sufficient :-).  Because elaboration
in MLton is quite fast, it should be OK from a development environment
perspective as well.  We do plan to provide one feature, the ability
to save the result of elaborating a prefix of a program, which should
give any speedup needed.
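
A minimal sketch of that policy, assuming a program is just an ordered
list of file names. The helper splitAtChange is hypothetical, not part
of MLton: it returns the prefix whose saved elaboration could be reused
and the suffix (the changed file and everything after it) that would be
re-elaborated.

```sml
(* Hypothetical helper, for illustration only.  Splits the program at the
 * first occurrence of the changed file.  If the file is not in the list,
 * everything is reusable and nothing is re-elaborated.
 *)
fun splitAtChange (files: string list, changed: string)
   : string list * string list =
   let
      fun loop (prefix, rest) =
         case rest of
            [] => (rev prefix, [])
          | f :: fs =>
               if f = changed
                  then (rev prefix, rest)
               else loop (f :: prefix, fs)
   in
      loop ([], files)
   end

val (reuse, redo) =
   splitAtChange (["a.sml", "b.sml", "c.sml", "d.sml"], "b.sml")
(* reuse = ["a.sml"], redo = ["b.sml", "c.sml", "d.sml"] *)
```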

I understand that this approach is not cut-off incremental
recompilation as in the Kit, but it is simpler and handles the cases
we care about.  In fact, it seems like you might be able to use the
same approach in the Kit, sacrificing cut-off incremental
recompilation for elaboration only, but using the results of the
elaboration to do cut-off incremental recompilation for the rest of
the compiler passes.

> I have decided to implement something like the mlb stuff for use with
> the ML Kit. I have now got the serialization of elaboration and
> compiler bases into place (see [2]) and now I need a language for
> specifying dependencies between program units. 

To be clear, do you mean a replacement for PM, something that users
will write?

> The language should satisfy the following properties:

I like your properties, and think that mlb files are mostly there, but
I have a couple of questions.

>  2. The static semantics of the language should be clearly defined
>     (e.g., with inference rules in the style of the definition and
>     scope rules for infix directives).

As to the scope of fixity directives, my thinking is that we should
extend Basis with a component telling the fixity of identifiers.

	Fixity = Infix of int | Infixr of int | Nonfix
	Basis = ... x (Id -> Fixity)

so that one could write the following to build the basis library, and
have the fixities be exported to user code.

	local  
	   build.mlb
	in
	   infixes.sml
	   basis-funs.sml
	   basis-sigs.sml
	   top-level.sml
	   overloads.sml
	end

If we need, we could add a fixity declaration to bdec

	<bdec> ::= ... | fixity <id>*

that would allow one to control what fixities are exported.  But that
seems pretty arcane to me.
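
To make the extra Basis component concrete, here is a small sketch of
the Id -> Fixity map, assuming an association list as the
representation.  The names fixenv, lookup, and plus are mine, chosen for
illustration; identifiers with no entry are treated as Nonfix.

```sml
datatype fixity = Infix of int | Infixr of int | Nonfix

(* Assumed representation of Id -> Fixity: an association list in which
 * earlier entries shadow later ones.
 *)
type fixenv = (string * fixity) list

fun lookup (fe: fixenv, id: string): fixity =
   case List.find (fn (id', _) => id' = id) fe of
      NONE => Nonfix
    | SOME (_, f) => f

(* The fixity part of B + B': bindings in fe' shadow those in fe. *)
fun plus (fe: fixenv, fe': fixenv): fixenv = fe' @ fe

(* What a file like infixes.sml might export. *)
val fromInfixes: fixenv =
   [("+", Infix 6), ("::", Infixr 5), ("o", Infix 3)]

(* User code that later declares "nonfix o". *)
val user: fixenv = plus (fromInfixes, [("o", Nonfix)])
```

With this, lookup (user, "o") gives Nonfix while lookup (user, "::")
still gives Infixr 5.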

>  3. The dynamic semantics of the language should be clearly defined
>     (e.g., with inference rules in the style of the definition and
>     scope rules for infix directives).

Here, we can either define this directly by parroting the static
semantics, or we can define a translation to SML and appeal to its
semantics.

>  4. The language should support binding of bases (e.g., with basis
>     identifiers). In principle, augmenting the PM system with this
>     feature allows for any acyclic dependency graph to be described.

That sounds fine to me.  I didn't have it in my original proposal,
since it seemed like using the file system to name bases was
sufficient.  As you saw, it made it into my second proposal, which
tried to do everything in a file-system independent manner.

BTW, it was never resolved on the MLton list or in my mind which of
those proposals was better (elaboration rules directly in terms of
files or via expansion to some smaller file-system independent
language).  I'd love to see some progress on the issue.

>  5. Support for renaming of structure, signature, and functor
>     identifiers would be great, but not very important.

Right, I view it as a convenience for users, nothing deep.

> With these properties, it would be possible to construct a tool
> (mlbdep) that takes a mlb-file, reads all the .sml-files, and
> constructs a new optimized mlb-file with the same static and dynamic
> semantics...

I don't understand the point of this optimization.  Is it dead-code
elimination?  Something else?

> A few comments to the static semantics of mlb-files as suggested in
> http://www.mlton.org/pipermail/mlton/2003-July/014421.html: Perhaps it
> would be nicer not to modify the static basis of the Definition but
> instead introduce a new basis map (M say) mapping basis identifiers to
> bases.
...
> BTW, it appears that your mlb-semantics allows nested bases, like
> {bid1 -> {bid2 -> B, F, G, E}, F', G', E'}, to implement the
> mlb-file caching performed by the expansion. Was this the intention?

Looking at it now, it looks like a mistake.  The idea is that the
basis identifiers are there only to cache the bases produced by mlb
files.  But this means they all need to be available to any other mlb
file that will reference them.  So, the bid definitions should all
be at the top level.  This means that syntax and elaboration rules
should be changed to not allow nested declarations of bids.  I think
that would also address your first point -- it makes more sense for
there to be a simple basis map mapping basis identifiers to bases,
which do not include basis maps as a component.  Also, the clean rule
should clean out just the usual SML basis, not the basis map.

Here's a quick rewrite along those lines.

	<bdec> ::= clean <bdec> end
	         | functor <fctid> = <fctid>
	         | local <bdec> in <bdec> end
	         | open <bid>
	         | prog <program> end
	         | <bdec> <bdec>

	<bmdec> ::= basis <bid> = bas <bdec> end
	          | <bdec>
	          | <bmdec> <bmdec>
	
	B in Basis = FunEnv x SigEnv x Env
	M in BasisMap = Bid -> Basis


	Judgement: M, B |- <bmdec> --> M', B'

	M, B |- d --> B'
	----------------------------------------------
	M, B |- basis b = bas d end --> [b |-> B'], {}

	M, B |- d --> B'
	-------------------- 
	M, B |- d --> {}, B'

	M, B |- d1 --> M1, B1   M + M1, B + B1 |- d2 --> M2, B2
	-------------------------------------------------------
	M, B |- d1 d2 --> M1 + M2, B1 + B2


	Judgement: M, B |- <bdec> --> B'

	M, {} |- d --> B'
	--------------------------
	M, B |- clean d end --> B'

	-------------------------------------------------
	M, B |- functor F = F' --> [F |-> B(F')] in Basis

	M, B |- d1 --> B1   M, B + B1 |- d2 --> B2
	------------------------------------------
	M, B |- local d1 in d2 end --> B2

	-----------------------
	M, B |- open b --> M(b)

	B |- p => B'
	-------------------------
	M, B |- prog p end --> B'
	
	M, B |- d1 --> B1   M, B + B1 |- d2 --> B2
	------------------------------------------
	M, B |- d1 d2 --> B1 + B2
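
The two judgements can be read as a pair of recursive functions.  The
following is only a sketch under simplifying assumptions: a basis is
modeled as an association list from identifiers to strings rather than
FunEnv x SigEnv x Env, and "B |- p => B'" is stood in for by an
arbitrary function from basis to basis.  All constructor and function
names are made up for the illustration.

```sml
type basis = (string * string) list        (* stand-in for Basis *)
type basismap = (string * basis) list      (* M: Bid -> Basis *)

exception Unbound of string

fun plus (b, b') = b' @ b                  (* B + B': b' shadows b *)

fun lookup (m, k) =
   case List.find (fn (k', _) => k' = k) m of
      NONE => raise Unbound k
    | SOME (_, v) => v

datatype bdec =
   Clean of bdec
 | Functor of string * string
 | Local of bdec * bdec
 | Open of string
 | Prog of basis -> basis                  (* stand-in for B |- p => B' *)
 | Seq of bdec * bdec

datatype bmdec =
   Basis of string * bdec
 | Dec of bdec
 | BmSeq of bmdec * bmdec

(* M, B |- <bdec> --> B' *)
fun elabBdec (m: basismap, b: basis) (d: bdec): basis =
   case d of
      Clean d => elabBdec (m, []) d
    | Functor (f, f') => [(f, lookup (b, f'))]
    | Local (d1, d2) =>
         let
            val b1 = elabBdec (m, b) d1
         in
            elabBdec (m, plus (b, b1)) d2
         end
    | Open bid => lookup (m, bid)
    | Prog p => p b
    | Seq (d1, d2) =>
         let
            val b1 = elabBdec (m, b) d1
            val b2 = elabBdec (m, plus (b, b1)) d2
         in
            plus (b1, b2)
         end

(* M, B |- <bmdec> --> M', B' *)
fun elabBmdec (m: basismap, b: basis) (d: bmdec): basismap * basis =
   case d of
      Basis (bid, d) => ([(bid, elabBdec (m, b) d)], [])
    | Dec d => ([], elabBdec (m, b) d)
    | BmSeq (d1, d2) =>
         let
            val (m1, b1) = elabBmdec (m, b) d1
            val (m2, b2) = elabBmdec (plus (m, m1), plus (b, b1)) d2
         in
            (plus (m1, m2), plus (b1, b2))
         end

(* basis lib = bas prog .. end   open lib   prog .. end *)
val example =
   BmSeq (Basis ("lib", Prog (fn _ => [("x", "val x")])),
          Dec (Seq (Open "lib",
                    Prog (fn b => [("y", "uses " ^ lookup (b, "x"))]))))

val (m', b') = elabBmdec ([], []) example
```

Note that Clean discards B but passes M through unchanged, and that only
the top-level bmdec forms can extend M.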

> Also, it would be nice to have the expansion be part of the
> specification. 

Absolutely.

> It would be great if we could end up agreeing on a language and a
> semantics?

That would be nice.

> [2] Martin Elsman. Type-Specialized Serialization with Sharing. IT
> University of Copenhagen. IT University Technical Report
> Series. TR-2004-43. February, 2004. Available from
> http://www.it.edu/people/mael/mypapers/ITU-TR-2004-43.pdf

Thanks for the link to this.  I took a look.  I did have one idea on
the implementation of type dynamic.  I was thinking it would be
cleaner to separate out the equals and hash functions as components,
rather than have to unify the argument and return types.  Here's
what I mean.

signature DYNAMIC =
   sig
      type t

      val equals: t * t -> bool
      val hash: t -> word
      val new: {equals: 'a * 'a -> bool,
		hash: 'a -> word} -> ('a -> t) * (t -> 'a option)
   end

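structure Dynamic: DYNAMIC =
   struct
      (* A dynamic value carries closures over the hidden cell of the
       * call of "new" that created it, plus a precomputed hash.
       *)
      datatype t = T of {clear: unit -> unit,
                         equals: t -> bool,
                         hash: word,
                         store: unit -> unit}

      local
         fun make f (T r) = f r
      in
         val hash = make #hash
      end

      fun equals (T {equals, ...}, x) = equals x

      fun 'a new {equals, hash} =
         let
            (* Cell private to this call of new; used to pass a value of
             * type 'a out of a t.
             *)
            val r: 'a option ref = ref NONE
            (* Project: store into the cell, read it, then clear it.  A t
             * made by a different call of new stores into its own cell,
             * so r stays NONE.
             *)
            fun dest (T {clear, store, ...}): 'a option =
               let
                  val () = store ()
                  val res = !r
                  val () = clear ()
               in
                  res
               end
            (* Inject: equality first projects the other value and fails
             * if it came from a different call of new.
             *)
            fun make (a: 'a): t =
               let
                  val equals =
                     fn x =>
                     case dest x of
                        NONE => false
                      | SOME a' => equals (a, a')
               in
                  T {clear = fn () => r := NONE,
                     equals = equals,
                     hash = hash a,
                     store = fn () => r := SOME a}
               end
         in
            (make, dest)
         end
   end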