[MLton] SML documentation tools

Ray Racine rracine@adelphia.net
Wed, 20 Jul 2005 22:50:52 -0400


On Wed, 2005-07-20 at 16:16 -0700, Stephen Weeks wrote:
> It occurs to me that a powerful way to build an SML documentation
> system by embedding documentation in SML comments would be to
> integrate with -show-basis, so that the description of a value appears
> just above its type signature.

Java 1.5 added the concept of annotations (source file metadata).  This
concept originally started out as the old JavaDoc system of using
'@'-prefixed keywords, which were processed by the javadoc program to
generate browsable web documentation from source files.

Later a rather clever fellow involved in an open source
implementation started using '@'-prefixed metadata within
comments in source files to do all sorts of things: XDoclet.  It became
a user-customizable preprocessor.  This was used for all kinds of
interesting things that one could do in a preprocessor phase: create
docs, symbol/code xrefs, macro expansions, source code templating, and
other treatments of source code syntax itself.

This was formalized into the language spec itself in Java 1.5.  Recently
I poked around at how the metadata annotation system was designed to
work, and what I had originally noticed as an "uh, that's nice" feature
in the what's-new-in-Java-1.5 doc quickly turned into appreciation of a
really cool concept.  Totally unexpected.  A generic, extensible macro +
preprocessor system.

In a nutshell: there is a concept of source file metadata syntax known
as annotations.  It is a self-referential system.  There are some
predefined annotations (meta-annotations) which are used to define
user-defined annotations.  The predefined annotations are defined in
terms of themselves.  An annotation can be specified as applicable to
certain kinds of source code artifacts such as class, method, even
package.  The compiler and all other tools operating on source code
ignore annotations they do not understand.  There is a standardized
mapping of annotations to Java classes, with interfaces dictated by the
meta-annotations which one uses to define user-defined annotations.
This mapping happens dynamically as the tool processes the source file.

An example of simplistic use of annotations would be for document
generation from source code files.  One would write a utility which
processes the source files and generates documenting artifacts such as
html, or a pdf or whatever.

But it gets better.  Instead of constantly writing a tool for each set
of user defined annotations one can use a generic tool such as apt.
http://java.sun.com/j2se/1.5.0/docs/guide/apt/GettingStarted.html

You can read about it at the link above and google annotations for
further info.  Basically the tool processes source files recursively,
dynamically maps source code annotations to Java ADTs, and invokes
all registered listeners for each annotation.  When the handler is
invoked it receives the annotation metadata AND detailed information
about the source code depending on what source code artifact the
annotation type was defined to be applicable for.  So for example, for
an annotation applicable to a method the registered handler receives a
parsed abstract syntax tree of the method.  The handler can do whatever
it wants with the metadata and the AST including emitting replacement
source syntax, documentation, or even a whole new source file.

As an extra bonus, the compiler can embed metadata into the compiled
code, and there are APIs to access that metadata at runtime so the
program can act on it.

You can do a lot of things including:
 - Emit documentation about source during the compilation phase.
 - Macro expansion.
 - Code templating.  Insert before, after, around code.
 - Runtime tagging.
 - Generation of additional source files.

OK.  What would an analogous system look like for MLton?  One option is
to extend the compiler with source code annotation pre-phases similar
to the Java apt tool.  Another option is along the lines of the cpp
preprocessor approach.  Let's go with a cpp-like utility called spp, as
the dynamic aspects are tougher to address in MLton.

A signature could be defined for an annotation handler.  People write
structure handlers to deal with various annotations which are
contributed and then compiled into the MLton spp tool.  Typical source
code artifacts for annotations in SML would be signature, structure and
fun and val.  Expression level annotations would be interesting.

The spp tool without handlers is a NOP preprocessor pass.  If one added
a documentation annotation handler, the spp tool would call the doc
handler for all documentation annotations; the handler would receive
the AST for the SIG, STRUCT, FUN or VAL and could generate all kinds of
interesting documentation.  Heck, it's got the whole AST to work with.
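To make the handler idea a bit more concrete, here is a rough sketch of
what such a signature and a registered doc handler might look like.
Everything in it is hypothetical: the decl datatype stands in for a real
AST, and the names ANNOTATION_HANDLER, process, DocHandler and
runHandlers are made up for illustration, not anything that exists in
MLton.

```sml
(* Hypothetical stand-in for the AST: only the declaration kinds
   mentioned above, carrying just a name. *)
datatype decl =
    Sig of string
  | Struct of string
  | Fun of string * string list   (* name, parameter names *)
  | Val of string

(* Hypothetical annotation: a tag such as "trace" plus its arguments. *)
type annotation = {tag: string, args: string list}

signature ANNOTATION_HANDLER =
sig
  val tag : string
  (* Returns text to emit (documentation or source), if any. *)
  val process : annotation * decl -> string option
end

structure DocHandler : ANNOTATION_HANDLER =
struct
  val tag = "description"
  fun process ({tag = _, args}, d) =
    let
      val name =
        case d of
          Sig n => "signature " ^ n
        | Struct n => "structure " ^ n
        | Fun (n, _) => "fun " ^ n
        | Val n => "val " ^ n
    in
      SOME (name ^ ": " ^ String.concatWith " " args)
    end
end

type handler = string * (annotation * decl -> string option)

(* Call every handler registered for the annotation's tag.  With an
   empty handler list nothing is emitted, i.e. the pass is a NOP. *)
fun runHandlers (handlers : handler list)
                (a : annotation, d : decl) : string list =
  List.mapPartial
    (fn (t, h) => if t = #tag a then h (a, d) else NONE)
    handlers
```

A real spp would hand the handler the full parsed AST rather than just
a name, which is where the interesting possibilities come from.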

But wait, there's more.

(* @trace (exit, entry, values) *)
fun doit (x,y) = let val z = ... in ... end

An @trace (exit, entry, values) handler would take the AST of the parsed
SML code for the function doit and insert print trace statements a la
"Enter function doit with x = 2, y = 3, z = 5." and "Exit function
doit."
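As a sketch of what the handler might emit, assume purely for
illustration that the elided body is z = x + y with result z * 2 (the
original leaves it unspecified).  The name doit_traced is only there so
both versions can sit side by side; a real handler would presumably
replace doit in place.

```sml
(* Original function, with a hypothetical body filled in. *)
fun doit (x, y) = let val z = x + y in z * 2 end

(* What an @trace (exit, entry, values) handler might generate. *)
fun doit_traced (x, y) =
  let
    val z = x + y
    val () = print ("Enter function doit with x = " ^ Int.toString x
                    ^ ", y = " ^ Int.toString y
                    ^ ", z = " ^ Int.toString z ^ ".\n")
    val result = z * 2
    val () = print "Exit function doit.\n"
  in
    result
  end
```

The traced version computes the same result; only the print side
effects are added.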

(* @function updateit
 * @description updates the global counter reference
 * @trace (mutation) *)
fun updateit (x) = y := x

Annotation @trace (mutation) for a function would have the semantics
"look at the given SML AST and, for all reference mutation operations,
print the old and new value", a la "In updateit reference y was 3 now
6."
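A possible expansion, assuming y is an int ref declared somewhere in
scope (the initial value 3 below is only there to match the sample
message):

```sml
(* Assumed global counter; not shown in the annotated source above. *)
val y = ref 3

(* What an @trace (mutation) handler might generate for updateit:
   capture the old value, perform the assignment, report both. *)
fun updateit x =
  let
    val old = !y
    val () = y := x
  in
    print ("In updateit reference y was " ^ Int.toString old
           ^ " now " ^ Int.toString (!y) ^ ".\n")
  end
```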

(* @show *)
datatype t = S of string | I of int

This annotation would result in auto generating 
fun toString_t x = case x of S s => s | I i => Int.toString i

(* @show
 * @layout *)
datatype t = ...
would generate a MLton layout_t function as well as a toString_t.

(* @functional_update *)
type t = {fname: string, lname: string} 
would auto generate the appropriate functional record update code.
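For instance, the generated code might be one update function per
field.  The names update_fname and update_lname are just a guess at a
naming scheme, not anything settled.

```sml
type t = {fname: string, lname: string}

(* One hypothetical generated function per field: copy the record,
   replacing only the named field. *)
fun update_fname ({fname = _, lname} : t, fname : string) : t =
  {fname = fname, lname = lname}

fun update_lname ({fname, lname = _} : t, lname : string) : t =
  {fname = fname, lname = lname}
```

So update_lname (r, "Weeks") gives back a new record with the same
fname as r and leaves r itself untouched.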

Right, enough rambling.  Stephen was already proposing a fairly
sophisticated system for source code processing to generate
documentation, and that isn't all that far short of a full-bore
annotation system which is powerful enough to be a documentation system,
a macro system, an aspect-oriented programming system, and a trace/debug
system.

Ray