[MLton] Emacs mode for editing ML Basis files

Sun, 7 Aug 2005 07:51:45 +0300

Quoting Stephen Weeks <sweeks@sweeks.com>:
> An indentation case I'm not so sure about -- perhaps it would be
> better to ignore the indentation of comments when deciding how to
> indent subsequent lines?

I agree that it is reasonable to ignore comments when trying to
determine a proper indentation level. The main (only?) exception is
when indenting a comment line:

  (* Comment spanning
   * multiple lines.
   *)

> For example, in the following, I would like z.sml where it is, even
> though the comment is in column zero.
> 
> local
> (* This is a comment. *)
>    z.sml
> in   
> end
> 
> Currently, the indentor moves z.sml to column zero.
>
> It is a separate question if the indentor should move the comment to
> column three if I ask it to indent that line.  I'm not sure what to
> do there (currently, the comment is moved).

I'm not sure whether they are easily separable.

Sidenote: Most Emacs modes (e.g. all Java, C, C++, OCaml, Lisp and Scheme
modes I have used) indent comments to the same level as code. SML mode is
the only mode that (at least on my XEmacs), IMO, fails to indent comments
properly (it usually indents them closer to the end of the previous line
than the beginning of code on the previous line). I personally find the
indentation algorithm of SML mode (at least the versions I've used) to be
broken. The reason why I'm saying this is that we may expect very different
things from the indentation algorithm. What I personally want is fully
automated indentation. I just want to press tab on a line or type M-x
indent-buffer to (re)indent the whole buffer (or anything between those
extremes).

The way I see it, indentation should be the same whether you indent
individual lines or the whole buffer. In other words, indentation should
be consistent. In the above case, IMO, there should be only one "correct"
indentation for the comment. Whether the comment should start at column 0
or column 3 should be determined by a customization setting.

I have been thinking about different ways to indent ML Basis syntax. The
following outlines the most promising approach (in my mind). The basic
approach is to use a number of "reference" keywords {ann, bas, basis,
functor, let, local, open, signature, structure, *} as reference points
for indentation; indentation of subsequent lines is (roughly) based on the
indentation of those keywords. The set of reference keywords to look for
is chosen based on the start of the line to indent. For example, when indenting
a line starting with the keyword `end', only the reference keywords {ann,
bas, let, local} would be used. The indentation algorithm looks for the
closest reference keyword at the same nesting level. In other words,
nested blocks {ann [in] end, bas end, let [in] end, local [in] end, " ",
(* *)} are skipped (using a stack) when looking (scanning backwards) for the
closest reference keyword ("a poor man's parser"). Only keywords are actually
considered while scanning (annotation strings, id's and paths are skipped).
Once the reference keyword is found, the indentation level (for the line
being indented) is determined by the column of the reference keyword, the
tokens and their columns at the beginning and at the end of the line of the
reference keyword, and the first token on the line being indented.
Customization would be based on allowing the user to essentially give a
formula (or rule) for the indentation in terms of those "features".
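
To make the backwards scan concrete, here is a rough sketch in Python (not
the actual mode code, which would be Emacs Lisp). The names TOKEN, OPENERS,
tokenize and find_reference are made up for illustration, a depth counter
stands in for the stack, and nested comments are not handled:

```python
import re

# Comments and strings are matched first so that keywords inside them are
# never seen as tokens. Assumption: comments are not nested.
TOKEN = re.compile(r'\(\*.*?\*\)|"(?:\\.|[^"\\])*"|(?:(?!\(\*)[^\s"])+', re.S)

# Keywords that open a block closed by `end` (hypothetical name).
OPENERS = {"ann", "bas", "let", "local"}


def tokenize(text):
    """Yield (line, column, word) for every token outside comments/strings."""
    for m in TOKEN.finditer(text):
        word = m.group()
        if word.startswith("(*") or word.startswith('"'):
            continue  # skip comments and annotation strings entirely
        start = m.start()
        line = text.count("\n", 0, start)
        col = start - (text.rfind("\n", 0, start) + 1)
        yield line, col, word


def find_reference(text, line_no, wanted):
    """Scan backwards from line `line_no` (0-based) and return the closest
    (line, column, keyword) whose keyword is in `wanted` and that is not
    inside an already closed nested block, or None if there is none."""
    before = [t for t in tokenize(text) if t[0] < line_no]
    depth = 0  # number of nested blocks we are currently skipping
    for line, col, word in reversed(before):
        if word == "end":
            depth += 1            # entering a nested block from its end
        elif word in OPENERS and depth > 0:
            depth -= 1            # left the nested block through its opener
        elif depth == 0 and word in wanted:
            return line, col, word
    return None


SAMPLE = """local
   local
      a.sml
   in
      b.sml
   end
   (* local in a comment *)
   c.sml
in
   d.sml
end
"""

# Reference for the outer `end` (line 10) and the inner `end` (line 5).
outer = find_reference(SAMPLE, 10, OPENERS)
inner = find_reference(SAMPLE, 5, OPENERS)
```

For the sample, the outer `end' finds the `local' in column 0 (the nested
block and the comment are skipped), and the inner `end' finds the `local' in
column 3.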

Consider how to indent the line containing `end' in the following cases
that are supposed to be indented as the user prefers according to her
customization settings:

   basis B = bas
                ...
             end

   basis B = bas
         ...
      end

   basis B = bas
      ...
   end

   basis B =
      bas
         ...
      end

The features gathered by the outlined indentation algorithm should be sufficient
to produce each of the above styles. Note that the first three styles are in
conflict (the middle two less so).
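
As a rough illustration of what such a "formula" could look like, here is a
sketch in Python. The Features record and the three rule functions are
hypothetical names of my own, and the step of 3 columns is simply read off
the examples above; a real customization mechanism might look quite
different:

```python
from dataclasses import dataclass


@dataclass
class Features:
    ref_col: int    # column of the reference keyword (here: `bas`)
    first_tok: str  # first token on the reference keyword's line
    first_col: int  # ... and its column
    last_tok: str   # last token on the reference keyword's line
    last_col: int   # ... and its column
    cur_tok: str    # first token on the line being indented


def align_with_keyword(f):
    """First style: `end` goes under the reference keyword itself."""
    return f.ref_col


def offset_from_line_start(f, step=3):
    """Second style: one step in from the start of the keyword's line,
    unless the keyword already starts that line."""
    return f.ref_col if f.ref_col == f.first_col else f.first_col + step


def align_with_line_start(f):
    """Third style: `end` goes under the first token of the keyword's line."""
    return f.first_col


# "   basis B = bas": `basis` in column 3, `bas` in column 13.
same_line = Features(13, "basis", 3, "bas", 13, "end")
# "      bas" on a line of its own: `bas` in column 6.
own_line = Features(6, "bas", 6, "bas", 6, "end")

columns = [rule(same_line) for rule in
           (align_with_keyword, offset_from_line_start, align_with_line_start)]
```

With `bas' on the same line as `basis', the three rules give columns 13, 6
and 3 for `end', matching the first three examples. With `bas' on a line of
its own, all three give column 6, as in the fourth example.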

-Vesa Karvonen