This page contains brief explanations of some recurring sources of confusion and problems that SML newbies encounter.
Many confusions about the syntax of SML seem to arise from the use of an interactive REPL (Read-Eval-Print Loop) while trying to learn the basics of the language. While writing your first SML programs, you should keep the source code of your programs in a form that an SML compiler accepts as a whole.
The and keyword
It is a common mistake to misuse the and keyword or to not know how to introduce mutually recursive definitions. The purpose of the and keyword is to introduce mutually recursive definitions of functions and datatypes. For example,
fun isEven 0w0 = true
  | isEven 0w1 = false
  | isEven n = isOdd (n-0w1)
and isOdd 0w0 = false
  | isOdd 0w1 = true
  | isOdd n = isEven (n-0w1)

datatype decl = VAL of id * pat * expr (* | ... *)
     and expr = LET of decl * expr (* | ... *)
You can also use and as a shorthand in a couple of other places, but it is not necessary.
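As a small sketch of the shorthand use, the following combines independent bindings with and. The names width, height, point, and name are made up for illustration only.

```sml
(* Two value bindings combined with and ... *)
val width = 3 and height = 4

(* ... and the same bindings written as separate declarations. *)
val width2 = 3
val height2 = 4

(* The same shorthand works for type abbreviations. *)
type point = int * int and name = string
```

Since none of these bindings refers to another, splitting them into separate declarations changes nothing.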
It is a common mistake to forget to parenthesize constructed patterns in fun bindings. Consider the following invalid definition:
fun length nil = 0 | length h :: t = 1 + length t
The pattern h :: t needs to be parenthesized:
fun length nil = 0 | length (h :: t) = 1 + length t
The parentheses are needed, because a fun definition may have multiple consecutive constructed patterns through currying.
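To illustrate, here is a sketch of a curried function with two constructed patterns in a row. The function zipWith is only an example and is not taken from the Basis Library.

```sml
(* Each clause takes three curried arguments; the two list
   patterns must be parenthesized individually. *)
fun zipWith f (x :: xs) (y :: ys) = f (x, y) :: zipWith f xs ys
  | zipWith _ _ _ = []
```

Without the parentheses there would be no way to tell where one argument pattern ends and the next begins.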
The same applies to nonfix constructors. For example, the parentheses in
fun valOf NONE = raise Option | valOf (SOME x) = x
are required. However, the outermost constructed pattern in a fn or case expression need not be parenthesized, because in those cases there is always just one constructed pattern. So, both of the following definitions are fine:
val valOf = fn NONE => raise Option | SOME x => x
fun valOf x = case x of NONE => raise Option | SOME x => x
Declarations and expressions
It is a common mistake to confuse expressions and declarations. Normally an SML source file should only contain declarations. The following are declarations:
datatype dt = ...
fun f ... = ...
functor Fn (...) = ...
infix ...
infixr ...
local ... in ... end
nonfix ...
open ...
signature SIG = ...
structure Struct = ...
type t = ...
val v = ...
Note that let ... in ... end is an expression, not a declaration.
To specify a side-effecting computation in a source file, you can write:
val () = ...
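For instance, a minimal sketch of a source file that performs side effects using only declarations might look like this. The counter ref is a made-up name used just to have something to update.

```sml
val counter = ref 0

(* Each side-effecting step is wrapped in a val () = ... declaration. *)
val () = counter := !counter + 1
val () = print ("counter = " ^ Int.toString (!counter) ^ "\n")
```

The unit pattern on the left-hand side also lets the type checker confirm that the expression returns unit.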
It is a common mistake to write nested case expressions without the necessary parentheses. See UnresolvedBugs for a discussion.
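As a sketch of the problem, consider the following function, where the datatype shape and the function describe are hypothetical names chosen for this example. The inner case is parenthesized so that the final rule belongs to the outer case.

```sml
datatype shape = CIRCLE | SQUARE

fun describe (s, filled) =
   case s of
      CIRCLE => (case filled of
                    true => "filled circle"
                  | false => "hollow circle")
    | SQUARE => "square"
```

Without the parentheses, the SQUARE rule would be parsed as a third rule of the inner case, and the definition would be rejected because SQUARE is not a bool pattern.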
It used to be a common mistake to parenthesize op * as (op *). Before SML’97, *) was considered a comment terminator in SML and caused a syntax error. At the time of writing, SML/NJ still rejects the code. An extra space may be used for portability: (op * ). However, parenthesizing op is redundant, even though it is a widely used convention.
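A short sketch of both spellings, using made-up names product and product2:

```sml
(* Parenthesized, with a space before the closing parenthesis. *)
val product = foldl (op * ) 1 [1, 2, 3, 4]

(* Without the redundant parentheses. *)
val product2 = foldl op * 1 [1, 2, 3, 4]
```

Both declarations pass the multiplication function to foldl in the same way.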
A number of standard operators (+, -, ~, *, <, >, …) and numeric constants are overloaded for some of the numeric types (int, real, word). It is a common surprise that definitions using overloaded operators such as
fun min (x, y) = if y < x then y else x
are not overloaded themselves. SML doesn’t really support (user-defined) overloading or other forms of ad hoc polymorphism. In cases such as the above where the context doesn’t resolve the overloading, expressions using overloaded operators or constants get assigned a default type. The above definition gets the type
val min : int * int -> int
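One way to get a version at another numeric type is to add a type annotation, which gives the context needed to resolve the overloading. In the sketch below, minReal is a hypothetical name introduced for this example.

```sml
(* No annotation: the overloaded < defaults to int. *)
fun min (x, y) = if y < x then y else x

(* Annotating one argument resolves < at type real instead. *)
fun minReal (x : real, y) = if y < x then y else x
```

Each definition still has a single fixed type, so a separate function is needed for every numeric type of interest.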
It is a common mistake to use redundant semicolons in SML code. This is probably caused by the fact that in an SML REPL, a semicolon (and enter) is used to signal the REPL that it should evaluate the preceding chunk of code as a unit. In SML source files, semicolons are really needed in only two places, namely in expressions of the following two forms:
(exp ; ... ; exp)
let ... in exp ; ... ; exp end
Note that semicolons act as expression (or declaration) separators rather than as terminators.
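The two forms can be sketched as follows, with total and three as made-up names. Note that no semicolon follows the last expression of a sequence or any of the top-level declarations.

```sml
val total = ref 0

(* Parenthesized sequence: its value is that of the last expression. *)
val three = (total := 1; total := !total + 2; !total)

(* Sequence in the body of a let expression. *)
val () =
   let
      val msg = "total = " ^ Int.toString (!total) ^ "\n"
   in
      print msg;
      total := 0
   end
```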
Type Variable Scope