[MLton] GNU configure and autoconf.

Jesper Louis Andersen jlouis@mongers.org
Tue, 11 Jan 2005 17:03:22 +0100


Quoting Stephen Weeks (sweeks@sweeks.com):

> That would be nice.  Not knowing much about them, I simply observe
> that we seem to have done well without them, even on a range of
> platforms, and it is usually best to avoid dependence on a new tool
> when possible.

Recently I recommended staying away from the GNU autotools for MLton.
I promised to write up an explanation of why. First, though, I would
like to give a short overview of what the GNU autotools are and what
they do in a project.

We are more or less accustomed to running ./configure these days when
installing software from source. configure is a shell script that
gathers information about the build environment. It checks for a
number of things:

* Header file availability.
* Library availability (by trying to compile and link small C programs).
* Program availability.
* C library function availability (the strn* and strl* families come
  to mind).

Once the tests have run, the output is a config.h file containing a
#define for each feature found, e.g. #define HAVE_STRLCPY 1,
#define HAVE_LIBGMP 1, etc. The output also includes Makefiles,
generated from Makefile.in templates. These set the correct C/Fortran
compiler and the right CFLAGS, LDFLAGS, etc., so that the program can
be compiled on the platform. The templates are also filled in with
things such as the program's install paths. This lets you do things
like:

./configure --prefix=/some/build/path
gmake install prefix=/some/other/installation/path

which greatly helps various archaic software installation systems
such as stow, depot or graft.
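As a rough illustration of the config.h side of this, here is a minimal C sketch of how a source file typically consumes a HAVE_* result and falls back to its own code when the function is missing. It assumes the build passes -DHAVE_CONFIG_H when a config.h was generated, as autoconf-generated builds conventionally do. The wrapper name copy_bounded is hypothetical.

```c
#ifdef HAVE_CONFIG_H
#include "config.h"   /* may contain: #define HAVE_STRLCPY 1 */
#endif

#include <stddef.h>
#include <string.h>

/* copy_bounded: hypothetical wrapper, for illustration only.
 * Follows the strlcpy contract: copies at most size-1 bytes of src
 * into dst, NUL-terminates when size > 0, and returns strlen(src)
 * so the caller can detect truncation. */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
#ifdef HAVE_STRLCPY
    /* configure found strlcpy in the C library: use it. */
    return strlcpy(dst, src, size);
#else
    /* Not found (or no config.h): portable fallback. */
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
#endif
}
```

The point of the pattern is that the probing happens once, in configure, and the C code only tests a macro.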

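In principle, there is no limit on the kinds of files one could
generate from templates. A mltonconfig.sml.in would be perfectly
doable, but one must be aware that config.h.in is special: it contains
lines of the form #undef HAVE_STRLCPY, #undef uid_t, #undef pid_t,
etc., which configure then rewrites. A HAVE_* symbol gets defined if
the header file, library or function is available, whereas a type name
such as pid_t gets defined (to a fallback such as int) only if the type
is missing.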
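The rewrite applied to config.h.in can be pictured as a per-line substitution. The following is a simplified, hypothetical C helper, not autoconf's actual implementation. It only handles HAVE_*-style symbols, where "found" means "define it to 1", and it comments out the #undef line otherwise, which is how undefined symbols typically appear in a generated config.h.

```c
#include <stdio.h>
#include <string.h>

/* subst_line: hypothetical helper, for illustration only.
 * Rewrites one config.h.in template line (without trailing newline):
 *   "#undef NAME", found     -> "#define NAME 1"
 *   "#undef NAME", not found -> "/" "* #undef NAME *" "/"
 *   any other line           -> copied unchanged
 * The result is truncated to fit in out[outsize]. */
void subst_line(const char *line, int found, char *out, size_t outsize)
{
    const char *prefix = "#undef ";
    size_t plen = strlen(prefix);

    if (strncmp(line, prefix, plen) != 0)
        snprintf(out, outsize, "%s", line);
    else if (found)
        snprintf(out, outsize, "#define %s 1", line + plen);
    else
        snprintf(out, outsize, "/* #undef %s */", line + plen);
}
```

Type templates such as #undef pid_t would need the opposite rule (define only when missing), which this sketch deliberately leaves out.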

---

The tool for building the configure shell script is called autoconf.
Autoconf is a collection of files written in the M4 macro language and
Perl. The input to autoconf is an M4 file, usually called configure.ac
(configure.in in older projects). The file describes, as M4 macro
calls, what to check for and what the program needs. AC_PROG_CC, for
instance, searches for a working C compiler, and AC_ISC_POSIX handles
POSIX support on INTERACTIVE UNIX (ISC) systems. More specialised is
AC_CHECK_FUNCS(getaddrinfo), which checks for one specific function and
defines HAVE_GETADDRINFO if it is found.
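The result macros follow a mechanical naming rule, which a few lines of C can sketch. Prefix HAVE_, upper-case the letters, keep the digits, and replace every other character with an underscore. So AC_CHECK_FUNCS(getaddrinfo) yields HAVE_GETADDRINFO and AC_CHECK_HEADERS(sys/time.h) yields HAVE_SYS_TIME_H. This is a simplification of autoconf's rule, which has a few extra cases (for example '*', as in SIZEOF_CHAR_P). The name have_macro is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* have_macro: hypothetical helper, for illustration only.
 * Writes "HAVE_" followed by name, with letters upper-cased and every
 * non-alphanumeric character replaced by '_'. Output is truncated to
 * fit and is always NUL-terminated when outsize > 0. */
void have_macro(const char *name, char *out, size_t outsize)
{
    static const char prefix[] = "HAVE_";
    size_t i;

    if (outsize == 0)
        return;
    i = strlen(prefix);
    if (i > outsize - 1)
        i = outsize - 1;
    memcpy(out, prefix, i);
    for (; *name != '\0' && i + 1 < outsize; name++) {
        unsigned char c = (unsigned char)*name;
        out[i++] = isalnum(c) ? (char)toupper(c) : '_';
    }
    out[i] = '\0';
}
```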

Note that you only need autoconf to build the configure script; you do
not need it to run the configure script. This was obviously done so
that people building the software avoid an odd extra dependency. You
still need autoconf whenever configure.ac changes and configure has to
be regenerated, though.

---

When writing makefiles, one soon realises that the process involves a
great deal of redundancy. That is why automake was written: it
generates makefile _templates_ (Makefile.in) from a simple description
(Makefile.am). For instance, look at the sizes of the Makefiles in a
little project of mine to see the size explosion:

mushroom$ ls -la Makefile*
-rw-------  1 jlouis  users  133117 Nov 23 20:06 Makefile
-rw-------  1 jlouis  users    5306 Nov 23 20:06 Makefile.am
-rw-------  1 jlouis  users  159585 Nov 23 20:06 Makefile.in
mushroom$ 

Automake knows how to build C files, run lex and yacc, build Fortran
files and so on.

---

You do not have to use automake and autoconf together.

---

So, one may ask, why don't I like the autotools? Keep in mind that
this is a subjective judgement, but I will try to back up each claim
with an argument:

* Autoconf, and automake in particular, are not general enough.
  Both assume that one builds software in Fortran, C or C++. While you
  can teach the tools about other languages, doing so can take a good
  deal of time.
* The interface changes rapidly.
  This is obviously bad. One cannot process a configure.ac written for
  autoconf 2.59 with autoconf 2.13. The solution in various packaging
  systems has been to install several versions of autoconf and use the
  appropriate one for each package. The rapid change also means that
  relying on the low-level internals of the macros is a no-no (strictly
  speaking this is not a flaw, since those internals are not part of
  the API, but if MLton ever needed them, we might have a problem).
* The autoconf system supports various old architectures that we will
  never want to support; Ultrix and similar Unix systems come to mind.
  The problem is that the extra checks for them slow configure down
  considerably.
* autoconf and configure are slow. This is a problem in itself. In
  NetBSD we have regression problems with old 35 MHz SPARCs taking
  more than 5 minutes to run a single ./configure.
* Probed values are not shared between the configure scripts of
  different applications, so every package repeats the same checks.
* The documentation is not sparse, but it is bad.
  The documentation of the autoconf system is spread over some 50 GNU
  info files, and you have to read them to understand autoconf. As a
  result, most people just copy other people's existing scripts rather
  than write their own.

---

Automake is definitely worse than autoconf, and is possibly entirely
unusable for MLton. Autoconf might be nice to have for the C parts, but
you should set aside some time for the migration, because getting
everything into shape might be a rough ride.

I remember John H. Reppy explaining the original SML/NJ structure: it
is two-tiered, in that a system is a set of features, and the features
implement what we need. I can see that this is what MLton does these
days. Given the relatively few architectures we support, and the
strong link between the architecture and the compiler, I do not at
this time see a reason to change.
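One way to picture the two-tiered structure is as a hand-maintained table rather than build-time probing. Each known system maps to a fixed set of features, and code asks whether a system provides the features it needs. This is a minimal C sketch under that reading. The system names, feature flags and function name are all hypothetical, not SML/NJ's or MLton's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature flags, for illustration only. */
enum {
    FEAT_ANON_MMAP   = 1 << 0,
    FEAT_SIGALTSTACK = 1 << 1,
    FEAT_PROC_FS     = 1 << 2
};

/* Tier 1: a system is nothing more than a named set of features.
 * The system names and their feature sets are made up. */
struct system_desc {
    const char *name;
    unsigned features;
};

static const struct system_desc systems[] = {
    { "sysA", FEAT_ANON_MMAP | FEAT_SIGALTSTACK | FEAT_PROC_FS },
    { "sysB", FEAT_ANON_MMAP },
};

/* Tier 2: code asks for features instead of probing the machine.
 * Returns 1 only if the system is known and provides every feature
 * in `wanted`; unknown systems provide nothing. */
int system_has(const char *name, unsigned wanted)
{
    size_t i;
    for (i = 0; i < sizeof systems / sizeof systems[0]; i++)
        if (strcmp(systems[i].name, name) == 0)
            return (systems[i].features & wanted) == wanted;
    return 0;
}
```

The trade-off is the one discussed here. Adding a system means editing the table by hand, which is cheap with few architectures and becomes costly as their number grows.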

On the other hand, if we plan on cranking up the number of supported
architectures, it might be nice to have a general way of handling the
build system. Autoconf is bad, but there is no real alternative.
The question is the development time involved in migrating versus the
development time needed in the long run.

-- 
jlouis