measurements of ML programs

Stephen Weeks MLton@sourcelight.com
Wed, 11 Oct 2000 09:53:57 -0700 (PDT)


> I am considering doing measurements of ML programs with the intention
> of informing compiler writers. Here's a list of things that come to
> mind as potentially useful.
> 
> Static counts
> 
> 1. polymorphic vs monomorphic function definitions
> 2. number of free variables in functions
> 3. applications that take function arguments
> 4. applications that return a function as a result
> 5. calls to labels vs. calls to expressions
> 6. total applications
> 7. exceptions raised
> 8. how many comparisons are made in a pattern match (depth of decision tree)
> 
> Dynamic counts
> 
>  9. exception extents (number of dynamic calls from raise to handle)
> 10. calls to polymorphic/monomorphic functions
> 11. dynamic counts of 3-8 above
> 
> For example, the measurements could reveal that, say, 99% of functions
> have < 2 free variables.
> 
> I'd be interested in knowing how relevant the items in the list above
> are to implementors. Are there any other items that would guide your
> implementation if you had a good estimate of how often they occur in
> ML programs? Are any of the ones above not really interesting?

All of the numbers that you mention above are interesting.  However, there is a
big caveat.  If you make these measurements on the source language, there will
be little connection between your numbers and what goes on in an optimizing
compiler.  The problem is that a lot of rewriting goes on inside an optimizing
compiler, and this rewriting can drastically change the numbers.  For example,
inlining might completely remove a polymorphic higher order function, like
compose, thus making its contribution to 1, 2, 3, 4, 5, and 6 above misleading.
Similarly, an exception handling optimization that turns raises into direct
jumps might change numbers 7 and 9.
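As a sketch of how inlining can erase a polymorphic higher-order function
entirely (this is an illustration of the point above, not MLton's actual
intermediate code):

```sml
(* Before inlining: compose is polymorphic and higher order, so it
   contributes to static counts 1-6. *)
fun compose (f, g) = fn x => f (g x)
fun add1 x = x + 1
fun double x = x * 2
val h = compose (add1, double)
val n = h 3

(* After inlining and simplification, a compiler might be left with only
   first-order, monomorphic, known calls; compose and its higher-order
   applications have vanished from the program being measured. *)
val n' = add1 (double 3)
```

A source-level measurement would count compose's definition and calls, even
though the optimized program contains no trace of them.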

If your goal is to provide information so that compiler writers can produce
better compilers, I've found that a better approach is to look at the generated
code or some compiler intermediate language and to try to see how the code could
be improved.  For any nontrivial program, there are almost always obvious
inefficiencies to be found.  From these, you can figure out which
optimizations need to be improved or whether new optimizations need to be written.
Obviously, generalizing across compilers is much harder with this approach.

The differences between polymorphic and monomorphic functions, and between
calls to labels (known calls) and calls to expressions (unknown calls), are
areas where the MLton developers have particular experience.  The MLton
compilation strategy centers on removing polymorphic calls and unknown calls,
and in our experience these can be removed to a very high degree.  Here are
a couple of documents that contain some numbers similar to the ones you are
thinking of gathering.  In both of these documents and in all of the
measurements we make internally, we never measure source programs, we always
measure some compiler intermediate language.  As you can see from some of the
numbers, which intermediate language you choose can make a big difference.
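For instance, monomorphisation of the kind measured in the documents above
turns polymorphic definitions and unknown calls in the source into monomorphic,
known calls in the intermediate language. A hypothetical before/after sketch
(again, not MLton's actual intermediate language):

```sml
(* Source form: id is polymorphic, and the call f x inside apply is an
   unknown call through a variable. *)
fun apply (f, x) = f x
fun id y = y
val a = apply (id, 1)
val b = apply (id, "one")

(* After monomorphisation and call-flow analysis, an intermediate language
   might contain one copy of id per type, with every call a known,
   monomorphic call: *)
fun id_int (y : int) = y
fun id_string (y : string) = y
val a' = id_int 1
val b' = id_string "one"
```

Measured on the source, this program has one polymorphic definition and
unknown calls; measured on the intermediate language, it has neither, which is
why the choice of where to measure matters so much.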

http://www.star-lab.com/sweeks/papers/00-esop.ps
p 13	dynamic counts for known vs unknown calls

http://www.star-lab.com/sweeks/papers/98-princeton.ps
p 20	syntax tree sizes before and after monomorphisation
p 25	static counts for known vs unknown calls
p 29	dynamic counts for known vs unknown calls

Good luck, and feel free to send us any more specific questions you may have
about MLton.