measurements of ML programs

Fermin Reig reig@ropas.kaist.ac.kr
Fri, 13 Oct 2000 15:44:23 +0900 (KST)


>> Static counts
>> 
>> 1. polymorphic vs monomorphic function definitions
>> 2. number of free variables in functions
>> 3. applications that take function arguments
>> 4. applications that return a function as a result
>> 5. calls to labels vs. calls to expressions
>> 6. total applications
>> 7. exceptions raised
>> 8. how many comparisons are made in a pattern match (depth of decision tree)
>> 
>> Dynamic counts
>> 
>>  9. exception extents (number of dynamic calls from raise to handle)
>> 10. calls to polymorphic/monomorphic functions
>> 11. dynamic counts of 3-8 above

Hello Stephen,

Thanks for replying.

>All of the numbers that you mention above are interesting.  However,
>there is a big caveat.  If you make these measurements on the source
>language, there will be little connection between your numbers and
>what goes on in an optimizing compiler.

Right. I've done a few measurements of OCaml programs. I examine the
typed AST, or the lambda language. These are very close to the
source. The idea is not to measure what happens during execution. For
that, it would be best to examine the last IL before code generation,
or, better, the assembly language. But that would measure a
combination of the original ML program plus whatever a particular
compiler did to that program. For example, a CPS-based compiler would
emit quite different code from a direct-style compiler for the same
program.
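
To make this concrete, here is a minimal sketch of the kind of
traversal I have in mind. The expression type below is a toy, not the
compiler's actual Typedtree or Lambda, but the counting idea is the
same: a single walk that tallies total applications (6 above) and
raises (7 above).

    (* Toy stand-in for the real typed AST. *)
    type expr =
      | Var of string
      | Lam of string * expr              (* fun x -> e *)
      | App of expr * expr                (* e1 e2 *)
      | Raise of expr                     (* raise e *)

    (* Returns (number of applications, number of raises). *)
    let rec count e =
      match e with
      | Var _ -> (0, 0)
      | Lam (_, body) -> count body
      | App (f, arg) ->
          let (a1, r1) = count f and (a2, r2) = count arg in
          (1 + a1 + a2, r1 + r2)
      | Raise e1 ->
          let (a, r) = count e1 in
          (a, 1 + r)

For instance, count (App (Lam ("x", Var "x"), Raise (Var "e")))
evaluates to (1, 1). The real traversal would also consult the
inferred types, e.g. to classify definitions as polymorphic or
monomorphic (1 above).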

>The problem is that a lot of rewriting goes on inside an optimizing
>compiler, and this rewriting can drastically change the numbers.  For
>example, inlining might completely remove a polymorphic higher order
>function, like compose, thus making its contribution to 1, 2, 3, 4,
>5, and 6 above misleading.  Similarly, an exception handling
>optimization that turns raises into direct jumps might change numbers
>7 and 9.
>
>If your goal is to provide information so that compiler writers can
>produce better compilers, ...

Yes, that is the goal, and that's why I want to measure source-level
constructs.

> ...I've found that a better approach is to look at the generated
>code or some compiler intermediate language and to try to see how the
>code could be improved.

I guess that this is what most people do in practice. I find it
interesting that you say that looking at the generated code is not
only sufficient in practice but actually a better approach. This is
mainly why I am asking compiler implementors: it would be wasteful to
make all these measurements if nobody really took advantage of them.

>For any nontrivial program, there are almost always obvious
>inefficiencies that can be found.  From these, you can figure out what
>optimizations need to be improved or if new optimizations need to be written.
>Obviously, generalizing across compilers is much harder with this approach.

True. And I wonder if the general approach can suffer from the same
problem. Say I want to measure polymorphic functions to see whether
specialization could be worthwhile. If, after a certain compiler pass
(say inlining), the numbers change substantially, then the original
numbers may not be that useful anymore. In particular, compiler A may
do more aggressive inlining than compiler B, and the number of
remaining polymorphic functions will be quite different in each case.
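
Sticking with your compose example, here is a rough sketch of what
inlining does to the counts (the definitions are made up, of course):

    (* Before inlining: one polymorphic, higher-order definition,
       contributing to static counts 1-6 above. *)
    let compose f g x = f (g x)
    let add2 = compose (fun x -> x + 1) (fun x -> x + 1)

    (* After aggressive inlining, compose disappears and only a
       monomorphic residue remains, so a count of polymorphic
       functions taken before and after that pass would disagree. *)
    let add2' x = (x + 1) + 1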

So given these caveats, what do you think about going ahead with the
measurements?

Regards,

Fermin