[MLton] genetic algorithms for SSA optimization phase ordering

Matthew Fluet fluet@cs.cornell.edu
Mon, 24 Nov 2003 12:05:52 -0500 (EST)


A recent article on using genetic algorithms to optimize gcc flags
(http://www.coyotegulch.com/acovea/index.html) reminded me of Steve's PLDI
trip report, where he mentioned:

> Mark Stephenson of MIT gave a talk, "Meta Optimization: Improving Compiler
> Heuristics with Machine Learning", which I did not see, but I talked
> with him later and he recommended genetic programming for tackling the
> phase ordering problem we have in the MLton optimizer.  He also gave a
> pointer to Cooper's work on the subject ("Optimizing for Reduced Code
> Space using Genetic Algorithms").

I looked up the paper by Cooper et. al.
(http://citeseer.nj.nec.com/cooper99optimizing.html) and thought it would
be fairly easy to adapt to MLton.

First step was to add a -ssa-passes flag, which allows setting the SSA
optimization passes from the command line.  The notation is a ":"
delimited list of passes; so the first few passes from the default SSA
optimization sequence can be given like:
 -ssa-passes "removeUnused:inlineLeaf(20):contify:localFlatten"

Next step was to write a little GA program, which wasn't too difficult,
although there are a lot of tweakable parameters that I'd like to hear
thoughts on the right settings.

I followed Cooper's work pretty closely

A chromosome is a vector of SSA optimization passes.  There are 17
parameterless optimization passes.  The inline pass can be invoked as
NonRecursive, LeafNoLoop, or Leaf, each of which takes some size
parameters.  We know that the shrinker is not idempotent, so I made a
stand-alone invocation of the shrinker an SSA pass.  Finally, in order to
keep the chromosome vectors of constant length (this seems like the right
thing for cross-over mutations), there is a NOP pass.  However,
chromosomes are considered equivalent if they have the same sequence of
non-NOP passes.

Each chromosome is assigned a  real option  fitness value.  A NONE value
means that the chromosome failed in some way; in particular, sequences of
options that lead to compiler errors are given fitness values of NONE.
Some compiler errors are o.k. (e.g., leaving out the polyEqual pass),
while others may indicate bugs.

A population is a set of 20 distinct chromosomes.  The evolutionary step
takes a population to a new population, and proceeds in the following
manner:
 - eliminate all NONE valued fitness chromosomes
 - order the remaining chromosomes (survivors) by fitness
 - divide the survivors into an upper (better) half and a lower (worse) half
 - eliminate the worst chromosome and three additional random chromosomes
     from the lower half.
 - randomly choose two pairs of chromosomes to cross-over; to cross-over,
     simply combine the first half of the first chromosome with the second
     half of the second chromosome, and vice versa.  This leads to 4 new
     chromosomes to replace those eliminated
 - subject all but the best chromosome in the upper half to random
     mutation of 0.05; subject all the chromosomes in the lower half to
     random mutation of 0.10.  A random mutation of p simply maps over
     the chromosome, replacing each pass with a random one with
     probability p.
 - eliminate any duplicate (up to equivalence) chromosomes; bring the
     population back up to 20 by random mutations of 0.05.

A key concern is how to design the fitness function.  Here's what I've
done, basically using the benchmarking infrastructure.  We have three
quantitative values -- compile time (C), size (S), and run time (R), each
assigned a weight CW, SW, RW.  We run the default ssa passes on each of
the benchmarks to determine a baseline CB(bench), SB(bench), and RB(bench)
for each benchmark.  Fitness is defined with respect to this baseline.
We further give a weight to each benchmark W(bench).  Each test chromosome
T is executed on each of the benchmarks to determine CT(bench), SB(bench),
and RB(bench).  The fitness of a test chromosome is given by:

fitness(T) =
  let
    t = Sum_{b in Bench} W(b)
    c = (Sum_{b in Bench} (CT(b) / CB(b)) * W(b)) / t
    s = (Sum_{b in Bench} (ST(b) / SB(b)) * W(b)) / t
    r = (Sum_{b in Bench} (ST(b) / SB(b)) * W(b)) / t
  in
    ((c * CW) + (s * SW) + (r * RW)) / (CW + SW + RW)
  end

The various weights allow one to tradeoff improvements in runtime to size
and compile time increases, and to set more important benchmarks.

Note that the baseline chromosome has fitness 1.0 and the lower the
fitness, the better.

Any suggestions for how to set the weights; or a different fitness
function altogether?

Notice also that fitness shouldn't change between runs; so we keep a
hast-table of chromosomes to fitness, and only run benchmarks whe looking
up a new chromosome.

In any event, no results, other than finding a long standing bug in the
simplifyTypes pass, which was revealed pretty quickly when the ssa passes
were randomized.