local refs

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 11 Dec 2001 12:51:45 -0500 (EST)


In an effort to understand why localRef is such a slow pass, I made the
restore and shrink passes Control.traceBatch to sum their times over all
applications:

            localRef starting
               multi starting
               multi finished in 0.70 + 1.99 (74% GC)
               restore totals 2.98 + 3.30 (53% GC)
               shrink totals 0.43 + 0.0 (0.0% GC)
            localRef finished in 5.47 + 5.29 (49% GC)

multi looks o.k., I think.  Although, it should be pointed out that a
self-compile isn't really the best test of multi; the fact that main has a
use of Thread_copyCurrent means that every other function that it calls
will be marked funcIsMultiThreaded; the fact that we process the scc
components of the whole-program call graph in topologically sorted order
means that we process main first, and then almost every other function we
process is already set funcIsMultiThreaded, so there is no need to compute
it's local control-flow graph and do scc components to look for multiUsed
labels; instead, we just do an unorded walk over the function and set
everything to multiThreaded and multiUsed.

shrinker looks fine; quite a testament to Steve, because localRef can
sometimes reveal significant opportunities for shrinking (as in the
random/fib/tak program).

That leaves restore and localRef itself.  Obviously, from the timings
above, restore is way too slow.  The restore pass has some Control.trace's
setup, but they're triggered on every application of restore to a
function; i.e., it'll take some work to get useful timings over all
applications of restore.

Even so, localRef itself could probably be sped up a little, because it's
time is 5.47 - 0.70 - 2.98 - 0.43 = 1.36, which might be a little high,
considering that it should just be a quick analysis and rewrite (although,
a self-compile does require checking the thread condition on flattenable
refs, even though only a couple in main are likely to be kicked out by
this).

On the other hand, we aren't that far off of commonSubexp:
            commonSubexp starting
            commonSubexp finished in 4.51 + 4.44 (50% GC)
which is the other analysis that requires computing the dominator tree for
each function.  (restore needs to compute the dominator tree; but, we are
only restore ing 126 out of 1101 functions, so it's still taking too
long.)