a more efficient loopForestSteensgaard

Matthew Fluet fluet@CS.Cornell.EDU
Wed, 19 Dec 2001 15:59:43 -0500 (EST)


> I agree that inLoop is already computed in the scc computation.  My
> only worry is that my original representation was linear in the size
> of the input graph, while storing all the inLoops may be quadratic.
> Of course, we may already be screwed in such a case because it would
> have taken quadratic time to compute (using Steensgaard).  But I would
> hate to force a quadratic representation on other loop forests that
> don't take quadratic time to compute.

But, the flip side of that is that with your representation, computing the
(full) set of nodes in a loop is linear in the size of the input graph.
So, if you ask for the full set of nodes in a loop from each node, you're
still quadratic.
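
For concreteness, here's roughly what I'm picturing (just a sketch: the
'a forest type and names below are my guess at the shape of your
representation, not actual code, and it assumes every node lands in exactly
one notInLoop vector):

  datatype 'a forest
    = T of {loops: {headers: 'a vector, child: 'a forest} vector,
            notInLoop: 'a vector}

  (* All nodes in a (sub)tree: each node occurs in exactly one notInLoop
   * vector somewhere below, so one linear walk collects them all. *)
  fun nodesIn (T {loops, notInLoop}) =
     Vector.foldr (fn ({child, ...}, acc) => nodesIn child @ acc)
                  (Vector.foldr (op ::) [] notInLoop)
                  loops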

> >  - nodes that are marked as a loopHeader get extra alignment, 
> >     even if they are a fall through label; so I don't think that marking
> >     every node in a trivial scc as a header is a good idea.
> 
> I'm confused about the meaning of "trivial scc".  Is a single node
> with a self edge a trivial scc (I think not)? 

A single node with a self-edge is not a trivial scc.

> Those will appear as
> headers in my forest, which I think makes sense.  

Agreed.

> Single nodes without
> self edges will only ever appear in a notInLoop.  

I was confused by your comment that you would mark nodes in a trivial scc
as headers.  Since you never export that notion of headerness, my above
comment isn't really a problem.

> So, I think that
> aligning all of the nodes that appear in headers vectors makes sense.

Agreed.

> > So, at a minimum, I would want an isHeader predicate returned by a loop
> > forest calculation.  The other stuff I could probably compute from the
> > proposed return type.  
> 
> You can set an isHeader property on nodes by a single walk over the
> tree.

True.  And, I do that anyways, because I want the properties assigned to
x86.Label.t's so that I can ditch the Graph.t when I'm done with it.
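
Something like the following one-pass walk (same made-up 'a forest type as
in the sketch above; setHeader stands in for whatever property-setter ends
up on x86.Label.t):

  (* Mark every node that appears in some headers vector, in one pass. *)
  fun markHeaders (setHeader: 'a -> unit) (T {loops, ...}) =
     Vector.app
     (fn {headers, child} =>
         (Vector.app setHeader headers;
          markHeaders setHeader child))
     loops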

> >  - x86GenerateTransfers wants to be able to compute the following to
> >     decide how to layout blocks at an Iff:
>  if neither the truee's nor the falsee's block has been generated,
> >   then fall thru to the label with the least "loop-distance" from the
> >   Iff's label (where loop distance from l1 to l2 is NONE if l1 and l2 are
> >   in non-nested loops, and SOME (d(l2) - d(l1)) where d(l) is the depth of
>   the deepest occurrence of l in the loop forest); this favors staying in
> >   the same loop or entering nested loops over exiting loops
> 
> "deepest occurrence of l in the loop forest" corresponds to the depth
> of the occurrence of l in a notInLoop vector, right?  Assuming so, you
> can compute and store this as property by a walk over the tree.

Yes, you can store "deepest occurrence of l in the loop forest", but you
need something else to determine if two nodes are nested relative to each
other.  You really want a representation of the path to the notInLoop
vector, so that you can see if one path is a prefix of the other; if so,
then the node with the longer path is nested inside the other node's loop.
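
Concretely, I'm picturing something like this (again a sketch over the
made-up 'a forest type, with setPath standing in for the real property):
store the path of loop indices from the root down to the notInLoop vector
containing each node, and then the depth, the nesting test, and the
loop-distance all fall out:

  (* Record each node's path; its depth is just the length of the path. *)
  fun setPaths (setPath: 'a * int list -> unit) (f: 'a forest) =
     let
        fun walk (T {loops, notInLoop}, path) =
           (Vector.app (fn n => setPath (n, List.rev path)) notInLoop;
            Vector.appi (fn (i, {child, ...}) => walk (child, i :: path)) loops)
     in
        walk (f, [])
     end

  (* One loop is nested in the other iff one path is a prefix of the other;
   * the longer path belongs to the more deeply nested node. *)
  fun isPrefix ([], _) = true
    | isPrefix (_, []) = false
    | isPrefix (i :: p1, j :: p2) = i = j andalso isPrefix (p1, p2)

  (* Loop distance, per the heuristic above: NONE if non-nested, otherwise
   * the difference in depths. *)
  fun loopDistance (p1, p2) =
     if isPrefix (p1, p2) orelse isPrefix (p2, p1)
        then SOME (List.length p2 - List.length p1)
     else NONE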

> Since you're computing the distances from the same source (l1), this
> heuristic says to fall through to the more deeply nested loop, right?

Yes.

> I agree with favoring staying or going to a more deeply nested loop
> over exiting, but I am confused why you favor entering a more deeply
> nested loop over staying in the same loop.

No real empirical reason (and any small permutations in the layout of blocks
and in how iff's fall through don't make any consistent difference in the
runtimes).  There are a couple of reasons, though, why I think it makes
sense:
 1. The basic premise is that loop branches are likely to be taken,
     whether that means staying in the current loop or entering a nested
     one; so, the right thing for branch prediction is to fall into the
     deeper loop (think matrix-matrix multiply loops).
 2. The inner loop is probably smaller, so (hopefully) putting its blocks
     right here will be cheaper (in code size) than extra jumps to and from
     the inner loop if its blocks end up "far" away.

> >  - given a node, x86LiveTransfer wants to be able to compute all of the
>     nodes in the same loop (actually, all of the nodes in the deepest
> >     nested loop of which the original node is a member)
> 
> This is the only place I see where you need inLoop.  I don't
> understand well enough what you're using it for to know if the
> notInLoop equivalence relation could suffice.

It's being used for a spilling heuristic (or, more accurately, for a
carrying heuristic).  It's just your classic "weight things in loops by 5
times as much" heuristic.  Using notInLoop would be "fine", and I can
pretty much guarantee that you won't see any difference in the runtimes
(and certainly not unless you use -native-live-stack true; with
-native-live-stack false, x86LiveTransfer is only working on pseudo-regs,
which aren't live across a limitCheck (and hence not across loops) anyways).
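
For the record, the weighting itself would be nothing more than something
like this (a sketch; depth would be the length of the path stored in the
earlier sketch):

  (* Classic loop weighting: 5 times as much per level of nesting, i.e., a
   * node at loop depth d gets weight 5^d, so a use at depth 2 counts 25
   * times as much as one at top level. *)
  fun loopWeight (depth: int): int =
     let
        fun pow (0, acc) = acc
          | pow (d, acc) = pow (d - 1, 5 * acc)
     in
        pow (depth, 1)
     end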

> So, I'm still unsure whether we need the inLoop or not.

Maybe not.


I guess, in conclusion, I can switch over to the new loopForestSteensgaard
representation (without inLoop).  As I said above, I doubt the numbers
will come out any different.