once, local-refs, and threads

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Thu, 6 Dec 2001 10:13:28 -0500 (EST)


O.k.  I've been thinking about this a bit more and with a more precise
analysis, we could get better results in both once/constantProp and
localRef.  BTW, my analysis for when a function is called exactly once in
localRef is totally wrong (It's looking for trivial components of the
call-graph, which ignores the possibility that a loop local to a function
could make multiple non-tail calls to function; hell, it ignores the fact
that a function with multiple call sites is called more than once!  I
changed it to just mark the main function as called once.  That's a
conservative approx.). 

First, for local ref, assume a correct predicate once: Func.t->bool that
determines that a function is called exactly once (in either a
single-threaded or multi-threaded program).

n.l. == not localizable
p.l. == potentially localizable

in a multi-threaded program, call a ThreadCopyPoint in a function any
point in the function where the primitive Thread_copy is invoked or any
non-tail call to a function which reaches a ThreadCopyPoint (i.e., either
the called function invoked Thread_copy or calls a function that itself
reaches a ThreadCopyPoint; that is, the appropriate fixed point).

For function f:
 if once(f)
   then any global ref that is only used in function f is p.l. 
 any local ref that is only assigned to or dereferenced is p.l.
 if single-threaded
   then any global or local ref that is p.l. is localizable
   else (* multi-threaded criteria *)
        compute the set of p.l. refs live at each point in the function
        for each p.l. ref R
          if R is live at two (not necessarily distinct) 
             ThreadCopyPoints p1 and p2
             such that p1 reaches a Ref_assign(R, ?) and
                       p2 reaches a Ref_deref(R)
            then R is n.l. (and hence, not p.l.)
        any global or local ref that is p.l. is localizable

The idea is that execution might restart at any Thread_save an
arbitrary number of times and in any order.  If the thread save occurs
down a non-tail call, we might return to the non-tail call point an
arbitrary number of times.

If at each of these ThreadCopyPoints we can only reach Ref_deref, then we
can localize the ref, because the final value will be on the stack for
each copy of the thread.  If at each of these ThreadCopyPoints, we can
only reach Ref_assign, then we can localize the ref (the updates are never
used and will be deleted as dead code by removeUnused).  But, if there are
two ThreadCopyPoints where p1 reaches a deref and p2 reaches an assign,
then we need to leave the ref, because it could be serving as a channel to
pass information between the threads.  Finally, if R is not live at any
thread copy point, then it doesn't escape to the stack and is localizable. 

This sort of analysis would allow localRef to eliminate the "toplevel
handler installed" ref in callcc3, because the ref's last update occurs
before the thread copy.  And, that ref is global to the program, so I
don't think we need to punt on globals in a multi-threaded program.  We
just need once(f) to be correct in a multi-threaded world.  Again, a
conservative approx is just to say main is once and everything else is
not-once.

As an implementation practicality, liveness isn't really required.  A ref
R is live at a ThreadCopyPoint if that point reaches a use of R, which, if
R is p.l., corresponds to either a Ref_deref(R) or Ref_assign(R, ?).  In
an implementation, I would probably use two bool arrays to represent the
occurrences of Ref_assign and Ref_deref for each p.l. ref and do DFS from
each ThreadCopyPoint.  It's the quadratic (in the number of
ThreadCopyPoints) search through the pairs of ThreadCopyPoints that
bothers me.  Thread_saves aren't so bad, but the main function in a self
compile is going to have tons of non-tail calls.