local refs

Matthew Fluet Matthew Fluet <fluet@CS.Cornell.EDU>
Thu, 6 Dec 2001 19:48:00 -0500 (EST)


> Localization rule:
> You can localize a global ref if all uses are within a single
> function and the start label of the function is not multi-used.

This works, but fails to localize global refs in a function that is only
called once, but has local loop including start; e.g., this would happen
in introduceLoops when it introduced a loop in a function with no
arguments.  So, we would also want a multi-used on functions, where
multi-used(f) ==> multi-used(start(f))
but not the converse.

> Ref-flattening rule:
> You can flatten a local ref (x = Ref_ref y) if every use is of the
> form Ref_assign (x, z) or Ref_deref x and all uses of x in a block
> with a multi-threaded label are Ref_assigns or all are Ref_derefs.

I don't buy this at all.

L_A ()
  y = Ref_deref x
  case x of true => L_A | false => L_C
L_B ()
  Ref_assign(x, true)
  L_B ()

Assume all lables are are multi-threaded.  This satisfies your criteria,
but if one thread executes L_A and another thread executes L_B, then we
won't get the desired semantics (i.e., L_A busy-waits until L_B releases
it).

And, on the other hand,

L_A ()
  x = Ref_ref 10
  Ref_assign (x, 20)
  y = Ref_deref x
  L_C ()

is flattenable (if stupid), even if L_A is multi-threaded.

No, we really need an analysis that approximates _which_ thread can
execute which block.  In the first case, its the fact that different
threads can have references to x that make x not flattenable.  In the
second case, it is the fact that only one thread has a reference to x; or,
rather, that each thread that executes L_A has it's own (local) version of
x.

Hmmm... that might not even be true.  What if I had:

L_A ()
  r = Ref_ref 0
  L_B (10)
L_B (x)
  b = x <> 0
  case b of true => L_C | false => L_D
L_C ()
  a = Ref_deref r
  L_C' (a+1) Overflow L_O
L_C' (b)
  Ref_assign (r,b)
  L_B (x-1) Overflow L_O
L_D ()
  y = Ref_deref x
  y
L_O ()
  raise (Overflow)

Let's suppose this is the entire function; i.e., no Thread_copyCurrent()
in it.  Now, thread t1 hold a reference to thread t2; thread t2 begins
executing this function, while in L_C loop it is interrupted; thread t1
begins executing, makes a copy of t2, and switches to it; thread t2' (the
copy) begins executing, while in L_C loop it is interrupted; thread t2
continues executing, but can return from the function with a value greater
than 10.

Yuck.  It seems as though the presence of Thread_copy makes _every_
program point reachable (and _dynamically_ reachable; not just forward in 
the whole-program control flow graph, but also "backwards" along returns
and raises) from _any_ Thread_copyCurrent has the potential for being the
origin of multiple executions.

That being said, and assuming that the multi-threaded predicate abides 
by the previous paragraph, maybe you meant to say:

Ref-flattening rule:
You can flatten a local ref (x = Ref_ref y) if every use is of the
form Ref_assign (x, z) or Ref_deref x and all uses of x in _all_ blocks
with multi-threaded labels are Ref_assigns or all are Ref_derefs.
(that is, once the ref escapes into the multi-threaded world, all uses are
either assigns or derefs.)

This is crappy!!  Maybe I'm missing something about the semantics of
threaded programs; do I really have the ability to make arbitrary copies
of an executing thread, at arbitrary points (modulo where the backend will
really put in interrupt points), and start them running?  Somebody please
tell me it's not this bad!

> This provides a nice separation between localization and flattening.
> Conceptually, globals are first localized if possible.  Then, local
> refs are flattened, if possible.  There is no connection between the
> two.  Localization only depends on multi-used and flattening only
> depends on multi-threaded (and non-escaping).

Agreed.  This actually might be very nice, because it would allow localRef
to all localization before doing any Ref-flattening.  What's currently
there delays shrinking until the final set of globals is available,
because the shrinker won't clear it's properties from the globals, and I'm
generating a new set of globals after processing each function.

> There is one problem with the fact Thread_copyCurrent can occur in the
> middle of a block.  So, a label can be not multi-threaded and still
> have statements (after the Thread_copyCurrent) that are
> multi-threaded.  There are two possible fixes.
> 
> 1. Compute multi-threaded (and multi-used) with a per-statement instead
>    of a per-label granularity.
> 
> 2. Make Thread_copyCurrent a Transfer with only one target?  Then
>    per-label will be granular enough.
> 
> (2) feels right to me, and in keeping with our constant drive to make
> SSA closer to the machine -- after all, there really is a label there
> in the codegen, corresponding to the return from InvokeRuntime.

I agree.  Doing the analysis around statements of the form 
Thread_copyCurrent was going to be a pain.