local refs

Stephen Weeks MLton@sourcelight.com
Thu, 6 Dec 2001 15:41:39 -0800


> I'm currently writing a note describing how I think about localRef,
> multiple threads, and once-ness.  I'll send it soon.

Defn:
A label is exactly-multi-threaded if it may be executed by two
different threads during some run of the program.  The initial call to
main counts as one thread.

Defn:
A label is actually-multi-used if it may be executed more than once
during the execution of the program.

Claim: 
There are linear time analyses for computing (over) approximations to
these properties.  Call the approximations multi-threaded and
multi-used.  It's obvious, since you could always just say every label
is multi-threaded and multi-used.  What I mean is that there are
pretty simple dfs's of control-flow and call-graphs that compute them.
Or, maybe simpler to implement would be one of our usual fixed-points
over the whole program using a two point lattice element associated
with each label.

Fact1: If a label is multi-threaded then it is multi-used.

Fact2: If a program doesn't contain Thread_copyCurrent, then no label
is multi-threaded.

Localization rule:
You can localize a global ref if all uses are within a single
function and the start label of the function is not multi-used.

Ref-flattening rule:
You can flatten a local ref (x = Ref_ref y) if every use is of the
form Ref_assign (x, z) or Ref_deref x and all uses of x in a block
with a multi-threaded label are Ref_assigns or all are Ref_derefs.

This provides a nice separation between localization and flattening.
Conceptually, globals are first localized if possible.  Then, local
refs are flattened, if possible.  There is no connection between the
two.  Localization only depends on multi-used and flattening only
depends on multi-threaded (and non-escaping).

The multi-used analysis can be used by constant propagation as well.
That would be an improvement, since constant propagation currently
assumes only main is not multi-used, and computes over the
control-flow graph of main.

There is one problem with the fact Thread_copyCurrent can occur in the
middle of a block.  So, a label can be not multi-threaded and still
have statements (after the Thread_copyCurrent) that are
multi-threaded.  There are two possible fixes.

1. Compute multi-threaded (and multi-used) with a per-statement instead
   of a per-label granularity.

2. Make Thread_copyCurrent a Transfer with only one target?  Then
   per-label will be granular enough.

(2) feels right to me, and in keeping with our constant drive to make
SSA closer to the machine -- after all, there really is a label there
in the codegen, corresponding to the return from InvokeRuntime.

> Ugh, this gets worse.  Not only must you forbid localization of refs live
> across a block with a thread copy, you must forbid localization of refs
> live across a non tail call to a function which either directly or
> indirectly does a Thread_copy.

Agreed, except for Thread_copyCurrent, not Thread_copy.  Unfortunately,
if the program only uses threads, the only use of Thread_copyCurrent
will be at the beginning of the program, which will make everything
appear multi-threaded.  The analysis would need more smarts to know
that the first time through thread.sml, !func = NONE, and hence that
all the initialization code is not multi-threaded.