SSA

Matthew Fluet fluet@CS.Cornell.EDU
Mon, 22 Oct 2001 14:16:47 -0400 (EDT)


> > My first attempt at combining all the GC_stack and GC_thread data
> > works o.k. for single threaded apps, and even for some
> > multi-threaded apps, but mutex.sml and thread2.sml both were
> > failing.
> 
> I am confused what "first attempt" means.  Is this the scheme that
> can't work because of the resizing argument above?

Yeah; my "first attempt" completely combined GC_stack and GC_thread, and
resultinged in bugs; my "second attempt" only moves exnStack from thread
to stack.

> Having thought about this idea over the weekend before you sent your
> mail, and now seeing your mail, makes me think that we don't want to
> go this direction.  I see only costs (performance and code space) and
> no benefits.  Here are the costs:
> 
> 1. extra static data for frame layouts
> 2. GC_foreachPointerInObject hits pointers twice (and hence forward is
>    slower) or possibly is made more complicated to avoid doing so
>    (although I don't see how) 
> 3. accessing the exnStack field is slower due to the extra indirection

I was kind of coming to this conclusion myself (I just didn't want to
admit it).

(As to the "how" on #2, we keep a "top" for both cont and handlers.  If
they are the same, then we know that the corresponding frame layouts
are for the same frame.  Do a "merge sort" like walk down both frame
layouts and hit slots exactly once.  If they are different, then this
frame only has a cont layout, and process the frame layout as normal.
Agreed, this slows stuff down a little.)

> The only cost of the current approach is that we may have to create
> more stubs to associate framelayouts with each cont-handler pair.

Yes.  I'm taking it that we still pushing to eliminate the
counter-intuitive control flow at non-tail calls that is currently in SSA.

How "bad" do you think taking the explicit union of two frames would be?
The simplest "fix" I can think of is to make the cont record element in
labelInfo of backend.fun into a list of Label.t option * Mlabel.t (i.e.,
for each handler associated with a cont, produce a unique stub), and
then have genCont work pretty much the same as now, except it would union
the liveNoFormals of both the cont and the handler, and iterate over all
kinds of handlers.  And, obviously, the translation of non-tail calls
would need to pick the right stub label given the handler.

The more complicated fix would be to do what I said earlier, which is to
indicate to live.fun that there is a "to be created" block for each
cont/handler pair and let it put in the appropriate control flow edges
relative to this virtual block.

A third option might be to have a final SSA pass that forces the unique
cont/handler requirement, but I think we can get bitten by one of my
earlier examples where one cont just directly jumps to the other, and the
first one inherits the live slots from the handlers of the second cont.

> > mandelbrot has me completely stumped.
> 
> I have no idea.

I put up mandelbrot.cvs and mandelbrot.newer at
http://www.cs.cornell.edu/People/fluet/MLton
If anyone wants to try them on their machine, just to see if the huge
difference still shows up, I'd appreciate it.  Particularly, I'd like to
see if its particular to PentiumIII's.

I'm not so concerned about the particular compiler that produced
mandelbrot.newer, but I am a little concerned that we can lose (or gain)
upwards of 80% run-time for no obvious reason.

> > The failure of vliw under the new compiler is not directly related to the
> > changes I made.  
> ...
> > Anyways, I just found this, so I haven't taken a look at what needs to be
> > done to fix it.
> 
> I'll look into this if you want.  Or, I can let you and I'll move on
> to checkHandlers.

Well, if you haven't already, take a look at the cfg.dot file for
member_1.  The thing that struck me is that the allocation is in a loop
that can either go through a series of non-tail calls or directly loop.
Maybe that doesn't make any difference, but it seemed a little unusual to
me.

> If you're going to be looking at limit checks, here are the known
> problems on my todo.
> 
> 1. It moves checks inside of loops.
> 2. Stack limit checks imply heap limit checks.  These should be
>    separated.  This will remove the heap check from fib.sml.

Also, non-threaded code doesn't need a "looping" limit-check; if we can't
change threads, then a successful GC guarantees that this thread got the
memory it asked for.