good in general, but bad nested loops

Stephen Weeks MLton@sourcelight.com
Tue, 10 Jul 2001 16:49:09 -0700


> But,
> another aspect of this is the transformation that does the branch at the
> end of the loop, rather than at the top.  I think(?) that MLton generally
> puts these branches at the top; CPS would seem to be the right place to
> unroll and reorder these loops.

Unfortunately it's a bit complicated, due to the SSA single-assignment
condition, AKA uniqueness of variable bindings.

Here is the hot loop in cps.

   fun loopF_0 (x_179, x_180) = 
      let
	 val x_183 = MLton_eq (x_180, 0)
      in
	 case x_183 of
	   false => L_85 | true => L_84
      end
   fun L_84 () = 
      loopE_0 (x_179, x_182, x_181)
   fun L_85 () = 
      let
	 val x_177 = Int_sub (x_180, 1)
	 val x_178 = Int_add (x_179, 1)
      in
	 loopF_0 (x_178, x_177)
      end

Note that this is the new layout format that puts all labels at the same level,
so that lexical nesting is not syntactically apparent.  Anyways, if we would
like to unroll loopF once by inlining the call within L_85, we can, but it
requires introducing new formal parameters to express the fact that L_84 and
L_85 are now a join points. 

   fun loopF_0 (x_179, x_180) = 
      let
	 val x_183 = MLton_eq (x_180, 0)
      in
	 case x_183 of
	   false => L_85' | true => L_84'
      end
   fun L_84' () = L_84 (x_179)
   fun L_84'' () = L_84 (x_179')
   fun L_84 (x_179'') = 
      loopE_0 (x_179'', x_182, x_181)
   fun L_85' () = L_85 (x_179, x_180)
   fun L_85''() = L_85 (x_178, x_177)
   fun L_85 (x_179', x_180') = 
      let
	 val x_177 = Int_sub (x_180', 1)
	 val x_178 = Int_add (x_179', 1)
	 val x_183' = MLton_eq (x_177, 0)
      in
	 case x_183' of
	   false => L_85'' | true => L_84''
      end

Blech.  I probably need to read more about SSA.

Anyways, I agree with you that this optimization can and should be expressed at
the CPS level.  Hopefully it's easier than it appears to me right now.

Also, in looking at this example, I see it could be expressed more concisely if
we changed CPS to allow non-value-carrying variants in case transfers to have a
jump to a label with args, instead of just a jump to a nullary label.  Maybe
that would be nice to have in CPS just for compactness reasons.