CVS Commit

Matthew Fluet <fluet@CS.Cornell.EDU>
Wed, 12 Sep 2001 18:58:23 -0400 (EDT)


> > I'm not exactly sure what's going on in SML/NJ.  What do you mean by
> > "faked registers"?  
> 
> I think Henry is referring to the area of memory that we use for pseudo
> registers and is suggesting dedicating a register to refer to the start of that
> area.  Or, as an in-between position, occasionally using a register to point to
> the start of that area and using the register across multiple loads/stores.

And from Henry:

> Sorry for the confusion.  By `fake registers' I was thinking of the memory
> locations that we pretend are registers.  They are (I thought) just a bank
> of memory locations.  At least that is what SML/NJ does/did.  The point is
> that in getting to those locations, which obviously you do a lot, you could
> just use the absolute memory address, which is large and perhaps slow, or
> you could dedicate a register to point at the block of memory which contains
> the fake registers.  Especially if the register you dedicate is the fp
> register, this makes storing or loading from a fake register very compact
> and perhaps faster.

Oh.  You mean the fact that an access using a small integer offset from a
register encodes as a smaller instruction than an access using an offset
from a static label.

Right now any access to a pseudo-reg is done using an offset from a static
label.  So, you can grep through the assembly and look for things like
movl (localint+(0*4)),%eax
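
To put rough numbers on the size difference, here is a small Python sketch
that counts the bytes in a 32-bit `movl <mem>,%reg` load (opcode 0x8B with
a ModR/M byte).  It is only an illustration of the standard IA-32 encoding
rules, not something taken from the MLton code generator; the function
name mov_load_size is made up for this example.

```python
def mov_load_size(base=None, disp=0):
    """Size in bytes of `movl disp(base),%reg` in 32-bit mode.

    base=None means an absolute address (offset from a static label).
    Ignores the 5-byte short form that exists only when the destination
    is %eax and the address is absolute.
    """
    opcode, modrm = 1, 1
    if base is None:
        # mod=00, r/m=101: a full 32-bit absolute displacement follows
        return opcode + modrm + 4
    # %esp as a base register always needs an extra SIB byte
    sib = 1 if base == "esp" else 0
    if disp == 0 and base != "ebp":
        # mod=00: no displacement (%ebp cannot use this form)
        return opcode + modrm + sib
    if -128 <= disp <= 127:
        # mod=01: 8-bit signed displacement
        return opcode + modrm + sib + 1
    # mod=10: 32-bit displacement
    return opcode + modrm + sib + 4


print(mov_load_size(None))        # 6: movl (localint+(0*4)),%ebx
print(mov_load_size("esi", 0))    # 2: movl (%esi),%ebx
print(mov_load_size("esi", 4))    # 3: movl 4(%esi),%ebx
print(mov_load_size("esi", 400))  # 6: offset no longer fits in 8 bits
```

So the saving only holds while the offset fits in a signed byte, that is,
for roughly the first 32 four-byte slots above the base register.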

Could we switch to the alternate addressing mode?  It wouldn't be too
hard.  What I would first do is change the backend (and the translations
to C and x86) to combine all types of pseudo-regs into one block.  That
shouldn't be too hard -- just add and remove pseudo-regs like we do stack
slots.  Then, rather than recording the number of regs of each type needed,
just keep track of the largest size of the block (like max-frame-size).
This might not be a bad idea in general -- it would save a little bit of
space, because one CPS function that uses a lot of integers and another
that uses a lot of doubles wouldn't accrue disjoint pseudo-reg sets.  Hell,
we could even tack these onto the end of the statically allocated globals.
(That is, access to pseudo-regs and globals would all be indices off of
the same base label/register.)
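
The space argument can be sketched with a toy calculation in Python.  The
type sizes and the two example functions are assumptions made up for
illustration (4-byte ints and pointers, 8-byte doubles, no alignment
padding); nothing here reflects the actual backend data structures.

```python
# Assumed sizes in bytes on 32-bit x86; alignment is ignored.
SIZES = {"int": 4, "pointer": 4, "double": 8}


def separate_banks(functions):
    """One bank per type: each bank must hold the largest count of
    that type needed by any single function."""
    total = 0
    for ty, size in SIZES.items():
        total += size * max((f.get(ty, 0) for f in functions), default=0)
    return total


def combined_block(functions):
    """One shared block: only the largest per-function footprint
    matters, tracked the same way as max-frame-size."""
    return max(
        (sum(SIZES[ty] * n for ty, n in f.items()) for f in functions),
        default=0,
    )


# One function heavy on integers, another heavy on doubles.
funcs = [{"int": 10}, {"double": 5}]
print(separate_banks(funcs))  # 80
print(combined_block(funcs))  # 40
```

With separate banks the two functions pay for both 40-byte sets; with a
shared block they overlap in the same 40 bytes.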

If we do all that, then it would be pretty easy to tweak the translation
to x86 to compute pseudo-regs either as an offset from a static label or
as an offset from a memloc holding the base (which would be compiled to an
offset from a base register).  Then we could compare.
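
As a minimal sketch of that switch, the Python function below formats a
pseudo-reg operand in either style.  The function name, the default label
"pseudoregs", and the choice of %esi as base register are all hypothetical;
only the `(label+(index*size))` shape is taken from the assembly quoted
above.

```python
def pseudo_reg_operand(index, size=4, base_reg=None, label="pseudoregs"):
    """Format the memory operand for pseudo-reg number `index`.

    base_reg=None -> offset from a static label (current scheme)
    base_reg="esi" -> offset from a dedicated base register
    """
    if base_reg is None:
        return f"({label}+({index}*{size}))"
    return f"{index * size}(%{base_reg})"


# Current style, matching the form that shows up in the assembly today.
print("movl " + pseudo_reg_operand(0, label="localint") + ",%eax")
# Alternate style, with a (hypothetical) dedicated base register.
print("movl " + pseudo_reg_operand(3, base_reg="esi") + ",%eax")
```

Keeping the choice behind one function like this is what would make the
side-by-side comparison cheap to run.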

There are certainly trade-offs.  Dedicating a register to the pseudo-reg
base would make three dedicated registers (stack-top, frontier, and
pseudo-reg base).  Looking at the assembly for conv2.sml (from
regressions), I get the following:
[fluet@lennon tests]$ wc -l conv2.0.S 
  18367 conv2.0.S
[fluet@lennon tests]$ grep "\t" conv2.0.S | wc -l
   4117
[fluet@lennon tests]$ grep local conv2.0.S | wc -l
     89
[fluet@lennon tests]$ grep global conv2.0.S | wc -l
   2994

That's kind of telling.  (grep global actually overcounts by 9 because of
.global directives.)  So, "spilling" back and forth to pseudo-regs isn't
that significant, but loading globals is very significant.  (conv2.sml is
kind of bogus for this comparison, because it's got a gazillion string
constants.)  Here's the same with vector-concat:
[fluet@lennon mlton-2]$ wc -l vector-concat.0.S 
    571 vector-concat.0.S
[fluet@lennon mlton-2]$ grep "\t" vector-concat.0.S | wc -l
    166
[fluet@lennon mlton-2]$ grep "local" vector-concat.0.S | wc -l
      0
[fluet@lennon mlton-2]$ grep "global" vector-concat.0.S | wc -l
     29

Still, accessing globals is significant.
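
For anyone repeating the count without the .global overcount, a small
Python sketch along these lines would do.  It assumes directive lines start
with `.global` or `.globl`; the sample lines and the `globalint` label are
invented for the example and are not taken from conv2.0.S.

```python
def count_refs(lines):
    """Count lines mentioning local/global, skipping .global directives."""
    counts = {"local": 0, "global": 0, "directives": 0}
    for line in lines:
        text = line.strip()
        if text.startswith((".global", ".globl")):
            # Symbol-export directive, not a memory access.
            counts["directives"] += 1
            continue
        if "local" in text:
            counts["local"] += 1
        if "global" in text:
            counts["global"] += 1
    return counts


sample = [
    ".global main",
    "\tmovl (globalint+(1*4)),%eax",
    "\tmovl (localint+(0*4)),%ebx",
    "\taddl $1,%eax",
]
print(count_refs(sample))
# {'local': 1, 'global': 1, 'directives': 1}
```

To run it on a real file, pass the open file object, for example
count_refs(open("conv2.0.S")).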