Type-Preserving Garbage Collectors

Daniel Wang danwang@CS.Princeton.EDU
25 Jan 2001 17:34:18 -0500


"Stephen Weeks" <sweeks@intertrust.com> writes:

> Hi Dan.  Henry mentioned your POPL talk to me, so I read through your paper --
> it was very interesting.  Your paper mentions possible problems with code bloat
> due to specialized gc code per type.  Also, Henry mentioned to me the possible
> problem of icache pressure in the gc code with many different object types.
> Just for fun, I instrumented MLton's runtime to count the number of different
> object headers seen.  I was curious to know for a large program like a self
> compile how many there are.  
> 
> Here are the measurements for the CPS syntax tree that is produced after all
> optimization during a self compile.  The header column gives the object header
> and the count column gives the number of objects with that header.
> 
> header		count
> --------	------
> 00000001	302302
> 00020000	3
> 00008000	13035
> 80000001	859615
> 80000002	943930
> 80000003	285951
> 80000004	1437
> 80000005	5026
> 80000006	820
> 80000007	2503
> 80000008	1
> 8000000a	1002
> 80008000	90164
> 80008001	312513
> 80008002	530275
> 80008003	555661
> 80008004	45095
> 80008005	269
> 8000800a	387
> 80010000	1717
> 80010001	67396
> 80010002	4134
> 80018000	1
> 80038000	1
> 
> The upshot is that there aren't that many distinct headers.  There may be more
> object types, because the object type has to recursively describe the object
> types that its pointers point to.  But my guess is that isn't too bad either,
> especially if you make the only the distinctions that are necessary for the gc,
> i.e. between pointers and nonpointers.

Hey, this is nice fodder for my thesis... thanks... There are some tricks
one can play to reduce the number of distinct object types so, i think
you're right that this is perhaps less of an issue than it seems in practice...


> Anyways, I was curious if you had made any progress in porting your stuff to a
> later version of MLton and making it work on larger programs.

Haven't had time to try.. I tried a while, back and noticed that most of the
changes were reording of code in different places... but the one change that
confused me was the removal of polymorphic primitive operators. My code
relied on them to do something dirty, and I didn't have time to think about
how to fix it. I'm still trying to spend most of my time cranking out TeX
rather than new numbers... but when I get back into SML hacking mode
... I'll let you know

> Also, I have one minor correction for the paper.  MLton uses a standard
> *breadth* first Cheney copying collector, not *depth* first as it says in the
> paper.

Really, I must have misunderstood the code at some point... I'll make sure
hmmm... the BFS/DFS difference might explain some more cases where my safe
collector is faster, becuase of locality issues...