Type-Preserving Garbage Collectors

Stephen Weeks MLton@sourcelight.com
Thu, 25 Jan 2001 13:23:15 -0800 (PST)


Hi Dan.  Henry mentioned your POPL talk to me, so I read through your paper --
it was very interesting.  Your paper mentions possible problems with code bloat
due to specialized gc code per type.  Also, Henry mentioned to me the possible
problem of icache pressure in the gc code with many different object types.
Just for fun, I instrumented MLton's runtime to count the number of different
object headers seen.  I was curious to know for a large program like a self
compile how many there are.  

Here are the measurements for the CPS syntax tree that is produced after all
optimization during a self compile.  The header column gives the object header
and the count column gives the number of objects with that header.

header		count
--------	------
00000001	302302
00020000	3
00008000	13035
80000001	859615
80000002	943930
80000003	285951
80000004	1437
80000005	5026
80000006	820
80000007	2503
80000008	1
8000000a	1002
80008000	90164
80008001	312513
80008002	530275
80008003	555661
80008004	45095
80008005	269
8000800a	387
80010000	1717
80010001	67396
80010002	4134
80018000	1
80038000	1

The upshot is that there aren't that many distinct headers.  There may be more
object types, because the object type has to recursively describe the object
types that its pointers point to.  But my guess is that isn't too bad either,
especially if you make the only the distinctions that are necessary for the gc,
i.e. between pointers and nonpointers.

Anyways, I was curious if you had made any progress in porting your stuff to a
later version of MLton and making it work on larger programs.

Also, I have one minor correction for the paper.  MLton uses a standard
*breadth* first Cheney copying collector, not *depth* first as it says in the
paper.