[MLton-devel] Stop-and-copy / mark-compact switch criterion

Fri, 16 Aug 2002 17:45:02 +0200 (MET DST)

Stephen,

A follow up on the GC switch criterion discussion. I agree that
contrary to what I initially suggested, the criterion proposed by
you seems preferable as it certainly avoids paging.

However what I was trying to do was to address a behaviour that I
found surprizing: during a self compile with gc-messages on on a
1Gb machine, there are first stop-and-copy GCs (first third on the
run), followed by a transcient allocation peak which causes a
switch to mark-compact. But then even when live/heap decreases
under 9% the GC still chooses mark-compact. This seems suboptimal
to me.

I think I found a better way to improve the switch criterion: in
fact the precursor to the S&C vs. M&C decision lies in
computeSemiSize(...). This routine nominally assigns a desired
heap size of LIVE_RATIO*live ( = 8*live).

I am not sure this is always optimal as I think that this code
indirectly forces the GC mode to mark-compact as soon as heap/live
<= 9 (where heap = size of RAM * ramslop). Assume heap/live = 9.
This routine will select a fromSpace size of 8/9*heap at the end
of the GC cycle. The mutator runs. New GC, assuming live has not
changed, we meet the criterion for switing to mark-compact as
fromSize = 8/9*heap, which if we add live = 1/9*heap yields
heapsize, thus causing a switch to mark-compact.

Intuitively, if the heap/live ratio is for instance of 5, it seems
preferable to use stop-and-copy with a smaller fromSpace (small
enough to ensure that fromSize+live fits the heap) instead of
using a fromSpace that occupies the entire heap (which forces us
to use mark-compact) as the former is presumably much cheaper that
the later. In the first case we will GC slightly more often with a
cheaper GC.

Can we formalize this ? I propose the following low-cost
modelisation:

------------------------------------------------------------------

Summary:  a  proposal  for  an  improved  criterion  for  choosing
semispace   sizes  that   can   decrease  GC   time  by   favoring
stop-and-copy without paging more.

Definitions:

m	total bytes allocated by mutator
heap	ram size * ramslop
live	live data set in bytes
E       cost(mark-compact)/cost(stop-and-copy) on the same heap
k	heap/live

Assumption: heap>=2*live (we do not consider here paging cases for
which is is clear that mark-compact collection is faster).

1) after  one  stop-and-copy   gc,  there  are  heap-2*live  bytes
available for the mutator (assuming the entire heap is used).

2) after one mark-compact gc, there are heap-live bytes available.

3) if we use only mark-compact, there will be about:

		ngc(m&c) = m/(heap-live)

gcs for the entire computation. Similarly:

		ngc(s&c) = m/(heap-2*live)

The relative efficiency of mark-compact is thus:

		R=(ngc(m&c)*E)/ngc(s&c)

Elementary calculations yield:

		R=(m*E/(heap-live))/(m/(heap-2*live))
		 =(E/(heap-live))/(1/(heap-2*live))
		 =E*(heap-2*live)/(heap-live)
as heap=live*k:
		R=E*(k*live-2*live)/(live*k-live)
		R=E*(k-2)/(k-1)

Numerical validation:

Let E=3 (plausible). We have the following:

k	R
------------
10	2.66
5	2.25
4	2
3	1.5
2.6     1.12
2.5	1.0
2.4	0.86
2.1	0.27

For instance,  this model  predicts that if  the cost of  a single
mark-compact  collection  is three  times  the  cost  of a  single
stop-and-copy  (E=3), then  if the  live ratio  k =  4  then using
mark-compact  will  globally  cost  2  times  the  cost  of  using
stop-and-copy.   If   k<=2.5  then   this   model  predicts   that
mark-compact will yield better performances than stop-and-copy.

End of numerical validation.

We now want  to find analycally for which value of  k we have R=1,
as this will be the threshold. Elementary calculation yield:

		R=1
                <=> E*(k-2)/(k-1)=1
		<=> k=(2*E-1)/(E-1)

Tabulating this function  k of E yields the  following table which
under the  hypotheses above gives  for each value of  the relative
efficiency factor E  the live ratio k up to which  it is faster to
use stop-and-copy.

E	k
------------
1.5     4.0
2	3.0
2.5	2.66
3	2.5
3.5	2.4
4	2.3

For  instance, if  an individual  mark-compact GC  cost 3  times a
stop-and-copy GC, then for live sets of up to heap/2.5 bytes it is
preferable to use stop-and-copy.

This confirms  the intuition  that unless the  live set  is fairly
large, then stop-and-copy is better.

As regards the implementation of this idea, the function computeSemiSize in gc.c (version 1.60):

static W32 computeSemiSize (GC_state s, W64 live) {
	W32 res;

	res = min64 (s->totalRam + s->totalSwap,
			max64 (live * LIVE_RATIO_MIN, 
				min64 (s->ramSlop * s->totalRam,
					live * LIVE_RATIO)));
	res = roundPage (s, res);
	...
	return res;
}

The idea is to choose at most for res (in nominal cases):

1) heap-live (that is ramSlop*totalRam) if we want stop-and-copy to be possible on the next GC cycle
2) heap if we favor mark-compact.

This yields the following code:

static W32 computeSemiSize (GC_state s, W64 live) {
	W32 res;
	double ratio, heap, cutoff, relative_cost_of_mark_compact_collection;

	relative_cost_of_mark_compact_collection = 3; // the E term. Can be adjusted

	cutoff = (2*relative_cost_of_mark_compact_collection-1)/(relative_cost_of_mark_compact_collection-1);

	heap = (s->ramSlop * s->totalRam);

	if (heap/live >= cutoff) /* computed analytically assuming that a mark-compact GC costs
				    RELATIVE_COST_OF_MARK_COMPACT_COLLECTION times a stop-and-copy GC */
	  {
	    ratio = 0.99*((heap/live)-1);
	    if (ratio > LIVE_RATIO) 
	      ratio = LIVE_RATIO;
	  }
	else
	  ratio = LIVE_RATIO;

	res = min64 (s->totalRam + s->totalSwap,
		     max64 (live * LIVE_RATIO_MIN, 
			    min64 (s->ramSlop * s->totalRam,
				   live * ratio)));
	...
	return res;
}

On  a self-compile  (on a  P4, 2.2GHz,  1Gb RAM)  this  yields the
following  timings, with visibly  much more  stop-and-copying than
mark-compactions.

with new tuning:
================

GC time(ms): 36,150 (18.8%) 178.67user

with old tuning
===============

GC time(ms): 59,410 (30.2%) 210.78user

Feedback welcome of course.

Stephen, please feel free to integrate this in your current branch
if you agree (perhaps under the control of a option if you want to
minimize interference with your current developments).

--
Alain Deutsch, CTO              tel.: +33 (0)1 49 65 32 64
PolySpace Technologies          fax.: +33 (0)1 49 65 05 77
mailto:deutsch@POLYSPACE.COM    http://www.polyspace.com

-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone?  Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel