commit email: explicit arrays and limit checks

Stephen Weeks MLton@sourcelight.com
Sun, 17 Feb 2002 18:12:17 -0800


This is a merge of the explicit-arrays-and-limit-checks-branch.  It contains
code that moves most of the complexity of implementing arrays and limit checks
out of the codegens and into the backend.  For arrays, all of the following are
now done in the backend:
	number of bytes computation
	word alignment of number of bytes
	initialization of pointer arrays
	space for forwarding pointer for zero length arrays
For limit checks, the LimitCheck transfer was removed from RSSA and MACHINE.
Now there are explicit comparisons of the frontier to the limit, and explicit
calls to GC_collect.  Also, signal checks explicitly test whether the limit = 0.

For arrays, there are now three primitives:
1. array0Const
	for zero length arrays that should be constant lifted
2. array0
	for zero length arrays (of any type)
3. array
	for arrays of length greater than 0
array0Const is like the old array0 -- it is implemented in the constant
propagation pass, and is replaced with calls to (the new) array0 that define
globals.  The new array0 is implemented in the backend, by inserting the
appropriate number of bytes for a forwarding pointer.  The array primitive is as
before, except that it is guaranteed not to be called on zero length arrays.
This means that the array primitive does not have to test for zero, as was done
in the old codegens -- instead the test is done in the basis library, and may be
simplified away by the SSA simplifier.  So, the Array statement in MACHINE has
much simpler semantics.  It simply sets the header, the number of elements, and
bumps the frontier.  Initialization loops for pointer arrays are inserted by the
backend.
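
As a rough illustration of the simplified Array statement, here is a minimal C
sketch.  The object layout (a length word, then a header word, then the
elements), the 32-bit element slots, and the names array_allocate and
init_pointer_array are assumptions made for the example, not taken from the
MLton sources.  It assumes the limit check has already succeeded and that the
number of element bytes is already computed and word aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [num_elts][header][elements ...]. */
enum { ARRAY_PREFIX_WORDS = 2 };

/* Set the number of elements and the header, then bump the frontier.
 * num_bytes is the word-aligned size of the element area only. */
static uint32_t *array_allocate(uint32_t **frontier, uint32_t header,
                                uint32_t num_elts, uint32_t num_bytes) {
    uint32_t *base = *frontier;
    uint32_t *elts = base + ARRAY_PREFIX_WORDS;

    assert(num_bytes % 4 == 0);  /* alignment was done by the caller */
    base[0] = num_elts;
    base[1] = header;
    *frontier = (uint32_t *)((char *)elts + num_bytes);
    return elts;
}

/* Separate initialization loop, as would be inserted for pointer arrays,
 * so that the collector never sees uninitialized slots. */
static void init_pointer_array(uint32_t *elts, uint32_t num_elts,
                               uint32_t init) {
    for (uint32_t i = 0; i < num_elts; i++)
        elts[i] = init;
}
```

Keeping the initialization loop outside the allocation is what lets the
allocation itself stay a few stores and an add.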

The backend now inserts code for computing the number of bytes used by an
array.  This required adding Word32.{addCheck,mulCheck} primitives, which are
like their Int counterparts, except that mulCheck uses unsigned arithmetic,
unlike Word32.*, which uses signed multiply.  Perhaps the names should reflect
the unsigned aspect.  Anyway, these overflow-checking word primitives are
necessary because the computation of array bytes may overflow now that
Array.maxLen is 2^31-1.  I went ahead and added mulCheck and addCheck to
MLton.Word, mostly so I could add them to the regressions.  This change also meant
that Arith transfers in SSA, RSSA, and MACHINE now need a type field, because
they can be int or word.  If the number of elements in the array is known at
compile time, the multiplication and word alignment are computed then.  If this
computation overflows at compile time, the backend inserts an explicit call to
allocTooLarge.
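
The byte computation can be sketched in C as follows.  This is only an
illustration under assumptions: 4-byte words, and the hypothetical names
word32_mul_check, word32_add_check, and array_num_bytes.  A false result
stands for the overflow case, where the real backend would transfer to
allocTooLarge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 32-bit multiply that reports overflow instead of wrapping. */
static bool word32_mul_check(uint32_t a, uint32_t b, uint32_t *out) {
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    if (wide > UINT32_MAX)
        return false;
    *out = (uint32_t)wide;
    return true;
}

/* Unsigned 32-bit add that reports overflow instead of wrapping. */
static bool word32_add_check(uint32_t a, uint32_t b, uint32_t *out) {
    uint32_t sum = a + b;  /* wraps modulo 2^32 on overflow */
    if (sum < a)
        return false;
    *out = sum;
    return true;
}

/* Number of element bytes, rounded up to a multiple of 4.
 * Returns false if either step overflows. */
static bool array_num_bytes(uint32_t num_elts, uint32_t bytes_per_elt,
                            uint32_t *out) {
    uint32_t raw, padded;
    if (!word32_mul_check(num_elts, bytes_per_elt, &raw))
        return false;
    if (!word32_add_check(raw, 3, &padded))
        return false;
    *out = padded & ~(uint32_t)3;
    return true;
}
```

With a length of 2^31-1 and 4-byte elements the multiplication already
exceeds 32 bits, which is why a plain wrapping multiply is not enough.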

For limit check insertion, there is a long comment at the top of limit-check.fun
that explains the various tests.  The main changes involved making the
GC_collect primitive be of type word * int -> unit, where the word is the number
of bytes being requested and the int is really a boolean that says whether to
force a collection.  Also, I added operands to refer to various parts of the GC state
(frontier, limit, ...) from within RSSA and MACHINE.  Wrt arrays, it was nice
that the number of bytes is now explicit, since it can be shared by both the
limit check and the array allocation.  The limit check insertion pass will
presumably handle -gc-check {first, every}, but I haven't done it yet.
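
To make the explicit checks concrete, here is a small C sketch.  The struct
gc_state, its fields, and the toy gc_collect are stand-ins invented for the
example; the actual tests inserted by the pass are the ones described in
limit-check.fun and may differ.  The sketch only assumes what is stated
above: a frontier, a limit, a collect routine taking the number of bytes
requested and a force flag, and a limit of 0 when a signal is pending.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified GC state; field names are illustrative. */
struct gc_state {
    uintptr_t base;        /* start of the toy heap */
    uintptr_t size;        /* size of the toy heap in bytes */
    uintptr_t frontier;    /* next free address */
    uintptr_t limit;       /* end of allocatable space; 0 = signal pending */
    unsigned collections;  /* how many times gc_collect has run */
};

/* Stand-in for GC_collect : word * int -> unit.  A real collector would
 * preserve live data; this toy just empties the heap and resets the limit. */
static void gc_collect(struct gc_state *s, uint32_t bytes_requested,
                       int force) {
    (void)force;
    assert(bytes_requested <= s->size);  /* the toy heap cannot grow */
    s->frontier = s->base;
    s->limit = s->base + s->size;
    s->collections++;
}

/* Explicit limit check: compare the frontier to the limit and collect
 * when fewer than bytes_requested bytes remain. */
static void limit_check(struct gc_state *s, uint32_t bytes_requested) {
    if (s->limit < s->frontier || s->limit - s->frontier < bytes_requested)
        gc_collect(s, bytes_requested, 0);
}

/* Explicit signal check: a zero limit means a signal is pending. */
static void signal_check(struct gc_state *s) {
    if (s->limit == 0)
        gc_collect(s, 0, 0);
}
```

Because the byte count is an ordinary value here, the same number can feed
both the limit check and the array allocation that follows it.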

There is one bug remaining, with the x86-codegen replacing Word32_addCheck of
one with an increment, which doesn't preserve the carry flag, but Matthew will
fix that soon.