[MLton-devel] types for RSSA

Sun, 17 Nov 2002 19:12:35 -0800

In thinking about how to implement the variant-tag optimization, I
came up with what I think is a good way to make RSSA (and eventually
MACHINE) type safe.  Here is an signature snippet of what I have in
mind.

--------------------------------------------------------------------------------
signature RSSA =
   sig
      structure Mtype:
	 sig
	    datatype t =
	       Char
	     | Int
	     | IntOrPointer of {enumSize: int,
				objects: {index: int} vector}
	     | Real
	     | Word
	 end
      structure MemChunk:
	 sig
	    datatype t =
	       T of {components: {mutable: bool,
				  offset: int,
				  ty: Mtype.t} vector,
		     width: int}
	 end
      structure ObjectType:
	 sig
	    datatype t =
	       Array of MemChunk.t
	     | Stack
	     | Tuple of MemChunk.t
	     | Vector of MemChunk.t
	 end
      structure Function:
	 sig
	    type t
	 end
      structure Program:
	 sig
	    datatype t = T of {functions: Function.t list,
			       main: Function.t,
			       objectTypes: ObjectType.t vector}
	 end
   end
--------------------------------------------------------------------------------

Mtypes ("machine types", i.e. simple types that can be held in
registers) are much as they are now, except that IntOrPointer replaces
the old Pointer type, and adds information about what is pointed to.
The indices in the objects vector index into the objectTypes array
that is part of the program.  This array is similar to what is now
used by the runtime system, with object header words containing an
index into an array of object descriptors.  The difference is that a
MemChunk allows the components to be arbitrarily layed out and
includes the full type of each component.  In particular pointer types
are completely specified.

In IntOrPointer, if the objects vector is empty, it is just an enum,
with variants 0, 1, ..., enumSize - 1.  If the enumSize is 0 the
objects vector has one element, then we have a tuple.  Otherwise, we
have a sum type where the variants can be distinguished by the object
type index, since it is disallowed for the same index to appear
twice.

Here's everything that I see that this gives us.

1. Typed RSSA, and eventually MACHINE, which of course should help us
avoid lots of backend errors.

2. Using the object type index is the variant tag.  This supports the
strategy of using consecutive object types for all the variants to
make dispatch fast, but also allows lots of other strategies (I can
see lots of fun tricks trying to make coercions between datatypes free
while maintaining the no duplicate tag requirement in a sum, although
I have no idea if that would be practically useful).

3. More flexible record layouts for memory chunks than just having all
nonpointer data followed by all pointers.  Although, since that
suffers from GC changes and additional GC costs, I would initially impose
restrictions so that the current GC could still work.

4. Flattening of array/vector elements, since vectors and arrays have
a MemChunk instead of an Mtype as their element type.

5. Flattening of refs into tuples, since object fields, can now be
mutable. 

6. Propagation of mutability information to the runtime via the
mutable field in the MemChunk.

7. Propagation of alignment information to the runtime (e.g. for
doubles).

8. Generation of more type-correct C, by usin a C structs for each
MemChunk and C union for IntOrPointer.  Perhaps this would give C some
aliasing info?

Thoughts?

-------------------------------------------------------
This sf.net email is sponsored by: To learn the basics of securing 
your web site with SSL, click here to get a FREE TRIAL of a Thawte 
Server Certificate: http://www.gothawte.com/rd524.html
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel