[MLton] bug with packed representations

Stephen Weeks MLton@mlton.org
Fri, 23 Apr 2004 11:30:14 -0700


> The type-theory weenies, typically distinguish between "casting" and 
> "coercing", where casting a value to another type is a nop with respect to 
> representation, but a coercion may make a representation change. If the 
> primop approach doesn't work (I imagine you need a family of primops to do 
> this right). I'd suggest introducing a new constructor distinct from "Cast" 
> and call it "Coerce".

I agree with Matthew -- our WordX_toWordY primitives handle coercions
nicely.  It was simply a bug in the packed representation stuff that
led to the cast problem.

The Rssa and Machine ILs already have subtyping which lets one change
the type of a value in a safe way.  For example, the type 0x1:32
(which is a singleton type whose only value is the 32-bit integer 1)
is a subtype of the sum type 0x0:32 + 0x1:32.  We use casts to change
the type of a value (not its representation) when the type system
isn't good enough.  Here are the casts we do.

* from a sum type to one of the summands.  This is in general unsafe,
  but the idea is that the representation pass introduces the casts
  only immediately after a test on the tag bits.  So, a little local
  reasoning can prove them safe, and maybe even someday the type
  checker will.

* from an IntInf to a vector or a word.  These are unsafe casts that
  are used in the implementation of IntInf.  We rely on our SML code
  in the basis implementation of IntInf to get these correct.
  However, the new packed representation pass will now do exactly the
  right thing for the type

	datatype t = Big of word vector | Small of Int31.int

  That is, it will represent IntInf exactly as we do now, with a
  single tag bit, where 0 indicates a word vector or 1 indicates a
  31-bit int.  So, we could conceivably clean up our IntInf
  implementation to use this.

* from a sum type to a word so that we can look at the tag bits.  This
  is safe.

* from an array to a vector.  This one is weird because it is
  connected with a change of the object's header.  All quite unsafe.
  We rely on the SML code in the basis library to get this right.

* from a heap pointer to an internal pointer.  This is done for array
  address computations which are unsafe.

* between words and ints.  These should be totally safe.