[MLton] Type of _address?

Wesley W. Terpstra wesley@terpstra.ca
Thu, 21 Jul 2005 21:23:49 +0200


On Thu, Jul 21, 2005 at 08:50:00AM -0700, Stephen Weeks wrote:
> > I am considering changing the _address syntax from:
> > 	_address "x": MLton.Pointer.t;
> > to:	_address "x": MLton.Pointer.t, int;
> 
> I am still not convinced of the need for this change.  I'm not against
> it; I probably just don't understand the C world well enough.

There are many issues in this area that I don't fully understand either.
I just know that the rule of thumb for bug-free code is to avoid casts.

I've heard of systems where pointers to data and code are distinct,
i.e. an (int*) cannot necessarily be represented as a (void(*)(int)). I
also know that alignment matters: if you cast a (char*) to an (int*) and
dereference it, you can crash (bus error or segfault) on machines that
require aligned loads. There's also a lot of leeway for compilers to pull
the rug out from under your feet, because a cast between pointer types
other than char* (or void*) is only guaranteed to round-trip if the
result happens to be suitably aligned (not to mention that accessing the
object through the converted pointer is usually undefined behavior).
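A minimal sketch of the usual way to avoid the (char*)-to-(int*) alignment trap: copy the bytes with memcpy instead of casting. The function name load_u32 is hypothetical, and the sketch assumes a C99 compiler with <stdint.h>.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value stored at an arbitrary byte
 * offset in a buffer. Casting (buf + off) to (uint32_t *) and
 * dereferencing would assume 4-byte alignment, which is not guaranteed
 * here and can trap on strict-alignment CPUs. memcpy has no alignment
 * requirement, and compilers typically turn it into a plain load where
 * that is safe. */
static uint32_t load_u32(const unsigned char *buf, size_t off)
{
    uint32_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}
```

For example, with `unsigned char buf[8]`, calling `load_u32(buf, 1)` is well defined even though `buf + 1` is almost certainly not 4-byte aligned.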

> Like Matthew, I find it disconcerting that the base type doesn't appear
> in the result. Even more disconcerting, the MLton internals don't
> make use of it in any way. 
>
> Are we doing something wrong? 

Definitely; the C codegen is completely broken.
The generated C breaks aliasing rules left, right, and center.

It happens to work because the current gcc is very forgiving. However,
if we ever turn up optimization (or move to newer versions of gcc), this
will quite likely cause serious trouble. That said, I have no idea how
to go about fixing it, given that MLton has its own memory layout.

> If it matters so little to us, why does it matter so much to C?

C lets the compiler assume that two pointers to different (incompatible)
types never alias; the main exception is access through a character type.
If you lie to the C compiler about the type, and then cast the pointer
and use it elsewhere, you can get subtle bugs that cause much grief.
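To make the aliasing assumption concrete, here is a small sketch (the function name store_both is hypothetical). Since gcc enables -fstrict-aliasing at -O2 and above, the compiler is allowed to assume the store through the long pointer cannot change the int, and may return the constant 1 without reloading.

```c
/* Hypothetical example: because int and long are incompatible types,
 * the strict aliasing rule lets the compiler assume that the store
 * through b cannot modify *a. With optimization on (e.g. gcc -O2) it
 * may therefore compile the final line as "return 1" without reloading
 * *a. Calling this with both pointers referring to the same object
 * (via a cast) is undefined behavior. */
int store_both(int *a, long *b)
{
    *a = 1;
    *b = 2;
    return *a;
}
```

Called with two distinct objects, as intended, it is well defined and returns 1. If the generated C hands it two views of the same memory, the result is whatever the optimizer decided.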

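This is undefined behavior in C, even when sizeof(int) == sizeof(long):
	#include <stdio.h>
	int main(void) {
		int x = 5;
		long* y = (long*)&x;
		*y = 7;             // writes an int object through a long lvalue
		printf("%d\n", x);  // undefined: an optimizer may still print 5
		return 0;
	}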
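When you really do need to reinterpret the bytes of one type as another, the portable route is to copy the object representation with memcpy rather than access it through a pointer of the wrong type. A minimal sketch, assuming float and uint32_t are both 32 bits (IEEE 754 single precision); float_bits is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float as a uint32_t
 * without violating the aliasing rules. memcpy copies the object
 * representation byte by byte, whereas *(uint32_t *)&f would be
 * undefined behavior. Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` yields 0x3F800000.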

> In any case, if we go this way, because _address and _symbol are now
> specified identically, I feel it argues more for my earlier
> proposal that folds _address into _symbol
> 
>   _symbol "symbol" [define]: ptrTy, cbTy; 

I think you will want either the address or the getter and setter, but
very rarely both. Trying to make _symbol "x" match both _symbol * and
_address seems to me like forcing a symmetry that isn't there.

OTOH, I suppose you might pass a function pointer to a C function and
still want to keep the getter/setter handy. However, by forcing the
pointer into the tuple in order to get a uniform ptrTy, cbTy, you break
the symmetry of _symbol and _symbol *, which right now both have 2
elements. Your way, one will have 3 and the other 2.

> this has the nice benefit that from the SML programmers, there is no
> disconcerting unused type.

SML programmers should be disconcerted if they start using void* pointers.

-- 
Wesley W. Terpstra <wesley@terpstra.ca>