MLton.size bug

Stephen Weeks MLton@sourcelight.com
Thu, 26 Apr 2001 17:35:16 -0700 (PDT)


> So, I looked into this a little bit.  The remove-useless pass is also
> causing complications with MLton_size.

I'm not sure I agree that your example demonstrates this (although I did find an 
unrelated bug in remove-useless).

> val printInt = _ffi "printf" : string * int -> int;
> val printString = _ffi "printf" : string * string -> int;
> 
> datatype t = T of int list
> 
> val l = List.tabulate(100, fn i => i)
> val s_l = MLton.size l
> val _ = printInt ("%i\n\000", s_l)
> val x = T l
> val s_x = MLton.size x
> val _ = printInt ("%i\n\000", s_x)
> val total = List.foldl (op +) 0 (case x of T l => l)
> val _ = printInt ("%i\n\000",total)
> 
> I modified useless.fun to do a deepMakeUseful on all the arguments of a
> MLton_size primitive, but the program still yields:
> 
> 1200
> 0
> 4950

I think that this is acceptable behavior.  Although T is used as a destructor in
the source program, that use is simplified away by an optimization pass.  Hence,
T is never used in a pattern match anywhere in the program that remove-unused
sees.  Hence remove-unused concludes that T is useless.  This is OK with me.  I
don't want a call to MLton.size to affect the results of analyses, and hence
change the size of the object it is called on.

> What I discovered was that remove-unused determines that the T
> constructor is useless, and replaces it with a dummy constructor:
...
> 
> Simplify types goes on to change the t_11 type to unit.

Right.  This is all OK.

> Does this have to do with the fact that MLton_size is one of the few
> polymorphic primitives?  

remove-unused doesn't treat MLton_size specially because it doesn't need to.  I
don't see that polymorphism is relevant.

> I notice that remove-unused.fun specially treats
> MLton_eq and MLton_equals -- presumably because a datatype whose elements
> are only every compared for equality cannot be eliminated by collapsing
> the entire datatype into a single dummy constructor.

Right.  remove-unused treats MLton_{eq,equal} specially because polymorphic
equality implicitly contains uses of constructors in patterns.  The idea is that 
the decon function in remove-unused marks the constructors that would be used
by poly-equal.  In writing this and reading the code, I now see that decon is
buggy, because it doesn't recur on the argument types of constructors.  For
example, try feeding the following program to MLton.

  datatype t = A | B
  datatype u = C of t
  val _ = print (concat [Bool.toString (C A = C B), "\n"])

Sadly, MLton will print "true".  This is because the decon function doesn't mark
A and B as being deconstructed.  It should be a simple fix to add the
constructor argument type to the conInfo field and have decon recur on this
type.  It's worth reading through poly-equal and making sure all the other cases
are caught as well.

> Anyways, I tried mimicing the MLton_equals case, and that results in the
> expected output for the above program, still using deepMakeUseful. 

That makes sense, but I don't like that solution. (btw, it's "mimicking")

> On the
> other hand, if I don't use deepMakeUseful, then the second item is still
> 0, because useless changes the type of t_11 to t_11 = T_10 and sets
> val x_1292: t_11 = T_10 ()
> val x_1293: t_11 ref = Ref_ref(t_11) (x_1292)
> val x_1294: int = MLton_size(t_11) (x_1293)

Right.  useless does this for the same reason that remove-unused does -- T is
never used as a destructor, hence there's no need use a constructor argument.

> Not being too familiar with the useless pass,
...
Here's a brief note that might help.

The idea is that you can coerce something that's more useful into something
that's less useful (e.g. a tuple with three useful components can be converted
to a tuple with only two of those components as useful).  Because refs (and
arrays) are mutable, you can't coerce underneath them, so when a ref is coerced
to another, they must be exactly the same (line 147 of useless.fun).

...
> this is my take: MLton_size
> shouldn't mark it's arguments as useful, but it seems to me that when we
> rewrite the program, we could change the type of the MLton_size call to
> the useful-type of it's argument and extract the useful components of the
> argument. 

This sounds vaguely reasonable.  It is complicated by the fact that MLton.size
actually wraps its argument in a ref before passing it to MLton_size.  So you
have to extract and then rewrap with the ref.

One thing we have to do is to figure why the hash table size is zero in the
compiler -v3 output.  I'm not sure whether it's for the same reason as your
example above or some other funny interaction with the useless analysis.  If the 
former, then we may need to rethink size altogether.

> Is this similar to what is done with arrays when some of their
> components aren't useful?

I don't see how.  The array tycon take a single arg.  If the values stored in
the array are determined to be useless, then that arg is turned into unit (lines
316-318 of useless.fun).