[MLton] MLton.hash deeply flawed

Matthew Fluet matthew.fluet at gmail.com
Wed Jan 6 07:10:57 PST 2010


On Sun, Nov 1, 2009 at 7:06 PM, Wesley W. Terpstra <wesley at terpstra.ca> wrote:
> Strings that differ in only a few places don't get unique hash values
> from MLton.hash. In a program where I tried to use MLton.hash I had 38
> collisions out of 8325 distinct input strings. Not good.
>
> val x = "klahjflaskjflaksjfgklajsglkasjglaksjglaksjglaksgjaklsgaslkgjaslgkjas"
> val y = "klahjflbskjflaksjfgklajsglkasjglaksjglaksjglaksgjaklsgaslkgjaslgkjaS"
>
> val () = print (Word32.toString (MLton.hash x) ^ "\n")
> val () = print (Word32.toString (MLton.hash y) ^ "\n")

A late reply, but the rationale was that the application that prompted
the introduction of MLton.hash required a constant time hash function.
 So, MLton.hash only looks at structures to a fixed depth (default 16)
and samples vectors.  A complete, linear time, hash would be useful
too.



More information about the MLton mailing list