wordfreq slowness explained

Stephen Weeks MLton@sourcelight.com
Tue, 12 Jun 2001 23:08:02 -0700


> Silly  bug: the reason that wordfreq is SO slow is that the hash that you use
> is completely broken.  It produces exactly 26  different  hash  values.   The
> reason for that is that in the `loop' function inside it, you do not loop.

Whoops.  That got translated from a String.fold when extricating the code from
my basis into the insane standard basis that doesn't have a String.fold.

> I  fixed  this,  and  the  CPU time went down from .76 CPU seconds to .13 CPU
> seconds.  Assuming that this ratio  holds  on  Doug's  machine,  we  will  be
> exactly the same speed as Perl, but still worse than half the speed of OCaml.

With the MLton Doug has (20000906-2), I see slighlt less than 50% running time
after the fix.  With the latest MLton, I see jut under 30%.

> I'm going to hack a different implementation that is  less  strange  and  see
> what can be done.

Please do.  I'm sure there's a lot of room there for improvement.  I'm going to
try probe hashing and double hashing.