word freq

Stephen Weeks MLton@sourcelight.com
Wed, 13 Jun 2001 14:31:04 -0700


> I  tried  an  alternative implementation of the word frequency thing: no hash
> table.  The idea is that one pass of the merge  sort  can  merge  duplicates,
> adjusting  the counts.  Now how do you do an incremental merge sort: you keep
> a list (of sorted lists) where successive elements are twice as long.   I.e.,
> the  first  element is empty or of length 1, the second is empty or of length
> 2, etc.  Now each time you get a new word, you look at the first slot.  If it
> is  empty,  stick  it in.  If it isn't, merge it, stick the empty list in and
> proceed to the next element.

Interesting.  There is a similar implementation of merge sort in MLton's library
(src/lib/mlton/basic/merge-sort.sml). 
 
> It won't cost you more than n log n time, and there isn't  more  than  log  n
> duplication.   Sadly,  it  isn't horrible, but it is definitely several times
> slower.

Too bad.  Oh well, we are the fourth fastest now on wordfreq anyways.  But
there's probably still room for improvement.