word freq

Henry Cejtin henry@sourcelight.com
Wed, 13 Jun 2001 03:12:12 -0500


I  tried  an  alternative implementation of the word frequency thing: no hash
table.  The idea is that one pass of the merge  sort  can  merge  duplicates,
adjusting  the counts.  Now how do you do an incremental merge sort: you keep
a list (of sorted lists) where successive elements are twice as long.   I.e.,
the  first  element is empty or of length 1, the second is empty or of length
2, etc.  Now each time you get a new word, you look at the first slot.  If it
is  empty,  stick  it in.  If it isn't, merge it, stick the empty list in and
proceed to the next element.

It won't cost you more than n log n time, and there isn't  more  than  log  n
duplication.   Sadly,  it  isn't horrible, but it is definitely several times
slower.