Subj : Re: Searching 40 Million short strings fast To : comp.programming From : Arthur J. O'Dwyer Date : Tue Sep 06 2005 03:40 pm On Tue, 6 Sep 2005, Willem wrote: > > Peter wrote: > ) The most important optimization I do is to look for duplicate suffixes. > ) Rather than store five links for "bring" and six for "string," both > ) can link to a unique "ring" sequence, which reduces the total number of > ) nodes from 390158 to only 52904. > > And what if you want to add "brings" ? You'd have to separate the links > again. Don't you need extra housekeeping to tell that one node is pointed > to from two locations ? You could do that, but in Peter's case I bet we'd just add a new node with the data "brings". Since nodes are never split up, we don't need any extra housekeeping on the nodes that are pointed to multiple times. I haven't been following this whole thread, but has anyone mentioned the keyword "suffix tree" yet? :) That seems to be basically what Peter is describing. As for the OP's problem, I agree with whoever said "hash table." -Arthur .