Subj : Re: Searching 40 Million short strings fast To : comp.programming From : Chris McDonald Date : Tue Sep 06 2005 08:32 am M.Barren@gmail.com writes: >Thanks for your replies, >I believe Tries are the best choice here since I just want to check >existance of a certain string and don't want to associate any data with >that string (as of a hashtable). >Peter, the format I came up after reading some pages about trie >structures , consumes 9 bytes/trie entry (4 bytes for address of child >node, 4 bytes for the address of the next sibling entry, 1 byte of >data). >1. Having strings with length 10 to 12, the data itself will take up >about: 11 x 40M = 440MB. >2. The Trie data will take up about: 9 x 40M =~ 360MB (taking into >account only the bottom childs in the structure) >Space is not a concern in my case, but can you let me know what format >your trie representation uses that is almost half the size of the >actual data ? Hello Michael, It sounds like you're working on a machine with 4-byte pointers, so I suspect that that 9-byte structure will probably require 12 bytes once allocated, and even more if you use a standard implementation of (I'm guessing C here) malloc(). If your language and compiler allow you to align pointers on any byte boundary, you may get away with just 9 bytes, if you undertake your own allocation and management. -- Chris. .