Subj : Searching 40 Million short strings fast To : comp.programming From : M.Barren Date : Mon Sep 05 2005 10:38 pm Hi, I am writing a program that queries a website and extracts certain information from result pages and writes them in a file. These informations are short strings of ascii characters (between 10 to 12 chars) and number of them will eventually be around 40 million. There is a little problem though. Since the searches don't neccersily return unique results, I might get certain results over and over again and if I add them to my file, I will have duplicates which due to the nature of search mechanism of that certain website, will double or even triple the amount of collected data. I want to be able to check if I already have a certain string in my file very quickly. Indexing strings happens to be one of the only (if not the only) way of doing this. But I don't want to use a whole database server for a temporary data collection stage of this program. SqLite is an interesting choice but I still rather use even a simpler method that I can myself implement that is light-wight, fast and specilized *just* to do what i want. If you have any ideas, I'll be more than happy to hear it. Michael .