Subj: Re: Storing 20 million randomly accessible documents in compressed form
To: comp.programming
From: Gerry Quinn
Date: Tue Sep 06 2005 12:34 pm

In article <1125946156.818588.253910@g44g2000cwa.googlegroups.com>, gene.ressler@gmail.com says...
> If you can use a standard database product (like Adaptive Server
> Anywhere) that allows compressed databases, this problem disappears.
>
> If you can't, then your approach is reasonable (as is Jongware's). Not
> sure that with 200,000 zip files you will get 1.7GB on a CD, though.
> You'll need a compression ratio of well over 2 to 1. A single zip file
> gets about 1.8 to 1 for average text. Have you done any tests to see
> what kinds of results zip is getting on your data?

That seems very bad for text. I don't doubt a custom compressor could quite easily be written to get well over 2:1, especially for text in just one or a few languages and a limited number of symbols. Compression ratios of 30 or so have been reported for English text, using schemes based on Huffman encoding.

- Gerry Quinn
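[Editor's note: the test Gene suggests above, i.e. checking what zip-style compression actually achieves on your own documents, can be sketched with Python's standard zlib module, which implements DEFLATE (the algorithm used inside zip files). The function name and the sample text below are illustrative placeholders, not anything from the original thread; real document data will compress differently than this repetitive sample.]

```python
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return original_size / compressed_size for DEFLATE at the given level."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)

if __name__ == "__main__":
    # Placeholder sample; substitute the bytes of a real document to
    # estimate whether the ~2:1 ratio needed to fit on the CD is reachable.
    sample = b"The quick brown fox jumps over the lazy dog. " * 200
    print(f"ratio for sample text: {compression_ratio(sample):.2f}:1")
```

Running this over a representative batch of the 200,000 documents would settle the question of whether plain zip clears the 2:1 threshold, or whether a custom text-specific scheme is needed.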