Subj : Re: Storing 20 million randomly accessible documents in compressed
To   : comp.programming
From : Joe Wright
Date : Wed Sep 07 2005 09:48 pm

Michael Wojcik wrote:
> In article <431e7f8c$0$17734$afc38c87@news.optusnet.com.au>,
> "DarkD" writes:
>
>> "Gene" wrote in message
>> news:1125946156.818588.253910@g44g2000cwa.googlegroups.com...
>>
>>> A single zip file gets about 1.8 to 1 for average text.
>>
>> 1.8 to 1? I think you are thinking of the ratio for random ASCII
>> display characters. Typical compressed books etc. have a huge ratio
>> of about 30:1.
>
> They do not, unless the source representation is extremely bloated.
>
> I just did a couple of tests with large, highly redundant ASCII
> documents (the Perl 5 change log, for example) and gzip -9 just to
> confirm, and didn't see anything better than about 5:1.
>
> If you believe otherwise, cite a source.

Hi Michael,

I have a folder of 392 program files (*.c) totaling 366,692 bytes.
Using my favorite zipper:

  pkzip x.zip *.c

I find x.zip to be 191,539 bytes. That's about 1.91:1 compression on
text files. Another test, of a large .dbf table, did better:

  08/22/2005  05:03     59,317,319  MBRS.DBF
  09/07/2005  20:21      6,141,204  MBRS.ZIP

...for a 9.66:1 ratio. The .dbf format is rich in space characters, and
the zip algorithms are very efficient at squeezing out oft-repeated
bytes. 10:1 is about the best compression I have seen, so somewhere
between 2:1 and 10:1 seems rational in my small world. I would need a
lot more information before I could believe a 30:1 ratio.
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
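For anyone who wants to reproduce this kind of measurement in code
rather than with pkzip, here is a minimal sketch using zlib's one-shot
compress(). The file name ratio.c and the build line are illustrative
only, and it assumes zlib is installed; since zip and gzip use the same
deflate algorithm underneath, the ratios should land in the same
ballpark as the figures above.

  /* ratio.c -- measure a deflate compression ratio for one file
   * using zlib's one-shot compress().
   * Build (assuming zlib is installed):  cc ratio.c -lz -o ratio
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <zlib.h>

  int main(int argc, char **argv)
  {
      FILE *fp;
      long n;
      uLongf dlen;
      unsigned char *src, *dst;

      if (argc != 2) {
          fprintf(stderr, "usage: %s file\n", argv[0]);
          return 1;
      }
      fp = fopen(argv[1], "rb");
      if (fp == NULL)
          return 1;

      /* Slurp the whole file into memory. */
      fseek(fp, 0L, SEEK_END);
      n = ftell(fp);
      rewind(fp);
      if (n <= 0)
          return 1;
      src = malloc((size_t)n);
      dlen = compressBound((uLong)n);   /* worst-case output size */
      dst = malloc(dlen);
      if (src == NULL || dst == NULL)
          return 1;
      if (fread(src, 1, (size_t)n, fp) != (size_t)n)
          return 1;
      fclose(fp);

      /* One-shot deflate at zlib's default compression level. */
      if (compress(dst, &dlen, src, (uLong)n) != Z_OK)
          return 1;

      printf("%ld -> %lu bytes, %.2f:1\n",
             n, (unsigned long)dlen, (double)n / (double)dlen);
      free(src);
      free(dst);
      return 0;
  }

Run it over a .c file and over a blank-padded .dbf and you should see
roughly the same 2:1 versus 10:1 spread reported above.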