Subj : Re: Storing 20 million randomly accessible documents in compressed form
To   : comp.programming
From : Alex Fraser
Date : Thu Sep 08 2005 11:22 am

"Joe Wright" wrote in message news:1YqdnZ1EH_f7F4LeRVn-2Q@comcast.com...
> Michael Wojcik wrote:
> > In article <431e7f8c$0$17734$afc38c87@news.optusnet.com.au>, "DarkD"
> > writes:
> >> "Gene" wrote in message
> >> news:1125946156.818588.253910@g44g2000cwa.googlegroups.com...
> >>
> >>> A single zip file gets about 1.8 to 1 for average text.
> >>
> >> 1.8 to 1? I think you are thinking of the ratio for random ASCII
> >> display characters. Typical compressed books etc. have a huge
> >> ratio of about 30:1.
> >
> > They do not, unless the source representation is extremely bloated.
> >
> > I just did a couple of tests with large, highly-redundant ASCII
> > documents (the Perl 5 change log, for example) and gzip -9 just to
> > confirm, and didn't see anything better than about 5:1.
> >
> > If you believe otherwise, cite a source.
>
> I have a 'folder' of 392 program files (*.c) comprising 366,692
> bytes. Using my favorite zipper..
>
> pkzip x.zip *.c
>
> I find x.zip to be 191,539 bytes. That's about 1.91:1 compression
> on text files.

Each file is independently compressed, and the average size is under
1KB. If you concatenated the files, you would probably (depending
partly on the compression settings) get significantly better
compression. 4:1 would not be unusual if there are relatively few
bytes in comments/string literals.
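If you want to try this on your own source tree, here is a quick
sketch (Python, purely for brevity; any language with a zlib binding
would do, and it assumes the files are in the current directory). It
compares deflate applied to each file individually against deflate
applied to the concatenation:

import glob
import zlib

# Assumes the *.c files sit in the current directory. zlib's deflate
# is the same algorithm pkzip and gzip use, though this ignores the
# per-member archive headers, which make the per-file case slightly
# worse still in a real zip.
data = [open(name, "rb").read() for name in sorted(glob.glob("*.c"))]

total = sum(len(d) for d in data)

# Each file compressed on its own, as zip does per archive member.
per_file = sum(len(zlib.compress(d, 9)) for d in data)

# All files compressed as one stream, so deflate's 32KB window can
# exploit redundancy across file boundaries.
solid = len(zlib.compress(b"".join(data), 9))

print("uncompressed: %d bytes" % total)
print("per-file:     %d bytes (%.2f:1)" % (per_file, float(total) / per_file))
print("concatenated: %d bytes (%.2f:1)" % (solid, float(total) / solid))

This is essentially the difference between a zip archive and a
"solid" archive like tar.gz: deflate only looks back 32KB, so with
~1KB files compressed individually it never gets the chance to reuse
matches from the other 391.

Alex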