Subj : Re: Storing 20 million randomly accessible documents in compressed
To   : comp.programming
From : Joe Wright
Date : Thu Sep 08 2005 10:22 pm

Alex Fraser wrote:
> "Joe Wright" wrote in message
> news:1YqdnZ1EH_f7F4LeRVn-2Q@comcast.com...
>
>>Michael Wojcik wrote:
>>
>>>In article <431e7f8c$0$17734$afc38c87@news.optusnet.com.au>, "DarkD"
>>> writes:
>>>
>>>>"Gene" wrote in message
>>>>news:1125946156.818588.253910@g44g2000cwa.googlegroups.com...
>>>>
>>>>>A single zip file gets about 1.8 to 1 for average text.
>>>>
>>>>1.8 to 1? I think you are thinking of the ratio for random ASCII
>>>>display characters. Typical compressed books etc. have a huge ratio
>>>>of about 30:1.
>>>
>>>They do not, unless the source representation is extremely bloated.
>>>
>>>I just did a couple of tests with large, highly redundant ASCII
>>>documents (the Perl 5 change log, for example) and gzip -9, just to
>>>confirm, and didn't see anything better than about 5:1.
>>>
>>>If you believe otherwise, cite a source.
>>
>>I have a 'folder' of 392 program files (*.c) comprising 366,692 bytes.
>>Using my favorite zipper..
>>
>>pkzip x.zip *.c
>>
>>I find x.zip to be 191,539 bytes. That's about 1.91:1 compression on
>>text files.
>
> Each file is independently compressed, and the average size is under
> 1KB. If you concatenated the files you would probably (depending
> partly on compression settings) get significantly better compression.
> 4:1 would not be unusual if there are relatively few bytes in
> comments/string literals.
>
> Alex

Why did you respond to my post? I contended that text compression
normally varies between about 2:1 and 10:1. I was curious about the
allusion to 30:1 compression of text. Anything?
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
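
For anyone who wants to reproduce the comparison Alex describes, here is
a minimal sketch of the per-file versus concatenated test. It uses
Python's zlib as a stand-in for pkzip's DEFLATE, and the src/*.c pattern
is only a placeholder for wherever your source files live:

import glob
import zlib

# Placeholder path: point this at any directory of C source files.
files = sorted(glob.glob("src/*.c"))

raw = [open(name, "rb").read() for name in files]
total_raw = sum(len(data) for data in raw)

# Compress each file independently, as pkzip does for archive members.
per_file = sum(len(zlib.compress(data, 9)) for data in raw)

# Compress the concatenation, so repeated idioms across files can share
# one compression history (similar in spirit to a "solid" archive).
concatenated = len(zlib.compress(b"".join(raw), 9))

print("raw bytes:          %d" % total_raw)
print("per-file ratio:     %.2f:1" % (total_raw / per_file))
print("concatenated ratio: %.2f:1" % (total_raw / concatenated))

On Joe's numbers (392 files averaging under 1KB each), every archive
member pays DEFLATE's startup cost separately and no back-reference can
reach across a file boundary. Concatenating first lets repeated C idioms
be matched against earlier files, though DEFLATE's 32KB sliding window
still caps how far back a match can reach, so the gain is real but
bounded.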