Subj : Re: Storing 20 million randomly accessible documents in compressed
To   : comp.programming
From : Joe Wright
Date : Thu Sep 08 2005 10:22 pm

Alex Fraser wrote:
> "Joe Wright" wrote in message
> news:1YqdnZ1EH_f7F4LeRVn-2Q@comcast.com...
>
>>Michael Wojcik wrote:
>>
>>>In article <431e7f8c$0$17734$afc38c87@news.optusnet.com.au>, "DarkD"
>>> writes:
>>>
>>>>"Gene" wrote in message
>>>>news:1125946156.818588.253910@g44g2000cwa.googlegroups.com...
>>>>
>>>>>A single zip file gets about 1.8 to 1 for average text.
>>>>
>>>>1.8 to 1? I think you are thinking of the ratio for random ASCII
>>>>display characters. Typical compressed books etc. have a huge ratio
>>>>of about 30:1.
>>>
>>>They do not, unless the source representation is extremely bloated.
>>>
>>>I just did a couple of tests with large, highly redundant ASCII
>>>documents (the Perl 5 change log, for example) and gzip -9, just to
>>>confirm, and didn't see anything better than about 5:1.
>>>
>>>If you believe otherwise, cite a source.
>>
>>I have a 'folder' of 392 program files (*.c) comprising 366,692 bytes.
>>Using my favorite zipper..
>>
>>pkzip x.zip *.c
>>
>>I find x.zip to be 191,539 bytes. That's about 1.91:1 compression on
>>text files.
>
> Each file is independently compressed, and the average size is under
> 1KB. If you concatenated the files you would probably (depending
> partly on compression settings) get significantly better compression.
> 4:1 would not be unusual if there are relatively few bytes in
> comments/string literals.
>
> Alex

Why did you respond to my post? I contended that text compression
normally varies between about 2:1 and 10:1. I was curious about the
allusion to 30:1 compression of text. Anything?
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
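
For anyone who wants to reproduce the comparison Alex describes, here is
a minimal sketch of the per-file versus concatenated test. It uses
Python's zlib as a stand-in for pkzip's DEFLATE, and the src/*.c pattern
is only a placeholder for wherever your source files live:

import glob
import zlib

# Placeholder path: point this at any directory of C source files.
files = sorted(glob.glob("src/*.c"))

raw = [open(name, "rb").read() for name in files]
total_raw = sum(len(data) for data in raw)

# Compress each file independently, as pkzip does for archive members.
per_file = sum(len(zlib.compress(data, 9)) for data in raw)

# Compress the concatenation, so repeated idioms across files can share
# one compression history (similar in spirit to a "solid" archive).
concatenated = len(zlib.compress(b"".join(raw), 9))

print("raw bytes:          %d" % total_raw)
print("per-file ratio:     %.2f:1" % (total_raw / per_file))
print("concatenated ratio: %.2f:1" % (total_raw / concatenated))

On Joe's numbers (392 files averaging under 1KB each), every archive
member pays DEFLATE's startup cost separately and no back-reference can
reach across a file boundary. Concatenating first lets repeated C idioms
be matched against earlier files, though DEFLATE's 32KB sliding window
still caps how far back a match can reach, so the gain is real but
bounded.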