[HN Gopher] How are zlib, gzip and zip related?
       ___________________________________________________________________
        
       How are zlib, gzip and zip related?
        
       Author : damagednoob
       Score  : 207 points
       Date   : 2023-11-27 13:39 UTC (9 hours ago)
        
 (HTM) web link (stackoverflow.com)
 (TXT) w3m dump (stackoverflow.com)
        
       | melagonster wrote:
       | For people who first read this: the sweet part is in the comments
       | :)
        
         | dcow wrote:
          | What's even sadder is that the SO community has since
          | destroyed SO as the home for this type of info. This post
          | would now be considered off-topic as it's "not a good
          | format for a Q&A site". You'd never see it happen today.
          | Truly sad.
        
           | barrkel wrote:
           | Thing is, it could only be that way in its early days, when
           | the vanguard of users came to it from word of mouth, from
           | following Joel Spolsky or Coding Horror or their joint
           | podcast. The audience is much bigger now and with the long
           | tail of people, the number willing to put effort into good
           | questions is too low, and on-topicness is a simple quality
           | bar which can improve the signal to noise ratio.
        
             | dcow wrote:
             | Except, I doubt anybody would argue that a lower signal to
             | noise ratio has improved the site. (Plus, has the actual
             | metric even improved and how is it measured?) And, did
             | anybody ever stop to ask whether S:N should even be the
             | champion metric in the first place, at a product level?
             | With a philosophy of "Google is our homepage", I honestly
             | don't understand why S:N even matters since search pretty
             | effectively cuts out noise. I guess it makes a mod's life
             | easier though. The site is less useful today than it's ever
             | been. The road to hell...
        
             | Dalewyn wrote:
             | Very broadly, I find the quality/value of a given thing is
             | inversely proportional to how many people are involved.
             | 
             | So with regards to the internet: The 90s and early 00s were
             | great, then the internet became mainstream and it all just
             | became Cable TV 2.0.
        
             | s_dev wrote:
              | They had a voting system. Having mods decide what was
              | and wasn't a 'good' question undermined the whole
              | point of that voting system. Mods should use their
              | powers to filter out hate/spam/trolling/egregiously
              | off-topic issues, not to determine
              | relevance/usefulness. As others have pointed out, SO
              | was a site with great answers but awful for asking
              | questions. This is why ChatGPT is eating SO for
              | breakfast.
              | 
              | Even a question super similar to one previously asked
              | has value in exactly that: it might be phrased
              | slightly better and be a closer match to what people
              | were Googling.
        
           | twic wrote:
           | A rephrasing of this might be on-topic on retrocomputing:
           | https://retrocomputing.stackexchange.com/q/3083/21450
           | 
           | But almost nobody reads that.
        
           | zxt_tzx wrote:
            | Relatedly, I have seen the graph showing SO traffic
            | dipping by ~30%, if I'm not wrong (and the
            | corresponding hot takes attributing that to the rise
            | of LLMs).
           | 
           | I know most people are pessimistic that LLMs will lead to SO
           | and the web in general to be overrun by hallucinated content
           | and AI-training-on-AI-ouroboros, but I wonder if it might
           | instead allow for curious people to query an endlessly
           | patient AI assistant about exactly this kind of information.
           | (A custom GPT perhaps?)
        
             | dcow wrote:
              | GPT info tools will fully replace SO in most dev
              | workflows, if they haven't already.
        
               | norenh wrote:
               | And what will GPT info tools learn from, once the public
               | curated sources are gone?
        
               | dylan604 wrote:
               | By then, AGI will be ready, right?
        
               | dcow wrote:
               | Probably the great swaths of documentation out there that
               | for most use cases people need not waste time sifting
               | through if a computer can do it faster...
        
               | hawski wrote:
               | Isn't it fun, that ChatGPT's success poisoned the well
               | for everyone else? :)
        
           | BeetleB wrote:
           | This is somewhat revisionist. They would mark stuff like this
           | as off topic even in the early days.
        
         | miyuru wrote:
          | His stackexchange profile is a gold mine itself.
         | 
         | https://stackexchange.com/users/1136690/mark-adler#top-answe...
        
       | ctur wrote:
       | What a great historical summary. Compression has moved on now but
       | having grown up marveling at PKZip and maximizing usable space on
       | very early computers, as well as compression in modems (v42bis
       | ftw!), this field has always seemed magical.
       | 
       | These days it generally is better to prefer Zstandard to
       | zlib/gzip for many reasons. And if you need seekable format,
       | consider squashfs as a reasonable choice. These stand on the
       | shoulders of the giants of zlib and zip but do indeed stand much
       | higher in the modern world.
        
         | michaelrpeskin wrote:
         | I had forgotten about modem compression. Back in the BBS days
         | when you had to upload files to get new files, you usually had
         | a ratio (20 bytes download for every byte you uploaded). I
         | would always use the PKZIP no compression option for the
         | archive to upload because Z-Modem would take care of
         | compression over the wire. So I didn't burn my daily time limit
         | by uploading a large file and I got more credit for my download
         | ratios.
         | 
         | I was a silly kid.
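
The "no compression" option mentioned above survives in the zip format as
the "store" method. A minimal sketch of the same idea using Python's
stdlib zipfile module (the filename is illustrative):

```python
import io
import zipfile

# Write one member with the "store" method (no compression), the
# modern spelling of PKZIP's no-compression option.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as z:
    z.writestr("upload.txt", "meow " * 1000)

# A stored member occupies exactly as many bytes as the original data,
# leaving all the squeezing to the transfer layer (Z-Modem, back then).
with zipfile.ZipFile(buf) as z:
    info = z.getinfo("upload.txt")
    assert info.compress_size == info.file_size
```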
        
           | EvanAnderson wrote:
           | That's really clever and likely would have gone unnoticed by
           | a lot of sysops!
        
         | lxgr wrote:
         | > These days it generally is better to prefer Zstandard to
         | zlib/gzip for many reasons.
         | 
         | I'd agree for new applications, but just like MP3, .gz files
         | (and by extension .tar.gz/.tgz) and zlib streams will probably
         | be around for a long time for compatibility reasons.
        
         | pvorb wrote:
         | I think zlib/gzip still has its place these days. It's still a
         | decent choice for most use cases. If you don't know what usage
         | patterns your program will see, zlib still might be a good
         | choice. Plus, it's supported virtually everywhere, which makes
         | it interesting for long-term storage. Often, using one of the
         | modern alternatives is not worth the hassle.
        
       | dustypotato wrote:
       | Found this hilarious:
       | 
       | > This post is packed with so much history and information that I
       | feel like some citations need be added
       | 
       | > I am the reference
       | 
       | (extracted a part of the conversation)
        
         | tyingq wrote:
         | Maybe a spoiler, but the "I" in "I am the reference" is Mark
         | Adler:
         | 
         | https://en.wikipedia.org/wiki/Mark_Adler
        
           | signaru wrote:
            | It's awesome how he is active on Stack Overflow for
            | almost anything DEFLATE-related. I once tried stuffing
            | deflate-compressed vector graphics into PDFs. Among
            | other things, it turns out an Adler-32 checksum is
            | necessary for compliance (some newer PDF viewers will
            | ignore its absence though).
        
         | whalesalad wrote:
         | Reminds me of when I was inadvertently arguing here on HN with
         | the inventor of the actor model about what actors are
        
           | demondemidi wrote:
           | That sounds like something I'd do too. If that makes you feel
           | better.
        
         | FartyMcFarter wrote:
         | "I'm the one who knocks".
        
           | matheusmoreira wrote:
           | "I am the hype."
        
         | gmgmgmgmgm wrote:
         | That's disallowed on Wikipedia. There, you must reference some
         | "source". That "source" doesn't need to be reliable or correct,
         | it just needs to be some random website that's not the actual
          | person. Primary sources are disallowed.
        
           | kibwen wrote:
           | And that's for good reason. Encyclopedias are supposed to be
           | tertiary sources, not primary sources. Having an explicit
           | cited reference makes it easier to judge the veracity of a
           | statement compared to digging through the page history to
           | figure out if a line was added by a person who happens to be
           | an expert.
        
             | msla wrote:
             | And then there's impostors, which people who denigrate
             | sourcing rules never seem to even think of.
        
             | JadeNB wrote:
             | > And that's for good reason. Encyclopedias are supposed to
             | be tertiary sources, not primary sources. Having an
             | explicit cited reference makes it easier to judge the
             | veracity of a statement compared to digging through the
             | page history to figure out if a line was added by a person
             | who happens to be an expert.
             | 
             | But why is a reference to "[1] Blog post by XXX" (or, even
             | worse, "[1] Blog post by YYY based on their tentative
             | understanding of XXX") a more authoritative source than
             | "[1] Added to Wikipedia personally by XXX"? Of course,
             | Wikipedia potentially has no proof that the editor was
             | actually XXX in the latter case; but they have even less
             | proof that a blog post purporting to be by XXX actually is.
        
               | kibwen wrote:
               | _> Wikipedia potentially has no proof that the editor was
               | actually XXX in the latter case; but they have even less
               | proof that a blog post purporting to be by XXX actually
               | is._
               | 
               | Wikipedia is not an authoritative identity layer, it
               | provides no proof of identity and is thus strictly weaker
               | than any other proof you can come up with. If you don't
               | trust any arbitrary website that Wikipedia cites, then
               | you have no more reason to trust any arbitrary Wikipedia
               | editor.
               | 
               | As for what tertiary sources are and why they prefer not
               | to cite primary sources in the first place, Wikipedia
                | goes over this in their own guidelines:
                | https://en.m.wikipedia.org/wiki/Wikipedia:No_original_resear...
        
           | bombela wrote:
            | I learned this when I tried correcting the Wikipedia
            | page on Docker. I literally wrote the first prototype.
            | But this wasn't enough of a source for Wikipedia. And
            | to this day the English page is still not truthful
            | (interestingly enough, the French version is closer to
            | the truth).
        
             | nerdponx wrote:
             | You could publish a little webpage called "An historical
             | note about the Docker prototype" under your own name, which
             | you could then cite on Wikipedia.
             | 
             | I think it makes perfect sense as a general and strict
             | policy for an encyclopedia. It would simply be too hard to
             | audit every case to check if it's someone like you, or a
             | crank.
        
               | dTal wrote:
               | I don't see how requiring someone to set up a little
               | webpage filters out cranks. If anything I might expect it
               | to favor them.
        
               | nerdponx wrote:
               | The idea is that it's a separate, distinct source, which
               | exists outside of and independently from the encyclopedia
               | itself, and can be archived, mirrored, etc. Its veracity
               | and usefulness can then be debated or discussed as
               | needed.
        
               | bombela wrote:
               | Maybe I should write the story as a comment on hacker
               | news, and link to it ;)
               | 
                | Joke aside, I should probably take you up on that
                | advice.
        
               | a1369209993 wrote:
               | Yes. Do this (make sure it's _not_ a top-level
               | submission) and cite the HN comment specifically. Stupid
               | rules deserve stupid compliance.
        
       | wiredfool wrote:
       | The real question is: how are zlib and libz related?
        
         | o11c wrote:
         | zlib is the name of the project. libz is an implementation-
         | detail name of the library on Unix-like systems.
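
One place the libz name surfaces directly is the platform loader (and the
`-lz` link flag). A quick check with Python's ctypes; the result varies by
platform and may be None where no loader cache is available:

```python
import ctypes.util

# Ask the platform loader for the zlib project's library by its
# Unix-style short name "z", i.e. libz.so / libz.dylib.
name = ctypes.util.find_library("z")
print(name)  # e.g. "libz.so.1" on many Linux systems
```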
        
           | pdw wrote:
           | Similar: Xlib and libX11.
        
       | cout wrote:
       | Interesting -- I did not realize that the zip format supports
       | lzma, bzip2, and zstd. What software supports those compression
       | methods? Can Windows Explorer read zip files produced with those
       | compression methods?
       | 
       | (I have been using 7zip for about 15 years to produce archive
       | files that have an index and can quickly extract a single file
       | and can use multiple cores for compression, but I would love to
       | have an alternative, if one exists).
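
On the software side, Python's stdlib zipfile can at least write and read
zip members compressed with bzip2 and LZMA (zstd-in-zip is not in the
stdlib as of Python 3.12); a minimal sketch:

```python
import io
import zipfile

# Write a zip member using the LZMA method (ZIP_BZIP2 works the same way).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_LZMA) as z:
    z.writestr("a.txt", "hello " * 100)

# Reading back: the member records its compression method per-entry.
with zipfile.ZipFile(buf) as z:
    info = z.getinfo("a.txt")
    assert info.compress_type == zipfile.ZIP_LZMA
    assert z.read("a.txt") == b"hello " * 100
```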
        
         | ForkMeOnTinder wrote:
         | 7zip has a dropdown called "Compression method" in the "Add to
         | Archive" dialog that lets you choose.
        
         | pixl97 wrote:
          | Until Windows 11, no; Windows' built-in zip support only
          | seems to deal with COMPRESS/DEFLATE zip files.
        
       | encom wrote:
       | (2013)
        
       | emmelaich wrote:
       | Fun fact: in a sense. gzip can have multiple files, but not in a
       | specially useful way ...                   $ echo meow >cat
       | $ echo woof > dog
       | $ gzip cat
       | $ gzip dog
       | $ cat cat.gz dog.gz >animals.gz
       | $ gunzip animals.gz
       | $ cat animals
       | meow
       | woof
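
The same trick can be reproduced with Python's gzip module, which, like
gunzip, treats back-to-back gzip members as one stream:

```python
import gzip

# Two independent gzip members, concatenated byte-for-byte...
blob = gzip.compress(b"meow\n") + gzip.compress(b"woof\n")

# ...decompress as a single combined stream, just like gunzip does.
assert gzip.decompress(blob) == b"meow\nwoof\n"
```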
        
         | koolba wrote:
         | > ... but not in a specially useful way ...
         | 
         | It can be very useful:
         | https://github.com/google/crfs#introducing-stargz
        
           | DigiDigiorno wrote:
           | It is specially useful, it is not especially/generally useful
           | lol
           | 
            | It could be a typo, though I think when we say
            | something "isn't specially/specifically/particularly
            | useful" we mean "of all the features, this particular
            | one is not that useful", not that the feature isn't
            | useful for specific things.
        
         | lxgr wrote:
         | Wow, that's surprising (at least to me)!
         | 
         | Is there a limit in the default gunzip implementation? I'm
         | aware of the concept of ZIP/tar bombs, but I wouldn't have
         | expected gunzip to ever produce more than one output file, at
         | least when invoked without options.
        
           | tedunangst wrote:
           | It only produces one output. It's just a stream of data.
        
             | lxgr wrote:
             | Ah, I somehow imagined a second `cat` in there. That makes
             | more sense, thank you!
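
The member boundary is still visible one level down: a single zlib
decompressor object stops at the end of the first gzip member, as this
sketch with Python's zlib module shows:

```python
import gzip
import zlib

blob = gzip.compress(b"meow\n") + gzip.compress(b"woof\n")

# wbits=31 selects gzip framing; one decompressobj handles exactly one
# member and leaves the remaining bytes untouched in unused_data.
d = zlib.decompressobj(wbits=31)
first = d.decompress(blob)
assert first == b"meow\n"
assert d.unused_data != b""  # the second gzip member, still compressed
```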
        
       | HexDecOctBin wrote:
        | Is there an archive format that supports appending diffs of
        | an existing file, so that multiple versions of the same
        | file are stored? PKZIP has a proprietary extension
        | (supposedly), but I couldn't find any open version of that.
        | 
        | (I was thinking of creating a version control system whose
        | .git directory equivalent is basically an archive file that
        | can easily be emailed, etc.)
        
       | raggi wrote:
       | The answer is good, but is missing a key section:
       | 
       | Salty form: They're all quite slow compared to modern
       | competitors.
        
         | levzettelin wrote:
         | What are some of those modern competitors?
        
           | scq wrote:
           | zstd is over 4x faster than zlib, while having a better
           | compression ratio.
           | 
           | http://facebook.github.io/zstd/
        
           | raggi wrote:
            | For zlib-compatible workloads, there are Cloudflare
            | patches, Chromium forks, Intel forks, and zlib-ng,
            | which are compatible but >50% faster. (I think the
            | Cloudflare patches eventually made it into upstream
            | zlib, but you may not see that in your distro for a
            | decade.)
            | 
            | lz4 and zstd have both been very popular since their
            | release. They're similar and by the same author, though
            | zstd has had more thorough testing and fuzzing, and is
            | more featureful; lz4 maintains an extremely fast
            | decompression speed.
           | 
           | Snappy also performs very well, with zstd and snappy having
           | very close performance with tuning to achieve comparable
           | compression levels.
           | 
           | In recent years Zstd has started to make heavy inroads in
           | broader usage in OSS with a number of distro package managers
           | moving to it and observing substantial benefits. There are
           | HTTP extensions to make it available which Chrome originally
           | resisted but I believe it's now finally coming there too
           | (https://chromestatus.com/feature/6186023867908096).
           | 
           | In gaming circles there's also Oodle and friends from RAD
           | tools which are now available in Unreal engine as builtin
           | compression offerings (since 4.27+). You could see the
           | effects of this in for example Ark Survival Evolved (250GB)
           | -> Ark Survival Ascended (75GB, with richer models &
           | textures), and associated improved load times.
        
       | exposition wrote:
       | There's also pzip/punzip (https://github.com/ybirader) for those
       | wanting more performant (concurrent) zip/unzip.
       | 
       | Disclaimer: I'm the author.
        
       ___________________________________________________________________
       (page generated 2023-11-27 23:00 UTC)