[HN Gopher] How are zlib, gzip and zip related?
___________________________________________________________________
How are zlib, gzip and zip related?
Author : damagednoob
Score : 207 points
Date : 2023-11-27 13:39 UTC (9 hours ago)
(HTM) web link (stackoverflow.com)
(TXT) w3m dump (stackoverflow.com)
| melagonster wrote:
| For people who first read this: the sweet part is in the comments
| :)
| dcow wrote:
| What's even more sad is that the SO community has since
| destroyed SO as the home for this type of info. This post
| would now be considered off topic as it's "not a good format
| for a Q&A site". You'd never see it happen today. Truly sad.
| barrkel wrote:
| Thing is, it could only be that way in its early days, when
| the vanguard of users came to it from word of mouth, from
| following Joel Spolsky or Coding Horror or their joint
| podcast. The audience is much bigger now and with the long
| tail of people, the number willing to put effort into good
| questions is too low, and on-topicness is a simple quality
| bar which can improve the signal to noise ratio.
| dcow wrote:
| Except, I doubt anybody would argue that a lower signal to
| noise ratio has improved the site. (Plus, has the actual
| metric even improved and how is it measured?) And, did
| anybody ever stop to ask whether S:N should even be the
| champion metric in the first place, at a product level?
| With a philosophy of "Google is our homepage", I honestly
| don't understand why S:N even matters since search pretty
| effectively cuts out noise. I guess it makes a mod's life
| easier though. The site is less useful today than it's ever
| been. The road to hell...
| Dalewyn wrote:
| Very broadly, I find the quality/value of a given thing is
| inversely proportional to how many people are involved.
|
| So with regards to the internet: The 90s and early 00s were
| great, then the internet became mainstream and it all just
| became Cable TV 2.0.
| s_dev wrote:
| They had a voting system. By having mods decide what was
| and what wasn't a 'good' question undermined the whole
| point of the voting system. Mods should use their powers to
| filter out hate/spam/trolling/egregiously off topic issues
| not determine relevance/usefulness. As others have pointed
| out SO was a site with great answers but awful for asking
| questions. This is why ChatGPT is eating SO for breakfast.
|
| Even if a question was super similar to one that was
| previously asked has value in exactly that it might be
| phrased slightly better and be a closer match to what
| people were Googling.
| twic wrote:
| A rephrasing of this might be on-topic on retrocomputing:
| https://retrocomputing.stackexchange.com/q/3083/21450
|
| But almost nobody reads that.
| zxt_tzx wrote:
| Relatedly, I have seen the graph showing SO traffic dipping by
| ~30% if I'm not wrong (and the corresponding hot takes that
| attribute that to the rise of LLMs).
|
| I know most people are pessimistic that LLMs will lead to SO,
| and the web in general, being overrun by hallucinated content
| and an AI-training-on-AI ouroboros, but I wonder if it might
| instead allow curious people to query an endlessly patient AI
| assistant about exactly this kind of information. (A custom
| GPT perhaps?)
| dcow wrote:
| GPT info tools will fully replace SO in most dev workflows, if
| they haven't already.
| norenh wrote:
| And what will GPT info tools learn from, once the public
| curated sources are gone?
| dylan604 wrote:
| By then, AGI will be ready, right?
| dcow wrote:
| Probably the great swaths of documentation out there that
| for most use cases people need not waste time sifting
| through if a computer can do it faster...
| hawski wrote:
| Isn't it fun that ChatGPT's success poisoned the well
| for everyone else? :)
| BeetleB wrote:
| This is somewhat revisionist. They would mark stuff like this
| as off topic even in the early days.
| miyuru wrote:
| His stackexchange profile is a gold mine itself.
|
| https://stackexchange.com/users/1136690/mark-adler#top-answe...
| ctur wrote:
| What a great historical summary. Compression has moved on now,
| but having grown up marveling at PKZip, squeezing usable space
| out of very early computers, and enjoying compression in modems
| (v42bis ftw!), this field has always seemed magical to me.
|
| These days it is generally better to prefer Zstandard to
| zlib/gzip for many reasons. And if you need a seekable format,
| consider squashfs as a reasonable choice. These stand on the
| shoulders of the giants of zlib and zip, but do indeed stand
| much higher in the modern world.
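|
| A minimal sketch of the switch, assuming GNU tar with zstd
| installed (tar's -a picks the compressor from the suffix):
| $ tar -caf backup.tar.gz  somedir/
| $ tar -caf backup.tar.zst somedir/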
| michaelrpeskin wrote:
| I had forgotten about modem compression. Back in the BBS days
| when you had to upload files to get new files, you usually had
| a ratio (20 bytes download for every byte you uploaded). I
| would always use the PKZIP no compression option for the
| archive to upload because Z-Modem would take care of
| compression over the wire. So I didn't burn my daily time limit
| by uploading a large file and I got more credit for my download
| ratios.
|
| I was a silly kid.
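|
| A rough modern equivalent of that trick, using Info-ZIP's
| store-only mode (file names made up):
| $ zip -0 upload.zip file1 file2   # -0 = store, no compression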
| EvanAnderson wrote:
| That's really clever and likely would have gone unnoticed by
| a lot of sysops!
| lxgr wrote:
| > These days it generally is better to prefer Zstandard to
| zlib/gzip for many reasons.
|
| I'd agree for new applications, but just like MP3, .gz files
| (and by extension .tar.gz/.tgz) and zlib streams will probably
| be around for a long time for compatibility reasons.
| pvorb wrote:
| I think zlib/gzip still has its place these days. It's still a
| decent choice for most use cases. If you don't know what usage
| patterns your program will see, zlib still might be a good
| choice. Plus, it's supported virtually everywhere, which makes
| it interesting for long-term storage. Often, using one of the
| modern alternatives is not worth the hassle.
| dustypotato wrote:
| Found this hilarious:
|
| > This post is packed with so much history and information that I
| feel like some citations need be added
|
| > I am the reference
|
| (extracted a part of the conversation)
| tyingq wrote:
| Maybe a spoiler, but the "I" in "I am the reference" is Mark
| Adler:
|
| https://en.wikipedia.org/wiki/Mark_Adler
| signaru wrote:
| It's awesome how he is active on stack overflow for almost
| anything DEFLATE related. I once tried stuffing deflate
| compressed vector graphics into PDFs. Among other things, it
| turns out an Adler-32 checksum is necessary for compliance
| (some newer PDF viewers will ignore its absence though).
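|
| A quick way to see that trailer, assuming pigz is installed
| (its -z switch emits a zlib stream rather than gzip):
| $ printf 'hello' | pigz -z | tail -c 4 | xxd
| # the last 4 bytes of a zlib stream are the Adler-32 of the
| # input (0x062c0215 for "hello"); gzip uses CRC-32 instead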
| whalesalad wrote:
| Reminds me of when I was inadvertently arguing here on HN with
| the inventor of the actor model about what actors are.
| demondemidi wrote:
| That sounds like something I'd do too. If that makes you feel
| better.
| FartyMcFarter wrote:
| "I'm the one who knocks".
| matheusmoreira wrote:
| "I am the hype."
| gmgmgmgmgm wrote:
| That's disallowed on Wikipedia. There, you must reference some
| "source". That "source" doesn't need to be reliable or correct;
| it just needs to be some random website that's not the actual
| person. Primary sources are disallowed.
| kibwen wrote:
| And that's for good reason. Encyclopedias are supposed to be
| tertiary sources, not primary sources. Having an explicit
| cited reference makes it easier to judge the veracity of a
| statement compared to digging through the page history to
| figure out if a line was added by a person who happens to be
| an expert.
| msla wrote:
| And then there are impostors, whom people who denigrate
| sourcing rules never seem to even think of.
| JadeNB wrote:
| > And that's for good reason. Encyclopedias are supposed to
| be tertiary sources, not primary sources. Having an
| explicit cited reference makes it easier to judge the
| veracity of a statement compared to digging through the
| page history to figure out if a line was added by a person
| who happens to be an expert.
|
| But why is a reference to "[1] Blog post by XXX" (or, even
| worse, "[1] Blog post by YYY based on their tentative
| understanding of XXX") a more authoritative source than
| "[1] Added to Wikipedia personally by XXX"? Of course,
| Wikipedia potentially has no proof that the editor was
| actually XXX in the latter case; but they have even less
| proof that a blog post purporting to be by XXX actually is.
| kibwen wrote:
| _> Wikipedia potentially has no proof that the editor was
| actually XXX in the latter case; but they have even less
| proof that a blog post purporting to be by XXX actually
| is._
|
| Wikipedia is not an authoritative identity layer, it
| provides no proof of identity and is thus strictly weaker
| than any other proof you can come up with. If you don't
| trust any arbitrary website that Wikipedia cites, then
| you have no more reason to trust any arbitrary Wikipedia
| editor.
|
| As for what tertiary sources are and why they prefer not
| to cite primary sources in the first place, Wikipedia
| goes over this in their own guidelines:
| https://en.m.wikipedia.org/wiki/Wikipedia:No_original_resear...
| bombela wrote:
| I learned this when I tried correcting the Wikipedia page on
| Docker. I literally wrote the first prototype. But this wasn't
| an acceptable source for Wikipedia. And to this day the English
| page is still not truthful (interestingly enough, the French
| version is closer to the truth).
| nerdponx wrote:
| You could publish a little webpage called "An historical
| note about the Docker prototype" under your own name, which
| you could then cite on Wikipedia.
|
| I think it makes perfect sense as a general and strict
| policy for an encyclopedia. It would simply be too hard to
| audit every case to check if it's someone like you, or a
| crank.
| dTal wrote:
| I don't see how requiring someone to set up a little
| webpage filters out cranks. If anything I might expect it
| to favor them.
| nerdponx wrote:
| The idea is that it's a separate, distinct source, which
| exists outside of and independently from the encyclopedia
| itself, and can be archived, mirrored, etc. Its veracity
| and usefulness can then be debated or discussed as
| needed.
| bombela wrote:
| Maybe I should write the story as a comment on hacker
| news, and link to it ;)
|
| Joke aside, I should probably take you up on that advice.
| a1369209993 wrote:
| Yes. Do this (make sure it's _not_ a top-level
| submission) and cite the HN comment specifically. Stupid
| rules deserve stupid compliance.
| wiredfool wrote:
| The real question is: how are zlib and libz related?
| o11c wrote:
| zlib is the name of the project. libz is an implementation-
| detail name of the library on Unix-like systems.
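|
| In practice: you include zlib.h but link against libz, e.g.
| (file name made up):
| $ cc deflate_demo.c -lz   # -lz resolves to libz.so / libz.a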
| pdw wrote:
| Similar: Xlib and libX11.
| cout wrote:
| Interesting -- I did not realize that the zip format supports
| lzma, bzip2, and zstd. What software supports those compression
| methods? Can Windows Explorer read zip files produced with those
| compression methods?
|
| (I have been using 7zip for about 15 years to produce archive
| files that have an index and can quickly extract a single file
| and can use multiple cores for compression, but I would love to
| have an alternative, if one exists).
| ForkMeOnTinder wrote:
| 7zip has a dropdown called "Compression method" in the "Add to
| Archive" dialog that lets you choose.
| pixl97 wrote:
| Until Windows 11, no; Windows' built-in zip support only seems
| to deal with COMPRESS/DEFLATE zip files.
| encom wrote:
| (2013)
| emmelaich wrote:
| Fun fact: in a sense, gzip can have multiple files, but not in
| a specially useful way ...
| $ echo meow >cat
| $ echo woof > dog
| $ gzip cat
| $ gzip dog
| $ cat cat.gz dog.gz >animals.gz
| $ gunzip animals.gz
| $ cat animals
| meow
| woof
| koolba wrote:
| > ... but not in a specially useful way ...
|
| It can be very useful:
| https://github.com/google/crfs#introducing-stargz
| DigiDigiorno wrote:
| It is specially useful; it is not especially/generally useful
| lol
|
| It could be a typo, though I think when we say something
| "isn't specially/specifically/particularly useful" we mean
| "compared to the set of all features, this specific feature is
| not that useful", not that the feature isn't useful for
| specific things.
| lxgr wrote:
| Wow, that's surprising (at least to me)!
|
| Is there a limit in the default gunzip implementation? I'm
| aware of the concept of ZIP/tar bombs, but I wouldn't have
| expected gunzip to ever produce more than one output file, at
| least when invoked without options.
| tedunangst wrote:
| It only produces one output. It's just a stream of data.
| lxgr wrote:
| Ah, I somehow imagined a second `cat` in there. That makes
| more sense, thank you!
| HexDecOctBin wrote:
| Is there an archive format that supports appending diffs of an
| existing file, so that multiple versions of the same file are
| stored? PKZIP has a proprietary extension (supposedly), but I
| couldn't find any open version of that.
|
| (I was thinking of creating a version control system whose .git
| directory equivalent is basically an archive file that can easily
| be emailed, etc.)
| raggi wrote:
| The answer is good, but is missing a key section:
|
| Salty form: They're all quite slow compared to modern
| competitors.
| levzettelin wrote:
| What are some of those modern competitors?
| scq wrote:
| zstd is over 4x faster than zlib, while having a better
| compression ratio.
|
| http://facebook.github.io/zstd/
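|
| Easy to sanity-check on your own data with zstd's built-in
| benchmark mode, e.g.:
| $ zstd -b1 -e19 somefile   # benchmark levels 1..19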
| raggi wrote:
| For zlib-compatible workloads, there are Cloudflare patches,
| Chromium forks, Intel forks, and zlib-ng, which are compatible
| but >50% faster. (I think the Cloudflare patches eventually
| made it into upstream zlib, but you may not see that in your
| distro for a decade.)
|
| lz4 and zstd have both been very popular since their release;
| they're similar and by the same author, though zstd has had
| more thorough testing and fuzzing, and is more featureful.
| lz4 maintains an extremely fast decompression speed.
|
| Snappy also performs very well; when tuned to comparable
| compression levels, zstd and Snappy have very close
| performance.
|
| In recent years Zstd has started to make heavy inroads in
| broader usage in OSS with a number of distro package managers
| moving to it and observing substantial benefits. There are
| HTTP extensions to make it available which Chrome originally
| resisted but I believe it's now finally coming there too
| (https://chromestatus.com/feature/6186023867908096).
|
| In gaming circles there's also Oodle and friends from RAD
| tools which are now available in Unreal engine as builtin
| compression offerings (since 4.27+). You could see the
| effects of this in for example Ark Survival Evolved (250GB)
| -> Ark Survival Ascended (75GB, with richer models &
| textures), and associated improved load times.
| exposition wrote:
| There's also pzip/punzip (https://github.com/ybirader) for those
| wanting more performant (concurrent) zip/unzip.
|
| Disclaimer: I'm the author.
___________________________________________________________________
(page generated 2023-11-27 23:00 UTC)