[HN Gopher] Preventing ZIP parser confusion attacks on Python pa...
___________________________________________________________________
Preventing ZIP parser confusion attacks on Python package
installers
Author : miketheman
Score : 41 points
Date : 2025-08-07 16:16 UTC (6 hours ago)
(HTM) web link (blog.pypi.org)
(TXT) w3m dump (blog.pypi.org)
| jspiner wrote:
| Thank you for the interesting article.
| captn3m0 wrote:
| Now I am curious whether these ZIP confusion attacks are
| mitigated at other registries that use ZIPs. Are there any?
| calebbrown wrote:
| Apart from Python Wheels, the other popular ecosystems using
| zip files are Java jar files, and NuGet.
|
| Of these, Java is the most interesting as there are a few JDKs
| commonly in use.
|
| But I'm also interested in various security scanners that are
| built in other languages that can be fooled.
| zahlman wrote:
| Does NPM not use zip files?
|
| (Search results for `npm package format` are entirely not
| useful for figuring out what an NPM package actually consists
| of, beyond containing a `package.json` file. `pypi package
| format` results look wildly different; the first result I get
| is https://packaging.python.org/en/latest/discussions/package
| -f... which is quite comprehensive about the exact
| information I want -- disregarding for a moment the fact that
| I already know this stuff ;) The NPM search results, for me,
| start with a Geeks4Geeks tutorial on creating a package. Is
| there even anything analogous to the Python Packaging
| Authority -- misunderstood and not-actually-authoritative as
| it is -- for NPM?)
| zahlman wrote:
| > This has been done in response to the discovery that the
| popular installer uv has a different extraction behavior to many
| Python-based installers that use the ZIP parser implementation
| provided by the zipfile standard library module.
|
| > For maintainers of installer projects: Ensure that your ZIP
| implementation follows the ZIP standard and checks the Central
| Directory before proceeding with decompression. See the CPython
| zipfile module for a ZIP implementation that implements this
| logic. Begin checking the RECORD file against ZIP contents and
| erroring or warning the user that the wheel is incorrectly
| formatted.
|
| Good to know that I won't need to work around any issues with
| `zipfile` -- and it would be rather absurd for any Python-based
| installer to use anything else to do the decompression. (Checking
| RECORD for consistency is straightforward, although of course it
| takes time.)
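
The RECORD consistency check described above could look roughly like the sketch below. It assumes the wheel layout from the binary distribution format spec (a single `*.dist-info/RECORD` CSV with `path,hash,size` rows, hashes written as `algo=urlsafe-base64-without-padding`). The function name `check_record` and the wording of the problem messages are hypothetical, not taken from pip, uv, or PyPI.

```python
import base64
import csv
import hashlib
import io
import zipfile


def check_record(wheel_path):
    """Return a list of problems found; an empty list means RECORD matches."""
    problems = []
    with zipfile.ZipFile(wheel_path) as zf:
        # Names as listed in the Central Directory, skipping directory entries.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        if len(names) != len(set(names)):
            problems.append("duplicate entry names in archive")

        record_names = [
            n for n in names
            if n.count("/") == 1 and n.endswith(".dist-info/RECORD")
        ]
        if len(record_names) != 1:
            return problems + ["expected exactly one .dist-info/RECORD"]

        # Map each recorded path to its "algo=digest" string (may be empty).
        text = zf.read(record_names[0]).decode("utf-8")
        recorded = {}
        for row in csv.reader(io.StringIO(text)):
            if row:
                recorded[row[0]] = row[1] if len(row) > 1 else ""

        for name in sorted(set(names) - set(recorded)):
            problems.append(f"in archive but not in RECORD: {name}")
        for name in sorted(set(recorded) - set(names)):
            problems.append(f"in RECORD but not in archive: {name}")

        for name in sorted(set(names) & set(recorded)):
            spec = recorded[name]
            if not spec:
                continue  # RECORD itself is listed without a hash
            algo, _, expected = spec.partition("=")
            if algo not in hashlib.algorithms_guaranteed:
                problems.append(f"unsupported hash algorithm: {name}")
                continue
            digest = hashlib.new(algo, zf.read(name)).digest()
            actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
            if actual != expected:
                problems.append(f"hash mismatch: {name}")
    return problems
```

Hashing every member is what makes the check take time: each file has to be fully decompressed once before anything is installed.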
|
| ... but surely uv got its zip-decompression logic from a crate
| rather than hand-rolling it? How many other Rust projects out
| there might have questionable handling of zip files?
|
| > PyPI already implements ZIP and tarball compression-bomb
| detection as a part of upload processing.
|
| ... The implication is that `zipfile` doesn't handle this. But
| perhaps it can't really? Are there valid uses for zips that work
| that way? (Or maybe there isn't a clear rule for what counts as a
| "bomb", and PyPI has to choose a threshold value?)
| lexicality wrote:
| > but surely uv got its zip-decompression logic from a crate
| rather than hand-rolling it?
|
| well... https://github.com/astral-sh/rs-async-zip
| zahlman wrote:
| Interesting. (I have neither the familiarity with Rust, nor
| the willingness to spend time on it, to decide how much of
| this is the fault of the original vs the fork.)
| woodruffw wrote:
| > and it would be rather absurd for any Python-based installer
| to use anything else to do the decompression.
|
| You'd reasonably think, but it's difficult to assert this: a
| _lot_ of people use third-party tooling (uv, but also a lot of
| hand-rolled stuff), and Python packages aren't always
| processed in a straight-line-from-the-index manner.
|
| (I think a good reference example of this is security scanners:
| a scanner might fetch a wheel ZIP and analyze it, and use
| whatever ZIP implementation it pleases.)
|
| It's also worth noting that _one_ of the differentials here
| concerns the Central Directory, but the other one is more
| pernicious: the ZIP APPNOTE[1] isn't really clear about how
| implementations should key from the EOCDR back to the local file
| entries, and implementations have (reasonably, IMO) interpreted
| the language differently. Python's zipfile chooses to do it in
| one way that I think is justifiable, but it's a "true"
| differential in the sense that there's no golden answer.
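
To make the general idea of a parser differential concrete, here is a minimal sketch that compares two views of the same archive: the names found by walking local file headers from the start of the file (roughly what a streaming reader sees) and the names listed in the Central Directory (what Python's `zipfile` reports). It is an illustration of the general class of problem only, not a reproduction of the specific uv or EOCDR behavior discussed here. The helper names are hypothetical, and the walk deliberately gives up on entries that use data descriptors.

```python
import struct
import zipfile

# Fixed 30-byte local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def local_header_names(path):
    """Names seen when walking local file headers sequentially from offset 0."""
    with open(path, "rb") as f:
        data = f.read()
    names = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (sig, _ver, flags, _method, _mtime, _mdate,
         _crc, csize, _usize, nlen, elen) = LOCAL_HEADER.unpack_from(data, pos)
        if sig != b"PK\x03\x04":
            break  # reached the Central Directory or unrecognized data
        if flags & 0x08:
            # Sizes live in a trailing data descriptor; out of scope here.
            raise ValueError("entry uses a data descriptor")
        start = pos + LOCAL_HEADER.size
        names.append(data[start:start + nlen].decode("utf-8", "replace"))
        pos = start + nlen + elen + csize
    return names


def central_directory_names(path):
    """Names as reported by zipfile, which reads the Central Directory."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def find_differential(path):
    """Names visible to only one of the two parsing strategies."""
    local = set(local_header_names(path))
    central = set(central_directory_names(path))
    return sorted(local ^ central)
```

A non-empty result means two tools could plausibly disagree about the archive's contents, which is the situation a scanner and an installer should never be in.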
|
| > (Or maybe there isn't a clear rule for what counts as a
| "bomb", and PyPI has to choose a threshold value?)
|
| Yes, it's this. There are legitimate uses for high-ratio
| archives (e.g. compressed OS images), but Python package
| distributions are (generally) not one of them. PyPI has its own
| compression ratio that's intended to be a sweet spot between
| "that was compressed really well" and "someone is trying to
| ZIP-bomb the index."
|
| [1]:
| https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
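
A threshold-based check of the kind described above might be sketched as follows. The function name `exceeds_ratio` and the default limit of 50 are placeholders chosen for illustration; the thread does not state PyPI's actual value or method. The sizes used are the ones declared in the Central Directory, so a stricter check would also cap the number of bytes actually produced during decompression.

```python
import zipfile


def exceeds_ratio(path, max_ratio=50.0):
    """True if the declared uncompressed/compressed ratio is above max_ratio."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    # Sizes as declared in the Central Directory, summed over all entries.
    compressed = sum(info.compress_size for info in infos)
    uncompressed = sum(info.file_size for info in infos)
    if compressed == 0:
        # Nothing stored: only suspicious if it still claims to expand.
        return uncompressed > 0
    return uncompressed / compressed > max_ratio
```

Because legitimate archives such as OS images can compress extremely well, the limit is a policy choice for a given kind of content rather than a property of the ZIP format.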
| zahlman wrote:
| > You'd reasonably think, but it's difficult to assert this:
| a lot of people use third-party tooling (uv, but also a lot
| of hand-rolled stuff),
|
| I mean, for people (like myself) explicitly attempting to
| implement alternatives to pip. And to my understanding, pip
| itself does use `zipfile` as well.
|
| Are you proposing that there are people out there making
| package installers for personal use?
|
| > and Python packages aren't always processed in a straight-
| line-from-the-index manner.
|
| I don't know what you have in mind here.
| woodruffw wrote:
| > Are you proposing that there are people out there making
| package installers for personal use?
|
| I gave an example in the original comment: there's a _lot_
| of random ass tooling out there that treats Python wheels
| as a mostly opaque archive, and unpacks/repacks them in
| various ways. The original PEP behind wheels also
| (implicitly) expects this, since it refers to extraction
| with a "ZIP client" and not Python's zipfile specifically.
|
| I think security scanners are a simple example, but Linux
| distros, Homebrew, etc. all also process Python package
| distributions in ways that mostly just assume a ZIP
| container, without additionally trying to exactly match how
| Python's `zipfile` behaves.
|
| > I don't know what you have in mind here.
|
| The security scanner example from the original comment.
___________________________________________________________________
(page generated 2025-08-07 23:01 UTC)