[HN Gopher] Preventing ZIP parser confusion attacks on Python pa...
       ___________________________________________________________________
        
       Preventing ZIP parser confusion attacks on Python package
       installers
        
       Author : miketheman
       Score  : 41 points
       Date   : 2025-08-07 16:16 UTC (6 hours ago)
        
 (HTM) web link (blog.pypi.org)
 (TXT) w3m dump (blog.pypi.org)
        
       | jspiner wrote:
       | Thank you for the interesting article.
        
       | captn3m0 wrote:
       | Now I am curious whether these ZIP confusion attacks are
       | mitigated at other registries that use ZIPs. Are there any?
        
         | calebbrown wrote:
         | Apart from Python Wheels, the other popular ecosystems using
         | zip files are Java jar files, and NuGet.
         | 
         | Of these, Java is the most interesting, as there are a few
         | JDKs commonly in use.
         | 
         | But I'm also interested in various security scanners that are
         | built in other languages that can be fooled.
        
           | zahlman wrote:
           | Does NPM not use zip files?
           | 
           | (Search results for `npm package format` are entirely not
           | useful for figuring out what an NPM package actually consists
           | of, beyond containing a `package.json` file. `pypi package
           | format` results look wildly different; the first result I get
           | is https://packaging.python.org/en/latest/discussions/package
           | -f... which is quite comprehensive about the exact
           | information I want -- disregarding for a moment the fact that
           | I already know this stuff ;) The NPM search results, for me,
           | start with a Geeks4Geeks tutorial on creating a package. Is
           | there even anything analogous to the Python Packaging
           | Authority -- misunderstood and not-actually-authoritative as
           | it is -- for NPM?)
        
       | zahlman wrote:
       | > This has been done in response to the discovery that the
       | popular installer uv has a different extraction behavior to many
       | Python-based installers that use the ZIP parser implementation
       | provided by the zipfile standard library module.
       | 
       | > For maintainers of installer projects: Ensure that your ZIP
       | implementation follows the ZIP standard and checks the Central
       | Directory before proceeding with decompression. See the CPython
       | zipfile module for a ZIP implementation that implements this
       | logic. Begin checking the RECORD file against ZIP contents and
       | erroring or warning the user that the wheel is incorrectly
       | formatted.
       | 
       | Good to know that I won't need to work around any issues with
       | `zipfile` -- and it would be rather absurd for any Python-based
       | installer to use anything else to do the decompression. (Checking
       | RECORD for consistency is straightforward, although of course it
       | takes time.)
       | 
       | ... but surely uv got its zip-decompression logic from a crate
       | rather than hand-rolling it? How many other Rust projects out
       | there might have questionable handling of zip files?
       | 
       | > PyPI already implements ZIP and tarball compression-bomb
       | detection as a part of upload processing.
       | 
       | ... The implication is that `zipfile` doesn't handle this. But
       | perhaps it can't really? Are there valid uses for zips that work
       | that way? (Or maybe there isn't a clear rule for what counts as a
       | "bomb", and PyPI has to choose a threshold value?)
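
       The RECORD consistency check quoted above can be sketched with
       the standard library alone. This is a minimal illustration, not
       what pip, uv, or PyPI actually run: the function name
       `record_mismatches` is hypothetical, it compares file names only
       (it does not verify the hashes or sizes listed in RECORD), and it
       ignores the optional RECORD.jws / RECORD.p7s signature files.

```python
import csv
import io
import zipfile


def record_mismatches(wheel_file):
    """Compare the ZIP's entries with the paths listed in RECORD.

    Returns (in_zip_but_not_in_record, in_record_but_not_in_zip).
    Hypothetical helper: names only, no hash or size verification.
    """
    with zipfile.ZipFile(wheel_file) as zf:
        # zipfile reads names from the Central Directory; skip directory entries.
        zip_names = {n for n in zf.namelist() if not n.endswith("/")}
        record_paths = [n for n in zip_names if n.endswith(".dist-info/RECORD")]
        if len(record_paths) != 1:
            raise ValueError("expected exactly one .dist-info/RECORD file")
        text = zf.read(record_paths[0]).decode("utf-8")

    # RECORD is a CSV file of rows shaped like: path,hash,size
    recorded = {row[0] for row in csv.reader(io.StringIO(text)) if row}
    return sorted(zip_names - recorded), sorted(recorded - zip_names)
```

       An installer following the quoted advice would presumably warn
       or error when either returned list is non-empty.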
        
         | lexicality wrote:
         | > but surely uv got its zip-decompression logic from a crate
         | rather than hand-rolling it?
         | 
         | well... https://github.com/astral-sh/rs-async-zip
        
           | zahlman wrote:
           | Interesting. (I have neither the familiarity with Rust, nor
           | the willingness to spend time on it, to decide how much of
           | this is the fault of the original vs the fork.)
        
         | woodruffw wrote:
         | > and it would be rather absurd for any Python-based installer
         | to use anything else to do the decompression.
         | 
         | You'd reasonably think, but it's difficult to assert this: a
         | _lot_ of people use third-party tooling (uv, but also a lot of
         | hand-rolled stuff), and Python packages aren't always
         | processed in a straight-line-from-the-index manner.
         | 
         | (I think a good reference example of this is security scanners:
         | a scanner might fetch a wheel ZIP and analyze it, and use
         | whatever ZIP implementation it pleases.)
         | 
         | It's also worth noting that _one_ of the differentials here
         | concerns the Central Directory, but the other one is more
         | pernicious: the ZIP APPNOTE[1] isn't really clear about how
         | implementations should key from the EOCDR back to the local file
         | entries, and implementations have (reasonably, IMO) interpreted
         | the language differently. Python's zipfile chooses to do it in
         | one way that I think is justifiable, but it's a "true"
         | differential in the sense that there's no golden answer.
         | 
         | > (Or maybe there isn't a clear rule for what counts as a
         | "bomb", and PyPI has to choose a threshold value?)
         | 
         | Yes, it's this. There are legitimate uses for high-ratio
         | archives (e.g. compressed OS images), but Python package
         | distributions are (generally) not one of them. PyPI has its own
         | compression ratio that's intended to be a sweet spot between
         | "that was compressed really well" and "someone is trying to
         | ZIP-bomb the index."
         | 
         | [1]:
         | https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
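
         The threshold idea described above can be sketched as a plain
         ratio check. This is only an illustration: the thread does not
         state PyPI's actual limit, so `MAX_RATIO = 50` and the function
         name `looks_like_zip_bomb` are assumptions. The sizes used are
         the ones declared in the Central Directory, which a malicious
         archive can misstate, so a real check would also have to bound
         the bytes actually produced during decompression.

```python
import io
import zipfile

MAX_RATIO = 50  # hypothetical threshold, not PyPI's real value


def looks_like_zip_bomb(archive, max_ratio=MAX_RATIO):
    """Flag archives whose declared uncompressed/compressed ratio is too high."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    if compressed == 0:
        # No compressed bytes at all: suspicious only if data is still claimed.
        return uncompressed > 0
    return uncompressed / compressed > max_ratio
```

         With this sketch, a megabyte of zeros stored with DEFLATE is
         flagged, while a small uncompressed text file is not.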
        
           | zahlman wrote:
           | > You'd reasonably think, but it's difficult to assert this:
           | a lot of people use third-party tooling (uv, but also a lot
           | of hand-rolled stuff),
           | 
           | I mean, for people (like myself) explicitly attempting to
           | implement alternatives to pip. And to my understanding, pip
           | itself does use `zipfile` as well.
           | 
           | Are you proposing that there are people out there making
           | package installers for personal use?
           | 
           | > and Python packages aren't always processed in a straight-
           | line-from-the-index manner.
           | 
           | I don't know what you have in mind here.
        
             | woodruffw wrote:
             | > Are you proposing that there are people out there making
             | package installers for personal use?
             | 
             | I gave an example in the original comment: there's a _lot_
             | of random ass tooling out there that treats Python wheels
             | as a mostly opaque archive, and unpacks/repacks them in
             | various ways. The original PEP behind wheels also
             | (implicitly) expects this, since it refers to extraction
             | with a "ZIP client" and not Python's zipfile specifically.
             | 
             | I think security scanners are a simple example, but Linux
             | distros, Homebrew, etc. all also process Python package
             | distributions in ways that mostly just assume a ZIP
             | container, without additionally trying to exactly match how
             | Python's `zipfile` behaves.
             | 
             | > I don't know what you have in mind here.
             | 
             | The security scanner example from the original comment.
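
             A scanner that wants to notice this kind of disagreement
             could compare what Python's `zipfile` reports (names from
             the Central Directory) with a raw scan for local file
             header signatures. The sketch below is a rough heuristic
             under stated assumptions, not how any particular scanner
             or installer works: the helper names are hypothetical, and
             the byte scan can produce false positives if the signature
             happens to occur inside file data.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header signature from the ZIP APPNOTE
LOCAL_HEADER_SIZE = 30     # fixed-size part of a local file header


def local_header_names(data):
    """Naively scan raw bytes for local file headers and collect their names."""
    names = []
    pos = data.find(LOCAL_SIG)
    while pos != -1:
        if pos + LOCAL_HEADER_SIZE > len(data):
            break  # truncated header at the end of the data
        # The file name length is a little-endian uint16 at offset 26.
        (name_len,) = struct.unpack_from("<H", data, pos + 26)
        start = pos + LOCAL_HEADER_SIZE
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        pos = data.find(LOCAL_SIG, pos + 4)
    return names


def hidden_entries(data):
    """Names found via local headers that zipfile's Central Directory omits."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        central = set(zf.namelist())
    return sorted(set(local_header_names(data)) - central)
```

             For example, concatenating two small archives gives a file
             in which `zipfile` only sees the second archive's entries,
             while a parser that streams local headers from the start
             would also encounter the first archive's entries.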
        
       ___________________________________________________________________
       (page generated 2025-08-07 23:01 UTC)