[HN Gopher] Bundling binary tools in Python wheels
       ___________________________________________________________________
        
       Bundling binary tools in Python wheels
        
       Author : pcr910303
       Score  : 93 points
       Date   : 2022-06-17 12:00 UTC (11 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | physicsguy wrote:
       | The issues with this were always:
       | 
        | * size of wheels you can upload is constrained by PyPI
       | 
       | * difficult to support multiple versions across multiple
       | operating systems, unless you provide a source distribution,
       | which is then...
       | 
       | * Still a nightmare on Windows
        
         | jborean93 wrote:
          | > size of wheels you can upload is constrained by PyPI
          | 
          | I feel PyPI is pretty generous with their limits. You can even
          | request more once you hit the ceiling; I think it's around 60MB
          | [1]. There are some wheels that are crazy large: the
          | tensorflow-gpu wheels [2] are around 500MB each. I think there
          | are discussions out there to try and find ways of alleviating
          | this problem on PyPI.
         | 
         | > difficult to support multiple versions across multiple
         | operating systems, unless you provide a source distribution,
         | which is then...
         | 
          | This can be a problem, but I've found that it has improved
          | quite a lot recently. You can create manylinux wheels for x86,
          | x64, and arm64, which cover most of the Linux distributions
          | using glibc. A musllinux tag was recently added to cover musl-
          | based distributions like Alpine. macOS wheels support both x64
          | and arm64, and can even be a universal2 wheel. Windows is
          | still purely x86 or x64 for now, but I've seen some people
          | working on arm64 support in CPython, and once that's in I'm
          | sure PyPI won't be far behind. There are also some great tools
          | like cibuildwheel [3] that make building and testing these
          | wheels pretty simple.
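          | 
          | As a quick sanity check, you can list which wheel tags the
          | running interpreter will accept, in priority order (a minimal
          | sketch, assuming the third-party "packaging" library is
          | installed):
          | 
          |     # Print every wheel tag this interpreter can install,
          |     # best match first.
          |     from packaging.tags import sys_tags
          | 
          |     for tag in sys_tags():
          |         print(tag)  # e.g. cp310-cp310-manylinux_2_17_x86_64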
         | 
         | > Still a nightmare on Windows
         | 
          | I'm actually curious what is a nightmare about Windows. I found
          | that Windows is probably the easiest of all the platforms to
          | build and upload wheels for. You aren't limited to a tiny
          | subset of system libs like you are on Linux, and building them
          | is mostly the same process. Probably the hardest thing is
          | ensuring you have the correct VS build tools installed, but
          | that's not insurmountable.
         | 
         | [1] https://pypi.org/help/#file-size-limit
         | 
         | [2] https://pypi.org/project/tensorflow-gpu/#files
         | 
         | [3] https://github.com/pypa/cibuildwheel
        
       | zackees wrote:
       | I've done this for static-ffmpeg with the added bonus of
       | downloading ffmpeg binaries on first use.
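        | 
        | The pattern is roughly this (a minimal sketch with a placeholder
        | URL, not static-ffmpeg's actual code):
        | 
        |     # Fetch the binary once, cache it locally, and mark it
        |     # executable before first use.
        |     import stat
        |     import urllib.request
        |     from pathlib import Path
        | 
        |     FFMPEG_URL = "https://example.com/ffmpeg"  # placeholder
        | 
        |     def get_ffmpeg() -> Path:
        |         cache = Path.home() / ".cache" / "static_ffmpeg"
        |         exe = cache / "ffmpeg"
        |         if not exe.exists():
        |             cache.mkdir(parents=True, exist_ok=True)
        |             urllib.request.urlretrieve(FFMPEG_URL, str(exe))
        |             exe.chmod(exe.stat().st_mode | stat.S_IEXEC)
        |         return exe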
        
         | remram wrote:
         | That's a bit different than putting it in the wheel. It will
         | not work with tools for vendoring, caching, or mirroring
         | dependencies. It will also stop working if the separate place
         | you get the binaries from breaks or runs out of money or purges
         | that specific version from their site.
        
         | [deleted]
        
       | homarp wrote:
       | next step is to use https://github.com/jart/cosmopolitan to build
       | the binary
        
         | michelpp wrote:
          | Awesome, I'm gonna check this out for my bundle!
        
       | the__alchemist wrote:
        | I built a Python version and dependency manager; the program is
        | a standalone executable written in Rust. One installation method
        | is as a wheel, using this trick. Hosted on PyPI, users can
        | install it with pip etc.
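        | 
        | For anyone curious, the trick boils down to shipping the
        | compiled executable as package data and exposing a tiny Python
        | wrapper as the console script (an illustrative sketch, not my
        | actual project layout):
        | 
        |     # The console_scripts entry point replaces the current
        |     # process with the bundled binary.
        |     import os
        |     import sys
        |     from pathlib import Path
        | 
        |     def main() -> None:
        |         exe = Path(__file__).parent / "bin" / "mytool"
        |         os.execv(str(exe), [str(exe), *sys.argv[1:]])
        | 
        | The wheel then has to be built once per platform with the
        | matching platform tag so pip picks the right binary.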
        
       | mid-kid wrote:
       | This sort of thing sounds like something better left to a proper
       | package manager like Nix. Reproducibility and a complete top-to-
       | bottom dependency chain are key in making sure a package works
       | well into the future.
        
         | Spivak wrote:
          | Which is all nice in theory, except you have to meet people
          | where they are, and that's using pip and publishing on PyPI.
         | Unless you can auto-nixify an arbitrary Python package, and in
         | this case an arbitrary binary package, you're gonna end up
         | doing a lot of work to get that reproducibility.
         | 
         | I don't use pip because I like pip, I use it because that's
         | where the software I want is.
        
       | philipov wrote:
        | Suppose you're trying to distribute a piece of binary software
        | that comes in different feature builds, such as one that
        | includes only client applications while another includes both
        | client and server applications. If they both depend on the same
        | shared library that needs to be included with the distribution,
        | and come with the same Python API, so you can't separate the
        | features into isolated packages, what's the best practice for
        | packaging each bundle as a wheel?
        | 
        | Would you name one package `software_server`, another
        | `software_client`, but call the shared Python component
        | `software` in both of them?
       | 
       | Also, how do you manage versioning when the version of the binary
       | software is independent from the version of the python wrapper
       | around it?
        
         | maxnoe wrote:
          | This is not possible using wheels. Binary wheels cannot depend
          | on shared libraries outside the wheel, besides a small number
          | of system libraries.
          | 
          | If you want to distribute a Python package like this, you
          | could use the conda-forge ecosystem and have a package for the
          | base library and then other packages depending on it, e.g.
          | foo-core, foo-server, and foo-client.
        
           | philipov wrote:
           | You can't depend on shared libraries outside the wheel, but
           | you can include the library inside the wheel. The
           | applications all expect the shared library they depend on to
            | be in a path relative to the location of the application,
            | which
           | prevents putting the shared library in a separate package
           | anyway. So how would you handle this situation with wheels if
           | you don't have the option of forcing users into a specific
           | python distribution?
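            | 
            | Concretely, the package can load its bundled copy with a
            | path computed from its own location (a sketch with an
            | illustrative library name):
            | 
            |     # Load the shared library shipped inside the wheel,
            |     # next to this module, so no system install is needed.
            |     import ctypes
            |     from pathlib import Path
            | 
            |     _lib = ctypes.CDLL(
            |         str(Path(__file__).parent / "libshared.so"))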
        
             | maxnoe wrote:
              | Maybe I misunderstood. After rereading, I think so...
              | 
              | I thought you wanted to distribute a shared library in one
              | package and then have other packages depend on that.
              | 
              | So it's not clear to me what your actual situation is.
        
       | epistasis wrote:
       | About 10 years ago, before Docker, and before (I knew about?)
       | conda, I was using Python packages with binaries inside to create
       | reproducible virtual environments with consistently versioned
       | binary tools to go along with all our Python code. We were
       | avoiding virtualization and running on bare metal so this was the
       | best way to distribute code across a cluster.
       | 
       | I added small setup.py scripts to a dozen open source C and Java
       | packages, and deposited them on an internal package server.
       | 
       | It worked OK, and unlike Conda, to this day you can still run an
       | install without burning through CPU-hours of SAT-solving.
       | 
       | Biggest problem was if there were packages that wanted different
       | versions of the same dependency.
        
         | prpl wrote:
          | FWIW conda has mamba as an option now, which (mostly) fixed
          | the slow SAT solver.
        
           | epistasis wrote:
           | Mamba is completely unusable for me too.
        
         | divbzero wrote:
         | > _Biggest problem was if there were packages that wanted
         | different versions of the same dependency._
         | 
         | How often and in what contexts would you encounter this version
         | dependency issue? Was there a good solution or would you work
         | around the issue by finding compatible versions?
        
           | epistasis wrote:
           | This never presented a real problem in practice, but IIRC pip
           | just silently installs over the other version.
           | 
            | And since we had a closed system, it was entirely on us to
            | set it up correctly. It was just a discovery, a couple of
            | years in, that we hadn't properly thought it through ahead
            | of time.
        
       | woodruffw wrote:
       | In a similar vein: Python projects can use `auditwheel` to
       | automatically relocate (fixup RPATHs) and vendor their "system"
       | dependencies, such as a specific version of `zlib` or
       | `libffi`[1].
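        | 
        | A small post-build step might look like this (a sketch;
        | auditwheel's `repair` subcommand and `-w` flag are real, the
        | paths are illustrative):
        | 
        |     # Vendor external shared-library dependencies into each
        |     # freshly built wheel and retag it as manylinux.
        |     import glob
        |     import subprocess
        | 
        |     for whl in glob.glob("dist/*.whl"):
        |         subprocess.run(
        |             ["auditwheel", "repair", whl, "-w", "wheelhouse"],
        |             check=True,
        |         )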
       | 
       | [1]: https://github.com/pypa/auditwheel
        
       | samwillis wrote:
        | I have just used this trick to package Node.js as a Python
        | package [0], so you can just do:
        | 
        |     pip install nodejs-bin
        | 
        | to install it.
       | 
       | This works particularly well in combination with Python virtual
       | environments (or docker), it will allow you to have a specific
       | version of Node for a particular project without additional
       | tooling. Just add "nodejs-bin" to your requirements.txt.
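        | 
        | Once it's installed, the node command lives in the virtualenv's
        | bin directory like any other entry point, so you can drive it
        | from Python too (a sketch, assuming the venv is active):
        | 
        |     # The wheel puts "node" on PATH inside the virtualenv,
        |     # so subprocess finds it without extra configuration.
        |     import subprocess
        | 
        |     subprocess.run(["node", "--version"], check=True)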
       | 
        | You can also add it as an optional dependency on another Python
        | package. That's my plan with the full stack
        | Python/Django/Alpine.js framework I'm building (Tetra [1]); that
        | way users will be able to optionally set up the full project,
        | including Node.js, with:
        | 
        |     pip install tetraframework[full]
        | 
        | The framework uses esbuild, and will use PostCSS. It was either
        | package them with the framework (including Node.js), or create a
        | way to easily install Node.js for Python developers using the
        | tools they are most used to.
        | 
        | It's brand new; I have only packaged Node.js v16 so far. Once
        | I'm happy there are no show-stopping issues, the plan is to
        | release wheels for current and LTS Node.js versions as they
        | come out.
       | 
       | Developer UX is my number one priority and removing the need for
       | two toolchains is an important part of that.
       | 
       | [0]: https://pypi.org/project/nodejs-bin/
       | 
       | [1]: http://tetraframework.com
        
       | michelpp wrote:
       | This is also the basis for the postgresql-wheel package, a Python
       | wheel that contains an entire local PostgreSQL binary
       | installation:
       | 
       | https://github.com/michelp/postgresql-wheel
       | 
       | Out of sheer laziness I used CFFI to encapsulate the build, even
       | though there is no dynamic linking involved. There's probably a
       | simpler way but this approach has worked very well for disposable
       | database tests.
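        | 
        | The disposable-test shape is roughly this (a sketch; initdb and
        | pg_ctl are standard PostgreSQL commands, and I'm assuming the
        | bundled binaries are on PATH):
        | 
        |     # Spin up a throwaway cluster in a temp dir, run the
        |     # test, and tear everything down afterwards.
        |     import subprocess
        |     import tempfile
        | 
        |     def test_with_scratch_db():
        |         with tempfile.TemporaryDirectory() as datadir:
        |             subprocess.run(["initdb", "-D", datadir],
        |                            check=True)
        |             subprocess.run(["pg_ctl", "-D", datadir, "start"],
        |                            check=True)
        |             try:
        |                 ...  # run queries against the scratch cluster
        |             finally:
        |                 subprocess.run(
        |                     ["pg_ctl", "-D", datadir, "stop"],
        |                     check=True)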
        
         | randlet wrote:
         | Very cool! Might be useful in CI scenarios.
        
       | ssl232 wrote:
       | The Python wheel "ecosystem" is rather nice these days. You can
       | create platform wheels and bundle in shared objects and binaries
       | compiled for those platforms, automated by tools like auditwheel
       | (Linux) and delocate-wheel (OSX). These tools even rewrite the
       | wheel's manifest and update the shared object RPATH values and
       | mangle the names to ensure they don't get clobbered by stuff in
        | the system library path. The auditwheel tool converts a "linux"
        | wheel into a "manylinux" wheel, which the spec guarantees will
        | run on distributions with at least a certain minimum glibc.
       | 
        | Wheels are merely zips with a different extension, so in the
        | worst case, where these automation tools fail you, you can do it
        | yourself. You simply need to be careful to update the manifest
        | and make sure shared objects are loaded in the correct order.
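        | 
        | For instance, inspecting one by hand is just zipfile (a sketch
        | with a placeholder wheel name):
        | 
        |     # A wheel is a plain zip; RECORD is the manifest pip
        |     # uses to verify and uninstall the installed files.
        |     import zipfile
        | 
        |     with zipfile.ZipFile("example-1.0-py3-none-any.whl") as whl:
        |         for name in whl.namelist():
        |             print(name)
        |         record = next(n for n in whl.namelist()
        |                       if n.endswith("RECORD"))
        |         print(whl.read(record).decode())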
       | 
       | On the user side, a `pip install` should automatically grab the
       | relevant platform wheel from pypi.org, or, if there is not one
       | available, fall back to trying to compile from source. With PEP
        | 517 this all happens in a reproducible, isolated build
       | environment so if you manage to get it building on your CI
       | pipeline it is likely to work on the end user's machine.
       | 
       | I'm not sure what the state of wheels on Windows is these days.
       | Is it possible to bundle DLLs in Windows wheels and have it "just
       | work" the way it does for (most) Linux distros and OSX?
        
         | dannyz wrote:
          | We bundle DLLs in our wheels in such a way that it "just
          | works" for the user, but it kind of feels like a hack. First,
          | the main DLL is built completely separately from the wheel.
          | Then a binary wheel is built where the .pyd file directly
          | calls functions from that DLL. The main DLL is manually copied
          | into the wheel during the build step, and any dependent DLLs
          | can be included the same way.
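          | 
          | One wrinkle worth noting: from Python 3.8 Windows stopped
          | consulting PATH for an extension module's dependent DLLs, so
          | the package has to opt its bundled directory in explicitly (a
          | sketch of a package __init__.py, with illustrative names):
          | 
          |     # Make the DLLs shipped inside the wheel findable
          |     # before the .pyd extension module is imported.
          |     import os
          |     from pathlib import Path
          | 
          |     os.add_dll_directory(str(Path(__file__).parent / "dlls"))
          |     from . import _native  # the .pyd (hypothetical name)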
        
         | greenshackle2 wrote:
         | Still no way to publish a wheel for musl-based Linux though, is
         | there?
        
           | woodruffw wrote:
           | musl-based wheels can use the `musllinux` tag, as specified
           | in PEP 656[1].
           | 
           | For example, cryptography publishes musl wheels[2].
           | 
           | [1]: https://peps.python.org/pep-0656/
           | 
           | [2]: https://pypi.org/project/cryptography/#files
        
         | westurner wrote:
         | FWICS, wheel has no cryptographic signatures at present:
         | 
         | The minimal cryptographic signature support in the `wheel`
         | reference implementation was removed by dholth;
         | 
         | The GPG ASC signature upload support present in legacy PyPI and
         | then the warehouse was removed by dstufft;
         | 
         | "e2e" TUF is not yet implemented for PyPI, which signs
         | everything uploaded with a key necessarily held in RAM; but
         | there's no "e2e" because packages aren't signed before being
         | uploaded to PyPI. Does twine download and check PyPI's TUF
         | signature for whatever was uploaded?
         | 
         | I honestly haven't looked at conda's fairly new package signing
         | support yet.
         | 
          | FWIR, in comparison to legacy Python eggs with setup.py files,
          | wheels aren't supposed to execute code as the user installing
          | the package.
         | 
         | From https://news.ycombinator.com/item?id=30549331 :
         | 
         |  _https://github.com/pypa/cibuildwheel :_
         | 
         | >>> _Build Python wheels for all the platforms on CI with
         | minimal configuration._
         | 
         | >>> _Python wheels are great. Building them across Mac, Linux,
         | Windows, on multiple versions of Python, is not._
         | 
         | >>> _cibuildwheel is here to help. cibuildwheel runs on your CI
         | server - currently it supports GitHub Actions, Azure Pipelines,
         | Travis CI, AppVeyor, CircleCI, and GitLab CI - and it builds
         | and tests your wheels across all of your platforms_
        
           | woodruffw wrote:
            | You're right, both the infrastructure and the metadata for
            | cryptographic signatures on Python packages (both wheels and
            | sdists) aren't quite there yet.
           | 
           | At the moment, we're working towards the "e2e" scheme you've
           | described by adding support for Sigstore[1] certificates and
           | signatures, which will allow any number of identities
           | (including email addresses and individual GitHub release
           | workflows) to sign for packages. The integrity/availability
           | of those signing artifacts will in turn be enforced through
           | TUF, like you mentioned.
           | 
           | You can follow some of the related Sigstore-in-Python work
           | here[2], and the ongoing Warehouse (PyPI) TUF work here[3].
           | We're also working on adding OpenID Connect token
           | consumption[4] to Warehouse itself, meaning that you'll be
           | able to bootstrap from a trusted GitHub workflow to a PyPI
           | release token without needing to share any secrets.
           | 
           | [1]: https://www.sigstore.dev/
           | 
           | [2]: https://github.com/sigstore/sigstore-python
           | 
           | [3]: https://github.com/pypa/warehouse/pull/10870
           | 
           | [4]: https://github.com/pypa/warehouse/pull/11272
        
       ___________________________________________________________________
       (page generated 2022-06-17 23:01 UTC)