[HN Gopher] Bundling binary tools in Python wheels
___________________________________________________________________
Bundling binary tools in Python wheels
Author : pcr910303
Score : 93 points
Date : 2022-06-17 12:00 UTC (11 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| physicsguy wrote:
| The issues with this were always:
|
| * size of wheels you can upload is constrained by PyPI
|
| * difficult to support multiple versions across multiple
| operating systems, unless you provide a source distribution,
| which is then...
|
| * Still a nightmare on Windows
| jborean93 wrote:
| > size of wheels you can upload is constrained by PyPI
|
| I feel PyPI is pretty generous with their limits. You can even
| request more once you hit the ceiling; I think it's around 60MB
| [1]. There are some wheels that are crazy large, the
| tensorflow-gpu wheels [2] are around 500MB each. I think there
| are discussions out there about ways to alleviate this problem
| on PyPI.
|
| > difficult to support multiple versions across multiple
| operating systems, unless you provide a source distribution,
| which is then...
|
| This can be a problem, but I've found that recently the
| situation has improved quite a lot. You can create manylinux
| wheels for x86, x64, and arm64, which cover a lot of the Linux
| distributions using glibc. A musllinux tag was recently added
| to cover musl-based distributions like Alpine. macOS wheels
| support both x64 and arm64, and can even be universal2 wheels.
| Windows is still purely x86 or x64 for now, but I've seen some
| people work on arm64 support in CPython, and once that's in I'm
| sure PyPI won't be too far behind. There are also some great
| tools like cibuildwheel [3] that make building and testing
| these wheels pretty simple.
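|
| As a quick illustration of how this resolution works (a sketch
| using the third-party `packaging` library, which pip vendors
| internally), you can list the wheel tags the running
| interpreter will accept:
|
|     from packaging.tags import sys_tags
|
|     # Most preferred tag first, e.g.
|     # cp310-cp310-manylinux_2_17_x86_64
|     for tag in sys_tags():
|         print(tag)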
|
| > Still a nightmare on Windows
|
| I'm actually curious what the nightmare about Windows is. I've
| found that Windows is probably the easiest of all the platforms
| to build and upload wheels for. You aren't limited to a tiny
| subset of system libs like you are on Linux, and building them
| is mostly the same process. Probably the hardest thing is
| ensuring you have the correct VS build tools installed, but
| that's not insurmountable.
|
| [1] https://pypi.org/help/#file-size-limit
|
| [2] https://pypi.org/project/tensorflow-gpu/#files
|
| [3] https://github.com/pypa/cibuildwheel
| zackees wrote:
| I've done this for static-ffmpeg with the added bonus of
| downloading ffmpeg binaries on first use.
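|
| The first-use download is roughly this pattern (a sketch; the
| URL and cache path are placeholders, not static-ffmpeg's actual
| code):
|
|     import os, stat, urllib.request
|
|     FFMPEG_URL = "https://example.com/static/ffmpeg"  # placeholder
|
|     def ffmpeg_path():
|         # Download once into a cache dir, mark executable, reuse
|         cache = os.path.expanduser("~/.cache/static-ffmpeg/ffmpeg")
|         if not os.path.exists(cache):
|             os.makedirs(os.path.dirname(cache), exist_ok=True)
|             urllib.request.urlretrieve(FFMPEG_URL, cache)
|             os.chmod(cache, os.stat(cache).st_mode | stat.S_IEXEC)
|         return cache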
| remram wrote:
| That's a bit different from putting it in the wheel. It will
| not work with tools for vendoring, caching, or mirroring
| dependencies. It will also stop working if the separate place
| you get the binaries from breaks or runs out of money or purges
| that specific version from their site.
| [deleted]
| homarp wrote:
| next step is to use https://github.com/jart/cosmopolitan to build
| the binary
| michelpp wrote:
| Awesome, I'm gonna check this out for my bundle!
| the__alchemist wrote:
| I built a Python version and dependency manager; the program is
| a standalone executable written in Rust. One installation
| method is as a wheel, using this trick. It's hosted on PyPI, so
| users can install it with pip etc.
| mid-kid wrote:
| This sort of thing sounds like something better left to a proper
| package manager like Nix. Reproducibility and a complete top-to-
| bottom dependency chain are key in making sure a package works
| well into the future.
| Spivak wrote:
| Which is all nice in theory except you have to meet people
| where they are, and that's using pip and publishing on PyPI.
| Unless you can auto-nixify an arbitrary Python package, and in
| this case an arbitrary binary package, you're gonna end up
| doing a lot of work to get that reproducibility.
|
| I don't use pip because I like pip, I use it because that's
| where the software I want is.
| philipov wrote:
| Suppose you're trying to distribute a piece of binary software
| that comes in different feature sets: one build includes only
| the client applications, while another includes both the client
| and server applications. Both depend on the same shared library
| that has to be included with the distribution, and both come
| with the same Python API, so you can't separate the features
| into isolated packages. What's the best practice for packaging
| each bundle as a wheel?
|
| Would you name one package `software_server`, another
| `software_client`, but call the shared python component
| `software` in both of them?
|
| Also, how do you manage versioning when the version of the binary
| software is independent from the version of the python wrapper
| around it?
| maxnoe wrote:
| This is not possible using wheels. Binary wheels cannot depend
| on shared libraries outside the wheel besides a small number of
| system libraries.
|
| If you want to distribute a Python package like this, you could
| use the conda-forge ecosystem and have a package for the base
| library and then other packages depending on that, e.g.
|
|     foo-core, foo-server, foo-client
| philipov wrote:
| You can't depend on shared libraries outside the wheel, but
| you can include the library inside the wheel. The
| applications all expect the shared library they depend on to
| be in a path relative to the location of the application, which
| prevents putting the shared library in a separate package
| anyway. So how would you handle this situation with wheels if
| you don't have the option of forcing users into a specific
| python distribution?
| maxnoe wrote:
| Maybe I misunderstood. After rereading, I think so...
|
| I thought you wanted to distribute a shared library in one
| package and then have other packages depend on that.
|
| So it's not clear to me what your actual situation is.
| epistasis wrote:
| About 10 years ago, before Docker, and before (I knew about?)
| conda, I was using Python packages with binaries inside to create
| reproducible virtual environments with consistently versioned
| binary tools to go along with all our Python code. We were
| avoiding virtualization and running on bare metal so this was the
| best way to distribute code across a cluster.
|
| I added small setup.py scripts to a dozen open source C and Java
| packages, and deposited them on an internal package server.
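|
| In spirit each one was little more than this (a sketch of the
| general approach, not the original code; the name is made up):
|
|     from setuptools import setup
|
|     # Ship a prebuilt binary inside the wheel and put it on
|     # PATH via the scripts mechanism.
|     setup(
|         name="samtools-bin",       # hypothetical package name
|         version="1.9",
|         scripts=["bin/samtools"],  # prebuilt binary
|     )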
|
| It worked OK, and unlike Conda, to this day you can still run an
| install without burning through CPU-hours of SAT-solving.
|
| Biggest problem was if there were packages that wanted different
| versions of the same dependency.
| prpl wrote:
| FWIW conda has mamba as an option now, which (mostly) fixed the
| slow SAT solver
| epistasis wrote:
| Mamba is completely unusable for me too.
| divbzero wrote:
| > _Biggest problem was if there were packages that wanted
| different versions of the same dependency._
|
| How often and in what contexts would you encounter this version
| dependency issue? Was there a good solution or would you work
| around the issue by finding compatible versions?
| epistasis wrote:
| This never presented a real problem in practice, but IIRC pip
| just silently installs over the other version.
|
| And since we had a closed system, it was entirely on us to
| set it up correctly. It was just a discovery, a couple of years
| in, of something we hadn't properly thought through ahead of
| time.
| woodruffw wrote:
| In a similar vein: Python projects can use `auditwheel` to
| automatically relocate (fixup RPATHs) and vendor their "system"
| dependencies, such as a specific version of `zlib` or
| `libffi`[1].
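|
| For example (the wheel filename here is hypothetical):
|
|     auditwheel repair --plat manylinux2014_x86_64 \
|         dist/mypkg-1.0-cp310-cp310-linux_x86_64.whl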
|
| [1]: https://github.com/pypa/auditwheel
| samwillis wrote:
| I have just used this trick to package Node.js as a Python
| package [0], so you can just do:
|
|     pip install nodejs-bin
|
| to install it.
|
| This works particularly well in combination with Python virtual
| environments (or Docker): it allows you to have a specific
| version of Node for a particular project without additional
| tooling. Just add "nodejs-bin" to your requirements.txt.
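|
| Under the hood the pattern is a tiny Python shim, exposed as a
| console entry point, that execs the bundled binary (a sketch of
| the general approach; nodejs-bin's actual code and layout may
| differ):
|
|     import os, subprocess, sys
|
|     def main():
|         # The wheel ships the real node binary next to this module
|         here = os.path.dirname(__file__)
|         bin_path = os.path.join(here, "bin", "node")
|         raise SystemExit(subprocess.call([bin_path, *sys.argv[1:]]))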
|
| You can also add it as an optional dependency of another Python
| package. That's my plan with the full-stack
| Python/Django/Alpine.js framework I'm building (Tetra [1]);
| that way users will be able to optionally do this to set up the
| full project, including Node.js:
|
|     pip install tetraframework[full]
|
| The framework uses esbuild and will use PostCSS. The choice was
| either to package them with the framework (including Node.js)
| or to create a way for Python developers to easily install
| Node.js using the tools they are most used to.
|
| It's brand new; I have only packaged Node.js v16 so far. Once
| I'm happy there are no show-stopping issues, the plan is to
| release wheels for current+LTS Node.js versions as they come
| out.
|
| Developer UX is my number one priority and removing the need for
| two toolchains is an important part of that.
|
| [0]: https://pypi.org/project/nodejs-bin/
|
| [1]: http://tetraframework.com
| michelpp wrote:
| This is also the basis for the postgresql-wheel package, a Python
| wheel that contains an entire local PostgreSQL binary
| installation:
|
| https://github.com/michelp/postgresql-wheel
|
| Out of sheer laziness I used CFFI to encapsulate the build, even
| though there is no dynamic linking involved. There's probably a
| simpler way but this approach has worked very well for disposable
| database tests.
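|
| For the disposable-test use case it ends up looking roughly
| like this (a sketch, assuming the wheel puts initdb/postgres on
| PATH; flags and paths are illustrative):
|
|     import subprocess, tempfile
|
|     datadir = tempfile.mkdtemp()
|     subprocess.run(["initdb", "-D", datadir], check=True)
|     # -k puts the unix socket in datadir, so no TCP port needed
|     server = subprocess.Popen(
|         ["postgres", "-D", datadir, "-k", datadir]
|     )
|     # ... connect via the socket in datadir, run tests ...
|     server.terminate()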
| randlet wrote:
| Very cool! Might be useful in CI scenarios.
| ssl232 wrote:
| The Python wheel "ecosystem" is rather nice these days. You can
| create platform wheels and bundle in shared objects and binaries
| compiled for those platforms, automated by tools like auditwheel
| (Linux) and delocate-wheel (OSX). These tools even rewrite the
| wheel's manifest and update the shared object RPATH values and
| mangle the names to ensure they don't get clobbered by stuff in
| the system library path. The auditwheel tool converts a "linux"
| wheel into a "manylinux" wheel, which the spec guarantees will
| run on systems with at least a certain glibc version.
|
| Wheels are merely zips with a different extension, so in the
| worst case, for projects where these automation tools fail, you
| can do it yourself. You simply need to be careful to update the
| manifest and make sure shared objects are loaded in the correct
| order.
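|
| Poking inside one is just zip handling (the filename below is
| hypothetical):
|
|     import zipfile
|
|     # List a wheel's contents, including the RECORD manifest
|     # that has to stay in sync with the bundled files.
|     name = "mypkg-1.0-cp310-cp310-linux_x86_64.whl"
|     with zipfile.ZipFile(name) as whl:
|         for member in whl.namelist():
|             print(member)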
|
| On the user side, a `pip install` should automatically grab the
| relevant platform wheel from pypi.org, or, if there is not one
| available, fall back to trying to compile from source. With PEP
| 517 this all happens in a reproducible, isolated build
| environment, so if you manage to get it building on your CI
| pipeline it is likely to work on the end user's machine.
|
| I'm not sure what the state of wheels on Windows is these days.
| Is it possible to bundle DLLs in Windows wheels and have it "just
| work" the way it does for (most) Linux distros and OSX?
| dannyz wrote:
| We bundle DLLs in our wheels in such a way that it "just works"
| for the user, but it kind of feels like a hack. First, the main
| DLL is built completely separately from the wheel. Then a
| binary wheel is built where the .pyd file basically just calls
| functions from the main DLL directly. The main DLL is manually
| copied into the wheel during the build step, and any dependent
| DLLs can be manually included the same way.
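|
| On the loading side, the package __init__ can make the bundled
| DLLs resolvable before the extension imports (a sketch; the
| directory and module names are made up, and os.add_dll_directory
| needs Python 3.8+ on Windows):
|
|     import os
|
|     # Bundled DLLs live in a directory next to this package
|     _dll_dir = os.path.join(os.path.dirname(__file__), "dlls")
|     if hasattr(os, "add_dll_directory"):
|         os.add_dll_directory(_dll_dir)
|
|     from . import _native  # the .pyd linked against the main DLL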
| greenshackle2 wrote:
| Still no way to publish a wheel for musl-based Linux though, is
| there?
| woodruffw wrote:
| musl-based wheels can use the `musllinux` tag, as specified
| in PEP 656[1].
|
| For example, cryptography publishes musl wheels[2].
|
| [1]: https://peps.python.org/pep-0656/
|
| [2]: https://pypi.org/project/cryptography/#files
| westurner wrote:
| FWICS, wheel has no cryptographic signatures at present:
|
| The minimal cryptographic signature support in the `wheel`
| reference implementation was removed by dholth;
|
| The GPG ASC signature upload support present in legacy PyPI and
| then the warehouse was removed by dstufft;
|
| "e2e" TUF is not yet implemented for PyPI, which signs
| everything uploaded with a key necessarily held in RAM; but
| there's no "e2e" because packages aren't signed before being
| uploaded to PyPI. Does twine download and check PyPI's TUF
| signature for whatever was uploaded?
|
| I honestly haven't looked at conda's fairly new package signing
| support yet.
|
| FWIR, in comparison to legacy python eggs with setup.py files,
| wheels aren't supposed to execute code as the user installing
| the package.
|
| From https://news.ycombinator.com/item?id=30549331 :
|
| _https://github.com/pypa/cibuildwheel :_
|
| >>> _Build Python wheels for all the platforms on CI with
| minimal configuration._
|
| >>> _Python wheels are great. Building them across Mac, Linux,
| Windows, on multiple versions of Python, is not._
|
| >>> _cibuildwheel is here to help. cibuildwheel runs on your CI
| server - currently it supports GitHub Actions, Azure Pipelines,
| Travis CI, AppVeyor, CircleCI, and GitLab CI - and it builds
| and tests your wheels across all of your platforms_
| woodruffw wrote:
| You're right: both the infrastructure and the metadata for
| cryptographic signatures on Python packages (wheels and
| sdists alike) aren't quite there yet.
|
| At the moment, we're working towards the "e2e" scheme you've
| described by adding support for Sigstore[1] certificates and
| signatures, which will allow any number of identities
| (including email addresses and individual GitHub release
| workflows) to sign for packages. The integrity/availability
| of those signing artifacts will in turn be enforced through
| TUF, like you mentioned.
|
| You can follow some of the related Sigstore-in-Python work
| here[2], and the ongoing Warehouse (PyPI) TUF work here[3].
| We're also working on adding OpenID Connect token
| consumption[4] to Warehouse itself, meaning that you'll be
| able to bootstrap from a trusted GitHub workflow to a PyPI
| release token without needing to share any secrets.
|
| [1]: https://www.sigstore.dev/
|
| [2]: https://github.com/sigstore/sigstore-python
|
| [3]: https://github.com/pypa/warehouse/pull/10870
|
| [4]: https://github.com/pypa/warehouse/pull/11272
___________________________________________________________________
(page generated 2022-06-17 23:01 UTC)