https://lwn.net/SubscriberLink/867657/0efafb319ce20e3e/

Cooperative package management for Python

By Jake Edge
August 31, 2021

A longstanding tug-of-war between system package managers and Python's own installation mechanisms (primarily pip, but there are others) looks to be on its way to being resolved--or at least regularized. PEP 668 ("Graceful cooperation between external and Python package managers") has been created to provide ways for the two types of package installation to work together, rather than, at times, at cross-purposes. Since many operating systems depend on Python tools, with package versions that may differ from those of users' Python applications, making them play together nicely should result in more stable systems.

The root cause of the problem is that distribution package managers and Python package managers ("pip" serves as shorthand for the latter throughout the rest of the article) often share the same "site-packages" directory for storing installed packages. Updating a package, or, worse yet, removing one, may make perfect sense in the context of the specific package manager, but completely foul up the other.
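The shared directory at the heart of the conflict can be inspected from the interpreter itself; a quick illustration (the paths printed vary by distro, Python version, and whether a virtual environment is active):

```python
# Ask this interpreter where third-party packages get installed.
# Output depends on the distro and the Python build.
import site

for directory in site.getsitepackages():
    print(directory)
```

On a Debian system this typically shows the distro's /usr/lib/python3/dist-packages tree alongside a /usr/local tree, the split the article goes on to discuss.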
As the PEP notes, that can cause real havoc:

    This may pose a critical problem for the integrity of distros, which often have package-management tools that are themselves written in Python. For example, it's possible to unintentionally break Fedora's dnf command with a pip install command, making it hard to recover.

The sys.path variable governs where Python looks for modules when it encounters an import statement; it gets initialized from the PYTHONPATH environment variable, with some installation- and invocation-specific directories added. sys.path is a Python list of directories that get consulted in order, much like the shell PATH environment variable that it is modeled on. Python programs can manipulate sys.path to redirect the search, which is part of what makes virtual environments work.

Using virtual environments with pip, instead of installing packages system-wide, has been the recommended practice for avoiding conflicts with OS-installed packages for quite some time. But it is not generally mandatory, so users sometimes still run into problems. One goal of PEP 668 is to allow distributions to indicate that they provide another mechanism for managing Python packages, which will then change the default behavior of pip. Users will still be able to override that default, but that will hopefully alert them to the problems that could arise.

A distribution that wants to opt into the new behavior will tell pip that it manages Python packages with its own tooling by placing a configuration file called EXTERNALLY-MANAGED in the directory where the Python standard library lives. If pip finds the EXTERNALLY-MANAGED file there and is not running within a virtual environment, it should exit with an error message unless the user has explicitly overridden the default with a command-line flag; the PEP recommends --break-system-packages for the flag name.
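The check described above can be sketched in a few lines; this is an illustration of the PEP's logic, not pip's actual implementation (which also reads the file's contents and honors the override flag):

```python
# Sketch of the PEP 668 check: refuse to touch the system
# site-packages when a distro has dropped an EXTERNALLY-MANAGED
# marker next to the stdlib and we are not inside a virtual
# environment.
import sys
import sysconfig
from pathlib import Path

def externally_managed() -> bool:
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    # Inside a venv, sys.prefix differs from sys.base_prefix.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return marker.exists() and not in_venv

print(externally_managed())
```

A real installer would, on a True result, print the distro's error message from the file and exit unless --break-system-packages was given.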
The EXTERNALLY-MANAGED file can contain an error message that pip should emit when it exits due to those conditions being met; the messages can be localized in the file as well. The intent is for the message to give distribution-specific information guiding the user to the proper way to create a virtual environment.

Another problem can occur when packages are removed from system-wide installs by pip. If, for example, the user installs a package system-wide and runs into a problem, the "obvious" solution may cause bigger problems:

    There is a worse problem with system-wide installs: if you attempt to recover from this situation with sudo pip uninstall, you may end up removing packages that are shipped by the system's package manager. In fact, this can even happen if you simply upgrade a package - pip will try to remove the old version of the package, as shipped by the OS. At this point it may not be possible to recover the system to a consistent state using just the software remaining on the system.

A second change proposed in the PEP would limit pip to operating only on the directories specified for its use. The idea is that distributions can separate the two kinds of packages into their own directories, which is something that several Linux distributions already do:

    For example, Fedora and Debian (and their derivatives) both implement this split by using /usr/local for locally-installed packages and /usr for distro-installed packages. Fedora uses /usr/local/lib/python3.x/site-packages vs. /usr/lib/python3.x/site-packages. (Debian uses /usr/local/lib/python3/dist-packages vs. /usr/lib/python3/dist-packages as an additional layer of separation from a locally-compiled Python interpreter: if you build and install upstream CPython in /usr/local/bin, it will look at /usr/local/lib/python3/site-packages, and Debian wishes to make sure that packages installed via the locally-built interpreter don't show up on sys.path for the distro interpreter.)
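The PEP specifies the marker as a small INI-style file with an [externally-managed] section holding the Error message. A hypothetical Debian-flavored example follows; the wording and the package name are invented for illustration:

```ini
[externally-managed]
Error=This Python installation is managed externally by APT.
 To install a Python package system-wide, use apt (for example,
 "apt install python3-requests"); to install for your own use,
 create a virtual environment first:
     python3 -m venv ~/my-env
     ~/my-env/bin/pip install requests
```

The indented continuation lines are standard INI syntax, which lets the distro ship a multi-line, localizable message.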
So the proposal would require pip to query the location where it is meant to place its packages and only modify files in that directory. Since the locally installed packages are normally placed ahead of the system-wide packages on sys.path, though, this can lead to pip "shadowing" a distribution package. Shadowing an installed package can, of course, lead to some of the problems mentioned, so it is recommended that pip emit a warning when this happens.

The PEP has an extensive analysis of the use cases and the impact these changes will have. "The changed behavior in this PEP is intended to 'do the right thing' for as many use cases as possible." In particular, the changes to allow distributions to have two different locations for packages and for pip not to change the system-wide location are essentially standardizing the current practice of some distributions. The "Recommendations for distros" section of the PEP specifically calls out that separation as a best practice moving forward.

There are situations where distributions would not want to default to this new behavior, however. Containers for single applications may not benefit from the restrictions, so the PEP recommends that distributions change their behavior for those container images:

    Distros that produce official images for single-application containers (e.g., Docker container images) should remove the EXTERNALLY-MANAGED file, preferably in a way that makes it not come back if a user of that image installs package updates inside their image (think RUN apt-get dist-upgrade). On dpkg-based systems, using dpkg-divert --local to persistently rename the file would work. On other systems, there may need to be some configuration flag available to a post-install script to re-remove the EXTERNALLY-MANAGED file.

In general, the PEP seems not to be particularly controversial.
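The location pip "is meant to place its packages" is already queryable through the standard library's sysconfig module; a minimal illustration (output varies by distro and build):

```python
# sysconfig reports the install locations in effect for this
# interpreter: "purelib" is where pure-Python packages land, and
# "platlib" is the counterpart for native extensions. On a distro
# implementing the split described above, pip's default scheme
# points under /usr/local while distro packages live under /usr.
import sysconfig

print(sysconfig.get_path("purelib"))
print(sysconfig.get_path("platlib"))
```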
The PEP discussion thread is positive for the most part, though Paul Moore, who may be the PEP-Delegate deciding on the proposal, is concerned that those affected may not even know about it:

    One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there's only a couple of messages here. I'm not convinced that "silence means approval" is sufficient here, it's difficult to be sure where interested parties hang out, so silence seems far more likely to imply "wasn't aware of the proposal" in this case. In fact, I'd suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, including the distribution, and a link to where the commitment was made.

Assuming said confirmations are forthcoming, or that any objections and suggestions can be accommodated, PEP 668 seems like a nice step forward for Python. Having tools like DNF and apt fight with pip and others is obviously a situation that has caused problems in the past and will do so again. Finding a way to cooperate without causing any major backward-compatibility headaches is important. Ensuring that other distributions are on board with these changes, all of which are ultimately optional anyway, should lead to more stability and, ultimately, happier users--both for Python and for the distributions.

Comments

Posted Aug 31, 2021 21:03 UTC (Tue) by NYKevin (subscriber, #129325):

It'd be nice if I could depend on python3 -m venv actually working. It depends on ensurepip, which is or was sabotaged or outright removed by some distros because they didn't want users to have a secondary package manager on their systems. The problem is that both ensurepip and venv have programmatic interfaces and are part of the Python standard library.
You can't just tear out random bits and pieces of a language's stdlib. When I write code for "Python," I expect the batteries to be included as advertised.

Posted Aug 31, 2021 22:01 UTC (Tue) by beagnach (subscriber, #32987):

What distros are in question here? How does the breakage manifest?

Posted Aug 31, 2021 22:42 UTC (Tue) by NYKevin (subscriber, #129325):

I have no idea if it's still a problem today, but see for example https://www.google.com/search?q=debian+ensurepip+venv

No really, click through. Look at the sheer *number* of people who were (or possibly still are?) inconvenienced by this behavior. I'm sure they had their reasons, but they broke a lot of workflows when they did that.

Posted Sep 1, 2021 0:57 UTC (Wed) by JanC_ (subscriber, #34940):

Based on the 2-3 sites I looked at, it seems like people were just missing a package, as they didn't have venv (for the correct Python version) installed? There are lots of other parts of "Python" that are not installed by default, e.g. development headers, documentation, tests, examples, IDLE, etc. Basically, a default install includes everything you need to run Python programs, but not everything you might need to develop in Python. I assume this is mostly to reduce its size in environments where all of those aren't needed (which is by far the majority of installations).

Posted Sep 1, 2021 4:13 UTC (Wed) by NYKevin (subscriber, #129325):

As mentioned, both venv and ensurepip have programmatic interfaces and are part of the Python stdlib. If you tell me that your operating system has "Python" on it, I am going to assume that I can use and call into every single part of the stdlib.
I'm not going to split out parts of the stdlib as separate dependencies, or instruct people to run distro-specific hacks to fix the half-an-installation they got by default. The entire purpose of ensurepip was to *ensure* that everyone had pip available with every installation of Python, by incorporating it into the stdlib. That's why it's called "ensurepip," and not "maybepip" or "optionalpip." By removing it, you are breaking API compatibility with standard (upstream) Python.

Posted Sep 1, 2021 5:49 UTC (Wed) by stefanor (subscriber, #32895):

> If you tell me that your operating system has "Python" on it, I am going to assume that I can use and call into every single part of the stdlib.

The good news is that the python3-full binary package now exists to meet this need. If you're a Pythonista, you can install this and get the stdlib you expect.

But let's dig deeper and question the assumption you made. Why do distros break it? There are some optional parts of the stdlib (e.g. database drivers), and then there are the mechanics of the operating system to consider. If an app in your operating system is written in Python, it is reasonable for that application to depend on a subset of the Python standard library, to reduce install footprint by minimizing dependencies. This may sound unreasonable at first, but there are things in the Python standard library that you really don't need for app runtime: e.g. documentation, dev headers, the stdlib test suite, Tkinter, IDLE, lib2to3, ensurepip, or distutils. As a Python developer, these may seem like sacrosanct parts of the stdlib, but as a distro maintainer, they are not something that you need to waste install CD space on, and most desktop end-users will never miss them. You can significantly reduce the installed size of the Python stack and its dependencies on most end-user systems by making these components optional.
Generally distributions break complex packages up into multiple pieces, trying to find a balance between a minimal core and all the optional bits that users may need for their particular use case. (Not every optional feature will be supported, of course.) In Debian, libreoffice is broken into around 200 binary packages, as an extreme example. Python in Debian is broken into several major pieces:

    python3: The main CPython interpreter package, including the standard stdlib.
    python3-minimal: Intended for install environments and size-constrained CD images. Just the CPython interpreter plus a minimal subset of the stdlib.
    python3-doc: Documentation.
    python3-dev: C header files, the -config script, and a static version of libpython.
    python3-distutils: The distutils module (only needed at build time).
    python3-examples: Examples, demos, and tools.
    python3-dbg: A debug build of the CPython interpreter.
    python3-gdbm: The GNU dbm driver (and a dependency on libgdbm).
    python3-tk: The tkinter module (and Tcl/Tk dependencies).
    python3-venv: Depends on the wheels that ensurepip requires to bootstrap pip into a venv.
    libpython3.X-testsuite: The stdlib test suite.
    idle-python3.X: IDLE.

Technically, there are a few more packages, but these are the functional break-points.

> The entire purpose of ensurepip was to *ensure* that everyone had pip available with every installation of Python, by incorporating it into the stdlib.

And yet ensurepip never really made sense in a typical package-managed Linux distro. Distros have package managers that are responsible for installing things. They don't tend to get on well with other tools messing in the same trees of the filesystem. (That's what this article is about.) Debian expects Debian users to install pip by apt installing python3-venv or python3-pip (as appropriate), not by running ensurepip to install things in /usr. For this reason, Debian has always hobbled ensurepip to print an error message explaining this when executed directly.
When used by the venv module, it just does what you expect, and creates a venv seeded with pip.

Posted Sep 1, 2021 9:40 UTC (Wed) by MrWim (subscriber, #47432):

> ensurepip never really made sense in a typical package-managed Linux distro. Distros have package-managers that are responsible for installing things. They don't tend to get on well with other tools messing in the same trees of the filesystem.

I think this is exactly why it makes sense to include venv wherever you include pip. venv is the mechanism that people use to avoid messing with the same trees of the filesystem. By not including it, you have a pip that can mess with the distro-provided packages, but you don't have the capability to sandbox off those changes.

Note: you don't need to be a Python developer to want pip. You'll need it whenever you want to run any non-distro-provided Python software - not only when developing it. It's exactly these users, who are not familiar with the Python packaging tools, that are most at risk of breaking their systems in a way that they don't know how to diagnose or fix.

Copyright (c) 2021, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds.