https://diziet.dreamwidth.org/10559.html

Debian's approach to Rust - Dependency handling

Jan. 3rd, 2022 06:16 pm, by Ian Jackson (diziet)

tl;dr: Faithfully following upstream semver, in Debian package dependencies, is a bad idea.

Introduction

I have been involved in Debian for a very long time. And I've been working with Rust for a few years now. Late last year I had cause to try to work on Rust things within Debian.

When I did, I found it very difficult. The Debian Rust Team were very helpful. However, the workflow and tooling require very large amounts of manual clerical work - work which it is almost impossible to do correctly since the information required does not exist. I had wanted to package a fairly straightforward program I had written in Rust, partly as a learning exercise. But, unfortunately, after I got stuck in, it looked to me like the effort would be wildly greater than I was prepared for, so I gave up.

Since then I've been thinking about what I learned about how Rust is packaged in Debian. I think I can see how to fix some of the problems. Although I don't want to go charging in and try to tell everyone how to do things, I felt I ought at least to write up my ideas. Hence this blog post, which may become the first of a series.

This post is going to be about semver handling. I see problems with other aspects of dependency handling and source code management and traceability as well, and of course if my ideas find favour in principle, there are a lot of details that need to be worked out, including some kind of transition plan.

How Debian packages Rust, and build vs runtime dependencies

Today I will be discussing almost entirely build-dependencies; Rust doesn't (yet?) support dynamic linking, so built Rust binaries don't have Rusty dependencies.

However, things are a bit confusing because even the Debian "binary" packages for Rust libraries contain pure source code. So for a Rust library package, "building" the Debian binary package from the Debian source package does not involve running the Rust compiler; it's just file-copying and format conversion. The library's Rust dependencies do not need to be installed on the "build" machine for this.

So I'm mostly going to be talking about Depends fields, which are Debian's way of talking about runtime dependencies, even though they are used only at build-time. The way this works is that some ultimate leaf package (which is supposed to produce actual executable code) Build-Depends on the libraries it needs, and those Depends on their under-libraries, so that everything needed is installed.

What do dependencies mean and what are they for anyway?

In systems where packages declare dependencies on other packages, it generally becomes necessary to support "versioned" dependencies. In all but the most simple systems, this involves an ordering (or similar) on version numbers and a way for a package A to specify that it depends on certain versions of B.

Both Debian and Rust have this. Rust upstream crates have version numbers and can specify their dependencies according to semver. Debian's dependency system can represent that.

So it was natural for the designers of the scheme for packaging Rust code in Debian to simply translate the Rust version dependencies to Debian ones. However, while the two dependency schemes seem equivalent in the abstract, their concrete real-world semantics are totally different.

These different package management systems have different practices and different meanings for dependencies.
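
To make the "simply translate" step concrete, here is a minimal sketch of how a Cargo caret requirement maps onto a pair of Debian-style versioned relations. It is an illustration only, not the Debian Rust Team's actual tooling: the function names and the package name `librust-example-dev` are hypothetical, and only full MAJOR.MINOR.PATCH versions are handled.

```python
def caret_bounds(version: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a Cargo caret requirement.

    Cargo treats the leftmost non-zero component as the compatibility
    boundary: ^1.2.3 allows < 2.0.0, ^0.2.3 allows < 0.3.0,
    and ^0.0.3 allows < 0.0.4.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        upper = f"{major + 1}.0.0"
    elif minor > 0:
        upper = f"0.{minor + 1}.0"
    else:
        upper = f"0.0.{patch + 1}"
    return version, upper


def debian_relations(package: str, version: str) -> str:
    """Render the caret range as Debian-style versioned relations.

    The package name is whatever the caller passes in; this sketch does
    not know how real Debian Rust packages are named.
    """
    lower, upper = caret_bounds(version)
    return f"{package} (>= {lower}), {package} (<< {upper})"


print(debian_relations("librust-example-dev", "1.2.3"))
```

The translation itself is mechanical, which is why it looks so natural. The trouble described in the rest of this post is about what the resulting range is taken to mean.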

(Interestingly, the Python world also has debates about the meaning and proper use of dependency versions.)

The epistemological problem

Consider some package A which is known to depend on B. In general, it is not trivial to know which versions of B will be satisfactory. I.e., whether a new B, with potentially-breaking changes, will actually break A.

Sometimes tooling can be used which calculates this (eg, the Debian shlibdeps system for runtime dependencies) but this is unusual - especially for build-time dependencies. Which versions of B are OK can normally only be discovered by a human consideration of changelogs etc., or by having a computer try particular combinations.

Few ecosystems with dependencies, in the Free Software community at least, make an attempt to precisely calculate the versions of B that are actually required to build some A. So it turns out that there are three cases for a particular combination of A and B:

* it is believed to work;
* it is known not to work;
* it is not known whether it will work.

And, I am not aware of any dependency system that has an explicit machine-readable representation for the "unknown" state, so that they can say something like "A is known to depend on B; versions of B before v1 are known to break; version v2 is known to work". (Sometimes statements like that can be found in human-readable docs.)

That leaves two possibilities for the semantics of a dependency A depends B, version(s) V..W:

* Precise: A will definitely work if B matches V..W.
* Optimistic: We have no reason to think B breaks with any of V..W.

At first sight the latter does not seem useful, since how would the package manager find a working combination? Taking Debian as an example, which uses optimistic version dependencies, the answer is as follows: The primary information about what package versions to use is not only the dependencies, but mostly in which Debian release is being targeted.
(Other systems using optimistic version dependencies could use the date of the build, i.e. use only packages that are "current".)

Comparison of precise and optimistic version dependencies:

People involved in version management
  Precise: Package developers, downstream developers/users.
  Optimistic: Package developers, downstream developer/users, distribution QA and release managers.

Package developers declare versions V and dependency ranges V..W so that
  Precise: It definitely works.
  Optimistic: A wide range of B can satisfy the declared requirement.

The principal version data used by the package manager
  Precise: Only dependency versions.
  Optimistic: Contextual, eg, Releases - set(s) of packages available.

Version dependencies are for
  Precise: Selecting working combinations (out of all that ever existed).
  Optimistic: Sequencing (ordering) of updates; QA.

Expected use pattern by a downstream
  Precise: Downstream can combine any declared-good combination.
  Optimistic: Use a particular release of the whole system. Mixing-and-matching requires additional QA and remedial work.

Downstreams are protected from breakage by
  Precise: Pessimistically updating versions and dependencies whenever anything might go wrong.
  Optimistic: Whole-release QA.

A substantial deployment will typically contain
  Precise: Multiple versions of many packages.
  Optimistic: A single version of each package, except where there are actual incompatibilities which are too hard to fix.

Package updates are driven by
  Precise: Top-down: Depending package updates the declared metadata.
  Optimistic: Bottom-up: Depended-on package is updated in the repository for the work-in-progress release.

So, while Rust and Debian have systems that look superficially similar, they contain fundamentally different kinds of information. Simply representing the Rust versions directly into Debian doesn't work.

What is currently done by the Debian Rust Team is to manually patch the dependency specifications, to relax them. This is very labour-intensive, and there is little automation supporting either decisionmaking or actually applying the resulting changes.
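
The difference between the two readings of a dependency range can be sketched in a few lines. This is a toy model under stated assumptions, not a description of how any real package manager stores its data: the `Knowledge` states mirror the three cases listed earlier, and the version labels are made up to match the "before v1 breaks, v2 works" example.

```python
from enum import Enum


class Knowledge(Enum):
    """What is known about one (A, version-of-B) combination."""
    WORKS = "believed to work"
    BREAKS = "known not to work"
    UNKNOWN = "not known whether it will work"


def acceptable(state: Knowledge, semantics: str) -> bool:
    """Decide whether a version of B may be declared as satisfying A."""
    if semantics == "precise":
        # Only combinations that are positively believed to work.
        return state is Knowledge.WORKS
    if semantics == "optimistic":
        # Anything we have no reason to think is broken.
        return state is not Knowledge.BREAKS
    raise ValueError(f"unknown semantics: {semantics!r}")


def candidates(known: dict[str, Knowledge], semantics: str) -> list[str]:
    """List the versions of B allowed under the given semantics."""
    return [v for v, state in known.items() if acceptable(state, semantics)]


# Hypothetical knowledge about versions of B, from A's point of view.
known_about_b = {
    "0.9": Knowledge.BREAKS,
    "1.5": Knowledge.UNKNOWN,
    "2.0": Knowledge.WORKS,
    "3.0": Knowledge.UNKNOWN,
}

print(candidates(known_about_b, "precise"))
print(candidates(known_about_b, "optimistic"))
```

Under the optimistic reading, the untested versions stay in play, and something outside the dependency metadata (in Debian's case the release and its QA) has to settle which one is actually used.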

What to do

Desired end goal

* To update a Rust package in Debian, that many things depend on, one need simply update that package.
* Debian's sophisticated build and CI infrastructure will try building all the reverse-dependencies against the new version. Packages that actually fail against the new dependency are flagged as suffering from release-critical problems.
* Debian Rust developers then update those other packages too. If the problems turn out to be too difficult, it is possible to roll back.
* If a problem with a depending package is not resolved in a timely fashion, priority is given to updating core packages, and the depending package falls by the wayside (since it is empirically unmaintainable, given available effort).
* There is no routine manual patching of dependency metadata (or of anything else).

Radical proposal

Debian should not precisely follow upstream Rust semver dependency information. Instead, Debian should optimistically try the combinations of packages that we want to have. The resulting breakages will be discovered by automated QA; they will have to be fixed by manual intervention of some kind, but usually, simply updating the depending package will be sufficient.

This no longer ensures (unlike the upstream Rust scheme) that the result is expected to build and work if the dependencies are satisfied. But as discussed, we don't really need that property in Debian. More important is the new property we gain: that we are able to mix and match versions that we find work in practice, without a great deal of manual effort.

Or to put it another way, in Debian we should do as a Rust upstream maintainer does when they do the regular "update dependencies for new semvers" task: we should update everything, see what breaks, and fix those.

(In theory a Rust upstream package maintainer is supposed to do some additional checks or something.
But the practices are not standardised and any checks one does almost never reveal anything untoward, so in practice I think many Rust upstreams just update and see what happens. The Rust upstream community has other mechanisms - often, reactive ones - to deal with any problems. Debian should subscribe to those same information sources, eg RustSec.)

Nobbling cargo

Somehow, when cargo is run to build Rust things against these Debian packages, cargo's dependency system will have to be overridden so that the version of the package that is actually selected by Debian's package manager is used by cargo without complaint.

We probably don't want to change the Rust version numbers of Debian Rust library packages, so this should be done by either presenting cargo with an automatically-massaged Cargo.toml where the dependency version restrictions are relaxed, or by using a modified version of cargo which has special option(s) to relax certain dependencies.

Handling breakage

Rust packages in Debian should already be provided with autopkgtests so that ci.debian.net will detect build breakages. Build breakages will stop the updated dependency from migrating to the work-in-progress release, Debian testing.

To resolve this, and allow forward progress, we will usually upload a new version of the dependency containing an appropriate Breaks, and either file an RC bug against the depending package, or update it. This can be done after the upload of the base package.

Thus, resolution of breakage due to incompatibilities will be done collaboratively within the Debian archive, rather than ad-hoc locally. And it can be done without blocking.

My proposal prioritises the ability to make progress in the core, over stability and in particular over retaining leaf packages. This is not Debian's usual approach but given the Rust ecosystem's practical attitudes to API design, versioning, etc., I think the instability will be manageable.
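
As a rough idea of what an "automatically-massaged Cargo.toml" could involve, here is a minimal sketch that relaxes the version requirements in an already-parsed manifest. Several things here are assumptions rather than part of the proposal: the function name is hypothetical, the manifest is taken to be a plain dictionary as a TOML parser would produce, target-specific dependency tables are ignored, and replacing every requirement with the wildcard `*` is simply the bluntest possible relaxation.

```python
# Dependency tables that appear at the top level of a Cargo.toml.
DEPENDENCY_SECTIONS = ("dependencies", "dev-dependencies", "build-dependencies")


def relax_dependencies(manifest: dict) -> dict:
    """Return a copy of a parsed manifest with version requirements relaxed.

    Assumption: "relaxed" means replaced by "*", so that whichever version
    the distribution has installed is accepted.
    """
    relaxed = dict(manifest)  # shallow copy; sections are rebuilt below
    for section in DEPENDENCY_SECTIONS:
        if section not in manifest:
            continue
        new_section = {}
        for name, spec in manifest[section].items():
            if isinstance(spec, str):
                # Short form: name = "1.2.3"
                new_section[name] = "*"
            elif "version" in spec:
                # Table form: keep features etc., relax only the version.
                new_section[name] = {**spec, "version": "*"}
            else:
                # Path or git dependency with no version to relax.
                new_section[name] = spec
        relaxed[section] = new_section
    return relaxed


# Hypothetical manifest contents, for illustration only.
example = {
    "package": {"name": "demo", "version": "0.1.0"},
    "dependencies": {
        "serde": "1.0.130",
        "rand": {"version": "0.8", "features": ["small_rng"]},
    },
}
print(relax_dependencies(example)["dependencies"])
```

A real tool would also need to write the result back out as TOML and decide which dependencies to leave alone; the alternative mentioned above, a modified cargo, would avoid rewriting the manifest at all.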
In practice fixing leaf packages is not usually really that hard, but it's still work and the question is what happens if the work doesn't get done. After all, we are always short of effort - and we probably still will be, even if we get rid of the makework clerical work of patching dependency versions everywhere (so that usually no work is needed on depending packages).

Exceptions to the one-version rule

There will have to be some packages that we need to keep multiple versions of. We won't want to update every depending package manually when this happens. Instead, we'll probably want to set a version number split: rdepends which want version