https://diziet.dreamwidth.org/10559.html
Ian Jackson
Debian's approach to Rust - Dependency handling
Jan. 3rd, 2022 06:16 pm
diziet
tl;dr: Faithfully following upstream semver, in Debian package
dependencies, is a bad idea.
Introduction
I have been involved in Debian for a very long time. And I've been
working with Rust for a few years now. Late last year I had cause to
try to work on Rust things within Debian.
When I did, I found it very difficult. The Debian Rust Team were very
helpful. However, the workflow and tooling require very large amounts
of manual clerical work - work which it is almost impossible to do
correctly since the information required does not exist. I had wanted
to package a fairly straightforward program I had written in Rust,
partly as a learning exercise. But, unfortunately, after I got stuck
in, it looked to me like the effort would be wildly greater than I
was prepared for, so I gave up.
Since then I've been thinking about what I learned about how Rust is
packaged in Debian. I think I can see how to fix some of the
problems. Although I don't want to go charging in and try to tell
everyone how to do things, I felt I ought at least to write up my
ideas. Hence this blog post, which may become the first of a series.
This post is going to be about semver handling. I see problems with
other aspects of dependency handling and source code management and
traceability as well, and of course if my ideas find favour in
principle, there are a lot of details that need to be worked out,
including some kind of transition plan.
How Debian packages Rust, and build vs runtime dependencies
Today I will be discussing almost entirely build-dependencies; Rust
doesn't (yet?) support dynamic linking, so built Rust binaries don't
have Rusty dependencies.
However, things are a bit confusing because even the Debian "binary"
packages for Rust libraries contain pure source code. So for a Rust
library package, "building" the Debian binary package from the Debian
source package does not involve running the Rust compiler; it's just
file-copying and format conversion. The library's Rust dependencies
do not need to be installed on the "build" machine for this.
So I'm mostly going to be talking about Depends fields, which are
Debian's way of talking about runtime dependencies, even though they
are used only at build-time. The way this works is that some ultimate
leaf package (which is supposed to produce actual executable code)
Build-Depends on the libraries it needs, and those Depends on their
under-libraries, so that everything needed is installed.
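To make the chain concrete, here is a minimal Python sketch of how a leaf package's Build-Depends, followed transitively through each library's Depends, determine what ends up installed. The package names and the DEPENDS map are hypothetical, and a real package manager also has to choose versions, handle alternatives and so on; this only shows the transitive closure.

```python
from collections import deque

# Hypothetical archive: each package maps to the packages it Depends on.
DEPENDS = {
    "librust-foo-dev": ["librust-bar-dev", "librust-baz-dev"],
    "librust-bar-dev": ["librust-baz-dev"],
    "librust-baz-dev": [],
    "librust-qux-dev": [],
}

def packages_to_install(build_depends, depends=DEPENDS):
    """Everything installed for a leaf package: its Build-Depends plus
    each library's Depends, followed transitively (breadth-first)."""
    seen = set()
    queue = deque(build_depends)
    while queue:
        pkg = queue.popleft()
        if pkg not in seen:
            seen.add(pkg)
            queue.extend(depends.get(pkg, []))
    return seen

# A hypothetical leaf program that Build-Depends on foo and qux:
print(sorted(packages_to_install(["librust-foo-dev", "librust-qux-dev"])))
# ['librust-bar-dev', 'librust-baz-dev', 'librust-foo-dev', 'librust-qux-dev']
```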
What do dependencies mean and what are they for anyway?
In systems where packages declare dependencies on other packages, it
generally becomes necessary to support "versioned" dependencies. In
all but the simplest systems, this involves an ordering (or
similar) on version numbers and a way for a package A to specify that
it depends on certain versions of B.
Both Debian and Rust have this. Rust upstream crates have version
numbers and can specify their dependencies according to semver.
Debian's dependency system can represent that.
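For reference, Cargo's default ("caret") requirements work like this: "1.2.3" means >= 1.2.3 and < 2.0.0, while for 0.x versions the leftmost non-zero component is the one treated as breaking, so "0.2.3" means >= 0.2.3 and < 0.3.0. A minimal sketch of a mechanical translation into a Debian-style version range follows. The package name is hypothetical, pre-release and build-metadata versions are ignored, and this is not necessarily how the Debian Rust tooling actually names or encodes things.

```python
def caret_bounds(req):
    """Lower (inclusive) and upper (exclusive) bounds of a Cargo caret
    requirement such as "1.2.3", "^0.2" or "0.0.3"."""
    parts = [int(p) for p in req.lstrip("^").split(".")]
    lower = parts + [0] * (3 - len(parts))
    # Bump the leftmost non-zero component; if every component given is
    # zero, bump the last one given.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[idx] += 1
    upper += [0] * (3 - len(upper))
    return tuple(lower), tuple(upper)

def _fmt(version):
    return ".".join(str(x) for x in version)

def debian_relation(pkg, req):
    """Naive translation of a caret requirement into Debian relations."""
    lo, hi = caret_bounds(req)
    return f"{pkg} (>= {_fmt(lo)}), {pkg} (<< {_fmt(hi)})"

print(debian_relation("librust-foo-dev", "0.2.3"))
# librust-foo-dev (>= 0.2.3), librust-foo-dev (<< 0.3.0)
```

Note that the translated range excludes B 0.3.0 outright, even if nobody has ever checked whether 0.3.0 actually breaks the depending package.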
So it was natural for the designers of the scheme for packaging Rust
code in Debian to simply translate the Rust version dependencies to
Debian ones. However, while the two dependency schemes seem
equivalent in the abstract, their concrete real-world semantics are
totally different.
These different package management systems have different practices
and different meanings for dependencies. (Interestingly, the Python
world also has debates about the meaning and proper use of dependency
versions.)
The epistemological problem
Consider some package A which is known to depend on B. In general, it
is not trivial to know which versions of B will be satisfactory.
I.e., whether a new B, with potentially-breaking changes, will
actually break A.
Sometimes tooling can be used which calculates this (eg, the Debian
shlibdeps system for runtime dependencies) but this is unusual -
especially for build-time dependencies. Which versions of B are OK
can normally only be discovered by a human consideration of
changelogs etc., or by having a computer try particular combinations.
Few ecosystems with dependencies, in the Free Software community at
least, make an attempt to precisely calculate the versions of B that
are actually required to build some A. So it turns out that there are
three cases for a particular combination of A and B: it is believed
to work; it is known not to work; or it is not known whether it
will work.
And I am not aware of any dependency system that has an explicit
machine-readable representation for the "unknown" state, so that it
can say something like "A is known to depend on B; versions of B
before v1 are known to break; version v2 is known to work".
(Sometimes statements like that can be found in human-readable docs.)
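Such a representation would need three states rather than two. Here is a hypothetical sketch using the example just given; the names Status and classify are invented for illustration and do not correspond to any existing dependency system, and versions are plain (major, minor, patch) tuples.

```python
from enum import Enum

class Status(Enum):
    KNOWN_GOOD = "known to work"
    KNOWN_BAD = "known to break"
    UNKNOWN = "not known whether it works"

def classify(b_version, bad_below, known_good):
    """Answer "does A work with this version of B?" with three states.
    bad_below: versions strictly below this are known to break.
    known_good: versions that have actually been found to work."""
    if b_version < bad_below:
        return Status.KNOWN_BAD
    if b_version in known_good:
        return Status.KNOWN_GOOD
    return Status.UNKNOWN  # no evidence either way

# The example from the text: B before v1 breaks A; B v2 works.
for v in [(0, 9, 0), (1, 5, 0), (2, 0, 0), (3, 0, 0)]:
    print(v, classify(v, bad_below=(1, 0, 0), known_good={(2, 0, 0)}).value)
```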
That leaves two possibilities for the semantics of a dependency "A
depends B, version(s) V..W":
* Precise: A will definitely work if B matches V..W.
* Optimistic: we have no reason to think B breaks with any of V..W.
At first sight the latter does not seem useful, since how would the
package manager find a working combination? Taking Debian as an
example, which uses optimistic version dependencies, the answer is as
follows: the primary information about which package versions to use
comes not only from the dependencies, but mostly from which Debian
release is being targeted. (Other systems using optimistic version
dependencies could use the date of the build, i.e. use only packages
that are "current".)
| | Precise | Optimistic |
|---|---|---|
| People involved in version management | Package developers, downstream developers/users. | Package developers, downstream developers/users, distribution QA and release managers. |
| Package developers declare versions V and dependency ranges V..W so that | It definitely works. | A wide range of B can satisfy the declared requirement. |
| The principal version data used by the package manager | Only dependency versions. | Contextual, eg, Releases - set(s) of packages available. |
| Version dependencies are for | Selecting working combinations (out of all that ever existed). | Sequencing (ordering) of updates; QA. |
| Expected use pattern by a downstream | Downstream can combine any declared-good combination. | Use a particular release of the whole system. Mixing-and-matching requires additional QA and remedial work. |
| Downstreams are protected from breakage by | Pessimistically updating versions and dependencies whenever anything might go wrong. | Whole-release QA. |
| A substantial deployment will typically contain | Multiple versions of many packages. | A single version of each package, except where there are actual incompatibilities which are too hard to fix. |
| Package updates are driven by | Top-down: Depending package updates the declared metadata. | Bottom-up: Depended-on package is updated in the repository for the work-in-progress release. |
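The difference in what drives version selection can be sketched as follows. All data here is hypothetical, versions are (major, minor, patch) tuples, and, as a simplifying assumption, the optimistic dependency carries only a lower bound.

```python
# Hypothetical data: every version of B ever published, and the single
# version of B shipped in each (hypothetical) release.
ALL_B_VERSIONS = [(1, 0, 0), (1, 4, 0), (2, 0, 0), (2, 1, 0)]
RELEASE_B = {"stable": (1, 4, 0), "testing": (2, 1, 0)}

def precise_pick(lower, upper):
    """Precise semantics: the declared range drives selection; take the
    newest version ever published that lies in [lower, upper)."""
    candidates = [v for v in ALL_B_VERSIONS if lower <= v < upper]
    return max(candidates) if candidates else None

def optimistic_pick(release, lower):
    """Optimistic semantics: the target release drives selection; the
    dependency only excludes versions with a known reason to break."""
    version = RELEASE_B[release]
    return version if version >= lower else None

print(precise_pick((1, 0, 0), (2, 0, 0)))     # (1, 4, 0)
print(optimistic_pick("testing", (1, 0, 0)))  # (2, 1, 0)
```

With A declaring the Cargo-style range 1.0.0..2.0.0, the precise picker searches everything ever published, whereas the optimistic picker simply takes what the targeted release contains; whether A actually works with B 2.1.0 is then a question for QA.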
So, while Rust and Debian have systems that look superficially
similar, they contain fundamentally different kinds of information.
Simply translating the Rust versions directly into Debian ones doesn't
work.
What is currently done by the Debian Rust Team is to manually patch
the dependency specifications, to relax them. This is very
labour-intensive, and there is little automation supporting either
decision-making or actually applying the resulting changes.
What to do
Desired end goal
To update a Rust package in Debian that many things depend on, one
need simply update that package.
Debian's sophisticated build and CI infrastructure will try building
all the reverse-dependencies against the new version. Packages that
actually fail against the new dependency are flagged as suffering
from release-critical problems.
Debian Rust developers then update those other packages too. If the
problems turn out to be too difficult, it is possible to roll back.
If a problem with a depending package is not resolved in a timely
fashion, priority is given to updating core packages, and the
depending package falls by the wayside (since it is empirically
unmaintainable, given available effort).
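The reverse-dependency rebuild step amounts to something like the following sketch. The function names and data are hypothetical; builds_ok stands in for a real rebuild or autopkgtest run on Debian's CI, which this code does not talk to.

```python
from collections import deque

def reverse_dependencies(pkg, depends):
    """Every package that depends on pkg, directly or indirectly.
    depends maps each package name to the names it depends on."""
    found, queue = set(), deque([pkg])
    while queue:
        target = queue.popleft()
        for name, deps in depends.items():
            if target in deps and name not in found:
                found.add(name)
                queue.append(name)
    return found

def triage(pkg, depends, builds_ok):
    """Rebuild each reverse-dependency against the new pkg (builds_ok is a
    stand-in for a real CI run) and split them into passes and failures;
    failures would be flagged as release-critical."""
    passed, failed = [], []
    for rdep in sorted(reverse_dependencies(pkg, depends)):
        (passed if builds_ok(rdep) else failed).append(rdep)
    return passed, failed

# Hypothetical archive: two libraries and two leaf programs.
DEPENDS = {
    "librust-b-dev": [],
    "librust-c-dev": ["librust-b-dev"],
    "rust-app-x": ["librust-c-dev"],
    "rust-app-y": ["librust-b-dev"],
}
# Pretend that only rust-app-y fails against the new librust-b-dev.
print(triage("librust-b-dev", DEPENDS, lambda p: p != "rust-app-y"))
# (['librust-c-dev', 'rust-app-x'], ['rust-app-y'])
```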
There is no routine manual patching of dependency metadata (or of
anything else).
Radical proposal
Debian should not precisely follow upstream Rust semver dependency
information. Instead, Debian should optimistically try the
combinations of packages that we want to have. The resulting
breakages will be discovered by automated QA; they will have to be
fixed by manual intervention of some kind, but usually, simply
updating the depending package will be sufficient.
This no longer ensures (unlike the upstream Rust scheme) that the
result is expected to build and work if the dependencies are
satisfied. But as discussed, we don't really need that property in
Debian. More important is the new property we gain: that we are able
to mix and match versions that we find work in practice, without a
great deal of manual effort.
Or to put it another way, in Debian we should do as a Rust upstream
maintainer does when they do the regular "update dependencies for new
semvers" task: we should update everything, see what breaks, and fix
those.
(In theory a Rust upstream package maintainer is supposed to do some
additional checks or something. But the practices are not
standardised and any checks one does almost never reveal anything
untoward, so in practice I think many Rust upstreams just update and
see what happens. The Rust upstream community has other mechanisms -
often, reactive ones - to deal with any problems. Debian should
subscribe to those same information sources, eg RustSec.)
Nobbling cargo
Somehow, when cargo is run to build Rust things against these Debian
packages, cargo's dependency system will have to be overridden so
that the version of the package that is actually selected by Debian's
package manager is used by cargo without complaint.
We probably don't want to change the Rust version numbers of Debian
Rust library packages, so this should be done by either presenting
cargo with an automatically-massaged Cargo.toml where the dependency
version restrictions are relaxed, or by using a modified version of
cargo which has special option(s) to relax certain dependencies.
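One possible shape for the "massaged Cargo.toml" approach is sketched below: take the already-parsed manifest (for example, as returned by Python 3.11's tomllib) and replace every version requirement in the dependency tables with "*", so that cargo should accept whichever version Debian has installed. This is an assumption about how it might be done, not existing Debian tooling; target-specific and workspace dependency tables are ignored, and a real tool would write the result back into the full manifest.

```python
DEP_TABLES = ("dependencies", "dev-dependencies", "build-dependencies")

def relax(manifest):
    """Given a parsed Cargo.toml (a dict), return its dependency tables
    with every version requirement replaced by "*"."""
    out = {}
    for table in DEP_TABLES:
        relaxed = {}
        for name, spec in manifest.get(table, {}).items():
            if isinstance(spec, str):      # serde = "1.0"
                relaxed[name] = "*"
            else:                          # serde = { version = "1.0", ... }
                spec = dict(spec)          # copy: leave the input untouched
                if "version" in spec:      # path/git-only deps are left alone
                    spec["version"] = "*"
                relaxed[name] = spec
        if relaxed:
            out[table] = relaxed
    return out

def toml_value(v):
    """Render the simple values found in dependency specs as TOML."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, str):
        return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(v, list):
        return "[" + ", ".join(toml_value(x) for x in v) + "]"
    if isinstance(v, dict):
        return "{ " + ", ".join(f"{k} = {toml_value(x)}" for k, x in v.items()) + " }"
    return str(v)

def render(tables):
    """Emit the relaxed tables as TOML text."""
    lines = []
    for table, deps in tables.items():
        lines.append(f"[{table}]")
        lines.extend(f"{name} = {toml_value(spec)}" for name, spec in deps.items())
    return "\n".join(lines)

manifest = {
    "package": {"name": "example", "version": "0.1.0"},
    "dependencies": {
        "anyhow": "1.0",
        "serde": {"version": "1.0.130", "features": ["derive"]},
    },
}
print(render(relax(manifest)))
```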
Handling breakage
Rust packages in Debian should already be provided with autopkgtests
so that ci.debian.net will detect build breakages. Build breakages
will stop the updated dependency from migrating to the
work-in-progress release, Debian testing.
To resolve this, and allow forward progress, we will usually upload a
new version of the dependency containing an appropriate Breaks, and
either file an RC bug against the depending package, or update it.
This can be done after the upload of the base package.
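The Breaks in the new upload of the dependency might be generated along these lines. The package names and versions are hypothetical. Using <= against the exact failing version means that any fixed upload of the depending package, which will have a higher version, no longer conflicts.

```python
def breaks_field(failures):
    """Build a Breaks field for the new upload of a dependency from the
    (binary package, version) pairs found to fail against it."""
    if not failures:
        return ""  # nothing broke, so no Breaks field is needed
    # '<=' marks only the failing version and older ones as broken.
    relations = [f"{pkg} (<= {version})" for pkg, version in sorted(failures)]
    return "Breaks: " + ", ".join(relations)

print(breaks_field([("rust-app-y", "0.4.2-1"), ("rust-app-w", "1.0.0-2")]))
# Breaks: rust-app-w (<= 1.0.0-2), rust-app-y (<= 0.4.2-1)
```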
Thus, resolution of breakage due to incompatibilities will be done
collaboratively within the Debian archive, rather than ad-hoc
locally. And it can be done without blocking.
My proposal prioritises the ability to make progress in the core,
over stability and in particular over retaining leaf packages. This
is not Debian's usual approach but given the Rust ecosystem's
practical attitudes to API design, versioning, etc., I think the
instability will be manageable. In practice fixing leaf packages is
not usually really that hard, but it's still work and the question is
what happens if the work doesn't get done. After all, we are always
short of effort - and we probably still will be, even if we get
rid of the makework clerical work of patching dependency versions
everywhere (so that usually no work is needed on depending packages).
Exceptions to the one-version rule
There will have to be some packages that we need to keep multiple
versions of. We won't want to update every depending package manually
when this happens. Instead, we'll probably want to set a version
number split: rdepends which want version