https://diziet.dreamwidth.org/10559.html

Ian Jackson

Debian's approach to Rust - Dependency handling

Jan. 3rd, 2022 06:16 pm
[personal profile] diziet

tl;dr: Faithfully following upstream semver, in Debian package
dependencies, is a bad idea.

Introduction

I have been involved in Debian for a very long time. And I've been
working with Rust for a few years now. Late last year I had cause to
try to work on Rust things within Debian.

When I did, I found it very difficult. The Debian Rust Team were very
helpful. However, the workflow and tooling require very large amounts
of manual clerical work - work which it is almost impossible to do
correctly since the information required does not exist. I had wanted
to package a fairly straightforward program I had written in Rust,
partly as a learning exercise. But, unfortunately, after I got stuck
in, it looked to me like the effort would be wildly greater than I
was prepared for, so I gave up.

Since then I've been thinking about what I learned about how Rust is
packaged in Debian. I think I can see how to fix some of the
problems. Although I don't want to go charging in and try to tell
everyone how to do things, I felt I ought at least to write up my
ideas. Hence this blog post, which may become the first of a series.

This post is going to be about semver handling. I see problems with
other aspects of dependency handling and source code management and
traceability as well, and of course if my ideas find favour in
principle, there are a lot of details that need to be worked out,
including some kind of transition plan.

How Debian packages Rust, and build vs runtime dependencies

Today I will be discussing almost entirely build-dependencies; Rust
doesn't (yet?) support dynamic linking, so built Rust binaries don't
have Rusty dependencies.

However, things are a bit confusing because even the Debian "binary"
packages for Rust libraries contain pure source code. So for a Rust
library package, "building" the Debian binary package from the Debian
source package does not involve running the Rust compiler; it's just
file-copying and format conversion. The library's Rust dependencies
do not need to be installed on the "build" machine for this.

So I'm mostly going to be talking about Depends fields, which are
Debian's way of talking about runtime dependencies, even though they
are used only at build-time. The way this works is that some ultimate
leaf package (which is supposed to produce actual executable code)
Build-Depends on the libraries it needs, and those Depends on their
under-libraries, so that everything needed is installed.

What do dependencies mean and what are they for anyway?

In systems where packages declare dependencies on other packages, it
generally becomes necessary to support "versioned" dependencies. In
all but the most simple systems, this involves an ordering (or
similar) on version numbers and a way for a package A to specify that
it depends on certain versions of B.

Both Debian and Rust have this. Rust upstream crates have version
numbers and can specify their dependencies according to semver.
Debian's dependency system can represent that.

So it was natural for the designers of the scheme for packaging Rust
code in Debian to simply translate the Rust version dependencies to
Debian ones. However, while the two dependency schemes seem
equivalent in the abstract, their concrete real-world semantics are
totally different.

These different package management systems have different practices
and different meanings for dependencies. (Interestingly, the Python
world also has debates about the meaning and proper use of dependency
versions.)

The epistemological problem

Consider some package A which is known to depend on B. In general, it
is not trivial to know which versions of B will be satisfactory.
I.e., whether a new B, with potentially-breaking changes, will
actually break A.

Sometimes tooling can be used which calculates this (eg, the Debian
shlibdeps system for runtime dependencies) but this is unusual -
especially for build-time dependencies. Which versions of B are OK
can normally only be discovered by a human consideration of
changelogs etc., or by having a computer try particular combinations.

Few ecosystems with dependencies, in the Free Software community at
least, make an attempt to precisely calculate the versions of B that
are actually required to build some A. So it turns out that there are
three cases for a particular combination of A and B: it is believed
to work; it is known not to work; and: it is not known whether it
will work.

And, I am not aware of any dependency system that has an explicit
machine-readable representation for the "unknown" state, so that they
can say something like "A is known to depend on B; versions of B
before v1 are known to break; version v2 is known to work".
(Sometimes statements like that can be found in human-readable docs.)

That leaves two possibilities for the semantics of a dependency "A
depends on B, version(s) V..W":

  * Precise: A will definitely work if B matches V..W.

  * Optimistic: We have no reason to think that any B in V..W breaks
    A.
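The three knowledge states and the two interpretations can be modelled in a few lines of Python. This is only an illustrative sketch: the `Knowledge` enum, the helper names and the version tuples are hypothetical, and no real dependency system is claimed to store this information.

```python
from enum import Enum


class Knowledge(Enum):
    WORKS = "believed to work"
    BREAKS = "known not to work"
    UNKNOWN = "not known whether it will work"


def classify(b_version, known_bad_below, known_good):
    """Classify one version of B, given what A's maintainer actually knows."""
    if b_version < known_bad_below:
        return Knowledge.BREAKS
    if b_version in known_good:
        return Knowledge.WORKS
    return Knowledge.UNKNOWN


def precise_allows(k):
    # Precise: only combinations positively believed to work are admitted.
    return k is Knowledge.WORKS


def optimistic_allows(k):
    # Optimistic: anything not known to be broken is admitted.
    return k is not Knowledge.BREAKS


# "Versions of B before v1 are known to break; version v2 is known to work."
versions = [(0, 9), (1, 0), (1, 5), (2, 0), (3, 0)]
state = {v: classify(v, known_bad_below=(1, 0), known_good={(2, 0)})
         for v in versions}
precise = [v for v in versions if precise_allows(state[v])]
optimistic = [v for v in versions if optimistic_allows(state[v])]
```

The two readings differ only in how they treat the "unknown" versions: the precise reading excludes them, the optimistic reading admits them and leaves it to something else (such as release QA) to catch actual breakage.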

At first sight the latter does not seem useful, since how would the
package manager find a working combination? Taking Debian as an
example, which uses optimistic version dependencies, the answer is as
follows: the primary information about which package versions to use
comes not from the dependencies alone, but mostly from which Debian
release is being targeted. (Other systems using optimistic version
dependencies could use the date of the build, i.e. use only packages
that are "current".)

                      Precise                  Optimistic

People involved in    Package developers,      Package developers,
version management    downstream developers/   downstream developers/
                      users.                   users, distribution QA
                                               and release managers.

Package developers    It definitely works.     A wide range of B can
declare versions V                             satisfy the declared
and dependency                                 requirement.
ranges V..W so that

The principal         Only dependency          Contextual, eg, Releases -
version data used     versions.                set(s) of packages
by the package                                 available.
manager

Version               Selecting working        Sequencing (ordering) of
dependencies are      combinations (out of     updates; QA.
for                   all that ever
                      existed).

Expected use          Downstream can combine   Use a particular release
pattern by a          any declared-good        of the whole system.
downstream            combination.             Mixing-and-matching
                                               requires additional QA
                                               and remedial work.

Downstreams are       Pessimistically          Whole-release QA.
protected from        updating versions and
breakage by           dependencies whenever
                      anything might go
                      wrong.

A substantial         Multiple versions of     A single version of each
deployment will       many packages.           package, except where
typically contain                              there are actual
                                               incompatibilities which
                                               are too hard to fix.

Package updates are   Top-down:                Bottom-up:
driven by             Depending package        Depended-on package is
                      updates the declared     updated in the repository
                      metadata.                for the work-in-progress
                                               release.

So, while Rust and Debian have systems that look superficially
similar, they contain fundamentally different kinds of information.
Simply translating the Rust version requirements directly into Debian
dependencies doesn't work.

What is currently done by the Debian Rust Team is to manually patch
the dependency specifications, to relax them. This is very
labour-intensive, and there is little automation supporting either
decisionmaking or actually applying the resulting changes.

What to do

Desired end goal

To update a Rust package in Debian that many things depend on, one
need simply update that package.

Debian's sophisticated build and CI infrastructure will try building
all the reverse-dependencies against the new version. Packages that
actually fail against the new dependency are flagged as suffering
from release-critical problems.
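The flow just described amounts to a walk over reverse dependencies. The sketch below is an illustrative Python model only: the package names, the `depends` mapping and the `builds_ok` callback are hypothetical stand-ins for archive metadata and for an actual rebuild, and it does not describe how Debian's CI is implemented.

```python
from collections import deque


def reverse_dependencies(depends, updated):
    """Return every package that directly or transitively depends on `updated`."""
    rdeps = {}
    for pkg, deps in depends.items():
        for dep in deps:
            rdeps.setdefault(dep, set()).add(pkg)
    seen, queue = set(), deque([updated])
    while queue:
        for parent in rdeps.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def flag_breakage(depends, updated, builds_ok):
    """Try each reverse dependency; return the sorted names of those that fail."""
    return sorted(p for p in reverse_dependencies(depends, updated)
                  if not builds_ok(p))


# Hypothetical archive: two packages use libbase directly, one indirectly.
depends = {
    "app": ["libhigh"],
    "libhigh": ["libbase"],
    "tool": ["libbase"],
    "other": ["libelse"],
}
affected = reverse_dependencies(depends, "libbase")
broken = flag_breakage(depends, "libbase", builds_ok=lambda pkg: pkg != "tool")
```

Only the packages in `broken` need human attention; everything else in `affected` was retried and simply carries on with the new version.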

Debian Rust developers then update those other packages too. If the
problems turn out to be too difficult, it is possible to roll back.

If a problem with a depending package is not resolved in a timely
fashion, priority is given to updating core packages, and the
depending package falls by the wayside (since it is empirically
unmaintainable, given available effort).

There is no routine manual patching of dependency metadata (or of
anything else).

Radical proposal

Debian should not precisely follow upstream Rust semver dependency
information. Instead, Debian should optimistically try the
combinations of packages that we want to have. The resulting
breakages will be discovered by automated QA; they will have to be
fixed by manual intervention of some kind, but usually, simply
updating the depending package will be sufficient.

This no longer ensures (unlike the upstream Rust scheme) that the
result is expected to build and work if the dependencies are
satisfied. But as discussed, we don't really need that property in
Debian. More important is the new property we gain: that we are able
to mix and match versions that we find work in practice, without a
great deal of manual effort.

Or to put it another way, in Debian we should do as a Rust upstream
maintainer does when they do the regular "update dependencies for new
semvers" task: we should update everything, see what breaks, and fix
those.

(In theory a Rust upstream package maintainer is supposed to do some
additional checks or something. But the practices are not
standardised and any checks one does almost never reveal anything
untoward, so in practice I think many Rust upstreams just update and
see what happens. The Rust upstream community has other mechanisms -
often, reactive ones - to deal with any problems. Debian should
subscribe to those same information sources, eg RustSec.)

Nobbling cargo

Somehow, when cargo is run to build Rust things against these Debian
packages, cargo's dependency system will have to be overridden so
that the version of the package that is actually selected by Debian's
package manager is used by cargo without complaint.

We probably don't want to change the Rust version numbers of Debian
Rust library packages, so this should be done by either presenting
cargo with an automatically-massaged Cargo.toml where the dependency
version restrictions are relaxed, or by using a modified version of
cargo which has special option(s) to relax certain dependencies.
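To make the first option concrete, here is a minimal Python sketch of relaxing the version requirements in an already-parsed `[dependencies]` table. It assumes that "relaxing" means turning a default (caret) requirement such as `1.2.3` into the lower bound `>=1.2.3`; that choice, the function names and the example crates are assumptions for illustration, not an existing tool. Requirements in any other form (exact, tilde, ranges, pre-releases) are deliberately left untouched in this sketch.

```python
import re

# Matches a bare or caret requirement such as "1.2.3", "^0.4" or "1".
_CARET = re.compile(r"^\^?(\d+(?:\.\d+){0,2})$")


def relax_requirement(req):
    """Turn a caret requirement into a lower bound only; leave other forms alone."""
    m = _CARET.match(req.strip())
    return f">={m.group(1)}" if m else req


def relax_dependencies(deps):
    """Relax every dependency in a parsed [dependencies] table."""
    relaxed = {}
    for name, spec in deps.items():
        if isinstance(spec, str):
            relaxed[name] = relax_requirement(spec)
        elif "version" in spec:
            relaxed[name] = {**spec, "version": relax_requirement(spec["version"])}
        else:
            relaxed[name] = spec  # path or git dependency: nothing to relax
    return relaxed


# Hypothetical dependency table, as a TOML parser might return it.
deps = {
    "serde": {"version": "1.0.130", "features": ["derive"]},
    "rand": "^0.8",
    "log": "=0.4.14",
    "local": {"path": "../local"},
}
relaxed = relax_dependencies(deps)
```

With the upper bound gone, whichever newer version the Debian package manager installed would satisfy the requirement, which is the property wanted here.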

Handling breakage

Rust packages in Debian should already be provided with autopkgtests
so that ci.debian.net will detect build breakages. Build breakages
will stop the updated dependency from migrating to the
work-in-progress release, Debian testing.

To resolve this, and allow forward progress, we will usually upload a
new version of the dependency containing an appropriate Breaks, and
either file an RC bug against the depending package, or update it.
This can be done after the upload of the base package.

Thus, resolution of breakage due to incompatibilities will be done
collaboratively within the Debian archive, rather than ad-hoc
locally. And it can be done without blocking.

My proposal prioritises the ability to make progress in the core,
over stability and in particular over retaining leaf packages. This
is not Debian's usual approach but given the Rust ecosystem's
practical attitudes to API design, versioning, etc., I think the
instability will be manageable. In practice fixing leaf packages is
not usually really that hard, but it's still work and the question is
what happens if the work doesn't get done. After all, we are always
short of effort - and we probably still will be, even if we get
rid of the makework clerical work of patching dependency versions
everywhere (so that usually no work is needed on depending packages).

Exceptions to the one-version rule

There will have to be some packages that we need to keep multiple
versions of. We won't want to update every depending package manually
when this happens. Instead, we'll probably want to set a version
number split: rdepends which want version <X will get the old one.

Details - a sketch

I'm going to sketch out some of the details of a scheme I think would
work. But I haven't thought this through fully. This is still mostly
at the handwaving stage. If my ideas find favour, we'll have to do
some detailed review and consider a whole bunch of edge cases I'm
glossing over.

The dependency specification consists of two halves: the depending
.deb's Depends (or, for a leaf package, Build-Depends) and the base
.deb's Version and perhaps Breaks and Provides.

Even though libraries vastly outnumber leaf packages, we still want
to avoid updating leaf Debian source packages simply to bump
dependencies.

Dependency encoding proposal

Compared to the existing scheme, I suggest we implement the
dependency relaxation by changing the depended-on package, rather
than the depending one.

So we retain roughly the existing semver translation for Depends
fields. But we drop all local patching of dependency versions.

Into every library source package we insert a new Debian-specific
metadata file declaring the earliest version that we uploaded. When
we translate a library source package to a .deb, the "binary" package
build adds Provides for every previous version.
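A rough Python sketch of how such Provides entries might be generated follows. Everything specific in it is an assumption made for illustration: the `librust-<crate>-<series>-dev` naming pattern, the idea that the metadata lists the earlier semver series still claimed, and the function names.

```python
def api_series(version):
    """Semver compatibility series: "1" for 1.x.y, "0.4" for 0.4.y."""
    major, minor, *_ = version.split(".") + ["0", "0"]
    return major if major != "0" else f"0.{minor}"


def provides_entries(crate, current, earlier_series, debian_revision="1"):
    """Provides for the current series plus every earlier series still claimed."""
    full = f"{current}-{debian_revision}"
    series = list(earlier_series) + [api_series(current)]
    return [f"librust-{crate}-{s}-dev (= {full})" for s in series]


# Hypothetical crate "foo": first uploaded in the 0.3 series, now at 1.2.0.
entries = provides_entries("foo", "1.2.0", earlier_series=["0.3", "0.4"])
```

A depending package that still asks for the 0.3 series of the hypothetical `foo` would then be satisfied by the 1.2.0 upload, and it is up to CI to show whether that actually builds.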

The effect is that when one updates a base package, the usual
behaviour is to simply try to use it to satisfy everything that
depends on that base package. The Debian CI will report the build or
test failures of all the depending packages which the API changes
broke.

We will have a choice, then:

Breakage handling - update broken depending packages individually

If there are only a few packages that are broken, for each broken
dependency, we add an appropriate Breaks to the base binary package.
(The version field in the Breaks should be chosen narrowly, so that
it is possible to resolve it without changing the major version of
the dependency, eg by making a minor source change.)
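As a small illustration of "chosen narrowly", the Python sketch below formats a Breaks field that names each broken package only up to the last version known to fail, so that any later upload of that package escapes it. The package names and versions are hypothetical.

```python
def breaks_field(broken):
    """Format a Breaks field from {package: last Debian version known to fail}."""
    entries = [f"{pkg} (<= {ver})" for pkg, ver in sorted(broken.items())]
    return "Breaks: " + ", ".join(entries)


field = breaks_field({
    "librust-baz-dev": "2.0.0-3",
    "librust-bar-dev": "0.4.2-1",
})
```

Because the bound is the exact failing version rather than a whole major series, even a fix uploaded with only a bumped Debian revision would no longer be covered by the Breaks.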

We can then do one of the following:

  * Update the dependency from upstream, to a version which works
    with the new base. (Assuming there is one.) This should be the
    usual response.

  * Fix the dependency source code so that it builds and works with
    the new base package. If this wasn't just a backport of an
    upstream change, we should send our fix upstream. (We should
    prefer updating the whole package to backporting an API
    adjustment.)

  * File an RC bug against the dependency (which will eventually
    trigger autoremoval), or preemptively ask for the Debian release
    managers to remove the dependency from the work-in-progress
    release.

Breakage handling - declare new incompatible API in Debian

If the API changes are widespread and many dependencies are affected,
we should represent this by changing the in-Debian-source-package
metadata to arrange for fewer Provides lines to be generated -
withdrawing the Provides lines for earlier APIs.

Hopefully examination of the upstream changelog will show what the
main compat break is, and therefore tell us which Provides we still
want to retain.

This is like declaring Breaks for all the rdepends. We should do it
if many rdepends are affected.

Then, for each rdependency, we must choose one of the responses in
the bullet points above. In practice this will often be a mass bug
filing campaign, or large update campaign.

Breakage handling - multiple versions

Sometimes there will be a big API rewrite in some package, and we
can't easily update all of the rdependencies because the upstream
ecosystem is fragmented and the work involved in reconciling it all
is too substantial.

When this happens we will bite the bullet and include multiple
versions of the base package in Debian. The old version will become a
new source package with a version number in its name.

This is analogous to how key C/C++ libraries are handled.

Downsides of this scheme

The first obvious downside is that assembling some arbitrary set of
Debian Rust library packages, that satisfy the dependencies declared
by Debian, is no longer necessarily going to work. The combinations
that Debian has tested - Debian releases - will work, though. And at
least, any breakage will affect only people building Rust code using
Debian-supplied libraries.

Another less obvious problem is that because there is no such thing
as Build-Breaks (in a Debian binary package), the per-package update
scheme may result in no way to declare that a particular library
update breaks the build of a particular leaf package. In other words,
old source packages might no longer build when exposed to newer
versions of their build-dependencies, taken from a newer Debian
release. This is a thing that already happens in Debian, with source
packages in other languages, though.

Semver violation

I am proposing that Debian should routinely compile Rust packages
against dependencies in violation of the declared semver, and ship
the results to Debian's millions of users.

This sounds quite alarming! But I think it will not in fact lead to
shipping bad binaries, for the following reasons:

The Rust community strongly values safety (in a broad sense) in its
APIs. An API which is merely capable of insecure (or other seriously
bad) use is generally considered to be wrong. For example, such
situations are regarded as vulnerabilities by the RustSec project,
even if there is no suggestion that any actually-broken caller source
code exists, let alone that actually-broken compiled code is likely.

The Rust community also values alerting programmers to problems.
Nontrivial semantic changes to APIs are typically accompanied not
merely by a semver bump, but also by changes to names or types,
precisely to ensure that broken combinations of code do not compile.

Or to look at it another way, in Debian we would simply be doing what
many Rust upstream developers routinely do: bump the versions of
their dependencies, and throw it at the wall and hope it sticks. We
can mitigate the risks the same way a Rust upstream maintainer would:
when updating a package we should of course review the upstream
changelog for any gotchas. We should look at RustSec and other
upstream ecosystem tracking and authorship information.

Difficulties for another day

As I said, I see some other issues with Rust in Debian.

  * I think the library "feature flag" encoding scheme is
    unnecessary. I hope to explain this in a future essay.

  * I found Debian's approach to handling the source code for its
    Rust packages quite awkward; and, it has some troubling
    properties. Again, I hope to write about this later.

  * I get the impression that updating rustc in Debian is a very
    difficult process. I haven't worked on this myself and I don't
    feel qualified to have opinions about it. I hope others are
    thinking about how to make things easier.

Thanks all for your attention!

Tags:

  * computers,
  * debian,
  * rust

---------------------------------------------------------------------

  * 6 comments

(no subject)

Date: 2022-01-04 05:48 am (UTC)
From: (Anonymous)
This sounds like it will have the net effect of making Rust packages
more complex to package, by forcing the maintainer to do work that
upstream has not yet done (or has done and already found to not
work). That seems much more complex than just matching upstream
dependency constraints.

It seems doable for a maintainer who is already well-integrated
upstream and doing some of that work already. It seems much less
doable by a random maintainer who is just trying to package some
other package and needs to package some dependencies in the process.
In particular, this approach seems much less scalable.

(no subject)

Date: 2022-01-04 10:32 am (UTC)
From: [personal profile] diziet
I don't agree. Especially given my experience of being "a random
maintainer who is just trying to package some other package". We
should not match the upstream dependency constraints because they
are usually not real constraints, just a product of the semver
policy, as I have explained.

But I don't think these blog comments are a good place for this
debate.

(no subject)

Date: 2022-01-06 02:34 am (UTC)
From: (Anonymous)
As a developer I'm fine with you upgrading my dependencies, as long
as you're moving to newer, not older versions.

I'm afraid that you will find that it is laborious and if I haven't
done it myself, it's because it's a chore and/or it breaks something
in a complex way.

And you really really need to have latest stable Rust all the time. I
currently explicitly tell users not to use Debian's Rust, because
it's a relic compatible with only 30% of maintained crates.

(no subject)

Date: 2022-01-07 11:25 am (UTC)
From: [personal profile] emperor
As a sysadmin, this sort of approach makes me sad; I look after a
number of machines running a wide range of software written in a
number of languages. I cannot (and don't want to) track the security
(et al) issues in all the transitive dependencies and runtimes of all
of that lot, that's why I want to use a distribution. For particular
edge cases I might end up using upstream versions of things, but each
one of those is an ongoing commitment of time and effort.

(no subject)

Date: 2022-01-07 11:59 am (UTC)
From: [personal profile] diziet
FTR I don't think the comments section of my blog is the right place
for yet another round of the perennial complaints from upstreams who
feel they are entitled to dictate what distributions might do for
their users. Nor for another round of the perennial "distributions
are obsolete" nonsense.

I have already moderated out two comments along those lines, which I
additionally felt were quite rude.

(no subject)

Date: 2022-01-12 12:02 am (UTC)
From: [personal profile] tglman
for this:
`We probably don't want to change the Rust version numbers of Debian
Rust library packages, so this should be done by either presenting
cargo with an automatically-massaged Cargo.toml where the dependency
version restrictions are relaxed, or by using a modified version of
cargo which has special option(s) to relax certain dependencies.'

could make sense to generate directly a Cargo.lock
