[HN Gopher] A tale of several distros joining forces for a commo...
___________________________________________________________________
A tale of several distros joining forces for a common goal:
reproducible builds
Author : todsacerdoti
Score : 98 points
Date : 2025-02-08 11:38 UTC (11 hours ago)
(HTM) web link (video.fosdem.org)
(TXT) w3m dump (video.fosdem.org)
| jmclnx wrote:
| It is very cool to see distros working together for a common
| goal.
|
| But I still do not understand the point of "reproducible builds".
| I know what they are, but to me the amount of work involved
| outweighs the benefit.
|
| I even heard NetBSD is also working on "reproducible builds". So
| maybe I am missing something :)
| david-gpu wrote:
| It's a safety measure. Reproducible builds ensure identical
| binaries are produced from the same source. They help detect
| e.g. hidden backdoors.
| samsartor wrote:
| The video gets into that. The main purpose is to verify that
| the binary you're running came from the actual source code.
| ssivark wrote:
| What makes you so confident that the benefit is less than the
| effort?
|
| Given the increasing likelihood of supply chain attacks, isn't
| this a very prudent precaution?
| 3s wrote:
| A really important application of reproducible builds is
| running code inside Secure Enclaves that has been committed to
| on a public transparency log. A client can connect to a remote
| secure enclave that can then prove to the client that it's
| running the commit code via a process known as remote
| attestation. It's pretty cool stuff. However it's only possible
| if the build inside the enclave is reproducible (deterministic)
| and always identical to the build on the transparency log
| NegativeLatency wrote:
| Would make stuff like this harder to pull off:
| https://en.wikipedia.org/wiki/XZ_Utils_backdoor
| nindalf wrote:
| Are you sure?
|
| If I'm understanding correctly, the malicious code was
| introduced as part of the test code, so no matter who
| compiled it, they'd get a binary with the same (malicious)
| functionality. Heck, it might even have been reproducibly
| malicious.
|
| The real crazy part was that it was modifying the
| functionality of sshd at _runtime_, allowing the attacker to
| log into any system.
|
| Reproducibility of either sshd or xz wouldn't have stopped
| this attack.
|
| That's my reading of https://research.swtch.com/xz-script.
| solarkraft wrote:
| I tend to agree. The exploit was (by detour) committed to
| the source.
| ramses0 wrote:
| Technically, interestingly, if `bazel` were used as the
| build tool, it would avoid the straightforward ability to
| cross-contaminate the build executable with the test
| code...
|
| Yeah, `bazel run test:...` would have access to the test
| files, but `bazel build xz:executable` would not (by
| default) be able to pull in extra shenanigans from the
| test files (and I think there's generally linting and
| formatting rules required by default with `BUILD.bazel`
| files, closing off another sneak vector)
| kpcyrd wrote:
| It unfortunately doesn't help in cases like this.
| Reproducible Builds gives you a trusted path from source to
| binary, but it doesn't help with backdoors in the source
| code/build instructions.
|
| For that we'd need some sort of source code reviewing effort
| like https://github.com/crev-dev/cargo-crev implements. I've
| started whatsrc.org to keep track of the source code inputs
| we're putting into our computers (that would benefit from
| reviews), but the conclusion is also somewhat "it's too
| much".
| champtar wrote:
| With reproducible builds you know that what you test on your dev
| laptop is the same as what will come out of your CI, and if the
| hashes mismatch you can chase down why. For a concrete example,
| the Mellanox driver configure script auto-detects whether it's
| running under Docker and changes a compile flag, so if you
| build in a container using Podman you get a different result.
| solarkraft wrote:
| > but to me the amount of work involved outweighs the benefit
|
| I don't know whether I'd spend this much work on such an
| abstract goal, but what reproducibility changes really is quite
| amazing. It vastly increases trust in published binaries and
| obviates the need for signing and the security benefit of
| compiling software yourself.
| gruez wrote:
| >obviates the need for signing and the security benefit of
| compiling software yourself
|
| Not really. Most people still would rely on signatures
| because they can't be expected to compile everything from
| scratch just to verify their download is authentic. Moreover
| even though reproducible builds make verification easier, it
| still requires someone to sound the alarm. For less popular
| packages there might be nobody checking any particular build
| is backdoored, because most people see "reproducible builds"
| and they assume Somebody Else is doing the reproduction.
| mjl- wrote:
| for transparency of reproducible builds of go applications,
| i made https://beta.gobuilds.org/. it compiles any publicly
| available go application on-demand, with a toolchain
| version of your choice (latest stable by default), for a
| platform of your choice. all (pure) go applications are
| reproducible by default, including when cross-compiled, and
| go toolchains run nothing provided by the go module
| (awesome properties!). the source code is verified through
| the go sum database (a transparency log containing go
| modules). the hash of the resulting binary is added to
| gobuild's own transparency log. so it can be publicly
| verified. the gobuilds service builds the binary itself,
| and has another instance (on a different platform & config)
| build the binary too, to ensure the binary is really
| reproducible (i'd like other instances that i don't run
| myself as secondaries too). i no longer publish binaries
| for my applications (that i write in go). i just point to
| the "latest"-build link for the go module at gobuilds. also
| makes it easy for users (including myself) to get new
| builds for new go toolchains (which may include fixes to
| the (relatively large, and often used) standard library).
|
| you still may not trust the public gobuilds instance. my
| hope is that people (eg software projects themselves, or
| distros, or other kinds of communities) will run & use
| their own gobuild instances and verify their builds against
| the public gobuilds service. win-win: gives them assurance
| their builds are really reproducible, and builds trust in
| the public gobuilds (keeping it honest, if someone sees a
| hash mismatch, they will speak up).
|
| i usually don't get much enthusiasm for it though. (:
| noirscape wrote:
| Practically speaking, the idea with a reproducible build is
| that you can take the source files they used, run their
| instructions the same way they did and get _hash for hash_ [0]
| the specific resulting executable.
|
| The main benefit is that you can trust that the resulting
| binary file being served matches the source code that it's
| built from. This mostly matters for distros in that they build
| from a source package repository, but anyone running a mirror
| could hypothetically replace the package with another
| (potentially malicious) package, leading users to install
| malicious tooling. It mainly matters for distros because pretty
| much every distro out there runs on third party mirrors (often
| run by universities, but also just people who want to help)
| rather than on direct upstream; packages get uploaded to a main
| server, then mirrors copy from that main server (to reduce
| network traffic load on the main server). Right now, mirror
| trust is mostly "we assume you're not gonna be evil, until we
| get complaints". If the build is reproducible, the software can
| inherently confirm that the file they're getting is
| trustworthy, making "getting complaints" much easier to
| confirm.
|
| It can also speed up the overall building process; if the
| package source code hasn't changed, you can also always assume
| that the resulting binary hasn't changed (meaning you can use
| hashes instead of relying on mtime like make does). Docker
| build cache works in a somewhat similar way (although docker
| isn't inherently deterministic).
|
| Devwise, you can also reconstruct a build much easier if it's
| reproducible; ie. if you've accidentally thrown away the .elf
| file for debugging, if your build is deterministic, you can
| just rerun the build and get the same .elf file again.
|
| [0]: While not a problem for Linux distros, in cases where you
| need a secret to sign an application, reproducible typically
| means "identical _except_ for the signature" instead. F-Droid
| uses this for example to figure out if they should use
| buildserver stuff or the original APKs:
| https://f-droid.org/docs/Reproducible_Builds/
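
The "hash for hash" check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `sha256_of` and `is_reproduced` are hypothetical, the sketch assumes you already have both the published artifact and your own rebuild on disk, and it does not handle the "identical except for the signature" case from the footnote.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(published: Path, rebuilt: Path) -> bool:
    """True when the rebuilt artifact is bit-for-bit identical to the published one."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A mismatch does not say which side is wrong, only that the two builds differ and someone needs to look into why.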
| yellow_lead wrote:
| > a mirror could hypothetically replace the package with
| another (potentially malicious) package, leading users to
| install malicious tooling.
|
| It was my assumption that a mirror is required to host a
| build that has a hash conforming to the original. Is that not
| the case?
| gruez wrote:
| More specifically the packages are signed by the distro and
| automatically checked, so a mirror can't go rogue even if
| it wanted to.
| jerf wrote:
| Yes, the real attack isn't that mirrors change the files,
| the real attack is that just because a distro packages
| Binary X and Source X, it is difficult without reproducible
| builds to prove that Source X actually did produce Binary
| X. It could have been compiled with a trojan in it between
| the source and binary.
| SkiFire13 wrote:
| > meaning you can use hashes instead of relying on mtime like
| make does
|
| Note that mtime still has the advantage of being faster than
| hashing.
| jcranmer wrote:
| The main benefit you'll hear touted is something along the
| lines of being able to get an attestation that the resulting
| artifact was built following the steps claimed to build it. I
| think that's a somewhat overstated benefit, though, as it's not
| clear to me that this is an avenue of attack used in practice,
| given the frequency with which software already has
| vulnerabilities usable for exploits, or the ease with which one
| can insert a backdoor into the source code (e.g., the xz
| backdoor).
|
| I think the actual main utility is that the process has done a
| very good job of rooting out several causes of unintentional
| nondeterminism in the build process. I say unintentional
| because the two main causes of unreproducibility, by several
| orders of magnitude, are timestamps being embedded _everywhere_
| and absolute paths being embedded everywhere, and those are
| rather expected. But some of the unreproducibility comes from
| things like accidental reliance on inodes in file paths (i.e.,
| doing "for file in listdir()" without sorting the results of
| listdir) or the compiler itself accidentally sorting based on
| pointer address (which is unreproducible on ASLR systems).
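
Two of the causes named above, embedded timestamps and unsorted directory listings, can be illustrated with a small Python sketch that packs a directory into a tar archive. The function name `deterministic_tar` is hypothetical, and falling back to 0 when `SOURCE_DATE_EPOCH` is unset is an assumption of this sketch, not something stated in the thread. Permission bits are still taken from the files on disk, so this removes only some sources of variation.

```python
import io
import os
import tarfile
from pathlib import Path


def deterministic_tar(source_dir: Path) -> bytes:
    """Pack source_dir into an uncompressed tar whose bytes depend only on
    file names, file contents and permission bits."""
    # SOURCE_DATE_EPOCH is the environment variable convention used by the
    # Reproducible Builds project; the default of 0 is this sketch's choice.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        # Sort explicitly: directory listing order depends on the filesystem.
        for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
            arcname = path.relative_to(source_dir).as_posix()
            info = archive.gettarinfo(str(path), arcname=arcname)
            info.mtime = epoch              # drop the real modification time
            info.uid = info.gid = 0         # drop the building user's identity
            info.uname = info.gname = ""
            with path.open("rb") as handle:
                archive.addfile(info, handle)
    return buffer.getvalue()
```

Two checkouts with the same contents should then yield the same archive bytes regardless of when or in which order the files were written.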
| uecker wrote:
| The xz backdoor was news because they went to a lot of effort
| to try to hide a backdoor in the source (actually a binary
| file in the source) and still failed. In contrast, without
| reproducible builds it is trivial for a maintainer with
| upload rights (or somebody who managed to get the credentials
| from a maintainer) to insert a backdoor into a binary. And it
| is then virtually impossible to detect.
| zelphirkalt wrote:
| It is also about reliably being able to build things a month
| from now, in a year, in 5 years, etc.
| anotherhue wrote:
| Reproducibility changes everything, because nothing changes. We
| go from shamans chanting incantations over a blessed code base to
| a mathematical function with an algebra of system composition.
|
| Let me give you the simplest example, when builds are
| reproducible you don't need package repositories, you need build
| caches.
|
| All the problems with maintaining a repository (save bandwidth)
| evaporate.
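
One way to picture the "build caches instead of repositories" idea is a cache keyed by a hash of everything that is allowed to influence the output. The Python below is a toy sketch under that assumption: `cache_key` and `BuildCache` are hypothetical names, the cache is in-memory only, and this is not how any particular package manager implements it.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(source_digest: str, build_command: List[str], dependency_keys: List[str]) -> str:
    """Derive a key from the source digest, the build command and the keys of all dependencies."""
    payload = json.dumps(
        {"source": source_digest, "command": build_command, "deps": sorted(dependency_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class BuildCache:
    """In-memory stand-in for a shared binary cache."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get_or_build(self, key: str, build: Callable[[], bytes]) -> bytes:
        # Reusing a cached output is only sound if the build is a pure
        # function of the inputs that went into the key.
        if key not in self._store:
            self._store[key] = build()
        return self._store[key]
```

The comment in `get_or_build` is the crux: without reproducibility, two builds under the same key could differ, and the cache would silently hand out whichever one happened to be stored first.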
| okanat wrote:
| > Let me give you the simplest example, when builds are
| reproducible you don't need package repositories, you need
| build caches.
|
| The type of reproducibility is different here. What you mention
| is possible via a stable compiler ABI already. However one
| needs to keep the source code the same. Without a stable
| compiler ABI, you may or may not fix it depending on what the
| compiler does.
|
| The goal of reproducible builds is removing sources of
| environment-dependent behavior at the build level instead of
| the compiler level. So given all the same dependencies and same
| build commands your binaries should match wherever and whenever
| you compile them. The distros and software developers also made
| a huge effort to remove any kind of environment-dependent
| commands.
|
| Different distros still have differences in the build commands
| they issue and the set of dependencies they enable. The space
| to cache each individual possible output would be enormous and
| impractical. So you would still need repositories.
| anotherhue wrote:
| All true, but I can't not take this opportunity to shill
| NixOS which has meaningfully addressed many of these issues,
| and is indeed spending impractical amounts of money storing
| build outputs ($10k/m).
|
| https://discourse.nixos.org/t/the-nixos-foundations-call-to-...
|
| It is absolutely better to remove build entropy at the source
| code stage, but until all software is written that way there
| are a few build-environment tricks we can use along the way.
| zelphirkalt wrote:
| I highly doubt that we will get anywhere close to
| developers understanding and valuing reproducibility any
| time soon. Maybe in 20y or so. Basically outside of the
| corners like Nix, Guix and maybe a few random people in
| discussions about issues of package managers, I have not met
| anyone who both knows how to achieve reproducibility and cares
| about it.
|
| Meanwhile I enjoy setting up my own GNU Guile projects with
| a Makefile that sets up a reproducible Guix shell in which
| my project is run, so that I get the same result on various
| devices, only needing to issue a single easily memorized or
| discoverable Makefile target call. Most developers I met
| don't know how to set something like this up. Provided no
| one messes with guix package manager commits and guix
| infrastructure still exists, my projects will run in 10y
| just like they do today, with reproducible result. Neat.
|
| Recently I have taken some time to set this up for OCaml as
| well. It took some asking on the mailing list, but it works now.
| No need to have anything installed prior to running the
| Makefile target other than Make and Guix package manager.
| anotherhue wrote:
| I assume you tried Flakes also? Does Guile have a better
| approach to this problem (other than the general language
| difference (everyone hates nix))?
|
| Edit: Worth looking at
| https://github.com/numtide/devshell
| zelphirkalt wrote:
| I have not tried Flakes. Guile's main way of installing
| packages, I believe, is using Guix package manager. And
| Guix is written using Guile.
| jimmaswell wrote:
| > The space to cache each individual possible output would be
| enormous and impractical. So you would still need
| repositories.
|
| I've been using Gentoo for the past few weeks and this was my
| thought. There are so many ways to compile packages depending
| on your needs. It is practical to make binary sets for a few
| common configurations that are good enough for most people
| for each distro, though.
|
| (Installing and using Gentoo has been an incredible learning
| experience. You have to go into it with the _desire_ to take
| long tangents filling the gaps in your knowledge re: shared
| libraries, kernel modules, your init system of choice, gcc,
| your windowing system of choice, bootloaders, bash, etc. I
| feel like I've made more of a quantum leap in my Linux
| skills the past month than I have in years, and it's been
| quite fun and rewarding.)
|
| > Different distros still have differences in the build
| commands they issue and the set of dependencies they enable.
|
| Would it even make sense in theory for this not to be the
| case? What is a Linux distro but a set of programs,
| libraries, and environment choices chosen to be run on the
| Linux kernel?
| kbaker wrote:
| The Yocto Project (from the embedded space) has
| reproducibility as one of its goals. Since everything is
| built from source and from scratch, even the build
| toolchains, it is not too big of a step.
|
| Seems like Yocto would make the base for a good general
| purpose desktop distro (if there is not one out there
| already.)
| BiteCode_dev wrote:
| You mean you don't need dep resolution, signatures and
| moderation? How so?
| anotherhue wrote:
| Good questions:
|
| 1. If the deps are also themselves reproducible then you
| refer to a fixed point version of them and (at least for nix)
| the package manager works out the rest.
|
| 2. Signatures are a trust mechanism, if a cache feeds you bad
| data in response to a query then that's absolutely an issue,
| but since there can be multiple caches (or your own local
| spot-checks) it becomes easier to detect if a cache is
| returning bad data. A hyper-targeted attack would still get
| you unless you decide to manually build certain packages, but
| that's no different than existing repos.
|
| Manually building might sound impractical but it doesn't
| actually take that long, probably less than a day for a
| desktop environment, which might be acceptable in a high-trust
| environment if amortized. I should add that the process is
| fully automatic, it just takes longer than using a cache.
|
| 3. Moderation I don't have a good answer to, anyone can run
| an apt server, or publish a flake.nix file to a repo. Some
| would say it is censorship resistant.
| XorNot wrote:
| This doesn't work at all: a build cache is meaningless in this
| context because the only thing you can do is rebuild the code
| to verify that what is in the cache is what comes from the code
| you expect...or to bootstrap the whole compiler chain, then use
| cryptographic signatures to chain trust from a matching
| compiler hash up to all the dependent outputs.
|
| It certainly doesn't change the nature of packaging in any real
| way in that case.
| anotherhue wrote:
| Others can rebuild the chain and the results can be compared.
| Without reproducibility there is no concept of comparison.
| XorNot wrote:
| Which is still a cryptographic trust system - i.e. who are
| the others who built it and are they suitably independent
| or working from similar sources?
| lowkey wrote:
| Original link didn't work for me. Here is the source
| https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...
| pelasaco wrote:
| I think a distro like Talos Linux, with only 12 binaries, will
| find this much easier to accomplish.
| kpcyrd wrote:
| hello, I'm one of the speakers. I've been working on this since
| 2017, happy to answer any questions the hackernews crowd might
| have.
| algo_trader wrote:
| looking at [1][2] are the slides available online? i am on
| chrome, getting just static html
|
| [1] https://salsa.debian.org/reproducible-builds/reproducible-pr...
| [2] https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...
| foobazfoo wrote:
| Fantastic project. Thank you for all your efforts.
|
| Regarding the reproducible bootstrapping problem, what is your
| project's policy on building from binary sources? For instance,
| Zig is written in zig and bootstraps from a binary wasm file
| which is translated to C:
| https://github.com/ziglang/zig/tree/master/stage1
|
| Golang has an even more complicated bootstrapping procedure
| requiring you to build each successive version of the compiler to
| get to the most recent version.
___________________________________________________________________
(page generated 2025-02-08 23:00 UTC)