hngopher.com

       [HN Gopher] A tale of several distros joining forces for a commo...
       ___________________________________________________________________
        
       A tale of several distros joining forces for a common goal:
       reproducible builds
        
       Author : todsacerdoti
       Score  : 98 points
       Date   : 2025-02-08 11:38 UTC (11 hours ago)
        
 (HTM) web link (video.fosdem.org)
 (TXT) w3m dump (video.fosdem.org)
        
       | jmclnx wrote:
       | It is very cool to see distros working together for a common
       | goal.
       | 
       | But I still do not understand the point of "reproducible builds".
       | I know what they are, but to me the amount of work involved
       | outweighs the benefit.
       | 
       | I even heard NetBSD is also working on "reproducible builds". So
       | maybe I am missing something :)
        
         | david-gpu wrote:
         | It's a safety measure. Reproducible builds ensure identical
         | binaries are produced from the same source. They help detect
         | e.g. hidden backdoors.
        
         | samsartor wrote:
         | The video gets into that. The main purpose is to verify that
         | the binary you're running came from the actual source code.
        
         | ssivark wrote:
         | What makes you so confident that the benefit is less than the
         | effort?
         | 
         | Given the increasing likelihood of supply chain attacks, isn't
         | this a very prudent precaution?
        
         | 3s wrote:
         | A really important application of reproducible builds is
         | running code inside Secure Enclaves that has been committed to
         | on a public transparency log. A client can connect to a remote
         | secure enclave that can then prove to the client that it's
         | running the commit code via a process known as remote
         | attestation. It's pretty cool stuff. However it's only possible
         | if the build inside the enclave is reproducible (deterministic)
         | and always identical to the build on the transparency log
        
         | NegativeLatency wrote:
         | Would make stuff like this harder to pull off:
         | https://en.wikipedia.org/wiki/XZ_Utils_backdoor
        
           | nindalf wrote:
           | Are you sure?
           | 
           | If I'm understanding correctly, the malicious code was
           | introduced as part of the test code, so no matter who
           | compiled it, they'd get a binary with the same (malicious)
           | functionality. Heck, it might even have been reproducibly
           | malicious.
           | 
           | The real crazy part was that it was modifying the
           | functionality of sshd at _runtime_, allowing the attacker to
           | log into any system.
           | 
           | Reproducibility of either sshd or xz wouldn't have stopped
           | this attack.
           | 
           | That's my reading of https://research.swtch.com/xz-script.
        
             | solarkraft wrote:
             | I tend to agree. The exploit was (by detour) committed to
             | the source.
        
               | ramses0 wrote:
               | Technically, interestingly, if `bazel` were used as the
               | build tool, it would avoid the straightforward ability to
               | cross-contaminate the build executable with the test
               | code...
               | 
               | Yeah, `bazel run test:...` would have access to the test
               | files, but `bazel build xz:executable` would not (by
               | default) be able to pull in extra shenanigans from the
               | test files (and I think there are generally linting and
               | formatting rules required by default with `BUILD.bazel`
               | files, closing off another sneak vector)
        
           | kpcyrd wrote:
           | It unfortunately doesn't help in cases like this.
           | Reproducible Builds gives you a trusted path from source to
           | binary, but it doesn't help with backdoors in the source
           | code/build instructions.
           | 
           | For that we'd need some sort of source code reviewing effort
           | like https://github.com/crev-dev/cargo-crev implements. I've
           | started whatsrc.org to keep track of the source code inputs
           | we're putting into our computers (that would benefit from
           | reviews), but the conclusion is also somewhat "it's too
           | much".
        
         | champtar wrote:
         | With reproducible builds you know that what you test on your dev
         | laptop is the same as what will come out of your CI, and if the
         | hashes mismatch you can chase down why. For a concrete example,
         | the Mellanox driver configure script auto-detects whether it's
         | running under docker and changes a compile flag, so if you
         | build in a container using podman you get a different result.
        
         | solarkraft wrote:
         | > but to me the amount of work involved outweighs the benefit
         | 
         | I don't know whether I'd spend this much work on such an
         | abstract goal, but what reproducibility changes really is quite
         | amazing. It vastly increases trust in published binaries and
         | obviates the need for signing and the security benefit of
         | compiling software yourself.
        
           | gruez wrote:
           | >obviates the need for signing and the security benefit of
           | compiling software yourself
           | 
           | Not really. Most people still would rely on signatures
           | because they can't be expected to compile everything from
           | scratch just to verify their download is authentic. Moreover
           | even though reproducible builds make verification easier, it
           | still requires someone to sound the alarm. For less popular
           | packages there might be nobody checking whether any particular
           | build is backdoored, because most people see "reproducible
           | builds" and assume Somebody Else is doing the reproduction.
        
             | mjl- wrote:
             | for transparency of reproducible builds of go applications,
             | i made https://beta.gobuilds.org/. it compiles any publicly
             | available go application on-demand, with a toolchain
             | version of your choice (latest stable by default), for a
             | platform of your choice. all (pure) go applications are
             | reproducible by default, including when cross-compiled, and
             | go toolchains run nothing provided by the go module
             | (awesome properties!). the source code is verified through
             | the go sum database (a transparency log containing go
             | modules). the hash of the resulting binary is added to
             | gobuilds' own transparency log, so it can be publicly
             | verified. the gobuilds service builds the binary itself,
             | and has another instance (on a different platform & config)
             | build the binary too, to ensure the binary is really
             | reproducible (i'd like other instances that i don't run
             | myself as secondaries too). i no longer publish binaries
             | for my applications (that i write in go). i just point to
             | the "latest"-build link for the go module at gobuilds. also
             | makes it easy for users (including myself) to get new
             | builds for new go toolchains (which may include fixes to
             | the (relatively large, and often used) standard library).
             | 
             | you still may not trust the public gobuilds instance. my
             | hope is that people (eg software projects themselves, or
             | distros, or other kinds of communities) will run & use
             | their own gobuild instances and verify their builds against
             | the public gobuilds service. win-win: gives them assurance
             | their builds are really reproducible, and builds trust in
             | the public gobuilds (keeping it honest, if someone sees a
             | hash mismatch, they will speak up).
             | 
             | i usually don't get much enthusiasm for it though. (:
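
To make the "hash added to a transparency log" idea above concrete, here is a minimal sketch of an append-only log in which every entry commits to the one before it. It is an illustration only: the `BuildLog` class, its field names, and the hash-chain layout are assumptions made for this example and do not describe how gobuilds or the Go sum database actually store their data (real transparency logs typically use Merkle trees rather than a simple chain).

```python
import hashlib
import json


class BuildLog:
    """Hypothetical append-only log of build results, chained by hash."""

    GENESIS = "0" * 64  # placeholder "previous" value for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(record):
        # Hash every field except the entry's own hash, in a stable key order.
        body = {key: value for key, value in record.items() if key != "entry_hash"}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, module, version, binary_sha256):
        """Add a build result that commits to the previous entry."""
        previous = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        record = {
            "module": module,
            "version": version,
            "binary_sha256": binary_sha256,
            "previous": previous,
        }
        record["entry_hash"] = self._hash(record)
        self.entries.append(record)
        return record

    def verify(self):
        """Return True if no entry was altered, removed, or reordered."""
        previous = self.GENESIS
        for record in self.entries:
            if record["previous"] != previous:
                return False
            if record["entry_hash"] != self._hash(record):
                return False
            previous = record["entry_hash"]
        return True
```

Because each entry includes the hash of its predecessor, quietly rewriting an old build hash invalidates every later entry, which is what lets a second, independent instance notice tampering.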
        
         | noirscape wrote:
         | Practically speaking, the idea with a reproducible build is
         | that you can take the source files they used, run their
         | instructions the same way they did and get _hash for hash_ [0]
         | the specific resulting executable.
         | 
         | The main benefit is that you can trust that the resulting
         | binary file being served matches the source code that it's
         | built from. This mostly matters for distros in that they build
         | from a source package repository, but anyone running a mirror
         | could hypothetically replace the package with another
         | (potentially malicious) package, leading users to install
         | malicious tooling. It mainly matters for distros because pretty
         | much every distro out there runs on third party mirrors (often
         | run by universities, but also just people who want to help)
         | rather than on direct upstream; packages get uploaded to a main
         | server, then mirrors copy from that main server (to reduce
         | network traffic load on the main server). Right now, mirror
         | trust is mostly "we assume you're not gonna be evil, until we
         | get complaints". If the build is reproducible, the software can
         | inherently confirm that the file it's getting is
         | trustworthy, making "getting complaints" much easier to
         | confirm.
         | 
         | It can also speed up the overall building process; if the
         | package source code hasn't changed, you can also always assume
         | that the resulting binary hasn't changed (meaning you can use
         | hashes instead of relying on mtime like make does). Docker
         | build cache works in a somewhat similar way (although docker
         | isn't inherently deterministic).
         | 
         | Devwise, you can also reconstruct a build much more easily if
         | it's reproducible; i.e. if you've accidentally thrown away the
         | .elf file for debugging and your build is deterministic, you can
         | just rerun the build and get the same .elf file again.
         | 
         | [0]: While not a problem for Linux distros, in cases where you
         | need a secret to sign an application, reproducible typically
         | means "identical _except_ for the signature" instead. F-Droid
         | uses this for example to figure out if they should use
         | buildserver stuff or the original APKs:
         | https://f-droid.org/docs/Reproducible_Builds/
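
The "hash for hash" check described above can be sketched in a few lines. This is a minimal illustration, assuming you already have the published artifact and your own rebuild on disk; the function names `sha256_of` and `builds_match` are made up for this example and are not part of any distro's tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(published_path, rebuilt_path):
    """True when the published artifact and a local rebuild are bit-identical."""
    return sha256_of(published_path) == sha256_of(rebuilt_path)
```

A mismatch does not say which side is wrong, only that the two builds differ and someone should investigate. For signed artifacts, the signature would have to be stripped or ignored before comparing, as the footnote above notes.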
        
           | yellow_lead wrote:
           | > a mirror could hypothetically replace the package with
           | another (potentially malicious) package, leading users to
           | install malicious tooling.
           | 
           | It was my assumption that a mirror is required to host a
           | build that has a hash conforming to the original. Is that not
           | the case?
        
             | gruez wrote:
             | More specifically the packages are signed by the distro and
             | automatically checked, so a mirror can't go rogue even if
             | it wanted to.
        
             | jerf wrote:
             | Yes, the real attack isn't that mirrors change the files,
             | the real attack is that just because a distro packages
             | Binary X and Source X, it is difficult without reproducible
             | builds to prove that Source X actually did produce Binary
             | X. It could have been compiled with a trojan in it between
             | the source and binary.
        
           | SkiFire13 wrote:
           | > meaning you can use hashes instead of relying on mtime like
           | make does
           | 
           | Note that mtime still has the advantage of being faster than
           | hashing.
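
One way to get both properties is to use the cheap metadata check as a fast path and fall back to hashing only when the metadata has changed. The sketch below is an assumption about how such a check could look, not how make or any particular build tool works; `make_record` and `needs_rebuild` are hypothetical names.

```python
import hashlib
import os


def file_digest(path):
    """Hex SHA-256 of the file contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def make_record(path):
    """Snapshot taken after a successful build of this input."""
    stat = os.stat(path)
    return {
        "mtime_ns": stat.st_mtime_ns,
        "size": stat.st_size,
        "sha256": file_digest(path),
    }


def needs_rebuild(path, record):
    """Decide whether an input changed since `record` was taken."""
    if record is None:
        return True  # never built before
    stat = os.stat(path)
    if (stat.st_mtime_ns, stat.st_size) == (record["mtime_ns"], record["size"]):
        return False  # fast path: metadata unchanged, skip hashing
    # Metadata changed (e.g. a touch or a fresh checkout): compare contents.
    return file_digest(path) != record["sha256"]
```

The fast path trusts the timestamp, so an edit that keeps both the size and the mtime identical would be missed; a stricter tool would always hash and accept the cost.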
        
         | jcranmer wrote:
         | The main benefit you'll hear touted is something along the
         | lines of being able to get an attestation that the resulting
         | artifact was built following the steps claimed to build it. I
         | think that's a somewhat overstated benefit, though, as it's not
         | clear to me that this is an avenue of attack used in practice,
         | given the frequency with which software already has
         | vulnerabilities usable for exploits, or the ease with which one
         | can insert a backdoor into the source code (e.g., the xz
         | backdoor).
         | 
         | I think the actual main utility is that the process has done a
         | very good job of rooting out several causes of unintentional
         | nondeterminism in the build process. I say unintentional
         | because the two main causes of unreproducibility, by several
         | orders of magnitude, are timestamps being embedded _everywhere_
         | and absolute paths being embedded everywhere, and those are
         | rather expected. But some of the unreproducibility comes from
         | things like accidental reliance on inodes in file paths (i.e.,
         | doing "for file in listdir()" without sorting the results of
         | listdir) or the compiler itself accidentally sorting based on
         | pointer address (which is unreproducible on ASLR systems).
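
Two of the causes named above, embedded timestamps and unsorted directory listings, are easy to show in a packaging step. The following sketch builds a tar archive whose bytes depend only on file names and contents. It is an illustration under stated assumptions: `deterministic_tar` is a hypothetical helper, it normalizes every file to mode 0644 and owner 0, and it is only meant for regular files.

```python
import io
import os
import tarfile


def deterministic_tar(source_dir, epoch=0):
    """Pack source_dir into tar bytes that do not vary between runs or machines."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # sort in place so the traversal order is fixed
            for name in sorted(files):  # never rely on raw listing order
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                info = archive.gettarinfo(full_path, arcname=arcname)
                info.mtime = epoch  # drop the real timestamp
                info.uid = info.gid = 0  # drop the builder's identity
                info.uname = info.gname = ""
                info.mode = 0o644  # assumption: permissions are normalized
                with open(full_path, "rb") as handle:
                    archive.addfile(info, handle)
    return buffer.getvalue()
```

Calling it on two checkouts with the same contents should give identical bytes even if the files were created in a different order or at different times. The archive is left uncompressed on purpose, since a gzip header can carry its own timestamp unless that is pinned too.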
        
           | uecker wrote:
           | The xz backdoor was news because they went to a lot of effort
           | to try to hide a backdoor in the source (actually a binary
           | file in the source) and still failed. In contrast, without
           | reproducible builds it is trivial for a maintainer with
           | upload rights (or somebody who managed to get the credentials
           | from a maintainer) to insert a backdoor into a binary. And it
           | is then virtually impossible to detect.
        
         | zelphirkalt wrote:
         | It is also about reliably being able to build things a month
         | from now, in a year, in 5 years, etc.
        
       | anotherhue wrote:
       | Reproducibility changes everything, because nothing changes. We
       | go from shamans chanting incantations over a blessed code base to
       | a mathematical function with an algebra of system composition.
       | 
       | Let me give you the simplest example, when builds are
       | reproducible you don't need package repositories, you need build
       | caches.
       | 
       | All the problems with maintaining a repository (save bandwidth)
       | evaporate.
        
         | okanat wrote:
         | > Let me give you the simplest example, when builds are
         | reproducible you don't need package repositories, you need
         | build caches.
         | 
         | The type of reproducibility is different here. What you mention
         | is possible via a stable compiler ABI already. However one
         | needs to keep the source code the same. Without a stable
         | compiler ABI, you may or may not fix it depending on what the
         | compiler does.
         | 
         | The goal of reproducible builds is removing sources of
         | environment-dependent behavior at the build level instead of
         | the compiler level. So given all the same dependencies and same
         | build commands your binaries should match wherever and whenever
         | you compile them. The distros and software developers also made
         | a huge effort to remove any kind of environment-dependent
         | commands.
         | 
         | Different distros still have differences in the build commands
         | they issue and the set of dependencies they enable. The space
         | to cache each individual possible output would be enormous and
         | impractical. So you would still need repositories.
        
           | anotherhue wrote:
           | All true, but I can't not take this opportunity to shill
           | NixOS which has meaningfully addressed many of these issues,
           | and is indeed spending impractical amounts of money storing
           | build outputs ($10k/m).
           | 
             | https://discourse.nixos.org/t/the-nixos-foundations-call-to-...
           | 
           | It is absolutely better to remove build entropy at the source
           | code stage, but until all software is written that way there
           | are a few build-environment tricks we can use along the way.
        
             | zelphirkalt wrote:
             | I highly doubt that we will get anywhere close to
             | developers understanding and valuing reproducibility any
             | time soon. Maybe in 20y or so. Basically outside of the
             | corners like Nix, Guix and maybe a few random people in
             | discussions about issues of package managers, I have not
             | met anyone knowing how to and caring about reproducibility.
             | 
             | Meanwhile I enjoy setting up my own GNU Guile projects with
             | a Makefile that sets up a reproducible Guix shell in which
             | my project is run, so that I get the same result on various
             | devices, only needing to issue a single easily memorized or
             | discoverable Makefile target call. Most developers I met
             | don't know how to set something like this up. Provided no
             | one messes with guix package manager commits and guix
             | infrastructure still exists, my projects will run in 10y
             | just like they do today, with reproducible result. Neat.
             | 
             | Recently I have taken some time to set this up for OCaml as
             | well. Took some asking on the mailing list, but it now works.
             | No need to have anything installed prior to running the
             | Makefile target other than Make and Guix package manager.
        
               | anotherhue wrote:
               | I assume you tried Flakes also? Does Guile have a better
               | approach to this problem (other than the general language
               | difference (everyone hates nix))?
               | 
               | Edit: Worth looking at
               | https://github.com/numtide/devshell
        
               | zelphirkalt wrote:
               | I have not tried Flakes. Guile's main way of installing
               | packages, I believe, is using Guix package manager. And
               | Guix is written using Guile.
        
           | jimmaswell wrote:
           | > The space to cache each individual possible output would be
           | enormous and impractical. So you would still need
           | repositories.
           | 
           | I've been using Gentoo for the past few weeks and this was my
           | thought. There are so many ways to compile packages depending
           | on your needs. It is practical to make binary sets for a few
           | common configurations that are good enough for most people
           | for each distro, though.
           | 
           | (Installing and using Gentoo has been an incredible learning
           | experience. You have to go into it with the _desire_ to take
           | long tangents filling the gaps in your knowledge re: shared
           | libraries, kernel modules, your init system of choice, gcc,
           | your windowing system of choice, bootloaders, bash, etc. I
           | feel like I've made more of a quantum leap in my Linux
           | skills the past month than I have in years, and it's been
           | quite fun and rewarding.)
           | 
           | > Different distros still have differences in the build
           | commands they issue and the set of dependencies they enable.
           | 
           | Would it even make sense in theory for this not to be the
           | case? What is a Linux distro but a set of programs,
           | libraries, and environment choices chosen to be run on the
           | Linux kernel?
        
           | kbaker wrote:
           | The Yocto Project (from the embedded space) has
           | reproducibility as one of its goals. Since everything is
           | built from source and from scratch, even the build
           | toolchains, it is not too big of a step.
           | 
           | Seems like Yocto would make the base for a good general
           | purpose desktop distro (if there is not one out there
           | already.)
        
         | BiteCode_dev wrote:
         | You mean you don't need dep resolution, signatures and
         | moderation? How so?
        
           | anotherhue wrote:
           | Good questions:
           | 
           | 1. If the deps are also themselves reproducible then you
           | refer to a fixed point version of them and (at least for nix)
           | the package manager works out the rest.
           | 
           | 2. Signatures are a trust mechanism, if a cache feeds you bad
           | data in response to a query then that's absolutely an issue,
           | but since there can be multiple caches (or your own local
           | spot-checks) it becomes easier to detect if a cache is
           | returning bad data. A hyper-targeted attack would still get
           | you unless you decide to manually build certain packages, but
           | that's no different than existing repos.
           | 
           | Manually building might sound impractical but it doesn't
           | actually take that long, probably less than a day for a
           | desktop environment, which might be acceptable in a high-trust
           | environment if amortized. I should add that the process is
           | fully automatic, it just takes longer than using a cache.
           | 
           | 3. Moderation I don't have a good answer to, anyone can run
           | an apt server, or publish a flake.nix file to a repo. Some
           | would say it is censorship resistant.
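
The point in item 2, that multiple caches plus local spot-checks make bad data easier to detect, can be sketched as a simple comparison of the hashes each source reports for the same build. This is a hedged illustration: `cross_check`, the source names, and the strict-majority rule are assumptions for the example, not how Nix or any existing cache client behaves.

```python
from collections import Counter


def cross_check(reported_hashes):
    """Compare digests reported by several sources for one package build.

    reported_hashes maps a source name (a cache or a local rebuild) to the
    hex digest it reported. Returns (agreed_digest, dissenting_sources);
    agreed_digest is None when no strict majority exists.
    """
    if not reported_hashes:
        return None, []
    counts = Counter(reported_hashes.values())
    top_digest, top_count = counts.most_common(1)[0]
    if top_count * 2 <= len(reported_hashes):
        # No strict majority: treat every source as suspect.
        return None, sorted(reported_hashes)
    dissenters = sorted(
        name for name, digest in reported_hashes.items() if digest != top_digest
    )
    return top_digest, dissenters
```

For example, `cross_check({"cache-a": "aa", "cache-b": "aa", "local": "bb"})` returns `("aa", ["local"])`. A majority only shows agreement, not correctness, so a disagreement that involves your own local rebuild is the case worth investigating first.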
        
         | XorNot wrote:
         | This doesn't work at all: a build cache is meaningless in this
         | context because the only thing you can do is rebuild the code
         | to verify that what is in the cache is what comes from the code
         | you expect...or to bootstrap the whole compiler chain, then use
         | cryptographic signatures to chain trust from a matching
         | compiler hash up to all the dependent outputs.
         | 
         | It certainly doesn't change the nature of packaging in any real
         | way in that case.
        
           | anotherhue wrote:
           | Others can rebuild the chain and the results can be compared.
           | Without reproducibility there is no concept of comparison.
        
             | XorNot wrote:
             | Which is still a cryptographic trust system - i.e. who are
             | the others who built it and are they suitably independent
             | or working from similar sources?
        
       | lowkey wrote:
       | Original link didn't work for me. Here is the source
       | https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...
        
       | pelasaco wrote:
       | I think a distro like Talos Linux, with only 12 binaries, will
       | find it much easier to accomplish this
        
       | kpcyrd wrote:
       | hello, I'm one of the speakers. I've been working on this since
       | 2017, happy to answer any questions the hackernews crowd might
       | have.
        
         | algo_trader wrote:
         | looking at [1][2], are the slides available online? i am on
         | chrome, getting just static html
         | 
         | [1] https://salsa.debian.org/reproducible-builds/reproducible-pr...
         | [2] https://fosdem.org/2025/schedule/event/fosdem-2025-6479-a-ta...
        
         | foobazfoo wrote:
         | Fantastic project. Thank you for all your efforts.
         | 
         | Regarding the reproducible bootstrapping problem, what is your
         | project's policy on building from binary sources? For instance,
         | Zig is written in zig and bootstraps from a binary wasm file
         | which is translated to C:
         | https://github.com/ziglang/zig/tree/master/stage1
         | 
         | Golang has an even more complicated bootstrapping procedure
         | requiring you to build each successive version of the compiler to
         | get to the most recent version.
        
       ___________________________________________________________________
       (page generated 2025-02-08 23:00 UTC)