[HN Gopher] Is NixOS truly reproducible?
       ___________________________________________________________________
        
       Is NixOS truly reproducible?
        
       Author : pabs3
       Score  : 74 points
       Date   : 2025-02-09 09:56 UTC (3 days ago)
        
 (HTM) web link (luj.fr)
 (TXT) w3m dump (luj.fr)
        
       | opan wrote:
       | Although I'm aware many distros care somewhat about reproducible
       | builds these days, I tend to associate it primarily with Guix
       | System. I never really considered it a feature of NixOS, having
       | used both (though I've spent much more time on Guix System now).
       | 
       | For the record, even in the land of Guix I semi-regularly see
       | reports on the bug-guix mailing list that some package isn't
       | reproducible. It seems to get treated as a bug and fixed then.
       | With that in mind, and personally considering Guix kind of the
       | flagship of these efforts, it doesn't surprise me if anyone else
       | doesn't have perfectly reproducible builds yet either. Especially
       | Nix with the huge number of things in nixpkgs. It's probably
       | easier for stuff to fall through the cracks with that many
       | packages to manage.
        
       | jchw wrote:
       | I think this debate comes down to exactly what "reproducible"
       | means. Nix doesn't give bit-exact reproducibility, but it does
       | give reproducible _environments_ , by ensuring that the inputs
       | are always bit-exact. It is closer to being fully reproducible
       | than most other build systems (including Bazel) -- but because it
       | can only reasonably ensure that the inputs are exact, it's still
       | necessary for the build processes themselves to be fully
       | deterministic to get end-to-end bit-exactness.
       | 
       | Nix on its own doesn't fully resolve supply chain concerns about
       | binaries, but it can provide answers to a myriad of other
       | problems. I think most people like Nix reproducibility, and it is
       | marketed as such, for the sake of development: life is much
       | easier when you know _for sure_ you have the exact same version
       | of each dependency, in the exact same configuration. A build on
       | one machine may not be bit-exact to a build on another machine,
       | but it will be exactly the same source code all the way down.
       | 
       | The quest to get every build process to be deterministic is
       | definitely a bigger problem and it will never be solved for all
       | of Nixpkgs. NixOS does have a reproducibility project[1], and
       | some non-trivial amount of NixOS actually _is_ _properly_
       | reproducible, but the observation that Nixpkgs is too vast is
       | definitely spot-on, especially because in most cases the real
       | issues lie upstream. (and carrying patches for reproducibility is
       | possible, but it adds _even more_ maintainer burden.)
       | 
       | [1]: https://reproducible.nixos.org/
        
         | sa46 wrote:
         | > It is closer to being fully reproducible than most other
         | build systems (including Bazel).
         | 
         | How so? Bazel produces the same results for the same inputs.
        
           | jchw wrote:
           | Bazel doesn't guarantee bit-exact outputs, but also Bazel
           | doesn't guarantee pure builds. It does have a sandbox that
           | prevents some impurities, but for example it doesn't prevent
           | things from going out to the network, or even accessing files
           | from anywhere in the filesystem, if you use absolute paths.
           | (Although, on Linux at least, Bazel _does_ prevent you from
           | _modifying_ files outside of the sandbox directory.)
           | 
           | The Nix sandbox _does_ completely obscure the host filesystem
           | and limit network access to only those processes that can
           | produce a bit-exact output.
           | 
           | (Bazel also obviously uses the system compilers and headers.
           | Nix does not.)
        
             | dijit wrote:
             | Uh, either my understanding of Bazel is wrong, or
             | everything you wrote is wrong.
             | 
             | Bazel absolutely prevents network access and filesystem
             | access (reads) from builds. (only permitting _explicit_
             | network includes from the WORKSPACE file, and access to
             | files explicitly depended on in the BUILD files).
             | 
             | Maybe you _can_ write some "rules_" for languages that
             | violate this, but it is designed purposely to be hermetic
             | and bit-perfect reproducible.
             | 
             | EDIT:
             | 
             | From the FAQ[0]:
             | 
             | > Will Bazel make my builds reproducible automatically?
             | 
             | > For Java and C++ binaries, yes, assuming you do not
             | change the toolchain.
             | 
             | The issues with Docker's style of "reproducible" (meaning
             | a consistent environment) are also outlined in the same
             | FAQ[1]:
             | 
             | > Doesn't Docker solve the reproducibility problems?
             | 
             | > Docker does not address reproducibility with regard to
             | changes in the source code. Running Make with an
             | imperfectly written Makefile inside a Docker container can
             | still yield unpredictable results.
             | 
             | [0]: https://bazel.build/about/faq#will_bazel_make_my_build
             | s_repr...
             | 
             | [1]: https://bazel.build/about/faq#doesn't_docker_solve_the
             | _repro...
        
               | valcron1000 wrote:
               | I'm not familiar with Bazel at all so this might be
               | obvious, but does Bazel check that the files listed in
               | the BUILD file are the "right ones" (ex. through a
               | checksum), and if so, is this always enforced (that is,
               | this behavior cannot be disabled)?
        
         | matrss wrote:
         | > The quest to get every build process to be deterministic
         | [...] will never be solved for all of Nixpkgs.
         | 
         | Not least because of unfree and/or binary-blob packages that
         | can't be reproducible because they don't even build anything.
         | As much as Guix' strict FOSS and build-from-source policy can
         | be an annoyance, it is a necessary precondition to achieve full
         | reproducibility from source, i.e. the full-source bootstrap.
        
         | colordrops wrote:
         | I'm curious, why couldn't packages that are fully reproducible
         | be marked with metadata, and in your config you set a flag to
         | only allow reproducible packages? Similar to the nonfree tag.
         | 
         | Then you'd have a 100% reproducible OS if you have the flag
         | set (assuming that required base packages are reproducible).
        
       | 0x457 wrote:
       | IIRC any package that uses Java isn't reproducible because of
       | the system time, and fixing it to the epoch permanently causes
       | issues in some application builds.
       | 
       | * There are Maven and Gradle plugins to make builds reproducible.
        
         | yjftsjthsd-h wrote:
         | Can you force it to some time other than 0? Ex. I've seen some
         | packages force timestamps to the git commit timestamp, which is
         | nice but still fixed.
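
A minimal sketch of pinning timestamps to a chosen value, assuming the
build output is a tar archive packed from Python. `SOURCE_DATE_EPOCH` is
the environment variable specified by the Reproducible Builds project;
`build_archive` and the file names are hypothetical, and this is not how
Nix or any particular package does it. Any fixed value works, for
example the output of `git log -1 --format=%ct`:

```python
import hashlib
import io
import os
import tarfile


def build_archive(files: dict, epoch: int) -> bytes:
    """Pack {name: bytes} into a tar whose bytes depend only on files and epoch."""
    buf = io.BytesIO()
    # Uncompressed tar: a gzip wrapper would add its own timestamp header.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = epoch            # every member gets the same pinned time
            info.mode = 0o644
            info.uid = info.gid = 0       # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Assumption: the caller exports SOURCE_DATE_EPOCH (e.g. the commit time).
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    archive = build_archive({"hello.txt": b"hello\n"}, epoch)
    print(hashlib.sha256(archive).hexdigest())
```

Two runs with the same files and the same epoch yield identical bytes,
whatever the wall-clock time; changing the epoch changes the archive.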
        
           | Cyph0n wrote:
           | This is an approach you can use when building Docker images
           | in Nix flakes: https://github.com/aksiksi/ncdmv/blob/aa108a1c
           | 1e2c14a13dfbc0...
        
         | layer8 wrote:
         | Can you elaborate on the root causes?
        
         | arjvik wrote:
         | What issues? I'm not aware of any Java build process that
         | checks timestamps.
        
       | vlovich123 wrote:
       | > Our most important finding is that the reproducibility rate in
       | nixpkgs has increased steadily from 69% in 2017 to about 91% in
       | April 2023. The high reproducibility rate in our most recent
       | revision is quite impressive, given both the size of the package
       | set and the absence of systematic monitoring in nixpkgs. We knew
       | that it was possible to achieve very good reproducibility rate in
       | smaller package sets like Debian, but this shows that achieving
       | very high bitwise reproducibility is possible at scale, something
       | that was believed impossible by practitioners.
       | 
       | I think people in this thread are focusing on the wrong thing.
       | Sure, not all packages are reproducible, but the project is
       | systematically increasing the percentage of projects that are
       | reproducible while ALSO adding new projects and demonstrating
       | conclusively that what was considered infeasible is actually
       | readily achievable.
       | 
       | > The interesting aspect of these causes is that they show that
       | even if nixpkgs already achieves great reproducibility rates,
       | there still exists some low hanging fruits towards improving
       | reproducibility that could be tackled by the Nix community and
       | the whole FOSS ecosystem.
       | 
       | This work is helpful I think for the community to tackle the
       | sources of unreproducible builds to push the percentage up even
       | further. I think it also highlights the need for automation to
       | validate that there aren't systematic regressions or regressions
       | in particularly popular packages (running regression checks on
       | every individual package is a futile effort unless a lot of
       | people volunteer to be part of a distributed check effort).
        
       | IHLayman wrote:
       | It surprises me that this article discusses reproducibility in
       | NixOS yet declines to even mention the intensional model or
       | efforts to implement it, since it appears they have done a lot
       | of research into the matter.
       | 
       | If you don't know, the intensional model is an alternative way to
       | structure the NixOS store so that components are content-
       | addressable (store hash is based on the targets) as opposed to
       | being addressed based on the build instructions and dependencies.
       | IIUC, the entire purpose of the intensional model is to make Nix
       | stores shareable so that you could just depend on Cachix and such
       | without the worry of a supply-chain attack. This approach was an
       | entire chapter in the Nix thesis paper (chapter 6) and has been
       | worked on recently (see https://github.com/NixOS/rfcs/pull/62 and
       | https://github.com/NixOS/rfcs/pull/17 for current progress).
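
A rough sketch of the difference between the two addressing schemes
described above. This is an illustration only: real Nix store paths are
computed differently, and `input_addressed`, `content_addressed`, the
recipe string, and the dependency names are all hypothetical.

```python
import hashlib


def input_addressed(recipe: str, deps: list) -> str:
    """Extensional-style name: hash of the build instructions and dependencies."""
    h = hashlib.sha256()
    h.update(recipe.encode())
    for dep in sorted(deps):
        h.update(b"\0" + dep.encode())
    return h.hexdigest()[:32]


def content_addressed(output: bytes) -> str:
    """Intensional-style name: hash of the bytes the build actually produced."""
    return hashlib.sha256(output).hexdigest()[:32]


if __name__ == "__main__":
    recipe = "cc -o hello hello.c"
    deps = ["gcc", "glibc"]
    # Two builders run the same recipe, but each output embeds a build date.
    out_a = b"hello binary, built 2025-02-09"
    out_b = b"hello binary, built 2025-02-10"
    # The input-addressed name cannot tell the two outputs apart.
    print(input_addressed(recipe, deps))
    # The content-addressed names differ, exposing the non-determinism.
    print(content_addressed(out_a), content_addressed(out_b))
```

With input addressing, both builders publish under the same name even
though their outputs differ, so the name alone says nothing about what
you downloaded. With content addressing, the name itself can be checked
against the bytes received.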
        
       | SilentM68 wrote:
       | In my case, I define "reproducible" to mean "immutable." After
       | a few days of testing, I broke NixOS. My simple test was swapping
       | between different desktop environments, which eventually broke
       | Nix, so I'm not at the point where I'd agree with Nix being truly
       | reproducible, at least not in that context :(
        
         | tmnvdb wrote:
         | Those things are not the same though. Reproducible just means
         | it will break again if you configure your system in the same
         | way.
        
         | bsimpson wrote:
         | One problem is that the applications themselves are impure.
         | 
         | Just running KDE litters a bunch of dotfiles into your user
         | folder, even for settings you didn't adjust. This is true for
         | many applications.
         | 
         | If you had an empty home folder and passively tried a handful
         | of desktops, you'd no longer have an empty home folder.
         | Hopefully your environment is resilient to clutter being leaked
         | into your home folder, but if your filesystem isn't truly
         | immutable, rolling back to a particular Nix config might not
         | get you the exact state your system was in when you first built
         | that.
         | 
         | There's a project that wipes all local changes when you restart
         | your machine, with the goal of making Nix systems more
         | reproducible. I think it's called Impermanence.
        
           | alfiedotwtf wrote:
           | I do all my stuff in temporary docker containers, and when
           | I'm done, the container gets blown away.
           | 
           | If the point of Nix is to keep a filesystem immutable as long
           | as every app sticks to certain rules, is it actually the
           | right tool for the job?
           | 
           | Sorry... I actually don't know much about Nix given I've been
           | using VMs and now containers for over a decade, so just
           | trying to understand the problem that Nix actually solves.
        
       | jf wrote:
       | Aside from this being a great article with lots of interesting
       | details, it's also a rare example of a headline that does NOT
       | follow "Betteridge's law of headlines"
        
       | advisedwang wrote:
       | Is anyone actually implementing the concept of checking hashes
       | with trusted builders? This is all wasted effort if that isn't
       | needed.
       | 
       | I've seen it pointed out (by mjg59, perhaps?) that if you have a
       | trusted builder, why don't you just use their build? That seems
       | to be the actual model in practice.
       | 
       | Reproducibility seems only to be useful if you have a pool of
       | mostly trustworthy builders and somehow want to build a consensus
       | out of that. Which I suppose is useful for a distributed
       | community but does seem like a stretch for the amount of work
       | going in to reproducible builds.
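
A small sketch of the "consensus among mostly trustworthy builders" idea
mentioned above, assuming each builder reports the hash of its output.
`consensus_hash`, the builder names, and the quorum rule are
hypothetical, not an existing Nix or Debian mechanism:

```python
from collections import Counter
from typing import Optional


def consensus_hash(reports: dict, quorum: int) -> Optional[str]:
    """Return the output hash that at least `quorum` builders agree on, else None."""
    if not reports:
        return None
    # Count how many builders reported each distinct hash.
    digest, votes = Counter(reports.values()).most_common(1)[0]
    return digest if votes >= quorum else None


if __name__ == "__main__":
    reports = {
        "builder-a": "abc123",
        "builder-b": "abc123",
        "builder-c": "ffff00",  # a compromised or non-deterministic builder
    }
    print(consensus_hash(reports, quorum=2))
    print(consensus_hash(reports, quorum=3))
```

This only works if honest builders produce bit-identical output, which
is why such a scheme depends on reproducible builds in the first place.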
        
       | jonhohle wrote:
       | I work on a matching decomp project that has tooling to recompile
       | C into binaries matching a 28 year old game.
       | 
       | In the final binaries, compiled with gcc 2.6.3 and assembled
       | with a custom assembler, there appears to be unused,
       | uninitialized data that is whatever was in RAM when whoever
       | compiled the game created the release build.
       | 
       | Since the goal is a matching (reproducible) binary, we have tools
       | to restore that random data at specific offsets. Fortunately our
       | targets are fixed.
        
       | CamouflagedKiwi wrote:
       | > Our most important finding is that the reproducibility rate in
       | nixpkgs has increased steadily from 69% in 2017 to about 91% in
       | April 2023. The high reproducibility rate in our most recent
       | revision is quite impressive, given both the size of the package
       | set and the absence of systematic monitoring in nixpkgs.
       | 
       | That's one way to read the statistic. Another way you could read
       | the graph is that they still have about the same number (~5k) of
       | non-reproducible builds, which has been pretty constant over the
       | time period. Adding a bunch of easily reproducible additional
       | builds maybe doesn't make me believe it's solving the original
       | issues.
       | 
       | > We knew that it was possible to achieve very good
       | reproducibility rate in smaller package sets like Debian, but
       | this shows that achieving very high bitwise reproducibility is
       | possible at scale, something that was believed impossible by
       | practitioners.
       | 
       | Maybe I miss some nuance here, but why is Debian written off as
       | being so much smaller scale? The top end of the graph here
       | suggests a bit over 70k packages, Debian apparently also
       | currently has 74k packages available
       | (https://www.debian.org/doc/manuals/debian-
       | reference/ch02.en....); I guess there's maybe a bit of time lag
       | here but I'm not sure that is enough to claim Debian is somehow
       | not "at scale".
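
A back-of-the-envelope check of that alternative reading, taking the
commenter's rough figures at face value (about 5k non-reproducible
packages throughout, with rates of 69% and 91%). `implied_total` is a
hypothetical helper, and the exact counts in the article's graph may
differ:

```python
def implied_total(non_reproducible: int, reproducible_pct: int) -> float:
    """Total package count implied by a fixed failure count and a success rate."""
    if not 0 <= reproducible_pct < 100:
        raise ValueError("reproducible_pct must be in [0, 100)")
    # non_reproducible = total * (100 - pct) / 100, solved for total.
    return non_reproducible * 100 / (100 - reproducible_pct)


if __name__ == "__main__":
    for year, pct in ((2017, 69), (2023, 91)):
        print(year, round(implied_total(5000, pct)))
```

Under those assumptions the total package count would have to more than
triple, which is the commenter's point: the rate can rise while the
number of non-reproducible packages stays flat.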
        
         | dartos wrote:
         | Are they mostly the same 5k packages as 2017?
         | 
         | That seems to be the crux of it.
        
       ___________________________________________________________________
       (page generated 2025-02-12 23:00 UTC)