[HN Gopher] Is NixOS truly reproducible?
___________________________________________________________________
Is NixOS truly reproducible?
Author : pabs3
Score : 74 points
Date : 2025-02-09 09:56 UTC (3 days ago)
(HTM) web link (luj.fr)
(TXT) w3m dump (luj.fr)
| opan wrote:
| Although I'm aware many distros care somewhat about reproducible
| builds these days, I tend to associate it primarily with Guix
| System, I never really considered it a feature of NixOS, having
| used both (though I've spent much more time on Guix System by now).
|
| For the record, even in the land of Guix I semi-regularly see
| reports on the bug-guix mailing list that some package isn't
| reproducible. It seems to get treated as a bug and fixed then.
| With that in mind, and personally considering Guix kind of the
| flagship of these efforts, it doesn't surprise me if anyone else
| doesn't have perfectly reproducible builds yet either. Especially
| Nix with the huge number of things in nixpkgs. It's probably
| easier for stuff to fall through the cracks with that many
| packages to manage.
| jchw wrote:
| I think this debate comes down to exactly what "reproducible"
| means. Nix doesn't give bit-exact reproducibility, but it does
| give reproducible _environments_, by ensuring that the inputs
| are always bit-exact. It is closer to being fully reproducible
| than most other build systems (including Bazel) -- but because it
| can only reasonably ensure that the inputs are exact, it's still
| necessary for the build processes themselves to be fully
| deterministic to get end-to-end bit-exactness.
|
| Nix on its own doesn't fully resolve supply chain concerns about
| binaries, but it can provide answers to a myriad of other
| problems. I think most people like Nix reproducibility, and it is
| marketed as such, for the sake of development: life is much
| easier when you know _for sure_ you have the exact same version
| of each dependency, in the exact same configuration. A build on
| one machine may not be bit-exact to a build on another machine,
| but it will be exactly the same source code all the way down.
|
| The quest to get every build process to be deterministic is
| definitely a bigger problem and it will never be solved for all
| of Nixpkgs. NixOS does have a reproducibility project[1], and
| some non-trivial amount of NixOS actually _is_ _properly_
| reproducible, but the observation that Nixpkgs is too vast is
| definitely spot-on, especially because in most cases the real
| issues lie upstream. (and carrying patches for reproducibility is
| possible, but it adds _even more_ maintainer burden.)
|
| [1]: https://reproducible.nixos.org/
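|
| A toy sketch of the distinction above, in Python rather than Nix:
| pinning inputs by hash does not make the output bit-exact if the
| build step itself is non-deterministic. The "compiler" toy_build
| and its embedded timestamp are hypothetical stand-ins, not
| anything Nix actually does.
|

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def toy_build(source: bytes, build_time: int, embed_timestamp: bool) -> bytes:
    """Hypothetical build step: transforms the source, optionally stamping the time."""
    artifact = source.upper()
    if embed_timestamp:
        # A typical impurity: the wall-clock time leaks into the artifact.
        artifact += b"\n# built at " + str(build_time).encode()
    return artifact


SOURCE = b"print('hello')\n"
PINNED_INPUT_HASH = sha256_hex(SOURCE)  # the "bit-exact input" guarantee


def build_checked(build_time: int, embed_timestamp: bool) -> str:
    # Refuse to build unless the input matches its pinned hash.
    if sha256_hex(SOURCE) != PINNED_INPUT_HASH:
        raise ValueError("input hash mismatch")
    return sha256_hex(toy_build(SOURCE, build_time, embed_timestamp))


# Same pinned input, two builds at different times.
impure_a = build_checked(1_000, embed_timestamp=True)
impure_b = build_checked(2_000, embed_timestamp=True)
pure_a = build_checked(1_000, embed_timestamp=False)
pure_b = build_checked(2_000, embed_timestamp=False)

print("impure builds match:", impure_a == impure_b)
print("pure builds match:  ", pure_a == pure_b)
```

|
| Only the build that ignores the clock yields matching output
| hashes, even though both builds start from identical inputs.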
| sa46 wrote:
| > It is closer to being fully reproducible than most other
| build systems (including Bazel).
|
| How so? Bazel produces the same results for the same inputs.
| jchw wrote:
| Bazel doesn't guarantee bit-exact outputs, but also Bazel
| doesn't guarantee pure builds. It does have a sandbox that
| prevents some impurities, but for example it doesn't prevent
| things from going out to the network, or even accessing files
| from anywhere in the filesystem, if you use absolute paths.
| (Although, on Linux at least, Bazel _does_ prevent you from
| _modifying_ files outside of the sandbox directory.)
|
| The Nix sandbox _does_ completely obscure the host filesystem
| and limit network access to processes that can produce a bit-
| exact output only.
|
| (Bazel also obviously uses the system compilers and headers.
| Nix does not.)
| dijit wrote:
| Uh, either my understanding of Bazel is wrong, or
| everything you wrote is wrong.
|
| Bazel absolutely prevents network access and filesystem
| access (reads) from builds. (only permitting _explicit_
| network includes from the WORKSPACE file, and access to
| files explicitly depended on in the BUILD files).
|
| Maybe you _can_ write some "rules_" for languages that
| violate this, but it is designed purposely to be hermetic
| and bit-perfect reproducible.
|
| EDIT:
|
| From the FAQ[0]:
|
| > Will Bazel make my builds reproducible automatically?
|
| > For Java and C++ binaries, yes, assuming you do not
| change the toolchain.
|
| The issues with Docker's style of "reproducible" (meaning a
| consistent environment) are also outlined in the same
| FAQ[1]:
|
| > Doesn't Docker solve the reproducibility problems?
|
| > Docker does not address reproducibility with regard to
| changes in the source code. Running Make with an
| imperfectly written Makefile inside a Docker container can
| still yield unpredictable results.
|
| [0]: https://bazel.build/about/faq#will_bazel_make_my_builds_repr...
|
| [1]: https://bazel.build/about/faq#doesn't_docker_solve_the_repro...
| valcron1000 wrote:
| I'm not familiar with Bazel at all so this might be
| obvious, but does Bazel check that the files listed in
| the BUILD file are the "right ones" (ex. through a
| checksum), and if so, is this always enforced (that is,
| this behavior cannot be disabled)?
| matrss wrote:
| > The quest to get every build process to be deterministic
| [...] will never be solved for all of Nixpkgs.
|
| Not least because of unfree and/or binary-blob packages that
| can't be reproducible because they don't even build anything.
| As much as Guix' strict FOSS and build-from-source policy can
| be an annoyance, it is a necessary precondition to achieve full
| reproducibility from source, i.e. the full-source bootstrap.
| colordrops wrote:
| I'm curious, why couldn't packages that are fully reproducible
| be marked with metadata, and in your config you set a flag to
| only allow reproducible packages? Similar to the nonfree tag.
|
| Then you'd have a 100% reproducible OS if you have the flag
| set (assuming that required base packages are reproducible).
| 0x457 wrote:
| IIRC any package that uses Java isn't reproducible because of
| system time, and fixing it to the epoch permanently causes issues
| in some application builds.
|
| * There are Maven and Gradle plugins to make builds reproducible.
| yjftsjthsd-h wrote:
| Can you force it to some time other than 0? Ex. I've seen some
| packages force timestamps to the git commit timestamp, which is
| nice but still fixed.
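|
| A minimal sketch of that idea, assuming the SOURCE_DATE_EPOCH
| convention from reproducible-builds.org (a Unix timestamp in an
| environment variable, often set from the last commit time, e.g.
| via `git log -1 --format=%ct`). The helper names and file
| contents below are made up for illustration.
|

```python
import hashlib
import io
import os
import tarfile
from typing import Dict


def build_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH (seconds since the Unix epoch), else use default."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def deterministic_tar(files: Dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an uncompressed tar with all varying metadata pinned."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # fixed, but not necessarily 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


mtime = build_epoch()
archive_a = deterministic_tar({"b.txt": b"two\n", "a.txt": b"one\n"}, mtime)
archive_b = deterministic_tar({"a.txt": b"one\n", "b.txt": b"two\n"}, mtime)
print(hashlib.sha256(archive_a).hexdigest() == hashlib.sha256(archive_b).hexdigest())
```

|
| The archive carries a meaningful timestamp, yet two runs with the
| same SOURCE_DATE_EPOCH produce byte-identical output.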
| Cyph0n wrote:
| This is an approach you can use when building Docker images
| in Nix flakes:
| https://github.com/aksiksi/ncdmv/blob/aa108a1c1e2c14a13dfbc0...
| layer8 wrote:
| Can you elaborate on the root causes?
| arjvik wrote:
| What issues? I'm not aware of any Java build process that
| checks timestamps.
| vlovich123 wrote:
| > Our most important finding is that the reproducibility rate in
| nixpkgs has increased steadily from 69% in 2017 to about 91% in
| April 2023. The high reproducibility rate in our most recent
| revision is quite impressive, given both the size of the package
| set and the absence of systematic monitoring in nixpkgs. We knew
| that it was possible to achieve very good reproducibility rate in
| smaller package sets like Debian, but this shows that achieving
| very high bitwise reproducibility is possible at scale, something
| that was believed impossible by practitioners.
|
| I think people in this thread are focusing on the wrong thing.
| Sure, not all packages are reproducible, but the project is
| systematically increasing the percentage of projects that are
| reproducible while ALSO adding new projects and demonstrating
| conclusively that what was considered infeasible is actually
| readily achievable.
|
| > The interesting aspect of these causes is that they show that
| even if nixpkgs already achieves great reproducibility rates,
| there still exists some low hanging fruits towards improving
| reproducibility that could be tackled by the Nix community and
| the whole FOSS ecosystem.
|
| This work is helpful I think for the community to tackle the
| sources of unreproducible builds to push the percentage up even
| further. I think it also highlights the need for automation to
| validate that there aren't systematic regressions or regressions
| in particularly popular packages (running individual regression
| checks for all packages is a futile effort unless a lot of people
| volunteer to be part of a distributed check effort).
| IHLayman wrote:
| It is surprising to me that this article discusses
| reproducibility in NixOS without even mentioning the intensional
| model or the efforts to implement it, since the author appears to
| have done a lot of research into the matter.
|
| If you don't know, the intensional model is an alternative way to
| structure the NixOS store so that components are content-
| addressable (store hash is based on the targets) as opposed to
| being addressed based on the build instructions and dependencies.
| IIUC, the entire purpose of the intensional model is to make Nix
| stores shareable so that you could just depend on Cachix and such
| without the worry of a supply-chain attack. This approach was an
| entire chapter in the Nix thesis paper (chapter 6) and has been
| worked on recently (see https://github.com/NixOS/rfcs/pull/62 and
| https://github.com/NixOS/rfcs/pull/17 for current progress).
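|
| A toy illustration of the two addressing schemes, as I understand
| them. The hashing below is deliberately simplified and is not
| Nix's real store path computation; the "/store" prefix, the
| recipes, and the output bytes are all hypothetical.
|

```python
import hashlib
from typing import List


def digest(*parts: bytes) -> str:
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.hexdigest()[:32]


def input_addressed_path(name: str, recipe: bytes, dep_paths: List[str]) -> str:
    """Path derived from the build instructions and dependency paths."""
    deps = sorted(p.encode() for p in dep_paths)
    return f"/store/{digest(recipe, *deps)}-{name}"


def content_addressed_path(name: str, output: bytes) -> str:
    """Path derived from the bytes the build actually produced."""
    return f"/store/{digest(output)}-{name}"


recipe_v1 = b"cc -O2 hello.c -o hello"
recipe_v2 = b"cc -O2 hello.c -o hello  # comment-only change"
same_output = b"\x7fELF...identical bytes..."

# Different recipes, identical output: the paths differ only when
# addressing by input.
ia_1 = input_addressed_path("hello", recipe_v1, [])
ia_2 = input_addressed_path("hello", recipe_v2, [])
ca_1 = content_addressed_path("hello", same_output)
ca_2 = content_addressed_path("hello", same_output)
print(ia_1 != ia_2, ca_1 == ca_2)

# Same recipe, non-deterministic output: only content addressing
# exposes the difference.
ca_x = content_addressed_path("hello", b"built at 10:00")
ca_y = content_addressed_path("hello", b"built at 10:01")
print(ca_x != ca_y)
```

|
| In the content-addressed case anyone can recompute the hash of
| what they downloaded, which is what makes sharing a store less
| dependent on trusting whoever built it.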
| SilentM68 wrote:
| In my case, I define "reproducible" to mean "immutable." After
| a few days of testing, I broke NixOS. The test was simple:
| swapping between different desktop environments eventually broke
| Nix, so I'm not at the point where I'd agree with Nix being truly
| reproducible, at least not in that context :(
| tmnvdb wrote:
| Those things are not the same though. Reproducible just means
| it will break again if you configure your system in the same
| way.
| bsimpson wrote:
| One problem is that the applications themselves are impure.
|
| Just running KDE litters a bunch of dotfiles into your user
| folder, even for settings you didn't adjust. This is true for
| many applications.
|
| If you had an empty home folder and passively tried a handful
| of desktops, you'd no longer have an empty home folder.
| Hopefully your environment is resilient to clutter being leaked
| into your home folder, but if your filesystem isn't truly
| immutable, rolling back to a particular Nix config might not
| get you the exact state your system was in when you first built
| that.
|
| There's a project that wipes all local changes when you restart
| your machine, with the goal of making Nix systems more
| reproducible. I think it's called Impermanence.
| alfiedotwtf wrote:
| I do all my stuff in temporary docker containers, and when
| I'm done, the container gets blown away.
|
| If the point of Nix is to keep a filesystem immutable as long
| as every app sticks to certain rules, is it actually the
| right tool for the job?
|
| Sorry... I actually don't know much about Nix given I've been
| using VMs and now containers for over a decade, so just
| trying to understand the problem that Nix actually solves.
| jf wrote:
| Aside from this being a great article with lots of interesting
| details, it's also a rare example of a headline that does NOT
| follow "Betteridge's law of headlines".
| advisedwang wrote:
| Is anyone actually implementing the concept of checking hashes
| with trusted builders? This is all wasted effort if that isn't
| needed.
|
| I've seen it pointed out (by mjg59, perhaps?) that if you have a
| trusted builder, why don't you just use their build? That seems
| to be the actual model in practice.
|
| Reproducibility seems only to be useful if you have a pool of
| mostly trustworthy builders and somehow want to build a consensus
| out of that. Which I suppose is useful for a distributed
| community but does seem like a stretch for the amount of work
| going in to reproducible builds.
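|
| For what it's worth, the consensus idea could look roughly like
| this. It is only a sketch: the builder names, the hash strings,
| and the strict-majority rule are assumptions, not a description
| of any deployed system.
|

```python
from collections import Counter
from typing import Dict, Optional


def consensus_hash(reports: Dict[str, str], quorum: int) -> Optional[str]:
    """Return the output hash that at least `quorum` builders agree on, else None.

    `reports` maps a builder name to the output hash it reported.
    """
    # Requiring a strict majority means at most one hash can reach quorum.
    if quorum * 2 <= len(reports):
        raise ValueError("quorum must be a strict majority of the builders")
    if not reports:
        return None
    hash_value, votes = Counter(reports.values()).most_common(1)[0]
    return hash_value if votes >= quorum else None


reports = {
    "builder-a": "sha256:aaaa",
    "builder-b": "sha256:aaaa",
    "builder-c": "sha256:bbbb",  # compromised, or just a non-deterministic build
}
print(consensus_hash(reports, quorum=2))
print(consensus_hash(reports, quorum=3))
```

|
| Note that this only works if the build is reproducible in the
| first place; otherwise honest builders disagree with each other
| and no quorum is ever reached.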
| jonhohle wrote:
| I work on a matching decomp project that has tooling to recompile
| C into binaries matching a 28 year old game.
|
| In the final binaries, compiled with gcc 2.6.3 and assembled
| with a custom assembler, there appears to be unused,
| uninitialized data that is whatever was in RAM when whoever
| compiled the game created the release build.
|
| Since the goal is a matching (reproducible) binary, we have tools
| to restore that random data at specific offsets. Fortunately our
| targets are fixed.
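|
| Roughly, the restore step amounts to something like the sketch
| below. This is an illustration only, not the project's actual
| tooling; the offsets, byte values, and function name are made up.
|

```python
import hashlib
from typing import Dict


def apply_patches(binary: bytes, patches: Dict[int, bytes]) -> bytes:
    """Overwrite fixed-size spans of `binary` at the given offsets."""
    out = bytearray(binary)
    for offset, data in patches.items():
        end = offset + len(data)
        if offset < 0 or end > len(out):
            raise ValueError(f"patch at {offset} falls outside the binary")
        out[offset:end] = data  # same length in and out, so the size is unchanged
    return bytes(out)


# Hypothetical data: the original release has junk where our rebuild has zeros.
original = bytes(4) + b"\xde\xad\xbe\xef" + bytes(8)
rebuilt = bytes(16)
junk = {4: b"\xde\xad\xbe\xef"}

patched = apply_patches(rebuilt, junk)
print(hashlib.sha256(patched).hexdigest() == hashlib.sha256(original).hexdigest())
```

|
| This only works because the target is fixed: the junk bytes are
| recorded once from the original binary and replayed on every
| rebuild.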
| CamouflagedKiwi wrote:
| > Our most important finding is that the reproducibility rate in
| nixpkgs has increased steadily from 69% in 2017 to about 91% in
| April 2023. The high reproducibility rate in our most recent
| revision is quite impressive, given both the size of the package
| set and the absence of systematic monitoring in nixpkgs.
|
| That's one way to read the statistic. Another way you could read
| the graph is that they still have about the same number (~5k) of
| non-reproducible builds, which has been pretty constant over the
| time period. Adding a bunch of easily reproducible additional
| builds maybe doesn't make me believe it's solving the original
| issues.
|
| > We knew that it was possible to achieve very good
| reproducibility rate in smaller package sets like Debian, but
| this shows that achieving very high bitwise reproducibility is
| possible at scale, something that was believed impossible by
| practitioners.
|
| Maybe I'm missing some nuance here, but why is Debian written off as
| being so much smaller scale? The top end of the graph here
| suggests a bit over 70k packages, Debian apparently also
| currently has 74k packages available
| (https://www.debian.org/doc/manuals/debian-reference/ch02.en....);
| I guess there's maybe a bit of time lag
| here but I'm not sure that is enough to claim Debian is somehow
| not "at scale".
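|
| To make the first point concrete, here is the back-of-envelope
| arithmetic. The 69% and 91% rates are from the quote; the ~5k
| count and the ~70k total are my own rough readings of the graph,
| so the results are illustrative, not figures from the article.
|

```python
def implied_total(unreproducible: int, reproducible_rate: float) -> float:
    """Total package count implied by a fixed number of unreproducible builds."""
    if not 0.0 <= reproducible_rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    return unreproducible / (1.0 - reproducible_rate)


def unreproducible_count(total: int, reproducible_rate: float) -> float:
    """Number of unreproducible builds for a given total and rate."""
    return total * (1.0 - reproducible_rate)


# If ~5k builds stay unreproducible, how large must the package set
# be to show each quoted rate?
for year, rate in (("2017", 0.69), ("2023", 0.91)):
    print(year, round(implied_total(5_000, rate)))

# Conversely, 91% of a ~70k package set still leaves this many.
print(round(unreproducible_count(70_000, 0.91)))
```

|
| So a rate climbing from 69% to 91% is consistent with the failing
| set staying about the same size while the total grows severalfold.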
| dartos wrote:
| Are they mostly the same 5k packages as 2017?
|
| That seems to be the crux of it.
___________________________________________________________________
(page generated 2025-02-12 23:00 UTC)