[HN Gopher] NixOS Reproducible Builds: minimal ISO successfully ...
       ___________________________________________________________________
        
       NixOS Reproducible Builds: minimal ISO successfully independently
       rebuilt
        
       Author : CathalMullan
       Score  : 425 points
       Date   : 2023-10-29 11:41 UTC (11 hours ago)
        
 (HTM) web link (discourse.nixos.org)
 (TXT) w3m dump (discourse.nixos.org)
        
       | mihalycsaba wrote:
       | Sorry for being dense, but I thought one of the main reasons for
       | NixOS's existence is reproducibility. I thought they had these
       | kinds of things solved already.
       | 
       | I have only ~2 hours of experience with NixOS. I wanted to try
       | Hyprland and thought it would be easier on NixOS, since Hyprland
       | needs a bit of setup and it might be easier to use someone else's
       | config on NixOS than on some other distro. Finding a config was
       | hard too, found like 3 on some random github gists, thought there
       | would be more... and none of them worked, at that point I gave
       | up.
        
         | rgoulter wrote:
         | Yeah, Nix is a tough tool to learn. It's probably never the
         | right tool to pick for "I just want something that works right
         | now" if you're unfamiliar with it.
         | 
         | > I thought one of the main reasons for NixOS's existence is
         | reproducibility
         | 
         | NixOS uses "reproducible" to mean "with the same Nix code, you
         | get the same program behaviour". This is more/less what people
         | hope Dockerfiles provide.
         | 
         | This is the level of reproducibility you want when you say "it
         | works on my machine" or "it worked last time I tried it".
         | 
         | Whereas "reproducible build" aims for bit-for-bit equality of
         | artifacts built on different machines. With this, there's a
         | layer of security in that you can verify that code has been
         | built from a particular set of sources.
         | 
         | > Finding a config was hard too
         | 
         | What search query were you using? Searching for "nixos
         | configuration" on GitHub turns up plenty of repos:
         | https://github.com/search?q=nixos%20configuration&type=repos...
         | 
         | Or, searching for Hyprland specifically, there seem to be many
         | configs using it:
         | https://github.com/search?q=wayland.windowManager.hyprland&t...
        
           | amarshall wrote:
           | > NixOS uses "reproducible" to mean "with the same Nix code,
           | you get the same program behaviour".
           | 
           | Note that "Nix code" also includes the hashes of all non-Nix
           | sources. One way to think of it is that Nix has reliable
           | build cache invalidation.
           | 
           | > This is more/less what people hope Dockerfiles provide.
           | 
           | Indeed, but importantly they do _not_ provide input-
           | reproducibility (while Nix does) because, at least, there are
           | no hashes for remote data.
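
A minimal Python sketch of the "build cache invalidation" idea above, under stated assumptions: the key for a build is a hash over the recipe text, the pinned hashes of every fetched source, and the keys of its dependencies. The name `input_key` and the JSON layout are made up for illustration; this is not how Nix actually serializes or hashes derivations.

```python
import hashlib
import json


def input_key(recipe: str, source_hashes: dict, dep_keys: list) -> str:
    """Hash everything that goes INTO a build (hypothetical scheme).

    recipe        -- the build instructions as text
    source_hashes -- pinned content hash for every fetched file
    dep_keys      -- input keys of the dependencies
    """
    payload = json.dumps(
        {"recipe": recipe, "sources": source_hashes, "deps": sorted(dep_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# A dependency and an application that builds against it.
libc = input_key("build libc", {"libc.tar.gz": "sha256:aaaa"}, [])
app_v1 = input_key("build app", {"app.tar.gz": "sha256:1111"}, [libc])

# Same recipe, same pinned sources, same dependencies: same key (cache hit).
app_same = input_key("build app", {"app.tar.gz": "sha256:1111"}, [libc])

# The upstream tarball changed, so its pinned hash changed: new key.
app_v2 = input_key("build app", {"app.tar.gz": "sha256:2222"}, [libc])

# A change deep in the dependency tree propagates to everything above it.
libc_patched = input_key("build libc", {"libc.tar.gz": "sha256:bbbb"}, [])
app_rebuilt = input_key("build app", {"app.tar.gz": "sha256:1111"}, [libc_patched])

print(app_v1 == app_same, app_v1 == app_v2, app_v1 == app_rebuilt)
```

Because the source hash is part of the key, swapping the tarball forces a new key. A Dockerfile step that downloads a URL without a hash has nothing comparable to feed into such a key.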
        
           | mihalycsaba wrote:
           | I don't remember. Some of them needed other tools installed
           | (like flakes, whatever that is). I looked for configs that
           | didn't look like they would need a few more hours of learning
           | and setting up other tools before they would work.
           | 
           | I just wanted to take a quick look at hyprland, I imagined I
           | just use an existing config, I never thought it would need
           | hours of research. Later I installed an arch vm and managed
           | to install hyprland with some basic components in less than
           | an hour from the first guide I found.
           | 
           | Looks like I misunderstood what Nix was made for. I just
           | want a system I can more or less set up with a simple config
           | file.
           | 
           | I saw this OS and haven't had time to try it yet, but I
           | thought this is how Nix works: https://blendos.co/
           | 
           | For example, you just define GNOME like this. The Nix configs
           | I found looked similar, they just didn't work.
           | 
           | > gnome:
           | >   enabled: true
           | >   style: light
           | >   gtk-theme: 'adw-gtk3'
           | >   icon-theme: 'Adwaita'
           | >   titlebar:
           | >     button-placement: 'right'
           | >     double-click-action: 'toggle-maximize'
           | >     middle-click-action: 'minimize'
           | >     right-click-action: 'menu'
        
             | ParetoOptimal wrote:
             | > I just wanted to take a quick look at hyprland, I
             | imagined I just use an existing config, I never thought it
             | would need hours of research.
             | 
             | It shouldn't.
             | 
             | You'd want a simple flake to start with that has
             | home-manager (for a higher chance of finding declarative
             | best-practice configs and modules), and to add small things
             | to that.
             | 
             | I imagine you tried grabbing someone's complex config,
             | modifying it, and ran into issues?
        
               | t0astbread wrote:
               | Flakes will hopefully be that soon but I wouldn't
               | recommend starting with flakes when learning Nix in 2023.
               | They're experimental and you still need to learn most of
               | flake-less Nix (except channels and NIX_PATH) anyways.
               | 
               | When I started learning/using NixOS about two years ago I
               | found it useful to start out with just Nixpkgs (i.e. what
               | you get out of the box) and only add libraries when I
               | felt they would help me. My first configs were ugly as
               | hell and full of bad practices, but the cool thing about
               | Nix is that it gives you a lot of safety nets to enable
               | experimentation and refactoring.
        
               | rgoulter wrote:
               | > Flakes will hopefully be that soon but I wouldn't
               | recommend starting with flakes when learning Nix in 2023.
               | 
               | Flakes provide a consistent entrypoint (and a consistent
               | schema for it) into a codebase, which would have spared me
               | a significant amount of confusion when I was getting
               | started with Nix.
               | 
               | > They're experimental
               | 
               | The functionality as-is hasn't been changed. The
               | 'experimental' flag itself hasn't been a _practical_
               | problem.
               | 
               | However, flakes still have some rough edges & design
               | problems to them, and there's some disagreement in the
               | community over how flakes were rolled out.
               | 
               | I'd say for an end user, the benefits far outweigh the
               | costs.
               | 
               | > ... and you still need to learn most of flake-less Nix
               | (except channels and NIX_PATH) anyways.
               | 
               | I think the phrase "flake-less Nix" paints the wrong
               | picture. I'd instead put it this way: most of what you need
               | to learn about Nix is unrelated to whether the Nix
               | evaluation started from a flake or not.
        
               | ParetoOptimal wrote:
               | > Flakes will hopefully be that soon but I wouldn't
               | recommend starting with flakes when learning Nix in 2023.
               | They're experimental and you still need to learn most of
               | flake-less Nix (except channels and NIX_PATH) anyways.
               | 
               | I've used Nix for a decade and wouldn't recommend the
               | confusing and horrible user experience of Nix without
               | flakes.
               | 
               | Additionally, if you are using github for code examples,
               | you'll have far more success using flakes.
               | 
               | Many experienced people a new user would get help from,
               | including myself, have long since washed their hands of
               | pre-flakes issues and arcana like channel issues.
        
             | lifeisstillgood wrote:
             | I am on a similar journey
             | 
             | I built https://github.com/mikadosoftware/workstation (hey,
             | nearly 500 stars!) around the idea of defining a
             | reproducible laptop build.
             | 
             | I don't think Docker is the right level, so my next
             | project, when I have free time (!), is to do a box build
             | that might then compile to Docker.
             | 
             | I think there is a sensible case for being able to define
             | both developer workstations and servers via Nix.
        
               | k8svet wrote:
               | Except it's Docker, and like virtually all Dockerfiles,
               | it immediately runs "apt-get update", tossing
               | reproducibility out the window.
        
         | quietbritishjim wrote:
         | There are two senses of reproducible.
         | 
         | The sense you're thinking of is that you can easily rebuild a
         | binary package and it will use the same dependency versions,
         | build options, etc. There should be no chance of a compiler
         | error that didn't happen the first time (the old "but it worked
         | on my laptop" syndrome).
         | 
         | The sense used here is that every build output is byte-for-byte
         | _binary identical_. It doesn't depend on the machine name, the
         | time it was compiled or anything like that (or, in a parallel
         | build, the order in which files finish compiling). That is much
         | harder.
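
To make the "time it was compiled" point concrete, here is a small Python sketch. It is only an illustration: the toy `build` function stands in for a real build step and simply gzip-compresses its input, because the gzip header carries a timestamp. The function names and the timestamps are assumptions chosen for the example.

```python
import gzip
import hashlib


def build(source: bytes, build_time: int) -> bytes:
    """Toy 'build step': compress the source, stamping build_time into the header."""
    return gzip.compress(source, mtime=build_time)


def digest(artifact: bytes) -> str:
    """SHA-256 of the artifact, the usual way to compare two builds."""
    return hashlib.sha256(artifact).hexdigest()


source = b"int main(void) { return 0; }\n"

# Same input built at two different times: the payload is the same,
# but the artifacts are not byte-for-byte identical.
a = build(source, build_time=1698579660)
b = build(source, build_time=1698583260)

# Pinning the timestamp removes this particular source of non-determinism.
c = build(source, build_time=0)
d = build(source, build_time=0)

print("different times, same bytes:", digest(a) == digest(b))
print("pinned time, same bytes:    ", digest(c) == digest(d))
```

Both `a` and `b` decompress to the same source, so they "behave" identically, which is the first sense of reproducible; only the pinned variant also satisfies the second, byte-for-byte sense.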
        
           | jowea wrote:
           | > The sense you're thinking of is that you can easily rebuild
           | a binary package and it will use the same dependency
           | versions, build options, etc. There should be no chance of a
           | compiler error that didn't happen the first time (the old
           | "but it worked on my laptop" syndrome).
           | 
           | And that's just for Nixpkgs, the packages themselves that
           | also work outside NixOS. NixOS has reproducibility of the
           | entire system complete with configuration.
        
         | ParetoOptimal wrote:
         | > Finding a config was hard too, found like 3 on some random
         | github gists, thought there would be more..
         | 
         | That sounds odd, did you use github code search?
         | 
         | Find relevant home manager options:
         | 
         | https://mipmip.github.io/home-manager-option-search/?query=h...
         | 
         | Then search those on github:
         | 
         | https://github.com/search?utf8=%E2%9C%93&q=lang%3Anix+hyprla...
         | 
         | Note some option searches imply more casual or advanced users.
        
         | chpatrick wrote:
         | > Sorry for being dense, but I thought one of the main reasons
         | for NixOS's existence is reproducibility. I thought they had
         | these kinds of things solved already.
         | 
         | NixOS has the advantage that everything is built in its own
         | sandbox with only its explicitly declared (and hashed)
         | dependencies available, unlike in mainstream distros where it's
         | the full system environment, so in many cases you already get
         | the same binary every time. But this doesn't immediately lead
         | to reproducibility because the build process might be
         | nondeterministic for various packages.
        
           | benreesman wrote:
           | This is a really good comment, I have no idea why it's going
           | grey.
           | 
           | Upvote from me FWIW.
        
           | WhyNotHugo wrote:
           | > unlike in mainstream distros where it's the full system
           | environment
           | 
           | Usually packages are built in an environment which has only a
           | minimal base system plus the package's explicit
           | dependencies. They don't have random unnecessary packages
           | installed.
        
           | goodpoint wrote:
           | > unlike in mainstream distros
           | 
           | Debian has been building in a clean sandbox with only
           | required, tracked dependencies for decades.
           | 
           | It also builds the large majority of packages reproducibly,
           | including the binary and whole installation packages (not
           | just the sources like nixos)
        
             | chpatrick wrote:
             | > not just the sources like nixos
             | 
             | Not sure what you mean by that, the Nix packages that are
             | reproducible have reproducible binaries.
             | 
             | In the NixOS world there isn't really a concept of a
             | "binary/installation package" like in Debian or elsewhere.
             | Everything can be rebuilt from source on any machine, but
             | because everything is hashed, if the official binary caches
             | have already built something with the same inputs, they can
             | just give you the outputs directly. So it's more like
             | memoization than a .deb or something that you install.
             | 
             | Nix is a functional language that builds recipes
             | (derivations) to build stuff, with all the inputs and
             | outputs hashed. If the derivation you want to build has
             | already been built by a cache you trust, the system will
             | just fetch it instead of building locally.
             | 
             | What the Nix reproducibility project checks is that the
             | same derivation produces the same output regardless of what
             | machine it's built on.
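
The memoization analogy above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Nix's implementation: `derivation_hash`, `realise`, and `is_reproducible` are hypothetical names, the "cache" is a plain dict, and the "build" just returns fixed bytes.

```python
import hashlib
from typing import Callable


def derivation_hash(builder: str, input_hashes: list) -> str:
    """Identify a recipe by hashing its build script and all input hashes."""
    h = hashlib.sha256()
    h.update(builder.encode("utf-8"))
    for item in sorted(input_hashes):
        h.update(b"\0" + item.encode("utf-8"))
    return h.hexdigest()


def realise(drv: str, build: Callable[[], bytes], cache: dict) -> tuple:
    """Return (output, how): fetch from the cache if possible, else build."""
    if drv in cache:
        return cache[drv], "fetched"
    output = build()
    cache[drv] = output
    return output, "built"


def is_reproducible(build: Callable[[], bytes], rounds: int = 2) -> bool:
    """Run the same build several times and compare output digests."""
    digests = {hashlib.sha256(build()).hexdigest() for _ in range(rounds)}
    return len(digests) == 1


build_calls = []  # records every real build, to show the cache is used


def build_hello() -> bytes:
    build_calls.append("hello")
    return b"hello binary"


cache = {}
drv = derivation_hash("make install", ["sha256:hello-src", "sha256:gcc"])
first = realise(drv, build_hello, cache)   # nothing cached yet: builds
second = realise(drv, build_hello, cache)  # same derivation: fetched
calls_after_realise = len(build_calls)

print(first[1], second[1], calls_after_realise, is_reproducible(build_hello))
```

Note that the cache lookup is keyed on the inputs only. Whether a rebuild would actually yield the same bytes is a separate question, which is what a check in the style of `is_reproducible` looks at.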
        
         | Aerbil313 wrote:
         | Check out https://github.com/donovanglover/nix-config . Flake
         | based config with hyprland and cool stuff.
         | 
         | > at that point I gave up.
         | 
         | NixOS is not for the weak or time constrained, currently.
         | Hopefully it will be one day. Still if you push through, you
         | reap the benefits.
        
           | flkiwi wrote:
           | Another good option:
           | https://github.com/Misterio77/nix-starter-configs
           | 
           | I started with this one, the minimal version, then moved on
           | to something more like the standard version, and now I'm
           | moving on to something based on his much more complicated and
           | flexible build in a different repo. I had been flailing, then
           | this repo made it click.
        
         | colordrops wrote:
         | Nix is reproducible in the environment sense, meaning you can
         | get the exact same setup every time, but not in the bit-for-bit
         | sense, which would mean that the compiled binaries are
         | identical.
        
       | Reventlov wrote:
       | For those wondering: it should be remembered that the
       | reproducibility of Nix / NixOS / Nixpkgs is only a
       | reproducibility of the sources: if the sources change, one is
       | warned, but it is not a question of the reproducibility of the
       | binaries (which can change at each build). This binary
       | reproducibility of Nix / NixOS / Nixpkgs is indeed not really
       | tested, at least not systematically.
       | 
       | Guix, Archlinux, Debian do the binary reproducibility better than
       | Nix / NixOS / Nixpkgs.
       | 
       | Sources:
       | 
       | - https://r13y.com/ ( Nix* )
       | 
       | - https://tests.reproducible-builds.org/debian/reproducible.ht...
       | ( Debian )
       | 
       | - https://tests.reproducible-builds.org/archlinux/archlinux.ht...
       | ( Archlinux )
       | 
       | - https://data.guix.gnu.org/repository/1/branch/master/latest-...
       | (Guix, might be a bit slow to load, here is some cached copy
       | https://archive.is/lTuPk )
        
         | amarshall wrote:
         | r13y.com is outdated vs. https://reproducible.nixos.org/
        
         | dicytea wrote:
         | > Guix, Archlinux, Debian do the binary reproducibility better
         | than Nix / NixOS / Nixpkgs.
         | 
         | Huh, didn't know that Arch Linux tests reproducibility. It's
         | apparently 85.6% reproducible:
         | https://reproducible.archlinux.org
         | 
         | I wonder how much work would be needed for NixOS, considering
         | it has more than _80k_ packages in the _official_ repository.
        
           | chpatrick wrote:
           | I think that's also a bit of an unfair comparison given the
           | number of AUR packages you usually use on Arch. With nixpkgs
           | there isn't a distinction between official and community
           | packages.
        
             | iopq wrote:
             | Sure there is, the NUR has a few thousand community
             | packages that are not ready for release
             | 
             | The nixpkgs are all official packages, it's just really
             | easy to become a maintainer (you make a pull request adding
             | the package you want to maintain)
        
         | chpatrick wrote:
         | > but it is not a question of the reproducibility of the
         | binaries (which can change at each build). This binary
         | reproducibility of Nix / NixOS / Nixpkgs is indeed not really
         | tested, at least not systematically.
         | 
         | Isn't that exactly what your first source and OP are about?
         | They check that the binaries are the same when built from the
         | same sources on different machines. The point is exactly that
         | the binaries don't change with every build.
         | 
         | > How are these tested?
         | 
         | > Each build is run twice, at different times, on different
         | hardware running different kernels.
        
           | Reventlov wrote:
           | Yeah, that represents maybe 1% of the packages in nixpkgs
           | (only the installation ISO).
        
             | chpatrick wrote:
             | Sure but the goal is the same, binary reproducibility, and
             | it is systematic. It's just less far along than Debian.
             | 
             | Also, I'm pretty sure a large percentage of nixpkgs is
             | already reproducible; we just don't know for sure.
             | 
             | They say the next step might be the GNOME-based ISO, which
             | would be a big achievement because it's basically a full-
             | featured system.
        
         | clhodapp wrote:
         | That is not true at all, with respect to the aims or the
         | reality of nixpkgs. The original post here is talking about
         | reproducing the (binary) minimal iso, which contains a bunch of
         | binary packages.
        
           | Reventlov wrote:
           | It is true. The original post writes about reproducing the
           | minimal iso, which contains probably around 1% of the
           | packages in nixpkgs. The remaining packages are not tested
           | regarding binary reproducibility, or, at least, not in a
           | systematic manner, which means regressions may happen
           | regularly (which is exactly what happened with the .iso, see
           | the previous announcement from 2021:
           | https://discourse.nixos.org/t/nixos-unstable-s-iso-minimal-x... ).
        
         | mauricioc wrote:
         | To emphasize chpatrick's point below, there are two definitions
         | of "reproducibility" in this context:
         | 
         | * Input reproducibility, meaning "perfect cache invalidation
         | for inputs". Nix and Guix do this perfectly by design (which
         | sometimes leads to too many rebuilds). This is not on the radar
         | for Debian and Arch Linux, which handle the rebuild problem
         | ("which packages should I rebuild if a particular source file
         | is updated?") on an ad-hoc basis by triggering manual rebuilds.
         | 
         | * Output reproducibility, meaning "the build process is
         | deterministic and will always produce the same binary". This is
         | the topic of the OP. Nix builds packages in a sandbox, which
         | helps but is not a silver bullet. Nix is in the same boat as
         | Debian and Arch Linux here; indeed, distros frequently upstream
         | patches to increase reproducibility and benefit all the other
         | distros. In this context, https://reproducible.nixos.org is the
         | analogue of the other links you posted, and I agree Nix reports
         | aren't as detailed (which does not mean binary reproducibility
         | is worse on Nix).
         | 
         | Your comment can be misinterpreted as saying "Nix does not do
         | binary reproducibility very well, just input reproducibility",
         | which is false. That's the whole point of the milestone being
         | celebrated here!
        
           | Foxboron wrote:
           | > Your comment can be misinterpreted as saying "Nix does not
           | do binary reproducibility very well, just input
           | reproducibility", which is false.
           | 
           | It's only "false" as nobody has actually tried to rebuild the
           | entire package repository of nixpkgs, which to my knowledge
           | is an open problem nobody has really worked on.
           | 
           | The current result is "only" ~800 packages and the set has
           | regular regressions.
        
             | prateem_ wrote:
             | I am probably misunderstanding your point, BUT I have
             | actually depended on Nix for "reproducible docker images"
             | for a confidential-compute use case, so that all parties can
             | independently verify the workload image hash. Rarely
             | (actually only once) it failed to produce bit-identical
             | images; every other time it successfully produced
             | bit-identical images on very different machine setups.
             | Granted, this is not an ISO but docker images, but I would
             | say Nix does produce reproducible builds for many complex
             | real-world uses.
             | 
             | Ref:
             | [1] https://gitlab.com/prateem/turning-polyglot-solutions-into-t...
             | [2] https://discourse.nixos.org/t/docker-image-produced-by-docke...
        
               | Foxboron wrote:
               | I'm very sure you are actually just rebuilding the
               | container images themselves, not the package tree you are
               | depending on. Building reproducible ISOs, or container
               | images, with a package repository as a base isn't
               | particularly hard these days.
        
               | prateem_ wrote:
               | I see what you mean. Thanks for clarifying. Even so, Nix
               | is no worse placed than those other distributions for bit
               | reproducibility. Correct?
        
               | Foxboron wrote:
               | It's unclear at the moment because of the limited testing
               | (minimal ISO and a Gnome ISO) vs Arch/Debian/Guix
               | rebuilding entire package repositories.
        
         | dathinab wrote:
         | I think you might want to read the article.
         | 
         | It's about bit-for-bit reproducibility of not just the binaries
         | but also how they get packed into an ISO. (r13y.com is outdated;
         | as far as I remember, the missing <1% was an _upstream_ Python
         | regression, since reproducibility of the binaries, ignoring the
         | packaging into an ISO, was already there a few years ago.)
         | 
         | Now, when it comes to packages beyond the core ISO, things become
         | complicated to compare due to the subtle but, in this regard,
         | significant differences in how they handle packages. E.g. a
         | bunch of packages you would find on Arch in the AUR you find as
         | normal packages in Nix, and most of the -bin upstream packages
         | are simply not needed with Nix.
         | 
         | In general Nix makes it easier to create reproducible builds,
         | but (independent of Nix) this doesn't mean that it's always
         | possible; it often needs patching, which is often but not always
         | done. If you combine this with the default package repository
         | of Nix being much larger (>80k) than e.g. Arch's (<15k non-AUR),
         | comparing percentages there isn't very useful.
         | 
         | Though one very common misconception is that the hash in the
         | Nix store path is based on the build output; it's instead based
         | on all sources (whether binary or not) used for building the
         | binary in an isolated environment.
         | 
         | This means it doesn't have quite the security benefit some
         | people might think it has, but in turn it is necessary, as it
         | means Nix can use software which is not reproducibly buildable
         | in a way which still produces reasonably reproducible
         | deployments (as in not necessarily all bits the same, but all
         | functionality, compiler configs, dependency versions, users,
         | configurations etc. being the same).
        
         | watersucks wrote:
         | Doesn't the content-addressed derivation experimental feature
         | address this issue? Instead of store hashes being input-
         | addressed as you mention, the derivation outputs are used to
         | calculate the store hash, which ensures binary reproducibility.
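
The two addressing schemes being contrasted can be sketched in Python as follows. This is a toy model: the `/store/` prefix, the 32-character truncation, and both function names are assumptions for illustration and do not reproduce Nix's real store path computation.

```python
import hashlib


def _digest(data: bytes) -> str:
    """Shortened SHA-256, just to keep the example paths readable."""
    return hashlib.sha256(data).hexdigest()[:32]


def input_addressed_path(name: str, build_inputs: bytes) -> str:
    """Path derived from what went INTO the build."""
    return f"/store/{_digest(build_inputs)}-{name}"


def content_addressed_path(name: str, build_output: bytes) -> str:
    """Path derived from what came OUT of the build."""
    return f"/store/{_digest(build_output)}-{name}"


inputs = b"recipe + source hashes + dependency paths"

# A non-deterministic build: same inputs, different bytes on two machines.
output_on_machine_a = b"binary with embedded timestamp 11:41"
output_on_machine_b = b"binary with embedded timestamp 12:41"

ia_a = input_addressed_path("hello", inputs)
ia_b = input_addressed_path("hello", inputs)
ca_a = content_addressed_path("hello", output_on_machine_a)
ca_b = content_addressed_path("hello", output_on_machine_b)

print("input-addressed paths equal:  ", ia_a == ia_b)
print("content-addressed paths equal:", ca_a == ca_b)
```

In this model the input-addressed path is identical on both machines even though the bytes differ, while the content-addressed paths diverge. Content addressing exposes the mismatch; it does not by itself make the build deterministic.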
        
           | Smaug123 wrote:
           | Ish. This is covered in section 6.4.1 of Eelco's thesis
           | (https://edolstra.github.io/pubs/phd-thesis.pdf). It all
           | becomes much simpler if evaluating a build many times can
           | only ever result in one output, but the Nix content-addressed
           | model does permit multiple outputs. In such cases, the system
           | just has to choose a canonical output and use that one,
           | rewriting hashes as necessary to canonicalise inputs which
           | are non-canonical.
        
       | onedognight wrote:
       | Rebuilding the minimal ISO from source is an impressive milestone
       | on the journey to a system that builds from source reproducibly.
       | Guix had an orthogonal but equally impressive milestone on the
       | same journey recently[0], bootstrapping a full compiler toolchain
       | from a single reproducible 357 byte binary without any other
       | binary compiler blobs. These two features may one day soon be
       | combined to reproducibly build a full distribution from source.
       | 
       | [0] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...
        
         | Thrir94994i wrote:
         | 357 bytes for bootstrap compiler binary is VERY impressive!
        
           | msm_ wrote:
           | If I remember correctly, this tiny binary is used to
           | (reproducibly) bootstrap the next binary, which bootstraps
           | the next binary, until eventually GCC can be compiled (and
           | compile other software).
        
             | Smaug123 wrote:
             | The bootstrap chain is
             | https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304b...
        
           | rssoconnor wrote:
           | To be fair, it is 357 bytes ... plus a POSIX operating
           | system.
           | 
           | Still, that POSIX operating system bit is also being worked
           | on.
        
             | 6581 wrote:
             | Isn't that what builder-hex0 does?
             | 
             | https://github.com/ironmeld/builder-hex0
        
         | 15155 wrote:
         | At 357 bytes, do you need a reproducible binary at all?
         | 
         | I'd think one could hand-document all 357 bytes of machine code
         | and have them be intelligible.
        
           | ahoka wrote:
           | That's just the first stage. Simple enough to be audited
           | manually.
        
           | jowea wrote:
           | This[0] is basically the hand-documentation of those bytes
           | then. Handwritten ELF header and assembly code.
           | 
           | [0] https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX...
        
             | ralferoo wrote:
             | Just had a read of this to see what it did... And I must
             | admit, I don't understand what purpose this is supposed to
             | serve.
             | 
             | All it seems to do is convert hex into binary and dump it
             | to a file. Not sure how that's any more useful than just
             | copying the binary for next stage directly, after all this
             | binary had to get on the system somehow.
        
               | reactordev wrote:
               | It's the first stage. Likely piped. Hence the hex out.
               | The context on how it's called is key:
               | https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304b...
        
               | autumn-antlers wrote:
               | Section 1.6.1 of the GNU Mes manual places these
               | early-stage assemblers into context:
               | 
               | https://www.gnu.org/software/mes/manual/mes.html#Stage0
        
               | rssoconnor wrote:
               | The program does also dispose of comment lines.
               | 
               | One could argue that this is just a kind of trick so they
               | can say the next "binary" is actually a "source" file
               | because it happens to be written by a human in ASCII.
               | 
               | Still the phase distinction between what is a source and
               | what is a binary becomes blurry at this low level. I
               | believe the next stage, still written as ASCII-represented
               | machine code with comments, adds support for labels and
               | computes the offsets for jumps to those labels. Then more
               | and more features are added until you have a minimal
               | assembler letting you write somewhat machine-independent
               | code, and you continue to work your way up the toolchain.
               | 
               | So at which point does the translation from "source" to
               | "binary" become a real thing and not just a trick of
               | semantics? Is it when we have machine-independent
               | assembly code? Is it when we computed offsets for
               | labelled jumps? Is it when we started stripping comments
               | out of the source code?
        
               | ralferoo wrote:
               | Yeah, I kind of agree, but my issue is kind of with this
               | statement (in the link from the peer post):
               | 
               | > What if we could bootstrap our entire system from only
               | this one hex0 assembler binary seed? We would only ever
               | need to inspect these 500 bytes of computer codes. Every
               | later program is written in a more friendly programming
               | language: Assembly, C, ... Scheme.
               | 
               | And my issue is that this isn't true. hex1 isn't written
               | in assembler any more than hex0 is. Both of those
               | bootstrap files can get onto the system simply by
               | ignoring whitespace and anything after #, converting the
               | hex into binary and writing it to a file.
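               | 
               | A minimal Python sketch of that conversion, assuming only
               | the rules described above (whitespace ignored, "#" starts a
               | comment). The real hex0 may accept more than this, and the
               | function name is made up for illustration:

```python
def hex_to_bytes(source: str) -> bytes:
    """Convert commented hex text into the raw bytes it spells out."""
    digits = []
    for line in source.splitlines():
        # Drop everything from '#' to the end of the line.
        code = line.split("#", 1)[0]
        # Drop all whitespace between the hex digits.
        digits.append("".join(code.split()))
    # bytes.fromhex raises ValueError on stray characters or an odd digit count.
    return bytes.fromhex("".join(digits))
```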
               | 
               | Having hex0 doesn't add anything to the mix, other than
               | being shorter than hex1, because you still have the same
               | initial bootstrap problem of how you can prove that the
               | hex0 binary represents the hex in its source vs the hex1
               | binary and its source and both have the same problem of
               | needing to prove the hex in the source matches the
               | assembly (and that the program even does what the
               | comments claim).
               | 
               | hex1 is a more useful bootstrap point, because you can
               | use standard system tools to create the binary from the
               | source (e.g. sed) and also compile itself and verify that
               | the files are the same.
               | 
               | Having hex0 and hex1 just means you need to manually
               | verify both rather than just hex1.
               | 
               | I guess my point is that if you have insufficient trust
               | in your system that you can't e.g. trust "sed" to create
               | the original binary files, or trust the output of "od -x"
               | or "md5sum" to verify the binary files, you also can't
               | trust it enough to verify that the hex in those source
               | files is correct or that the binary files match.
        
               | lmm wrote:
               | > Having hex0 doesn't add anything to the mix, other than
               | being shorter than hex1, because you still have the same
               | initial bootstrap problem of how you can prove that the
               | hex0 binary represents the hex in its source vs the hex1
               | binary and its source
               | 
               | Well presumably you toggle hex0 in on the front panel and
               | then type hex1 with the keyboard, which is easier than
               | toggling in the binary of hex1.
        
           | forkerenok wrote:
           | Or tattooed on oneself! Or etched on a dog tag!
        
             | spicybright wrote:
             | Nix is a great dog name
        
               | cpuguy83 wrote:
               | I get cat vibes from "Nix".
        
         | dataflow wrote:
         | How long does a fully bootstrapped build take?
        
           | pharmakom wrote:
           | With caching, just the time to download the artefact.
        
             | dataflow wrote:
             | Doesn't caching completely defeat the point of
             | bootstrapping? How do you know the cached artifact is
             | correct? You have to build it manually to verify that, at
             | which point you're still building manually...
        
               | __MatrixMan__ wrote:
               | 1. hash it
               | 
               | 2. rebuild it without the cache
               | 
               | 3. hash that
               | 
               | 4. compare
               | 
               | Or, trust somebody who has. Inconvenient, but is there
               | any other way to establish trust in the correspondence
               | between code and a binary?
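               | 
               | The four steps above can be sketched in Python, assuming
               | you already have the cached artifact and your own uncached
               | rebuild as two files on disk. The function names and the
               | choice of SHA-256 are illustrative, not what any particular
               | tool does:

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large ISOs do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_artifact(cached_path: str, rebuilt_path: str) -> bool:
    """Steps 1-4: hash the cached file, hash the rebuild, compare."""
    return sha256_of(cached_path) == sha256_of(rebuilt_path)
```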
        
               | mbakke wrote:
               | Guix has tooling to verify binaries:
               | 
               | https://guix.gnu.org/en/manual/en/html_node/Invoking-guix-ch...
               | 
               | "guix build --no-grafts --no-substitutes --check foo"
               | will force a local rebuild of package foo and fail if the
               | result is not bit-identical. "guix challenge" will
               | compare your local binaries against multiple cache
               | servers.
               | 
               | I build everything locally and compare my results with
               | the official substitute servers from time to time.
        
               | pharmakom wrote:
               | You have a hash that n trusted parties agree on. This is
               | enabled by reproducible builds.
        
           | mbakke wrote:
           | It obviously depends on the hardware, but IIRC for me maybe
           | 3-4 hours building from the 357 byte seed to the latest GCC.
           | 
           | The early binaries are not very optimized :-)
        
         | tracnar wrote:
         | It's not yet as far as the Guix stage0, but there was an
         | interesting talk about bootstrapping nix from TinyCC at NixCon:
         | https://media.ccc.de/v/nixcon-2023-34402-bootstrapping-nix-a...
        
         | TacticalCoder wrote:
         | That is amazing and it is great to see there are people out
         | there fighting the good fight (while others ask: _"but where's
         | the benefit!? if there's a backdoor, everybody is still going
         | to get the backdoor!"_).
         | 
         | > it gives us a reliable way to verify the binaries we ship are
         | faithful to their sources
         | 
         | That's the thing many don't understand: it's not about proving
         | that the result is 100% trustable. It's about proving it's 100%
         | faithful to the source. Which means that _should_ monkey
         | business be detected (like a sneaky backdoor), it can be
         | recreated deterministically 100% of the time.
         | 
         | In other words for the bad guys: nowhere to run, nowhere to
         | hide.
        
       | somat wrote:
       | I find it funny (ironic) that the OpenBSD project is trying hard
       | to go the other way, every single install has unique and
       | randomized address offsets.
       | 
       | While I understand that these two goals, reproducible builds and
       | unique installs, are orthogonal to each other, both can be had at
       | the same time, the duality of the situation still makes me laugh.
        
         | oever wrote:
         | If the address offsets can be randomized with a provided seed,
         | then demonstrating reproducibility is still possible.
         | 
         | Alternatively, randomizing the offsets when starting the
         | program is another way to keep reproducibility and even
         | increase security; the offsets would change at every run.
        
         | WhyNotHugo wrote:
         | OpenBSD does randomised linking at boot time. Packages
         | themselves can still be reproducible. All the randomisation is
         | done locally after the packages are downloaded and their
         | checksums validated.
        
       | KennyFromIT wrote:
       | I've lived in the Red Hat ecosystem for work recently. How does
       | this compare to something like... Fedora Silverblue? Ansible?
       | Fedora Silverblue + Ansible?
        
         | TheDong wrote:
         | The closest equivalent to the nixos ISO builder and
         | reproducibility related to it in the fedora ecosystem is
         | osbuild / imagebuilder -
         | https://www.osbuild.org/guides/introduction.html
         | 
         | Imagebuilder claims reproducibility, but as far as I know it
         | mostly installs rpm packages as binaries, not from source, so
         | it's not really proper reproducibility unless all the input
         | packages are also reproducible.
         | 
         | If the descriptions of building packages from source, building
         | distro images, and reproducibility in the linked thread didn't
         | make sense to you, you're probably not really the target
         | audience anyway.
        
         | candiddevmike wrote:
         | Nix is a declarative OS, where you describe what the OS should
         | look like, instead of Ansible where you give the OS steps to
         | follow. Silverblue and Nix are orthogonal aside from being
         | Linux distributions--Silverblue is attempting to change how
         | software is delivered using only containers on an immutable
         | host.
         | 
         | If you're interested in an Ansible alternative that uses
         | Jsonnet and state tracking to somewhat mimic Nix, check out
         | Etcha: https://etcha.dev
        
           | rgoulter wrote:
           | > Nix is a declarative OS
           | 
           | I think precision is important.
           | 
           | "Nix" refers to the package manager (and the language the
           | package manager uses).
           | 
           | Whereas it's "NixOS" that's the OS which makes use of Nix to
           | manage the system configuration.
        
             | Ericson2314 wrote:
             | Thank you. This is important. Too bad our website doesn't
             | make it clear at all.
        
       | mbakke wrote:
       | Very impressive milestone, congrats to those who made this
       | possible!
       | 
       | > [...] actually rebuilding the ISO still introduced differences.
       | This was due to some remaining problems in the hydra cache and
       | the way the ISO was created.
       | 
       | Can anyone shed some light on the fix for "how the ISO was
       | created"? I attempted making a reproducible ISO a while back but
       | could not make the file system create extents in a deterministic
       | fashion.
        
         | raboof wrote:
         | For NixOS, it's in the 'how did we reproduce' section of the
         | article: the last step of that process produces the iso in the
         | ./result/iso directory.
         | 
         | It sounds like what you're looking for is the commands that
         | that build invoked, but I'm not sure what step you're looking
         | for. For example, the xorriso invocations are at
         | https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/make-...
        
       | ahmedfromtunis wrote:
       | Stupid question as I never worked on something like this before:
       | why isn't reproducibility the default behavior?
       | 
       | I mean if 2 copies of a piece of software were compiled from the
       | same source, what stops them from being identical each and every
       | time?
       | 
       | I know there are so many moving parts, but I still can't
       | understand how discrepancies can manifest themselves.
        
         | bravetraveler wrote:
         | I don't develop enough to give a particularly good answer, but
         | one example I've heard of involves timestamps
         | 
         | Imagine the program uses the current date or time as a value.
         | When compiled at different moments, the bits change.
         | 
         | Same applies to anything where the build environment or timing
         | influences the output binary
        
         | kaba0 wrote:
         | Parallelism. There might be actions that are not order-
         | independent, and the state of the CPU might result in slightly
         | different binaries, but all are correct.
        
           | edgyquant wrote:
           | Why does this matter though? Why does order of compilation
           | result in a different binary?
        
             | speed_spread wrote:
             | Because order of completion of the parallel tasks is not
             | guaranteed, if all tasks write to the same file you might
             | get a different result each time.
        
             | kaba0 wrote:
             | Just some random, made up example: say you want to compile
             | an OOP PL that has interfaces and implementations of that.
             | You discover reachable implementations through static
             | analysis, which is multi-threaded. You might discover
             | implementations A,B,C in any order -- but they will get
             | their methods placed in the jump table based on this order.
             | This will trivially result in semantically equivalent, but
             | not binary-equivalent executables.
             | 
             | Of course there would have been better designs for this toy
             | example, but binary reproducibility is/was usually not of
             | the highest priority historically in most compiler
             | infrastructures, and in some cases it might be a relatively
             | big performance regression to fix, or simply just a too big
             | refactor.
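             | 
             | One such "better design" for the toy example, sketched in
             | Python: let the threads finish in whatever order they
             | like, but sort what they found before handing out jump
             | table slots. Everything here (discover, build_jump_table)
             | is made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def discover(name: str) -> str:
    """Stand-in for the real multi-threaded static analysis work."""
    return name


def build_jump_table(candidates: list[str]) -> dict[str, int]:
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(discover, c) for c in candidates]
        # as_completed yields results in completion order, which can vary.
        found = [f.result() for f in as_completed(futures)]
    # Sorting first makes slot numbers independent of thread timing.
    return {name: slot for slot, name in enumerate(sorted(found))}
```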
        
           | TacticalCoder wrote:
           | > There might be actions that are not order-independent, and
           | the state of the CPU might result in slightly different
           | binaries, but all are correct.
           | 
           | Well no: that's really the thing reproducible packages are
           | showing: there's only _one_ correct binary.
           | 
           | And it's the one that's 100% reproducible.
           | 
           | I'd even say that that's the whole point: there's only _one_
           | correct binary.
           | 
           | I'll die on the hill that if different binaries are "all
           | correct", then none are: for me they're all useless if
           | they're not reproducible.
           | 
           | And it looks like people working on entire _.iso_ being fully
           | bit-for-bit reproducible are willing to die on that hill too.
        
             | kaba0 wrote:
             | See my reply to the sibling post -- binary reproducibility
             | is not the end goal. It is an important property, and I do
             | agree that most compiler toolchains should strive for that,
             | but e.g. it might not be a priority for, say, a JIT
             | compiler.
        
         | Smaug123 wrote:
         | Loads of things. Obvious ones where the decision is explicitly
         | taken to be non-reproducible include timestamps and authorship
         | information. There are also other places where reproducibility
         | is implicitly broken by default: e.g. many runtimes don't
         | define the order of entries in a hashmap, and then the compiler
         | iterates over a hashmap to build the binary.
        
           | londons_explore wrote:
           | I can see why devs would want "This Software was built on
           | 10/10/2007 by bob7 from git hash aaffaaff" to appear on the
           | splash screen of software.
           | 
           | How do you get similar behaviour while having a reproducible
           | build?
           | 
           | Can you, for example, have the final binary contain a
           | reproducible part, and another section of the elf file for
           | deliberately non-reproducible info?
        
             | beisner wrote:
             | If you have a reproducible build, then the notion of
             | "software was _built on date_ by _user_" is kind of
             | useless information, no? Because it does not matter - if
             | you can verify that a specific git hash of a codebase
             | results in a particular binary through reproducible builds,
             | a malicious adversary could have built it yesterday and
             | given it to me and I can be almost surely confident
             | (barring hash-collisions...) it's identical to what a known
             | trusted team member would have built.
             | 
             | Having information about which git hash was used, as well as
             | the time it was published, is part of the source
             | distribution so an output can contain references to these
             | inputs and still be deterministic w.r.t. those inputs.
             | 
             | If you REALLY want to know when/who built something, you
             | could add in an auxiliary source file which contains that
             | information, which is required to build. Which is
             | essentially what compilers which leverage current time do
             | anyway, it's just implicit.
        
               | ndriscoll wrote:
               | Your source would also have to have a reference to which
               | exact version of which compiler to use, which versions of
               | which external headers to use, etc. and now you're
               | inventing Nix.
               | 
               | Conceivably there could be a standard for a sidecar file
               | to specify how something was built (e.g. nixpkgs commit
               | hash, or all of the parameters that went into the build).
               | Or content address the inputs, i.e. invent Nix again.
               | 
               | So we could solve this problem by having everyone
               | standardize on using Nix.
        
               | Smaug123 wrote:
               | Such standards do exist:
               | https://slsa.dev/spec/v1.0/provenance
        
               | londons_explore wrote:
               | The usecase is: user wants an easy way to know, from the
               | GUI of some running software, exactly what
               | build/version/git commit/branch/date they're running -
               | perhaps to file a bug report for example.
               | 
               | The actual _build date_ doesn't matter if the software
               | is reproducible - but it's a proxy for 'how out of date is
               | this software'.
        
               | tripdout wrote:
               | In that case, you can report the Git SHA and still be
               | reproducible.
        
               | tikhonj wrote:
               | If you actually had reproducible builds, the _build date_
               | would not tell you anything about how out of date the
               | software is--you would only need the date of the source
               | code the binary was built from. By definition, the binary
               | you'd get from building a version of the source today
               | would be identical to the version you'd get building it
               | the day that version of the source was finished.
        
             | Smaug123 wrote:
             | Yeah, "who built this" information belongs in a signing
             | certificate that accompanies the build artefact, not in the
             | artefact itself. The Git hash can certainly appear in the
             | binary (it's a reproducible part of the build input), and
             | the date can instead be e.g. the commit date, which is
             | probably more relevant to a user anyway.
        
               | bloak wrote:
               | Much as I like Git, I'm not sure I like the idea of the
               | artefacts depending on the git commit and therefore on
               | the entire git history. I rather feel the artefacts
               | should only depend on the actual source and not on a
               | particular version control system used for storing the
               | source.
        
         | dwheeler wrote:
         | There are many specific causes, time stamps probably being the
         | most common issue. You can see a list of common issues here:
         | 
         | https://reproducible-builds.org/docs/
         | 
         | The main overall issue is that developers don't test to ensure
         | they reproduce. Once it's part of the release tests it tends to
         | stay reproducible.
        
           | acaloiar wrote:
           | I agree, although I wouldn't describe the overall issue as
           | developers not testing to ensure reproducibility. The reason
           | most builds aren't reproducible is that build reproducibility
           | isn't a goal for most projects.
           | 
           | It would be great if 100% of builds were reproducible, but I
           | don't believe developers shouldn't be testing for
           | reproducibility unless it's a defined goal.
           | 
           | As generalized reproducible build tooling (guix, nix, etc.)
           | becomes more mainstream, I imagine we'll see more
           | reproducible builds as adoption grows and reproducibility is
           | no longer something developers have to "check for", but
           | simply rely upon from their tooling.
        
             | acaloiar wrote:
             | Typo: I don't believe developers shouldn't be -> I don't
             | believe developers should be
        
         | dataflow wrote:
         | Sometimes it's randomized algorithms, sometimes it's
         | performance (e.g. it might be faster not to sort something),
         | sometimes it's time or environment-dependent metadata,
         | sometimes it's thread interleaving, etc.
        
           | drdrey wrote:
           | a very common one is pointer values being different from run
           | to run and across different operating systems. Any code that
           | intentionally or accidentally relies on pointer values will
           | be non-deterministic
        
             | dataflow wrote:
             | Would be nice if you could explain how/why this happens,
             | given that normally, pointers aren't persisted.
        
               | traxys wrote:
               | I think they meant if you cast a pointer to an integer,
               | do some math on that and then store that. Then you will
               | have a stored result that will likely differ from run to
               | run.
        
               | edgyquant wrote:
               | That sounds like runtime differences not a difference
               | between two binaries
        
               | stcg wrote:
               | The difference in binaries must be caused by some runtime
               | difference of a compiler.
        
               | someplaceguy wrote:
               | Languages such as Standard ML and others (Scheme? Lisp?
               | Not sure...) have implementations that can save the
               | current state of the heap into a binary.
               | 
               | This is used in theorem provers, for example, so that you
               | don't have to verify proofs of theorems over and over
               | again (which can be very slow).
               | 
               | Instead, you verify them once, save the state of the heap
               | to disk (as a binary ELF, for instance) and then you can
               | run the binary to continue exactly where you left off
               | (i.e. with all the interesting theorems already in
               | memory, in a proved state).
               | 
               | This is what the HOL4 theorem prover's main `hol` script
               | does, i.e. it runs HOL4 by loading such a memory state
               | from disk, with the core theories and theorems already
               | loaded.
               | 
               | Presumably, to make this reproducible you'd need to make
               | sure that all the memory objects are saved to disk in a
               | deterministic order somehow (e.g. not in memory address
               | order, as it can change from run to run, especially when
               | using multiple threads).
               | 
               | Edit: Presumably you'd also need to make sure that you
               | persist the heap when all threads are idle and in a known
               | state (e.g. with all timers stopped), to avoid random
               | stack states and extraneous temporary allocations from
               | being persisted, which would also affect the resulting
               | binary.
        
               | dataflow wrote:
               | Thanks, yeah. So I guess the concrete example I would
               | cite here is that the most natural (and most efficient?)
               | way of persisting std::map<ptr, ....> would introduce
               | pointer ordering into the output.
        
               | someplaceguy wrote:
               | Just like the most natural (and most efficient?) way of
               | persisting any std::unordered_map<...> can result in a
               | completely randomly-ordered output, due to a DoS
               | mitigation that some commonly-used language runtimes
               | have.
        
             | edgyquant wrote:
             | That's runtime behavior
        
         | mseepgood wrote:
         | Laziness and carelessness of compiler developers.
        
           | jonhohle wrote:
           | As others have mentioned, there are sorting issues (are
           | directory entries created in the same order for a project
           | that compiled everything in a directory?), timestamps
           | (archive files and many other formats embed timestamps), and
           | things that you really want to be random (tmpdir on Linux [at
           | least in the past] would create directories of varying
           | length).
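           | 
           | A rough Python sketch of neutralising the first two, using
           | a tar archive as a stand-in (not a JAR): write entries in
           | sorted order and pin the metadata that would otherwise
           | vary between machines. deterministic_tar is a made-up name:

```python
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack name -> content pairs so equal inputs give equal bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed entry order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                # fixed timestamp
            info.uid = info.gid = 0       # no builder identity
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```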
           | 
           | I've successfully built tools to compare Java JARs that
           | required getting around two of those and other test tools
           | that required the third. I'm sure there are more.
        
         | cpuguy83 wrote:
         | Here is a very recent post from the Go team on things they had
         | to do to make the Go toolchain fully reproducible.
         | 
         | https://go.dev/blog/rebuild
        
         | fooker wrote:
         | A surprising amount of compiler and program behavior depends on
         | how pointer values compare.
         | 
         | These comparisons don't have to go the same way for everything
         | to be correct.
        
       | mgaunard wrote:
       | Don't you have to fake the system time to do this? The time often
       | ends up inside the binaries one way or another.
        
         | mbakke wrote:
         | Indeed, timestamps are probably the most common source of
         | nondeterminism. So common that a de-facto standard variable to
         | fake a timestamp has been implemented in many compilers:
         | 
         | https://reproducible-builds.org/docs/source-date-epoch/
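         | 
         | For a build script of your own, honouring the variable can
         | look roughly like this in Python. SOURCE_DATE_EPOCH holds
         | seconds since the Unix epoch; the function name and the
         | fallback to the current time are my own choices here:

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> datetime:
    """Return the pinned build time if one is set, else the current time."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    # Only fall back to the wall clock when nothing was pinned.
    seconds = int(raw) if raw is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```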
        
         | amelius wrote:
         | Could you name an example of how (and for what reason) this
         | might happen?
        
           | mbakke wrote:
           | Typically part of a "version string":
           | 
           |     $ python3
           |     Python 3.10.7 (main, Jan  1 1970, 00:00:01) [GCC 11.3.0] on linux
           |     Type "help", "copyright", "credits" or "license" for more information.
           |     >>>
           | 
           | Perhaps a relic from when software had to be manually
           | updated?
        
             | oever wrote:
             | On NixOS, I think the release time or commit time is used:
             | 
             |     $ python3
             |     Python 3.10.11 (main, Apr  4 2023, 22:10:32) [GCC 12.2.0] on linux
             |     Type "help", "copyright", "credits" or "license" for more information.
             |     >>>
             | 
             | That is more useful than the build time.
        
               | mbakke wrote:
               | How is that possible? Is nixpkgs an input to the Python
               | derivation? Or do packagers "hard code" a value every
               | time they modify the Python build code? Automated tooling
               | that sets it after pull requests? Something else? :-)
        
               | Smaug123 wrote:
               | GCC respects SOURCE_DATE_EPOCH, and Nixpkgs has specific
               | support for setting that environment variable:
               | https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033...
               | (although I haven't proved that this is actually how it
               | works for cpython's build).
               | 
               | Irrelevant spelunking details follow:
               | 
               | That string is output by cpython to contain the contents
               | of the __DATE__ C macro
               | (https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae...
               | which calls to
               | https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae...
               | which uses the __DATE__ macro at
               | https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae... ).
               | 
               | Cpython is defined in nixpkgs at
               | https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033...
               | which I imagine (but haven't proved) uses GCC.
        
               | mbakke wrote:
               | Thank you! Setting SOURCE_DATE_EPOCH to the most recent
               | file timestamp found in the source input is a clever
               | hack.
        
               | oever wrote:
               | 2023-04-04T22:10:32 is the timestamp of
               | Python-3.10.11/Misc/NEWS from
               | https://www.python.org/ftp/python/3.10.11/Python-3.10.11.tar...
        
               | raboof wrote:
               | The source for the cpython build is the release tarball
               | (https://github.com/NixOS/nixpkgs/blob/master/pkgs/developmen...).
               | 
               | In that case, NixOS sets SOURCE_DATE_EPOCH (which I
               | suspect will be picked up by the python build) to the
               | latest timestamp found in that archive
               | (https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-supp...)
        
           | mgaunard wrote:
           | GCC embeds timestamps in o/gcno/gcda files to check they
           | match.
           | 
           | It's mostly annoying as gcov will actively prevent you from
           | using gcda files from a different but equivalent binary than
           | what generated the gcno.
        
       | Uptrenda wrote:
       | Wouldn't this help solve the problem Ken Thompson wrote about in
       | 'Reflections on Trusting Trust'? If you can fully bootstrap a
       | system from source code then it's harder to have things like
       | back-doored compilers.
        
       | Crontab wrote:
       | I love that there are people out there who care about things
       | like this.
        
       ___________________________________________________________________
       (page generated 2023-10-29 23:00 UTC)