[HN Gopher] NixOS Reproducible Builds: minimal ISO successfully ...
___________________________________________________________________
NixOS Reproducible Builds: minimal ISO successfully independently
rebuilt
Author : CathalMullan
Score : 425 points
Date : 2023-10-29 11:41 UTC (11 hours ago)
(HTM) web link (discourse.nixos.org)
(TXT) w3m dump (discourse.nixos.org)
| mihalycsaba wrote:
| Sorry for being dense, but I thought one of the main reasons for
| nixos's existence is reproducibility. I thought they have these
| kinds of things solved already.
|
| I have only ~2 hours experience with Nixos, wanted to try
| hyprland, I thought it would be easier on Nixos since hyprland
| needs a bit of setup and maybe it's easier to use someone else's
| config on nixos, than on some other distro. Finding a config was
| hard too, found like 3 on some random github gists, thought there
| would be more... and none of them worked, at that point I gave
| up.
| rgoulter wrote:
| Yeah, Nix is a tough tool to learn. It's probably never the
| right tool to pick for "I just want something that works right
| now" if you're unfamiliar with it.
|
| > I thought one of the main reasons for nixos's existence is
| reproducibility
|
| NixOS uses "reproducible" to mean "with the same Nix code, you
| get the same program behaviour". This is more/less what people
| hope Dockerfiles provide.
|
| This is the level of reproducibility you want when you say "it
| works on my machine" or "it worked last time I tried it".
|
| Whereas "reproducible build" aims for bit-for-bit equality for
| artifacts built on different machines. -- With this, there's a
| layer of security in that you can verify that code has been
| built from a particular set of sources.
|
| > Finding a config was hard too
|
| What search query were you using? Searching "nixos
| configuration" on
| https://github.com/search?q=nixos%20configuration&type=repos...
|
| Or searching for hyprland specifically, there seem to be many
| using that
| https://github.com/search?q=wayland.windowManager.hyprland&t...
| amarshall wrote:
| > NixOS uses "reproducible" to mean "with the same Nix code,
| you get the same program behaviour".
|
| Note that "Nix code" also includes the hashes of all non-Nix
| sources. One way to think of it is that Nix has reliable
| build cache invalidation.
|
| > This is more/less what people hope Dockerfiles provide.
|
| Indeed, but importantly they do _not_ provide input-
| reproducibility (while Nix does) because, at least, there are
| no hashes for remote data.
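|
| A minimal sketch of what "hashes for remote data" buys you,
| assuming a hypothetical helper named verify_pinned and an
| in-memory stand-in for a downloaded tarball. It is not how Nix
| implements its fetchers, only the idea: the recipe records a
| hash, and input that no longer matches is rejected.

```python
import hashlib


def verify_pinned(data: bytes, expected_sha256: str) -> bytes:
    """Return data only if its SHA-256 matches the pinned hash."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# Stand-in for a downloaded tarball; a real fetcher would read it from the network.
tarball = b"example source tarball contents"
pinned = hashlib.sha256(tarball).hexdigest()  # recorded once, next to the build recipe

verify_pinned(tarball, pinned)  # unchanged upstream data passes

try:
    verify_pinned(tarball + b" (silently changed upstream)", pinned)
except ValueError:
    print("changed input rejected")
```

| A Dockerfile step that downloads a URL without a checksum has
| no equivalent of the second call failing.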
| mihalycsaba wrote:
| I don't remember. Some of them needed other tools installed
| (like flakes, whatever that is). I looked for configs that
| looked like they wouldn't need a few more hours of learning
| and setting up other tools to work.
|
| I just wanted to take a quick look at hyprland, I imagined I
| just use an existing config, I never thought it would need
| hours of research. Later I installed an arch vm and managed
| to install hyprland with some basic components in less than
| an hour from the first guide I found.
|
| Looks like I misunderstood what nix was made for. I just
| want a system I can more or less set up with a simple config
| file.
|
| I saw this os, didn't have time to try it yet, but I thought
| this is how nix works. https://blendos.co/
|
| For example you just define gnome like this, the nix configs
| I found looked similar, they just didn't work.
|
| > gnome:
| >   enabled: true
| >   style: light
| >   gtk-theme: 'adw-gtk3'
| >   icon-theme: 'Adwaita'
| >   titlebar:
| >     button-placement: 'right'
| >     double-click-action: 'toggle-maximize'
| >     middle-click-action: 'minimize'
| >     right-click-action: 'menu'
| ParetoOptimal wrote:
| > I just wanted to take a quick look at hyprland, I
| imagined I just use an existing config, I never thought it
| would need hours of research.
|
| It shouldn't.
|
| You'd want a simple flake to start with that has home-
| manager (for higher chance of finding declarative best
| practice configs and modules) and to add small things to
| that.
|
| I imagine you tried grabbing someone's complex config,
| modifying it, and ran into issues?
| t0astbread wrote:
| Flakes will hopefully be that soon but I wouldn't
| recommend starting with flakes when learning Nix in 2023.
| They're experimental and you still need to learn most of
| flake-less Nix (except channels and NIX_PATH) anyways.
|
| When I started learning/using NixOS about two years ago I
| found it useful to start out with just Nixpkgs (i.e. what
| you get out of the box) and only add libraries when I
| felt they would help me. My first configs were ugly as
| hell and full of bad practice but the cool thing about
| Nix is that it gives you a lot of safety nets to enable
| experimentation and refactoring.
| rgoulter wrote:
| > Flakes will hopefully be that soon but I wouldn't
| recommend starting with flakes when learning Nix in 2023.
|
| That Flakes provide a consistent entrypoint (and a
| consistent schema for such) into a codebase would have
| deferred a significant amount of confusion I had when
| getting started with Nix.
|
| > They're experimental
|
| The functionality as-is hasn't been changed. The
| 'experimental' flag itself hasn't been a _practical_
| problem.
|
| However, flakes still have some rough edges & design
| problems to them, and there's some disagreement in the
| community over how flakes were rolled out.
|
| I'd say for an end user, the benefits far outweigh the
| costs.
|
| > ... and you still need to learn most of flake-less Nix
| (except channels and NIX_PATH) anyways.
|
| I think the phrase "flake-less Nix" paints the wrong
| idea. I'd instead put it: Most of what you need to learn
| about Nix is unrelated whether the Nix evaluation started
| from a Flake or not.
| ParetoOptimal wrote:
| > Flakes will hopefully be that soon but I wouldn't
| recommend starting with flakes when learning Nix in 2023.
| They're experimental and you still need to learn most of
| flake-less Nix (except channels and NIX_PATH) anyways.
|
| I've used Nix for a decade and wouldn't recommend the
| confusing and horrible user experience of Nix without
| flakes.
|
| Additionally, if you are using github for code examples,
| you'll have far more success using flakes.
|
| Many experienced people a new user would get help from,
| including myself, have long since washed their hands of
| pre-flakes issues and arcana like channels issues.
| lifeisstillgood wrote:
| I am on a similar journey
|
| I built https://github.com/mikadosoftware/workstation (hey
| nearly 500 stars!) as the idea of defining a reproducible
| laptop build.
|
| I don't think docker is the right level - so my next
| project when i have free time (!) is to do a box build that
| then might compile to docker
|
| I think there is a sensible point of being able to define
| via nix both developer workstations and servers
| k8svet wrote:
| Except it's Docker, and like virtually all Dockerfiles,
| it immediately runs "apt-get update", tossing
| reproducibility out the window.
| quietbritishjim wrote:
| There are two senses of reproducible.
|
| The sense you're thinking of is that you can easily rebuild a
| binary package and it will use the same dependency versions,
| build options, etc. There should be no chance of a compiler
| error that didn't happen the first time (the old "but it worked
| on my laptop" syndrome).
|
| The sense used here is that every build output is byte-for-byte
| _binary identical_. It doesn't depend on the machine name, the
| time it was compiled or anything like that (or, in a parallel
| build, the order in which files finish compiling). That is much
| harder.
| jowea wrote:
| > The sense you're thinking of is that you can easily rebuild
| a binary package and it will use the same dependency
| versions, build options, etc. There should be no chance of a
| compiler error that didn't happen the first time (the old
| "but it worked on my laptop" syndrome).
|
| And that's just for Nixpkgs, the packages themselves that
| also work outside NixOS. NixOS has reproducibility of the
| entire system complete with configuration.
| ParetoOptimal wrote:
| > Finding a config was hard too, found like 3 on some random
| github gists, thought there would be more..
|
| That sounds odd, did you use github code search?
|
| Find relevant home manager options:
|
| https://mipmip.github.io/home-manager-option-search/?query=h...
|
| Then search those on github:
|
| https://github.com/search?utf8=%E2%9C%93&q=lang%3Anix+hyprla...
|
| Note some option searches imply more casual or advanced users.
| chpatrick wrote:
| > Sorry for being dense, but I thought one of the main reasons
| for nixos's existence is reproducibility. I thought they have
| these kinds of things solved already.
|
| Nixos has the advantage that everything is built in its own
| sandbox with only its explicitly declared (and hashed)
| dependencies available, unlike in mainstream distros where it's
| the full system environment, so in many cases you already get
| the same binary every time. But this doesn't immediately lead
| to reproducibility because the build process might be
| nondeterministic for various packages.
| benreesman wrote:
| This is a really good comment, I have no idea why it's going
| grey.
|
| Upvote from me FWIW.
| WhyNotHugo wrote:
| > unlike in mainstream distros where it's the full system
| environment
|
| Usually packages are built in an environment which has only a
| minimal base system plus the package's explicit
| dependencies. They don't have random unnecessary packages
| installed.
| goodpoint wrote:
| > unlike in mainstream distros
|
| Debian has been building in a clean sandbox with only
| required, tracked dependencies for decades.
|
| It's also building the large majority of packages
| reproducibly including the binary and whole installation
| packages (not just the sources like nixos)
| chpatrick wrote:
| > not just the sources like nixos
|
| Not sure what you mean by that, the Nix packages that are
| reproducible have reproducible binaries.
|
| In the Nixos world there isn't really a concept of a
| "binary/installation package" like in Debian or elsewhere.
| Everything can be rebuilt from source on any machine, but
| because everything is hashed, if the official binary caches
| have already built something with the same inputs, they can
| just give you the outputs directly. So it's more like
| memoization than a .deb or something that you install.
|
| Nix is a functional language that builds recipes
| (derivations) to build stuff, with all the inputs and
| outputs hashed. If the derivation you want to build has
| already been built by a cache you trust, the system will
| just fetch it instead of building locally.
|
| What the Nix reproducibility project checks is that the
| same derivation produces the same output regardless of what
| machine it's built on.
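|
| The "memoization" view can be sketched roughly as follows. The
| names derivation_hash and BinaryCache are hypothetical and the
| hashing scheme is a simplification, not Nix's real store path
| computation. The point is that the cache key is derived from
| the declared inputs, including the hashes of dependencies.

```python
import hashlib
import json


def derivation_hash(name, sources, build_script, dep_hashes):
    """Hash every declared input of a build recipe, including dependency hashes."""
    payload = json.dumps(
        {"name": name, "sources": sorted(sources), "script": build_script,
         "deps": sorted(dep_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class BinaryCache:
    """Toy cache: maps an input hash to a previously built output."""

    def __init__(self):
        self.outputs = {}
        self.builds_run = 0

    def realise(self, drv_hash, build):
        if drv_hash not in self.outputs:   # cache miss: build locally
            self.outputs[drv_hash] = build()
            self.builds_run += 1
        return self.outputs[drv_hash]      # cache hit: reuse the output


cache = BinaryCache()
libc = derivation_hash("libc", ["sha256-of-libc-src"], "make", [])
hello = derivation_hash("hello", ["sha256-of-hello-src"], "make", [libc])

cache.realise(hello, lambda: b"hello-binary")
cache.realise(hello, lambda: b"hello-binary")  # second call is served from the cache

# Changing a dependency's source changes the hash of everything built on it.
libc_patched = derivation_hash("libc", ["sha256-of-patched-libc-src"], "make", [])
hello_rebuilt = derivation_hash("hello", ["sha256-of-hello-src"], "make", [libc_patched])
```

| Only one build runs for the two identical requests, and
| hello_rebuilt differs from hello even though hello's own source
| did not change.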
| Aerbil313 wrote:
| Check out https://github.com/donovanglover/nix-config . Flake
| based config with hyprland and cool stuff.
|
| > at that point I gave up.
|
| NixOS is not for the weak or time constrained, currently.
| Hopefully it will be one day. Still if you push through, you
| reap the benefits.
| flkiwi wrote:
| Another good option: https://github.com/Misterio77/nix-
| starter-configs
|
| I started with this one, the minimal version, then moved on
| to something more like the standard version, and now I'm
| moving on to something based on his much more complicated and
| flexible build in a different repo. I had been flailing, then
| this repo made it click.
| colordrops wrote:
| Nix is reproducible in the environment sense, meaning you can
| get the exact same setup every time, but not necessarily in the
| bit-for-bit sense, where the compiled binaries are identical.
| Reventlov wrote:
| For those wondering : it should be remembered that the
| reproducibility of Nix / NixOS / Nixpkgs is only a
| reproducibility of the sources: if the sources change, one is
| warned, but it is not a question of the reproducibility of the
| binaries (which can change at each build). This binary
| reproducibility of Nix / NixOS / Nixpkgs is indeed not really
| tested, at least not systematically.
|
| Guix, Archlinux, Debian do the binary reproducibility better than
| Nix / NixOS / Nixpkgs.
|
| Sources :
|
| - https://r13y.com/ ( Nix* )
|
| - https://tests.reproducible-builds.org/debian/reproducible.ht...
| ( Debian )
|
| - https://tests.reproducible-builds.org/archlinux/archlinux.ht...
| ( Archlinux )
|
| - https://data.guix.gnu.org/repository/1/branch/master/latest-...
| (Guix, might be a bit slow to load, here is some cached copy
| https://archive.is/lTuPk )
| amarshall wrote:
| r13y.com is outdated vs. https://reproducible.nixos.org/
| dicytea wrote:
| > Guix, Archlinux, Debian do the binary reproducibility better
| than Nix / NixOS / Nixpkgs.
|
| Huh, didn't know that Arch Linux tests reproducibility. It's
| apparently 85.6% reproducible:
| https://reproducible.archlinux.org
|
| I wonder how much work would be needed for NixOS, considering
| it has more than _80k_ packages in the _official_ repository.
| chpatrick wrote:
| I think that's also a bit of an unfair comparison given the
| number of AUR packages you usually use on Arch. With nixpkgs
| there isn't a distinction between official and community
| packages.
| iopq wrote:
| Sure there is, the NUR has a few thousand community
| packages that are not ready for release
|
| The nixpkgs are all official packages, it's just really
| easy to become a maintainer (you make a pull request adding
| the package you want to maintain)
| chpatrick wrote:
| > but it is not a question of the reproducibility of the
| binaries (which can change at each build). This binary
| reproducibility of Nix / NixOS / Nixpkgs is indeed not really
| tested, at least not systematically.
|
| Isn't that exactly what your first source and OP are about?
| They check that the binaries are the same when built from the
| same sources on different machines. The point is exactly that
| the binaries don't change with every build.
|
| > How are these tested?
|
| > Each build is run twice, at different times, on different
| hardware running different kernels.
| Reventlov wrote:
| Yeah, that represents maybe 1% of the packages in nixpkgs
| (only the installation iso).
| chpatrick wrote:
| Sure but the goal is the same, binary reproducibility, and
| it is systematic. It's just less far along than Debian.
|
| Also I'm pretty sure a big percent of nixpkgs is already
| reproducible, we just don't know for sure.
|
| They say the next step might be the GNOME-based ISO, which
| would be a big achievement because it's basically a full-
| featured system.
| clhodapp wrote:
| That is not true at all, with respect to the aims or the
| reality of nixpkgs. The original post here is talking about
| reproducing the (binary) minimal iso, which contains a bunch of
| binary packages.
| Reventlov wrote:
| It is true. The original post writes about reproducing the
| minimal iso, which contains probably around 1% of the
| packages in nixpkgs. The remaining packages are not tested
| regarding binary reproducibility, or, at least, not in a
| systematic manner, which means regressions may happen
| regularly (which is exactly what happened with the .iso, see
| the previous announcement from 2021:
| https://discourse.nixos.org/t/nixos-unstable-s-iso-
| minimal-x... .)
| mauricioc wrote:
| To emphasize chpatrick's point below, there are two definitions
| of "reproducibility" in this context:
|
| * Input reproducibility, meaning "perfect cache invalidation
| for inputs". Nix and Guix do this perfectly by design (which
| sometimes leads to too many rebuilds). This is not on the radar
| for Debian and Arch Linux, which handle the rebuild problem
| ("which packages should I rebuild if a particular source file
| is updated?") on an ad-hoc basis by triggering manual rebuilds.
|
| * Output reproducibility, meaning "the build process is
| deterministic and will always produce the same binary". This is
| the topic of the OP. Nix builds packages in a sandbox, which
| helps but is not a silver bullet. Nix is in the same boat as
| Debian and Arch Linux here; indeed, distros frequently upstream
| patches to increase reproducibility and benefit all the other
| distros. In this context, https://reproducible.nixos.org is the
| analogue of the other links you posted, and I agree Nix reports
| aren't as detailed (which does not mean binary reproducibility
| is worse on Nix).
|
| Your comment can be misinterpreted as saying "Nix does not do
| binary reproducibility very well, just input reproducibility",
| which is false. That's the whole point of the milestone being
| celebrated here!
| Foxboron wrote:
| > Your comment can be misinterpreted as saying "Nix does not
| do binary reproducibility very well, just input
| reproducibility", which is false.
|
| It's only "false" as nobody has actually tried to rebuild the
| entire package repository of nixpkgs, which to my knowledge
| is an open problem nobody has really worked on.
|
| The current result is "only" ~800 packages and the set has
| regular regressions.
| prateem_ wrote:
| I am probably misunderstanding your point BUT I have
| actually depended on Nix for "reproducible docker images"
| for confidential compute usecase so that all parties can
| independently verify the workload image hash. Rarely
| (actually only once) it failed to produce bit-identical
| images; every other time it successfully produced
| bit-identical images on very different machine setups. Granted
| this is not ISO but docker images, but I would say Nix does
| produce reproducible builds for many real world complex
| uses.
|
| Ref: [1] https://gitlab.com/prateem/turning-polyglot-
| solutions-into-t... [2]
| https://discourse.nixos.org/t/docker-image-produced-by-
| docke...
| Foxboron wrote:
| I'm very sure you are actually just rebuilding the
| container images themselves, not the package tree you are
| depending on. Building reproducible ISOs, or container
| images, with a package repository as a base isn't
| particularly hard these days.
| prateem_ wrote:
| I see what you mean. Thanks for clarifying. Even so, Nix
| is no worse placed than those other distributions for bit
| reproducibility. Correct?
| Foxboron wrote:
| It's unclear at the moment because of the limited testing
| (minimal ISO and a Gnome ISO) vs Arch/Debian/Guix
| rebuilding entire package repositories.
| dathinab wrote:
| I think you might want to read the article.
|
| it's about bit-by-bit reproducibility of not just the binaries
| but also how they get packed into an iso (i.e. r13y.com is
| outdated; the missing <1% was also, as far as I remember, an
| _upstream_ python regression, as reproducibility of the
| binaries (ignoring the packaging into an iso) was already there
| a few years ago)
|
| now when it comes to packages beyond the core iso things become
| complicated to compare due to the subtle but in this regard
| significant different ways they handle packages, e.g. a bunch
| of packages you would find on arch in aur you find as normal
| packages in nix and most of the -bin upstream packages are
| simply not needed with nix
|
| in general nix makes it easier to create reproducible builds
| but (independent of nix) this doesn't mean that it's always
| possible, and it often needs patching, which often but not
| always is done. if you combine this with the default package
| repository of nix being much larger (>80k) than e.g. arch
| (<15k non aur), comparing percentages there isn't very useful.
|
| though one very common misconception is that the hash in the
| nix store path is based on the build output, but it's instead
| based on all sources (whether binary or not) used for building
| the binary in an isolated environment
|
| this means it has not quite the security benefit some people
| might think it has, but in turn is necessary as it means nix
| can use software which is not reproducibly buildable in a way
| which still produces reasonably reproducible deployments (as in
| not necessarily all bits the same but all functionality,
| compiler-cfgs, dependency versions, users, configurations
| etc. being the same)
| watersucks wrote:
| Doesn't the content-addressed derivation experimental feature
| address this issue? Instead of store hashes being input-
| addressed as you mention, the derivation outputs are used to
| calculate the store hash, which ensures binary reproducibility.
| Smaug123 wrote:
| Ish. This is covered in section 6.4.1 of Eelco's thesis
| (https://edolstra.github.io/pubs/phd-thesis.pdf). It all
| becomes much simpler if evaluating a build many times can
| only ever result in one output, but the Nix content-addressed
| model does permit multiple outputs. In such cases, the system
| just has to choose a canonical output and use that one,
| rewriting hashes as necessary to canonicalise inputs which
| are non-canonical.
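|
| A rough sketch of the difference between the two addressing
| schemes discussed here. input_addressed and content_addressed
| are hypothetical helpers and the hash layout is an assumption,
| not Nix's actual store path format.

```python
import hashlib


def input_addressed(recipe: str) -> str:
    """Store hash derived from the build recipe (the inputs) only."""
    return hashlib.sha256(recipe.encode()).hexdigest()[:16]


def content_addressed(output: bytes) -> str:
    """Store hash derived from the bytes the build produced."""
    return hashlib.sha256(output).hexdigest()[:16]


recipe = "name=hello; src=sha256-of-hello-src; builder=make"

# Two builds of the same recipe that differ in one embedded value,
# e.g. because the build is not deterministic.
build_a = b"ELF...hello...build-id-1"
build_b = b"ELF...hello...build-id-2"

same_input_hash = input_addressed(recipe) == input_addressed(recipe)
same_content_hash = content_addressed(build_a) == content_addressed(build_b)
```

| Here same_input_hash is true while same_content_hash is false:
| the input-addressed path cannot tell the two builds apart,
| whereas content addressing exposes the difference and then has
| to pick one output as canonical.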
| onedognight wrote:
| Rebuilding the minimal ISO from source is an impressive milestone
| on the journey to a system that builds from source reproducibly.
| Guix had an orthogonal but equally impressive milestone on the
| same journey recently[0], bootstrapping a full compiler toolchain
| from a single reproducible 357 byte binary without any other
| binary compiler blobs. These two features may one day soon be
| combined to reproducibly build a full distribution from source.
|
| [0] https://guix.gnu.org/en/blog/2023/the-full-source-
| bootstrap-...
| Thrir94994i wrote:
| 357 bytes for bootstrap compiler binary is VERY impressive!
| msm_ wrote:
| If I remember correctly, this tiny binary is used to
| (reproducibly) bootstrap the next binary, which bootstraps
| the next binary, until eventually GCC can be compiled (and
| compile other software).
| Smaug123 wrote:
| The bootstrap chain is https://github.com/oriansj/stage0-po
| six-x86/blob/e86bf7d304b...
| rssoconnor wrote:
| To be fair, it is 357 bytes ... plus a POSIX operating
| system.
|
| Still, that POSIX operating system bit is also being worked
| on.
| 6581 wrote:
| Isn't that what builder-hex0 does?
|
| https://github.com/ironmeld/builder-hex0
| 15155 wrote:
| At 357 bytes, do you need a reproducible binary at all?
|
| I'd think one could hand-document all 357 bytes of machine code
| and have them be intelligible.
| ahoka wrote:
| That's just the first stage. Simple enough to be audited
| manually.
| jowea wrote:
| This[0] is basically the hand-documentation of those bytes
| then. Handwritten ELF header and assembly code.
|
| [0] https://github.com/oriansj/bootstrap-
| seeds/blob/master/POSIX...
| ralferoo wrote:
| Just had a read of this to see what it did... And I must
| admit, I don't understand what purpose this is supposed to
| serve.
|
| All it seems to do is convert hex into binary and dump it
| to a file. Not sure how that's any more useful than just
| copying the binary for next stage directly, after all this
| binary had to get on the system somehow.
| reactordev wrote:
| It's the first stage. Likely piped. Hence the hex out.
| The context on how it's called is key: https://github.com
| /oriansj/stage0-posix-x86/blob/e86bf7d304b...
| autumn-antlers wrote:
| Section 1.6.1 of the GNU Mes manual places these
| early-stage assemblers into context:
|
| https://www.gnu.org/software/mes/manual/mes.html#Stage0
| rssoconnor wrote:
| The program does also dispose of comment lines.
|
| One could argue that this is just a kind of trick so they
| can say the next "binary" is actually a "source" file
| because it happens to be written by a human in ASCII.
|
| Still the phase distinction between what is a source and
| what is a binary becomes blurry at this low level. I
| believe the next stage, still written as ASCII-represented
| machine code with comments, allows for the existence of
| labels and then computes offsets for jumps to labels. And
| then more and more features are added until you have a
| minimal assembler letting you write somewhat
| machine-independent code, and you continue to work your way
| up the toolchain.
|
| So at which point does the translation from "source" to
| "binary" become a real thing and not just a trick of
| semantics? Is it when we have a machine independent
| assembly code? Is it when we computed offsets for
| labelled jumps? Is it when we started stripping comments
| out of the source code?
| ralferoo wrote:
| Yeah, I kind of agree, but my issue is kind of with this
| statement (in the link from the peer post):
|
| > What if we could bootstrap our entire system from only
| this one hex0 assembler binary seed? We would only ever
| need to inspect these 500 bytes of computer codes. Every
| later program is written in a more friendly programming
| language: Assembly, C, ... Scheme.
|
| And my issue is that this isn't true. hex1 isn't written
| in assembler any more than hex0 is. Both of those
| bootstrap files can get onto the system simply by
| ignoring whitespace and anything after #, converting the
| hex into binary and writing it to a file.
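|
| The transformation described above is small enough to sketch
| in a few lines. This is an illustration of the idea only, with
| a hypothetical function name, and it assumes '#' is the only
| comment marker, as stated in this comment. It is not the real
| hex0 seed.

```python
def hex_to_binary(source: str) -> bytes:
    """Turn commented hex text into bytes: drop '#' comments and whitespace."""
    digits = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]          # discard everything after '#'
        digits.append("".join(code.split()))  # discard all whitespace
    return bytes.fromhex("".join(digits))


example = """
7F 45 4C 46   # ELF magic number
01 01 01      # 32-bit, little-endian, ELF version 1
"""

binary = hex_to_binary(example)
```

| The resulting bytes could then be written to a file with
| open(path, "wb"), which is the whole job of that first stage.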
|
| Having hex0 doesn't add anything to the mix, other than
| being shorter than hex1, because you still have the same
| initial bootstrap problem of how you can prove that the
| hex0 binary represents the hex in its source vs the hex1
| binary and its source and both have the same problem of
| needing to prove the hex in the source matches the
| assembly (and that the program even does what the
| comments claim).
|
| hex1 is a more useful bootstrap point, because you can
| use standard system tools to create the binary from the
| source (e.g. sed) and also compile itself and verify that
| the files are the same.
|
| Having hex0 and hex1 just means you need to manually
| verify both rather than just hex1.
|
| I guess my point is that if you have insufficient trust
| in your system that you can't e.g. trust "sed" to create
| the original binary files, or trust the output of "dd -x"
| or "md5sum" to verify the binary files, you also can't
| trust it enough to verify that the hex in those source
| files is correct or that the binary files match.
| lmm wrote:
| > Having hex0 doesn't add anything to the mix, other than
| being shorter than hex1, because you still have the same
| initial bootstrap problem of how you can prove that the
| hex0 binary represents the hex in its source vs the hex1
| binary and its source
|
| Well presumably you toggle hex0 in on the front panel and
| then type hex1 with the keyboard, which is easier than
| toggling in the binary of hex1.
| forkerenok wrote:
| Or tattooed on oneself! Or etched on a dog tag!
| spicybright wrote:
| Nix is a great dog name
| cpuguy83 wrote:
| I get cat vibes from "Nix".
| dataflow wrote:
| How long does a fully bootstrapped build take?
| pharmakom wrote:
| With caching, just the time to download the artefact.
| dataflow wrote:
| Doesn't caching completely defeat the point of
| bootstrapping? How do you know the cached artifact is
| correct? You have to build it manually to verify that, at
| which point you're still building manually...
| __MatrixMan__ wrote:
| 1. hash it
|
| 2. rebuild it without the cache
|
| 3. hash that
|
| 4. compare
|
| Or, trust somebody who has. Inconvenient, but is there
| any other way to establish trust in the correspondence
| between code and a binary?
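|
| The four steps can be sketched as below. The file names are
| stand-ins written to a temporary directory, and sha256_of and
| is_reproduced are hypothetical helpers, not part of any Nix or
| Guix tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(cached: Path, rebuilt: Path) -> bool:
    """Steps 1-4: hash the cached artifact, hash the rebuild, compare."""
    return sha256_of(cached) == sha256_of(rebuilt)


# Stand-ins for a cached artifact and two local rebuilds.
workdir = Path(tempfile.mkdtemp())
(workdir / "cached.iso").write_bytes(b"identical bytes")
(workdir / "rebuilt.iso").write_bytes(b"identical bytes")
(workdir / "tampered.iso").write_bytes(b"identical bytes plus a backdoor")

print(is_reproduced(workdir / "cached.iso", workdir / "rebuilt.iso"))
print(is_reproduced(workdir / "cached.iso", workdir / "tampered.iso"))
```

| The comparison is only meaningful when the build is
| deterministic, which is why bit-for-bit reproducibility is the
| prerequisite for this kind of check.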
| mbakke wrote:
| Guix has tooling to verify binaries:
|
| https://guix.gnu.org/en/manual/en/html_node/Invoking-
| guix-ch...
|
| "guix build --no-grafts --no-substitutes --check foo"
| will force a local rebuild of package foo and fail if the
| result is not bit-identical. "guix challenge" will
| compare your local binaries against multiple cache
| servers.
|
| I build everything locally and compare my results with
| the official substitute servers from time to time.
| pharmakom wrote:
| You have a hash that n trusted parties agree on. This is
| enabled by reproducible builds.
| mbakke wrote:
| It obviously depends on the hardware, but IIRC for me maybe
| 3-4 hours building from the 357 byte seed to the latest GCC.
|
| The early binaries are not very optimized :-)
| tracnar wrote:
| It's not yet as far as the Guix stage0, but there was an
| interesting talk about bootstrapping nix from TinyCC at NixCon:
| https://media.ccc.de/v/nixcon-2023-34402-bootstrapping-nix-a...
| TacticalCoder wrote:
| That is amazing and it is great to see there are people out
| there fighting the good fight (while others ask: _" but where's
| the benefit!? if there's a backdoor, everybody is still going
| to get the backdoor!"_).
|
| > it gives us a reliable way to verify the binaries we ship are
| faithful to their sources
|
| That's the thing many don't understand: it's not about proving
| that the result is 100% trustable. It's about proving it's 100%
| faithful to the source. Which means that _should_ monkey
| business be detected (like a sneaky backdoor), it can be
| recreated deterministically 100% of the time.
|
| In other words for the bad guys: nowhere to run, nowhere to
| hide.
| somat wrote:
| I find it funny (ironic) that the OpenBSD project is trying hard
| to go the other way, every single install has unique and
| randomized address offsets.
|
| While I understand that these two goals, reproducible builds and
| unique installs, are orthogonal to each other, both can be had at
| the same time, the duality of the situation still makes me laugh.
| oever wrote:
| If the address offsets can be randomized with a provided seed,
| then demonstrating reproducibility is still possible.
|
| Alternatively, randomizing the offsets when starting the
| program is another way to keep reproducibility and even
| increase security; the offsets would change at every run.
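|
| A toy illustration of the seeded variant, assuming the
| randomisation amounts to shuffling a list of object files.
| The function link_order and the file names are made up for
| the example and do not describe OpenBSD's actual mechanism.

```python
import random


def link_order(objects, seed):
    """Shuffle object files into a layout determined entirely by the seed."""
    order = list(objects)
    random.Random(seed).shuffle(order)
    return order


objects = ["boot.o", "sched.o", "vm.o", "net.o", "fs.o"]

first = link_order(objects, seed=1234)
again = link_order(objects, seed=1234)  # same seed: same layout, so it can be re-derived
```

| Anyone who knows the seed can regenerate the same layout and
| compare, while an install that draws a fresh seed still gets
| its own layout.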
| WhyNotHugo wrote:
| OpenBSD does randomised linking at boot time. Packages
| themselves can still be reproducible. All the randomisation is
| done locally after the packages are downloaded and their
| checksums validated.
| KennyFromIT wrote:
| I've lived in the Red Hat ecosystem for work recently. How does
| this compare to something like... Fedora Silverblue? Ansible?
| Fedora Silverblue + Ansible?
| TheDong wrote:
| The closest equivalent to the nixos ISO builder and
| reproducibility related to it in the fedora ecosystem is
| osbuild / imagebuilder -
| https://www.osbuild.org/guides/introduction.html
|
| Imagebuilder claims reproducibility, but as far as I know it
| mostly installs rpm packages as binaries, not from source, so
| it's not really proper reproducibility unless all the input
| packages are also reproducible.
|
| If the descriptions of building packages from source, building
| distro images, and reproducibility in the linked thread didn't
| make sense to you, you're probably not really the target
| audience anyway.
| candiddevmike wrote:
| Nix is a declarative OS, where you describe what the OS should
| look like, instead of Ansible where you give the OS steps to
| follow. Silverblue and Nix are orthogonal aside from being
| Linux distributions--Silverblue is attempting to change how
| software is delivered using only containers on an immutable
| host.
|
| If you're interested in an Ansible alternative that uses
| Jsonnet and state tracking to somewhat mimic Nix, check out
| Etcha: https://etcha.dev
| rgoulter wrote:
| > Nix is a declarative OS
|
| I think precision is important.
|
| "Nix" refers to the package manager (and the language the
| package manager uses).
|
| Whereas it's "NixOS" that's the OS which makes use of Nix to
| manage the system configuration.
| Ericson2314 wrote:
| Thank you. This is important. Too bad our website doesn't
| make it clear at all.
| mbakke wrote:
| Very impressive milestone, congrats to those who made this
| possible!
|
| > [...] actually rebuilding the ISO still introduced differences.
| This was due to some remaining problems in the hydra cache and
| the way the ISO was created.
|
| Can anyone shed some light on the fix for "how the ISO was
| created"? I attempted making a reproducible ISO a while back but
| could not make the file system create extents in a deterministic
| fashion.
| raboof wrote:
| For NixOS, it's in the 'how did we reproduce' section of the
| article: the last step of that process produces the iso in the
| ./result/iso directory.
|
| It sounds like what you're looking for is the commands that
| that build invoked, but I'm not sure what step you're looking
| for. For example, the xorriso invocations are at
| https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/make-...
| ahmedfromtunis wrote:
| Stupid question as I never worked on something like this before:
| why isn't reproducibility the default behavior?
|
| I mean if 2 copies of a piece of software were compiled from the
| same source, what stops them from being identical each and every
| time?
|
| I know there are so many moving parts, but I still can't
| understand how discrepancies can manifest themselves.
| bravetraveler wrote:
| I don't develop enough to give a particularly good answer, but
| one example I've heard of involves timestamps.
|
| Imagine the program uses the current date or time as a value.
| When compiled at different moments, the bits change.
|
| Same applies to anything where the build environment or timing
| influences the output binary
| kaba0 wrote:
| Parallelism. There might be actions that are not order-
| independent, and the state of the CPU might result in slightly
| different binaries, but all are correct.
| edgyquant wrote:
| Why does this matter though? Why does order of compilation
| result in a different binary?
| speed_spread wrote:
| Because order of completion of the parallel tasks is not
| guaranteed, if all tasks write to the same file you might
| get a different result each time.
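|
| A minimal sketch of that effect in Python. The function names and
| the fake "compile" step are made up for illustration; this is not
| how any real build tool works. Joining results in completion
| order depends on thread scheduling, while joining them in a
| fixed, sorted order does not.
|

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def compile_unit(name: str) -> str:
    """Stand-in for a compile step that takes an unpredictable amount of time."""
    time.sleep(random.uniform(0, 0.01))
    return f"object code for {name}\n"


def link_in_completion_order(units: list[str]) -> str:
    """Concatenate outputs as tasks finish: the order may differ between runs."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(compile_unit, u) for u in units]
        return "".join(f.result() for f in as_completed(futures))


def link_in_stable_order(units: list[str]) -> str:
    """Concatenate outputs sorted by unit name: same bytes on every run."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {u: pool.submit(compile_unit, u) for u in units}
        return "".join(futures[u].result() for u in sorted(futures))
```

| Both variants do the same work in parallel; only the second one
| decouples the output order from the scheduler.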
| kaba0 wrote:
| Just some random, made up example: say you want to compile
| an OOP PL that has interfaces and implementations of that.
| You discover reachable implementations through static
| analysis, which is multi-threaded. You might discover
| implementations A,B,C in any order -- but they will get
| their methods placed in the jump table based on this order.
| This will trivially result in semantically equivalent, but
| not binary-equivalent executables.
|
| Of course there would have been better designs for this toy
| example, but binary reproducibility is/was usually not of
| the highest priority historically in most compiler
| infrastructures, and in some cases it might be a relatively
| big performance regression to fix, or simply just a too big
| refactor.
| TacticalCoder wrote:
| > There might be actions that are not order-independent, and
| the state of the CPU might result in slightly different
| binaries, but all are correct.
|
| Well no: that's really the thing reproducible packages are
| showing: there's only _one_ correct binary.
|
| And it's the one that's 100% reproducible.
|
| I'd even say that that's the whole point: there's only _one_
| correct binary.
|
| I'll die on the hill that if different binaries are "all
| correct", then none are: for me they're all useless if
| they're not reproducible.
|
| And it looks like people working on entire _.iso_ being fully
| bit-for-bit reproducible are willing to die on that hill too.
| kaba0 wrote:
| See my reply to the sibling post -- binary reproducibility
| is not the end goal. It is an important property, and I do
| agree that most compiler toolchains should strive for that,
| but e.g. it might not be a priority for, say, a JIT
| compiler.
| Smaug123 wrote:
| Loads of things. Obvious ones where the decision is explicitly
| taken to be non-reproducible include timestamps and authorship
| information. There are also other places where reproducibility
| is implicitly broken by default: e.g. many runtimes don't
| define the order of entries in a hashmap, and then the compiler
| iterates over a hashmap to build the binary.
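|
| As a rough Python analogue (hypothetical function names, not
| taken from any real compiler): iterating a set of strings follows
| hash order, which CPython randomizes per process unless
| PYTHONHASHSEED is fixed, so a reproducible emitter sorts first.
|

```python
import hashlib


def emit_symbol_table(symbols: set[str]) -> bytes:
    """Serialize symbols in sorted order instead of the set's hash order."""
    return "\n".join(sorted(symbols)).encode() + b"\n"


def digest(symbols: set[str]) -> str:
    """Hash of the emitted table; stable across runs because of the sort."""
    return hashlib.sha256(emit_symbol_table(symbols)).hexdigest()
```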
| londons_explore wrote:
| I can see why devs would want "This Software was built on
| 10/10/2007 by bob7 from git hash aaffaaff" to appear on the
| splash screen of software.
|
| How do you get similar behaviour while having a reproducible
| build?
|
| Can you, for example, have the final binary contain a
| reproducible part, and another section of the elf file for
| deliberately non-reproducible info?
| beisner wrote:
| if you have a reproducible build, then the notion of
| "software was _built on date_ by _user_ " is kind of
| useless information, no? Because it does not matter - if
| you can verify that a specific git hash of a codebase
| results in a particular binary through reproducible builds,
| a malicious adversary could have built it yesterday and
| given it to me and i can be almost surely confident
| (barring hash-collisions...) it's identical to a known
| trusted team member building it.
|
| Having information about which git hash was used, as well as
| the time it was published, is part of the source
| distribution so an output can contain references to these
| inputs and still be deterministic w.r.t. those inputs.
|
| If you REALLY want to know when/who built something, you
| could add in an auxiliary source file which contains that
| information, which is required to build. Which is
| essentially what compilers which leverage current time do
| anyway, it's just implicit.
| ndriscoll wrote:
| Your source would also have to have a reference to which
| exact version of which compiler to use, which versions of
| which external headers to use, etc. and now you're
| inventing Nix.
|
| Conceivably there could be a standard for a sidecar file
| to specify how something was built (e.g. nixpkgs commit
| hash, or all of the parameters that went into the build).
| Or content address the inputs, i.e. invent Nix again.
|
| So we could solve this problem by having everyone
| standardize on using Nix.
| Smaug123 wrote:
| Such standards do exist:
| https://slsa.dev/spec/v1.0/provenance
| londons_explore wrote:
| The usecase is: user wants an easy way to know, from the
| GUI of some running software, exactly what
| build/version/git commit/branch/date they're running -
| perhaps to file a bug report for example.
|
| The actual _build date_ doesn't matter if the software
| is reproducible - but it's a proxy for 'how out of date is
| this software'.
| tripdout wrote:
| In that case, you can report the Git SHA and still be
| reproducible.
| tikhonj wrote:
| If you actually had reproducible builds, the _build date_
| would not tell you anything about how out of date the
| software is--you would only need the date of the source
| code the binary was built from. By definition, the binary
| you'd get from building a version of the source today
| would be identical to the version you'd get building it
| the day that version of the source was finished.
| Smaug123 wrote:
| Yeah, "who built this" information belongs in a signing
| certificate that accompanies the build artefact, not in the
| artefact itself. The Git hash can certainly appear in the
| binary (it's a reproducible part of the build input), and
| the date can instead be e.g. the commit date, which is
| probably more relevant to a user anyway.
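|
| A small sketch of such a banner in Python, assuming the commit
| hash and commit time are handed to the build as inputs (the
| function name and banner wording are invented here). Nothing in
| it depends on who runs the build or when.
|

```python
from datetime import datetime, timezone


def version_banner(commit_hash: str, commit_epoch: int) -> str:
    """Build a splash-screen string purely from source inputs."""
    # Format in UTC so the builder's local timezone cannot leak in.
    committed = datetime.fromtimestamp(commit_epoch, tz=timezone.utc)
    return f"built from git {commit_hash[:8]}, committed {committed:%Y-%m-%d}"
```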
| bloak wrote:
| Much as I like Git, I'm not sure I like the idea of the
| artefacts depending on the git commit and therefore on
| the entire git history. I rather feel the artefacts
| should only depend on the actual source and not on a
| particular version control system used for storing the
| source.
| dwheeler wrote:
| There are many specific causes, time stamps probably being the
| most common issue. You can see a list of common issues here:
|
| https://reproducible-builds.org/docs/
|
| The main overall issue is that developers don't test to ensure
| they reproduce. Once it's part of the release tests it tends to
| stay reproducible.
| acaloiar wrote:
| I agree, although I wouldn't describe the overall issue as
| developers not testing to ensure reproducibility. The reason
| most builds aren't reproducible is that build reproducibility
| isn't a goal for most projects.
|
| It would be great if 100% of builds were reproducible, but I
| don't believe developers shouldn't be testing for
| reproducibility unless it's a defined goal.
|
| As generalized reproducible build tooling (guix, nix, etc.)
| becomes more mainstream, I imagine we'll see more
| reproducible builds as adoption grows and reproducibility is
| no longer something developers have to "check for", but
| simply rely upon from their tooling.
| acaloiar wrote:
| Typo: I don't believe developers shouldn't be -> I don't
| believe developers should be
| dataflow wrote:
| Sometimes it's randomized algorithms, sometimes it's
| performance (e.g. it might be faster not to sort something),
| sometimes it's time or environment-dependent metadata,
| sometimes it's thread interleaving, etc.
| drdrey wrote:
| a very common one is pointer values being different from run
| to run and across different operating systems. Any code that
| intentionally or accidentally relies on pointer values will
| be non-deterministic
| dataflow wrote:
| Would be nice if you could explain how/why this happens,
| given that normally, pointers aren't persisted.
| traxys wrote:
| I think they meant if you cast a pointer to an integer,
| do some math on that and then store that. Then you will have a
| stored result that will likely differ from run to run
| edgyquant wrote:
| That sounds like runtime differences not a difference
| between two binaries
| stcg wrote:
| The difference in binaries must be caused by some runtime
| difference of a compiler.
| someplaceguy wrote:
| Languages such as Standard ML and others (Scheme? Lisp?
| Not sure...) have implementations that can save the
| current state of the heap into a binary.
|
| This is used in theorem provers, for example, so that you
| don't have to verify proofs of theorems over and over
| again (which can be very slow).
|
| Instead, you verify them once, save the state of the heap
| to disk (as a binary ELF, for instance) and then you can
| run the binary to continue exactly where you left off
| (i.e. with all the interesting theorems already in
| memory, in a proved state).
|
| This is what the HOL4 theorem prover's main `hol` script
| does, i.e. it runs HOL4 by loading such a memory state
| from disk, with the core theories and theorems already
| loaded.
|
| Presumably, to make this reproducible you'd need to make
| sure that all the memory objects are saved to disk in a
| deterministic order somehow (e.g. not in memory address
| order, as it can change from run to run, especially when
| using multiple threads).
|
| Edit: Presumably you'd also need to make sure that you
| persist the heap when all threads are idle and in a known
| state (e.g. with all timers stopped), to avoid random
| stack states and extraneous temporary allocations from
| being persisted, which would also affect the resulting
| binary.
| dataflow wrote:
| Thanks, yeah. So I guess the concrete example I would
| cite here is that the most natural (and most efficient?)
| way of persisting std::map<ptr, ....> would introduce
| pointer ordering into the output.
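|
| A Python analogue of that, as a sketch only (the Node class and
| function names are hypothetical). In CPython, id() is the
| object's memory address, so ordering by it is ordering by
| pointer value; ordering by a stable key avoids the problem.
|

```python
class Node:
    """Toy object that would be persisted by a serializer."""

    def __init__(self, name: str) -> None:
        self.name = name


def serialize_by_address(nodes: list[Node]) -> list[str]:
    """Order by id(), i.e. by address in CPython: may vary from run to run."""
    return [n.name for n in sorted(nodes, key=id)]


def serialize_by_name(nodes: list[Node]) -> list[str]:
    """Order by a key derived from the data itself: the same on every run."""
    return [n.name for n in sorted(nodes, key=lambda n: n.name)]
```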
| someplaceguy wrote:
| Just like the most natural (and most efficient?) way of
| persisting any std::unordered_map<...> can result in a
| completely randomly-ordered output, due to a DoS
| mitigation that some commonly-used language runtimes
| have.
| edgyquant wrote:
| That's runtime behavior
| mseepgood wrote:
| Laziness and carelessness of compiler developers.
| jonhohle wrote:
| As others have mentioned, there are sorting issues (are
| directory entries created in the same order for a project
| that compiled everything in a directory?), timestamps
| (archive files and many other formats embed timestamps), and
| things that you really want to be random (tmpdir on Linux [at
| least in the past] would create directories of varying
| length).
|
| I've successfully built tools to compare Java JARs that
| required getting around two of those and other test tools
| that required the third. I'm sure there are more.
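|
| To illustrate the first two points, here is a minimal sketch
| using Python's tarfile module (tar rather than JAR, purely for
| illustration; the function name is made up). Entries are added
| in sorted order and all per-file metadata is pinned, so the
| archive bytes depend only on names and contents.
|

```python
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack name -> content pairs into a tar with fixed order and metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order, not insertion order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # no wall-clock timestamps
            info.mode = 0o644
            info.uid = info.gid = 0  # no builder identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```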
| cpuguy83 wrote:
| Here is a very recent post from the Go team on things they had
| to do to make the Go toolchain fully reproducible.
|
| https://go.dev/blog/rebuild
| fooker wrote:
| A surprising amount of compiler and program behavior depends on
| how pointer values compare.
|
| These comparisons don't have to go the same way for everything
| to be correct.
| mgaunard wrote:
| Don't you have to fake the system time to do this? The time often
| ends up inside the binaries one way or another.
| mbakke wrote:
| Indeed, time stamps are probably the most common source of
| nondeterminism. So common that a de-facto standard variable to
| fake a timestamp has been implemented in many compilers:
|
| https://reproducible-builds.org/docs/source-date-epoch/
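|
| A minimal sketch of how a build script might honor that variable,
| in Python. The function name is hypothetical; the only thing
| taken from the spec is that SOURCE_DATE_EPOCH holds an integer
| number of seconds since the Unix epoch.
|

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None) -> datetime:
    """Return the timestamp to embed: SOURCE_DATE_EPOCH if set, else now."""
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    # Falling back to the wall clock is what makes a build non-reproducible.
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```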
| amelius wrote:
| Could you name an example of how (and for what reason) this
| might happen?
| mbakke wrote:
| Typically part of a "version string":
|
|     $ python3
|     Python 3.10.7 (main, Jan 1 1970, 00:00:01) [GCC 11.3.0] on linux
|     Type "help", "copyright", "credits" or "license" for more information.
|     >>>
|
| Perhaps a relic from when software had to be manually
| updated?
| oever wrote:
| On NixOS, I think the release time or commit time is used:
|
|     $ python3
|     Python 3.10.11 (main, Apr 4 2023, 22:10:32) [GCC 12.2.0] on linux
|     Type "help", "copyright", "credits" or "license" for more information.
|     >>>
|
| That is more useful than the build time.
| mbakke wrote:
| How is that possible? Is nixpkgs an input to the Python
| derivation? Or do packagers "hard code" a value every
| time they modify the Python build code? Automated tooling
| that sets it after pull requests? Something else? :-)
| Smaug123 wrote:
| GCC respects SOURCE_DATE_EPOCH, and Nixpkgs has specific
| support for setting that environment variable:
| https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033...
| (although I haven't proved that this is actually how it
| works for cpython's build).
|
| Irrelevant spelunking details follow:
|
| That string is output by cpython to contain the contents
| of the __DATE__ C macro
| (https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae...
| which calls to
| https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae...
| which uses the __DATE__ macro at
| https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae... ).
|
| Cpython is defined in nixpkgs at
| https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033...
| which I imagine (but haven't proved) uses GCC.
| mbakke wrote:
| Thank you! Setting SOURCE_DATE_EPOCH to the most recent
| file timestamp found in the source input is a clever
| hack.
| oever wrote:
| 2023-04-04T22:10:32 is the timestamp of
| Python-3.10.11/Misc/NEWS from
| https://www.python.org/ftp/python/3.10.11/Python-3.10.11.tar...
| raboof wrote:
| The source for the cpython build is the release tarball
| (https://github.com/NixOS/nixpkgs/blob/master/pkgs/developmen...).
|
| In that case, NixOS sets SOURCE_DATE_EPOCH (which I
| suspect will be picked up by the python build) to the
| latest timestamp found in that archive
| (https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-supp...)
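|
| The idea of taking the newest timestamp in the unpacked source
| can be sketched in Python roughly as follows. This is an
| illustration only, not the nixpkgs implementation, and the
| function name is made up.
|

```python
import os


def newest_mtime(source_dir: str) -> int:
    """Return the largest file modification time (in seconds) under source_dir."""
    newest = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            newest = max(newest, int(os.stat(path).st_mtime))
    return newest
```

| The result could then be exported as SOURCE_DATE_EPOCH before
| running the actual build.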
| mgaunard wrote:
| GCC embeds timestamps in o/gcno/gcda files to check they
| match.
|
| It's mostly annoying as gcov will actively prevent you from
| using gcda files from a different but equivalent binary than
| what generated the gcno.
| Uptrenda wrote:
| Wouldn't this help solve the problem Ken Thompson wrote about in
| 'reflections on trusting trust?' If you can fully bootstrap a
| system from source code then it's harder to have things like
| back-doored compilers.
| Crontab wrote:
| I love that there are people out there who care about things
| like this.
___________________________________________________________________
(page generated 2023-10-29 23:00 UTC)