[HN Gopher] Nixos-unstable's ISO_minimal.x86_64-Linux is 100% re...
___________________________________________________________________
Nixos-unstable's ISO_minimal.x86_64-Linux is 100% reproducible
Author : todsacerdoti
Score : 735 points
  Date   : 2021-06-20 20:01 UTC (1 day ago)
(HTM) web link (discourse.nixos.org)
(TXT) w3m dump (discourse.nixos.org)
| avalys wrote:
| Can anyone comment on the significance of this accomplishment,
| and why it was hard to achieve before?
|
| I (naively, apparently) assumed this had been possible with open-
| source toolchains for a long time.
| peterkelly wrote:
| For some reason, many compilers and build scripts have
| traditionally been written in a way that's not referentially
| transparent (a pure function from input to output). Unnecessary
| information like the time of the build, absolute path names of
| sources and intermediate files, usernames and hostnames often
| would find their way into build outputs. Compiling the same
| source on different machines or at different times would yield
| different results.
|
| Reproducible builds avoid all this and always produce the same
| outputs given the same inputs. There's no good reason (that I
| can think of) why this shouldn't have been the case all along,
| but for a long time I guess it just wasn't seen as a priority.
|
| The benefit of reproducible builds is that it's possible to
| verify that a distributed binary was definitely compiled from
| known source files and hasn't been tampered with, because you
| can recompile the program yourself and check that the result
| matches the binary distribution.
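    |
    | A toy sketch of the contrast (nothing below is a real build
    | tool; SOURCE_DATE_EPOCH is the real convention, used by
    | Debian and others, for pinning embedded timestamps):

```python
import hashlib
import os

def build(source: bytes, timestamp: float) -> bytes:
    """Toy 'compiler' that stamps the wall-clock build time into
    its output, the way many tools traditionally did."""
    return source + f"\n// built at {timestamp}".encode()

def build_reproducibly(source: bytes) -> bytes:
    """Same toy compiler, but following the SOURCE_DATE_EPOCH
    convention: any embedded time comes from the environment (with
    a fixed fallback), never from the clock."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH", "0")
    return source + f"\n// built at {epoch}".encode()

def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()

src = b"int main(void) { return 0; }"

# Two timestamped builds of identical source produce different bits.
a = build(src, 1624219260.0)
b = build(src, 1624219261.0)

# Two reproducible builds are bit-for-bit identical.
c = build_reproducibly(src)
d = build_reproducibly(src)
```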
| otabdeveloper4 wrote:
| > The benefit of reproducible builds is that it's possible to
| verify that a distributed binary was definitely compiled from
| known source files and hasn't been tampered with, because you
| can recompile the program yourself and check that the result
| matches the binary distribution.
|
| It's not just security. If a hash of the input sources maps
| directly to a hash of the output binaries, then you can
      | automatically cache build artefacts by hash and get huge
| speedups when compiling stuff from scratch.
|
| This was the primary motivation for Nix, since Nix does a
| whole lot of building from scratch and caching.
| kohlerm wrote:
| I agree being able to support distributed caching of
| results is one of the major benefits.
| dane-pgp wrote:
| > There's no good reason (that I can think of) why this
| shouldn't have been the case all along
|
| Well, it's not like developers consciously thought "How can I
| make my build process as non-deterministic as possible?",
| it's just that by the time people started to become aware of
| the benefits of reproducibility, various forms of non-
| determinism had already crept in.
|
| For example, someone writing an archiving tool would be
| completely right to think it is a useful feature to store the
| creation date of the archive in the archive's metadata. The
| idea that a user might want to force this value to instead be
| some fixed constant would only occur to someone later when
| they noticed that their packages were non-reproducible
| because of this.
|
| But you're right; if the goal had been thought of from the
| start, there's no reason why every build tool wouldn't have
| supported this.
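      |
      | The archive example can be sketched with Python's standard
      | tarfile module: pin the metadata timestamp and the output
      | becomes byte-identical across runs (a sketch, not how any
      | particular archiver does it):

```python
import io
import tarfile

def make_archive(payload: bytes, mtime: int) -> bytes:
    """Build a tar archive in memory with the file's timestamp
    pinned to a caller-supplied value instead of 'now'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        info.mtime = mtime  # the would-be source of non-determinism
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# With the timestamp forced to a constant, two runs agree bit-for-bit.
a = make_archive(b"hello\n", mtime=0)
b = make_archive(b"hello\n", mtime=0)

# Let the timestamp drift and the archives diverge.
c = make_archive(b"hello\n", mtime=1624219260)
```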
| russfink wrote:
| Thank you both. I was wondering the same thing.
| xyzzy_plugh wrote:
| There's a lot of problems with reproducible builds. Filesystem
| paths, timestamps, deterministic build order to say the least.
| This is a pretty great achievement and I'm looking forward to a
| non-minimal stable ISO.
| bombcar wrote:
| Yeah even the "gcc compiled Jan 23, 2021 at 11:23AM" messages
| you often see breaks deterministic builds.
| twisrkrr wrote:
  | The code has to be changed so that things like system-specific
  | paths, time of compilation, hardware, etc. don't cause the
  | compiled program to be unique to that computer (meaning
  | compiling the same code on a different computer will give you a
  | file that still works but has a different md5 hash).
|
| By being able to reproduce the file completely, down to
| identical md5 hashes, you know you have the same file the
| creator has, and know with certainty that the file has not been
| tampered with
| secondcoming wrote:
    | Does this mean that the code cannot be built with CPU-
    | specific optimisations (the -march option with gcc)?
| Avamander wrote:
| Pretty much. But hopefully x86_64 feature levels will
| provide the benefits of native builds to a reasonable
| extent.
| Denvercoder9 wrote:
| The software doesn't suddenly become incompatible with CPU-
| specific optimisations (or many other compiler flags that
| change its output), but if you do so, you won't be able to
| reproduce the distribution binaries. Distributions don't
| enable CPU-specific optimisations anyway, since they want
| to be usable on more than one CPU model.
| pas wrote:
| Likely it means that with the same input arguments the end
| result is bit-by-bit identical. (As I understand the
| problems were hard to control output elements. So it was
| not enough to se the same args, set the same time, and use
| the same path and filesystem, because there were things
| that happened at different speeds, so they ended up
| happening at relative different elapsed times, so the
| outputs contained different timestamps, etc.)
| clhodapp wrote:
| No, just that you need to avoid naively conflating the
| machine that is doing the compilation with the one that
| optimization is being performed for.
|
| Concretely, you would need to keep track of and reproduce
| e.g. the march flag value as a part of your build input. If
| you wanted to optimize for multiple architectures, that
| would mean separate builds or a larger binary with function
| multi-versioning.
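      |
      | A toy sketch of that bookkeeping (names invented): treat
      | the target flag as part of the build's identity, so each
      | target is its own, still reproducible, artefact:

```python
import hashlib

def build(source: bytes, march: str) -> bytes:
    """Toy target-specific 'compiler': output depends on the
    requested target, not on the machine doing the compiling."""
    return b"obj[" + march.encode() + b"]:" + source

def artefact_key(source: bytes, march: str) -> str:
    # The target flag is part of the build input, so each target
    # is a distinct, independently reproducible artefact.
    return hashlib.sha256(source + march.encode()).hexdigest()

src = b"int main(void) { return 0; }"

# Same source and same target flag: identical bits, on any machine.
a = build(src, "x86-64-v3")
b = build(src, "x86-64-v3")

# A different target is simply a different (still reproducible) build.
c = build(src, "x86-64-v2")
```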
| maartenh wrote:
| Nixpkgs contains the build / patch instructions for any
| packages in NixOS.
|
| If you want to compile any piece of software available in
      | Nixpkgs, you can override its attributes (inputs used to
| build it).
|
      | One can trivially have an almost identical operating system
      | to your colleague's install, but override just one package
      | to enable optimisations for a certain CPU. This would
      | however imply that you'd lose the transparent binary cache
      | that you could otherwise use.
|
      | Exactly this method is used to configure the entire
      | operating system install! Your OS install is just another
      | package that has some custom inputs set.
| danbst wrote:
    | Until recently, there were large non-reproducible projects:
    | python, gcc. Not sure where the history of non-r13y is kept.
|
| ---
|
    | There is a Debian initiative to create bit-for-bit
    | reproducible builds for all their software (well, all critical).
|
| https://reproducible-builds.org/
|
| R13y is akin to "computer proofs" in math -- if you don't have
| it, that's fine, but if you have it, that's awesome.
|
| There are practical reasons to favor reproducibility too, but
| those are more for distro maintainers.
|
| The fact that NixOS (not Debian) got this 100% is mostly
| because
|
| - minimal image has a small subset of packages
| (https://hydra.nixos.org/build/146009592#tabs-build-deps)
|
    | - Nix tooling was created 15 years ago *exactly* for this; Nix
    | is made to make packages bit-for-bit rebuildable from scratch.
|
    | - Nix/Nixpkgs is growing in number of maintainers and has
    | more funding
|
| - Nix has fewer Docker/Snap pragmatics
| dataflow wrote:
| > There's no good reason (that I can think of) why this
| shouldn't have been the case all along
|
      | Determinism can decrease performance dramatically. For
      | example, concatenating items (say, object files into a
      | library) in order is clearly more expensive in both time &
      | space than processing them out of order. One requires you to store
| everything in memory and then sort them before you start
| doing any work, whereas the other one lets you do your work
| in a streaming fashion. Enforcing determinism can turn an
| O(1)-space/O(n)-time algorithm into an O(n)-space/O(n log
| n)-time one, increasing latency _and_ decreasing throughput.
      | You wouldn't take a performance hit like that without a good
| reason to justify it.
| Foxboron wrote:
      | >- Nix tooling was created 15 years ago _exactly_ for this;
      | Nix is made to make packages bit-for-bit rebuildable from
      | scratch.
|
| I don't think this is accurate?
|
      | Nix is about reproducing system behaviour, largely by
      | capturing the dependency graph and replaying the build. But
      | this doesn't entail bit-for-bit identical binaries. It sits
      | very much in the same group as Docker and similar
      | technologies. This is also how I read the original thesis
      | from Eelco[0].
      |
      | And well, claims like this always rub me the wrong way,
      | since NixOS only really started using the term "reproducible
      | builds" after Debian started their efforts in 2015-2016[1],
      | and began its own effort later. It also muddies the
      | language, since people now talk about "reproducible builds"
      | in terms of system behaviour as well as bit-for-bit
      | identical builds. The result has been that people talk
      | about "verifiable builds" instead.
|
| [0]: https://edolstra.github.io/pubs/phd-thesis.pdf
|
| [1]: https://github.com/NixOS/nixpkgs/issues/9731
| infogulch wrote:
      | Being bit-for-bit reproducible means you could do fun things
| like distribute packages as just sources and a big blob of
| signatures, and you can still run only signed binaries.
| mananaysiempre wrote:
| The GCC developers in particular were hostile to such efforts
| for a long time, IIRC. (This is a non-trivial issue because
| randomized data structures exist and can be a good idea to use:
| treaps, universal hashes, etc. I'd guess it also pays for
| compiler heuristics to be randomized sometimes. Incremental
| compilation is much harder to achieve when you require bit-for-
| bit identical output. Even just stripping your compile paths
| from debug info is not entirely straightforward.)
| pas wrote:
      | How/why was the randomness part not "solvable" by using
      | fixed seeds?
| bruce343434 wrote:
| the security benefit of things like stack canaries rest on
| them being random and not known beforehand, I guess.
| Otherwise stack smashing malware could know to avoid them.
| mananaysiempre wrote:
| Wait, how is that relevant? Nothing says stack canaries
| have to use the same RNG as the main program, let alone
| the same seed, and there are cases such as this one where
| they probably shouldn't, so it makes sense to separate
| them.
| adonovan wrote:
| GCC used to attempt certain optimizations (or more generally,
| choose different code-generation strategies) only if there
| was plenty of memory available. We discovered this in the
| course of designing Google's internal build system, which
| prizes reproducibility.
| moonchild wrote:
| > Incremental compilation is much harder to achieve when you
| require bit-for-bit identical output
|
| Presumably, incremental compilation is only for development.
| For release, you would do a clean build, which would be
| reproducible.
|
| > Even just stripping your compile paths from debug info is
| not entirely straightforward
|
| Just use the same paths.
| mananaysiempre wrote:
| > Presumably, incremental compilation is only for
| development. For release, you would do a clean build, which
| would be reproducible.
|
| I'd say that's exactly the wrong approach: given how hard
| incremental anything is, it would make sense to insist on
| bit-exact output and then fuzz the everliving crap out of
| it until bit-exactness was reached. (The GCC maintainers do
| not agree.) But yes, you could do that. It's not impossible
| to do reproducible builds with GCC 4.7 or whatever, it's
| just intensely unpleasant, especially as a distro
| maintainer faced with yet another bespoke build system.
| (Saying that with all the self-awareness of a person making
| their own build system.)
|
| > Just use the same paths.
|
| I mean, sure, but then you have to build _and_ debug in a
| chroot and waste half a day of your life figuring out how
| to do that and just generally feel stupid. And your debug
| info is still useless to anybody not using the exact same
| setup. Can't we just embed _relative_ paths instead, or
        | even arbitrary prefixes if the code is coming from more
| than one place? In recent GCC versions we can, just chuck
| the right incantation into CPPFLAGS and you're golden.
|
| All of this is not really difficult except insofar as
| getting a large and complicated program to do anything is
| difficult. (Stares in the direction of the 17-year-old
| Firefox bug for XDG basedir support.) That's why I said it
| wasn't a GCC problem so much as a maintainer attitude
| problem.
| [deleted]
| [deleted]
| [deleted]
| Arnavion wrote:
| Is there a list of the 1486 packages in the minimal ISO?
| danbst wrote:
| https://hydra.nixos.org/build/146009592#tabs-build-deps
| taviso wrote:
| I don't see a single comment doubting the value of
| reproducibility, so I'll be the resident skeptic :)
|
| I think build reproducibility is a cargo cult. The website says
| reproducibility can reduce the risk of developers being
| threatened or bribed to backdoor their software, but that is just
| ridiculous. Developers have a perfect method for making their own
| software malicious: bugdoors. A bugdoor (bug + backdoor) is a
| deliberately introduced "vulnerability" that the vendor can
| "exploit" when they want backdoor access. If the bug is ever
| discovered you simply issue a patch and say it was a mistake,
| it's perfectly deniable. It's not unusual for major vendors to
| patch critical vulnerabilities every month, there is zero penalty
| for doing this.
|
| The existence of bugdoors means you _have_ to trust the vendor
| who provided the source code, there is no way around this.
|
| You have to trust the developer, but in theory, reproducible
| builds could be used to convince yourself their build server
| hasn't been hacked. This isn't really necessary or useful, you
| can already produce a trustworthy binary by just building the
| source code yourself. You still have to trust the vendor to keep
| hackers off everything else though!
|
| Okay, but building software is tedious, and for some reason you
| are particularly concerned about build servers being hacked.
  | Perhaps you will nominate a dozen different organizations
| will all build the code, and make this a consensus system. If
| they all agree, then you can be sure enough the binaries were
| built with a trustworthy toolchain. A modest improvement in
| theory, but that introduces a whole bunch of new crazy problems.
|
| You can't just pick one or two consensus servers, because then an
| attacker can stop you getting updates by compromising any one of
| them. You will have to do something like choose a lot of servers,
| and only require 51% to agree.
|
  | Now, imagine a contentious update like adopting a
| cryptocurrency fork, or switching to systemd (haha). If the
| server operators rebel, they can effectively veto a change the
| vendor wants to make. Perhaps vendors will implement a killswitch
| that allows them to have the final say, or perhaps they operate
| all the consensus build servers themselves.
|
| The problem is now you've either just replaced build servers with
| killswitches, or just replicated the same potentially-compromised
| buildserver.
|
| I wrote a blog post about this a while ago, although I should
| update it at some point.
|
| https://blog.cmpxchg8b.com/2020/07/you-dont-need-reproducibl...
| pabs3 wrote:
| There are other solutions to the problem of trusting
| maintainers; namely incremental distributed code review. The
| Rust folks are working on that:
|
| https://github.com/crev-dev/
|
| You still need Reproducible Builds and Bootstrappable Builds
| even if you have a fully reviewed codebase though.
| [deleted]
| nixpulvis wrote:
| You'll never beat the source.
|
| IIUC Reproducible Builds guarantees that source is turned into
| an artifact in a consistent and unchanging way. So as long as
| the source doesn't change neither will the build.
| taviso wrote:
| I don't really understand what you're saying.
|
| If you're saying "reproducible builds are reproducible", then
| that is obviously true, but the question is what is the
| benefit?
|
| Some people claim that the benefit is that there will be less
| incentive to threaten developers with violence, and I'm
| saying that's nonsense. If you cut through the nonsense,
| there are some modest claims that are true, but doing
| reproducible builds properly is very complicated and the
| benefit is negligible.
| nixpulvis wrote:
| > ... the question is what is the benefit?
|
| I don't think I should have to explain this. It has nothing
| directly to do with violence against developers, that's
| taking _many_ leaps.
|
| It simply gives you what you expect, which is kinda the
| basis of safety and security.
| taviso wrote:
| > It has nothing directly to do with violence against
| developers, that's taking many leaps.
|
| It is literally the very first claim on the front page of
| https://reproducible-builds.org.
| nixpulvis wrote:
| That's just a bunch of marketing hype. I'm trying to stay
| focused closer to the matters at hand.
|
| Perhaps my rambling on Development vs Distribution is
| relevant to the discussion?
| https://nixpulvis.com/ramblings/2021-02-02-signing-and-
| notar...
| EE84M3i wrote:
| Honestly I think the biggest benefit of reproducibility is just
| debuggability. We both check out the same git repo and build
| it, we can later hash the binary and compare the hashes to know
| we're running the exact same code.
|
| On security, if you really care about compromised build servers
| you might as well just build from source yourself. I think
| reproducibility might matter most in systems where side loading
| is hard/impossible like app stores, but I'm not familiar with
| the current state of the art in terms of iOS reproducable
| builds and checking them.
| theon144 wrote:
| >Developers have a perfect method for making their own software
| malicious: bugdoors.
|
| I think rather than malicious developers the focus is on
| malicious build machines. How many things are built solely via
| CI these days, on machines that nobody has ever seen, using
| docker images that nobody has validated?
|
| It's much easier to imagine a malicious provider (as in
| Sourceforge bundling in adware) than malicious developers, I
| think.
|
| But yes, you're right that reproducible builds don't remove the
| need to trust the source.
|
| >You have to trust the developer, but in theory, reproducible
| builds could be used to convince yourself their build server
| hasn't been hacked. This isn't really necessary or useful, you
| can already produce a trustworthy binary by just building the
| source code yourself.
|
| This is pretty much all false though - not only the "just"
| part, as setting up a proper build environment is pretty non-
| trivial for many projects, and building _everything_ from
| source is a task only the most dedicated Gentoomen would take
| up; you can also think of reproducible builds as a "litmus
| test". If you can, with reasonable accuracy, check whether a
| build machine is compromised at any time, you have a much
| greater base on which to trust it and its outputs. The benefits
| of having build machines probably shouldn't need explaining.
|
| >You can't just pick one or two consensus servers, because then
| an attacker can stop you getting updates by compromising any
| one of them. You will have to do something like choose a lot of
| servers, and only require 51% to agree.
|
| >...
|
| >The problem is now you've either just replaced build servers
| with killswitches, or just replicated the same potentially-
| compromised buildserver.
|
| I really don't understand this argument; compromised
| infrastructure probably shouldn't be a regular occurrence, and
| even if so, automated killswitches seem like the vastly more
| preferable option, no?
| taviso wrote:
| > I really don't understand this argument; compromised
| infrastructure probably shouldn't be a regular occurrence,
| and even if so, automated killswitches seem like the vastly
| more preferable option, no?
|
| I'm pointing out how complex implementing reproducible builds
| is. It introduces a bunch of really hard unsolved problems
| that people are very handwavy about.
|
| Who will do the reproducing? You say that users won't be able
| to do it. That makes sense, because if they could, then
| reproducible builds would be useless! However, you also say
| they will be able to check if a build server is compromised
| at any time. In order for both of those claims to be true we
| will have to design and build a complex consensus system
| operated by mutually untrusted volunteers. That's really
| hard, and seems like it provides a pretty negligible benefit.
| danieldk wrote:
| _I think build reproducibility is a cargo cult._
|
| Most people here are debating you on the security angle, but in
| the case of Nix (and Guix) there is another important angle -
| reproducible builds make a content-addressed store possible.
|
| In Nix, the store is traditionally addressed by the hash of the
| derivation (the recipe that builds the package). For example,
| _lr96h..._ in the path
| /nix/store/lr96h3dlny8aiba9p3rmxcxfda0ijj08-coreutils-8.32
|
| is the hash of the (normalized) derivation that was used to
| build coreutils. Since the derivation includes build inputs,
| either changing the derivation for coreutils itself or one of
| its inputs (dependencies) results in a different hash and a
| rebuild of coreutils.
|
| This also means that if somebody changes the derivation of
| coreutils _every_ package that depends on coreutils will be
| rebuilt, even if this change does not result in a different
| output path (compiled package).
|
| This is being addressed by the new work on the content-
    | addressed Nix store (although content-addressing was already
    | discussed in Eelco Dolstra's PhD thesis about Nix). In the
    | content-addressed store, the hash in the path, such as the one
    | above, is a hash of the output (the built package), rather
| than a hash of the normalized derivation. This means that if
| the derivation of coreutils is changed in such a way that it
| does not change the output path, none of the packages that
| depend on coreutils are rebuilt.
|
    | However, this only works reliably with reproducible builds:
    | if there is non-determinism in the build, how do you know
    | whether the output path changed as a result of changing a
    | derivation or as a result of uninteresting non-determinism?
    | (The output hash would change in both cases.)
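    |
    | The two addressing schemes can be sketched in a few lines of
    | Python (a toy model, not Nix's actual hashing):

```python
import hashlib

def short_hash(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()[:12]

def build(derivation: dict) -> bytes:
    """Stand-in build: the output depends on the sources but not
    on incidental recipe details such as comments."""
    return b"compiled:" + derivation["src"]

drv_old = {"src": b"coreutils-8.32", "comment": b"original recipe"}
drv_new = {"src": b"coreutils-8.32", "comment": b"cosmetic cleanup"}

# Input-addressed: the store path hashes the recipe, so even a
# cosmetic change yields a new path (and cascading rebuilds).
in_old = short_hash(repr(sorted(drv_old.items())).encode())
in_new = short_hash(repr(sorted(drv_new.items())).encode())

# Content-addressed: the path hashes the build *output*, so the
# cosmetic change maps to the same path and downstream packages
# need not be rebuilt -- provided the build is deterministic.
out_old = short_hash(build(drv_old))
out_new = short_hash(build(drv_new))
```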
| taviso wrote:
| I don't really have any complaints about using deterministic
| builds for non-security reasons, but the number one claim
| most proponents make is that it somehow prevents backdoors.
| Literally the first claim on reproducible-builds.org is that
| build determinism will prevent threats of violence and
| blackmail.
| londons_explore wrote:
| Where the dependency chain is long, this substantially
| reduces build work during development too.
|
| I'd guess that more than half of the invocations of gcc done
| by Make for example end up producing the exact same bit for
| bit output as some previous invocation.
| taviso wrote:
| I would point out that is literally what ccache (and Google
| goma) does, but doesn't require deterministic builds.
| Instead, it records hashes of preprocessed input and
| compiler commandlines.
|
| They don't make any security claims about this, it's just
| for speeding up builds.
| Ericson2314 wrote:
          | What we currently do --- hashing inputs --- is the same
          | as the ccache way. We just don't sandbox with that
          | granularity yet.
          |
          | What we want is to hash _outputs_. Say I replace 1 + 2
          | with 0 + 3. That will cause ccache to rebuild. We don't
          | want downstream stuff to also be rebuilt. Linking within
          | a package is nicely parallelizable, but in the general
          | case there are more dependency chains, and then that
          | sort of thing starts to matter.
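          |
          | A minimal sketch of that early-cutoff idea (toy code,
          | invented names): rerun the compile, but skip downstream
          | work when the output hash is unchanged:

```python
import hashlib

def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()

def compile_step(src: bytes) -> bytes:
    # Stand-in compiler that folds constants, so "1 + 2" and
    # "0 + 3" produce identical object code.
    total = sum(int(term) for term in src.split(b"+"))
    return b"obj:" + str(total).encode()

link_runs = 0

def link_step(obj: bytes) -> bytes:
    global link_runs
    link_runs += 1
    return b"bin:" + obj

# Initial build: compile, then link.
obj_before = compile_step(b"1 + 2")
binary = link_step(obj_before)

# The source changes, so the compile step must rerun -- but keying
# downstream work on the *output* hash lets us skip the link when
# the object file comes out identical (the "early cutoff").
obj_after = compile_step(b"0 + 3")
if digest(obj_after) != digest(obj_before):
    binary = link_step(obj_after)
```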
| cesarb wrote:
| > What isn't clear is what benefit the reproducibility
| provides. The only way to verify that the untrusted binary is
| bit-for-bit identical to the binary that would be produced by
| building the source code, is to produce your own trusted binary
| first and then compare it. At that point you already have a
| trusted binary you can use, so what value did reproducible
| builds provide?
|
| That's not the interesting case. The interesting case is when
    | the untrusted binary _doesn't_ match the binary produced by
| building the source code. Assuming that the untrusted binary
| has been signed by its build system, you now have proof that
| the build system is misbehaving. And that proof can be
| distributed and reproduced by everyone else.
|
| Once Debian is fully reproducible, I expect several
| organizations (universities, other Linux distribution vendors,
| governments, etc) to silently rebuild every single Debian
| package, and compare the result with the Debian binaries; if
| they find any mismatch, they can announce it publicly (with
| proof), so that the whole world (starting with the Debian
| project itself) will know that there's something wrong. This
| does not need any complicated consensus mechanism.
|
| > More often, attackers want signing keys so they can sign
| their own binaries, steal proprietary source code, inject
| malicious code into source code tarballs, or malicious patches
| into source repositories.
|
| In Debian, compromising the build server is not enough to
| inject malicious code into source code tarballs or patches,
| since the source code is also signed by the package maintainer.
| Unexpected changes on which maintainer signed the source code
| for a given package could be flagged as suspicious.
|
| The only attack left from that list, at least for Debian, would
| be for the attacker to sign their own top-level Release file
| (on Debian, individual packages are not signed, instead a file
| containing the hash of a file containing the hash of the
| package is what is signed). But the attacker cannot distribute
| the resulting compromised packages to everyone, since those who
| rebuild and compare every package would notice it not matching
| the corresponding source code, and warn everyone else.
| goodpoint wrote:
| > I expect several organizations (universities, other Linux
| distribution vendors, governments, etc) to silently rebuild
| every single Debian package, and compare the result with the
| Debian binaries
|
| This has been happening for many years. A lot of large
| companies that care about security and maintainability sign
      | big contracts with tech companies that often include
| indemnification.
| dane-pgp wrote:
| > If the server operators rebel, they can effectively veto a
| change the vendor wants to make.
|
| How often do you think there will be a change so controversial
| that teams who have volunteered to secure the update system
| will start effectively carrying out a Denial of Service attack
| against all the users of that distro?
|
| We also have to imagine that these malicious attestation nodes
| can easily be ignored by users just updating a config file, so
| the only thing the node operators could achieve by boycotting
| the attestation process is temporarily inconveniencing people
| who used to rely on them (which is not a great return on
| investment for the reputation they burn in doing this).
| taviso wrote:
| I don't know what reputation damage will happen, they're just
| third parties compiling code. There is no reputational damage
| for operating a malicious tor exit relay, why would this be
| different?
| dane-pgp wrote:
| As I understand it, Tor does have a way of detecting
| whether an exit node is failing to connect users to their
| intended destination. (With TLS enforced, the only thing a
| malicious exit node could do is prevent valid connections).
|
| In any case, I don't think anyone is proposing that the
| attestation nodes be run by random anonymous people on the
| internet. It would make more sense to have half a dozen or
| so teams running these nodes, with each team being known
| and trusted by the distro in question.
|
| I'm not sure what the costs/requirements would be for
| running one of these nodes, but it might be possible for
| distros to each run a node dedicated to building each
| other's distros (or at least the packages that are pushed
| as security updates to stable releases).
|
| Alternatively, individual developers that already work on a
| distro can offer to build packages on their own machines
| and contribute signed hashes to a log maintained by the
| distro itself.
| taviso wrote:
          | The point was that a reproducible build _doesn't_ mean
| you don't have to trust the developer.
|
| Build servers rebelling was just an example of the
| additional complexities and attacks that it introduces,
| for very negligible benefit.
| smoldesu wrote:
    | Reproducibility is an option to mitigate backdoors and
    | incentivize developers to operate openly. It's no panacea,
    | but it makes a lot of sense in open-source projects, where
    | individual actors are going to represent your largest threat
    | vector. That way, it becomes a lot harder to push an
    | infected blob to main, even if it still is _technically_
    | possible. Hashes are also "technically pointless", but we
    | still implement them liberally to quickly account for data
    | integrity.
| taviso wrote:
| Signatures are not technically pointless, they mean you only
| have to trust the developer - not the mirror operators.
|
| Reproducibility _is_ technically pointless, because you still
| have to trust the developer, and they can still add
| backdoors.
| dane-pgp wrote:
| Reproducibility means you don't have to worry that the
| developer might have a backdoored toolchain (which also
| means that they can't _pretend_ that a malicious toolchain
| added the malicious code without their knowledge).
|
| A talented developer might still be able to create a
| bugdoor which gets past code review, but that takes more
| effort and skill than just putting the malicious code into
| a local checkout and then saying "How did that get there?".
| taviso wrote:
| Every major vendor has vulnerabilities introduced all the
| time, by accident! No talent is necessary to introduce a
| bugdoor, just malice.
|
| You can already verify that a toolchain wasn't backdoored
| today, reproducible builds aren't necessary for that.
| wildfire wrote:
            | > You can already verify that a toolchain wasn't
            | > backdoored today
|
| How, exactly?
|
| If we both compiled hello.c (a prototypical hello world
| program), and exchanged binaries; how would you verify my
| build wasn't malicious?
| taviso wrote:
| I think the workflow you're proposing is to take some
| trusted source code, then compile it to make a trusted
| binary. Now compare the trusted binary to the untrusted
| binary provided by the vendor - If they're the same -
| then it must have been made by an uncompromised
| toolchain.
|
| That does require reproducible builds, but here is how to
| do it without reproducible builds:
|
| Take the trusted source code, then compile it to make a
| trusted binary. Now put the untrusted binary in the
| trash, cause you already have a trusted binary :)
| squiggleblaz wrote:
| How about if the system will only run signed builds?
| Couldn't you use it to verify the signed build by
| stripping the signature and comparing them?
| sayrer wrote:
| Is it technically pointless if you view it as a check on
| your own build, rather than a check on the work of others?
|
| You are obviously familiar with Bazel/Blaze etc. Wouldn't
| reproducibility be necessary for those systems to work well
| most of the time? I can think of exceptions (like PGO), but
| it seems useful to produce at least some binaries this way.
| Also covered in this:
| https://security.googleblog.com/2021/06/introducing-slsa-
| end...
| taviso wrote:
| > Is it technically pointless if you view it as a check
| on your own build, rather than a check on the work of
| others?
|
| That depends, I think it's difficult and mostly still
| pointless. I wrote about this a bit in the blog post I
| linked to. It's a big trade off, for questionable
| benefit.
|
| > Wouldn't reproducibility be necessary for those systems
| to work well most of the time?
|
| Yes, there are definitely some good non-security reasons
| to want deterministic builds. My gripe is only with the
| security arguments, like claims it can reduce threats of
| violence against developers (!?!).
| franga2000 wrote:
| > Reproducibility is technically pointless, because you
| still have to trust the developer, and they can still add
| backdoors.
|
| Builder != developer - and with reproducible builds, you no
  | longer need to trust the builder. CI is commonly used for
| the final distributable builds and you can't always trust
| the CI server. Even if you do, many rely on third party
  | things like docker images - if the base build image gets
| compromised, code could trivially be injected into builds
| running on it and without reproducible builds, that would
| not be detectable.
|
| As a developer, it would be quite reassuring to build my
| binary (which I already do for testing) and compare the
| hash with the one from the CI server to confirm nothing has
| been tampered with. As a bonus, distro maintainers who have
| their own CI can also check against my hashes to verify
| their build systems aren't doing something fishy (malicious
| or otherwise).
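The check described in this comment can be sketched in a few lines of shell. The artifact names, file contents, and the separately published CI hash are all stand-ins for illustration, not a real project:

```shell
#!/bin/sh
# Sketch: rebuild the binary locally and compare its hash against the
# hash published for the CI-built artifact. Here both "artifacts" are
# placeholder files; in practice the second argument would come from
# the CI server (e.g. a signed checksum file).

verify_build() {
    # $1 = locally rebuilt binary, $2 = hash published for the CI build
    local_hash=$(sha256sum "$1" | cut -d' ' -f1)
    if [ "$local_hash" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Simulate a CI artifact and a byte-identical local rebuild.
printf 'release-bits' > ci-artifact
printf 'release-bits' > local-build
published=$(sha256sum ci-artifact | cut -d' ' -f1)
verify_build local-build "$published"    # prints "match"

# A tampered CI artifact no longer matches the reproducible rebuild.
printf 'release-bits+payload' > tampered
verify_build tampered "$published"       # prints "mismatch"
```

This only works as a tamper check when the build is reproducible; otherwise a hash mismatch tells you nothing.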
| taviso wrote:
| > As a developer, it would be quite reassuring to build
| my binary (which I already do for testing) and compare
| the hash with the one from the CI server to confirm
| nothing has been tampered with.
|
| That makes sense! However, this is not a good argument
| for reproducible builds, because you can already do that
| today.
|
| You already have to build a trusted binary locally for
| testing right? You're dreaming of being able to compare
| that against the untrusted binary so that you can make
| sure it's a trusted binary too - but you already have a
| trusted binary!
|
| Okay - but it's a hassle, you don't want to have to do
| that, right? Too bad - reproducible builds only work if
| someone reproduces them. You're still going to have to
| replicate it somewhere you trust, so you gained
| practically nothing.
| eru wrote:
| With the reproducible build, you can start using the
| untrusted binary while you are still building your
| trusted one.
|
| You can also have ten people on the internet verify the
| untrusted binary. With signatures, adding more people
| doesn't help.
| taviso wrote:
| > With the reproducible build, you can start using the
| untrusted binary while you are still building your
| trusted one.
|
| That's not how it works, you _have_ to reproduce it
| before it becomes trusted.
|
| > You can also have ten people on the internet verify the
| untrusted binary.
|
| Sure, then we have to build a complex consensus system
| that introduces a bunch of unsolved problems. My opinion
| is that this just isn't worth it, there is practically
| nothing to gain and it's really really hard.
| eru wrote:
| > That's not how it works, you have to reproduce it
| before it becomes trusted.
|
| Eh, there's stuff you can do with software before you
| trust it. Eg you can start pressing the CDs or
| distributing the data to your servers. Just don't execute
| it, yet.
|
| > Sure, then we have to build a complex consensus system
| that introduces a bunch of unsolved problems. My opinion
| is that this just isn't worth it, there is practically
| nothing to gain and it's really really hard.
|
| It's the same informal system that keeps eg debian or the
| Linux kernel secure currently:
|
| People don't do kernel reviews themselves. They just use
| the official kernel, and when someone finds a bug (or
| spots otherwise bad code), they notify the community.
|
| Similar with reproducible builds: most normal people will
| just use the builds from their distro's server, but
| independent people can do 'reviews' by running builds.
|
| If ever a build doesn't reproduce, that'll be a loud
| failure. People will complain and investigate.
|
| Reproducible builds in this scenario don't protect you
| from untrusted code upfront, but they make sure you'll
| know when you have been attacked.
| taviso wrote:
| > People don't do kernel reviews themselves. They just
| use the official kernel, and when someone finds a bug (or
| spots otherwise bad code), they notify the community.
|
| There's a big difference here. When a vulnerability is
| found in the Linux kernel, that doesn't mean that you
| were compromised.
|
| If a build was found to be malicious, then you definitely
| were compromised and it's little solace that it was
| discovered after the fact. This is why package managers
| check the deb/rpm signature _before_ installing the
| software, not after.
| brianzelip wrote:
| Here's an informative podcast episode on Nix with one of its
| maintainers, https://changelog.com/podcast/437.
| jnxx wrote:
| A good sign that the friendly competition by Guix has a positive
| influence :)
|
| https://guix.gnu.org/manual/en/html_node/Bootstrapping.html
|
| https://guix.gnu.org/en/blog/2020/guix-further-reduces-boots...
| delroth wrote:
| This smaller bootstrap seed thing is a different problem from
| reproducible builds. nixpkgs does still have a pretty big
| initial TCB (aka. stage0) compared to Guix. But as far as I can
| tell NixOS has the upper hand in terms of how much can be built
| reproducibly (aka. the output hash matches across separate
| builds).
| siraben wrote:
| There's an issue for this[0]. Currently Nixpkgs relies on a
| 130 MB (!) uncompressed tarball, which is pretty big compared
| to Guix. It would be amazing to get it down to something like
| less than 1 KB with live-bootstrap.
|
    | Also, due to the way Nixpkgs is architected, it also lets
| us experiment with more unusual ideas like a uutils-based
| stdenv[1] instead of GNU coreutils.
|
| [0] https://github.com/NixOS/nixpkgs/issues/123095
|
| [1] https://github.com/NixOS/nixpkgs/pull/116274
| jnxx wrote:
| Bootstrapping from a very small binary core (I think 512
| bytes) with an initial C compiler written in Scheme also has
| the advantage that the system can easily be ported to
| different hardware. Which is one major strength of the GNU
| projects and tools.
| delroth wrote:
| Not necessarily. Usually these very small cores end up
| being more architecture specific binaries than a stage0
| consisting of gcc + some other core packages. A good
| illustration of this is that Guix's work on bootstrap seed
| reduction has been so far mostly applied to i686/amd64 and
| not even other architectures they support (at least, not
| fully).
| rejectedandsad wrote:
| I really want to adopt Nix and NixOS for my systems but the cost
| of wrapping packages is just a little too high for me right now
| (or perhaps I'm out of date and a new cool tool that does it
| automatically is out). IMHO, a dependency graph-based build
| system that builds a hermetically sealed transitive closure of an
| app's dependencies that can be plopped into a rootfs via Nix [0]
| is far superior, security-wise, to the traditional practice of
| writing Dockerfiles.
|
| [0] https://yann.hodique.info/blog/using-nix-to-build-docker-
| ima...
| SuperSandro2000 wrote:
| Did you try https://github.com/Mic92/nix-ld or
| https://nixos.org/manual/nixpkgs/unstable/#setup-hook-autopa...
| ?
| rejectedandsad wrote:
    | Hm, this seems like a lower-level set of tools that
    | could be composed into something a bit more
    | user-friendly (one of my personal complaints with Nix as
    | well, despite being a big fan of the concept and overall
    | execution - nothing too steep that can't be learned
    | eventually, but the curve exists). I'm wondering if
    | there would be an audience for a higher-level
    | abstraction on top of Nix, or if one already exists.
| rswail wrote:
| Recently, President Biden put out an executive order that
| mandates that NIST et al work out, over the next year, an
| SBOM/supply chain mandate for software used by Federal
| departments.
|
| That's going to require the equivalent of "chain of custody"
| attestations along the entire build chain.
|
| Along with SOC and PCI/DSS and other standards, this is going to
| require companies and developers to adopt NixOS type immutable
| environments.
| ffk wrote:
| Unfortunately, I don't think this is going to be the outcome.
| We're more likely to end up with "Here is the list of
| filenames, subcomponents, and associated hashes" as opposed to
| requiring NixOS style environments. Vendors to the
| subcontractors will likely be required to provide the same list
| of filename/subcomponent/hashes, a far cry from repeatable
| builds.
| 1MachineElf wrote:
| I doubt it will happen until NixOS or similar tool has a
| corresponding DISA STIG.
| pabs3 wrote:
| I wonder what else other than disorderfs they could throw in to
| flush out additional currently hidden non-determinism.
| koolba wrote:
| There's something very poetic about "unstable" being
| "reproducible". It's like controlled chaos.
| dane-pgp wrote:
| I believe that's known as the Chaotic Good alignment.
| toastal wrote:
| I really liked how easy it was to create a custom ISO when I
| installed Nix. For once I had Dvorak as the default keyboard from
| the outset, neovim for editing, and the proprietary WiFi drivers
| I needed all from a minimal config file and `nix build`.
| fouronnes3 wrote:
| Are there synergies with the Debian reproducible build project
| that this can benefit from?
| Denvercoder9 wrote:
| In general, Debian aims to upstream the changes they make to
| software. That allows all other distributions, including Nix,
| to profit from their work making software reproducible.
| aseipp wrote:
| Debian has been a major driver in making many pieces of
| software reproducible across _every_ distribution; that Debian
| maintainers so often submit patches upstream and work directly
| to solve these issues is a big reason for this.
|
| In other words: the work Debian has done absolutely set the
| stage for this to happen, and it would have taken much longer
| without them.
| amelius wrote:
| Hopefully this will one day also work with NVidia's software
| packages.
| jeroenhd wrote:
| The trick with nvidia on Linux is to not expect that they will
| ever work on anything. If you want to be sure that stuff works,
| either don't buy Nvidia or use Windows.
| amelius wrote:
| What would you recommend instead of NVidia's Jetson embedded
| platform?
| jeroenhd wrote:
| I'm not familiar with the market the Jetson is in and what
| purposes it serves. From a quick Google, it seems to build
| boards for machine learning? If that's true, I'm pretty
| sure Google and Intel have products in that space, and I'm
| sure there's other brands I don't know of.
|
| If Nvidia has its own distribution, it might well work for
| as long as it's willing to maintain the software because
| then they can tune their open source stuff to make it work
| with their proprietary drivers, the same way Apple is
| hiding their tensorflow code. I still would be hesitant to
| rely on Nvidia in that case given their history.
| aseipp wrote:
| Google and Intel's solutions are just as proprietary,
        | with the downside that almost nobody uses them, so bugs,
| performance, supported tooling, community, and support
| windows are often much worse. It's not even clear their
| solutions actually offer better performance in general,
| given this. (And if you think proprietary Nvidia software
| packages are infuriating messes, wait until you try Intel
        | proprietary software.) All that said, how you feel
        | about their history of Linux support is basically
        | irrelevant, and they'll continue to dominate because
        | of it.
| solarkraft wrote:
| That is a pretty big deal.
|
| This means everyone building NixOS will get the _exact_ same
| binary, meaning you can now trust _any_ source for it because you
| can verify the hash.
|
| It's a huge win compared to the current default distribution
| model of "just trust these 30 american entities that the software
| does what they say it does".
|
| Big congratulations to the team.
| groodt wrote:
| This is a big deal. Congratulations to all involved.
|
| In Software, complexity naturally increases over time and
| dependencies and interactions between components become
| impossible to reason about. Eventually this complexity causes the
| Software to collapse under its own weight.
|
| Truly reproducible builds (such as NixOS and Nixpkgs) provides us
| with islands of "determinism" which can be taken as true
| invariants. This enables us to build more Systems and Software on
| top of deterministic foundations that can be reproduced by
| others.
|
| This reproducibility also enables powerful things like
| decentralized / distributed trust. Different third-parties can
| build the same software and compare the results. If they differ,
| it could indicate one of the sources has been compromised. See
| Trustix https://github.com/tweag/trustix
| dcposch wrote:
| This really deserves more love.
|
| Who remembers Ken Thompson's "Reflections on Trusting Trust"?
|
| The norm today is auto-updating, pre-built software.
|
| This places a ton of trust in the publisher. Even for open-
| source, well-vetted software, we all collectively cross our
| fingers and hope that whoever is building these binaries and
| running the servers that disseminate them, is honest and good at
| security.
|
| So far this has mostly worked out due to altruism (for open
| source maintainers) and self interest (companies do not want to
| attack their own users). But the failure modes are very serious.
|
| I predict that everyone's imagination on this topic will expand
| once there's a big enough incident in the news. Say some package
| manager gets compromised, nobody finds out, and 6mo later every
| computer on earth running `postgres:latest` from docker hub gets
| ransomwared.
|
| There are only two ways around this:
|
| - Build from source. This will always be a deeply niche thing to
| do. It's slow, inconvenient, and inaccessible except to nerds.
|
| - Reproducible builds.
|
| Reproducible builds are way more important than is currently
| widely appreciated.
|
| I'm grateful to the NixOS team for beating a trail through the
| jungle here. Retrofitting reproducibility onto a big software
| project that grew without it is hard work.
| swiley wrote:
| >self interest (companies do not want to attack their own
| users).
|
| Anyone who has bought an Android phone in the past 5 years
| knows that's not true.
| staticassertion wrote:
| > Reproducible builds are way more important than is currently
| widely appreciated.
|
| Why? How will this help with the problems you're talking about?
|
| I can't come up with a single benefit to security from
| reproducible builds. It seems nice for operational reasons and
| performance reasons though.
| pilif wrote:
| _> I can 't come up with a single benefit to security from
| reproducible builds._
|
    | It is a means of detecting a compromised supply chain.
    | If people rebuilding a distro cannot get the same hash
    | as the build shipped by the distributor, then the
    | distributor's infrastructure has likely been
    | compromised.
| staticassertion wrote:
| How does this work in practice? The distro is owned, so
| where are you getting the hash from? I mean, specifically,
| what does the attacker have control of and how does a
| repeatable build help me stop them.
| pilif wrote:
| The idea is that multiple independent builders build the
| same distro. You expect all of them to have the same
| final hash.
|
        | This doesn't help against the sources being owned,
        | but it helps when the build machines are owned.
|
| Accountability for source integrity is in theory provided
| by the source control system. Accountability for the
| build machine integrity can be provided by reproducible
| builds.
|
| To answer your specific questions: The attacker has
| access to the distro's build servers and is packaging and
| shipping altered binaries that do not correspond to the
| sources but instead contain added malware.
|
| Reproducible builds allow third parties to also build
| binaries from the same sources and once multiple third
| parties achieve consensus about the build output, it
| becomes apparent that the distro's build infrastructure
| could be compromised.
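The multi-builder consensus described above amounts to a simple hash comparison; a sketch in shell, with made-up placeholder hashes:

```shell
#!/bin/sh
# Sketch: independent rebuilders each report the output hash they got
# for the same package. If any report disagrees with the distro's
# published hash, the build infrastructure is suspect.

check_consensus() {
    # $@ = the published hash followed by hashes from rebuilders
    first=$1
    for h in "$@"; do
        [ "$h" = "$first" ] || { echo "divergence"; return; }
    done
    echo "consensus"
}

# The distro's hash plus three rebuilders, all agreeing.
check_consensus aabb1122 aabb1122 aabb1122 aabb1122   # prints "consensus"

# One rebuilder got different bytes: investigate the build servers.
check_consensus aabb1122 aabb1122 deadbeef aabb1122   # prints "divergence"
```

Note the comparison is only meaningful because reproducible builds guarantee honest rebuilders all arrive at the same bytes.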
| staticassertion wrote:
| OK so a build machine is owned and we have a sort of
| consensus for trusted builders, and if there's a
| consensus mismatch we know something's up.
|
| I suppose that's reasonable. Sounds like reproducible
| builds is a big step towards that, though clearly this
| requires quite a lot of infrastructure support beyond
| just that.
| tester756 wrote:
| >There are only two ways around this:
|
| >- Build from source. This will always be a deeply niche thing
| to do. It's slow, inconvenient, and inaccessible except to
| nerds.
|
| if you trust the compiler :)
| User23 wrote:
| > Who remembers Ken Thompson's "Reflections on Trusting Trust"?
|
| > The norm today is auto-updating, pre-built software.
|
| This is a little bit misleading. The actual paper[1] explains
| that you can't even trust source available code.
|
| [1]
| https://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-tho...
| Accujack wrote:
| >The norm today is auto-updating, pre-built software.
|
| Only if you define "norm" as what's prevalent in consumer
| electronics and phones. Certainly, if you go by numbers, it's
| more common than anything else.
|
| That's not due to choice, though, it's because of the desires
| of corporations for ever more extensive control of their
| revenue streams.
| cookiengineer wrote:
  | Actually, being able to build projects much more easily
  | from GitHub is the sole reason why I'm currently using
  | Arch as my main OS.
|
| Building a project is just a shell script with a couple of
| defined functions. Quite literally.
|
| I really admire NixOS's philosophy of pushing the boundaries as
| a distro where everything, including configurations and
| modifications, can be done in a reproducible manner. They're
| basically trying to automate the review process down the line,
| which is absurdly complex as a challenge.
|
| And given stability and desktop integrations improve over time,
| I really think that Nix has the potential to be the base for
| easily forkable distributions. Building a live/bootable distro
| will be so much easier, as everything is just a set of
| configuration files anyways.
| takeda wrote:
| This is slightly different thing. Nix and NixOS are trying to
| solve multiple things, and that's what it might be a bit
| confusing.
|
    | Many people don't realize this, but if you and I both
    | grab, say, the project mentioned above from GitHub and
    | compile it on our own machines, we get different files
    | (they'll work the same, but they won't be exactly the
    | same bytes).
    |
    | Even if we use the same dependencies, we will still get
    | different files, because maybe you used a slightly
    | different version of the compiler, or maybe those
    | dependencies were compiled with different dependencies
    | or compilers. Maybe the project inserts a date while
    | building, or pulls in some file. There are a million
    | ways we could end up with different files.
|
    | The goal here is to get bit-for-bit identical files, and
    | that's like a Holy Grail in this area. NixOS appears to
    | have just achieved it, and all packages that come with
    | the system are now fully reproducible.
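A toy illustration of the point: the same "build" run twice differs when a timestamp leaks into the output, and becomes bit-for-bit identical once the timestamp is pinned. The build function, file names, and dates are stand-ins:

```shell
#!/bin/sh
# Tiny demonstration: a build that bakes a timestamp into its output
# is non-reproducible; pinning the timestamp restores determinism.

build() {
    # $1 = output file, $2 = the "build date" baked into the artifact
    printf 'payload built at %s\n' "$2" > "$1"
}

# Non-reproducible: each build embeds a different wall-clock time
# (two literal times stand in for two real runs).
build out1 "2021-06-20 20:01:03"
build out2 "2021-06-20 20:01:07"
h1=$(sha256sum out1 | cut -d' ' -f1)
h2=$(sha256sum out2 | cut -d' ' -f1)
[ "$h1" = "$h2" ] || echo "builds differ"       # prints "builds differ"

# Reproducible: pin the embedded date (cf. the SOURCE_DATE_EPOCH
# convention used by the Reproducible Builds project).
build out3 "1970-01-01"
build out4 "1970-01-01"
h3=$(sha256sum out3 | cut -d' ' -f1)
h4=$(sha256sum out4 | cut -d' ' -f1)
[ "$h3" = "$h4" ] && echo "builds identical"    # prints "builds identical"
```

Real toolchains leak many more things than dates (paths, usernames, parallelism ordering), but the shape of the fix is the same: pin or strip every such input.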
| eru wrote:
| A rich source of non-reproducibility is non-determinism
| introduced by parallel building.
|
| Preserving parallel execution, but arriving at
| deterministic outputs, is an interesting and ongoing
| challenge. With a rich mathematical structure, too.
| marcosdumay wrote:
| > I predict that everyone's imagination on this topic will
| expand once there's a big enough incident in the news.
|
  | How does the SolarWinds incident, with just about every
  | large software vendor being silently compromised for
  | years, not qualify?
|
| Because it does not, people's imagination is as closed as it
| always was.
| yeowMeng wrote:
| Solarwinds is closed source so the choice to build from
| source is not really an option.
| pabs3 wrote:
| They could have distributed the code to a few select
| parties for the purposes of doing a build and nothing more.
| marcosdumay wrote:
| Specifically Microsoft did distribute the code to several
| parties for the purposes of auditing. But they didn't
| allow building it.
| zamadatix wrote:
  | Unless you are going to be the equivalent of a full-time
  | maintainer doing code review for every piece of software
  | you use, you need to trust other software maintainers,
  | reproducible builds or not. Considering this is Linux and
  | not even Linus can deeply review every change in just the
  | kernel anymore, that philosophy can't apply to
  | meaningfully large software like NixOS.
| Taek wrote:
| You can't solve this problem without having a full history of
| code to inspect (unless you are decompiling), reproducibility
| is the first step and bootstrapability is the second step.
| Then we refine the toolchains and review processes to ensure
| high impact code is properly scrutinized.
|
| What we can't do is throw our hands up and say anyone who
| compromises the toolchain deep enough is just allowed to win.
| It will happen at some point if we don't put the right
| barriers in place.
|
| It's the first step of a long journey, but it is a step we
| should be taking.
| donio wrote:
| https://github.com/fosslinux/live-bootstrap is another
| approach, bootstrapping from a tiny binary seed that you
| could hand-assemble and type in as hex. But it doesn't
      | address the dependency on the underlying OS being
| trustworthy.
| jnxx wrote:
| That's too black-and-white. Being able to reproduce stuff
| makes some kind of attacks entirely uninteresting because
| malicious changes can be traced back. Which is what many
| types of attackers do not want. Debian, or the Linux kernel,
| for example, are not fool-proof, but both are in practice
| quite safe to work with.
| zamadatix wrote:
      | Who are you going to trace it back to, if not the
      | maintainer anyway? And if the delivery method is to
      | blame, why is delivery of the source from the
      | maintainer inherently any safer?
| jnxx wrote:
| No, it is not always the maintainer. Imagine you download
| a binary software package via HTTPS. In theory, the
| integrity of the download is protected by the server
| certificate. However, it is possible that certificates
| get hacked, get stolen, or that nation states force CAs
| to give out back doors. In that case, your download could
| have been changed on the fly with arbitrary alterations.
| Reproducible builds make it possible to detect such
| changes.
| zamadatix wrote:
| Same as when you download the source instead of the
| binary and see it reproducibly builds the backdoored
| binary. And at this point we're back to "Build from
| source. This will always be a deeply niche thing to do.
| It's slow, inconvenient, and inaccessible except to
| nerds." anyways.
|
| It's not that reproducible builds provide 0 value it's
| that they don't truly solve the trust problem as
| initially stated. They also have non-security value to
| boot which is often understated compared to the security
| value IMO.
| eru wrote:
| Reproducible builds still help a lot with security. For
| example, they let you shift build latency around.
|
| Eg suppose you have a software package X, available both
| as a binary and in source.
|
| With reproducible builds, you can start distributing the
| binary to your fleet of computers, while at the same time
| you are kicking off the build process yourself.
|
| If the result of your own build is the same as the binary
| you got, you can give the command to start using it.
| (Otherwise, you quarantine the downloaded binary, and
| ring some alarm bells.)
|
| Similarly, you can re-build some random sample of
| packages locally, just to double-check, and report.
|
| If most debian users were to do something like that, any
            | tampering with the debian repositories would be quickly
| detected.
|
| (Having a few malicious users wouldn't hurt this strategy
| much, they can only insert noise in the system, but not
| give you false answers that you trust.)
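The distribute-then-verify workflow sketched in this comment, with placeholder files standing in for the vendor download and the local rebuild:

```shell
#!/bin/sh
# Sketch: stage the untrusted download immediately, rebuild from
# source in parallel, and only promote the download to "trusted"
# once the two hashes agree. File names and contents are stand-ins.

printf 'v1.2 bits' > downloaded.bin   # vendor binary: distribute, don't run
printf 'v1.2 bits' > rebuilt.bin      # our own build from trusted source

if [ "$(sha256sum < downloaded.bin)" = "$(sha256sum < rebuilt.bin)" ]; then
    mv downloaded.bin trusted.bin     # safe to start executing fleet-wide
    echo "promoted"                   # prints "promoted"
else
    mv downloaded.bin quarantined.bin # quarantine and ring the alarm bells
    echo "quarantined"
fi
```

The latency win is that distribution (copying bytes to the fleet) overlaps with the rebuild; only execution waits for the hash check.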
| robocat wrote:
| > and inaccessible except to nerds.
|
| So was most every part of computer hardware and software
| initially - this is just another milestone in that
| journey.
| bigiain wrote:
| I guess reproducible builds solve some of the problems in
| the same way TLS/SSL solves some of the problems.
|
| Most of the world is happy enough with the soft guarantee
| of: "This is _probably_ your bank's real website. Unless
| a nation state is misusing their control over state owned
| certificate authorities, or GlobalSign or LetsEncrypt or
| whoever has been p0wned."
|
| Expecting binary black and white solutions to trust
| problems isn't all that useful, in my opinion. Often
| providing 50% more "value" in trust compared to the
| status quo is extremely valuable in the bigger picture.
| zamadatix wrote:
            | Reproducible builds solve many security problems
            | for sure, but the problems they solve in no way
            | help you if the maintainer is not altruistic or
            | is bad at security, as originally stated. They
            | tell you the maintainer's toolchain wasn't
            | compromised, and they do it AFTER the payload is
            | delivered - and you built your own payload, not
            | the one made by the maintainer, anyway. They
            | don't even tell you the transport/hosting wasn't
            | compromised, unless you can somehow get a copy
            | of the source used to compile that didn't come
            | directly from the maintainer, since the
            | transport/hosting for the source they maintain
            | could be compromised as well.
|
| Solving that singular attack vector in the delivery chain
| does nothing for solving the need to trust the altruism
| and self interest of maintainers. A good thing(tm)?
| Absolutely, along with the other non security benefits,
| but has nothing to do with needing to trust maintainers
| or be in the niche that reviews source code when
| automatic updates come along as originally sold.
| pabs3 wrote:
| There are other solutions to the problem of trusting
| maintainers; namely incremental distributed code review.
| The Rust folks are working on that:
|
| https://github.com/crev-dev/
| squiggleblaz wrote:
| The question isn't whether they're perfect, nor is it
| whether they prevent anything. But it does help a person
| who suspects something is up rule certain things in and
| out, which increases the chances that the weak link can
| be found and eliminated.
|
| If you have a fair suspicion that something is up and you
| discover that when you compile reproduceable-package you
| get a different output than when you download a prebuilt
| reproduceable-package, you've now got something to work
| with.
|
| Your observation that they don't truly solve the trust
| problem is true. But it's somehow not relevant. It is
| better to be better off.
| eptcyka wrote:
| Even if the original attack happened upstream, if the
| upstreamed piece of software was pinned via git, then
| it'd be trivial to bisect the upstream project to find
| the culprit.
| dragonsky67 wrote:
| This is great if you are looking at attributing blame.
| Not so great if you are trying to prevent all the worlds
| computers getting owned....
|
| I'd imagine that if I were looking at causing world wide
| chaos, I'd love nothing better than getting into the tool
| chain in a way that I could later on utilise on a wide
| spread basis.
|
| At that point I would have achieved my aims and if that
| means I've burnt a few people along the way, so be it,
| I'm a bad guy, the damage has been done, the objective
| met.
| IgorPartola wrote:
| No. I can review 0.1% of the code and verify that it compiles
| correctly and then let another 999 people review their own
| portion. It only takes one person to find a bit of malicious
| code, we don't all need to review every single line.
| remram wrote:
| That only works if you coordinate. With even more people,
| you can pick randomly and be relatively sure you've read it
| all, but I posit that 1) you don't pick randomly, you pick
| a part that is accessible or interesting to you (and
| therefore probably others) and 2) reading code locally is
| not sufficient to find bugs or backdoors in the whole.
| pabs3 wrote:
| The crev folks are working on a co-ordination system for
| incremental distributed code review:
|
| https://github.com/crev-dev/
| remram wrote:
| Crev is a great idea, unfortunately it is only really
| available for Rust right now.
| pabs3 wrote:
| I noticed there is a git-crev project, might that be
| useful for other languages? Also there is pip-crev for
| Python.
| IgorPartola wrote:
| I actually wonder if it's possible to write code at such
| a macro level as to obfuscate, say, a keylogger in a huge
| codebase such that reviewing just a single module/unit
| would not reveal that something bad is going on.
| eru wrote:
| Depends on how complicated the project itself is. A
| simple structure with the bare minimum of side-effects
| (think, functional programming) would make this effort
| harder.
|
| For something like C, all bets are off:
| http://www.underhanded-c.org/ or
| https://en.wikipedia.org/wiki/Underhanded_C_Contest
| xvector wrote:
| > It only takes one person to find a bit of malicious code,
| we don't all need to review every single line.
|
| This is just objectively wrong. I have worked on projects
| at FAANG where entire teams did not spot critical security
| issues during review.
|
| You are very unlikely to spot an issue with just one pair
| of eyes. You need many if you want _any_ hope of catching
| bugdoors.
| IgorPartola wrote:
| You are misunderstanding what I am saying. I am saying
| that it only takes one person who finds a vulnerability
| to disclose it, to a first approximation. Realistically
| it's probably closer to 2-3 since the first might be
| working for the NSA, the CCP, etc. I am making no
| arguments about what amount of effort it takes to find a
| vulnerability, just talking about how not every single
| user of a piece of code needs to verify it.
| radicalcentrist wrote:
| Reproducibility is what allows you to rely on other
| maintainers' reviews. Without reproducibility, you can't be
| certain that what you're running has been audited at all.
|
| It's true that no single person can audit their entire
| dependency tree. But many eyes make all bugs shallow.
| jnxx wrote:
| This is great! The one fly in the ointment, pardon, is that Nix
| is a bit lax about trusting proprietary and binary-only stuff.
| It would be great if there were a FLOSS-only core system for
| NixOS which would be fully transparent.
| rejectedandsad wrote:
| > It would be great if there were a FLOSS-only core system
| for NixOS
|
| Might be wrong but isn't this part of the premise for
| Guix/GuixSD?
| Filligree wrote:
| And it's good that it exists, I _guess?_
|
| But it can't do any of the things I bought my computer to
| do, so it's of limited value to me.
| quarantine wrote:
| Nix/Nixpkgs blocks unfree packages by default, so I presume
| it would be relatively easy to disable packages with the
| `unFree` attribute.
| jnxx wrote:
| I totally believe it is possible, it is perhaps more of a
| cultural thing.
| eptcyka wrote:
| It's the pragmatic thing. I wouldn't use nixOS if I
| wasn't able to use it on a 16 core modern desktop. I
| don't think there's a performant and 100% FLOSS
| compatible computer that wouldn't make me want to gouge
| my eyes out with a rusty spoon when building stuff for
| ARM.
| zamadatix wrote:
| Talos has 44 core/176 thread server options which can
| take 2 TB of DDR4 and are FSF certified. The board
| firmware is also open and has reproducible builds.
| eptcyka wrote:
| Thanks, I was legitimately unaware of this option. That
| does smash my argument, but I'm not likely to be using a
| system like that anytime soon due to cost concerns
| mostly.
| tadfisher wrote:
| That is way more expensive than a 16-core desktop,
| though. Workstations are a class above consumer-grade
| desktops and that's reflected in the price.
| zamadatix wrote:
| Talos has desktop options with as few as 8 cores as well;
| this is just an example of how far you can take FLOSS
| hardware. Not that I consider a 16 core x86 desktop
| "consumer-grade" in the first place (speaking as a 5950X
| owner).
|
| Probably not fit for replacing Grandma's budget PC but
| then again grandma probably isn't worried about the ARM
| cross compile performance of their machine running NixOS
| either.
| kaba0 wrote:
| And it's not just hardware; there is a practical limit to
| license purity. In many cases only proprietary programs
| can do the job at all, or do it orders of magnitude
| better.
| londons_explore wrote:
| > and 6mo later every computer ... gets ransomwared.
|
| I'm really surprised such an attack hasn't happened already. It
| seems so trivial for a determined attacker to take over an
| opensource project (plenty of very popular projects have just a
| single jaded maintainer).
|
| The malicious compiler could inject an extra timed event into
| the main loop for the time the attack is scheduled to begin,
| but only if it's >3 hours away, which simply retrieves a URL
| and executes whatever is received.
|
| Detecting this by chance is highly unlikely - because to find
| it, someone would have to have their clock set months ahead, be
| running the software for many days, _and_ be monitoring the
| network.
|
| That code is probably only a few hundred bytes, so it probably
| won't be noticed in any disassembly, and is only executed once,
| so probably won't show up in debugging sessions or cpu
| profiling.
|
| It just baffles me that this hasn't been done already!
| Gravyness wrote:
| > I'm really surprised such an attack hasn't happened
| already.
|
| If you count npm packages this has happened quite a few
| times already. People (who don't understand security very
| well) seem to be migrating to python now.
| schelling42 wrote:
| How do you know it hasn't been done already? (with a more
| silent payload than ransomware) /s
| Tabular-Iceberg wrote:
| What does the /s mean in this context?
| Zetaphor wrote:
| /s is internet parlance to show that the message should
| be read in a sarcastic tone.
| Tabular-Iceberg wrote:
| Yes, but what confused me is that as far as I can tell we
| really don't know that it hasn't been done before.
| ghoward wrote:
| Not GP, but I think it indicates sarcasm?
| esjeon wrote:
| > I'm grateful to the nixos team for being beating a trail thru
| the jungle here. Retrofitting reproducibility onto a big
| software project that grew without it, is hard work.
|
| Actually, it was the Debian folks who pushed reproducible
| builds hard in the early days. They upstreamed the
| necessary changes and also spread the concept itself. This
| is a two-decade-long community effort.
|
| In turn, NixOS is mostly just wrapping those projects with
| its own tooling, literally a cherry on top. NixOS is
| disproportionately credited here.
| catern wrote:
| That's somewhat uncharitable. patchelf, for example, is one
| tool developed by NixOS which is widely used for reproducible
| build efforts. (although I don't know concretely if Debian
| uses it today)
| Foxboron wrote:
| patchelf is not really widely used for solving reproducible
| builds issues. It's made for rewriting RPATHs, which is
| essential for NixOS, but not something you would see in
| other distributions except when someone needs to work
| around poor upstream decisions.
| zucker42 wrote:
| I don't think NixOS is getting too much credit. This is an
| accomplishment, even if it was built on the shoulders of
| giants.
| theon144 wrote:
| By the way, here are the stats on Debian's herculean share
| of the effort: https://wiki.debian.org/ReproducibleBuilds
| raziel2p wrote:
| The ratio of reproducible to non-reproducible packages
| doesn't seem to have changed that much in the last 5 years.
| kzrdude wrote:
| They have new challenges with new packages. In the last 5
| years a lot of Rust packages arrived, for example: a new
| compiler to tackle reproducibility with (and not a trivial
| one, even if upstream has worked on it a lot).
| stavros wrote:
| In my experience, rustc builds are reproducible if you
| build on the same path. They come out byte for byte
| identical.
| kungito wrote:
| Yeah I remember there was some drama regarding build
| machine path leaking into the release binaries
| kzrdude wrote:
| Aha.. don't all compilers behave the same way, with debug
| info?
|
| I mean it's worthwhile to fix, but that behaviour seems
| so standard.
| KirillPanov wrote:
| No, Rust leaks the path to the source code on the _build_
| machine. This path likely does not even exist on the
| execution machine, so there's absolutely no good reason
| for this leakage. It is very nonstandard.
|
| It is really, really annoying that the Rust team is not
| taking this problem seriously.
| shawnz wrote:
| I don't think this is correct. Most compilers include the
| path to the source code on the build machine in the debug
| info, and it's a common problem for reproducible builds.
| This is not a rust-specific issue.
|
| Obviously the binary can't contain paths from the
| execution machine because it doesn't know what the
| execution machine will be at compile time, and the source
| code isn't stored on the execution machine anyway. The
| point of including the source path in the debug info is
| for the developer to locate the code responsible if
| there's a crash.
|
| See: https://reproducible-builds.org/docs/build-path/
| colejohnson66 wrote:
| But is it only on debug builds? Or are release builds
| affected? Because if it's the latter, that's a big issue.
| But for the former, does it really matter?
| mikepurvis wrote:
| I think both efforts have been important and have benefitted
| each other. Nix has _always_ had purity /reproducibility as
| tenets, but indeed it was Debian that got serious about it on
| a bit-for-bit basis, with changes to the compilers, tools
| like diffoscope, etc. The broader awareness and feasibility
| of reproducible builds then made it possible for Nix to
| finally realise the original design goal of a content-
| addressed rather than input-addressed store, where you don't
| need to actually sign your binary cache, but rather just sign
| a mapping between input hashes and content hashes.
| Ericson2314 wrote:
| > where you don't need to actually sign your binary cache,
| but rather just sign a mapping between input hashes and
| content hashes.
|
| Though you can and should sign the mapping!
| mikepurvis wrote:
| Of course, yes-- that was what I was saying. But the
| theory with content-addressability is that unlike a
| conventional distro where the binaries must all be built
| and then archived and distributed centrally, Nix could do
| things like age-out the cache and only archive the
| hashes, and a third party could later offer a rebuild-on-
| demand service where the binaries that come out of it are
| known to be identical to those which were originally
| signed. A similar guarantee is super useful when it comes
| to things like debug symbols.
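The input-addressed vs content-addressed distinction can be sketched in a few lines of shell (the recipe string and artifact below are made up for illustration): the input hash is computed over the build recipe, the content hash over the build output, and with reproducible builds a cache operator only needs to vouch for the pair.

```shell
set -e
# Hash of the build *inputs* (recipe, sources, deps) -- what an
# input-addressed store keys its paths on.
recipe='{"deps":["gcc"],"src":"hello-1.0.tar.gz"}'
input_hash=$(printf '%s' "$recipe" | sha256sum | cut -d' ' -f1)

# Hash of the build *output* -- what a content-addressed store keys on.
printf 'pretend this is a build artifact\n' > /tmp/artifact
content_hash=$(sha256sum /tmp/artifact | cut -d' ' -f1)

# A binary cache can sign just this mapping; anyone who rebuilds the
# recipe reproducibly can check their own output against content_hash.
echo "$input_hash -> $content_hash"
```

Since the mapping is tiny, it can be archived long after the cached binaries themselves are aged out.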
| dcposch wrote:
| Has a full linux image--something you can actually boot--
| existed as a reproducible build before today?
| 0xEFF wrote:
| Forgive my ignorance but isn't that Slackware?
| heisenzombie wrote:
| No. If I build Slackware on my computer and you build
| Slackware on yours, the binaries we end up with will not
| be bit-for-bit identical.
| chriswarbo wrote:
| > This is a two-decade long community effort.
|
| So is Nix/NixOS, which has had reproducibility in mind
| from the start.
|
| The earliest example I can find is "Nix: A Safe and Policy-
| Free System for Software Deployment" from 2004 (https://www.usenix.org/legacy/event/lisa04/tech/full_papers/...):
|
| > Build farms are also important for release management - the
| production of software releases - which must be an automatic
| process to ensure reproducibility of releases, which is in
| turn important for software maintenance and support.
|
| Eelco's thesis (from 2006) also has this as the first bullet-
| point in its conclusion:
|
| > The purely functional deployment model implemented in Nix
| and the cryptographic hashing scheme of the Nix store in
| particular give us important features that are lacking in
| most deployment systems, such as complete dependencies,
| complete deployment, side-by-side deployment, atomic upgrades
| and rollbacks, transparent source/binary deployment and
| reproducibility (see Section 1.5).
| 0xbadcafebee wrote:
| Supply chain attacks are definitely important to deal with, but
| defense-in-depth saves us in the end. Even if a postgres
| container is backdoored, if the admins put postgres by itself
| in a network with no ingress or egress except the webserver
| querying it, an attack on the database itself would be very
| difficult. If on the other hand, the database is run on
| untrusted networks, and sensitive data kept on it... yeah,
| they're boned.
| dcposch wrote:
| In the case of a supply chain attack, you don't even need
| ingress or egress.
|
| Say the postgres binary or image is set to encrypt the data on
| a certain date. Then it asks you to pay X ZEC to a shielded
| address to get your decryption key. This would work even if
| the actual database was airgapped.
| radicalcentrist wrote:
| Reproducibility is necessary, but unfortunately not sufficient,
| to stop a "Trusting Trust" attack. Nixpkgs still relies on a
| bootstrap tarball containing e.g. gcc and binutils, so
| theoretically such an attack could trace its lineage back to
| the original bootstrap tarball, if it was built with a
| compromised toolchain.
| beermonster wrote:
| And also shipped firmware or binary blobs.
| mjg59 wrote:
| Diverse double compilation should allow a demonstration that
| the toolchain is trustworthy.
| smitty1e wrote:
| And how _about_ that hardware and firmware microcode?
| Foxboron wrote:
| Indeed, and with the work done by Guix and the Reproducible
| Builds project we do have a real-world example of diverse
| double compilation which is not just a toy example
| utilizing the GNU Mes C compiler.
|
| https://dwheeler.com/trusting-trust/#real-world
| dane-pgp wrote:
| Projects like GNU Mes are part of the Bootstrappable
| Builds effort[0]. Another great achievement in that area
| is the live-bootstrap project, which has automated a
| build pipeline that goes from a minimal binary seed up to
| tinycc then gcc 4 and beyond.[1]
|
| [0] https://www.bootstrappable.org/
|
| [1] https://github.com/fosslinux/live-
| bootstrap/blob/master/part...
| Foxboron wrote:
| I feel the need to point out that the "Bootstrappable
| Builds" project is a working group from the Reproducible
| Builds project that was interested in the next step beyond
| reproducing binaries. Obviously this project has seen the
| most effort from Guix :)
|
| The GNU Mes C experiment mentioned above was also
| conducted during the 2019 Reproducible Builds summit in
| Marrakesh.
|
| https://reproducible-builds.org/events/Marrakesh2019/
| naniwaduni wrote:
| In principle, diverse double-compiling merely increases the
| number of compilers the adversary needs to subvert. There
| are obvious practical concerns, of course, but frankly this
| raises the bar _less_ than maintaining the backdoor across
| future versions of the same compiler did in the first
| place, since at least backdooring multiple contemporary
| compilers doesn't rely on guessing, well ahead of time,
| what changes future people are going to make.
|
| Critically, it shouldn't be taken as a _demonstration_ that
| the toolchain is trustworthy unless you trust whoever's
| picking the compilers! This kind of ruins approaches based
| on having any particular outside organization certify
| certain compilers as "trusted".
| XorNot wrote:
| Actually doing this is an uphill effort. While in theory a
| very informed adversary might get it right the first time,
| human adversaries are unlikely to, and their resources,
| while large, are far from infinite.
|
| Your entire effort is potentially brought down by someone
| making a change in a way you didn't expect and someone
| goes "huh, that's funny..."
| GauntletWizard wrote:
| Quite frankly, I'm surprised that it hasn't come up
| multiple times in the course of getting to NixOS etc. The
| attacks are easy to hide and hard to attribute.
| User23 wrote:
| Really? How does that accomplish more than proving the
| build is a fixed point? An attacker may well be aware of
| the fixed point combinator after all.
|
| Edit: I think that tone may have come off as snarky, but I
| meant it as an honest question. If any expert can answer
| I'd really appreciate it.
| eru wrote:
| Fixed points don't come in here at all, unless you
| specifically want to talk about compiling compilers.
|
| Diverse double compilation is useful for run-of-the-mill
| programs, too.
| chriswarbo wrote:
| Programs built by different compilers aren't generally
| binary comparable, e.g. we shouldn't expect empty output
| from `diff <(gcc run-of-the-mill.c) <(clang run-of-the-
| mill.c)`
|
| However, the _behaviour_ of programs built by different
| compilers should be the same. Run-of-the-mill programs
| could use this as part of a test suite, for example; but
| diverse double compilation goes a step further:
|
| We build compiler A using several different compilers X,
| Y, Z; then use those binaries A-built-with-X, A-built-
| with-Y, A-built-with-Z to compile A. The binaries
| A-built-with-(A-built-with-X), A-built-with-(A-built-
| with-Y), A-built-with-(A-built-with-Z) should all be
| identical. Hence for 'fully countering trusting trust
| through diverse double-compiling', we must compile
| compilers https://dwheeler.com/trusting-trust/
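chriswarbo's recipe can be written out as a pseudocode sketch (cc_trusted and cc_suspect are hypothetical independent compilers for A's implementation language, and A.src is compiler A's source; none of these exist as real commands):

```shell
# Stage 1: build compiler A's source with each bootstrap compiler.
# The two binaries may differ in bits but must agree in behaviour.
cc_suspect -o A_via_suspect A.src
cc_trusted -o A_via_trusted A.src

# Stage 2: use each stage-1 binary to compile A's source again.
# Both outputs are now "A.src compiled by a correct A", so if A is
# deterministic they must be bit-for-bit identical.
./A_via_suspect -o A2_from_suspect A.src
./A_via_trusted -o A2_from_trusted A.src

cmp A2_from_suspect A2_from_trusted \
  && echo "no self-propagating backdoor detected in cc_suspect"
```

A mismatch at the final step means one of the stage-1 binaries miscompiled A, either through a bug or a trusting-trust style backdoor.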
| tbrock wrote:
| Why does building from source help? It's not like people are
| reading every line of the source before building it anyway
| 99.99% of the time.
| xvector wrote:
| If the package maintainer's build pipeline is compromised
| (eg. Solarwinds), you are unlikely to be affected if you
| build from reviewed source yourself.
| pjmlp wrote:
| Except hardly anyone reviews a single line of code.
| squiggleblaz wrote:
| So? We are trying to protect against a malicious
| interloper damaging the machine of a trusted and
| trustworthy partner.
|
| You are bringing up red herrings about trusted partners
| being malicious and untrustworthy.
|
| Do you genuinely believe we should only solve a problem
| if it leads to a perfect outcome?
| pjmlp wrote:
| I genuinely believe to spend resources on issues where
| ROI is positive.
|
| So far exploits on FOSS kind of prove the point that not
| everyone is using Gentoo, reading every line of code on
| their emerged packages, let alone similar computing
| models.
|
| Now if we are speaking about driving the whole industry
| to where security bugs, caused by using languages like C
| that cannot save us from code reviews unless done by ISO
| C language lawyers and compiler experts in UB
| optimizations, are heavily punished like construction
| companies are for a fallen bridge, then that would be
| interesting.
| therealjumbo wrote:
| > I genuinely believe to spend resources on issues where
| ROI is positive.
|
| How are you measuring the ROI of security efforts inside
| an OSS distro like debian or nixos? The effort in such
| orgs is freely given, so nobody knows how much it costs.
| And how would you calculate the return on attacks that
| have been prevented? Even if an attack wasn't prevented
| you don't know how much it cost, and you might not even
| know if it happened (or if it happened due to a lapse in
| debian.)
|
| >So far exploits on FOSS kind of prove the point that not
| everyone is using Gentoo, reading every line of code on
| their emerged packages, let alone similar computing
| models.
|
| Reproducible builds is attempting to mitigate a very
| specific type of attack, not all attacks in general. That
| is, it focuses on a specific threat model and countering
| that, nothing else. It's not a cure for cancer either.
|
| >Now if we are speaking about driving the whole industry
| to where security bugs, caused by using languages like C
| that cannot save us from code reviews unless done by ISO
| C language lawyers and compiler experts in UB
| optimizations, are heavily punished like construction
| companies are for a fallen bridge, then that would be
| interesting.
|
| This is just a word salad of red herrings. Different
| people can work on different stuff at the same time.
| 1vuio0pswjnm7 wrote:
| "- Build from source. This will always be a deeply niche thing
| to do. It's slow, inconvenient, and inaccessible except to
| nerds."
|
| I prefer compiling from source to binary packages. For me it
| is neither slow, inconvenient nor inaccessible.
|
| Only with larger, more complex programs does compiling from
| source become a PITA.
|
| The "solution" I take is to prefer smaller, less complex
| programs over larger, more complex ones.
|
| If I cannot compile a program from source relatively quickly
| and easily, I do not voluntarily choose it as a program to use
| daily and depend on.
|
| For compiling an OS, I use NetBSD, so perhaps I am spoiled
| because it is relatively easy to compile.
|
| That said, I understand the value of reproducible builds and
| appreciate the work being done on such projects.
| kixiQu wrote:
| "except to nerds" was conversationally phrased shorthand for
| "except to people with rarefied technical skills".
| [deleted]
| kaba0 wrote:
| You don't use a browser or an office suite? Because those are
| a pain in the ass to compile (in terms of time).
| 1vuio0pswjnm7 wrote:
| Not just time, IME. Also 1. highly resource intensive,
| e.g., cannot compile on small form factor computers (easier
| for me to compile a kernel than a "modern" browser) and 2.
| brittle.
| zucker42 wrote:
| Don't take this the wrong way, but I think you qualify as a
| nerd. :)
| brigandish wrote:
| Unfortunately, it's easy to break a lot of builds by things
| such as deciding not to install to /usr/local, or by building
| on a Mac. Pushing publishers to practices that aid
| reproducible builds would help both sides.
|
| I'd love to try building NetBSD, btw, I must try that!
| vore wrote:
| I think using NetBSD might put you in the nerd camp ;-)
| hsbauauvhabzb wrote:
| I don't have the resources to audit every component of my
| system. I favour enterprise distros who audit code which ends
| up in their repos and avoid pip, npm, etc. but there are some
| glaring trade offs on both productivity and scalability.
|
| The problem is unmaintainability; I can't imagine it'd be
| easier for medium-sized teams where security isn't a
| priority, either.
| initplus wrote:
| Building from source doesn't have to be inaccessible if the
| build tooling around it is strong. Modern compiled languages
| like Go (or modern toolchains for legacy languages, like
| vcpkg) have a convention of building everything possible
| from source.
|
| So at least for software libraries, building from source is
| definitely viable. For end-user applications it's another
| story though; I doubt we will ever be at a point where
| building your own browser from source makes sense...
| garmaine wrote:
| Binary reproducible builds are still pretty inaccessible
| though.
| bigiain wrote:
| Building from source also doesn't buy you very much, if you
| haven't inspected/audited the source.
|
| The upthread hypothetical of a compromised package manager
| equally applies to a compromised source repo.
|
| _Maybe_ you always check the hashes? _Maybe_ you always get
| the hashes from a different place to the code? _Maybe_ the
| hypothetical attacker couldn't replace both the code you
| download and the hash you use to check it?
|
| (And as Ken pointed out decades ago, maybe the attacker
| didn't fuck with your compiler so you had lost before you
| even started.)
| goodpoint wrote:
| Reminder: https://reproducible-builds.org/ was born in Debian and
| pioneered reproducible building.
|
| It took very significant effort and largely benefited build
| tools (compilers, linkers, libraries) that are not Debian-
| specific.
| georgyo wrote:
| Mandatory link to the Debian single purpose site:
| https://isdebianreproducibleyet.com/
|
| However that is for everything in Debian, not just the iso. It is
| truly remarkable to see all the Linux distributions move the
| needle forward.
| Foxboron wrote:
| And Arch Linux :)
|
| https://reproducible.archlinux.org/
| [deleted]
| pabs3 wrote:
| I wonder when PyPI and similar ecosystems will get deterministic
| reproducible builds.
| mraza007 wrote:
| This might be a dumb question, but what's a reproducible build?
| egberts1 wrote:
| The ability to recreate a binary image from the same set of
| source files and get a binary identical to the package-
| provided one.
|
| This is a useful way of ensuring that nothing is amiss at
| compile/link time.
|
| Today's GNU toolchain clutters the interior of binary files
| with random hash values, full file paths (that you couldn't
| recreate ... easily), and random tmpfile directories.
|
| The idea is to make it easier to verify a binary, compare
| it with an earlier-built-but-same-source binary, or reverse
| engineer it (and catch unexpected changes in code).
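A toy illustration of the idea, using GNU tar as the "build" step (the /tmp paths are arbitrary): by default tar records mtimes, ownership, and directory order, so rebuilding later gives a different artifact; pinning those inputs makes the output bit-for-bit reproducible.

```shell
set -e
mkdir -p /tmp/src
printf 'hello\n' > /tmp/src/data

# Deterministic archive: fixed mtime, owner, and file order.
tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
    -cf /tmp/out1.tar -C /tmp src

# "Rebuild" later, with a new mtime on the input file...
sleep 1
touch /tmp/src/data
tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
    -cf /tmp/out2.tar -C /tmp src

# ...and the outputs are still bit-for-bit identical.
cmp /tmp/out1.tar /tmp/out2.tar && echo "reproducible"
```

Real builds apply the same discipline to every source of nondeterminism: timestamps (e.g. via SOURCE_DATE_EPOCH), paths, locales, and file ordering.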
| mraza007 wrote:
| Oh okay thank you for explanation. You explained it really
| well
| aseipp wrote:
| I think r13y has said the minimal ISO was less than 10
| packages away from 100% for over _2 years_ now. The long
| tail has finally been overcome! Huge news.
| jonringer117 wrote:
| Some of the issues were really difficult to tackle, like
| the Linux kernel generating random hashes.
|
| The last mile was done by removing the use of Ruby (which
| uses some random tmp directories) from the final image.
| Asciidoctor (Ruby) was replaced with asciidoc (Python).
___________________________________________________________________
(page generated 2021-06-21 23:02 UTC)