[HN Gopher] Git archive checksums may change
___________________________________________________________________
Git archive checksums may change
Author : mcovalt
Score : 77 points
Date : 2023-01-30 21:48 UTC (1 hour ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| wildfire wrote:
| See https://github.com/orgs/community/discussions/45830 for the
| fallout.
| doubleunplussed wrote:
| Ah, this will presumably break some Arch Linux AUR packages.
| Preparing for bug reports.
| elesiuta wrote:
| I always anticipated something like this could happen and it
| bothered me enough to create my own workflow [1] to archive,
| hash, and attach it to each release automatically for my AUR
| package.
|
| [1]
| https://github.com/elesiuta/picosnitch/blob/master/.github/w...
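A minimal Python sketch of the archive-and-hash step described above, not the linked workflow itself. The function names, the `pkg-1.0` prefix, and the in-memory file mapping are assumptions made for illustration. The idea is that the bytes are produced once, hashed, and uploaded as a release asset, so the published checksum refers to a stored file rather than one regenerated on demand.

```python
import gzip
import hashlib
import io
import tarfile


def build_release_archive(files: dict[str, bytes], prefix: str) -> bytes:
    """Build a .tar.gz whose bytes depend only on `files` and `prefix`."""
    tar_buffer = io.BytesIO()
    # Pin the tar format so a change in the library default cannot alter the bytes.
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=f"{prefix}/{name}")
            info.size = len(data)
            info.mtime = 0  # no timestamps in the tar headers
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    # mtime=0 keeps the current time out of the gzip header.
    return gzip.compress(tar_buffer.getvalue(), compresslevel=9, mtime=0)


def sha256_hex(data: bytes) -> str:
    """Hex digest to publish next to the uploaded archive."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical release contents; a real workflow would read them from the tag.
archive = build_release_archive({"README": b"hello\n", "main.c": b"int main(){}\n"}, "pkg-1.0")
print(sha256_hex(archive))
```

The output is byte-stable only for a fixed Python and zlib version, which is why the hash should be recorded against the uploaded artifact rather than recomputed later.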
| jiripospisil wrote:
| Yep, it has already broken labwc for me.
|
|     ==> Validating source files with b2sums...
|         labwc-0.6.1.tar.gz ... FAILED
|     ==> ERROR: One or more files did not pass the validity check!
| jiripospisil wrote:
| GitHub will need to revert this change. They've just crippled
| pretty much every "from source" package manager out there.
| nick__m wrote:
| I'd prefer that the tools be adapted to be more resilient and not
| depend on GitHub's particular implementation.
| swarfield wrote:
| Using SHA hashes when building guarantees that the code that
| you are building is what you think it is. How else would you
| verify dependencies like this? GPG signatures would have the
| same issue if you change the underlying bits.
| ErikCorry wrote:
| This seems like a weak argument.
|
| Firstly SHA is not a secure hash.
|
| Secondly if your build step involves uploading data to a
| third party then allowing them to transform it as they see
| fit and then checksumming the result then it's not really a
| reproducible build. For all you know, Github inserts a
| virus during the compression of the archive.
|
| What am I missing?
| IanCal wrote:
| I think the reproducible build part is about projects
| that depend on these outputs. The goal is ensuring you
| and I have both pulled exactly the same dependencies.
| blueflow wrote:
| 1) SHA-256 is reasonably secure
|
| 2) The checksum assures you that the file you have is the
| same one your upstream looked at
| ArchOversight wrote:
| A git checkout of the code at that particular tag hasn't
| changed. Just the tarball that git archive generates has.
| vlovich123 wrote:
| The two main problems are:
|
| A) How do you catch tarballs that have extra files
| injected that aren't part of your manifest?
|
| B) What does the performance of this look like? Certainly
| for traditional HDDs this is going to kill performance,
| but even for SSDs I think verifying a bunch of small
| files is going to be less efficient than verifying the
| tarball.
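Problem A can be sketched in Python as a per-file manifest check that also reports files the manifest does not list. This is an illustrative sketch under assumptions: the manifest format (`{path: sha256 hex}`) and the function name are hypothetical, and only regular files are checked, so a real verifier would also need to account for symlinks, directories, and file modes.

```python
import hashlib
import io
import tarfile


def check_against_manifest(archive: bytes, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the regular files in a tarball with {path: sha256 hex} entries."""
    report = {"extra": [], "missing": [], "mismatched": []}
    seen = set()
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # limitation: non-regular members are not inspected
            seen.add(member.name)
            expected = manifest.get(member.name)
            if expected is None:
                report["extra"].append(member.name)  # injected or unlisted file
                continue
            digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            if digest != expected:
                report["mismatched"].append(member.name)
    report["missing"] = sorted(set(manifest) - seen)
    return report
```

This still reads and hashes every file, so it does not address problem B; hashing the single tarball remains one sequential read.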
| ArchOversight wrote:
| A wouldn't be an issue since you are checking out a git
| tag.
|
| B would just be a normal git checkout, which already
| validates that all the objects are reachable and git tags
| (and commits for that matter) can be signed, and since
| the sha1 hash is signed as well it validates that the
| entire tree of commits has not been tampered with. So as
| long as you trust git to not lie about what it is writing to
| disk, you have a valid checkout of that tag.
|
| And if you do expect it to lie, why do you expect tar to
| not lie about what it is unpacking?
| duped wrote:
| Ok, now guarantee that.
| metrognome wrote:
| Per the post, this was a change to git itself:
| https://github.com/git/git/commit/4f4be00d302bc52d0d9d5a3d47...
| forgotpwd16 wrote:
| What was the thought behind this change?
| fweimer wrote:
| They could just produce tar output and compress that using
| system gzip. The "git archive" tool supports many output
| formats.
| acdha wrote:
| If those tools incorrectly assume an API contract which doesn't
| exist, isn't the right answer to fix those tools?
| kentonv wrote:
| In theory, sure, that's what we'd do in an ideal world.
|
| In the real world it will take millions of dollars of eng
| labor just to update the hashes to fix everything that's
| currently broken and millions more to actually implement
| something better and move everyone over to it.
|
| This isn't worth it. GitHub needs to just revert the change
| and then engineer a way to keep hashes stable going forward.
| groestl wrote:
| See also:
| https://daniel.haxx.se/blog/2013/03/23/why-no-curl-8/
|
| "The amount of work done "out there" on hundreds or
| thousands of applications for a single little libcurl tweak
| can be enormous. The last time we bumped the ABI, we got a
| serious amount of harsh words and critical feedback and
| since then we've gotten many more users!"
| swarfield wrote:
| https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
| forgotpwd16 wrote:
| Can anyone explain what happened? Thing changed, things broke,
| and things changed back in less than an hour.
| swarfield wrote:
| They have broken almost every open source project that builds
| external deps. Also broke homebrew apparently.
| gray_-_wolf wrote:
| Did people not know this? Honest question. I did run into this a
| few times already before this change, so I assumed this would be
| wide-spread knowledge and mirrored everything.
| skobovm wrote:
| How would anyone (outside of GH) have known this? The checksums
| have been stable for years, and this issue resulted from an
| internal update to the version of Git being used. It also was
| not publicized until this ex post facto blog post.
| blueflow wrote:
| I remember contributing to package recipes where linking to
| GitHub was explicitly forbidden due to checksum instability.
| This was 7 years ago, I think.
| anecdotal1 wrote:
| They have not been stable
|
| https://github.com/freebsd/freebsd-ports/commit/a43ec88422ee...
| mhitza wrote:
| https://xkcd.com/1053/
| ArchOversight wrote:
| I remember a similar breakage happening before due to internal
| git changes, and thought it was common knowledge to upload your
| own signed tarballs for releases.
| medellin wrote:
| I'm thinking of all the Bazel build rules that are about to break
| from my last company. Someone will have a fun day updating
| hundreds of hashes.
| jart wrote:
| If they're using multiple URLs like a good Bazel user then they
| shouldn't be impacted.
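The multiple-URL idea can be sketched in Python as a fetch loop that accepts the first download matching a pinned hash. This is not Bazel's implementation: the function name, the injected `fetch` callable, and the example hosts are assumptions for illustration. It helps only if at least one URL serves a stored copy of the original bytes.

```python
import hashlib
from typing import Callable, Iterable


def fetch_verified(urls: Iterable[str], expected_sha256: str,
                   fetch: Callable[[str], bytes]) -> bytes:
    """Return the first download whose SHA-256 matches the pinned value."""
    failures = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # urllib.error.URLError is an OSError subclass
            failures.append(f"{url}: {exc}")
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # A regenerated archive lands here: same content, different bytes.
        failures.append(f"{url}: checksum mismatch")
    raise RuntimeError("no URL produced the expected archive:\n" + "\n".join(failures))
```

A real caller might pass `lambda url: urllib.request.urlopen(url).read()` as `fetch`, with a mirror that stores the archive listed ahead of the origin that generates it on demand.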
| medellin wrote:
| They did where applicable, but I know that not all of them had
| multiple URLs.
| jart wrote:
| Well now they know why it's so important.
| https://github.com/bazelbuild/bazel/commit/ed7ced0018dc5c5eb...
| ErikCorry wrote:
| Do they let Github generate the archives as one of the build
| rules instead of performing the archival and compression
| locally and uploading the result?
| medellin wrote:
| Correct. Silly stuff like this happens when you don't have
| systems in place that make it easy to store your own
| artifacts. Additionally a lot of people just want to get
| things done as quickly as possible even if you have the tools
| in place.
| UncleOxidant wrote:
| Lol... I was being burned by this just about an hour ago. Cloned
| a repo, did a build of the project (which uses Bazel to fetch
| dependencies) and it reported errors due to a mismatch in expected
| checksums.
| vlovich123 wrote:
| Hyrum's Law strikes again. It kind of doesn't matter what you
| document. If you weren't randomizing your checksum previously
| [1], you can't just spring this on the community and blame it for
| the fallout. I'm more shocked that there's resistance from the
| GitHub team saying "but we documented this isn't stable". Default
| stance for the team should be rollback & reevaluate an alternate
| path forward when the scope is this wide (e.g. only generating
| the new tarballs for future commits going forward).
|
| [1] Apparently googlesource did do this and just had people shift
| to using GitHub mirrors to avoid this problem.
| blueflow wrote:
| But look at it from the other side. Users that don't read your
| documentation and expect your software to work like they
| imagined are just a huge pain in the ass.
| kkirsche wrote:
| This. You have to draw the line somewhere. Was this specific
| choice that line? Maybe not, but sometimes users aren't right
| and changes just need to occur to ensure other asks from the
| same users can be delivered.
| sneak wrote:
| It's Microsoft. Just as the Apple of today is not the Apple of
| ten years ago, the GitHub today is not the GitHub of ten years
| ago. It's literally different people.
|
| The people who made the things you love have mostly moved on,
| and the brand is being run by different people with different
| values now.
|
| There's a little bit of an argument that such things are a
| bait-and-switch, but such is the nature of a large and
| multigenerational corporation.
| daniealapt wrote:
| https://xkcd.com/1172/
| vtbassmatt wrote:
| Hey folks. I'm the product manager for Git at GitHub. We're sorry
| for the breakage, we're reverting the change, and we'll
| communicate better about such changes in the future (including
| timelines).
|
| Also posted here:
| https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
| kris-nova wrote:
| Thanks for the update! There is only 1 internet to watch and
| learn from. We are all in this together. <3
| denom wrote:
| In my particular use-case, I'm using a set of local dev tools
| hosted as a homebrew tap.
|
| The build looks up the GitHub tar.gz release for each tag and
| commits the sha256sum of that file to the formula.
|
| What's odd is that all the _historical_ tags have broken
| release shasums. Does this mean the entire set of zip/tar.gz
| archives has been rebuilt? That could be a problem, as perhaps
| you cannot easily back out of this change...
| crote wrote:
| The trick here is that a GitHub release is in essence simply
| a tag of a specific commit. There is no need to build
| archives in advance, as they can be dynamically generated
| from the git repo.
|
| However, if you change the compression algorithm used to
| generate the archive, it'll result in a different checksum!
| The _content_ is the same, but the _archive_ is not.
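The distinction between archive and content can be shown with a small Python sketch. Here two gzip compression levels stand in for two different gzip implementations; that substitution, the stand-in payload, and the helper names are assumptions for illustration and are not the actual change made in git.

```python
import gzip
import hashlib

# Stand-in for the tar stream that `git archive` would produce for a tag.
payload = b"identical tar stream\n" * 1000

# Same input, two different compressor settings.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)


def archive_digest(archive: bytes) -> str:
    """Hash of the compressed bytes, which is what package managers pin."""
    return hashlib.sha256(archive).hexdigest()


def content_digest(archive: bytes) -> str:
    """Hash of the decompressed stream, independent of the compressor."""
    return hashlib.sha256(gzip.decompress(archive)).hexdigest()


print(archive_digest(fast) == archive_digest(best))  # compressed bytes differ
print(content_digest(fast) == content_digest(best))  # decompressed bytes match
```

Pinning the content digest instead would survive a compressor change, at the cost of decompressing before verifying.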
| Denvercoder9 wrote:
| > Does this mean the entire set of zip/tar.gz archives has
| been rebuilt?
|
| They are probably generated on-demand (and cached) from the
| Git repository, not prebuilt.
| vtbassmatt wrote:
| We updated our Git version which made this change for the
| reasons explained. At the time we didn't foresee the impact.
| We're quickly rolling back the change now, as it's clear we
| need to look at this more closely to see if we can make the
| changes in a less disruptive way. Thanks for letting us know.
| phphphphp wrote:
| Consumers often mistake _hasn't changed_ for a commitment to
| never change: any sufficiently large product will be littered
| with these kinds of implicit commitments made by the product
| to consumers that nobody has visibility into. It's
| unfortunate for you that we were all relying on this commitment
| you've never made, but the quick reversion is the best we can
| hope for. People will theorise how this could have been
| avoided but c'est la vie -- easy mistake that you've
| responded well to.
| dharmab wrote:
| Hyrum's Law:
|
| With a sufficient number of users of an API, it does not
| matter what you promise in the contract: all observable
| behaviors of your system will be depended on by somebody.
| nickitolas wrote:
| FWIW according to
| https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
| a commitment _was_ made, although in an exchange in some support
| ticket, and not in documentation.
| VWWHFSfQ wrote:
| At this point they'll be stuck on old git for all of
| eternity unless they just roll their own archive/compress
| step out of band so the old hashes still work. Yikes.
| jzelinskie wrote:
| Does anyone have the motivation for why the git project wants to
| use their own implementation of gzip? Did this implementation
| already exist and was being used for something else?
|
| I understand wanting fewer dependencies, but my gut reaction is that
| it's a bad move in the unsafe world of C to rewrite something
| that already has a far more audited, ubiquitous implementation.
| groestl wrote:
| I think "Drop the dependency on gzip" for something like Git
| trumps a bit more exposure (which can be mitigated with
| thorough reviews).
| nemetroid wrote:
| They're still using zlib to do the heavy lifting. It's not a
| large patch.
|
| https://public-inbox.org/git/1328fe72-1a27-b214-c226-d239099...
| rektide wrote:
| Now please give us compression options beyond gzip? :) Some zstd
| & lz4 please?
| WayToDoor wrote:
| https://github.com/orgs/community/discussions/45830#discussi...
|
| > Hey folks. I'm the product manager for Git at GitHub. We're
| sorry for the breakage, we're reverting the change, and we'll
| communicate better about such changes in the future (including
| timelines).
| robomc wrote:
| I think this also broke GitHub Codespaces (the downloading of
| devcontainer "features").
| skobovm wrote:
| I wonder what monetary loss in productivity was due to this
| change. We noticed this issue a bit before noon, tracked it down
| to GH, sent out company-wide comms notifying others of the
| problem, filed tickets with GH, had to modify numerous repos
| across multiple teams, and now it's 3pm and I'm here reading
| about it.
|
| It's crazy how such a seemingly innocuous change, like this,
| could lead to such widespread loss in productivity across the
| globe.
| daniealapt wrote:
| Any change breaks a workflow - https://xkcd.com/1172/
___________________________________________________________________
(page generated 2023-01-30 23:00 UTC)