[HN Gopher] Git archive checksums may change
       ___________________________________________________________________
        
       Git archive checksums may change
        
       Author : mcovalt
       Score  : 77 points
       Date   : 2023-01-30 21:48 UTC (1 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | wildfire wrote:
       | See https://github.com/orgs/community/discussions/45830 for the
       | fallout.
        
       | doubleunplussed wrote:
       | Ah, this will presumably break some Arch Linux AUR packages.
       | Preparing for bug reports.
        
         | elesiuta wrote:
         | I always anticipated something like this could happen and it
         | bothered me enough to create my own workflow [1] to archive,
         | hash, and attach it to each release automatically for my AUR
         | package
         | 
         | [1]
         | https://github.com/elesiuta/picosnitch/blob/master/.github/w...
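A minimal sketch of the "archive, hash, and attach" idea described above. This is not the linked workflow (its path is truncated in the thread); the function names, the `demo-1.0` prefix, and the in-memory file map are hypothetical. It assumes the goal is a tarball whose bytes you control, so the published checksum never depends on how a forge regenerates archives. The output is only reproducible for a given Python and zlib build, which is why the built file itself gets attached to the release.

```python
import gzip
import hashlib
import io
import tarfile


def build_release_tarball(files: dict[str, bytes], prefix: str) -> bytes:
    """Pack `files` (path -> content) into a .tar.gz with normalized metadata."""
    tar_buffer = io.BytesIO()
    # Pin the tar format so a changed library default cannot alter the bytes.
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(files):  # stable member order
            data = files[path]
            info = tarfile.TarInfo(name=f"{prefix}/{path}")
            info.size = len(data)
            info.mtime = 0  # no timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    # mtime=0 keeps the current time out of the gzip header.
    return gzip.compress(tar_buffer.getvalue(), compresslevel=9, mtime=0)


def sha256_hex(blob: bytes) -> str:
    """Checksum to publish next to the uploaded release asset."""
    return hashlib.sha256(blob).hexdigest()
```

The tarball and its checksum would then be uploaded as a release asset, and the package recipe would point at that asset rather than at the auto-generated source archive.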
        
         | jiripospisil wrote:
         | Yep, it has already broken labwc for me.
         | 
         | ==> Validating source files with b2sums...
         |     labwc-0.6.1.tar.gz ... FAILED
         | ==> ERROR: One or more files did not pass the validity check!
        
       | jiripospisil wrote:
       | GitHub will need to revert this change. They've just crippled
       | pretty much every "from source" package manager out there.
        
         | nick__m wrote:
         | I'd prefer that the tools be adapted to be more resilient and
         | not depend on GitHub's particular implementation.
        
           | swarfield wrote:
           | Using SHA hashes when building guarantees that the code
           | you are building is what you think it is. How else would you
           | verify dependencies like this? GPG signatures would have the
           | same issue if the underlying bits change.
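The check being described can be sketched as follows. This is an illustration, not any particular package manager's code; `file_sha256` and `verify_source` are hypothetical names. It assumes the recipe pins a SHA-256 of the downloaded archive, so any byte-level change in the archive fails the check even when the unpacked source is identical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path: Path, expected_sha256: str) -> None:
    """Raise if the downloaded archive is not byte-identical to the pinned one."""
    actual = file_sha256(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path.name}: expected {expected_sha256}, got {actual}")
```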
        
             | ErikCorry wrote:
             | This seems like a weak argument.
             | 
             | Firstly SHA is not a secure hash.
             | 
             | Secondly if your build step involves uploading data to a
             | third party then allowing them to transform it as they see
             | fit and then checksumming the result then it's not really a
             | reproducible build. For all you know, Github inserts a
             | virus during the compression of the archive.
             | 
             | What am I missing?
        
               | IanCal wrote:
               | I think the reproducible build part is about projects
               | that depend on these outputs. The goal is ensuring you
               | and I have both pulled exactly the same dependencies.
        
               | blueflow wrote:
               | 1) SHA-256 is reasonably secure
               | 
               | 2) The checksum assures you that the file you have is the
               | same one your upstream looked at
        
             | ArchOversight wrote:
             | A git checkout of the code at that particular tag hasn't
             | changed. Only the tarball that git archive generates has.
        
               | vlovich123 wrote:
               | The two main problems are:
               | 
               | A) How do you catch tarballs that have extra files
               | injected that aren't part of your manifest?
               | 
               | B) What does the performance of this look like? Certainly
               | for traditional HDDs this is going to kill performance,
               | but even for SSDs I think verifying a bunch of small
               | files is going to be less efficient than verifying the
               | tarball.
        
               | ArchOversight wrote:
               | A wouldn't be an issue since you are checking out a git
               | tag.
               | 
               | B would just be a normal git checkout, which already
               | validates that all the objects are reachable and git tags
               | (and commits for that matter) can be signed, and since
               | the sha1 hash is signed as well it validates that the
               | entire tree of commits has not been tampered with. So as
               | long you trust git to not lie about what it is writing to
               | disk, you have a valid checkout of that tag.
               | 
               | And if you do expect it to lie, why do you expect tar to
               | not lie about what it is unpacking?
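One way to picture the middle ground discussed in this sub-thread is to hash what is inside the tarball instead of the compressed bytes. This is a sketch under stated assumptions, not how git or any package manager does it: `content_digest` and `make_targz` are hypothetical helpers, and the digest deliberately ignores timestamps, ownership, and permission bits. Because every member name feeds the digest, an injected extra file changes the result, while a different compressor does not.

```python
import gzip
import hashlib
import io
import tarfile


def content_digest(archive_bytes: bytes) -> str:
    """Hash member names, types, link targets and file contents only."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            digest.update(member.name.encode("utf-8") + b"\0")
            digest.update(member.type + b"\0")
            digest.update(member.linkname.encode("utf-8") + b"\0")
            if member.isfile():
                data = tar.extractfile(member).read()
                # Length prefix keeps adjacent members from running together.
                digest.update(str(len(data)).encode("ascii") + b"\0")
                digest.update(data)
    return digest.hexdigest()


def make_targz(files: dict[str, bytes], level: int) -> bytes:
    """Demo helper: build a small .tar.gz at the given compression level."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(buffer.getvalue(), compresslevel=level, mtime=0)
```

The trade-off raised above still applies: this reads and hashes every member, which costs more than a single pass over the compressed file.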
        
               | duped wrote:
               | Ok, now guarantee that.
        
         | metrognome wrote:
         | Per the post, this was a change to git itself:
         | https://github.com/git/git/commit/4f4be00d302bc52d0d9d5a3d47...
        
           | forgotpwd16 wrote:
           | What was the thought behind this change?
        
           | fweimer wrote:
           | They could just produce tar output and compress that using
           | system gzip. The "git archive" tool supports many output
           | formats.
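One reading of this suggestion, sketched in Python: ask `git archive` for an uncompressed tar stream and pipe it through the system `gzip` with `-cn` (write to stdout, omit the name and timestamp). The function names and arguments are hypothetical, and the result still depends on which `gzip` binary is installed, so this illustrates the pipeline rather than guaranteeing stable bytes.

```python
import subprocess


def archive_commands(ref: str, prefix: str) -> tuple[list[str], list[str]]:
    """Return the two argv lists for `git archive ... | gzip -cn`."""
    tar_cmd = ["git", "archive", "--format=tar", f"--prefix={prefix}/", ref]
    gzip_cmd = ["gzip", "-cn"]
    return tar_cmd, gzip_cmd


def write_tarball(repo_dir: str, ref: str, prefix: str, out_path: str) -> None:
    """Run the pipeline inside `repo_dir` and write the result to `out_path`."""
    tar_cmd, gzip_cmd = archive_commands(ref, prefix)
    with open(out_path, "wb") as out:
        tar = subprocess.Popen(tar_cmd, cwd=repo_dir, stdout=subprocess.PIPE)
        gz = subprocess.Popen(gzip_cmd, stdin=tar.stdout, stdout=out)
        tar.stdout.close()  # so git gets SIGPIPE if gzip exits early
        gzip_status = gz.wait()
        tar_status = tar.wait()
    if tar_status != 0 or gzip_status != 0:
        raise RuntimeError("git archive | gzip failed")
```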
        
         | acdha wrote:
         | If those tools incorrectly assume an API contract which doesn't
         | exist, isn't the right answer to fix those tools?
        
           | kentonv wrote:
           | In theory, sure, that's what we'd do in an ideal world.
           | 
           | In the real world it will take millions of dollars of eng
           | labor just to update the hashes to fix everything that's
           | currently broken and millions more to actually implement
           | something better and move everyone over to it.
           | 
           | This isn't worth it, GitHub needs to just revert the change
           | and then engineer a way to keep hashes stable going forward.
        
             | groestl wrote:
             | See also:
             | https://daniel.haxx.se/blog/2013/03/23/why-no-curl-8/
             | 
             | "The amount of work done "out there" on hundreds or
             | thousands of applications for a single little libcurl tweak
             | can be enormous. The last time we bumped the ABI, we got a
             | serious amount of harsh words and critical feedback and
             | since then we've gotten many more users!"
        
       | swarfield wrote:
       | https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
        
       | forgotpwd16 wrote:
       | Can anyone explain what happened? Thing changed, things broke,
       | and things changed back in less than an hour.
        
       | swarfield wrote:
       | They have broken almost every open source project that builds
       | external deps. Also broke homebrew apparently.
        
       | gray_-_wolf wrote:
       | Did people not know this? Honest question. I ran into this a
       | few times already before this change, so I assumed it was
       | widespread knowledge and mirrored everything.
        
         | skobovm wrote:
         | How would anyone (outside of GH) have known this? The checksums
         | have been stable for years, and this issue resulted from an
         | internal update to the version of Git being used. It also was
         | not publicized until this ex post facto blog post.
        
           | blueflow wrote:
           | I remember contributing to package recipes where linking to
           | GitHub was explicitly forbidden due to checksum instability.
           | This was 7 years ago, I think.
        
           | anecdotal1 wrote:
           | They have not been stable
           | 
           | https://github.com/freebsd/freebsd-ports/commit/a43ec88422ee...
        
         | mhitza wrote:
         | https://xkcd.com/1053/
        
       | ArchOversight wrote:
       | I remember a similar breakage happening before due to internal
       | git changes, and thought it was common knowledge to upload your
       | own signed tarballs for releases.
        
       | medellin wrote:
       | I'm thinking of all the Bazel build rules that are about to break
       | from my last company. Someone will have a fun day updating
       | hundreds of hashes.
        
         | jart wrote:
         | If they're using multiple URLs like a good Bazel user then they
         | shouldn't be impacted.
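The idea behind listing several URLs can be sketched like this. It is not Bazel's implementation; `fetch_verified`, `download`, and the injectable `fetch` parameter are hypothetical. Each location is tried in turn, and the first one whose bytes match the pinned SHA-256 wins, so a mirror holding the original archive keeps builds working even if another host starts serving different bytes.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable


def download(url: str) -> bytes:
    """Default fetcher: plain HTTP(S) GET."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_verified(
    urls: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str], bytes] = download,
) -> bytes:
    """Return the first download whose SHA-256 matches the pinned value."""
    errors = []
    for url in urls:
        try:
            blob = fetch(url)
        except OSError as exc:  # urllib's URLError is an OSError subclass
            errors.append(f"{url}: {exc}")
            continue
        actual = hashlib.sha256(blob).hexdigest()
        if actual == expected_sha256:
            return blob
        errors.append(f"{url}: checksum mismatch ({actual})")
    raise RuntimeError("no URL produced the expected archive:\n" + "\n".join(errors))
```

The `fetch` argument exists only so the logic can be exercised without network access.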
        
           | medellin wrote:
           | They did where applicable, but I know that not all of them
           | had multiple URLs.
        
             | jart wrote:
             | Well now they know why it's so important.
             | https://github.com/bazelbuild/bazel/commit/ed7ced0018dc5c5eb...
        
         | ErikCorry wrote:
         | Do they let Github generate the archives as one of the build
         | rules instead of performing the archival and compression
         | locally and uploading the result?
        
           | medellin wrote:
           | Correct. Silly stuff like this happens when you don't have
           | systems in place that make it easy to store your own
           | artifacts. Additionally, a lot of people just want to get
           | things done as quickly as possible even if you have the tools
           | in place.
        
       | [deleted]
        
       | UncleOxidant wrote:
       | Lol... I was being burned by this just about an hour ago. Cloned
       | a repo, did a build of the project (which uses Bazel to fetch
       | dependencies) and it reported errors due to a mismatch in expected
       | checksums.
        
       | vlovich123 wrote:
       | Hyrum's Law strikes again. It kind of doesn't matter what you
       | document. If you weren't randomizing your checksum previously
       | [1], you can't just spring this on the community and blame it for
       | the fallout. I'm more shocked that there's resistance from the
       | GitHub team saying "but we documented this isn't stable". Default
       | stance for the team should be rollback & reevaluate an alternate
       | path forward when the scope is this wide (e.g. only generating
       | the new tarballs for future commits going forward).
       | 
       | [1] Apparently googlesource did do this and just had people shift
       | to using GitHub mirrors to avoid this problem.
        
         | blueflow wrote:
         | But look at it from the other side. Users that don't read your
         | documentation and expect your software to work like they
         | imagined are just a huge pain in the ass.
        
           | kkirsche wrote:
           | This. You have to draw the line somewhere. Was this specific
           | choice that line? Maybe not, but sometimes users aren't right
           | and changes just need to occur to ensure other asks from the
           | same users can be delivered.
        
         | sneak wrote:
         | It's Microsoft. Just as the Apple of today is not the Apple of
         | ten years ago, the GitHub today is not the GitHub of ten years
         | ago. It's literally different people.
         | 
         | The people who made the things you love have mostly moved on,
         | and the brand is being run by different people with different
         | values now.
         | 
         | There's a little bit of an argument that such things are a
         | bait-and-switch, but such is the nature of a large and
         | multigenerational corporation.
        
         | daniealapt wrote:
         | https://xkcd.com/1172/
        
       | [deleted]
        
       | [deleted]
        
       | vtbassmatt wrote:
       | Hey folks. I'm the product manager for Git at GitHub. We're sorry
       | for the breakage, we're reverting the change, and we'll
       | communicate better about such changes in the future (including
       | timelines).
       | 
       | Also posted here:
       | https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
        
         | kris-nova wrote:
         | Thanks for the update! There is only 1 internet to watch and
         | learn from. We are all in this together. <3
        
         | denom wrote:
         | In my particular use-case, I'm using a set of local dev tools
         | hosted as a homebrew tap.
         | 
         | The build looks up the github tar.gz release for each tag and
         | commits the sha256sum of that file to the formula
         | 
         | What's odd is that all the _historical_ tags have broken
         | release shasums. Does this mean the entire set of zip/tar.gz
         | archives has been rebuilt? That could be a problem, as perhaps
         | you cannot easily back out of this change...
        
           | crote wrote:
           | The trick here is that a Github release is in essence simply
           | a tag of a specific commit. There is no need to build
           | archives in advance, as they can be dynamically generated
           | from the git repo.
           | 
           | However, if you change the compression algorithm used to
           | generate the archive, it'll result in a different checksum!
           | The _content_ is the same, but the _archive_ is not.
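The distinction between content and archive is easy to reproduce. In this small demonstration, two compression levels stand in for two gzip implementations (an assumption for illustration; the actual change was a switch of implementation, not of level), and the payload is made up.

```python
import gzip
import hashlib

# Stand-in for the tar stream of one tagged commit.
payload = b"".join(b"line %d of the same source tree\n" % i for i in range(2000))

# Same input, two different compressor settings.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

archive_hashes = {hashlib.sha256(blob).hexdigest() for blob in (fast, best)}
content_hashes = {
    hashlib.sha256(gzip.decompress(blob)).hexdigest() for blob in (fast, best)
}

print("distinct archive checksums:", len(archive_hashes))
print("distinct content checksums:", len(content_hashes))
```

Both archives decompress to the same bytes, yet a checksum pinned against one of them rejects the other.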
        
           | Denvercoder9 wrote:
           | > Does this mean the entire set of zip/tar.gz archives has
           | been rebuilt?
           | 
           | They are probably generated on-demand (and cached) from the
           | Git repository, not prebuilt.
        
         | [deleted]
        
         | vtbassmatt wrote:
         | We updated our Git version which made this change for the
         | reasons explained. At the time we didn't foresee the impact.
         | We're quickly rolling back the change now, as it's clear we
         | need to look at this more closely to see if we can make the
         | changes in a less disruptive way. Thanks for letting us know.
        
           | phphphphp wrote:
           | Consumers often mistake _hasn't changed_ for a commitment to
           | never change: any sufficiently large product will be littered
           | with these kinds of implicit commitments made by the product
           | to consumers that nobody has visibility into. You're
           | unfortunate that we were all relying on this commitment
           | you've never made, but the quick reversion is the best we can
           | hope for. People will theorise how this could have been
           | avoided but c'est la vie -- easy mistake that you've
           | responded well to.
        
             | dharmab wrote:
             | Hyrum's Law:
             | 
             | With a sufficient number of users of an API, it does not
             | matter what you promise in the contract: all observable
             | behaviors of your system will be depended on by somebody.
        
             | nickitolas wrote:
             | FWIW according to
             | https://github.com/bazel-contrib/SIG-rules-authors/issues/11...
             | a commitment _was_ made, although in an exchange in some
             | support ticket, and not in documentation.
        
             | VWWHFSfQ wrote:
             | At this point they'll be stuck on old git for all of
             | eternity unless they just roll their own archive/compress
             | step out of band so the old hashes still work. Yikes.
        
             | [deleted]
        
       | jzelinskie wrote:
       | Does anyone have the motivation for why the git project wants to
       | use their own implementation of gzip? Did this implementation
       | already exist and was being used for something else?
       | 
       | I understand wanting fewer dependencies, but gut-reaction is that
       | it's a bad move in the unsafe world of C to rewrite something
       | that already has a far more audited, ubiquitous implementation.
        
         | groestl wrote:
         | I think "Drop the dependency on gzip" for something like Git
         | trumps a bit more exposure (which can be mitigated with
         | thorough reviews).
        
         | nemetroid wrote:
         | They're still using zlib to do the heavy lifting. It's not a
         | large patch.
         | 
         | https://public-inbox.org/git/1328fe72-1a27-b214-c226-d239099...
        
       | rektide wrote:
       | Now please give us compression options beyond gzip? :) Some zstd
       | & lz4 please?
        
       | WayToDoor wrote:
       | https://github.com/orgs/community/discussions/45830#discussi...
       | 
       | > Hey folks. I'm the product manager for Git at GitHub. We're
       | sorry for the breakage, we're reverting the change, and we'll
       | communicate better about such changes in the future (including
       | timelines).
        
       | robomc wrote:
       | I think this also broke GitHub Codespaces (the downloading of
       | devcontainer "features").
        
       | skobovm wrote:
       | I wonder what monetary loss in productivity was due to this
       | change. We noticed this issue a bit before noon, tracked it down
       | to GH, sent out company-wide comms notifying others of the
       | problem, filed tickets with GH, had to modify numerous repos
       | across multiple teams, and now it's 3pm and I'm here reading
       | about it.
       | 
       | It's crazy how a seemingly innocuous change like this could
       | lead to such a widespread loss in productivity across the
       | globe.
        
       | daniealapt wrote:
       | Any change breaks a workflow - https://xkcd.com/1172/
        
       ___________________________________________________________________
       (page generated 2023-01-30 23:00 UTC)