[HN Gopher] Whatever happened to SHA-256 support in Git?
___________________________________________________________________
Whatever happened to SHA-256 support in Git?
Author : chmaynard
Score : 267 points
Date : 2022-06-23 16:47 UTC (6 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| yjftsjthsd-h wrote:
| > In his view, the only "defensible" reason to use SHA-1 at this
| point is interoperability with the Git forge providers.
|
| Okay, but that's a pretty big reason! A git repo that can't be
| pushed to github/lab is... not always useless, but certainly
| extremely impaired.
| kragen wrote:
| In case anyone has forgotten, the process for pushing it to
| your own server is three shell commands. You run, on the
| server:
|
|     git init --bare public_html/mything.git
|     cd public_html/mything.git/hooks/
|     mv post-update.sample post-update  # runs git update-server-info on push
|
| (This assumes that your public_html directory exists and is
| mapped into webspace, as with the usual configuration of
| Apache, NCSA httpd, and CERN httpd. If you don't have an
| account on such a thing you can get such PHP shared hosting
| accounts with shell access anywhere in the world for a dollar
| or two a month.)
|
| And then on your dev machine, it's precisely the same as for
| pushing to Gitlab or whatever, except that you use your own
| username instead of git@:
|
|     git remote add someremotename user@myserver:public_html/mything.git
|     git push -u someremotename master  # assuming you want it to be your upstream
|
| Then anyone can clone from your repo with a command like this:
| git clone https://myserver/~user/mything.git
|
| They can also add the URL as a remote for pulls.
|
| If you want them to be able to push, you'll need to give them
| an account on the same server and either set umasks and group
| ownerships and permissions appropriately or set a POSIX ACL.
| Alternatively they can do the same thing on their server and
| you can pull from it. There are reportedly permission bugs in
| recent versions of Git (the last five years) that prevent this
| from being safe with people you don't trust
| (https://www.spinics.net/lists/git/msg298544.html).
|
| Of course source control is only part of the overall
| development project workflow, so for many purposes adding
| SHA-256 support to Gogs or Gitlab or Gitea or sr.ht is probably
| pretty important: you want a Wiki and CI integration and bug
| tracking and merge requests. But the _git repo_ still works
| fine with a bog-standard ssh and HTTP server, though slightly
| less efficiently. It's _easier_ than setting up a new repo on
| GitLab etc.
|
| Running a git repack -an && git update-server-info in the repo
| on the server can help a lot with the efficiency, and for
| having a browseable tree on the server as well as a clonable
| repo I put this script at
| http://canonical.org/~kragen/sw/dev3.git/hooks/post-update:
|
|     #!/bin/sh
|     set -e
|     echo -n 'updating... '
|     git update-server-info
|     echo 'done. going to dev3'
|     cd /home/kragen/public_html/sw/dev3
|     echo -n 'pulling... '
|     env -u GIT_DIR git pull
|     echo -n 'updating... '
|     env -u GIT_DIR git update-server-info
|     echo 'done.'
|
| That's very far from being GitLab (contrast
| http://canonical.org/~kragen/sw/dev3 with any GitHub tree
| view), and it's potentially dangerously powerful: if you're
| doing this in a repo where you pull from other people, and the
| server is configured to run PHP files or server-side includes
| in your webspace (mine isn't!) or CGI scripts (mine is!), then
| just dropping a file in the repo can run programs on the server
| with your account privileges. This is great if that's what you
| want, and it's a hell of a lot better than updating your PHP
| site over FTP, but that code has full authority to, for
| example, rewrite your Git history.
|
| In theory you can do other things from your post-update hook as
| well, like rebuild a Jekyll site, send a message on IRC or some
| other message queueing system, or fire off a CI build in a
| Docker container. (Some of these would run afoul of guardrails
| common in cheap PHP shared hosting providers and you'd have to
| upgrade to a US$5/month VPS.)
| isomorphic wrote:
| People also forget about Gitolite, which provides lightweight
| shared access control around Git+SSH+server-repos. For me
| it's a much simpler alternative than systems with a
| heavyweight web UI. Although to be honest I don't know
| whether Gitolite handles SHA256 hashes (I've never tested
| it).
|
| https://gitolite.com
|
| https://github.com/sitaramc/gitolite
| kragen wrote:
| I did forget about Gitolite! Thanks for the reminder! Do
| you have suggestions for what sorts of CI tooling and bug
| trackers people might want to use with it?
| armada651 wrote:
| > Adding my own 0.02, what some of us are facing is resistance to
| adopting git in our or client organizations because of the
| presence of SHA-1. There are organizations where SHA-1 is blanket
| banned across the board - regardless of its use. [...] Getting
| around this blanket ban is a serious amount of work and I have
| very recently seen customers move to older much less functional
| (or useful) VCS platforms just because of SHA-1.
|
| Seems like this company could just use the current SHA-256
| support then? Especially if it's the type of company that does
| all its development in-house and there's no need for SHA-1
| interoperability.
| gorkish wrote:
| > There are organizations where SHA-1 is blanket banned across
| the board - regardless of its use.
|
| > I have very recently seen customers move to older much less
| functional (or useful) VCS platforms just because of SHA-1.
|
| A company this dysfunctional has problems far beyond their
| choice of revision control system.
| bostik wrote:
| I can name a couple of industries where compliance (and their
| enforcement arm, security[0]) teams require N+1 different
| monitoring and enforcement agents on all systems because
| Compliance[TM]. Due to these agents the systems' _IDLE_ load
| is approaching 1.00 - on a good day. On a less good day you
| need four cores to have one of them available for workload
| processing.
|
| 0: I use the word "security" only because the teams
| themselves are named like that. You can probably infer my
| opinion from the tone.
| the_biot wrote:
| I definitely see your point -- who hasn't seen or heard of
| companies ruined by officious rulemakers with no clue, or by
| rules meant to make something more secure that do the exact
| opposite? I've seen my share.
|
| But blanket-banning an obsolete and insecure hash algorithm
| isn't a bad thing, it's entirely reasonable. In this case, as
| the article makes clear, it's git that's at fault.
| cratermoon wrote:
| Except said company likely uses one of the Git forge providers,
| either in-house or as a SaaS, as the (oxymoronic for git)
| central repo. Until they support SHA-256, or the company goes
| with its own git repo solution that is set up for it,
| companies won't make the move.
| wepple wrote:
| Not just git forge but probably the myriad other ancillary
| tools that assume SHA1
| skissane wrote:
| > > There are organizations where SHA-1 is blanket banned
| across the board - regardless of its use.
|
| Reminds me of the time a security audit (which literally just
| involved running some scanning tool and dumping the results on
| us) complained that some code I had written was using MD5 - but
| in a use case in which we weren't relying on it for any
| security purposes. I ended up replacing MD5 with CRC-32 - which
| is even weaker than MD5, but made the security scanning tool
| mark the issue as remediated. It was easier than trying to
| argue that it was a false positive.
| bawolff wrote:
| Honestly, this isn't a bad idea.
|
| The big problem with using sha1/md5 in non-secure contexts
| is:
|
| * Someone later might think it's secure and rely on that when
| extending the system.
|
| * It can make it difficult for security people to audit code
| later, as you have to figure out whether each usage is
| security critical.
|
| Using a non-crypto hash makes both of those concerns go away,
| since everyone knows crc32 is insecure. The alternative of
| using sha256 also works (performance-wise it is close enough,
| so why not just use the secure one and be done with it?).
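|
| As a toy illustration of the two options (just a sketch, not
| code from anything in this thread), both are one-liners with
| Go's standard library:
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "fmt"
|         "hash/crc32"
|     )
|
|     func main() {
|         data := []byte("some file contents")
|
|         // Obviously non-cryptographic checksum: nobody will
|         // mistake this for a security control.
|         fmt.Printf("crc32:  %08x\n", crc32.ChecksumIEEE(data))
|
|         // Or pay the small cost of a real cryptographic hash
|         // and never think about it again.
|         fmt.Printf("sha256: %x\n", sha256.Sum256(data))
|     }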
| harryvederci wrote:
| Relevant quote from the Fossil website[0]:
|
| "Fossil started out using 160-bit SHA-1 hashes to identify check-
| ins, just as in Git. That changed in early 2017 when news of the
| SHAttered attack broke, demonstrating that SHA-1 collisions were
| now practical to create. Two weeks later, the creator of Fossil
| delivered a new release allowing a clean migration to 256-bit
| SHA-3 with full backwards compatibility to old SHA-1 based
| repositories. [...] Meanwhile, the Git community took until
| August 2018 to publish their first plan for solving the same
| problem by moving to SHA-256, a variant of the older SHA-2
| algorithm. As of this writing in February 2020, that plan hasn't
| been implemented, as far as this author is aware, but there is
| now a competing SHA-256 based plan which requires complete
| repository conversion from SHA-1 to SHA-256, breaking all public
| hashes in the repo."
|
| [0]: https://fossil-scm.org/home/doc/trunk/www/fossil-v-
| git.wiki#...
| ludwigvan wrote:
| Migrations are easier when you are the only one using your
| software. :p
|
| Joking aside, expected from a developer whose work is the
| recommended storage format for the Library of Congress.
| Zamicol wrote:
| This is one of the reasons why Go has its own versioning system.
| From a project's `go.sum`:
|
| example.com/example v0.0.0-20171218180944-5ea4d0ddac55
| h1:jbGlDKdzAZ92NzK65hUP98ri0/r50vVVvmZsFP/nIqo=
|
| Where "h1" is an upgradeable hash (h1 is SHA-256). If there's
| ever a problem with h1, the hash can be simply upgraded.
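|
| A rough Go sketch of the general idea (illustration only: the
| real "h1" in go.sum covers a whole file tree, not a single
| blob, and the helper name here is made up):
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "encoding/base64"
|         "fmt"
|     )
|
|     // versionedHash tags a digest with an algorithm version so
|     // the algorithm can later be swapped ("h2:", "h3:", ...)
|     // without any ambiguity about how a given value was made.
|     func versionedHash(data []byte) string {
|         sum := sha256.Sum256(data)
|         return "h1:" + base64.StdEncoding.EncodeToString(sum[:])
|     }
|
|     func main() {
|         fmt.Println(versionedHash([]byte("module contents")))
|     }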
|
| Git's documentation describes how to sign a git commit:
|
| $ git commit -a -S -m 'signed commit'
|
| When signing a git commit using the built-in gpg support, the
| project is not rehashed with a secure hash function, like SHA-256
| or SHA3-256. Instead gpg signs the SHA-1 commit digest directly.
| It's not signing the result of a secure hash algorithm.
|
| SHA-1 has been considered weak for a long time (about 17 years).
| Bruce Schneier warned in February 2005 that SHA-1 needed to be
| replaced. Git development didn't start until April 2005. Before
| git started development, SHA-1 was identified as needing
| deprecation.
| Groxx wrote:
| There's no need to explicitly version your first version of
| this though. Those first-version values are easy to identify:
| they don't contain versioning information :)
|
| E.g. say you have `5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8`.
| What version is that?
|
| Well. It's exactly as long as a SHA1 hash. It doesn't start
| with "sha256:" or "md5:" or "h1:" or "rot13:". So it's SHA1.
| Easy and totally unambiguous.
|
| Versioning can almost always begin with version 2.
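|
| A hedged Go sketch of that detection rule (the prefixes and
| the function name are made up for illustration):
|
|     package main
|
|     import (
|         "fmt"
|         "strings"
|     )
|
|     // hashVersion guesses how a stored hash string was made:
|     // anything without a known prefix is treated as the
|     // legacy, unversioned form.
|     func hashVersion(s string) string {
|         switch {
|         case strings.HasPrefix(s, "sha256:"):
|             return "sha256"
|         case strings.HasPrefix(s, "h1:"):
|             return "h1"
|         case len(s) == 40: // bare 40 hex chars: assume SHA1
|             return "sha1"
|         default:
|             return "unknown"
|         }
|     }
|
|     func main() {
|         fmt.Println(hashVersion(
|             "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8")) // sha1
|     }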
| morelisp wrote:
| Me, sowing: "Each record begins with a 4 octet BE value
| indicating the record length."
|
| Me, reaping: "Each record begins with a single byte
| indicating the record format version. In version 0, this is
| followed by a 3 octet BE value indicating the record length."
| nh23423fefe wrote:
| sow then reap
| morelisp wrote:
| Well this fucking sucks. What the fuck.
| Groxx wrote:
| if you're storing the raw binary rather than hex or base64:
| yeah. there are often no illegal values, so there's no way
| to safely extend it, unless you can differentiate on
| length.
|
| for those, you have to leave versioning room up-front. even
| 1 bit is enough, since a `1` can imply "following data
| describes the version", if a bit wastefully in the long
| run.
| kazinator wrote:
| That's not applicable to Groxx's example. The initial
| version uses only hexadecimal digits for the SHA256.
|
| If you had: "each record begins with an 8 character record
| length, in hexadecimal, giving 32 bits", you have no
| problems. The new version has a 'V' character in byte 0,
| which is rejected as invalid by the old implementation.
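|
| A small Go sketch of that parse (the header layout here is
| invented for illustration):
|
|     package main
|
|     import (
|         "fmt"
|         "strconv"
|         "strings"
|     )
|
|     // parseHeader reads an 8-character record header. In the
|     // old format it is a hex length; a leading 'V' (invalid
|     // hex, so old readers reject it) marks a versioned format.
|     func parseHeader(h string) (int, uint64, error) {
|         if strings.HasPrefix(h, "V") {
|             v, err := strconv.Atoi(h[1:2])
|             return v, 0, err
|         }
|         n, err := strconv.ParseUint(h, 16, 32)
|         return 1, n, err
|     }
|
|     func main() {
|         fmt.Println(parseHeader("000000ff")) // version 1, length 255
|         fmt.Println(parseHeader("V2000000")) // version 2
|     }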
| morelisp wrote:
| Love too put another branch in the decoder I need to run
| a billion times.
| guipsp wrote:
| I beg you: please clone git, do the changes, and
| benchmark them. I bet you won't be able to obtain a
| statistically significant result from this single branch.
| bawolff wrote:
| Versioning hashes is definitely not a new idea with go - just
| look at how unix stores password hashes.
| barsonme wrote:
| The author of the comment did not imply this.
| lewisl9029 wrote:
| Also check out multihash from the IPFS folks:
| https://github.com/multiformats/multihash
|
| It's a more robust, well-specified, interoperable version of
| this concept.
|
| Though it's probably overkill if you control both the consumer
| and producer side (i.e. don't need the interoperability) and
| are just looking to make hash upgrades smoother, in that case a
| simple version prefix like Go's approach described above has
| lower overhead.
| kazinator wrote:
| Whenever the word "upgrade" rears its head, beware.
|
| The intent behind it is obsolescence and phasing out, resulting
| in an endless make-work treadmill for the users.
|
| If there is ever a "problem with h1", and you neglect to
| upgrade your data right there and then, then five to ten years
| later it will be unreadable.
| howinteresting wrote:
| What in the world are you talking about? Generally, systems
| with upgradeable hashes will remain backwards-compatible with
| old ones forever.
| kelnos wrote:
| I think the implications for Go are a bit different, though.
| It's a very simple matter to change the hash algorithm used for
| go.mod. Even if there was no hash version prefix, it's trivial
| to add one after the fact, though older tools would probably
| give a confusing error message without foreknowledge of the
| concept of an unrecognized hash algorithm. And adding a new
| hash algorithm is just a matter of writing a relatively small
| amount of code, and then probably waiting a few Go releases
| before making it the default and assuming most people will have
| it.
|
| Git's _entire foundation_ relies on SHA1 hashes. Each commit is
| its own hash, and contains a list of the hashes of all files
| that are a part of it. Branches have hashes, tags have hashes.
| Everything has a hash. A repository that uses a different hash
| algorithm is a completely different repository, even if the
| contents and commits are otherwise identical. You can't even
| _store_ your code on someone else's server (well, aside from
| manually copying the repository data over, though that won't be
| too useful) unless that server has upgraded their git version.
| samatman wrote:
| The counterpoint: Fossil did it, it was easy, no big deal.
|
| Well, Fossil's database is much better designed, you reply.
|
| That it is!
| er4hn wrote:
| Just to nit on your portion of signing: wouldn't you need to
| rehash all prior commits as well so that they used the better
| hash function? Otherwise someone could find a collision for a
| prior commit hashed with sha-1, slip that in, and the final
| commit being hashed with sha256 wouldn't matter.
|
| This then makes the signing code use its own form of hashing
| that is different from the rest of git's commit hashing, and
| seems like a novel way to introduce tooling issues / bugs /
| etc.
| chimeracoder wrote:
| > and the final commit being hashed with sha256 wouldn't
| matter.
|
| Git stores content, not diffs. So the signature verifies all
| content stored in that commit. It doesn't verify anything that
| came before it, unless those are specifically signed as well.
| ElectricalUnion wrote:
| > Git stores content, not diffs.
|
| But the "contents" is just pointers to tree roots with a
| trusted hash. If the hash is no longer secure, you can't
| guarantee that any such trees are your content, or safe.
| YesThatTom2 wrote:
| GitHub won't feel any heat about this until Microsoft salespeople
| start demanding it.
|
| I've added to my todo list a reminder to raise this issue with
| mine. In fact, I'm going to give them a deadline for when we will
| start evaluating competitors that do support SHA256.
|
| I suspect that most people on HN do not interact with their MS
| account team. That relationship is probably managed by your CIO
| or IT department. They probably have monthly or quarterly
| "business review" meetings. You should get this issue on the
| agenda of that meeting.
| codazoda wrote:
| Is there something special about GitHub on this? This seems
| like a Git issue and not a GitHub issue to me; unless I'm
| missing something.
| lucb1e wrote:
| They don't accept pushes of repositories in that format.
|
| The article says "none of the Git hosting providers appear to
| be supporting SHA-256", and while GH is not mentioned by name
| (and I applaud them for indeed not strengthening this "git ==
| github-the-brand" trap), I can't imagine GH was left out of
| scope when checking the major hosting providers.
| evil-olive wrote:
| as the article says, you can create a local git repository
| with SHA-256 hashes today, and it should work fine...but the
| moment you try to push your repo up to Github, you'll hit a
| brick wall.
|
| Gitlab also appears to be lacking support [0], and the same
| with Gitea [1].
|
| so it's a grey area where Git itself supports SHA-256-based
| repos, but without the major Git hosting services _also_
| supporting them, the support in core Git is somewhat useless.
|
| 0: https://gitlab.com/groups/gitlab-org/-/epics/794
|
| 1: https://github.com/go-gitea/gitea/issues/13794
| vulcan01 wrote:
| Git is not GitHub and GitHub is not Git. This article is about
| Git, the software, not GitHub, the Git hosting service.
| ElectricalUnion wrote:
| Git supports it, GitHub doesn't. People use forges, therefore
| they are misled to believe Git doesn't support it.
| chrisseaton wrote:
| Forges?
| lucb1e wrote:
| I also found that confusing in the article. I actually
| listened to it using text-to-speech and thought it was
| some brand, like sourceforge. But now I think they just
| mean any git hosting service.
| sroussey wrote:
| Code hosting was called a forge. Thus codeforge, etc.
| sdfhdhjdw3 wrote:
| Did you skip the bit that discusses hosting providers?
| lewisl9029 wrote:
| Just the other day, I was actually forced to downgrade the file
| hash used in the product I'm working on to sha1 in order to
| interact with GitHub's APIs efficiently (to avoid having to
| download the entire file just to recompute a sha256 for
| matching).
|
| Luckily I've versioned the internal hash so the upgrade path
| back to sha256 should be as smooth as the downgrade was. I'm
| still bitter about it though.
| sdfhdhjdw3 wrote:
| Thank you.
| bradhe wrote:
| girvo wrote:
| You sound like someone who didn't read the article.
|
| Git basically supports it already. GitHub et al do not, and
| that is what is holding it back.
| mdavidn wrote:
| I don't depend on the collision resistance of SHA-1 for the
| security of my git repos because I don't accept pushes from
| people I don't trust. If I did, objects with hash collisions
| would not be transferred or (I hope) accepted. Am I missing
| something?
|
| Granted, signed tags do depend on this collision resistance, but
| I don't use that feature. Signing entire releases from a trusted
| repo seems like a better approach.
| teraflop wrote:
| It's not just the pushes themselves; anyone who can create
| commits or blobs that _eventually_ get merged into your
| repository, directly or indirectly, can potentially engage in a
| collision attack.
|
| Sure, if you use git with a very closed development model, this
| doesn't necessarily affect you much. But it's (potentially) a
| big problem for collaborative open-source projects, because it
| requires trust in every single contributor. And the trust
| requirement can't necessarily be mitigated using ordinary means
| like code reviews.
| pornel wrote:
| Collision isn't spooky action at a distance. Even if they
| tricked the victim into accepting a file they have a
| collision for, they still can't do anything nefarious. The
| attack requires an opportunity to replace the colliding file
| with its evil twin, and that requires write access to the
| victim's repository or tricking the victim into re-fetching
| their files from an attacker-controlled repository.
|
| Besides, the known collision attack generates files with
| blocks of binary garbage, which makes it difficult to trick
| someone into accepting. It won't look like source code, and
| if someone accepts binary blobs of executable code, you don't
| need collisions to pwn them.
| pornel wrote:
| The worst thing about the SHA-1 collision is the tedium of
| explaining the difference between a collision attack and a
| preimage attack.
| heynowheynow wrote:
| It might be wiser to keep SHA1 and use SHA2, SHA3, etc. and GPG
| as overlays for compatibility and simplicity reasons.
| chmaynard wrote:
| Previous articles on this topic:
|
| _A new hash algorithm for Git_ https://lwn.net/Articles/811068/
|
| _Updating the Git protocol for SHA-256_
| https://lwn.net/Articles/823352/
| ainar-g wrote:
| The article mentions that "none of the Git hosting providers
| appear to be supporting SHA-256", but what about self-hosted
| solutions? In particular, sr.ht. Seems to be nothing[1] in their
| issue tracker.
|
| [1]: https://todo.sr.ht/~sircmpwn/git.sr.ht?search=sha-256
| oynqr wrote:
| How about https://todo.sr.ht/~sircmpwn/git.sr.ht?search=sha256
| ainar-g wrote:
| Hah! I guess there should be one about smarter search as
| well, heh. Thanks!
| WorldMaker wrote:
| Almost feels like by the time git finally transitions to SHA-256
| some bitcoin miner somewhere will have found a preimage weakness
| in SHA-256.
| le-mark wrote:
| Addition modulo 2^32 paired with xor is a motherfucker, i.e. a
| very difficult problem. That's not even considering rotation of
| intermediate results.
| jagger27 wrote:
| Thankfully existing Bitcoin ASICs don't pose much of a threat
| because they're only good for sha256(sha256(Bitcoin block)).
|
| If a practical pre-image attack on SHA-256 comes around we have
| bigger problems than git.
| WorldMaker wrote:
| Obviously the concern is not the ASICs themselves but the
| ASIC designers. (Using miners here in the colloquial sense of
| human collectives/corporations backing the machines rather than
| the specific sense of the raw machines themselves.)
|
| Yes, a practical preimage weakness in SHA-256 is a nightmare
| scenario with huge implications to the rest of internet
| security beyond just git. It's why I sometimes can't sleep at
| night knowing how much energy bitcoin spends daily on a
| continuous massively distributed partial preimage attack on
| SHA-256.
| marktangotango wrote:
| > how much energy bitcoin spends daily on a continuous
| massively distributed partial preimage attack on SHA-256.
|
| I would not be concerned about this. The way the asics
| operate is they discard the results. Also, the hashes are
| random strings which don't compress very well, so storing
| trillions upon trillions of them (for later analysis) is
| not practical.
| WorldMaker wrote:
| _Again_, the point of the fear is not the specifics of
| _current_ operations (ASIC details; which y'all are
| talking about as if all of the miners are using the same
| hardware), but the fear of _future_ operations and that
| there's an _enormous_ industrial preimage attack effort
| _at all_. One that we can see in real time, in global
| energy consumption graphs.
|
| Maybe you find "cold comfort" that because we can watch
| it in real time if someone discovers a weakness we will
| also watch its repercussions and the subsequent
| horrifying fall in real time, too, but I certainly don't.
| mjw1007 wrote:
| > All that is left is the hard work of making the transition to a
| new hash easy for users -- what could be thought of as "the other
| 90%" of the job.
|
| If that was all that was left, we could at least be using sha256
| for new repositories.
|
| It seems to me the big missing piece is support in libgit2, which
| is at least showing signs of progress:
|
| https://github.com/libgit2/libgit2/pull/6191
| xyzzy_plugh wrote:
| libgit2 isn't an official library, and even if it did support
| sha256 dependents would still need to update, so I really don't
| perceive this as a missing piece.
|
| If everyone started using sha256 then all these problems would
| be addressed practically overnight.
| wepple wrote:
| I was curious about the "sha1dc" that git uses and reportedly
| helps protect against collision attacks.
|
| Here's the paper:
| https://marc-stevens.nl/research/papers/C13-S.pdf
| donatj wrote:
| Potentially stupid question, would it be reasonable to use
| SHA-256 truncated to the first 40 digits?
|
| It seems like that could ease many of the migration problems,
| if it's not a problem?
| Zamicol wrote:
| I don't believe the length is a major issue. It's "upgrading"
| references to a new hashing algorithm that's the issue.
|
| If for some reason length was an issue, a base64 encoded 256
| bit string, like a SHA-256 digest, is 43 characters. That too
| can be truncated to 40 characters, which still retains 240
| bits of the digest. SHA-256 is not only a better hashing
| algorithm than SHA-1, but it could also result in higher
| effective security even when truncated.
| jjtheblunt wrote:
| that makes collisions more likely
| dspillett wrote:
| It makes random collisions more likely when comparing
| truncated SHA256 to pure SHA256, but given the collisions and
| pre-image attacks shown so far is truncated SHA256 still
| safer than SHA1 in that respect? I have seen an article that
| claimed so (sorry, I can't re-find it ATM so I can't offer it
| for criticism, if anyone else has good information either way
| please respond with relevant links), and it is immune to
| extension attacks which is a significant advantage if this is
| part of your threat sensitivity surface and SHA1 is used
| without other protective wrappers like HMAC.
| bawolff wrote:
| Truncated sha256 is safer than sha-1 (depending of course
| on how much you truncate it, but given the context let's assume
| truncating to the size of sha-1 - 160 bits).
|
| SHA-1 is quite broken at this point. SHA-256 is not. There
| aren't any practical non-generic attacks on full sha-256
| and thus there wouldn't be any on the truncated version.
| The Wikipedia article goes into the different attacks on
| the two algorithms.
|
| That said, if your concern is length extension attacks, I
| strongly recommend using sha-512/256 instead of trying to
| do your own custom thing.
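|
| A minimal Go sketch of both options discussed here, assuming
| all you want is a fixed-size identifier rather than a MAC:
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "crypto/sha512"
|         "fmt"
|     )
|
|     func main() {
|         data := []byte("some object contents")
|
|         // SHA-256 truncated to 160 bits (20 bytes), the size
|         // of a SHA-1 ID, and not practically length-extendable
|         // since most of the internal state stays hidden.
|         full := sha256.Sum256(data)
|         fmt.Printf("sha256/160: %x\n", full[:20])
|
|         // SHA-512/256: the standardized truncation, with its
|         // own IV, often faster on 64-bit CPUs without SHA-NI.
|         fmt.Printf("sha512/256: %x\n", sha512.Sum512_256(data))
|     }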
| pornel wrote:
| Sigh, no it doesn't in any meaningful way.
|
| 160 bit output, without a cryptographic weakness, is good for
| about 30 trillion commits per second continuously for 1000
| years.
|
| For SHA the cryptographic strength isn't primarily from the
| length of the hash, but from the internal number of rounds
| (e.g. 160-bit SHA-1 with fewer rounds was badly broken much
| earlier, and 160-bit SHA-1 with more rounds would be
| safer).
|
| Cryptographic hashes are designed to be safe to truncate and
| still have all the safety the truncated length can provide.
| It's basically a requirement for them being cryptographically
| strong. Even in the SHA-2 family, the SHA-224 and SHA-384 are
| just truncated versions of larger hashes.
| stingraycharles wrote:
| I found this, which says that the SHA algorithm allows for
| truncation:
| https://csrc.nist.gov/publications/detail/sp/800-107/rev-1/f...
| Dylan16807 wrote:
| Not just allows, it becomes more secure when you truncate.
| tatersolid wrote:
| Truncated SHA-* hashes are more secure against length-
| extension attacks, but are very much _less secure_ against
| collision and pre-image attacks (which are more important
| in most scenarios).
| Dylan16807 wrote:
| But also, 256 is overkill for collisions and pre-image.
|
| There's a point where truncating starts to make it
| weaker, but when you first start chopping off bytes the
| benefits outweigh the drawbacks.
| neon_electro wrote:
| Care to elaborate? This is not something I would've
| intuited.
| bawolff wrote:
| Presumably they are referring to length extension
| attacks. You can't pull them off if you truncate.
| https://en.m.wikipedia.org/wiki/Length_extension_attack
|
| Generally though length ext attacks have a solution -
| HMAC, which is much more secure than truncating.
|
| The more you truncate, the more vulnerable you are to
| birthday attacks (practically speaking you would have to
| truncate quite a lot)
| kzrdude wrote:
| The canonical solution is SHA-512/256, i.e. SHA-512 truncated
| to 256 bits, where "nothing is lost" compared to SHA-256
| and something is gained. It might even be faster (due to
| the 64-bit word formulation of SHA-512) in some
| implementations.
| NovemberWhiskey wrote:
| Generally it is faster (fewer rounds per byte). If you have
| 256 bits available for your hash and you're on a 64 bit
| architecture, I've yet to see a case where you're not
| better off for performance and security choosing
| SHA-512/256 over SHA-256, assuming you have the choice.
| ztorkelson wrote:
| Is this still true? I understood SHA256 to be faster than
| SHA512 due to hardware acceleration on current CPUs;
| dedicated instructions exist for the former but not the
| latter.
| kazinator wrote:
| > _Given the threat that the SHA-1 hash poses_
|
| I give -3 flying ducks about this, and don't want the Git storage
| format to be diddled with in any way. Git in 2122 should read
| _and_ write a git repo made in 2010.
|
| Git is not a public crypto system.
|
| If you think a commit is important and needs to be signed, you
| need to sign the files and add the signature to the commit.
| kerblang wrote:
| The part of software engineering they don't teach in college is
| _migration_. Some of the most creative work you'll do is
| figuring out how to get from X to Y without bringing everything
| crashing down around you (or at least only a couple things
| crashing down at a time).
| jiggawatts wrote:
| If you're going to "fix" the hash algorithm, do it properly!
|
| Sha256 can only be computed in a single sequential stream
| (thread) by definition.
|
| For large files this is increasingly becoming a performance
| limitation.
|
| A Merkle tree based on SHA512 would have significant benefits.
|
| SHA512 is _faster_ than SHA256 on modern CPUs because it
| processes 64 bits per internal register instead of 32 bits.
|
| A tree-structured hash can be parallelised across all cores.
|
| For repositories with files over 100MB in them on an SSD this
| would make a noticeable difference...
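|
| A rough Go sketch of the shape of such a scheme (a toy
| two-level hash tree, not any standardized construction):
|
|     package main
|
|     import (
|         "crypto/sha512"
|         "fmt"
|         "sync"
|     )
|
|     const chunkSize = 1 << 20 // 1 MiB leaves
|
|     // treeHash hashes fixed-size chunks in parallel, then
|     // hashes the concatenated leaf digests into one root.
|     // A real design would also encode chunk size, depth and
|     // total length to avoid ambiguity.
|     func treeHash(data []byte) []byte {
|         n := (len(data) + chunkSize - 1) / chunkSize
|         if n == 0 {
|             n = 1
|         }
|         leaves := make([][sha512.Size256]byte, n)
|
|         var wg sync.WaitGroup
|         for i := 0; i < n; i++ {
|             wg.Add(1)
|             go func(i int) {
|                 defer wg.Done()
|                 start := i * chunkSize
|                 end := start + chunkSize
|                 if end > len(data) {
|                     end = len(data)
|                 }
|                 leaves[i] = sha512.Sum512_256(data[start:end])
|             }(i)
|         }
|         wg.Wait()
|
|         root := sha512.New512_256()
|         for _, leaf := range leaves {
|             root.Write(leaf[:])
|         }
|         return root.Sum(nil)
|     }
|
|     func main() {
|         fmt.Printf("%x\n", treeHash(make([]byte, 5*chunkSize+123)))
|     }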
| dchest wrote:
| Most git objects are tiny files, so internal tree-based
| parallelization won't bring much compared to file
| parallelization (git is a hash tree itself, with variable-
| length leaves).
|
| SHA256 is actually a lot faster on modern CPUs due to
| https://en.wikipedia.org/wiki/Intel_SHA_extensions (and similar
| on Arm), which are implemented for SHA-256 but not for SHA-512,
| e.g. openssl speed sha256 sha512 on M1:
|
|     type     16 bytes    64 bytes    256 bytes   1024 bytes   8192 bytes
|     sha256   89474.97k   283341.15k  901724.41k  1730980.24k  2339109.86k
|     sha512   66160.19k   262139.03k  365675.96k  487572.26k   545142.91k
| jiggawatts wrote:
| A fair point about the instruction sets, and it is also true
| that "most" files are small.
|
| But again, due precisely to their size, large files take a
| disproportionate amount of time to process.
|
| Don't confuse the typical use-case with the fundamental
| concept: versioning.
|
| Git could be a general purpose versioning system with many
| more use-cases, but limitations like this hold it back
| unnecessarily...
| akvadrako wrote:
| Actually, SHA256 is faster since many common processors have
| special instructions to accelerate it.
| avar wrote:
| I'm the person and Git developer (AEvar) quoted in the article. I
| didn't expect this to end up on LWN. I'm happy to answer any
| questions here that people might have.
|
| I don't think the LWN article can be said to take anything out of
| context. But I think it's worth emphasizing that this is a thread
| on the Git ML in response to a user who's asking if Git/SHA-256
| is something "that users should start changing over to[?]".
|
| I stand by the comments that I think the current state of Git is
| that we shouldn't be recommending to users that they use SHA-256
| repositories without explaining some major caveats, mainly to do
| with third party software support, particularly the lack of
| support from the big online "forges".
|
| But I don't think there's any disagreement in the Git development
| community (and certainly not from me) that Git should be moving
| towards migrating away from SHA-1.
| michaelt wrote:
| Thanks for your work on Git!
|
| _> I'm happy to answer any questions here that people might
| have._
|
| Is there any way to achieve a gradual, staged rollout of
| SHA256?
|
| What's the impact of converting a repo to SHA256 - will old
| commit IDs become invalid? Would signed commits' signatures be
| invalidated?
| avar wrote:
| The answer is somewhat hand-wavy, because this code doesn't
| exist as anything except out-of-tree WIP code (and even in
| that case, incomplete). But yes, the plan is definitely to
| support a gradual, hopefully mostly seamless rollout.
|
| The design document for that is shipped as part of git.git,
| and available online. Here's the relevant part:
| https://git-scm.com/docs/hash-function-transition/#_translat...
|
| Basically the idea is that you'd have, say, a SHA-256 local
| repository, and talk to a SHA-1 upstream server. Each time
| you'd "pull" or "push" we'd "rehash" the content (which we do
| anyway, even when using just one hash).
|
| The interop-specific magic (covered in that documentation) is
| that we'd use a translation table, so you could e.g. "git
| show" on a SHA-1 object ID, and we'd be able to serve up the
| locally packed SHA-256 content as a result.
|
| But the hard parts of this still need to be worked out, and
| problems shaken out. E.g. for hosting providers what you get
| when you "git clone" is an already-hashed *.pack file that's
| mostly served up as-is from disk. For simultaneously serving
| clients of both hash formats you'd essentially need to double
| your storage space.
|
| There's also been past in-person developer meet-up discussion
| (the last one being before Covid, the next one in fall this
| year) about the gritty details of how such a translation
| table will function exactly.
|
| E.g. if linux.git switches they'd probably want a "flag day"
| where they'd transition 100% to SHA-256, but many clients
| would still probably want the SHA-1<->SHA-256 translation
| table kept around for older commits, to e.g. look up hash
| references from something like the mailing list archive, or
| old comments in ticketing systems.
|
| Currently the answer to how that'll work exactly is that
| we'll see when someone submits completed patches for that
| sort of functionality, and doubtless issues & edge cases will
| emerge that we didn't or couldn't expect until the rubber
| hits the road.
| tux3 wrote:
| Has there been any feedback/communication with forges
| happening, on or off-list?
|
| I'm curious how closely (if at all) they've been following this
| effort
| lalaland1125 wrote:
| Have you considered moving over to a combined SHA-1, SHA-256
| model where both hashes are calculated, with SHA-1 shown to the
| user and SHA-256 only used in the background to prevent
| collisions?
|
| There is a compute cost for that, but it should be minimal
| relative to the security benefits?
| corbet wrote:
| It's nice to see LWN on HN for the second time in one day, but
| please remember: it is only LWN subscribers that make this kind
| of writing possible. If you are enjoying it, please consider
| becoming a subscriber yourself -- or, even better, getting your
| employer to subscribe.
| O__________O wrote:
| For ease of reference, here is the link to subscribe, which
| includes a description of the benefits:
|
| https://lwn.net/subscribe/Info
|
| And the Wikipedia page for LWN, if you're not familiar with it:
|
| https://en.m.wikipedia.org/wiki/LWN.net
| williadc wrote:
| Googlers can subscribe through work by visiting go/lwn and
| following the instructions.
| jra_samba wrote:
| Just want to second this! Please subscribe to lwn. I learn new
| things from lwn every week. It's really worth the money.
| cockhole_desu wrote:
| O__________O wrote:
| Anyone aware of any exploits tied to the SHA-1 weakness in the wild?
|
| (I have seen proofs of concept [1], but never actually heard of
| an exploit in the wild using it; for example, on: digital
| certificate signatures, email PGP/GPG signatures, software vendor
| signatures, software updates, ISO checksums, backup systems,
| deduplication systems, Git, etc.)
|
| [1] https://shattered.io/
| bawolff wrote:
| Most security critical systems have switched to sha256 at this
| point, and making a fresh collision still costs tens of
| thousands, so people arent really doing it for kicks (that
| said, once you have one collision you can reuse it for free as
| long as you keep the same prefix, so the proof of concept can
| be repurposed with certain constraints).
|
| The most in-the-wild one I have ever heard of was when webkit
| accidentally broke their svn repo by checking in a collision.
|
| However you can look at the history of md5 which had a similar
| flaw which was exploited by the flame malware.
| O__________O wrote:
| Thanks, agree the Flame's use of a collision attack was both
| comparable and notable:
|
| https://en.m.wikipedia.org/wiki/Flame_(malware)
| password4321 wrote:
| Applications of that collision:
|
| https://twitter.com/rauchg/status/834770508633694208 > _a SHA-1
| "Pinata" [...] claimed_
|
| https://news.ycombinator.com/item?id=13723892 > _Make your own
| colliding PDFs_
|
| https://news.ycombinator.com/item?id=13917990 > _Collision
| Detection_
| slim wrote:
| Why didn't Linux migrate their repo to SHA-256? Is it that
| difficult to migrate a repo?
| ivoras wrote:
| Is there an explanation of what would go wrong with the naive
| approach? E.g.:
|
| - Change the binary file format in repos to support arbitrary
| hash algorithms, in a way which unambiguously makes old software
| fail.
|
| - Increment the Git major version number to 3.0
|
| - Make the new version support both the old version repos and the
| new ones. Make it a per-repo config item that allows/disallows
| old/new hash formats. In theory, there's nothing wrong with
| having objects hashed with mixed algorithms as long as the
| software knows how to deal with that.
|
| - The old format will probably have to be supported forever
| because of Linux.
|
| Most user-facing utilities don't care what the hash algo actually
| is, they just use the hash as an opaque string.
| runeks wrote:
| Releasing new software is the simple part. The problem is that
| versioning is lacking in the old software, and therefore it
| doesn't know how to talk to the new software. So for the old
| software there's no difference between "invalid data" and "I'm
| too old, please upgrade me".
| dingleberry420 wrote:
| > So for the old software there's no difference between
| "invalid data" and "I'm too old, please upgrade me".
|
| And why is this an issue? Release the new version that can
| read new repo formats, but doesn't write them yet. Wait a
| year. Release new version that can write new repo formats and
| encourage users to upgrade.
|
| Anyone who hasn't upgraded in the past year probably doesn't
| care about security and should be left behind. Besides, once
| they google the error message they'll figure it out soon
| enough. It's not like git is known for its great UX anyway.
| [deleted]
| kzrdude wrote:
| All of what you wrote, except the version bump, is already
| implemented. It's the nicer features that are missing, the nice
| migration path.
| kelnos wrote:
| > _In theory, there 's nothing wrong with having objects hashed
| with mixed algorithms as long as the software knows how to deal
| with that._
|
| That's an interesting idea, actually. I'm not sure they plan to
| support that, though? That would make things a lot easier on
| existing repositories; without support for mixed hashes, repos
| would have to have their history entirely rewritten, which
| would invalidate things like signed commits/tags.
| TillE wrote:
| Bjarmason has a good response about the practicalities of an
| attack; it explains why a "broken" hash is rarely a running-
| around-with-your-hair-on-fire level emergency. It would clearly
| be better to use a better hash, but is it actually urgent for
| anyone? Probably not.
| encryptluks2 wrote:
| It seems like something more modern, like b3sum would be
| better... no? What about b2sum?
| kortex wrote:
| I love the performance of blake3 but my understanding is it's
| still a bit of the new kid on the block. Blake2 derives from
| BLAKE, a SHA-3 finalist, so it should be perfectly sufficient;
| plus it has variable digest sizes, is reasonably fast, and has
| other nice features.
|
| Either way, anything relying on hashes for data integrity
| should at least be flexible to the option of multiple hash
| algos. But with git, it's going to be hard enough as is to
| change to SHA-256, and I don't know how parametric it'll be.
| MrStonedOne wrote:
| bradhe wrote:
| Does the usage of SHA-1 in Git actually have security
| implications, though? It's basically only used to generate
| addresses for refs and hunks and all that.
| nonameiguess wrote:
| It's difficult to exploit, but possible.
|
| I think the actual issue here is environment accreditation not
| allowing the use of sha-1 at all, but that is still rare. It'll
| become a much larger issue if a future FIPS standard ever
| disallows sha-1, because that will impact a ton of
| environments. It means git won't even work on your servers any
| more.
| saghm wrote:
| I don't think it does; sure, someone could potentially craft a
| malicious commit that causes a SHA1 collision in your repo, but
| I think if you are merging commits from malicious authors,
| you've got way bigger problems than that.
| corbet wrote:
| ...and if you're merging commits from a developer who,
| unknown to either of you, had their laptop compromised and
| their repo corrupted? Remember that the compromise of
| kernel.org happened via a developer's laptop, and it was only
| the security of the hash chains that preserved confidence in
| the repositories stored there.
|
| As noted in the article, an SHA-1 collision attack does not
| appear practical now, but that is a situation that can
| change.
| shakna wrote:
| GitHub actually makes pull requests available as an unlisted
| part of the original repository under refs/pull/$PR/head and
| refs/pull/$PR/merge, which allows a malicious author to add
| themselves to your index, without your involvement.
|
| Not to say that this attack is in any way practical, yet.
| Just that some providers don't require active involvement to
| try and attempt it.
| mmastrac wrote:
| If a repo accepts third-party contributions, you can create a
| split brain where half the people see one set of contents and
| the others see a different set, but the same hashes are
| available.
|
| I don't know if this would survive additional commits on top as
| I'm not familiar enough with git's internals.
| progval wrote:
| It will survive it until someone touches the affected blob,
| then they'll converge to the version that person has.
| blakesterz wrote:
| The article does address that:
|
| "Given the threat that the SHA-1 hash poses, one might think
| that there would be a stronger incentive for somebody to
| support this work. But, as Bjarmason continued, that incentive
| is not actually all that strong. The project adopted the
| SHA-1DC variant of SHA-1 for the 2.13 release in 2017, which
| makes the project more robust against the known SHA-1 collision
| attacks, so there does not appear to be any sort of imminent
| threat of this type of attack against Git. Even if creating a
| collision were feasible for an attacker, Bjarmason pointed out,
| that is only the first step in the development of a successful
| attack. Finding a collision of any type is hard; finding one
| that is still working code, that has the functionality the
| attacker is after, and that looks reasonable to both humans and
| compilers is quite a bit harder -- if it is possible at all."
| avar wrote:
| (I'm the "Bjarmason" quoted in the article)
|
| To elaborate a bit: One thing that makes a viable attack
| against Git especially hard is that, aside from the hash it's
| using, Git has a behavior of never replacing an already hashed
| object[1].
|
| So let's say I have a tool that can take a given file & SHA-1
| pair and produce a collision, the next step is quite hard. I
| could in this scenario produce a file with an exploit whose
| hash matches that of Linus's kernel/pid.c or whatever.
|
| But how do I get that object to propagate among forks of
| linux.git to distribute my exploited code?
|
| If I e.g. push it to a fork of linux.git on a hosting
| provider that Linus uses, the remote "git-index-pack"
| process will hash my colliding object, but before it stores
| it, it will check whether such an object ID already exists in
| its object store; if it does, it'll drop it on the floor. You
| don't need to store data you've already got in a content-
| addressable filesystem.
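|
| The property at work is roughly this (a toy content-addressable
| store, not Git's actual object database code):
|
|     package main
|
|     import (
|         "crypto/sha1"
|         "fmt"
|     )
|
|     // store keys each object by its hash; an incoming object
|     // whose ID is already present is simply dropped, so a
|     // later colliding "evil twin" can never displace the
|     // bytes that arrived first.
|     type store map[[sha1.Size]byte][]byte
|
|     func (s store) add(obj []byte) [sha1.Size]byte {
|         id := sha1.Sum(obj)
|         if _, ok := s[id]; !ok {
|             s[id] = obj
|         }
|         return id
|     }
|
|     func main() {
|         s := store{}
|         id := s.add([]byte("the original kernel/pid.c"))
|         // A pushed object with the same ID but different bytes
|         // (a collision) would hit the "already present" branch
|         // and the original content would be kept.
|         fmt.Printf("%x\n", id)
|     }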
|
| Which is not to say that a hash collision is a non-issue, and
| Git should certainly be migrating from SHA-1. There's no
| disagreement about that in the Git development community.
|
| But it matters for how much you should panic how the software
| you're using could be exploited in the case of a hash
| collision.
|
| Also, the scenario above presupposes a preimage attack, which
| is a much worse attack on a hash function than a collision
| attack. Currently no viable preimage attack on SHA-1 exists,
| only a collision attack.
|
| Which means that before any of the above I'd have to have
| produced a viable version of say kernel/pid.c that Linus was
| willing to merge, knowing that my evil twin of that version
| is something I intended to exploit people with.
|
| Then I'd need to patiently wait for that version to make it
| into a release, knowing that even a one-byte change to the
| file would foil my plans...
|
| 1. On the topic of running with scissors: I wrote a patch to
| disable that collision check for an ex-employer; it helped in
| that I/O-bound setup, and we were confident that the lessened
| security was a non-issue for us in _that_ particular setup.
| The patch never made it into git's mainline. The patch won't
| apply anymore, but the embedded docs elaborate on the topic:
| https://lore.kernel.org/git/20181113201910.11518-1-avarab@gm.
| ..
| mvkg wrote:
| Regarding the collision attack replacement check, do you
| know if that is carried over into other git implementations
| (e.g. libgit2)?
| avar wrote:
| I had to look, but in the case of libgit2 yes they have.
| Like git they have a way to select SHA-1 backends, and
| the default is the SHA1DC library.
|
| But, even supposing a libgit2 that didn't use SHA1DC I
| think most users would be protected in practice if the
| "git" they use used SHA1DC. Hosting providers, local
| editors etc. use libgit2 for a lot of things, but I think
| in most cases (certainly in the case of the popular
| hosting providers) it's some version of "/usr/bin/git"
| that's handling your push, and actually propagating your
| objects.
|
| For stopping a colliding hash it's enough that any part
| of the chain of propagation is able to stop it.
| Salgat wrote:
| From what I've heard it's as simple as injecting the
| necessary garbage into a comment to fit the required hash for
| modified code.
| jandrese wrote:
| The comment full of random garbage will probably look weird
| to a human, but by the time a person is looking at the code
| it will probably be too late.
|
| But you could also hide it as a fake lookup table or inline
| XPM or something like that.
| prepend wrote:
| > as simple as injecting the necessary garbage into a
| comment to fit the required hash for modified code.
|
| This seems true yet there are no demos or documented
| attacks using this method.
|
| I think practically speaking it's kind of a pain to do.
| bawolff wrote:
| There is a big difference between having 2 files with the
| same garbage comment but different content that have the
| same hash, and creating a new file that has a garbage
| comment and has the same hash as some other file not chosen
| by the attacker (preimage vs collision).
|
| Sha1 has a collision attack. We are far away from a
| preimage attack
| layer8 wrote:
| There is a middle course: You could get a pull request
| accepted with good content, but including a sensible
| comment whose exact wording you can choose, so later you
| can replace the contents of that commit with malicious
| code and a garbage comment. Such a collision is easier to
| create than a preimage attack (because you have _some_
| control over the preimage), but harder than if you could
| choose the preimage arbitrarily (which wouldn't be
| accepted in the pull request). I admit that I have no
| idea how to quantify the difference in difficulty.
| Zamicol wrote:
| This is concerning from a signing perspective.
|
| Example: `git commit -a -S -m 'signed commit'` signs the SHA-1
| hash directly.
|
| Even if the SHA-1 digest is rehashed with a secure hashing
| algorithm, SHA-256, it would hide the fact that the reference
| is to an insecure hashing algorithm. The project itself needs
| to be rehashed with a secure hashing algorithm for signing to
| be secure.
| Dylan16807 wrote:
| It's more complicated than that. If the most recent
| signatures are entirely based on SHA-256, and you trust those
| signatures sufficiently, then they act as protection for all
| ancestor commits. In that case a SHA1-based signature on an
| older commit isn't a big deal.
| Zamicol wrote:
| >then they act as protection for all ancestor commits
|
| How does that work? My understanding was that a git gpg
| signature only signs the project at that commit state.
|
| It says nothing about past (or future) commits outside of a
| digest reference to past commits, which if that digest
| wasn't upgraded, would be considered insecure.
|
| Said another way: Git does not rehash past commits, or the
| present commit, when gpg signing. A commit itself only
| includes the SHA-1 digest of the previous commit.
| layer8 wrote:
| You are correct. In the AdES signature world, the
| solution is to have a cryptographic (signed) timestamp
| using a newer hash algorithm that rehashes all previous
| commits, and to include that timestamp into a new commit.
| When verifying the hashes of old commits, the software
| would verify that those are covered by an appropriate
| timestamp that proves that they were created before the
| old hash algorithm was considered too weak.
|
| This is very similar to the following: Instead of
| rehashing, i.e. replacing old hashes with new hashes, add
| the new hashes alongside the old ones, and sign the new
| hashes, together with the time mark, by a trusted
| authority. The old hashes and signatures then remain
| valid indefinitely as long as the new hashes and
| signatures are verified successfully.
| Dylan16807 wrote:
| If you convert a repo to SHA-256, then surely it will
| recalculate all the hashes back to the start, right?
| Otherwise that's not a conversion. And then new
| signatures will use a hash that's SHA-256 all the way
| down.
|
| The old signatures will still be SHA-1. But if you try to
| replace any part of a commit, the SHA-256 won't match. So
| the combination of "the commit is an ancestor of multiple
| securely signed commits in this repo" and "the SHA1 on
| the signature matches" is enough to know you have the
| right data in most use cases.
| sdfhdhjdw3 wrote:
| > Even if creating a collision were feasible for an attacker,
| Bjarmason pointed out, that is only the first step in the
| development of a successful attack. Finding a collision of any
| type is hard; finding one that is still working code, that has
| the functionality the attacker is after, and that looks
| reasonable to both humans and compilers is quite a bit harder --
| if it is possible at all.
|
| Sounds like there's money in this.
| simias wrote:
| It's frankly amateurish for the git dev to delay this. The longer
| this lasts, the more painful it'll be when the switch finally
| takes place.
|
| Linus shouldn't have used SHA-1 in the first place, it was
| already being deprecated by the time git got its original
| release. Then every time a new milestone is reached to break
| SHA-1 we see the same rationalization about how it's not a big
| deal and it's not a direct threat to git and blablabla.
|
| It'll keep not mattering until it matters, and the longer they
| wait the more churn it'll create. Let's rip off the bandaid
| that's been hanging there for over 15 years now.
| hinkley wrote:
| I worked on code signing for civilian aviation years ago and
| there were people trying to pressure me into supporting MD5 and
| SHA-1 signatures. I told the first group to jump off a cliff,
| and the second group got a firm no. The first papers on
| theoretical SHA-1 attacks had already been published, we were
| still a couple years out from active use, and people were
| already beginning to talk about starting to organize the SHA-3
| process.
|
| Once a system expects to handle SHA-1, then you have to deal
| with old assets that have deprecated signatures, and that's a
| fight I 1) didn't want to have and 2) was fairly sure I
| wouldn't be around to win.
|
| Git was still brand new, largely unproven at that point, and I
| don't understand why he picked SHA-1.
| runeks wrote:
| > Linus shouldn't have used SHA-1 in the first place, it was
| already being deprecated by the time git got its original
| release.
|
| Using SHA-1 to begin with was fine. However, commit hashes
| should have been prepended with a version byte to make it
| easier to transition to the next hash algorithm.
|
| This would mean an old Git client could report an error to the
| user of the nature "please upgrade your software to support
| cloning from this Git server" instead of failing with an error
| that's indistinguishable from "the Git server is broken" when trying
| to clone a Git repo using SHA-256.
| jackweirdy wrote:
| There's already a version byte: if it's [0-9a-f], that's
| version 1 ;)
| LeifCarrotson wrote:
| That's a 4-bit nibble, the version byte is 0x00 to 0xFF.
| layer8 wrote:
| The problem is not a missing version byte. SHA-256 is
| trivially distinguishable from SHA-1 by hash length. The
| problem is that the length of a SHA-1 hash (20 bytes) is
| (or was) hardcoded in too many places.
| simias wrote:
| By the time Git was first released the first attacks on SHA-1
| had already been published, but I agree with your general
| point about allowing for backward compatible updates.
| wahern wrote:
| Linus' original excuse for using SHA-1 was that Git hash trees
| and hash identifiers were never meant to be cryptographically
| secure. GnuPG signing support, the popular belief that Git
| trees had a strong security property, etc, came afterward,
| along with increasingly awkward excuse-making.
|
| So strictly speaking Linus and subsequent maintainers weren't
| being amateurish in the beginning. (You didn't say that
| explicitly, but it would be a fair criticism given what was
| known about SHA-1 at the time, including known by Linus--he
| knew and made a choice.) Rather, in the beginning it was
| naivety in believing that people wouldn't begin to depend on
| Git's apparent security properties.
| jopsen wrote:
| Yeah, in hindsight maybe he should have made his own 160-bit
| CRC variant :)
|
| Honestly, I think it's fair to say that the hashes weren't
| meant to be a security feature.
|
| But signed tags/commits/etc. probably need a better hash.
___________________________________________________________________
(page generated 2022-06-23 23:00 UTC)