[HN Gopher] Whatever happened to SHA-256 support in Git?
___________________________________________________________________
Whatever happened to SHA-256 support in Git?
Author : chmaynard
Score : 267 points
Date : 2022-06-23 16:47 UTC (6 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| yjftsjthsd-h wrote:
| > In his view, the only "defensible" reason to use SHA-1 at this
| point is interoperability with the Git forge providers.
|
| Okay, but that's a pretty big reason! A git repo that can't be
| pushed to github/lab is... not always useless, but certainly
| extremely impaired.
| kragen wrote:
| In case anyone has forgotten, the process for pushing it to
| your own server is three shell commands. You run, on the
| server:
|
|     git init --bare public_html/mything.git
|     cd public_html/mything.git/hooks/
|     mv post-update.sample post-update  # runs git update-server-info on push
|
| (This assumes that your public_html directory exists and is
| mapped into webspace, as with the usual configuration of
| Apache, NCSA httpd, and CERN httpd. If you don't have an
| account on such a thing you can get such PHP shared hosting
| accounts with shell access anywhere in the world for a dollar
| or two a month.)
|
| And then on your dev machine, it's precisely the same as for
| pushing to Gitlab or whatever, except that you use your own
| username instead of git@:
|
|     git remote add someremotename user@myserver:public_html/mything.git
|     git push -u someremotename master  # assuming you want it to be your upstream
|
| Then anyone can clone from your repo with a command like this:
| git clone https://myserver/~user/mything.git
|
| They can also add the URL as a remote for pulls.
|
| If you want them to be able to push, you'll need to give them
| an account on the same server and either set umasks and group
| ownerships and permissions appropriately or set a POSIX ACL.
| Alternatively they can do the same thing on their server and
| you can pull from it. There are reportedly permission bugs in
| recent versions of Git (the last five years) that prevent this
| from being safe with people you don't trust
| (https://www.spinics.net/lists/git/msg298544.html).
|
| Of course source control is only part of the overall
| development project workflow, so for many purposes adding
| SHA-256 support to Gogs or Gitlab or Gitea or sr.ht is probably
| pretty important: you want a Wiki and CI integration and bug
| tracking and merge requests. But the _git repo_ still works
| fine with a bog-standard ssh and HTTP server, though slightly
| less efficiently. It's _easier_ than setting up a new repo on
| GitLab etc.
|
| Running a git repack -an && git update-server-info in the repo
| on the server can help a lot with the efficiency, and for
| having a browseable tree on the server as well as a clonable
| repo I put this script at
| http://canonical.org/~kragen/sw/dev3.git/hooks/post-update:
|
|     #!/bin/sh
|     set -e
|     echo -n 'updating... '
|     git update-server-info
|     echo 'done. going to dev3'
|     cd /home/kragen/public_html/sw/dev3
|     echo -n 'pulling... '
|     env -u GIT_DIR git pull
|     echo -n 'updating... '
|     env -u GIT_DIR git update-server-info
|     echo 'done.'
|
| That's very far from being GitLab (contrast
| http://canonical.org/~kragen/sw/dev3 with any GitHub tree
| view), and it's potentially dangerously powerful: if you're
| doing this in a repo where you pull from other people, and the
| server is configured to run PHP files or server-side includes
| in your webspace (mine isn't!) or CGI scripts (mine is!), then
| just dropping a file in the repo can run programs on the server
| with your account privileges. This is great if that's what you
| want, and it's a hell of a lot better than updating your PHP
| site over FTP, but that code has full authority to, for
| example, rewrite your Git history.
|
| In theory you can do other things from your post-update hook as
| well, like rebuild a Jekyll site, send a message on IRC or some
| other message queueing system, or fire off a CI build in a
| Docker container. (Some of these would run afoul of guardrails
| common in cheap PHP shared hosting providers and you'd have to
| upgrade to a US$5/month VPS.)
| isomorphic wrote:
| People also forget about Gitolite, which provides lightweight
| shared access control around Git+SSH+server-repos. For me
| it's a much simpler alternative than systems with a
| heavyweight web UI. Although to be honest I don't know
| whether Gitolite handles SHA256 hashes (I've never tested
| it).
|
| https://gitolite.com
|
| https://github.com/sitaramc/gitolite
| kragen wrote:
| I did forget about Gitolite! Thanks for the reminder! Do
| you have suggestions for what sorts of CI tooling and bug
| trackers people might want to use with it?
| armada651 wrote:
| > Adding my own 0.02, what some of us are facing is resistance to
| adopting git in our or client organizations because of the
| presence of SHA-1. There are organizations where SHA-1 is blanket
| banned across the board - regardless of its use. [...] Getting
| around this blanket ban is a serious amount of work and I have
| very recently seen customers move to older much less functional
| (or useful) VCS platforms just because of SHA-1.
|
| Seems like this company could just use the current SHA-256
| support then? Especially if it's the type of company that does
| all its development in-house and there's no need for SHA-1
| interoperability.
| gorkish wrote:
| > There are organizations where SHA-1 is blanket banned across
| the board - regardless of its use.
|
| > I have very recently seen customers move to older much less
| functional (or useful) VCS platforms just because of SHA-1.
|
| A company this dysfunctional has problems far beyond their
| choice of revision control system.
| bostik wrote:
| I can name a couple of industries where compliance (and their
| enforcement arm, security[0]) teams require N+1 different
| monitoring and enforcement agents on all systems because
| Compliance[TM]. Due to these agents the systems' _IDLE_ load
| is approaching 1.00 - on a good day. On a less good day you
| need four cores to have one of them available for workload
| processing.
|
| 0: I use the word "security" only because the teams
| themselves are named like that. You can probably infer my
| opinion from the tone.
| the_biot wrote:
| I definitely see your point -- who hasn't seen or heard of
| companies ruined by officious rulemakers with no clue, or by
| rules meant to make something more secure that do the exact
| opposite? I've seen my share.
|
| But blanket-banning an obsolete and insecure hash algorithm
| isn't a bad thing, it's entirely reasonable. In this case, as
| the article makes clear, it's git that's at fault.
| cratermoon wrote:
| Except said company likely uses one of the Git forge providers,
| either in-house or as a SaaS, as the (oxymoronic for git)
| central repo. Until they support SHA-256, or the company goes
| with its own git repo solution that is set up for it,
| companies won't make the move.
| wepple wrote:
| Not just git forge but probably the myriad other ancillary
| tools that assume SHA1
| skissane wrote:
| > > There are organizations where SHA-1 is blanket banned
| across the board - regardless of its use.
|
| Reminds me of the time a security audit (which literally just
| involved running some scanning tool and dumping the results on
| us) complained that some code I had written was using MD5 - but
| in a use case in which we weren't relying on it for any
| security purposes. I ended up replacing MD5 with CRC-32 - which
| is even weaker than MD5, but made the security scanning tool
| mark the issue as remediated. It was easier than trying to
| argue that it was a false positive.
| bawolff wrote:
| Honestly, this isn't a bad idea.
|
| The big problem with using sha1/md5 in non-secure contexts
| is:
|
| * Someone later might think it's secure and rely on that when
| extending the system.
|
| * It can make it difficult for security people to audit code
| later, as you have to figure out whether each usage is
| security critical.
|
| Using a non-crypto hash makes both of those concerns go away,
| since everyone knows crc32 is insecure. The alternative of
| using sha256 also works (performance-wise it is close enough,
| so why not just use the secure one and be done with it?).
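|
| As a toy illustration of the two options (just a sketch, not
| code from anything in this thread), both are one-liners with
| Go's standard library:
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "fmt"
|         "hash/crc32"
|     )
|
|     func main() {
|         data := []byte("some file contents")
|
|         // Obviously non-cryptographic checksum: nobody will
|         // mistake this for a security control.
|         fmt.Printf("crc32:  %08x\n", crc32.ChecksumIEEE(data))
|
|         // Or pay the small cost of a real cryptographic hash
|         // and never think about it again.
|         fmt.Printf("sha256: %x\n", sha256.Sum256(data))
|     }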
| harryvederci wrote:
| Relevant quote from the Fossil website[0]:
|
| "Fossil started out using 160-bit SHA-1 hashes to identify check-
| ins, just as in Git. That changed in early 2017 when news of the
| SHAttered attack broke, demonstrating that SHA-1 collisions were
| now practical to create. Two weeks later, the creator of Fossil
| delivered a new release allowing a clean migration to 256-bit
| SHA-3 with full backwards compatibility to old SHA-1 based
| repositories. [...] Meanwhile, the Git community took until
| August 2018 to publish their first plan for solving the same
| problem by moving to SHA-256, a variant of the older SHA-2
| algorithm. As of this writing in February 2020, that plan hasn't
| been implemented, as far as this author is aware, but there is
| now a competing SHA-256 based plan which requires complete
| repository conversion from SHA-1 to SHA-256, breaking all public
| hashes in the repo."
|
| [0]: https://fossil-scm.org/home/doc/trunk/www/fossil-v-
| git.wiki#...
| ludwigvan wrote:
| Migrations are easier when you are the only one using your
| software. :p
|
| Joking aside, expected from a developer whose work is the
| recommended storage format for the Library of Congress.
| Zamicol wrote:
| This is one of the reasons why Go has its own versioning system.
| From a project's `go.sum`:
|
| example.com/example v0.0.0-20171218180944-5ea4d0ddac55
| h1:jbGlDKdzAZ92NzK65hUP98ri0/r50vVVvmZsFP/nIqo=
|
| Where "h1" is an upgradeable hash (h1 is SHA-256). If there's
| ever a problem with h1, the hash can be simply upgraded.
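|
| A rough Go sketch of the general idea (illustration only: the
| real "h1" in go.sum covers a whole file tree, not a single
| blob, and the helper name here is made up):
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "encoding/base64"
|         "fmt"
|     )
|
|     // versionedHash tags a digest with an algorithm version so
|     // the algorithm can later be swapped ("h2:", "h3:", ...)
|     // without any ambiguity about how a given value was made.
|     func versionedHash(data []byte) string {
|         sum := sha256.Sum256(data)
|         return "h1:" + base64.StdEncoding.EncodeToString(sum[:])
|     }
|
|     func main() {
|         fmt.Println(versionedHash([]byte("module contents")))
|     }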
|
| Git's documentation describes how to sign a git commit:
|
| $ git commit -a -S -m 'signed commit'
|
| When signing a git commit using the built-in gpg support, the
| project is not rehashed with a secure hash function, like SHA-256
| or SHA3-256. Instead gpg signs the SHA-1 commit digest directly.
| It's not signing the result of a secure hash algorithm.
|
| SHA-1 has been considered weak for a long time (about 17 years).
| Bruce Schneier warned in February 2005 that SHA-1 needed to be
| replaced. Git development didn't start until April 2005. Before
| git started development, SHA-1 was identified as needing
| deprecation.
| Groxx wrote:
| There's no need to explicitly version your first version of
| this though. Those first-version values are easy to identify:
| they don't contain versioning information :)
|
| E.g. say you have `5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8`.
| What version is that?
|
| Well. It's exactly as long as a SHA1 hash. It doesn't start
| with "sha256:" or "md5:" or "h1:" or "rot13:". So it's SHA1.
| Easy and totally unambiguous.
|
| Versioning can almost always begin with version 2.
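|
| A hedged Go sketch of that detection rule (the prefixes and
| the function name are made up for illustration):
|
|     package main
|
|     import (
|         "fmt"
|         "strings"
|     )
|
|     // hashVersion guesses how a stored hash string was made:
|     // anything without a known prefix is treated as the
|     // legacy, unversioned form.
|     func hashVersion(s string) string {
|         switch {
|         case strings.HasPrefix(s, "sha256:"):
|             return "sha256"
|         case strings.HasPrefix(s, "h1:"):
|             return "h1"
|         case len(s) == 40: // bare 40 hex chars: assume SHA1
|             return "sha1"
|         default:
|             return "unknown"
|         }
|     }
|
|     func main() {
|         fmt.Println(hashVersion(
|             "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8")) // sha1
|     }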
| morelisp wrote:
| Me, sowing: "Each record begins with a 4 octet BE value
| indicating the record length."
|
| Me, reaping: "Each record begins with a single byte
| indicating the record format version. In version 0, this is
| followed by a 3 octet BE value indicating the record length."
| nh23423fefe wrote:
| sow then reap
| morelisp wrote:
| Well this fucking sucks. What the fuck.
| Groxx wrote:
| if you're storing the raw binary rather than hex or base64:
| yeah. there are often no illegal values, so there's no way
| to safely extend it, unless you can differentiate on
| length.
|
| for those, you have to leave versioning room up-front. even
| 1 bit is enough, since a `1` can imply "following data
| describes the version", if a bit wastefully in the long
| run.
| kazinator wrote:
| That's not applicable to Groxx's example. The initial
| version uses only hexadecimal digits for the SHA256.
|
| If you had: "each record begins with an 8 character record
| length, in hexadecimal, giving 32 bits", you have no
| problems. The new version has a 'V' character in byte 0,
| which is rejected as invalid by the old implementation.
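|
| A small Go sketch of that parse (the header layout here is
| invented for illustration):
|
|     package main
|
|     import (
|         "fmt"
|         "strconv"
|         "strings"
|     )
|
|     // parseHeader reads an 8-character record header. In the
|     // old format it is a hex length; a leading 'V' (invalid
|     // hex, so old readers reject it) marks a versioned format.
|     func parseHeader(h string) (int, uint64, error) {
|         if strings.HasPrefix(h, "V") {
|             v, err := strconv.Atoi(h[1:2])
|             return v, 0, err
|         }
|         n, err := strconv.ParseUint(h, 16, 32)
|         return 1, n, err
|     }
|
|     func main() {
|         fmt.Println(parseHeader("000000ff")) // version 1, length 255
|         fmt.Println(parseHeader("V2000000")) // version 2
|     }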
| morelisp wrote:
| Love too put another branch in the decoder I need to run
| a billion times.
| guipsp wrote:
| I beg you: please clone git, do the changes, and
| benchmark them. I bet you won't be able to obtain a
| statistically significant result from this single branch.
| bawolff wrote:
| Versioning hashes is definitely not a new idea with go - just
| look at how unix stores password hashes.
| barsonme wrote:
| The author of the comment did not imply this.
| lewisl9029 wrote:
| Also check out multihash from the IPFS folks:
| https://github.com/multiformats/multihash
|
| It's a more robust, well-specified, interoperable version of
| this concept.
|
| Though it's probably overkill if you control both the consumer
| and producer side (i.e. don't need the interoperability) and
| are just looking to make hash upgrades smoother, in that case a
| simple version prefix like Go's approach described above has
| lower overhead.
| kazinator wrote:
| Whenever the word "upgrade" rears its head, beware.
|
| The intent behind it is obsolescence and phasing out, resulting
| in an endless make-work treadmill for the users.
|
| If there is ever a "problem with h1", and you neglect to
| upgrade your data right there and then, then five to ten years
| later it will be unreadable.
| howinteresting wrote:
| What in the world are you talking about? Generally, systems
| with upgradeable hashes will remain backwards-compatible with
| old ones forever.
| kelnos wrote:
| I think the implications for Go are a bit different, though.
| It's a very simple matter to change the hash algorithm used for
| go.mod. Even if there was no hash version prefix, it's trivial
| to add one after the fact, though older tools would probably
| give a confusing error message without foreknowledge of the
| concept of an unrecognized hash algorithm. And adding a new
| hash algorithm is just a matter of writing a relatively small
| amount of code, and then probably waiting a few Go releases
| before making it the default and assuming most people will have
| it.
|
| Git's _entire foundation_ relies on SHA1 hashes. Each commit is
| its own hash, and contains a list of the hashes of all files
| that are a part of it. Branches have hashes, tags have hashes.
| Everything has a hash. A repository that uses a different hash
| algorithm is a completely different repository, even if the
| contents and commits are otherwise identical. You can't even
| _store_ your code on someone else's server (well, aside from
| manually copying the repository data over, though that won't be
| too useful) unless that server has upgraded their git version.
| samatman wrote:
| The counterpoint: Fossil did it, it was easy, no big deal.
|
| Well, Fossil's database is much better designed, you reply.
|
| That it is!
| er4hn wrote:
| Just to nit on your portion of signing: wouldn't you need to
| rehash all prior commits as well so that they used the better
| hash function? Otherwise someone could find a collision for a
| prior commit hashed with sha-1, slip that in, and the final
| commit being hashed with sha256 wouldn't matter.
|
| This then makes the signing code use its own form of hashing
| that is different from the rest of git's commit hashing, and
| seems like a novel way to introduce tooling issues / bugs /
| etc.
| chimeracoder wrote:
| > and the final commit being hashed with sha256 wouldn't
| matter.
|
| Git stores content, not diffs. So the signature verifies all
| content stored in that commit. It doesn't verify anything that
| came before it, unless those are specifically signed as well.
| ElectricalUnion wrote:
| > Git stores content, not diffs.
|
| But the "contents" is just pointers to tree roots with a
| trusted hash. If the hash is no longer secure, you can't
| guarantee that any such trees are your content, or safe.
| YesThatTom2 wrote:
| GitHub won't feel any heat about this until Microsoft salespeople
| start demanding it.
|
| I've added to my todo list a reminder to raise this issue with
| mine. In fact, I'm going to give them a deadline for when we will
| start evaluating competitors that do support SHA256.
|
| I suspect that most people on HN do not interact with their MS
| account team. That relationship is probably managed by your CIO
| or IT department. They probably have monthly or quarterly
| "business review" meetings. You should get this issue on the
| agenda of that meeting.
| codazoda wrote:
| Is there something special about GitHub on this? This seems
| like a Git issue and not a GitHub issue to me; unless I'm
| missing something.
| lucb1e wrote:
| They don't accept pushes of repositories in that format.
|
| The article says "none of the Git hosting providers appear to
| be supporting SHA-256", and while GH is not mentioned by name
| (and I applaud them for indeed not strengthening this "git ==
| github-the-brand" trap), I can't imagine GH was left out of
| scope when checking the major hosting providers.
| evil-olive wrote:
| as the article says, you can create a local git repository
| with SHA-256 hashes today, and it should work fine...but the
| moment you try to push your repo up to Github, you'll hit a
| brick wall.
|
| Gitlab also appears to be lacking support [0], and the same
| with Gitea [1].
|
| so it's a grey area where Git itself supports SHA-256-based
| repos, but without the major Git hosting services _also_
| supporting them, the support in core Git is somewhat useless.
|
| 0: https://gitlab.com/groups/gitlab-org/-/epics/794
|
| 1: https://github.com/go-gitea/gitea/issues/13794
| vulcan01 wrote:
| Git is not GitHub and GitHub is not Git. This article is about
| Git, the software, not GitHub, the Git hosting service.
| ElectricalUnion wrote:
| Git supports it, GitHub doesn't. People use forges, therefore
| they are misled to believe Git doesn't support it.
| chrisseaton wrote:
| Forges?
| lucb1e wrote:
| I also found that confusing in the article. I actually
| listened to it using text-to-speech and thought it was
| some brand, like sourceforge. But now I think they just
| mean any git hosting service.
| sroussey wrote:
| Code hosting was called a forge. Thus codeforge, etc.
| sdfhdhjdw3 wrote:
| Did you skip the bit that discusses hosting providers?
| lewisl9029 wrote:
| Just the other day, I was actually forced to downgrade the file
| hash used in the product I'm working on to sha1 in order to
| interact with GitHub's APIs efficiently (to avoid having to
| download the entire file just to recompute a sha256 for
| matching).
|
| Luckily I've versioned the internal hash so the upgrade path
| back to sha256 should be as smooth as the downgrade was. I'm
| still bitter about it though.
| sdfhdhjdw3 wrote:
| Thank you.
| bradhe wrote:
| girvo wrote:
| You sound like someone who didn't read the article.
|
| Git basically supports it already. GitHub et al do not, and
| that is what is holding it back.
| mdavidn wrote:
| I don't depend on the collision resistance of SHA-1 for the
| security of my git repos because I don't accept pushes from
| people I don't trust. If I did, objects with hash collisions
| would not be transferred or (I hope) accepted. Am I missing
| something?
|
| Granted, signed tags do depend on this collision resistance, but
| I don't use that feature. Signing entire releases from a trusted
| repo seems like a better approach.
| teraflop wrote:
| It's not just the pushes themselves; anyone who can create
| commits or blobs that _eventually_ get merged into your
| repository, directly or indirectly, can potentially engage in a
| collision attack.
|
| Sure, if you use git with a very closed development model, this
| doesn't necessarily affect you much. But it's (potentially) a
| big problem for collaborative open-source projects, because it
| requires trust in every single contributor. And the trust
| requirement can't necessarily be mitigated using ordinary means
| like code reviews.
| pornel wrote:
| Collision isn't spooky action at a distance. Even if they
| tricked the victim into accepting a file they have a
| collision for, they still can't do anything nefarious. The
| attack requires an opportunity to replace the colliding file
| with its evil twin, and that requires write access to the
| victim's repository or tricking the victim into re-fetching
| their files from an attacker-controlled repository.
|
| Besides, the known collision attack generates files with
| blocks of binary garbage, which makes it difficult to trick
| someone into accepting. It won't look like source code, and
| if someone accepts binary blobs of executable code, you don't
| need collisions to pwn them.
| pornel wrote:
| The worst thing about the SHA-1 collision is the tedium of
| explaining the difference between a collision attack and a
| preimage attack.
| heynowheynow wrote:
| It might be wiser to keep SHA1 and use SHA2, SHA3, etc. and GPG
| as overlays for compatibility and simplicity reasons.
| chmaynard wrote:
| Previous articles on this topic:
|
| _A new hash algorithm for Git_ https://lwn.net/Articles/811068/
|
| _Updating the Git protocol for SHA-256_
| https://lwn.net/Articles/823352/
| ainar-g wrote:
| The article mentions that "none of the Git hosting providers
| appear to be supporting SHA-256", but what about self-hosted
| solutions? In particular, sr.ht. Seems to be nothing[1] in their
| issue tracker.
|
| [1]: https://todo.sr.ht/~sircmpwn/git.sr.ht?search=sha-256
| oynqr wrote:
| How about https://todo.sr.ht/~sircmpwn/git.sr.ht?search=sha256
| ainar-g wrote:
| Hah! I guess there should be one about smarter search as
| well, heh. Thanks!
| WorldMaker wrote:
| Almost feels like by the time git finally transitions to SHA-256
| some bitcoin miner somewhere will have found a preimage weakness
| in SHA-256.
| le-mark wrote:
| Addition modulo 2^32 paired with xor is a motherfucker, i.e. a
| very difficult problem. That's not even considering rotation of
| intermediate results.
| jagger27 wrote:
| Thankfully existing Bitcoin ASICs don't pose much of a threat
| because they're only good for sha256(sha256(Bitcoin block)).
|
| If a practical pre-image attack on SHA-256 comes around we have
| bigger problems than git.
| WorldMaker wrote:
| Obviously the concern is not the ASICs themselves but the
| ASIC designers. (Using miners here in the colloquial sense of
| human collectives/corporations backing the machines rather than
| the specific sense of the raw machines themselves.)
|
| Yes, a practical preimage weakness in SHA-256 is a nightmare
| scenario with huge implications to the rest of internet
| security beyond just git. It's why I sometimes can't sleep at
| night knowing how much energy bitcoin spends daily on a
| continuous massively distributed partial preimage attack on
| SHA-256.
| marktangotango wrote:
| > how much energy bitcoin spends daily on a continuous
| massively distributed partial preimage attack on SHA-256.
|
| I would not be concerned about this. The way the asics
| operate is they discard the results. Also, the hashes are
| random strings which don't compress very well, so storing
| trillions upon trillions of them (for later analysis) is
| not practical.
| WorldMaker wrote:
| _Again_, the point of the fear is not the specifics of
| _current_ operations (ASIC details; which y'all are
| talking about as if all of the miners are using the same
| hardware), but the fear of _future_ operations and that
| there's an _enormous_ industrial preimage attack effort
| _at all_. One that we can see in real time, in global
| energy consumption graphs.
|
| Maybe you find "cold comfort" that because we can watch
| it in real time if someone discovers a weakness we will
| also watch its repercussions and the subsequent
| horrifying fall in real time, too, but I certainly don't.
| mjw1007 wrote:
| > All that is left is the hard work of making the transition to a
| new hash easy for users -- what could be thought of as "the other
| 90%" of the job.
|
| If that was all that was left, we could at least be using sha256
| for new repositories.
|
| It seems to me the big missing piece is support in libgit2, which
| is at least showing signs of progress:
|
| https://github.com/libgit2/libgit2/pull/6191
| xyzzy_plugh wrote:
| libgit2 isn't an official library, and even if it did support
| sha256 dependents would still need to update, so I really don't
| perceive this as a missing piece.
|
| If everyone started using sha256 then all these problems would
| be addressed practically overnight.
| wepple wrote:
| I was curious about the "sha1dc" that git uses and reportedly
| helps protect against collision attacks.
|
| Here's the paper:
| https://marc-stevens.nl/research/papers/C13-S.pdf
| donatj wrote:
| Potentially stupid question, would it be reasonable to use
| SHA-256 truncated to the first 40 digits?
|
| It seems like that could ease many of the migration problems,
| if it's not a problem?
| Zamicol wrote:
| I don't believe the length is a major issue. It's "upgrading"
| references to a new hashing algorithm that's the issue.
|
| If for some reason length was an issue, a base64 encoded 256
| bit string, like a SHA-256 digest, is 43 characters. That too
| can be truncated to 40 characters, which still retains 240
| bits of the digest. SHA-256 is not only a better hashing
| algorithm than SHA-1, but it could also result in higher
| effective security even when truncated.
| jjtheblunt wrote:
| that makes collisions more likely
| dspillett wrote:
| It makes random collisions more likely when comparing
| truncated SHA256 to pure SHA256, but given the collisions and
| pre-image attacks shown so far is truncated SHA256 still
| safer than SHA1 in that respect? I have seen an article that
| claimed so (sorry, I can't re-find it ATM so I can't offer it
| for criticism, if anyone else has good information either way
| please respond with relevant links), and it is immune to
| extension attacks which is a significant advantage if this is
| part of your threat sensitivity surface and SHA1 is used
| without other protective wrappers like HMAC.
| bawolff wrote:
| Truncated sha256 is safer than sha-1 (depending of course
| on how much you truncate it, but given the context let's assume
| truncating to the size of sha-1 - 160 bits).
|
| SHA-1 is quite broken at this point. SHA-256 is not. There
| aren't any practical non-generic attacks on full sha-256
| and thus there wouldn't be any on the truncated version.
| The Wikipedia article goes into the different attacks on
| the two algorithms.
|
| That said, if your concern is length extension attacks, I
| strongly recommend using sha-512/256 instead of trying to
| do your own custom thing.
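|
| A minimal Go sketch of both options discussed here, assuming
| all you want is a fixed-size identifier rather than a MAC:
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "crypto/sha512"
|         "fmt"
|     )
|
|     func main() {
|         data := []byte("some object contents")
|
|         // SHA-256 truncated to 160 bits (20 bytes), the size
|         // of a SHA-1 ID, and not practically length-extendable
|         // since most of the internal state stays hidden.
|         full := sha256.Sum256(data)
|         fmt.Printf("sha256/160: %x\n", full[:20])
|
|         // SHA-512/256: the standardized truncation, with its
|         // own IV, often faster on 64-bit CPUs without SHA-NI.
|         fmt.Printf("sha512/256: %x\n", sha512.Sum512_256(data))
|     }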
| pornel wrote:
| Sigh, no it doesn't in any meaningful way.
|
| 160 bit output, without a cryptographic weakness, is good for
| about 30 trillion commits per second continuously for 1000
| years.
|
| For SHA the cryptographic strength isn't primarily from the
| length of the hash, but from the internal number of rounds
| (e.g. 160-bit SHA-1 with fewer rounds was badly broken much
| earlier, and 160-bit SHA-1 with more rounds would be
| safer).
|
| Cryptographic hashes are designed to be safe to truncate and
| still have all the safety the truncated length can provide.
| It's basically a requirement for them being cryptographically
| strong. Even in the SHA-2 family, the SHA-224 and SHA-384 are
| just truncated versions of larger hashes.
| stingraycharles wrote:
| I found this, which says that the SHA algorithm allows for
| truncation:
| https://csrc.nist.gov/publications/detail/sp/800-107/rev-1/f...
| Dylan16807 wrote:
| Not just allows, it becomes more secure when you truncate.
| tatersolid wrote:
| Truncated SHA-* hashes are more secure against length-
| extension attacks, but are very much _less secure_ against
| collision and pre-image attacks (which are more important
| in most scenarios).
| Dylan16807 wrote:
| But also, 256 is overkill for collisions and pre-image.
|
| There's a point where truncating starts to make it
| weaker, but when you first start chopping off bytes the
| benefits outweigh the drawbacks.
| neon_electro wrote:
| Care to elaborate? This is not something I would've
| intuited.
| bawolff wrote:
| Presumably they are referring to length extension
| attacks. You can't pull them off if you truncate.
| https://en.m.wikipedia.org/wiki/Length_extension_attack
|
| Generally though length ext attacks have a solution -
| HMAC, which is much more secure than truncating.
|
| The more you truncate, the more vulnerable you are to
| birthday attacks (practically speaking you would have to
| truncate quite a lot)
| kzrdude wrote:
| The canonical solution is SHA-512/256, i.e. SHA-512 truncated
| to 256 bits, where "nothing is lost" compared to SHA-256
| and something is gained. It might even be faster (due to
| the 64-bit word formulation of SHA-512) in some
| implementations.
| NovemberWhiskey wrote:
| Generally it is faster (fewer rounds per byte). If you have
| 256 bits available for your hash and you're on a 64 bit
| architecture, I've yet to see a case where you're not
| better off for performance and security choosing
| SHA-512/256 over SHA-256, assuming you have the choice.
| ztorkelson wrote:
| Is this still true? I understood SHA256 to be faster than
| SHA512 due to hardware acceleration on current CPUs;
| dedicated instructions exist for the former but not the
| latter.
| kazinator wrote:
| > _Given the threat that the SHA-1 hash poses_
|
| I give -3 flying ducks about this, and don't want the Git storage
| format to be diddled with in any way. Git in 2122 should read
| _and_ write a git repo made in 2010.
|
| Git is not a public crypto system.
|
| If you think a commit is important and needs to be signed, you
| need to sign the files and add the signature to the commit.
| kerblang wrote:
| The part of software engineering they don't teach in college is
| _migration_. Some of the most creative work you'll do is
| figuring out how to get from X to Y without bringing everything
| crashing down around you (or at least only a couple things
| crashing down at a time).
| jiggawatts wrote:
| If you're going to "fix" the hash algorithm, do it properly!
|
| Sha256 can only be computed in a single sequential stream
| (thread) by definition.
|
| For large files this is increasingly becoming a performance
| limitation.
|
| A Merkle tree based on SHA512 would have significant benefits.
|
| SHA512 is _faster_ than SHA256 on modern CPUs because it
| processes 64 bits per internal register instead of 32 bits.
|
| A tree-structured hash can be parallelised across all cores.
|
| For repositories with files over 100MB in them on an SSD this
| would make a noticeable difference...
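|
| A rough Go sketch of the shape of such a scheme (a toy
| two-level hash tree, not any standardized construction):
|
|     package main
|
|     import (
|         "crypto/sha512"
|         "fmt"
|         "sync"
|     )
|
|     const chunkSize = 1 << 20 // 1 MiB leaves
|
|     // treeHash hashes fixed-size chunks in parallel, then
|     // hashes the concatenated leaf digests into one root.
|     // A real design would also encode chunk size, depth and
|     // total length to avoid ambiguity.
|     func treeHash(data []byte) []byte {
|         n := (len(data) + chunkSize - 1) / chunkSize
|         if n == 0 {
|             n = 1
|         }
|         leaves := make([][sha512.Size256]byte, n)
|
|         var wg sync.WaitGroup
|         for i := 0; i < n; i++ {
|             wg.Add(1)
|             go func(i int) {
|                 defer wg.Done()
|                 start := i * chunkSize
|                 end := start + chunkSize
|                 if end > len(data) {
|                     end = len(data)
|                 }
|                 leaves[i] = sha512.Sum512_256(data[start:end])
|             }(i)
|         }
|         wg.Wait()
|
|         root := sha512.New512_256()
|         for _, leaf := range leaves {
|             root.Write(leaf[:])
|         }
|         return root.Sum(nil)
|     }
|
|     func main() {
|         fmt.Printf("%x\n", treeHash(make([]byte, 5*chunkSize+123)))
|     }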
| dchest wrote:
| Most git objects are tiny files, so internal tree-based
| parallelization won't bring much compared to file
| parallelization (git is a hash tree itself, with variable-
| length leaves).
|
| SHA256 is actually a lot faster on modern CPUs due to
| https://en.wikipedia.org/wiki/Intel_SHA_extensions (and similar
| on Arm), which are implemented for SHA-256 but not for SHA-512,
| e.g. openssl speed sha256 sha512 on M1:
|
|     type     16 bytes    64 bytes    256 bytes   1024 bytes   8192 bytes
|     sha256   89474.97k   283341.15k  901724.41k  1730980.24k  2339109.86k
|     sha512   66160.19k   262139.03k  365675.96k  487572.26k   545142.91k
| jiggawatts wrote:
| A fair point about the instruction sets, and it is also true
| that "most" files are small.
|
| But again, due precisely to their size, large files take a
| disproportionate amount of time to process.
|
| Don't confuse the typical use-case with the fundamental
| concept: versioning.
|
| Git could be a general purpose versioning system with many
| more use-cases, but limitations like this hold it back
| unnecessarily...
| akvadrako wrote:
| Actually, SHA256 is faster since many common processors have
| special instructions to accelerate it.
| avar wrote:
| I'm the person and Git developer (AEvar) quoted in the article. I
| didn't expect this to end up on LWN. I'm happy to answer any
| questions here that people might have.
|
| I don't think the LWN article can be said to take anything out of
| context. But I think it's worth emphasizing that this is a thread
| on the Git ML in response to a user who's asking if Git/SHA-256
| is something "that users should start changing over to[?]".
|
| I stand by the comments that I think the current state of Git is
| that we shouldn't be recommending to users that they use SHA-256
| repositories without explaining some major caveats, mainly to do
| with third party software support, particularly the lack of
| support from the big online "forges".
|
| But I don't think there's any disagreement in the Git development
| community (and certainly not from me) that Git should be moving
| towards migrating away from SHA-1.
| michaelt wrote:
| Thanks for your work on Git!
|
| _> I'm happy to answer any questions here that people might
| have._
|
| Is there any way to achieve a gradual, staged rollout of
| SHA256?
|
| What's the impact of converting a repo to SHA256 - will old
| commit IDs become invalid? Would signed commits' signatures be
| invalidated?
| avar wrote:
| The answer is somewhat hand-wavy, because this code doesn't
| exist as anything except out-of-tree WIP code (and even in
| that case, incomplete). But yes, the plan is definitely to
| support a gradual, hopefully mostly seamless rollout.
|
| The design document for that is shipped as part of git.git,
| and available online. Here's the relevant part:
| https://git-scm.com/docs/hash-function-transition/#_translat...
|
| Basically the idea is that you'd have, say, a SHA-256 local
| repository, and talk to a SHA-1 upstream server. Each time
| you'd "pull" or "push" we'd "rehash" the content (which we do
| anyway, even when using just one hash).
|
| The interop-specific magic (covered in that documentation) is
| that we'd use a translation table, so you could e.g. "git
| show" on a SHA-1 object ID, and we'd be able to serve up the
| locally packed SHA-256 content as a result.
|
| But the hard parts of this still need to be worked out, and
| problems shaken out. E.g. for hosting providers what you get
| when you "git clone" is an already-hashed *.pack file that's
| mostly served up as-is from disk. For simultaneously serving
| clients of both hash formats you'd essentially need to double
| your storage space.
|
| There's also been past in-person developer meet-up discussion
| (the last one being before Covid, the next one in fall this
| year) about the gritty details of how such a translation
| table will function exactly.
|
| E.g. if linux.git switches they'd probably want a "flag day"
| where they'd transition 100% to SHA-256, but many clients
| would still probably want the SHA-1<->SHA-256 translation
| table kept around for older commits, to e.g. look up hash
| references from something like the mailing list archive, or
| old comments in ticketing systems.
|
| Currently the answer to how that'll work exactly is that
| we'll see when someone submits completed patches for that
| sort of functionality, and doubtless issues & edge cases will
| emerge that we didn't or couldn't expect until the rubber
| hits the road.
| tux3 wrote:
| Has there been any feedback/communication with forges
| happening, on or off-list?
|
| I'm curious how closely (if at all) they've been following this
| effort
| lalaland1125 wrote:
| Have you considered moving over to a combined SHA-1, SHA-256
| model where both hashes are calculated, with SHA-1 shown to the
| user and SHA-256 only used in the background to prevent
| collisions?
|
| There is a compute cost for that, but it should be minimal
| relative to the security benefits?
| corbet wrote:
| It's nice to see LWN on HN for the second time in one day, but
| please remember: it is only LWN subscribers that make this kind
| of writing possible. If you are enjoying it, please consider
| becoming a subscriber yourself -- or, even better, getting your
| employer to subscribe.
| O__________O wrote:
| For ease of reference, here is the link to subscribe, which
| includes a description of the benefits:
|
| https://lwn.net/subscribe/Info
|
| And the Wikipedia page for LWN, if you're not familiar with it:
|
| https://en.m.wikipedia.org/wiki/LWN.net
| williadc wrote:
| Googlers can subscribe through work by visiting go/lwn and
| following the instructions.
| jra_samba wrote:
| Just want to second this! Please subscribe to lwn. I learn new
| things from lwn every week. It's really worth the money.
| cockhole_desu wrote:
| O__________O wrote:
| Anyone aware of any exploits tied to the SHA-1 weakness in the wild?
|
| (I have seen proofs of concept [1], but never actually heard of
| an exploit in the wild using it; for example, on: digital
| certificate signatures, email PGP/GPG signatures, software vendor
| signatures, software updates, ISO checksums, backup systems,
| deduplication systems, Git, etc.)
|
| [1] https://shattered.io/
| bawolff wrote:
| Most security critical systems have switched to sha256 at this
| point, and making a fresh collision still costs tens of
| thousands, so people arent really doing it for kicks (that
| said, once you have one collision you can reuse it for free as
| long as you keep the same prefix, so the proof of concept can
| be repurposed with certain constraints).
|
| The most in-the-wild one I have ever heard of was when webkit
| accidentally broke their svn repo by checking in a collision.
|
| However you can look at the history of md5 which had a similar
| flaw which was exploited by the flame malware.
| O__________O wrote:
| Thanks, agree the Flame's use of a collision attack was both
| comparable and notable:
|
| https://en.m.wikipedia.org/wiki/Flame_(malware)
| password4321 wrote:
| Applications of that collision:
|
| https://twitter.com/rauchg/status/834770508633694208 > _a SHA-1
| "Pinata" [...] claimed_
|
| https://news.ycombinator.com/item?id=13723892 > _Make your own
| colliding PDFs_
|
| https://news.ycombinator.com/item?id=13917990 > _Collision
| Detection_
| slim wrote:
| Why didn't Linux migrate their repo to SHA-256? Is it that
| difficult to migrate a repo?
| ivoras wrote:
| Is there an explanation of what would go wrong with the naive
| approach? E.g.:
|
| - Change the binary file format in repos to support arbitrary
| hash algorithms, in a way which unambiguously makes old software
| fail.
|
| - Increment the Git major version number to 3.0
|
| - Make the new version support both the old version repos and the
| new ones. Make it a per-repo config item that allows/disallows
| old/new hash formats. In theory, there's nothing wrong with
| having objects hashed with mixed algorithms as long as the
| software knows how to deal with that.
|
| - The old format will probably have to be supported forever
| because of Linux.
|
| Most user-facing utilities don't care what the hash algo actually
| is, they just use the hash as an opaque string.
| runeks wrote:
| Releasing new software is the simple part. The problem is that
| versioning is lacking in the old software, and therefore it
| doesn't know how to talk to the new software. So for the old
| software there's no difference between "invalid data" and "I'm
| too old, please upgrade me".
| dingleberry420 wrote:
| > So for the old software there's no difference between
| "invalid data" and "I'm too old, please upgrade me".
|
| And why is this an issue? Release the new version that can
| read new repo formats, but doesn't write them yet. Wait a
| year. Release new version that can write new repo formats and
| encourage users to upgrade.
|
| Anyone who hasn't upgraded in the past year probably doesn't
| care about security and should be left behind. Besides, once
| they google the error message they'll figure it out soon
| enough. It's not like git is known for its great UX anyway.
| [deleted]
| kzrdude wrote:
| All of what you wrote, except the version bump, is already
| implemented. It's the nicer features that are missing, the nice
| migration path.
| kelnos wrote:
| > _In theory, there 's nothing wrong with having objects hashed
| with mixed algorithms as long as the software knows how to deal
| with that._
|
| That's an interesting idea, actually. I'm not sure they plan to
| support that, though? That would make things a lot easier on
| existing repositories; without support for mixed hashes, repos
| would have to have their history entirely rewritten, which
| would invalidate things like signed commits/tags.
| TillE wrote:
| Bjarmason has a good response about the practicalities of an
| attack; it explains why a "broken" hash is rarely a running-
| around-with-your-hair-on-fire level emergency. It would clearly
| be better to use a better hash, but is it actually urgent for
| anyone? Probably not.
| encryptluks2 wrote:
| It seems like something more modern, like b3sum would be
| better... no? What about b2sum?
| kortex wrote:
| I love the performance of blake3 but my understanding is it's
| still a bit of the new kid on the block. Blake2 derives from
| BLAKE, a SHA-3 finalist, so it should be perfectly sufficient;
| plus it has variable digest sizes, is reasonably fast, and has
| other nice features.
|
| Either way, anything relying on hashes for data integrity
| should at least be flexible to the option of multiple hash
| algos. But with git, it's going to be hard enough as is to
| change to SHA-256, and I don't know how parametric it'll be.
| MrStonedOne wrote:
| bradhe wrote:
| Does the usage of SHA-1 in Git actually have security
| implications, though? It's basically only used to generate
| addresses for refs and hunks and all that.
| nonameiguess wrote:
| It's difficult to exploit, but possible.
|
| I think the actual issue here is environment accreditation not
| allowing the use of sha-1 at all, but that is still rare. It'll
| become a much larger issue if a future FIPS standard ever
| disallows sha-1, because that will impact a ton of
| environments. It means git won't even work on your servers any
| more.
| saghm wrote:
| I don't think it does; sure, someone could potentially craft a
| malicious commit that causes a SHA1 collision in your repo, but
| I think if you are merging commits from malicious authors,
| you've got way bigger problems than that.
| corbet wrote:
| ...and if you're merging commits from a developer who,
| unknown to either of you, had their laptop compromised and
| their repo corrupted? Remember that the compromise of
| kernel.org happened via a developer's laptop, and it was only
| the security of the hash chains that preserved confidence in
| the repositories stored there.
|
| As noted in the article, an SHA-1 collision attack does not
| appear practical now, but that is a situation that can
| change.
| shakna wrote:
| GitHub actually makes pull requests available as an unlisted
| part of the original repository under refs/pull/$PR/head and
| refs/pull/$PR/merge, which allows a malicious author to add
| themselves to your index, without your involvement.
|
| Not to say that this attack is in any way practical, yet.
| Just that some providers don't require active involvement to
| try and attempt it.
| mmastrac wrote:
| If a repo accepts third-party contributions, you can create a
| split brain where half the people see one set of contents and
| the others see a different set, but the same hashes are
| available.
|
| I don't know if this would survive additional commits on top as
| I'm not familiar enough with git's internals.
| progval wrote:
| It will survive it until someone touches the affected blob,
| then they'll converge to the version that person has.
| blakesterz wrote:
| The article does address that:
|
| "Given the threat that the SHA-1 hash poses, one might think
| that there would be a stronger incentive for somebody to
| support this work. But, as Bjarmason continued, that incentive
| is not actually all that strong. The project adopted the
| SHA-1DC variant of SHA-1 for the 2.13 release in 2017, which
| makes the project more robust against the known SHA-1 collision
| attacks, so there does not appear to be any sort of imminent
| threat of this type of attack against Git. Even if creating a
| collision were feasible for an attacker, Bjarmason pointed out,
| that is only the first step in the development of a successful
| attack. Finding a collision of any type is hard; finding one
| that is still working code, that has the functionality the
| attacker is after, and that looks reasonable to both humans and
| compilers is quite a bit harder -- if it is possible at all."
| avar wrote:
| (I'm the "Bjarmason" quoted in the article)
|
| To elaborate a bit: One thing that makes a viable attack
| against Git especially hard is that, aside from the hash it's
| using, Git has a behavior of never replacing an already hashed
| object[1].
|
| So let's say I have a tool that can take a given file & SHA-1
| pair and produce a collision, the next step is quite hard. I
| could in this scenario produce a file with an exploit whose
| hash matches that of Linus's kernel/pid.c or whatever.
|
| But how do I get that object to propagate among forks of
| linux.git to distribute my exploited code?
|
| If I e.g. push it to a fork of linux.git on a hosting
| provider that Linus uses, the remote "git-index-pack"
| process will hash my colliding object, but before it stores
| it, it will check whether such an object ID already exists in
| its object store; if it does, it'll drop it on the floor. You
| don't need to store data you've already got in a content-
| addressable filesystem.
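|
| The property at work is roughly this (a toy content-addressable
| store, not Git's actual object database code):
|
|     package main
|
|     import (
|         "crypto/sha1"
|         "fmt"
|     )
|
|     // store keys each object by its hash; an incoming object
|     // whose ID is already present is simply dropped, so a
|     // later colliding "evil twin" can never displace the
|     // bytes that arrived first.
|     type store map[[sha1.Size]byte][]byte
|
|     func (s store) add(obj []byte) [sha1.Size]byte {
|         id := sha1.Sum(obj)
|         if _, ok := s[id]; !ok {
|             s[id] = obj
|         }
|         return id
|     }
|
|     func main() {
|         s := store{}
|         id := s.add([]byte("the original kernel/pid.c"))
|         // A pushed object with the same ID but different bytes
|         // (a collision) would hit the "already present" branch
|         // and the original content would be kept.
|         fmt.Printf("%x\n", id)
|     }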
|
| Which is not to say that a hash collision is a non-issue, and
| Git should certainly be migrating from SHA-1. There's no
| disagreement about that in the Git development community.
|
| But it matters for how much you should panic how the software
| you're using could be exploited in the case of a hash
| collision.
|
| Also, the scenario above presupposes a preimage attack, which
| is a much worse attack on a hash function than a collision
| attack. Currently no viable preimage attack on SHA-1 exists,
| only a collision attack.
|
| Which means that before any of the above I'd have to have
| produced a viable version of say kernel/pid.c that Linus was
| willing to merge, knowing that my evil twin of that version
| is something I intended to exploit people with.
|
| Then I'd need to patiently wait for that version to make it
| into a release, knowing that even a one-byte change to the
| file would foil my plans...
|
| 1. On the topic of running with scissors: I wrote a patch to
| disable that collision check for an ex-employer; it helped in
| that I/O-bound setup, and we were confident that the lessened
| security was a non-issue for us in _that_ particular setup.
| The patch never made it into git's mainline. The patch won't
| apply anymore, but the embedded docs elaborate on the topic:
| https://lore.kernel.org/git/20181113201910.11518-1-avarab@gm.
| ..
| mvkg wrote:
| Regarding the collision attack replacement check, do you
| know if that is carried over into other git implementations
| (e.g. libgit2)?
| avar wrote:
| I had to look, but in the case of libgit2 yes they have.
| Like git they have a way to select SHA-1 backends, and
| the default is the SHA1DC library.
|
| But, even supposing a libgit2 that didn't use SHA1DC I
| think most users would be protected in practice if the
| "git" they use used SHA1DC. Hosting providers, local
| editors etc. use libgit2 for a lot of things, but I think
| in most cases (certainly in the case of the popular
| hosting providers) it's some version of "/usr/bin/git"
| that's handling your push, and actually propagating your
| objects.
|
| For stopping a colliding hash it's enough that any part
| of the chain of propagation is able to stop it.
| Salgat wrote:
| From what I've heard it's as simple as injecting the
| necessary garbage into a comment to fit the required hash for
| modified code.
| jandrese wrote:
| The comment full of random garbage will probably look weird
| to a human, but by the time a person is looking at the code
| it will probably be too late.
|
| But you could also hide it as a fake lookup table or inline
| XPM or something like that.
| prepend wrote:
| > as simple as injecting the necessary garbage into a
| comment to fit the required hash for modified code.
|
| This seems true yet there are no demos or documented
| attacks using this method.
|
| I think practically speaking it's kind of a pain to do.
| bawolff wrote:
| There is a big difference between having 2 files with the
| same garbage comment but different content that have the
| same hash, and creating a new file that has a garbage
| comment and has the same hash as some other file not chosen
| by the attacker (preimage vs collision).
|
| Sha1 has a collision attack. We are far away from a
| preimage attack
| layer8 wrote:
| There is a middle course: You could get a pull request
| accepted with good content, but including a sensible
| comment whose exact wording you can choose, so later you
| can replace the contents of that commit with malicious
| code and a garbage comment. Such a collision is easier to
| create than a preimage attack (because you have _some_
| control over the preimage), but harder than if you could
| choose the preimage arbitrarily (which wouldn't be
| accepted in the pull request). I admit that I have no
| idea how to quantify the difference in difficulty.
| Zamicol wrote:
| This is concerning from a signing perspective.
|
| Example: `git commit -a -S -m 'signed commit'` signs the SHA-1
| hash directly.
|
| Even if the SHA-1 digest is rehashed with a secure hashing
| algorithm, SHA-256, it would hide the fact that the reference
| is to an insecure hashing algorithm. The project itself needs
| to be rehashed with a secure hashing algorithm for signing to
| be secure.
| Dylan16807 wrote:
| It's more complicated than that. If the most recent
| signatures are entirely based on SHA-256, and you trust those
| signatures sufficiently, then they act as protection for all
| ancestor commits. In that case a SHA1-based signature on an
| older commit isn't a big deal.
| Zamicol wrote:
| >then they act as protection for all ancestor commits
|
| How does that work? My understanding was that a git gpg
| signature only signs the project at that commit state.
|
| It says nothing about past (or future) commits outside of a
| digest reference to past commits, which if that digest
| wasn't upgraded, would be considered insecure.
|
| Said another way: Git does not rehash past commits, or the
| present commit, when gpg signing. A commit itself only
| includes the SHA-1 digest of the previous commit.
| layer8 wrote:
| You are correct. In the AdES signature world, the
| solution is to have a cryptographic (signed) timestamp
| using a newer hash algorithm that rehashes all previous
| commits, and to include that timestamp into a new commit.
| When verifying the hashes of old commits, the software
| would verify that those are covered by an appropriate
| timestamp that proves that they were created before the
| old hash algorithm was considered too weak.
|
| This is very similar to the following: Instead of
| rehashing, i.e. replacing old hashes with new hashes, add
| the new hashes alongside the old ones, and sign the new
| hashes, together with the time mark, by a trusted
| authority. The old hashes and signatures then remain
| valid indefinitely as long as the new hashes and
| signatures are verified successfully.
| Dylan16807 wrote:
| If you convert a repo to SHA-256, then surely it will
| recalculate all the hashes back to the start, right?
| Otherwise that's not a conversion. And then new
| signatures will use a hash that's SHA-256 all the way
| down.
|
| The old signatures will still be SHA-1. But if you try to
| replace any part of a commit, the SHA-256 won't match. So
| the combination of "the commit is an ancestor of multiple
| securely signed commits in this repo" and "the SHA1 on
| the signature matches" is enough to know you have the
| right data in most use cases.
| sdfhdhjdw3 wrote:
| > Even if creating a collision were feasible for an attacker,
| Bjarmason pointed out, that is only the first step in the
| development of a successful attack. Finding a collision of any
| type is hard; finding one that is still working code, that has
| the functionality the attacker is after, and that looks
| reasonable to both humans and compilers is quite a bit harder --
| if it is possible at all.
|
| Sounds like there's money in this.
| simias wrote:
| It's frankly amateurish for the git dev to delay this. The longer
| this lasts, the more painful it'll be when the switch finally
| takes place.
|
| Linus shouldn't have used SHA-1 in the first place, it was
| already being deprecated by the time git got its original
| release. Then every time a new milestone is reached to break
| SHA-1 we see the same rationalization about how it's not a big
| deal and it's not a direct threat to git and blablabla.
|
| It'll keep not mattering until it matters, and the longer they
| wait the more churn it'll create. Let's rip off the bandaid
| that's been hanging there for over 15 years now.
| hinkley wrote:
| I worked on code signing for civilian aviation years ago and
| there were people trying to pressure me into supporting MD5 and
| SHA-1 signatures. I told the first group to jump off a cliff,
| and the second group got a firm no. The first papers on
| theoretical SHA-1 attacks had already been published, we were
| still a couple years out from active use, and people were
| already beginning to talk about starting to organize the SHA-3
| process.
|
| Once a system expects to handle SHA-1, then you have to deal
| with old assets that have deprecated signatures, and that's a
| fight I 1) didn't want to have and 2) was fairly sure I
| wouldn't be around to win.
|
| Git was still brand new, largely unproven at that point, and I
| don't understand why he picked SHA-1.
| runeks wrote:
| > Linus shouldn't have used SHA-1 in the first place, it was
| already being deprecated by the time git got its original
| release.
|
| Using SHA-1 to begin with was fine. However, commit hashes
| should have been prepended with a version byte to make it
| easier to transition to the next hash algorithm.
|
| This would mean an old Git client could report an error to the
| user of the nature "please upgrade your software to support
| cloning from this Git server" instead of failing with an error
| that's indistinguishable from "the Git server is broken" when trying
| to clone a Git repo using SHA-256.
| jackweirdy wrote:
| There's already a version byte: if it's [0-9a-f], that's
| version 1 ;)
| LeifCarrotson wrote:
| That's a 4-bit nibble, the version byte is 0x00 to 0xFF.
| layer8 wrote:
| The problem is not a missing version byte. SHA-256 is
| trivially distinguishable from SHA-1 by hash length. The
| problem is that the length of a SHA-1 hash (20 bytes) is
| (or was) hardcoded in too many places.
| simias wrote:
| By the time Git was first released the first attacks on SHA-1
| had already been published, but I agree with your general
| point about allowing for backward compatible updates.
| wahern wrote:
| Linus' original excuse for using SHA-1 was that Git hash trees
| and hash identifiers were never meant to be cryptographically
| secure. GnuPG signing support, the popular belief that Git
| trees had a strong security property, etc, came afterward,
| along with increasingly awkward excuse-making.
|
| So strictly speaking Linus and subsequent maintainers weren't
| being amateurish in the beginning. (You didn't say that
| explicitly, but it would be a fair criticism given what was
| known about SHA-1 at the time, including known by Linus--he
| knew and made a choice.) Rather, in the beginning it was
| naivety in believing that people wouldn't begin to depend on
| Git's apparent security properties.
| jopsen wrote:
| Yeah, in hindsight maybe he should have made his own 160-bit
| CRC variant :)
|
| Honestly, I think it's fair to say that the hashes weren't
| meant to be a security feature.
|
| But signed tags/commits/etc. probably need a better hash.
___________________________________________________________________
(page generated 2022-06-23 23:00 UTC)