[HN Gopher] Colliding with the SHA prefix of Linux's initial Git...
___________________________________________________________________
Colliding with the SHA prefix of Linux's initial Git commit
Author : 2bluesc
Score : 199 points
Date : 2024-12-30 22:49 UTC (1 days ago)
(HTM) web link (people.kernel.org)
(TXT) w3m dump (people.kernel.org)
| philips wrote:
| Relatedly: Kees's keynote on Linux security from a month ago was
| great: https://www.youtube.com/watch?v=orO8czP5Bxw
| Terr_ wrote:
| There were some plans to migrate to SHA256, but somehow it still
| hasn't happened.
|
| The practical upshot is a git commit hash is _not enough l_ to
| know you are distributing /executing the legitimate code, as
| opposed to a malicious doppelganger. This is particularly
| important for tools that rely on it for dependency management,
| local caches, etc.
| shakna wrote:
| Compatibility between remotes using one or the other hasn't
| arrived yet, and git doesn't want to break compatibility. But
| you can create SHA256 one's today. [0]
|
| [0] https://lwn.net/Articles/898522/
| onedognight wrote:
| The is not a full git commit hash collision. It has to do with
| a git note which only needs to matche a 12 character prefix of
| the git commit.
| usr1106 wrote:
| While you corrected one mistake you added a new one:)
|
| Those are git trailers, see git-interpret-trailers(1).
|
| git-notes(1) is something completely different and not used
| by the kernel.
| theamk wrote:
| people don't really care, because current collision methods are
| mitigated.
|
| Git does not actually use "sha1", despite what all the docs
| say, it uses "sha1dc", which is just like sha1 except for
| inputs which can cause collisions, in which case it either
| fails with clear error message or returns completely different
| value.
|
| https://news.ycombinator.com/item?id=17825441
|
| so don't worry, git hashes _are_ enough to know you are
| distributing/executing the legitimate code.
|
| (not to mention you need a preimage attack to replace known
| commit, and this is not yet possible with sha1)
| akoboldfrying wrote:
| From your link:
|
| >In this case "hash" will be the same as SHA1(input) in all
| cases, except those where the input is detected to be
| malicious (as in the SHAttered attack)
|
| I don't see how this can be more than a fundamentally
| forward-incompatible sticking plaster over the problem. The
| problem isn't merely that "detecting maliciousness" seems
| fraught in itself (how does one infer intent reliably?) --
| it's that today's SHA1DC() implementation can only detect and
| optionally correct _today 's_ known attacks, so each new
| attack necessitates a new, incompatible version of SHA1DC().
| Dylan16807 wrote:
| Sure it's plaster, but plaster can last a good while.
| Sufficiently new attacks don't come around all that often,
| and _every_ hash has a risk of new attacks showing up.
| hcs wrote:
| fwiw, I wasn't too familiar with this usage, looked it up
| and sticking plaster is in the sense of bandaid here
| https://en.wikipedia.org/wiki/Adhesive_bandage
|
| But agreed that it's a rather tougher patch than I'd
| originally thought, reading elsewhere in this thread.
| theamk wrote:
| Each new _unrelated_ SHA1 attack will need an update in
| SHA1DC. But it has to be truly unrelated, as the collision
| detection method is fairly robust. I recommend reading the
| original "Counter-cryptanalysis" paper [0] for details on
| how attacks work and how they can be mitigated (there is
| certain internal state in SHA1 that is used in all known
| attacks). BTW, this paper has an interesting anecdote:
| apparently Flame malware had exploited MD5 collisions using
| novel unpublished attack method... and yet it was detected
| by collision detector (section 3.2). Another example is
| that SHA-mbles attack, published 3 years after "Counter-
| cryptanalysis", was detected as well, with no required code
| changes.
|
| No, there is nothing "fundamentally incompatible" in the
| new SHA1DC method. After all, git came out in 2005, 11
| years before SHA1 attacks were known, so it used regular
| SHA1. The collision detector was added in 2017 and nothing
| broke, because false positive chance is 2^-90 [1].
|
| I have not heard of any new SHA1 collision results, but if
| they are based on no-difference differential paths, git has
| nothing to worry about. And if they are not, it may be
| possible to extend DC detector to seamlessly detect and
| prevent those attacks, and then only upgrade git clients,
| keeping backward and forward compatibility for data.
|
| Of course there is always a chance that someone will come
| out with all-new SHA1 preimage attack that cannot be
| detected without high rate of false positive, so it's
| prudent to switch git to sha256. There is a lot of work
| being done: git's sha256 mode went out of beta in 2.42
| (2023), but neither github nor gitlab support it.
|
| But since the current state of git's sha is that there is
| nothing broken, and git git commit hash _is_ "enough to
| know you are distributing/executing the legitimate code, as
| opposed to a malicious doppelganger", there is no real
| pressure.
|
| [0] https://marc-stevens.nl/research/papers/C13-S.pdf
|
| [1] https://github.com/cr-
| marcstevens/sha1collisiondetection
| bawolff wrote:
| > The problem isn't merely that "detecting maliciousness"
| seems fraught in itself (how does one infer intent
| reliably?)
|
| Its not detecting "intent" it is detecting that the hash is
| one vulnerable to the attack, which is extremely unlikely
| to happen by accident, so if you see it you can assume
| malice.
|
| It might be a band-aid, and sha256 is certainly a much
| better solution, but its more robust than it sounds at
| first glance (since it sounds crazy at first glance)
| bandrami wrote:
| The hash space is atoms-in-the-universe range; this is a
| collision in a much, much smaller subset of that space
| TheDong wrote:
| > The practical upshot is a git commit hash is not enough l to
| know you are distributing/executing the legitimate code, as
| opposed to a malicious doppelganger
|
| Really now? Mind if I challenge you?
|
| I have on my machine a git repo with commit
| '75eb4e3b1369706a4dcd61cc80e49660ac341ea4'.
|
| If you can give me a second git repo with such a commit
| containing different contents, I'll happily send you $10k USD,
| or donate it to a charity of your choice.
| sgjohnson wrote:
| > If you can give me a second git repo with such a commit
| containing different contents, I'll happily send you $10k
| USD, or donate it to a charity of your choice.
|
| Calculating that SHA1 collision is going to be a bit more
| expensive than $10k, by a couple of orders of magnitude.
|
| Finding it in the wild is improbable, but calculating it is
| definitely possible, and has been done before.
| http://shattered.io/
| chippiewill wrote:
| Shattered didn't produce a collision for an arbitrary hash,
| it produced two documents with the same hash (which is a
| slightly easier problem, about 100,000x faster).
|
| SHA1 is certainly insecure at this point, but not even
| close to trivially so.
| codeflo wrote:
| That is enough to distribute malicious code though, at
| least in certain scenarios. Someone might create a setup
| where reviewers check/sign one version of the source
| code, and what gets distributed is another version with
| the same hash.
| ExoticPearTree wrote:
| Can you create a proof of concept and show it here?
| codeflo wrote:
| What's your point?
| ExoticPearTree wrote:
| My point is that you need to put the money where your
| mouth is.
| usr1106 wrote:
| Code review in the Linux kernel still happens by email to
| a large degree.
|
| Further up in contribution tree there is additional
| signing. Would that further complicate the insertion of a
| false commit? I am not convinced that signing is used all
| the way down to every contribution.
| codeflo wrote:
| Linux probably has enough eyeballs on its source to make
| attacks like that unlikely anyway, but Git isn't just
| used by Linux.
| bawolff wrote:
| Indeed. We can't even do this for md5, let alone sha1.
|
| Preimage attacks are very different from collision
| attacks.
| oefrha wrote:
| Switching to SHA-256 and switching to longer substrings of
| hashes for identification are basically orthogonal problems.
| The former is hardly going to help with the latter, except in
| the we already broke everything so why not take the chance to
| break some more sense.
| usr1106 wrote:
| TFA has nothing to do with SHA-1 or SHA-1 collisions. It's
| about abbreviated hash values introduced for readability by
| humans. Now these values are used by auxiliary scripts. Which
| again has little to do with git proper. It's just what the
| kernel community writes into commit messages and what scripts
| they use to parse those messages.
| dec0dedab0de wrote:
| I think multiple hashes is the way to go to avoid collisions.
| it can even be something simple like md5. the chances of
| finding a collision that matches two or more algorithms is near
| impossible. Obviously that doesn't work for passwords, but for
| verifying that data hasn't been tampered with, it works.
| throw0101d wrote:
| First twelve and last twelve characters are the same:
| $ echo -n retr0id_662d970782071aa7a038dce6 | sha256sum
| 307e0e71a409d2bf67e76c676d81bd0ff87ee228cd8f991714589d0564e6ea9a
| - $ echo -n retr0id_430d19a6c51814d895666635 |
| sha256sum
| 307e0e71a4098e7fb7d72c86cd041a006181c6d8e29882b581d69d0564e6ea9a
| -
|
| * Via: https://news.ycombinator.com/item?id=38668893
| Retr0id wrote:
| Oh hey, it's me. Shortly after then I did a write-up on how I
| crafted them, which was discussed here:
| https://news.ycombinator.com/item?id=38718314
| susam wrote:
| Excellent article. Thanks for resharing it here.
| userbinator wrote:
| In other words, 6 out of 20 bytes match. Good luck colliding the
| other 112 bits.
| bandrami wrote:
| The problem is some (broken and wrong but in the wild) tooling
| relies on substrings like that
| Dilettante_ wrote:
| Somebody ought to tell them not to do that! [Spoken in the
| tone of 'You can't park there sir!']
| cwillu wrote:
| https://people.kernel.org/kees/colliding-with-the-sha-prefix...
| is the actual link; the lwn article is just a link to it with a
| one sentence summary and a short quote.
| dang wrote:
| Changed. Thanks!
|
| (Submitted URL was https://lwn.net/Articles/1003797/)
| timewizard wrote:
| Would tooling break if the radix was changed from 16 to something
| higher? You could double your resistance by just changing to base
| 32 representation in the same length
| Dylan16807 wrote:
| _So much_ would break if you changed the character set. And
| there would be pretty much no benefit. Making it a hundred
| times harder* is not nearly enough to prevent short collisions.
|
| * The exact amount would depend on repo and method. Using base
| 32 and changing nothing else would slow collisions by 2^length.
| timewizard wrote:
| > would slow collisions by 2^length.
|
| It would add a bit to each position, going from the current
| 48 bits, to 60 bits. Or 4,096 more resistant to _accidental_
| collision. You can have as many bits as you want you just
| make it more time consuming to create one, but if you're
| entire infrastructure blithely relies on one never
| _intentionally_ being submitted, you've solved nothing
| really.
|
| Moreover, if you find a commit with a collision, it's very
| easy to slightly alter it's contents to alleviate the
| problem, presuming we're only having to contend with
| unintentional conflicts.
| Dylan16807 wrote:
| The length of the short hash automatically adjusts to keep
| the chance of accidental collisions low. There's no need to
| lower the risk of accidental collisions.
|
| So changing the character set doesn't notably improve the
| accidental situation, and it doesn't notably improve the
| deliberate situation. And it would be a huge pain to do.
| timewizard wrote:
| In the kernel case the length of the short hash is fixed
| at 12 characters. This is part of the article.
| Dylan16807 wrote:
| Okay, sorry, I was talking about the _normal_ short hash
| rather than this specific tool. And I first looked before
| the link was changed.
|
| Still, they already figured out a solution. They don't
| need multiple mechanisms for the same thing.
| chrishill89 wrote:
| Presumably the problem is that these tools only take the
| abbreviated hash into account. Not also the subject:
| <abbrev. hash> ("<subject>")
|
| You also have another data point. You only need to search in the
| history from the commit that you are reading. Assuming that the
| "Fixes" commit is an ancestor of the commit whose commit footer
| you are reading.
|
| I always just assumed that tools would take all the data into
| account. Which means that you both need to collide with the
| abbreviated hash as well as the subject. Now I don't do that
| since I just copy-paste the hash, but I would quickly notice in
| case the subject is different (and likely the commit message and
| the diff just look irrelevant).
|
| I don't understand why the Linux Kernel has this hard-coded
| rule[2] -- again, you were going to get collisions eventually, so
| the tools should have just taken all the data into account (at
| least the subject) from the start. The recommendation in the Git
| project is to use `git show -s --pretty=reference`, without any
| fiddling with the abbreviation: <abbrev. hash>
| (subject, ISO date)
|
| Although the Git maintainer uses `--abbrev=8` since git-show will
| just use a longer abbreviation in case the output would be
| ambiguous[1].
|
| They could have used this instead if they wanted simpler, future-
| proof tooling: Fixes: <full hash>
|
| Just like tools like git-revert and git-cherry-pick do.
|
| [1]: https://lore.kernel.org/git/xmqq34j5h7v9.fsf@gitster.g/
|
| [2]: Edit: hard-coded as opposed to Git just figuring out how
| long the abbreviation should be based on how many objects there
| are.
| chrishill89 wrote:
| > Presumably the problem is that these tools only take the
| abbreviated hash into account. Not also the subject:
|
| Well the first mentioned script:
|
| > > Tools like linux-next's "Fixes tag checker",
|
| has `get_full_hash`[1] which uses the subject to search through
| the abbreviated matches.
|
| Edit: And that check was added two weeks ago by Kees [2].
|
| [1]: https://github.com/kees/kernel-
| tools/blob/trunk/helpers/chec...
|
| [2]: https://github.com/kees/kernel-
| tools/commit/5bf6a1e71df59a23...
| weinzierl wrote:
| git itself does not use a fixed size abbreviation but
| determines the length necessary with the birthday paradox
| formula. It just happens to be seven characters most of the
| time, because most repos are small.
|
| This is just for the abbreviated hash which git uses only for
| specific cases like display in the git log and similar, where
| it is a user experience improvement and relatively safe.
| dotancohen wrote:
| > with the birthday paradox formula.
|
| At what probability? Can that probability be configured as a
| global Git option?
| cesnja wrote:
| It can be configured with `core.abbrev`.
|
| https://git-scm.com/docs/git-config#Documentation/git-
| config...
| dotancohen wrote:
| I love when the word "hopefully" is used in technical
| documentation ))
| MadnessASAP wrote:
| All technical documentation is wishful thinking, they're
| just being explicit about it.
| chrishill89 wrote:
| I know. I don't understand why the Linux Kernel uses a hard-
| coded abbreviation instead of just letting Git figure out how
| long it should be (edited my comment now).
| quotemstr wrote:
| We could just encode hashes on base64 instead of hex and get
| several times more entropy with the same size text
| Retr0id wrote:
| base64 only gets you 1.5x more entropy per symbol (measuring in
| bits)
| quotemstr wrote:
| Oh, right. It's logarithmic. That'll teach me to post before
| coffee.
| hn_throwaway_99 wrote:
| Great example of Hyrum's Law, https://www.hyrumslaw.com/.
|
| Comments about SHA256 are irrelevant - you can misuse the prefix
| of a SHA256 hash just as easily. The issue is that people got
| used to human-readable hash prefixes of 10-12 characters as
| "unique" for all intents and purposes, despite the fact that
| there were never any uniqueness guarantees for prefixes and git
| has always handled collisions with short object IDs as ambiguous
| - it's just that it's so rare to happen in the real world that
| lots of script writers treated that "mostly unique" as a
| guarantee.
|
| IMO support for short object IDs is a mistake, as is any behavior
| that "works this way 99.999% of the time, but hey developer don't
| forget you need to also code for that .001% edge case". I'm
| always just copying and pasting things around anyway, so it
| really doesn't make much difference to me if I'm copying 12 chars
| or 64.
| loeg wrote:
| Cute. I've seen CPU commit prefix brute-forcers but not GPU ones.
| On the CPU, you can generate chosen 7-8 digit prefixes pretty
| easily (within a few minutes, I forget exactly). 12 digits is, of
| course, a massively bigger (factor of 2^16) space.
___________________________________________________________________
(page generated 2024-12-31 23:01 UTC)