[HN Gopher] SHA-1 'fully and practically broken' by new collisio...
___________________________________________________________________
SHA-1 'fully and practically broken' by new collision (2020)
Author : ofou
Score : 140 points
Date : 2021-10-12 19:41 UTC (3 hours ago)
(HTM) web link (duo.com)
(TXT) w3m dump (duo.com)
| dang wrote:
| Some past threads:
|
| _SHA-1 collisions now cost $45k [pdf]_ -
| https://news.ycombinator.com/item?id=23350223 - May 2020 (62
| comments)
|
| _The first chosen-prefix collision for SHA-1_ -
| https://news.ycombinator.com/item?id=21979333 - Jan 2020 (352
| comments)
|
| _Abusing SHA-1 collisions for Chromium updates_ -
| https://news.ycombinator.com/item?id=20114809 - June 2019 (36
| comments)
|
| _A SHA-1 chosen-prefix collision attack_ -
| https://news.ycombinator.com/item?id=19907127 - May 2019 (71
| comments)
|
| _From Collisions to Chosen-Prefix Collisions Application to Full
| SHA-1 [pdf]_ - https://news.ycombinator.com/item?id=19878917 -
| May 2019 (18 comments)
|
| _SHA-1 Collision Detection on GitHub.com_ -
| https://news.ycombinator.com/item?id=13917990 - March 2017 (90
| comments)
|
| _Linus' reply on Git and SHA-1 collision_ -
| https://news.ycombinator.com/item?id=13719368 - Feb 2017 (262
| comments)
|
| _When Will We See Collisions for SHA-1? (2012)_ -
| https://news.ycombinator.com/item?id=13719079 - Feb 2017 (1
| comment)
|
| _Announcing the first SHA-1 collision_ -
| https://news.ycombinator.com/item?id=13713480 - Feb 2017 (485
| comments)
|
| _How would Git handle a SHA-1 collision on a blob?_ -
| https://news.ycombinator.com/item?id=13547348 - Feb 2017 (5
| comments)
|
| _Why it's harder to forge a SHA-1 certificate than to find a
| SHA-1 collision_ - https://news.ycombinator.com/item?id=10778773
| - Dec 2015 (43 comments)
|
| _The Cost of Creating Collisions Using SHA-1_ -
| https://news.ycombinator.com/item?id=8629906 - Nov 2014 (9
| comments)
|
| _When Will We See Collisions for SHA-1?_ -
| https://news.ycombinator.com/item?id=4618069 - Oct 2012 (51
| comments)
| cletus wrote:
| Git was created 16 years ago. The impending breakage of SHA-1
| was known even then, just as MD5 had been broken before it.
|
| I'm honestly still shocked that updating the hashing algorithm
| wasn't built into Git from day one. I really wonder why. Did
| people think this wouldn't happen? Were they so in love with the
| performance of C/C++ being able to pass around 20 byte hashes on
| the stack without worrying about a more complicated structure (eg
| a collection of variable length hashes)?
|
| It just seems like such a massive and foreseeable oversight
| that's going to cause pain for some time to come for really no
| reason at all.
| tantalor wrote:
| > going to cause pain
|
| What kind of attack on a git repo are you worried about?
| snovv_crash wrote:
| Subrepo hash substitution could enable a supply-chain attack
| dec0dedab0de wrote:
| Linus posted about it on google plus in 2017. I haven't re-read
| it yet, but I remember one of the ideas floating around hn at
| the time was to just have two hashes per commit. That is, two
| insecure hashes may be secure together for git's purposes. Even
| though we can generate collisions for md5 and sha1, it would
| be much more difficult to have a file generate an arbitrary
| collision for both at the same time.
|
| Here is a link to Linus's post on the internet archive
|
| https://web.archive.org/web/20170717192607/https://plus.goog...
|
| And here are some hn posts around the same time
|
| https://news.ycombinator.com/item?id=13733481
|
| https://news.ycombinator.com/item?id=13719368
| bawolff wrote:
| > That is, two insecure hashes may be secure together for
| git's purposes.
|
| Generally that's not true for Merkle-Damgard (e.g. sha1,
| sha2) hashes - afaik the difficulty of finding a collision in
| the concatenation of both hashes is essentially the same (up
| to big-O factors) as finding one in the stronger hash alone.
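|
| A toy illustration of the concatenated construction in Python
| (hashlib only; a sketch, not anything git actually does). The
| comment records Joux's multicollision result, which is why the
| 288-bit combined output buys so little:
|
|     import hashlib
|
|     def concat_hash(data: bytes) -> str:
|         # Concatenate the MD5 and SHA-1 digests of the same input.
|         return (hashlib.md5(data).hexdigest()
|                 + hashlib.sha1(data).hexdigest())
|
|     # Joux (2004): a collision for md5||sha1 costs only marginally
|     # more than a collision for the stronger hash alone, despite
|     # the 288-bit combined output.
|     print(concat_hash(b"hello"))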
| umvi wrote:
| "We note that classical collisions and chosen-prefix collisions
| do not threaten all usages of SHA-1."
| ori_b wrote:
| For someone to be able to break your repo using sha1
| collisions, they need to be able to commit to it.
|
| If you don't trust someone, don't let them commit to your repo.
|
| > _Were they so in love with the performance of C /C++ being
| able to pass around 20 byte hashes on the stack without
| worrying about a more complicated structure (eg a collection of
| variable length hashes)?_
|
| The hashes show up everywhere. They're how every object in git
| is identified. They make it onto disk. They're not just commit
| ids.
|
| Changing hashes changes the disk format.
| lamontcg wrote:
| Yeah I don't think any of these proposed attacks work without
| write access to the repo, at which point the game is pretty
| much over already.
| [deleted]
| alerighi wrote:
| SHA-1 will still work fine for the purpose of git. It is just
| no longer considered secure for cryptographic operations, such
| as digital signatures; that doesn't mean that you can't use it
| for other purposes, like git does. Using it is still fine and
| will always be fine.
|
| Making the hashing algorithm exchangeable would have introduced
| complexity in a software that is already complex, and also made
| it less efficient (one of the reasons git was created was speed
| for large projects like the Linux kernel) for no real purpose.
| If you want to change the algorithm, given that you will break
| compatibility with all existing repositories, tools, and
| clients, you might as well fork git, because you are changing
| too much.
|
| I don't see the point of migrating to SHA-256. Collisions are
| still very unlikely to occur accidentally, and sure, if you
| want to damage a repository you can create one on purpose, but
| you could just as well commit some hooks that contain malware,
| or alter the git history in any way you want, so what's the
| point?
| simias wrote:
| Honestly I find these rationalizations around the use of
| SHA-1 annoying and counter-productive. The rule is simple:
| don't use SHA-1. If you already use SHA-1, migrate away from
| it. You know that plenty of software out there interfaces
| with git expecting the commit hash to be unique. Is it a
| security risk? Maybe, maybe not. I don't care to find out.
|
| It doesn't matter until it starts mattering. If the Git devs
| had done the right thing over a decade ago we wouldn't be
| having this discussion. The longer they wait the more painful
| the migration will be.
|
| SHA-2 was published in 2001, git was released in 2005 and now
| we're in 2021 and we're having this discussion again. The
| first concrete attack was released in "early 2005" according
| to wikipedia, so there's really no excuse.
|
| Just do it, make a major change where you replace SHA-1 with
| SHA-256 and call it a day. It's going to be painful for a few
| months and then we'll move on.
|
| For me these discussions demonstrate the immaturity of
| software engineering. In other industries regulators would've
| banned the use of SHA-1 and you couldn't get certified if you
| used it.
|
| Do electronic engineers regularly try to argue "well ok RoHS
| says we can't have lead solder in this product but frankly
| for this one it's fine the casing is completely waterproof
| and there are no risks for the customer"? No, they don't. If
| the spec says no lead, then either it's no lead or you can't
| sell your product. End of story.
|
| SHA-1 is the lead solder of software engineering. Only
| acceptable for military use.
| umvi wrote:
| > You know that plenty of software out there interfaces
| with git expecting the commit hash to be unique
|
| You also know that plenty of software out there that
| interfaces with git has hardcoded assumptions (like, for
| example, the assumption that the commit hash will be
| exactly 40 characters long). Some tools parse the output of
| git log and other commit-bearing commands to make
| decisions. Will changing git to SHA-256 create new
| unforeseen security risks due to breakage of those tools
| (for example, by only grabbing the first 40 characters of a
| SHA-256 digest instead of all 64 or by just outright
| crashing)? Maybe, maybe not.
|
| IMO you would create _more_ security risks with the
| git integration breakage that would accompany migrating to
| sha256 vs. staying with sha1.
|
| At this point it's almost like you want a new tool/new
| command. `git` vs. `git2`. New projects use git2, existing
| projects use git (or something like that). Otherwise
| confusion and backwards-compatibility breakage will abound.
| musicale wrote:
| > New projects use git2, existing projects use git (or
| something like that). Otherwise confusion and backwards-
| compatibility breakage will abound.
|
| Like uh, python2 and python3? ;-p
| nybble41 wrote:
| > It is just no longer considered secure for cryptographic
| operations, such as digital signature, that doesn't mean that
| you can't use it for other purposes, like git does.
|
| Git effectively uses its hashes as a component of a digital
| signature scheme, in the form of signed tags and signed
| commits. The GPG signature covers the commit object, which
| identifies the content (tree) by its hash.
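|
| For concreteness, a git object id is a SHA-1 over a short
| type/size header plus the object body, so the signed commit
| object pins the tree only through its 20-byte hash. A minimal
| sketch in Python, mirroring git's documented object format:
|
|     import hashlib
|
|     def git_object_id(obj_type: str, body: bytes) -> str:
|         # git hashes "<type> <size>\0" followed by the raw body
|         header = f"{obj_type} {len(body)}\0".encode()
|         return hashlib.sha1(header + body).hexdigest()
|
|     # The empty blob yields git's well-known id e69de29...
|     print(git_object_id("blob", b""))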
| madars wrote:
| Collisions definitely do matter for git security: many people
| pin explicit git hashes for their dependencies, and thus they
| can be tricked into running malicious forks. This requires
| placing a chosen commit in the git repo (so unlike a second-
| preimage break it does not mean that you could attack repos
| you have no control over), but that's not an unrealistic
| threat model overall.
| y4mi wrote:
| What is the threat model though?
|
| I don't think it's possible to create a collision that's
| also executable code which adds a security hole or
| anything.
|
| So what exactly would they achieve with the collision?
|
| And how do they push these gigantic files that have the
| hash collisions to a server? The upload time would be
| significant.
| noway421 wrote:
| The possible attack is to prepare 2 versions of a commit,
| both resulting in the same commit id. Then later on,
| after the project is successful/etc, swap out the commit
| with the second version, while keeping the other commits
| intact.
|
| Granted, the file that the commit touches would need to
| remain untouched by other commits. That's not out of the
| question in a typical software project - maybe a file in
| the utils folder which is only written once and never
| changed?
|
| > I don't think it's possible to create a collision
| that's also executable code
|
| You can include an unreadable binary blob in the commit.
| Tweak the blob to find the collision while keeping the
| code the way the attack requires.
| [deleted]
| DSingularity wrote:
| That one is nasty.
| OJFord wrote:
| A denial of service of sorts? (Something broken and
| unusable is delivered instead, as distinct from something
| usable but maliciously so.)
|
| I agree that the chances of ever getting a second pre-
| image that not only makes sense, but does so in some
| malicious way may as well be zero, surely?
| lvh wrote:
| The part about the 2nd pre-image chances being
| effectively zero may be true, but the nasty cases
| described upthread don't need a 2nd pre-image. (You could
| do a lot worse with one, granted!)
| [deleted]
| madars wrote:
| The files don't need to be gigantic. You could, for
| example, have a binary config file which in one colliding
| version encodes a potentially dangerous debugging
| setting, e.g., "allow_unauthenticated_rpc = false", but in
| the other has it set to "true".
| lvh wrote:
| 1) People systematically underestimate the possibility of
| creating collisions that still do something
| "interesting", like being polyglots (files that can be
| interpreted in multiple formats, executable or
| otherwise). See PoC||GTFO, specifically anything by Ange
| Albertini, for examples; grep
| https://github.com/angea/pocorgtfo/blob/master/README.md
| for "MD5". I specifically recommend this writeup: https:/
| /github.com/angea/pocorgtfo/blob/master/writeups/19/R...
| .
|
| 1bis) You can use an existing collision to create new
| collisions. People seem to think you need to generate all
| the work again from scratch; this is not true. See
| PoC||GTFO for proof by example.
|
| 1cis) The files do not need to be gigantic. See PoC||GTFO
| for proof by example.
|
| 2) You can do the collision in advance, and publish the
| malicious version later. What it accomplishes is that the
| concept of "this Git hash unambiguously specifies a
| revision" no longer works, and one of them can be
| malicious.
|
| 3) The standard should be "obviously safe beyond a
| reasonable doubt", not "not obviously unsafe to a non-
| expert". By the latter standard, pretty much any random
| encryption construction is fine. (The examples I gave use
| MD5, not SHA-1, but that's a matter of degrees.)
|
| 4) SHA-256 was published years before git first was.
| whoisburbansky wrote:
| What do you mean by the `bis` and `cis` suffixes to your
| entry labels?
| lvh wrote:
| It's just a subdivision; it might as well have said 1a,
| 1b... -- but "bis" and "cis/tris" (and possibly tetrakis)
| tend to emphasize that they're addenda, not equal points.
| whoisburbansky wrote:
| Ah, gotcha. I thought I recognized the suffixes from
| Organic, but I don't think I've seen them used like this
| here. Neat!
| [deleted]
| occamrazor wrote:
| In addition to the attack described in a sibling comment,
| when a hashing algorithm has been broken in some way, it
| is safe to assume that other, more advanced collision
| attacks will soon be discovered.
| cogman10 wrote:
| And, to that point, I'm not really convinced that a
| cryptographic hashing algorithm is really a great choice for
| git.
|
| It is nice that it checks off the boxes for even distribution
| of hashes, but there's a bunch of other hashing algorithms
| that can do that without the performance penalty inherent in
| crypto hashes. For example, FNV seems like a good fit for
| something like git.
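|
| For reference, FNV-1a really is only a few lines; a sketch of
| the 64-bit variant with the standard offset basis and prime.
| It has no collision resistance at all, which is the trade-off
| being proposed:
|
|     FNV64_OFFSET = 0xcbf29ce484222325  # standard 64-bit offset basis
|     FNV64_PRIME = 0x100000001b3        # standard 64-bit FNV prime
|
|     def fnv1a_64(data: bytes) -> int:
|         h = FNV64_OFFSET
|         for byte in data:
|             h ^= byte                  # FNV-1a: xor, then multiply
|             h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
|         return h
|
|     print(hex(fnv1a_64(b"hello")))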
| simias wrote:
| Is hashing a significant bottleneck in any git deployment?
| I'd expect that the diffing would be vastly more expensive,
| for instance.
|
| Besides, don't many modern CPUs support things like SHA-256
| in hardware?
| TonyTrapp wrote:
| FNV is really cool and has a reasonable quality given its
| simplicity. But it does have issues (sticky state, and I
| think the avalanche characteristics also weren't great)
| that are solved by only slightly more complex hashing
| algorithms.
| lvh wrote:
| > SHA-1 will still work fine for the purpose of git. It is
| just no longer considered secure for cryptographic
| operations, such as digital signature, that doesn't mean that
| you can't use it for other purposes, like git does. Using it
| is still fine and will ever be fine.
|
| This suggests git does not rely on its hash for security
| properties, which seems false? What is the purpose of pinning
| revisions or signing git tags?
| elfchief wrote:
| Git was not intended (AFAIK) to be cryptographically secure.
| Being unsuitable for crypto != being unsuitable for other uses.
| wcoenen wrote:
| Surely signed tags and signed commits in git are supposed to
| be cryptographically secure? [0]
|
| Doesn't the security of those signatures depend on the
| security of the SHA-1 hashes that are being signed?
|
| [0] https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work
| danachow wrote:
| No - after all, the most common use of digital signatures
| is to sign documents that can be easily tampered with. All
| the security is in the signature, not the content being
| signed.
| lvh wrote:
| This makes no sense. Sure: you can generate an X.509
| certificate that says whatever you want, but the point is
| that you can validate the signature and see that it's a
| forgery. In the case of a hash-addressed system like git,
| the problem is that the signature is over a collision, so
| it no longer certifies the thing it's supposed to certify.
| Git uses the hash as a shorthand for a revision,
| including its entire history--so yes, it is using the
| hash that way.
|
| By that logic, would MD5 be fine? MD4? CRC32?
| wongarsu wrote:
| A digital signature usually signs (roughly: encrypts with
| the private key) a hash of the content. So asking what
| exactly signing a commit or tag entails is a very valid
| question. I would expect that signing a tag is only as
| strong as SHA-1, since a tag is essentially a label for a
| commit hash. For signing commits I have no clue, but would
| be quite interested as well.
| formerly_proven wrote:
| Git is designed around content-addressed storage and while you
| _can_ cater for hash migrations, it gets pretty messy design-
| wise. Git's core data structures were also designed really
| quickly to patch a very urgent need of the kernel hackers. I
| doubt it has anything to do with being in love with passing 20
| byte strings on the stack. The fine git folks have produced a
| rather detailed document about the hash migration
| (https://git-scm.com/docs/hash-function-transition/) and it is
| not particularly simple to do this.
| john_alan wrote:
| > Our work shows that SHA-1 is now fully and practically broken
| for use in digital signatures
|
| Yet 2nd preimage resistance is intact. Zzz.
| r0f1 wrote:
| Obligatory link: https://shattered.io/
| nikeee wrote:
| The article states that the collision found is something
| different from the one found by Google et al.
|
| It's a chosen-prefix attack.
| TravisHusky wrote:
| Of course it has its own website ;)
| fanf2 wrote:
| That was an earlier break; this article is about
| https://sha-mbles.github.io/
| ofou wrote:
| For those interested, you can use SHA-256 today. Warning:
| still in experimental mode.
|
|     # git config file
|     [extensions]
|         objectFormat = sha256
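|
| (Note: extensions.objectFormat only takes effect when a
| repository is created, so in practice you start a new repo
| with `git init --object-format=sha256` on git 2.29 or later;
| as of this thread there is no in-place conversion for
| existing SHA-1 repositories.)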
| musicale wrote:
| Old collision (2020). ;-)
| bjarneh wrote:
| > The technique that the researchers developed is quite complex
| and required two months of computations on 900 individual GPUs.
| motohagiography wrote:
| I only half joke when I ask: when you break a cryptographic
| hash, does it mean that now it's just a really amazing
| compression algorithm, but with a very heavy compute
| requirement? The non-joking half is speculating about what
| data compression in a post-quantum compute world looks like.
| fwip wrote:
| No, because reversing the hash has infinite possible answers
| (well, "very many" for bounded input size).
|
| You can't decompress 32 bytes into 1GB because you don't know
| which of the 1GB-sized answers is the intended one.
| ganzuul wrote:
| Natural data is easily identifiable.
| dTal wrote:
| No it's not. You can't distinguish the complete works of
| Shakespeare from the complete works of Shakespeare with
| the words changed a bit, or switching to Harry Potter for
| a bit in the middle, or a completely dazzling and
| original work of fiction. Any given hash might be shared by
| all three of these, and essentially boundless other works.
| It's a library of Babel.
| ganzuul wrote:
| Your examples really fall out of the scope of the
| premise.
| fwip wrote:
| No, they're 100% correct.
| contravariant wrote:
| Only if you count all natural looking data, of which there
| is way more than you could possibly imagine.
|
| It is of course likely that humanity hasn't yet created
| more than 2^100 (~10^30) files, so in _theory_ given a
| registry of all files in existence you might be able to
| identify a file by its hash. However, while this is simple
| it's definitely not easy.
| ganzuul wrote:
| I can imagine the output of program space. That's pretty
| big. All possible universes in fact.
|
| It's an issue of probability and bins. While natural data
| is infinite, there is vastly more unnatural data. At some
| point you have enough metadata (e.g. natural vs. random)
| to know that the original data came from Earth, and to pick
| the right needle out of the needle stack.
|
| Unless the data is from a completely alien source, we are
| close enough to that source for our heuristics to be of a
| manageable size.
| contravariant wrote:
| It's somewhat irrelevant how much unnatural data there
| is. The point is that there's way more plausible data (or
| sentences, if we're restricting ourselves to natural
| language) than there are possible 160-, 256- or 512-bit
| hashes.
|
| Unless of course you enumerate all of human
| communication, but like I said that doesn't count as
| 'easy'.
| fwip wrote:
| Nope, not on these scales.
|
| Say you have a 32 byte hash, that's 2^(32*8) possible
| values, right? Now say your input is 128 bytes; that
| means that (on average) each 32 byte hash maps to
| 2^(96*8) values.
|
| "Natural looking" data will occur an astronomical number
| of times in a search space that large - which is, again,
| only 96 bytes larger than your hash.
| __s wrote:
| That's because natural data has low entropy
|
| But let's say every paragraph only offers 1 bit of
| entropy. Then a 160 bit hash gives you fuzzy accuracy up
| to 160 paragraphs. After that you'll have to extend the
| hash with hints to guide which sequence of paragraphs
| you're looking for, & hints for where the typos are
|
| ofc, 100x compression of English text doesn't require
| this amount of compute to decompress:
| https://en.wikipedia.org/wiki/Hutter_Prize
|
| It's also impractical since most compression use cases
| want to put the work in compression, & have decompression
| be straight forward
|
| edit: 100x was misreading, it's currently 8.6x
| http://prize.hutter1.net/#prev
| ganzuul wrote:
| This is very abstract, but I believe that since both
| input and output are very close in program space compared
| to the total size of PS, the heuristics to map between
| them are going to be of a manageable size. This is based
| on an intuition about a single source for all activity in
| this universe.
| AnimalMuppet wrote:
| "Paragraph" is highly optimistic. IIRC, each English
| _character_ has about 1 bit of entropy.
| __s wrote:
| Shannon estimated 0.6 to 1.3:
| https://cs.fit.edu/~mmahoney/dissertation/entropy1.html
|
| For the sake of argument I figured I'd be highly
| optimistic. The linked prize shows practical evidence of
| 1GB of Wikipedia being compressed below 1 bit per
| character _(& that's with a limit of 50 hours CPU time,
| 10GB memory, & 100GB disk)_
| motohagiography wrote:
| This is what made it a funny thought experiment to me. I
| was doing some space related stuff a while back, and when
| you're dealing with something as small as 100mb, and you
| add the constraint of up to a light-year or more of
| latency, which makes error correction immensely costly,
| having a quantum processor at the receiver find the
| collision of a unique hash could be faster than
| transmitting the data with errors.
|
| There's probably a distance where the latency in light
| years is longer than it takes for a quantum processor with
| some shortcuts to exhaust the search space for the
| collision on the hash - and potential hashing algorithms
| that contain information and hints about a hyperplane that
| generates the original data over the field of
| possibilities. To a quantum computer with sufficient
| qubits at a long enough latency distance away, the 32-bit
| hash of a 2048-bit RSA key may "arrive" faster than
| transmitting the whole key.
|
| It feels like a fundamental misunderstanding of something
| in communication theory, but this was the thinking that
| prompted it.
| mypalmike wrote:
| Using the pigeonhole principle, consider how many 100
| megabit values must, on average, hash to the same 32 bit
| value. The answer is a huge number: 2^(100,000,000 - 32)
| of them on average. This is a bit of a paradox in
| cryptographic hashing, where finding a collision is
| difficult, but the number of collisions is enormous.
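|
| The counting is easy to sanity-check at toy scale; a sketch in
| Python that buckets every 16-bit input by a 4-bit truncated
| hash, giving 2^(16-4) = 4096 preimages per bucket on average:
|
|     import hashlib
|     from collections import Counter
|
|     # Bucket all 2**16 two-byte inputs by a 4-bit "hash" (the
|     # first hex digit of SHA-256). Pigeonhole says each of the
|     # 16 buckets holds about 2**16 / 2**4 = 4096 inputs.
|     buckets = Counter(
|         hashlib.sha256(n.to_bytes(2, "big")).hexdigest()[0]
|         for n in range(2**16)
|     )
|     print(min(buckets.values()), max(buckets.values()))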
| akomtu wrote:
| Not so simple. A hash corresponds to infinitely many
| messages, but how many of them are in the ASCII charset and
| under 1KB? It may happen that each hash has a unique message
| within these constraints.
| hyperpape wrote:
| Ignoring punctuation, capitalization and spaces, you have
|
| (26 ^ 1024) / (2 ^ 160) ascii texts per hash, which is too
| large for Python to compute with float division.
|
| Thinking harder, that's:
|
| >>> (13 ** 160) * (26 ** (1024 - 160)) 58601538220582696
| 072767254440146387121221583753570902541264313546485610004
| 750780866792754920589068132679848156271786567137191486170
| 703327101040110575724309837442235432328033598945697712388
| 381451978878967640960102861159559384620162207350857311340
| 050840762653291815079540852172190063832289697996458447828
| 737045986675145162241398406780211500684400272119809812611
| 929904403730528597187389968557044373171358434578669341421
| 589594031073341308966343982813670005358851607748477530217
| 997935791824321484740631022470362180786257680155724965742
| 143671853261978124815484529745096216129616265428595187946
| 218158257908697520717067407972750772493927919233876373156
| 233168124018474649093065762508852701249697457996541217284
| 731525708907010521446628919717790946317667295418177373934
| 569079380730731418728467142214924963037213836468723407639
| 463340501537517771097304361708288408926134607971881944851
| 903293709136443956469032501471787137653780375970169583703
| 004025580731889262167424195839803018053029467192083418365
| 771596772896857995516305314630974667718236900515720973033
| 360569178506327940782971768109891923840004914279569619570
| 809054106685577577395046255601816801966032983011243478952
| 485788667076648479153854024730784994930592675717244288051
| 727831378594097970916753614065993732935733615600050932039
| 565604760497376113470099235874052260244552013678784657526
| 956911961147614534279815393143753046456632251016766323098
| 5347176517861376
|
| If you restrict it to words, assuming there are only
| 1000 words in the English language, averaging 5 letters
| each, you get:
|
| (1000 ^ 200) / (2 ^ 160) which becomes:
|
| >>> (500 ** 160) * (1000 ** 40) 6842277657836020854119773
| 355907793609766904013068924666782559979930620520927053718
| 196475529111921787261962890625000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 000000000000000000000000000000000000000000000000000000000
| 00000000000000
|
| There will be massive numbers of realistic texts that
| match any given hash.
| dTal wrote:
| The pigeonhole principle forbids it, since the hash
| itself is an ASCII message under 1KB long. Even within
| your message constraints, the message can contain
| any possible hash, and more besides. Therefore there are
| more valid messages than hashes.
| FartyMcFarter wrote:
| Compression algorithms must encode data in a reversible way,
| which hash functions can't do in general.
|
| For example, SHA-1 outputs a 160 bit hash, which means some
| input of 161 or more bits will definitely have the same hash
| as some other input of the same size. Even smaller inputs may
| have collisions too.
| [deleted]
| tomsmeding wrote:
| These hashes, regardless of their actual cryptographic
| strength, are intended to be indistinguishable from a random
| generator keyed with the input, kind of. Assuming that they
| succeed in that, hashes should be quite well distributed.
| That means that for each n-bit hash output, there
| will be about two (n+1)-bit input strings that produce that
| hash, about four (n+2)-bit input strings, eight (n+3)-bit
| input strings, etc. So while there are probably few _ASCII_
| input strings that map to one particular hash, for example,
| there will be loads and loads of arbitrary bitstrings that
| map to that hash.
|
| Hashes are very bad compressors. :)
| kazinator wrote:
| The compression joke evokes a misconception (as STEM jokes
| often do) which, in this case, is that breaking a hash means
| finding a way to recover the full text of a multi-megabyte
| document from a 100-something bit hash.
|
| Whereas all it means is that it's become feasible for someone
| to produce a fake document with the same hash as the genuine
| one; the attack _depends_ on the existence of collisions,
| which are inevitable because the hash is many-to-one, such
| that an infinite set of documents could be "recovered" from
| the hash.
|
| Most of the documents you could "decompress" from a hash are
| going to be gibberish, but some of them are going to be
| legit. If a hash is "broken", that means it's somehow unduly
| easy to search that ocean of gibberish collisions to find a
| chosen text which goes to a certain desired hash.
|
| Nothing quantum can redefine a function that isn't one-to-one
| and onto into one that is. The hashing function remains ever
| the mathematical object that it is: a one-way function. But
| calculations of or searches through that function can be
| faster.
|
| In a post-quantum world, pre-quantum crypto and hashing could
| well turn out to be "for shit" as such, but not due to ruses
| like suddenly reversing irreversible functions.
|
| Not saying that the joke isn't funny! We can imagine some
| hard-core computer science or math sitcom in which the fall-
| guy characters say things like this, and only the very narrow
| target audience gets it.
| lallysingh wrote:
| Or roughly 1.3M GPU hours. Less as GPUs get upgraded. Not sure
| how many AWS F1 (FPGA) hours it would take.
| wongarsu wrote:
| So somewhere between $300,000 (running consumer GPUs cheaply)
| and $30 million (paying AWS on-demand prices for Tesla V100).
|
| Even the upper bound for that seems well worth it for well-
| funded attackers if the target is juicy enough.
| asdfman123 wrote:
| I wonder how much money in Bitcoin they would have made if they
| had used those hashes to mine instead.
|
| Speaking of which, it would be funny if they were pretending to
| do security research but added a secret backdoor to mine
| bitcoin instead, somehow exporting those hashes or using the
| partial results of SHA-1 calculations (BTC isn't SHA-1).
|
| I'm just joking, but I wonder if that's possible. If anyone is
| Machiavellian enough for that, it's security researchers.
| rgovostes wrote:
| (Jan 7, 2020)
| ImJasonH wrote:
| Reminder that GitHub has blocked Git commit collisions since
| 2017, and as far as anybody is aware hasn't seen one in the wild.
|
| https://github.blog/2017-03-20-sha-1-collision-detection-on-...
| tantalor wrote:
| > A higher probability exists that every member of your
| programming team will be attacked and killed by wolves in
| unrelated incidents on the same night.
|
| - Scott Chacon
| asdfman123 wrote:
| That's even true if you make your developers commute into
| Yellowstone National Park by ski every day
| bawolff wrote:
| This is stupid. The probability of any attack happening at
| random is close to zero. E.g. the probability of a buffer
| overflowing just right to give you a shell through random
| chance is less than being eaten by wolves.
|
| The question is about the risk of someone intentionally
| performing the attack, not the probability it will
| accidentally happen at random.
| api wrote:
| Random collisions in 160-bit space are incredibly unlikely.
| This is talking about intentional collision, and means that
| it's entirely feasible for someone with significant compute
| power to create a git commit that has the exact same hash as
| another git commit. This could allow someone to silently modify
| a git commit history to e.g. inject malware or a known "bug"
| into a piece of software. The modified repository would be
| indistinguishable if you're only using git hashes.
|
| Git uses SHA-1 for unique identifiers, which is technically
| okay as long as they are not considered secure. If git were
| designed today it would probably use SHA-2 or SHA-3, but it's
| probably not going to change due to the massive install base.
|
| Edit: anyone know if git's PGP signing feature creates a larger
| hash of the data in the repo? If not maybe git should add a
| feature where signing is done after the computation of a larger
| hash such as SHA-512 over all commits since the previous
| signature.
| asdfman123 wrote:
| > Random collisions in 160-bit space are incredibly unlikely
|
| Just for fun: to get a 5% chance of a hash collision between
| ANY two numbers in a 160 bit space, you'd have to generate
| 3.9e23 hashes.
|
| So you'd have to generate 1000 hashes per second for _12
| trillion years_.
|
| Formula:
|
| n = sqrt(2 * 2^160 * ln(1/(1-0.05)))
|
| https://en.wikipedia.org/wiki/Birthday_problem#Probability_o...
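|
| A quick check of that arithmetic in Python, using the standard
| birthday-bound approximation:
|
|     import math
|
|     p = 0.05  # target collision probability
|     n = math.sqrt(2 * 2**160 * math.log(1 / (1 - p)))
|     print(f"{n:.2e} hashes")  # ~3.9e23
|     years = n / 1000 / (3600 * 24 * 365.25)
|     print(f"{years:.2e} years at 1000 hashes/s")  # ~1.2e13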
| tialaramex wrote:
| The defence used by GitHub specifically defends _against_
| these intentional collisions, not some mirage of random
| collisions.
|
| Basically you collide a hash like SHA-1 or MD5 by getting it
| into a state where transitions don't twiddle as many bits,
| and then smashing the remaining bits by brute force trial.
| But, such states are _weird_ so from inside the hash
| algorithm you can notice "Huh, this is that weird state I
| care about" and flag that at a cost of making the algorithm a
| little slower. The tweaked SHA1 code is publicly available.
|
| If you're thinking "Oh! I should rip out our safe SHA256 code
| and use this unsafe but retroactively safer SHA1" - no,
| don't do that. SHA-256 is safer and faster. This is an
| emergency patch for people for whom apparently 20 years'
| notice wasn't enough warning.
|
| In theory the known way to do this isn't the only way, but
| we have reassuring evidence for MD5 that independent forces
| (probably the NSA) who have every reason to choose a
| different way to attack the hash to avoid detection _do_
| trigger the same weird states, even though they're spending
| eye-watering sums of money to break hashes themselves, not
| just copy-pasting a result from a published paper.
| [deleted]
| noway421 wrote:
| From a practical point of view, how would injecting malware
| happen? If you're trying to insert a malicious diff somewhere
| deep in the git history, you would need to recompute all the
| other commits after the injected commit - which would most
| certainly change their commit ids too if they are touching
| the same file. When other commit ids change, the malicious
| change becomes detectable.
|
| There's also the case for auditing: force pushing into an
| existing repo triggers an event in GitHub and is logged.
| While the logging event can be missed, it leaves a paper
| trail.
|
| With things like reproducible builds, this also becomes
| harder. Distributing (through a fork, or by putting
| it up on a website mytotallylegitgitrepos.com) source code
| which builds into a binary which doesn't match the upstream
| hash is suspicious.
| AnimalMuppet wrote:
| Can you create an intentional collision, and _also_ have the
| counterfeit contain compilable code? (Or, in other
| circumstances, intelligible English text?)
| jfrunyon wrote:
| Yes, just like you can create an intentional collision, and
| _also_ have the counterfeit be a PDF.
| [https://shattered.io/]
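|
| This is easy to verify yourself; a sketch, assuming the
| sample files are still hosted at these shattered.io URLs:
|
|     import hashlib
|     import urllib.request
|
|     URLS = [
|         "https://shattered.io/static/shattered-1.pdf",
|         "https://shattered.io/static/shattered-2.pdf",
|     ]
|
|     for url in URLS:
|         pdf = urllib.request.urlopen(url).read()
|         print(hashlib.sha1(pdf).hexdigest(),    # same for both
|               hashlib.sha256(pdf).hexdigest())  # differs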
| [deleted]
| throw_away wrote:
| Looks like there's a migration plan for git to SHA-256:
|
| https://git-scm.com/docs/hash-function-transition/
| jfrunyon wrote:
| https://stackoverflow.com/questions/23584990/what-data-is-be...
| says git signing just signs the commit hash (plus
| metadata: committer, commit message, etc).
| a_t48 wrote:
| As far as I know it only signs individual commits.
| oconnore wrote:
| I was surprised that no one suggested truncating SHA-256 to
| 160 bits (in the spirit of SHA2-256/224 or SHA2-512/256). The
| attacks on SHA-1 are not directly based on the length of the
| hash; they are based on weaknesses in the algorithm.
|
| Even attacking SHA2-256/128 would be quite difficult as I
| understand it, even though it's the same length as MD5.
|
| Truncated hashes also of course have the great property that
| they mitigate the length extension weakness in Merkle-Damgard
| constructions.
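|
| A sketch of plain truncation in Python (note the official
| truncated variants like SHA-512/256 also change the initial
| values, which plain slicing doesn't do):
|
|     import hashlib
|
|     def sha256_160(data: bytes) -> bytes:
|         # SHA-256 truncated to 160 bits: same length as SHA-1,
|         # and the dropped bits hide the internal state needed
|         # for length-extension attacks.
|         return hashlib.sha256(data).digest()[:20]
|
|     print(sha256_160(b"hello").hex())  # 40 hex chars, like SHA-1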
| adrian_b wrote:
| It is not yet possible "to create a git commit that has the
| exact same hash as another git commit" in the sense that,
| given a commit someone else has already made, you could make
| another commit with the same hash.
|
| What is possible now is something much easier: if you
| have enough money and time, you can create 2 commits with the
| same hash. They start with different parts, which may be
| chosen arbitrarily, and then have to include some parts
| computed to ensure that the hashes will match - gibberish
| that might be disguised in some constants or a binary blob,
| if possible.
|
| Then you can commit one of them, and presumably you can later
| substitute the initial commit with the other one without
| anybody being able to detect the substitution.
| outworlder wrote:
| > presumably you can later substitute the initial commit
| with the other one without anybody being able to detect the
| substitution.
|
| How? Which operation would be involved? Will it not show
| anywhere else(reflog)?
| adrian_b wrote:
| Because such an operation is unlikely to succeed even if
| the hashes match, replacing SHA-1 was not a high priority
| for Git.
| noway421 wrote:
| `git push --force` I assume. In GitHub, this will leave a
| trail of events though - Webhook events and auditing.
| DSMan195276 wrote:
| I would think it's not that simple because `git push
| --force` doesn't do anything if it thinks the histories
| are the same, and in this case we've created a history
| that appears to be the same. You'd likely need a custom
| git client (which isn't a problem) but I don't know
| enough about the internals of a git push to know if the
| server would even accept objects if they match hashes of
| objects it already has (it may just go "I don't need
| that" and ignore it, because what's the point in doing
| anything with it?). Presumably it would depend on the
| exact server implementation whether you could get it to
| replace an existing object with a "new" one which is
| actually the same hash, but frankly I think that's
| unlikely because it would be pointless work from the view
| of the server. If it does happen I'm not sure what
| auditing you would actually be able to see, webhooks and
| such might not be triggered because the history didn't
| actually "change".
|
| What you could do however is just host it yourself
| somewhere else, say put it on a fork. Or if you have
| access to the actual repository hosting the original
| version, you could just manually replace it yourself. git
| clients aren't going to just automatically pull the "new"
| version though so you'll have some combo of people with
| the old and people with the new, and it gets a little
| messier from there.
| sverhagen wrote:
| If you can force push, why would you then not push a
| different commit before pushing your updated commit with
| the same original hash, or does that also not work?
| formerly_proven wrote:
| The reflog shows changes to references. References are
| hashes. Git is CAS, and uses the CAS assumption: H(m1) =
| H(m2) <=> m1 = m2.
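|
| A content-addressed store in miniature (Python; a sketch of
| the assumption, not git's storage): every lookup silently
| relies on H(m1) = H(m2) implying m1 = m2.
|
|     import hashlib
|
|     store = {}  # hexdigest -> object bytes
|
|     def put(obj: bytes) -> str:
|         key = hashlib.sha1(obj).hexdigest()
|         store[key] = obj  # a colliding obj would silently collapse
|         return key        # two different objects into one key
|
|     ref = put(b"some tree or commit body")
|     assert store[ref] == b"some tree or commit body"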
| outworlder wrote:
| > This could allow someone to silently modify a git commit
| history to e.g. inject malware or a known "bug" into a piece
| of software.
|
| You need a collision. You also need it to be syntactically
| correct. You need it to not raise any red flags if you are
| contributing a patch. And ultimately you need it to do what
| you want.
|
| That's a pretty tall order.
| api wrote:
| All you need is one variable sufficiently large that you
| can change until you find a collision, such as a comment
| (which about all languages allow) or a meaningless number.
|
| You could even vary whitespace until it fits, like spaces
| at the end of lines.
| ImJasonH wrote:
| Yep. GitHub is saying they would block an object that looked
| like it was crafted to produce a collision using SHAttered,
| and hasn't seen it, intentionally or otherwise, in the wild.
| lifthrasiir wrote:
| The chosen-prefix collision with a complexity of 2^63.4 (2020).
| akomtu wrote:
| Looks like they needed 1.5 million GPU-hours to find a
| chosen-prefix collision.
| smegcicle wrote:
| that's it, it's over, git is finished
| [deleted]
___________________________________________________________________
(page generated 2021-10-12 23:00 UTC)