[HN Gopher] Hash collisions and exploitations - Instant MD5 coll...
___________________________________________________________________
Hash collisions and exploitations - Instant MD5 collision
Author : losfair
Score : 121 points
Date : 2022-09-20 11:07 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| retrocryptid wrote:
| And yet MD5 is still recommended in Applied Cryptography.
| traceroute66 wrote:
| > And yet MD5 is still recommended in Applied Cryptography.
|
| I'm guessing this is the usual HN dig at Schneier's book ?
|
| Fact is that Schneier was already warning against MD5 as far
| back as 1996. And if you look on more recent blog posts, he is
| not exactly silent on dissuading people against MD5.
|
| Fact is that like it or not, even today, MD5 still sees usage
| in the crypto landscape (see AWS S3 Content-MD5, for example).
| Therefore being able to understand how it works is still
| relevant, and in that respect Schneier's book is as good as any
| other.
| retrocryptid wrote:
| It's more of a dig on people who claim MD5 is okay because
| it's in Schneier's book. Not so much a dig on Schneier
| himself.
|
| And yes. I understand MD5 etag headers are common. I even
| understand that MD5 still passes the avalanche test --
| Robshaw's observations on MD5 in 1996 are just as valid
| today.
|
| But according to Sasaki & Aoki, //both// collisions and
| second pre-image calculation are faster than brute force.
| Maybe MD5 //shouldn't// be used if all you need is avalanche.
|
| It turns out that SHA256 has both collision and second pre-
| image calculation resistance //as well as// avalanche effect.
| Maybe use that one instead.
| fluoridation wrote:
| I don't remember the last time I saw anyone use MD5 for
| cryptography. It's still used a lot for data integrity,
| because it still works perfectly well for that and it's
| faster than the SHA family (when there's no hardware
| implementation).
| retrocryptid wrote:
| Though be careful about the use of the phrase "data
| integrity" -- comparing two files on a file-system by
| their MD5 hash is probably fine, but comparing the
| serialization of two PDUs over the wire based on their
| MD5 hash may be problematic.
|
| I think I understand your meaning to be MD5 certainly
| still exhibits an avalanche effect; changing a single bit
| in the input changes about half the bits in the output.
| And if you trust the way you retrieve the hashed data and
| the hash (like it's on a local hard drive) then yes, it's
| certainly acceptable for that use. But collisions and
| second pre-image generation being faster than brute force
| are why people generally don't want to use it (MD5) when
| it's use spans trust domains.
|
| My point is:
|
| a. Don't use MD5 just because Bruce Schneier published a
| popular book that said it was okay RIGHT BEFORE all the
| research damning it came out. (Personally... I think
| Bruce should publish a third edition of this text
| expressly to remove the bit about MD5 being okay. I
| cannot tell you how many hours of billable time I've
| wasted explaining to software engineers that no... MD5 is
| not recommended for use even though at one time it was
| considered acceptable. And if you know not to use MD5,
| you're not the software engineer I'm talking about.)
|
| b. You can use SHA256 and get avalanche, collision
| resistance AND second pre-image generation resistance.
| (Pretty sure you also get 1st pre-image generation
| resistance, but I haven't scanned the literature for that
| in a while.)
|
| And while I'm thinking about it, let me add these points:
|
| c. There are probably better hashing algorithms than MD5
| for use with a hash map/table/tree.
|
| d. If you're interested in how MD5 works, I recommend
| expanding the scope a bit and study Merkle-Damgard
| generally. Why MD5 has problems and other hash functions
| that make use of the Merkle-Damgard construction don't
| (or have different problems, or the same problems at
| different amounts of input) is pretty interesting.
|
| And yes, if you happen to have a MD5 hardware accelerator
| or petabytes of data and MD5 hashes already, it's hard to
| change that overnight.
| fluoridation wrote:
| >comparing the serialization of two PDUs over the wire
| based on their MD5
|
| I'm not sure what you mean by this, but this:
|
| >people generally don't want to use it (MD5) when it's
| use spans trust domains.
|
| is exactly what I mean by "cryptography". I.e. guarding
| against intentional tampering. Are there a lot of people
| using it for this purpose? I don't remember seeing one.
|
| >c. There are probably better hashing algorithms than MD5
| for use with a hash map/table/tree.
|
| For sure. Cryptographic functions (even obsolete ones)
| are almost always overkill and too slow for general data
| structures. Only use them if you can't find something
| more suitable for your data.
|
| >And yes, if you happen to have a MD5 hardware
| accelerator or petabytes of data and MD5 hashes already,
| it's hard to change that overnight.
|
| I was actually talking about SHA-256 acceleration, since
| I just saw like an hour ago that recent Intel CPUs have
| it. If your CPU has such instructions, by all means use
| it instead of software MD5, if all you need is data
| integrity.
| jabl wrote:
| Based on a quick test here on a 417MB .tar.gz file I had
| lying around, best of 5 times for various coreutils
| (8.32) implementations:
|
| - md5sum: 0.570s
|
| - sha1sum: 0.667s
|
| - sha256sum: 1.662s
|
| - sha512sum: 1.011s
|
| - b2sum: 0.486s
|
| - cksum: 0.982s
|
| In conclusion: b2sum is both the fastest and AFAIU
| considered secure.
| retrocryptid wrote:
| Moving from MD5 to SHA256 means we just about triple the
| amount of time it takes to generate a hash.
|
| But I suggest there are few applications where this speed
| improvement justifies the confusion I've seen in junior
| engineers who believe MD5 is okay because a. they use
| MD5SUM and b. Bruce Schneier said it was okay.
|
| Don't get me wrong. I trust that //YOU// will know not
| depend on a MD5 hash for anything where a bad guy can
| modify content over the wire. But... I'm going to go out
| on a limb and guess you're somewhat experienced. I worry
| about the kids who without the benefit of experience re-
| enact scenes from the cryptography edition of Lord of the
| Flies.
|
| But... if you know what you're doing... sure... use
| MD5... There's certainly no way your code will ever be
| used by a less experienced engineer, right?
| fluoridation wrote:
| Why are inexperienced engineers being allowed to mess
| with cryptographic primitives?
| jabl wrote:
| I tend to prefer b2sum (BLAKE2), as it's both fast and
| (at the time of writing this, AFAIK) secure. There are
| certainly situations where md5 is good enough, but yeah,
| I feel safer just using blake2 and not having to spend
| brain cycles thinking whether the hash needs to be
| cryptographically secure or not, and risk making the
| wrong judgement.
| alpaca128 wrote:
| Tried with a 3.5GB Ubuntu .iso file, results are similar
| though I also tried b3sum and that was even faster(just
| 1.8s vs. 19s for sha256sum & 6.7s for md5sum). So
| performance is definitely not an argument for md5.
| dylan604 wrote:
| Like everything else in the world, knowing your audience,
| intended use, etc plays a key role. If you're protecting data
| against state actors, then yes, MD5 is not good enough. If
| you're just trying to decide if a transfer of a file went
| across a network without munging the data, then MD5 is good
| enough. If you're trying to post that file onto something
| that is publicly facing and want to provide the person
| downloading the file that it has not been modified, MD5 is
| probably not the best method.
| dingosity wrote:
| The point the OP made is it's easy to construct a second
| preimage with MD5. So if a bad guy, even a bad guy with
| meager resources, modifies the hash or the data in transit,
| you won't be able to identify the change the MITM made in
| either the data or the hash while it was transmitted.
|
| What you're probably thinking is a digital signature, which
| combines a hash and an asymmetric cipher (or a keyed hash
| if you keep that key secret.)
|
| But even then a bad actor can construct a second pre-image
| with MD5 which would cause the digital signature validate
| as authentic even though it's been changed.
|
| And an adversary with meager computational resources could
| perform this hack. This is the OP's point. You don't have
| to be a state actor to do this.
| dylan604 wrote:
| Essentially, I was agreeing that MD5 is worthless for
| anything "secure" from 3rd party manipulation. However,
| it is fine for verification of non-adversarial purposes.
| It's much faster to run MD5 than SHA256 over a 60-120GB
| MOV file.
|
| Copying camera original footage from the card that is
| still warm from the camera and making copies to client
| deliverable storage before telling camera dept that it is
| okay to wipe the card and reuse is a thing I have done a
| lot. Speed is important. Security is not a concern at
| all.
|
| Copying a large MOV file from one storage pool to
| another, retrieving large media assets from tape archive,
| etc are also common situations where you just want to
| know that the copy is actually what you think it is from
| I/O errors during whatever transfer process. In these
| situations, I have zero concern that someone wearing a
| blackhat maliciously manipulated the data during the
| transfer.
| dingosity wrote:
| The cost of the calculation of the hash is dwarfed by the
| cost of copying to/from an SD card (or even nvme
| devices.) If you hash the file as you transfer it, you'll
| not notice the difference between MD5 and SHA256.
| stevewatson301 wrote:
| Further, SHA-256 operates on the same principle as well
| (repeated iterations of a one-way compression function,
| called a Merkle-Damgard iterative construction[1]). Learning
| how MD5 works is quite instructive in that regard.
|
| [1] https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd
| _co...
| EGreg wrote:
| I asked a while ago, whether it's feasible to get another file to
| generate a given hash.
|
| The answer is no. Not even with MD5.
|
| Just be very sure that this is the guarantee you are looking for.
| Often, for Merkle Trees etc. that is EXACTLY what is needed.
|
| Can someone craft input files (eg images) to fool your system?
| Yes, but only at their own expense.
|
| Sometimes if you want the system to be resilient even in the fact
| of malicious inputs then yes, you should use SHA256 and higher.
| sacrosancty wrote:
| They might not realize they're hurting themselves, which makes
| it a bug. Backblaze did this with one of these hash collisions.
| You could back up two different (colliding) files and when you
| tried to restore both, only two copies of the same file would
| be returned, the other one apparently lost forever.
|
| Why would you have colliding files? Perhaps because you're
| somebody working on hash collisions and have collected some
| examples.
| a-dub wrote:
| in the era of high fidelity generative models, i suspect that the
| future of media formats will be security forward with built-in
| protections against length extension attacks.
|
| i'm having a hard time imagining any other future than one where
| people only trust signed media, and media is possibly even signed
| in hardware by actual physical sensors/compressors.
| abetusk wrote:
| From the README:
|
| """
|
| Colliding any pair of files has been possible for many years, but
| it takes several hours each time, with no shortcut. This page
| provide tricks specific to file formats and precomputed collision
| prefixes to make collision instant. git clone. Run Script. Done.
|
| """
|
| Could anyone weigh in on whether these ideas can be generalized
| to speed up MD5 collisions in general?
| Retr0id wrote:
| What do you mean by this? You can't precompute all possible
| colliding blocks.
| dingosity wrote:
| They mean "compute the first pre-image."
| omk wrote:
| Have always thought of creating email addresses with colliding
| MD5 hashes only to see how Gravatar's MD5 URL reacts to it.
| omoikane wrote:
| See also: "Lifetimes of cryptographic hash functions" -
| https://valerieaurora.org/hash.html
|
| MD5 appears to be firmly in the "fun party trick" stage.
| waynesonfire wrote:
| "In 2007, the NIST launched the SHA-3 competition" and the
| following year, 2008 SHA-2 is labeled "Minor weakness".
|
| Well, great timing on that competition!
| magila wrote:
| That page hasn't aged all that well. The prediction that
| applications would need to switch to a new hash function "every
| few years" hasn't panned out. The feared improved attacks on
| SHA-2 have failed to materialize. Applications that chose SHA-2
| 20 years ago are still quite secure today.
|
| It seems we just weren't very good at designing hash functions
| in the 90s.
| ComplexSystems wrote:
| Good news for Bitcoin, as long as SHA-256 remains secure
| forever.
| mikessoft_gmail wrote:
| londons_explore wrote:
| Question for people into cryptography + data archiving....
|
| If I want to store data for 500 years, I want future people to be
| reasonably sure of the integrity of the data, both against 'bit
| rot', but also deliberate tampering.
|
| Is the best available approach to hash the data with a bunch of
| hash algorithms and publish all the hashes?
|
| Then if _any_ hash algorithm remains unbroken, the integrity of
| my data is certainly still good. An attacker would have to do a
| simultaneous preimage attack for _every_ hash algorithm I choose
| to break the scheme, which historically has never happened to my
| knowledge.
| squeaky-clean wrote:
| What if an attacker creates altered data and publishes it the
| same way by hashing it 50 times. Now you have two documents
| claiming to be the original and they each have 50 hashes
| guaranteeing the integrity of the data. What if the attacker
| did that with 1,000 fake documents.
|
| I think your best bet is just to publish it and also launch a
| copy into space with a 500 year orbit. If someone in the future
| tries to launch a similar data-comet with a shorter orbit, it
| will be visible in the trajectory. There's always the danger of
| them sending out a robotic space probe to mess with your data
| "in flight".
| [deleted]
| rainworld wrote:
| Symmetric cryptography has come a long way since MD5. Any one
| of SHA-256, SHA-3/SHAKE256, BLAKE2/3 would be fine.
| gnulinux wrote:
| For the next 500 years though? I suppose (as a crypto layman)
| it'll be fine, but given we don't know what will be broken
| within the next 500 years, isn't it natural to add redundancy
| and use multiple hash algorithms?
| dylan604 wrote:
| I'd be more concerned for what equipment would be used to
| store/read that data in 50 years let alone 500 to make this
| even a viable thing to do.
| rainworld wrote:
| Indeed. Hard to quantify the risk of one of these
| functions getting broken but it's very low. There's a
| million far more likely ways to lose that data.
| dheera wrote:
| > both against 'bit rot'
|
| MD5 is good enough if bit rot alone is the only thing you care
| about. It's still very hard to generate an MD5 collision with
| the same bit length, and doing so would generally require
| changing a VERY large number of bits to get a collision, which
| is not what happens with bit rot.
|
| It might even be mathematically impossible to find two files
| that have the same MD5 hash and differ by less than a certain
| amount of bytes, though I don't know the proper way to
| formalize this.
|
| (It's trivial to show for example that if you use any CRC as a
| hash, it is impossible to find a collision that differs by an
| edit distance of 1 and has the same length.)
|
| > but also deliberate tampering.
|
| You're never safe from this if you aren't the guardian of the
| official hash values. Someone could just change the file AND
| change the "official" hashes.
| traceroute66 wrote:
| With the caveat that I'm no data archiving guru...
|
| As a source of inspiration a good place to start might be B-LTA
| in the ETSI standards. I believe this takes into account the
| fact that the algorithm used for its creation might deprecated.
|
| IIRC the way it works is (basically) that signatures are given
| a finite validity period and you undertake periodic re-signing.
| Therefore you build up a audit trail history of signatures with
| current algos as you go along.
| vivegi wrote:
| Erasure coding combined with hashing. Since erasure coding uses
| data redundancy to help with recovery, bit rot with one
| stream/disk can be automatically detected and recovered
| (subject to supported levels of error correction).
|
| Hashing would be on top of it to ensure detection of
| tampering/errors. Using multiple hashes is also a good
| strategy.
| bawolff wrote:
| If its against (literal) bit rot, i would suggest things like
| ECC, so they could actually recover the damaged data.
|
| > Is the best available approach to hash the data with a bunch
| of hash algorithms and publish all the hashes?
|
| This is much less secure than you think it is. See
| https://www.iacr.org/archive/crypto2004/31520306/multicollis...
| Ayesh wrote:
| If you intend the data to be recoverable, consider adding
| recovery options to it. Error Correcting Codes (ECC), (with
| Reed-Solomon probably being the most common), can help verify
| and even recover data; helpful against bit-rot.
|
| WinRar 5 can do it, and I believe many backup software worth
| their salt can too.
| fluoridation wrote:
| It's impossible.
|
| You can send a message to someone else over an unreliable
| channel with a very good assurance that it hasn't been
| corrupted unintentionally, without needing a reliable channel.
|
| You can send a message to yourself in the future over an
| untrusted channel with a very good assurance that it hasn't
| been tampered with without needing a trusted channel, because
| you trust yourself and can thus remember trustable information.
|
| There's no way to send a message to someone else over an
| untrusted channel with a very good assurance that it hasn't
| been tampered with, without a trusted channel. You need some
| way to convey trustable information.
| wongarsu wrote:
| You can somewhat improve your odds by using many unreliable
| channels, requiring an attacker to compromise all of them.
|
| Not sure how to do that over 500 years though? Pass down the
| data as an heirloom, burry a SHA3 hash in a plaque below a
| building foundation, put a Blake3 hash on a granite tablet
| buried in the desert, write a SHA512 hash on the back of a
| Banksy painting. Put a BLAKE2b hash in your diary and get it
| accepted in a museum collection. The more you add, the more
| difficult it becomes for an attacker to find and alter all of
| them.
| fluoridation wrote:
| If you're hiding the hash where no one can find it, how are
| people going to check it? If you tell people where you're
| hiding it, what's the point of hiding it?
| wongarsu wrote:
| Hiding it can make sense as long as you are sure it's
| eventually found by chance, and the finder will recognize
| its significance. At some point the building foundation
| will be torn up, and precisely at that point the hash
| will be revealed. Any modification before that requires
| knowledge of it and a somewhat sophisticated plan to
| modify or destroy it.
|
| But really, you don't have to make those locations
| secret, you already get a lot of security by requiring an
| attacker to drive out into a desert to change that
| tablet, access a building foundation and tamper a Banksy
| painting. None of these are secure in the cryptographic
| sense, but even without secrecy that raises the bar for
| an attack substantially, and makes it more likely to be
| detected.
| fluoridation wrote:
| >Hiding it can make sense as long as you are sure it's
| eventually found by chance, and the finder will recognize
| its significance. At some point the building foundation
| will be torn up, and precisely at that point the hash
| will be revealed.
|
| The hash could be revealed _after_ the tampered message
| was found, and therefore _after_ it would have been
| necessary. Imagine some critical decision was taken based
| on incorrect information, for example.
|
| >you already get a lot of security by requiring an
| attacker to drive out into a desert to change that tablet
|
| If you think remoteness is sufficient protection then
| there's no need for hashes or anything like that. Just do
| what 9gag did and etch your message onto stone and bury
| it wherever. They found the approximate location of that
| slab fairly quickly, but who's going to go digging in
| Spain just to smash a limestone slab with some memes on
| it? If someone is willing to do that once, doing it one
| or more times again is not a lot more effort.
| zaphod420 wrote:
| "There's no way to send a message to someone else over an
| untrusted channel with a very good assurance that it hasn't
| been tampered with, without a trusted channel. You need some
| way to convey trustable information. "
|
| This is kind of what real p2p, open, permissionless,
| decentralized blockchains are good for...
| fluoridation wrote:
| Blockchains are not trusted channels. They don't permit
| secure communication between any node. That is to say, I
| can't send you a secret tamper-proof message over a
| blockchain if we haven't exchanged information over some
| other channel previously.
| Perseids wrote:
| Let's get creative! There is no need to secrecy of the hash.
| Pay celebrities to wear a dress with the hash printed on it.
| Influence a publisher to print the hash in all books for a
| year. Create a mystery about what the hash is about, so it is
| mentioned in lots of articles. Who is this eccentric
| billionaire that pays truckloads of money to spread these
| apparently useless 64 digits?!?
|
| (Disclaimer: Method not applicable to normal people.)
| fluoridation wrote:
| Yeah, making many copies makes sense. But if you're doing
| that, it's way easier to just make many copies of the
| original message. In other words, just publish it as a book
| and be done with it.
|
| Broadcasting the hash would make sense if the message
| should stay secret for some time. Although 500 years is
| rather a long time for a zero-knowledge proof.
| spiffytech wrote:
| For the bit rot part, this is what PAR2 is for. It generates
| recovery files, designed to repair the original file when
| arbitrary portions of it become malformed or lost.
| yifanl wrote:
| It's largely unknown if there's any emergent behaviour given
| H1(H2(m)), for any two hash functions H1, H2.
|
| It's fully possible that your Uberhash function has some
| vulnerability that can be easily exploited regardless of the
| underlying security of the individual hash functions.
| egeozcan wrote:
| Simultaneously finding collisions in multiple hash algorithms
| with 2 random inputs would be a hard task, and if one of those
| inputs is predetermined (your data), and the other needs to
| include a change that the attacker wants to see there? that
| really sounds impossible. But on the other hand, we are talking
| 500 years...
| Perseids wrote:
| The scenario is not finding collisions though. I assume
| londons_explore somehow finds a way to build a trusted
| relationship and channel with the future [1], say a piece of
| leather with the hashes burned into it that is ... stored
| publicly in Louvre with a ton of photo evidence spread all
| over the earth hence. As londons_explore is trusted, you only
| need Second Pre-image Resistance: Given a file (that was
| created outside of the attackers control) find a second file
| which has the same hash as the first file. Second pre-image
| attacks are super hard to accomplish. MD5 is still Second
| Preimage Resistant!
|
| Adding a second random input to the hash function, like you
| propose (that is to prime the internal state of the hash
| function with same random input, so when hashing of the usage
| data starts, the internal state of the hash function is
| unknown to the attacker) makes collision attacks much harder.
| In fact there is a term for that: target collision
| resistance. But second pre-image attacks, where the random
| input used for the hashing are known, don't get any harder.
| And if you don't transmit the random input over the secure
| channel as well, than second pre-image attacks get easier
| even (theoretically), as attackers have the possibility to
| manipulate the internal state outside of the usage data.
|
| Btw. reliance of target collision resistance, instead of
| collision resistance, is why Ed25519/Ed448 are much more
| resilient to problems with the hash function than ECDSA or
| any common RSA signature scheme. MD5 is still target
| collision resistant last time I checked. Remember the debacle
| of forged MD5 and later SHA1 certificates [2] ? Completely
| avoidable, if better signature schemes had been used.
|
| [1] Otherwise https://news.ycombinator.com/item?id=32912052
| applies. [2] https://www.win.tue.nl/hashclash/rogue-ca/
| nwellnhof wrote:
| How do you want to make sure that the hashes themselves haven't
| been tampered with?
| dingosity wrote:
| you don't know if it's the hash or the data that's been
| tampered with. so if the stored or transmitted hash doesn't
| match the computed hash, you reject it.
| jrootabega wrote:
| Tampering? That's harder. But to protect against corruption,
| since the hashes are shorter, you would be able to chisel
| them in different types of stone and glass and make 3+ copies
| of them. I suppose if you really want to protect against
| tampering, you could distribute so many copies that the
| ability to tamper with a majority of them would be very
| unlikely.
|
| But I think what was meant by "publishing" was that the
| hashes would be available to parties who would be willing and
| able to preserve them indefinitely.
| alex_sf wrote:
| Hash them, duh.
| dekhn wrote:
| Side note: I recently saw an example code using tensorflow to
| determine the private key of some cryptosystem. I can't find it.
| It was literally operating on the bits of the key and somehow had
| a loss function. Any ideas?
| bawolff wrote:
| I mean, if it worked it was probably for a really shitty
| cryptosystem, and hence less interesting.
| 1970-01-01 wrote:
| The most interesting part was collision with certificates.
| Platinum ticket for malware.
| Dwedit wrote:
| For anyone out there still using MD5 for any reason, check out
| this PDF file:
| https://www.alchemistowl.org/pocorgtfo/pocorgtfo14.pdf (42MB).
| You can also rename it to a .NES file and run it in a NES
| emulator.
|
| It's a PDF File which is also a NES ROM that displays its own MD5
| sum. The PDF also shows its own MD5 sum a few times. (The MD5 sum
| also happens to begin with 5EAF00D)
|
| When an arbitrary MD5 can be created that easily, it's useless
| for any cryptographic applications, or even any data integrity.
| pxx wrote:
| MD5 may not be collision resistant, but the logic in your
| conclusion is completely wrong. There is no feasible pre-image
| attack on MD5. The MD5 generated _was not arbitrary_; they seem
| to have just brute-forced a few of the leading bytes and then
| did a Nostradamus attack.
|
| Again: finding something that hashes to an arbitrary MD5 sum is
| still not known to be feasible. This isn't a particularly good
| reason to use MD5, but this also means that MD5 is not broken
| in the way you think it is.
|
| When somebody finds something that hashes to all zeroes you can
| finally say that MD5 is completely broken. It is not known to
| be at that point yet.
| jrootabega wrote:
| Disclaimer: This is a fun thought experiment. I'm not looking for
| actionable results, or advocating for relying on any of this
| comment for actual security. I'm clearly not a cryptographer; I
| just think it would be interesting to talk about here, and maybe
| more educated people could comment on how well these approaches
| might mitigate the exploits in the article. Play with me in this
| space.
|
| I'm curious if people have any interesting ideas on how to add
| some seasoning to MD5 to make it more secure. That is, simple,
| intuitive things you can do in combination with MD5 such that all
| the pieces in your scheme are still easily understood and don't
| amount to a new hash algorithm that can only be understood as a
| black box. Pretend MD5 is the only hash algorithm that has ever
| been found. Or that you're the Gilligan's Island Professor and
| MD5 hashes are your coconuts. What are the most potentially
| useful things you can build out of the most primitive, dumb
| components?
|
| For example:
|
| - Output the length of the input (or a hash of the length if you
| must have a constant-length output)
|
| - Hash the input forwards and backwards and produce two hashes.
| (Remembering that, though the output is 256 bits now, you still
| only have coconuts to work with.)
|
| - Include more complicated variations on the input in the hashes.
| e.g. start in the middle and oscillate forward and backward over
| the input, or move the second half of the input in front of the
| first before hashing, or use the input/hash of the input to seed
| a pseudorandom re-ordering of the input before hashing, etc.
|
| - Format-aware hashing - whatever program will interpret the
| content of the file can also produce a hash, or some [canonical]
| interpretation of the content that can be hashed. e.g., for an
| image format, we could ask the renderer how many iterations of
| some operation it had to perform to render the output, or in the
| worst case, hash the bitmap it produced.
| adrian_b wrote:
| Nowadays it would be a complete waste of time to try to improve
| MD5, because there is no reason to use it for anything.
|
| Most modern CPUs from Intel, AMD or ARM implement in hardware
| at least SHA-1 and SHA-256, so these both are faster than MD5
| implemented in software.
|
| If you only need 128 bits, you can truncate one of the longer
| hashes to 128 bits.
|
| SHA-1 is the fastest in the current CPUs, so even if it is also
| obsolete for applications where there is an adversary who
| attempts to find collisions, it remains a good choice for
| various non-cryptographic applications that need a long hash or
| random numbers.
|
| SHA-1 itself was an attempt to improve on MD5. Even if it was
| not entirely successful, it remains many orders of magnitude
| stronger than MD5.
|
| So if you want to know how MD5 can be improved, the best way is
| to compare the MD5 and the SHA-1 algorithms, and look at what
| was changed between them.
|
| Another improved MD5 is the RIPEMD-160 algorithm, so comparing
| RIPEMD-160 with MD5 is another source for seeing how MD5 can be
| improved, by different methods.
| TedDoesntTalk wrote:
| > Nowadays it would be a complete waste of time to try to
| improve MD5
|
| Not as a thought experiment. Thought experiments help you to
| grow.
| adrian_b wrote:
| I agree about the usefulness of thought experiments, but
| there are many other much more useful thought experiments,
| even in the domain of hashes.
|
| A large number of people have thought for several years
| about how to improve MD5, and the results were SHA-1 and
| RIPEMD-160.
|
| It is very instructive to study the evolution from MD4 to
| MD5 and to SHA-1 and RIPEMD-160, but it is very unlikely
| that attempting 30 years later to do something better than
| those, without completely changing a hash algorithm
| structure that is now well understood as being
| inappropriate, can teach you anything.
| bombcar wrote:
| The useful thought experiments are often to realize WHY
| the "trick to fix MD5" you came up with doesn't work.
| bawolff wrote:
| People are giving you shit, but people thinking about this
| sort of thing has yielded useful and interesting things
| like https://marc-stevens.nl/research/papers/C13-S.pdf
|
| Of course in production, just use sha256, but there is
| nothing wrong with thinking about unorthodox solutions.
| AgentME wrote:
| >- Output the length of the input (or a hash of the length if
| you must have a constant-length output)
|
| I think all of the known MD5 collisions (and SHA-1 collisions
| even) are of inputs that are the same length.
| jrootabega wrote:
| Interesting. I submitted by original comment with the
| misconception that it second preimage attacks against MD5
| were feasible and were being used by the article. So I see
| how in the case where you want to defend against collisions
| (e.g. a malicious certificate signing request), length
| awareness might not help. I'm not sure if that would apply to
| a second preimage attack, though.
| bawolff wrote:
| For sha1, people made a system where you can detect the
| patterns that lead to a collision, and (for example) replace it
| with a different hash only for inputs that would be a problem.
| https://github.com/cr-marcstevens/sha1collisiondetection i
| think git does this to eek more life out of sha1.
|
| I imagine you could take a similar counter-cryptnalysis
| approach to md5. (I am out of my depth here, so there could be
| reasons this doesnt work for md5 im unaware of)
| jrootabega wrote:
| Very cool! I searched for it here, and found some comments
| that imply it's possible to detect these attacks in MD5, but
| nothing too detailed. Much more advanced than I was thinking,
| but still hacky enough that it satisfies me!
| dingosity wrote:
| Might be useful to look up the prior research. Search for
| Merkle-Damgard. And the IACR ePrint archive has quite a few
| interesting reads: https://eprint.iacr.org/
|
| I get it that you're just hacking around, but reading what the
| olds said about MD5 is useful.
| dingosity wrote:
| Oh. Also. What you're doing here is "finding a second pre-
| image" which is slightly different from finding a collision.
| A second pre-image is where you already have a message and a
| hash and you want to find a second message that hashes to the
| same thing. Technically speaking, a collision is just finding
| two messages that hash to the same thing, but you don't
| necessarily get to pick what the two messages are.
|
| I mention it not to be an academic snob, but because when you
| read the literature, there are plenty of examples where you
| sort of have to know the two things are distinct, at least to
| the academics.
| jrootabega wrote:
| Thanks for pointing that out. It's definitely important to
| understand the difference, and not just academically. It
| helps us understand that the consequences of MD5 being
| broken have only been proven to work for collision attacks,
| and not preimage attacks. (But I still don't think using
| MD5 when you can use more secure alternatives is a good
| idea.)
| masklinn wrote:
| ... why are you trying to "season MD5 to make it more secure"?
| Calls to switch off of MD5 started in 1996, and MD5 has been
| considered broken for some 15 years.
|
| There are non-compromised hashes, use that.
| chrismarlow9 wrote:
| Speed and resource limitations. But overall I agree with you.
| Retr0id wrote:
| > BLAKE2b is faster than MD5, SHA-1, SHA-2, and SHA-3, on
| 64-bit x86-64 and ARM architectures. BLAKE2 provides better
| security than SHA-2 and similar to that of SHA-3
|
| https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2
| yencabulator wrote:
| And BLAKE3 is a lot faster than that.
| waynesonfire wrote:
| so after reading your comment I was curious why SHA-3 won
| the NIST competition,
|
| https://csrc.nist.gov/csrc/media/projects/hash-
| functions/doc...
|
| "NIST chose KECCAK over the four other excellent
| finalists for its elegant design, large security margin,
| good general performance, excellent efficiency in
| hardware implementations, and for its flexibility"
| Retr0id wrote:
| The real reason is that SHA-3 is more different to SHA-2
| in terms of fundamental design. The SHA-3 contest was
| launched shortly after SHA-1 was broken, and was partly
| just a hedge against SHA-2 being broken too.
| xani_ wrote:
| Or NSA decided to backdoor SHA3
| chrismarlow9 wrote:
| Cool I was not aware of this. Sha2 is just my immediate
| go to and haven't re evaluated any other options in a
| long time.
| jrootabega wrote:
| Because it's interesting to think about. It's hacking - doing
| something with technology that was not intended, and
| sometimes ill-advised, to see what kind of interesting
| results we can get.
| groby_b wrote:
| Then go hack.
|
| The point is, it's not an interesting hack for the vast
| majority of the audience. Because they've already seen this
| play out over and over and over.
|
| It's like sticking your hand in a grinder and yelling "for
| science". The result might be new for you, and it's hacking
| by your definition, but I'm fairly certain the rest of the
| audience is OK not investing time into the experiment.
| jrootabega wrote:
| I have no intention of sticking my metaphorical hand, or
| the hand of anyone, into a metaphorical grinder.
|
| But I will point out that it was someone who thought
| about, and experimented with, sticking fleshy pieces into
| power tools, that invented the SawStop. :)
| masklinn wrote:
| You're just throwing random shit at the wall, it's not
| hacking, it's just coldfusion.
| [deleted]
| LawTalkingGuy wrote:
| There's a difference between bringing MD5 up to modern strength
| and simply blocking the attempts of attackers who've attained
| this MD5 forging kit. The later _is_ an interesting though
| experiment.
|
| The point of a hash is to avoid feeding malicious data further
| into your pipeline so solutions that involve parsing the file
| and hashing its actual data-streams aren't a good idea. We'll
| focus on detecting the change before looking at the data.
|
| To make the attack harder, use file formats without expandable
| or optional sections. This attack works by parsing known file
| formats and making allowable changes. If you had encryption as
| one of your coconuts it would make this easier by making the
| whole file opaque, but the encryption could also be used to
| generate a hash-like construction so if you have good enough
| encryption you wouldn't be stuck with MD5 and there wouldn't be
| much thought experiment left...
|
| To stop this specific attack, assuming you had to use these
| formats, using MD5(file+reversed_file) or
| MD5(file)+MD5(reversed_file) or MD5('secret'+file) would work.
| This is massively beyond what a most people could make the
| tools do but it's probably not that much cryptographically
| harder.
|
| If your solution was a secret it would be pretty effective. The
| problem is that if you don't have encryption then they're
| watching you communicate. They can see the hash sum you tell
| the recipient to expect, and if this is the first time you've
| used this scheme, the steps to use to check it. But even if
| they don't observe this though, you're only using a small set
| of non-crypto operations and they can just try millions of
| combos (reverse the file, append a second copy, interleave
| bytes, etc.) and see if anything produces the same hash. Then
| they have discovered your algorithm and they can plan to modify
| their tools to perform the attack.
| jrootabega wrote:
| Oh, I think secrecy would be against the spirit of the
| experiment. And yeah, I wonder what kind of mods they would
| need to make to their tools in response. If it takes 10 years
| to generate the collisions, then it seems like it might be a
| worthwhile approach (in the thought experiment universe
| only).
|
| But we should also recognize that the article demonstrates
| collision attacks, not preimage attacks, which would be the
| attack you want to worry about if you are using a trusted
| hash to verify a file you received over an untrusted channel.
| Retr0id wrote:
| The simplest way to improve MD5 would be to add more rounds
| (but as others have mentioned, there are much better choices
| for any practical purpose)
|
| RE your last point, I'm not quite sure what attack you're
| defending against there, but most file formats do not have a
| well-defined "canonical interpretation", much less so one that
| is serializable into bytes. (If you think it sounds easy, you
| haven't thought about it hard enough :P )
| jrootabega wrote:
| Well, for images I still think that there is such a thing,
| even if it's the least interesting example: the bitmap they
| ultimately render. (For formats that are allowed to produce
| different bitmaps, it gets more difficult and maybe
| impossible.)
|
| For the general case of any file format: I agree this
| approach is the least simple/dumb/trivial and might even
| violate the spirit of my original comment itself. But it's
| still interesting. By "[canonical] interpretation", I just
| meant some way to fingerprint the content while understanding
| the format. e.g., if it's a tarball, sum the total number of
| files and directories inside it. Concatenate all their names
| in a well-defined order and hash that. I know you can't
| prevent collisions entirely, but it may be relatively cheap
| to make it so that 2 files that are different (with respect
| to the file format) are likely to be represented differently.
| Retr0id wrote:
| If for a moment we assume that you _can_ do it reliably
| (which I personally doubt, even for "simple" formats) -
| what's the point? Why not just hash the original file?
| What's the benefit here?
|
| For example, I could create two different PNGs that decode
| to the same bitmap. Or I could create one PNG that decodes
| to multiple _different_ bitmaps, depending on which
| implementation decodes it (due to implementation bugs and
| /or under-defined areas of the specification). Or I could
| create a PNG that is also a valid ZIP archive.
| jrootabega wrote:
| There's no security benefit, and I would have a hard time
| coming up with a practicality benefit. It's mostly just
| interesting to think about, especially in response to the
| article. The article is demonstrating fast MD5 second
| preimage attacks for various file formats (EDIT:
| apparently not preimage attacks, just collision attacks),
| so in response to that I'm wondering how these
| MD5-specific attacks might be mitigated, for fun.
| Consider it alternate history fiction in which we never
| discovered anything better than MD5 :)
|
| In your examples, though, :
|
| > two different PNGs that decode to the same bitmap
|
| But would the the PNGs also have the same MD5 hash?
|
| > one PNG that decodes to multiple different bitmaps,
| depending on which implementation decodes it
|
| Yeah, that would be a challenge. Relying on
| implementation details, or results which are allowed to
| vary, wouldn't work. But since this is meant to
| supplement an existing MD5 hash, the idea is that the
| format consumer/interpreter would be in a good position
| to produce some format-aware fingerprint that is
| statistically likely enough to be different when the
| inputs are different.
| Retr0id wrote:
| > But would the the PNGs also have the same MD5 hash?
|
| Yes, I could construct this trivially, because MD5 is
| broken.
| jrootabega wrote:
| Ah, OK, I think I misunderstood the article. If you are
| supplying both images to me, you could do that with the
| MD5 hashes. Although, I think if you could get them to
| generate the same bitmap, then the attack has been at
| least partially mitigated, by definition. Not completely,
| I admit, but I think it wouldn't qualify as the same
| attack shown in the article.
___________________________________________________________________
(page generated 2022-09-20 23:01 UTC)