[HN Gopher] Hash collisions and exploitations - Instant MD5 coll...
       ___________________________________________________________________
        
       Hash collisions and exploitations - Instant MD5 collision
        
       Author : losfair
       Score  : 121 points
       Date   : 2022-09-20 11:07 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | retrocryptid wrote:
       | And yet MD5 is still recommended in Applied Cryptography.
        
         | traceroute66 wrote:
         | > And yet MD5 is still recommended in Applied Cryptography.
         | 
         | I'm guessing this is the usual HN dig at Schneier's book ?
         | 
         | Fact is that Schneier was already warning against MD5 as far
         | back as 1996. And if you look on more recent blog posts, he is
         | not exactly silent on dissuading people against MD5.
         | 
         | Fact is that like it or not, even today, MD5 still sees usage
         | in the crypto landscape (see AWS S3 Content-MD5, for example).
         | Therefore being able to understand how it works is still
         | relevant, and in that respect Schneier's book is as good as any
         | other.
        
           | retrocryptid wrote:
           | It's more of a dig on people who claim MD5 is okay because
           | it's in Schneier's book. Not so much a dig on Schneier
           | himself.
           | 
           | And yes. I understand MD5 etag headers are common. I even
           | understand that MD5 still passes the avalanche test --
           | Robshaw's observations on MD5 in 1996 are just as valid
           | today.
           | 
           | But according to Sasaki & Aoki, //both// collisions and
           | second pre-image calculation are faster than brute force.
           | Maybe MD5 //shouldn't// be used if all you need is avalanche.
           | 
           | It turns out that SHA256 has both collision and second pre-
           | image calculation resistance //as well as// avalanche effect.
           | Maybe use that one instead.
        
             | fluoridation wrote:
             | I don't remember the last time I saw anyone use MD5 for
             | cryptography. It's still used a lot for data integrity,
             | because it still works perfectly well for that and it's
             | faster than the SHA family (when there's no hardware
             | implementation).
        
               | retrocryptid wrote:
               | Though be careful about the use of the phrase "data
               | integrity" -- comparing two files on a file-system by
               | their MD5 hash is probably fine, but comparing the
               | serialization of two PDUs over the wire based on their
               | MD5 hash may be problematic.
               | 
               | I think I understand your meaning to be MD5 certainly
               | still exhibits an avalanche effect; changing a single bit
               | in the input changes about half the bits in the output.
               | And if you trust the way you retrieve the hashed data and
               | the hash (like it's on a local hard drive) then yes, it's
               | certainly acceptable for that use. But collisions and
               | second pre-image generation being faster than brute force
               | are why people generally don't want to use it (MD5) when
               | it's use spans trust domains.
               | 
               | My point is:
               | 
               | a. Don't use MD5 just because Bruce Schneier published a
               | popular book that said it was okay RIGHT BEFORE all the
               | research damning it came out. (Personally... I think
               | Bruce should publish a third edition of this text
               | expressly to remove the bit about MD5 being okay. I
               | cannot tell you how many hours of billable time I've
               | wasted explaining to software engineers that no... MD5 is
               | not recommended for use even though at one time it was
               | considered acceptable. And if you know not to use MD5,
               | you're not the software engineer I'm talking about.)
               | 
               | b. You can use SHA256 and get avalanche, collision
               | resistance AND second pre-image generation resistance.
               | (Pretty sure you also get 1st pre-image generation
               | resistance, but I haven't scanned the literature for that
               | in a while.)
               | 
               | And while I'm thinking about it, let me add these points:
               | 
               | c. There are probably better hashing algorithms than MD5
               | for use with a hash map/table/tree.
               | 
               | d. If you're interested in how MD5 works, I recommend
               | expanding the scope a bit and study Merkle-Damgard
               | generally. Why MD5 has problems and other hash functions
               | that make use of the Merkle-Damgard construction don't
               | (or have different problems, or the same problems at
               | different amounts of input) is pretty interesting.
               | 
               | And yes, if you happen to have a MD5 hardware accelerator
               | or petabytes of data and MD5 hashes already, it's hard to
               | change that overnight.
        
               | fluoridation wrote:
               | >comparing the serialization of two PDUs over the wire
               | based on their MD5
               | 
               | I'm not sure what you mean by this, but this:
               | 
               | >people generally don't want to use it (MD5) when it's
               | use spans trust domains.
               | 
               | is exactly what I mean by "cryptography". I.e. guarding
               | against intentional tampering. Are there a lot of people
               | using it for this purpose? I don't remember seeing one.
               | 
               | >c. There are probably better hashing algorithms than MD5
               | for use with a hash map/table/tree.
               | 
               | For sure. Cryptographic functions (even obsolete ones)
               | are almost always overkill and too slow for general data
               | structures. Only use them if you can't find something
               | more suitable for your data.
               | 
               | >And yes, if you happen to have a MD5 hardware
               | accelerator or petabytes of data and MD5 hashes already,
               | it's hard to change that overnight.
               | 
               | I was actually talking about SHA-256 acceleration, since
               | I just saw like an hour ago that recent Intel CPUs have
               | it. If your CPU has such instructions, by all means use
               | it instead of software MD5, if all you need is data
               | integrity.
        
               | jabl wrote:
               | Based on a quick test here on a 417MB .tar.gz file I had
               | lying around, best of 5 times for various coreutils
               | (8.32) implementations:
               | 
               | - md5sum: 0.570s
               | 
               | - sha1sum: 0.667s
               | 
               | - sha256sum: 1.662s
               | 
               | - sha512sum: 1.011s
               | 
               | - b2sum: 0.486s
               | 
               | - cksum: 0.982s
               | 
               | In conclusion: b2sum is both the fastest and AFAIU
               | considered secure.
        
               | retrocryptid wrote:
               | Moving from MD5 to SHA256 means we just about triple the
               | amount of time it takes to generate a hash.
               | 
               | But I suggest there are few applications where this speed
               | improvement justifies the confusion I've seen in junior
               | engineers who believe MD5 is okay because a. they use
               | MD5SUM and b. Bruce Schneier said it was okay.
               | 
               | Don't get me wrong. I trust that //YOU// will know not
               | depend on a MD5 hash for anything where a bad guy can
               | modify content over the wire. But... I'm going to go out
               | on a limb and guess you're somewhat experienced. I worry
               | about the kids who without the benefit of experience re-
               | enact scenes from the cryptography edition of Lord of the
               | Flies.
               | 
               | But... if you know what you're doing... sure... use
               | MD5... There's certainly no way your code will ever be
               | used by a less experienced engineer, right?
        
               | fluoridation wrote:
               | Why are inexperienced engineers being allowed to mess
               | with cryptographic primitives?
        
               | jabl wrote:
               | I tend to prefer b2sum (BLAKE2), as it's both fast and
               | (at the time of writing this, AFAIK) secure. There are
               | certainly situations where md5 is good enough, but yeah,
               | I feel safer just using blake2 and not having to spend
               | brain cycles thinking whether the hash needs to be
               | cryptographically secure or not, and risk making the
               | wrong judgement.
        
               | alpaca128 wrote:
               | Tried with a 3.5GB Ubuntu .iso file, results are similar
               | though I also tried b3sum and that was even faster(just
               | 1.8s vs. 19s for sha256sum & 6.7s for md5sum). So
               | performance is definitely not an argument for md5.
        
           | dylan604 wrote:
           | Like everything else in the world, knowing your audience,
           | intended use, etc plays a key role. If you're protecting data
           | against state actors, then yes, MD5 is not good enough. If
           | you're just trying to decide if a transfer of a file went
           | across a network without munging the data, then MD5 is good
           | enough. If you're trying to post that file onto something
           | that is publicly facing and want to provide the person
           | downloading the file that it has not been modified, MD5 is
           | probably not the best method.
        
             | dingosity wrote:
             | The point the OP made is it's easy to construct a second
             | preimage with MD5. So if a bad guy, even a bad guy with
             | meager resources, modifies the hash or the data in transit,
             | you won't be able to identify the change the MITM made in
             | either the data or the hash while it was transmitted.
             | 
             | What you're probably thinking is a digital signature, which
             | combines a hash and an asymmetric cipher (or a keyed hash
             | if you keep that key secret.)
             | 
             | But even then a bad actor can construct a second pre-image
             | with MD5 which would cause the digital signature validate
             | as authentic even though it's been changed.
             | 
             | And an adversary with meager computational resources could
             | perform this hack. This is the OP's point. You don't have
             | to be a state actor to do this.
        
               | dylan604 wrote:
               | Essentially, I was agreeing that MD5 is worthless for
               | anything "secure" from 3rd party manipulation. However,
               | it is fine for verification of non-adversarial purposes.
               | It's much faster to run MD5 than SHA256 over a 60-120GB
               | MOV file.
               | 
               | Copying camera original footage from the card that is
               | still warm from the camera and making copies to client
               | deliverable storage before telling camera dept that it is
               | okay to wipe the card and reuse is a thing I have done a
               | lot. Speed is important. Security is not a concern at
               | all.
               | 
               | Copying a large MOV file from one storage pool to
               | another, retrieving large media assets from tape archive,
               | etc are also common situations where you just want to
               | know that the copy is actually what you think it is from
               | I/O errors during whatever transfer process. In these
               | situations, I have zero concern that someone wearing a
               | blackhat maliciously manipulated the data during the
               | transfer.
        
               | dingosity wrote:
               | The cost of the calculation of the hash is dwarfed by the
               | cost of copying to/from an SD card (or even nvme
               | devices.) If you hash the file as you transfer it, you'll
               | not notice the difference between MD5 and SHA256.
        
           | stevewatson301 wrote:
           | Further, SHA-256 operates on the same principle as well
           | (repeated iterations of a one-way compression function,
           | called a Merkle-Damgard iterative construction[1]). Learning
           | how MD5 works is quite instructive in that regard.
           | 
           | [1] https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd
           | _co...
        
       | EGreg wrote:
       | I asked a while ago, whether it's feasible to get another file to
       | generate a given hash.
       | 
       | The answer is no. Not even with MD5.
       | 
       | Just be very sure that this is the guarantee you are looking for.
       | Often, for Merkle Trees etc. that is EXACTLY what is needed.
       | 
       | Can someone craft input files (eg images) to fool your system?
       | Yes, but only at their own expense.
       | 
       | Sometimes if you want the system to be resilient even in the fact
       | of malicious inputs then yes, you should use SHA256 and higher.
        
         | sacrosancty wrote:
         | They might not realize they're hurting themselves, which makes
         | it a bug. Backblaze did this with one of these hash collisions.
         | You could back up two different (colliding) files and when you
         | tried to restore both, only two copies of the same file would
         | be returned, the other one apparently lost forever.
         | 
         | Why would you have colliding files? Perhaps because you're
         | somebody working on hash collisions and have collected some
         | examples.
        
       | a-dub wrote:
       | in the era of high fidelity generative models, i suspect that the
       | future of media formats will be security forward with built-in
       | protections against length extension attacks.
       | 
       | i'm having a hard time imagining any other future than one where
       | people only trust signed media, and media is possibly even signed
       | in hardware by actual physical sensors/compressors.
        
       | abetusk wrote:
       | From the README:
       | 
       | """
       | 
       | Colliding any pair of files has been possible for many years, but
       | it takes several hours each time, with no shortcut. This page
       | provide tricks specific to file formats and precomputed collision
       | prefixes to make collision instant. git clone. Run Script. Done.
       | 
       | """
       | 
       | Could anyone weigh in on whether these ideas can be generalized
       | to speed up MD5 collisions in general?
        
         | Retr0id wrote:
         | What do you mean by this? You can't precompute all possible
         | colliding blocks.
        
           | dingosity wrote:
           | They mean "compute the first pre-image."
        
       | omk wrote:
       | Have always thought of creating email addresses with colliding
       | MD5 hashes only to see how Gravatar's MD5 URL reacts to it.
        
       | omoikane wrote:
       | See also: "Lifetimes of cryptographic hash functions" -
       | https://valerieaurora.org/hash.html
       | 
       | MD5 appears to be firmly in the "fun party trick" stage.
        
         | waynesonfire wrote:
         | "In 2007, the NIST launched the SHA-3 competition" and the
         | following year, 2008 SHA-2 is labeled "Minor weakness".
         | 
         | Well, great timing on that competition!
        
         | magila wrote:
         | That page hasn't aged all that well. The prediction that
         | applications would need to switch to a new hash function "every
         | few years" hasn't panned out. The feared improved attacks on
         | SHA-2 have failed to materialize. Applications that chose SHA-2
         | 20 years ago are still quite secure today.
         | 
         | It seems we just weren't very good at designing hash functions
         | in the 90s.
        
           | ComplexSystems wrote:
           | Good news for Bitcoin, as long as SHA-256 remains secure
           | forever.
        
       | mikessoft_gmail wrote:
        
       | londons_explore wrote:
       | Question for people into cryptography + data archiving....
       | 
       | If I want to store data for 500 years, I want future people to be
       | reasonably sure of the integrity of the data, both against 'bit
       | rot', but also deliberate tampering.
       | 
       | Is the best available approach to hash the data with a bunch of
       | hash algorithms and publish all the hashes?
       | 
       | Then if _any_ hash algorithm remains unbroken, the integrity of
       | my data is certainly still good. An attacker would have to do a
       | simultaneous preimage attack for _every_ hash algorithm I choose
       | to break the scheme, which historically has never happened to my
       | knowledge.
        
         | squeaky-clean wrote:
         | What if an attacker creates altered data and publishes it the
         | same way by hashing it 50 times. Now you have two documents
         | claiming to be the original and they each have 50 hashes
         | guaranteeing the integrity of the data. What if the attacker
         | did that with 1,000 fake documents.
         | 
         | I think your best bet is just to publish it and also launch a
         | copy into space with a 500 year orbit. If someone in the future
         | tries to launch a similar data-comet with a shorter orbit, it
         | will be visible in the trajectory. There's always the danger of
         | them sending out a robotic space probe to mess with your data
         | "in flight".
        
         | [deleted]
        
         | rainworld wrote:
         | Symmetric cryptography has come a long way since MD5. Any one
         | of SHA-256, SHA-3/SHAKE256, BLAKE2/3 would be fine.
        
           | gnulinux wrote:
           | For the next 500 years though? I suppose (as a crypto layman)
           | it'll be fine, but given we don't know what will be broken
           | within the next 500 years, isn't it natural to add redundancy
           | and use multiple hash algorithms?
        
             | dylan604 wrote:
             | I'd be more concerned for what equipment would be used to
             | store/read that data in 50 years let alone 500 to make this
             | even a viable thing to do.
        
               | rainworld wrote:
               | Indeed. Hard to quantify the risk of one of these
               | functions getting broken but it's very low. There's a
               | million far more likely ways to lose that data.
        
         | dheera wrote:
         | > both against 'bit rot'
         | 
         | MD5 is good enough if bit rot alone is the only thing you care
         | about. It's still very hard to generate an MD5 collision with
         | the same bit length, and doing so would generally require
         | changing a VERY large number of bits to get a collision, which
         | is not what happens with bit rot.
         | 
         | It might even be mathematically impossible to find two files
         | that have the same MD5 hash and differ by less than a certain
         | amount of bytes, though I don't know the proper way to
         | formalize this.
         | 
         | (It's trivial to show for example that if you use any CRC as a
         | hash, it is impossible to find a collision that differs by an
         | edit distance of 1 and has the same length.)
         | 
         | > but also deliberate tampering.
         | 
         | You're never safe from this if you aren't the guardian of the
         | official hash values. Someone could just change the file AND
         | change the "official" hashes.
        
         | traceroute66 wrote:
         | With the caveat that I'm no data archiving guru...
         | 
         | As a source of inspiration a good place to start might be B-LTA
         | in the ETSI standards. I believe this takes into account the
         | fact that the algorithm used for its creation might deprecated.
         | 
         | IIRC the way it works is (basically) that signatures are given
         | a finite validity period and you undertake periodic re-signing.
         | Therefore you build up a audit trail history of signatures with
         | current algos as you go along.
        
         | vivegi wrote:
         | Erasure coding combined with hashing. Since erasure coding uses
         | data redundancy to help with recovery, bit rot with one
         | stream/disk can be automatically detected and recovered
         | (subject to supported levels of error correction).
         | 
         | Hashing would be on top of it to ensure detection of
         | tampering/errors. Using multiple hashes is also a good
         | strategy.
        
         | bawolff wrote:
         | If its against (literal) bit rot, i would suggest things like
         | ECC, so they could actually recover the damaged data.
         | 
         | > Is the best available approach to hash the data with a bunch
         | of hash algorithms and publish all the hashes?
         | 
         | This is much less secure than you think it is. See
         | https://www.iacr.org/archive/crypto2004/31520306/multicollis...
        
         | Ayesh wrote:
         | If you intend the data to be recoverable, consider adding
         | recovery options to it. Error Correcting Codes (ECC), (with
         | Reed-Solomon probably being the most common), can help verify
         | and even recover data; helpful against bit-rot.
         | 
         | WinRar 5 can do it, and I believe many backup software worth
         | their salt can too.
        
         | fluoridation wrote:
         | It's impossible.
         | 
         | You can send a message to someone else over an unreliable
         | channel with a very good assurance that it hasn't been
         | corrupted unintentionally, without needing a reliable channel.
         | 
         | You can send a message to yourself in the future over an
         | untrusted channel with a very good assurance that it hasn't
         | been tampered with without needing a trusted channel, because
         | you trust yourself and can thus remember trustable information.
         | 
         | There's no way to send a message to someone else over an
         | untrusted channel with a very good assurance that it hasn't
         | been tampered with, without a trusted channel. You need some
         | way to convey trustable information.
        
           | wongarsu wrote:
           | You can somewhat improve your odds by using many unreliable
           | channels, requiring an attacker to compromise all of them.
           | 
           | Not sure how to do that over 500 years though? Pass down the
           | data as an heirloom, burry a SHA3 hash in a plaque below a
           | building foundation, put a Blake3 hash on a granite tablet
           | buried in the desert, write a SHA512 hash on the back of a
           | Banksy painting. Put a BLAKE2b hash in your diary and get it
           | accepted in a museum collection. The more you add, the more
           | difficult it becomes for an attacker to find and alter all of
           | them.
        
             | fluoridation wrote:
             | If you're hiding the hash where no one can find it, how are
             | people going to check it? If you tell people where you're
             | hiding it, what's the point of hiding it?
        
               | wongarsu wrote:
               | Hiding it can make sense as long as you are sure it's
               | eventually found by chance, and the finder will recognize
               | its significance. At some point the building foundation
               | will be torn up, and precisely at that point the hash
               | will be revealed. Any modification before that requires
               | knowledge of it and a somewhat sophisticated plan to
               | modify or destroy it.
               | 
               | But really, you don't have to make those locations
               | secret, you already get a lot of security by requiring an
               | attacker to drive out into a desert to change that
               | tablet, access a building foundation and tamper a Banksy
               | painting. None of these are secure in the cryptographic
               | sense, but even without secrecy that raises the bar for
               | an attack substantially, and makes it more likely to be
               | detected.
        
               | fluoridation wrote:
               | >Hiding it can make sense as long as you are sure it's
               | eventually found by chance, and the finder will recognize
               | its significance. At some point the building foundation
               | will be torn up, and precisely at that point the hash
               | will be revealed.
               | 
               | The hash could be revealed _after_ the tampered message
               | was found, and therefore _after_ it would have been
               | necessary. Imagine some critical decision was taken based
               | on incorrect information, for example.
               | 
               | >you already get a lot of security by requiring an
               | attacker to drive out into a desert to change that tablet
               | 
               | If you think remoteness is sufficient protection then
               | there's no need for hashes or anything like that. Just do
               | what 9gag did and etch your message onto stone and bury
               | it wherever. They found the approximate location of that
               | slab fairly quickly, but who's going to go digging in
               | Spain just to smash a limestone slab with some memes on
               | it? If someone is willing to do that once, doing it one
               | or more times again is not a lot more effort.
        
           | zaphod420 wrote:
           | "There's no way to send a message to someone else over an
           | untrusted channel with a very good assurance that it hasn't
           | been tampered with, without a trusted channel. You need some
           | way to convey trustable information. "
           | 
           | This is kind of what real p2p, open, permissionless,
           | decentralized blockchains are good for...
        
             | fluoridation wrote:
             | Blockchains are not trusted channels. They don't permit
             | secure communication between any node. That is to say, I
             | can't send you a secret tamper-proof message over a
             | blockchain if we haven't exchanged information over some
             | other channel previously.
        
           | Perseids wrote:
           | Let's get creative! There is no need to secrecy of the hash.
           | Pay celebrities to wear a dress with the hash printed on it.
           | Influence a publisher to print the hash in all books for a
           | year. Create a mystery about what the hash is about, so it is
           | mentioned in lots of articles. Who is this eccentric
           | billionaire that pays truckloads of money to spread these
           | apparently useless 64 digits?!?
           | 
           | (Disclaimer: Method not applicable to normal people.)
        
             | fluoridation wrote:
             | Yeah, making many copies makes sense. But if you're doing
             | that, it's way easier to just make many copies of the
             | original message. In other words, just publish it as a book
             | and be done with it.
             | 
             | Broadcasting the hash would make sense if the message
             | should stay secret for some time. Although 500 years is
             | rather a long time for a zero-knowledge proof.
        
         | spiffytech wrote:
         | For the bit rot part, this is what PAR2 is for. It generates
         | recovery files, designed to repair the original file when
         | arbitrary portions of it become malformed or lost.
        
         | yifanl wrote:
         | It's largely unknown if there's any emergent behaviour given
         | H1(H2(m)), for any two hash functions H1, H2.
         | 
         | It's fully possible that your Uberhash function has some
         | vulnerability that can be easily exploited regardless of the
         | underlying security of the individual hash functions.
        
         | egeozcan wrote:
         | Simultaneously finding collisions in multiple hash algorithms
         | with 2 random inputs would be a hard task, and if one of those
         | inputs is predetermined (your data), and the other needs to
         | include a change that the attacker wants to see there? that
         | really sounds impossible. But on the other hand, we are talking
         | 500 years...
        
           | Perseids wrote:
           | The scenario is not finding collisions though. I assume
           | londons_explore somehow finds a way to build a trusted
           | relationship and channel with the future [1], say a piece of
           | leather with the hashes burned into it that is ... stored
           | publicly in Louvre with a ton of photo evidence spread all
           | over the earth hence. As londons_explore is trusted, you only
           | need Second Pre-image Resistance: Given a file (that was
           | created outside of the attackers control) find a second file
           | which has the same hash as the first file. Second pre-image
           | attacks are super hard to accomplish. MD5 is still Second
           | Preimage Resistant!
           | 
           | Adding a second random input to the hash function, like you
           | propose (that is to prime the internal state of the hash
           | function with same random input, so when hashing of the usage
           | data starts, the internal state of the hash function is
           | unknown to the attacker) makes collision attacks much harder.
           | In fact there is a term for that: target collision
           | resistance. But second pre-image attacks, where the random
           | input used for the hashing are known, don't get any harder.
           | And if you don't transmit the random input over the secure
           | channel as well, than second pre-image attacks get easier
           | even (theoretically), as attackers have the possibility to
           | manipulate the internal state outside of the usage data.
           | 
           | Btw. reliance of target collision resistance, instead of
           | collision resistance, is why Ed25519/Ed448 are much more
           | resilient to problems with the hash function than ECDSA or
           | any common RSA signature scheme. MD5 is still target
           | collision resistant last time I checked. Remember the debacle
           | of forged MD5 and later SHA1 certificates [2] ? Completely
           | avoidable, if better signature schemes had been used.
           | 
           | [1] Otherwise https://news.ycombinator.com/item?id=32912052
           | applies. [2] https://www.win.tue.nl/hashclash/rogue-ca/
        
         | nwellnhof wrote:
         | How do you want to make sure that the hashes themselves haven't
         | been tampered with?
        
           | dingosity wrote:
           | you don't know if it's the hash or the data that's been
           | tampered with. so if the stored or transmitted hash doesn't
           | match the computed hash, you reject it.
        
           | jrootabega wrote:
           | Tampering? That's harder. But to protect against corruption,
           | since the hashes are shorter, you would be able to chisel
           | them in different types of stone and glass and make 3+ copies
           | of them. I suppose if you really want to protect against
           | tampering, you could distribute so many copies that the
           | ability to tamper with a majority of them would be very
           | unlikely.
           | 
           | But I think what was meant by "publishing" was that the
           | hashes would be available to parties who would be willing and
           | able to preserve them indefinitely.
        
           | alex_sf wrote:
           | Hash them, duh.
        
       | dekhn wrote:
       | Side note: I recently saw an example code using tensorflow to
       | determine the private key of some cryptosystem. I can't find it.
       | It was literally operating on the bits of the key and somehow had
       | a loss function. Any ideas?
        
         | bawolff wrote:
         | I mean, if it worked it was probably for a really shitty
         | cryptosystem, and hence less interesting.
        
       | 1970-01-01 wrote:
       | The most interesting part was collision with certificates.
       | Platinum ticket for malware.
        
       | Dwedit wrote:
       | For anyone out there still using MD5 for any reason, check out
       | this PDF file:
       | https://www.alchemistowl.org/pocorgtfo/pocorgtfo14.pdf (42MB).
       | You can also rename it to a .NES file and run it in a NES
       | emulator.
       | 
       | It's a PDF File which is also a NES ROM that displays its own MD5
       | sum. The PDF also shows its own MD5 sum a few times. (The MD5 sum
       | also happens to begin with 5EAF00D)
       | 
       | When an arbitrary MD5 can be created that easily, it's useless
       | for any cryptographic applications, or even any data integrity.
        
         | pxx wrote:
         | MD5 may not be collision resistant, but the logic in your
         | conclusion is completely wrong. There is no feasible pre-image
         | attack on MD5. The MD5 generated _was not arbitrary_; they seem
         | to have just brute-forced a few of the leading bytes and then
         | did a Nostradamus attack.
         | 
         | Again: finding something that hashes to an arbitrary MD5 sum is
         | still not known to be feasible. This isn't a particularly good
         | reason to use MD5, but this also means that MD5 is not broken
         | in the way you think it is.
         | 
         | When somebody finds something that hashes to all zeroes you can
         | finally say that MD5 is completely broken. It is not known to
         | be at that point yet.
        
       | jrootabega wrote:
       | Disclaimer: This is a fun thought experiment. I'm not looking for
       | actionable results, or advocating for relying on any of this
       | comment for actual security. I'm clearly not a cryptographer; I
       | just think it would be interesting to talk about here, and maybe
       | more educated people could comment on how well these approaches
       | might mitigate the exploits in the article. Play with me in this
       | space.
       | 
       | I'm curious if people have any interesting ideas on how to add
       | some seasoning to MD5 to make it more secure. That is, simple,
       | intuitive things you can do in combination with MD5 such that all
       | the pieces in your scheme are still easily understood and don't
       | amount to a new hash algorithm that can only be understood as a
       | black box. Pretend MD5 is the only hash algorithm that has ever
       | been found. Or that you're the Gilligan's Island Professor and
       | MD5 hashes are your coconuts. What are the most potentially
       | useful things you can build out of the most primitive, dumb
       | components?
       | 
       | For example:
       | 
       | - Output the length of the input (or a hash of the length if you
       | must have a constant-length output)
       | 
       | - Hash the input forwards and backwards and produce two hashes.
       | (Remembering that, though the output is 256 bits now, you still
       | only have coconuts to work with.)
       | 
       | - Include more complicated variations on the input in the hashes.
       | e.g. start in the middle and oscillate forward and backward over
       | the input, or move the second half of the input in front of the
       | first before hashing, or use the input/hash of the input to seed
       | a pseudorandom re-ordering of the input before hashing, etc.
       | 
       | - Format-aware hashing - whatever program will interpret the
       | content of the file can also produce a hash, or some [canonical]
       | interpretation of the content that can be hashed. e.g., for an
       | image format, we could ask the renderer how many iterations of
       | some operation it had to perform to render the output, or in the
       | worst case, hash the bitmap it produced.
        
         | adrian_b wrote:
         | Nowadays it would be a complete waste of time to try to improve
         | MD5, because there is no reason to use it for anything.
         | 
         | Most modern CPUs from Intel, AMD or ARM implement in hardware
         | at least SHA-1 and SHA-256, so these both are faster than MD5
         | implemented in software.
         | 
         | If you only need 128 bits, you can truncate one of the longer
         | hashes to 128 bits.
         | 
         | SHA-1 is the fastest in the current CPUs, so even if it is also
         | obsolete for applications where there is an adversary who
         | attempts to find collisions, it remains a good choice for
         | various non-cryptographic applications that need a long hash or
         | random numbers.
         | 
         | SHA-1 itself was an attempt to improve on MD5. Even if it was
         | not entirely successful, it remains many orders of magnitude
         | stronger than MD5.
         | 
         | So if you want to know how MD5 can be improved, the best way is
         | to compare the MD5 and the SHA-1 algorithms, and look at what
         | was changed between them.
         | 
         | Another improved MD5 is the RIPEMD-160 algorithm, so comparing
         | RIPEMD-160 with MD5 is another source for seeing how MD5 can be
         | improved, by different methods.
        
           | TedDoesntTalk wrote:
           | > Nowadays it would be a complete waste of time to try to
           | improve MD5
           | 
           | Not as a thought experiment. Thought experiments help you to
           | grow.
        
             | adrian_b wrote:
             | I agree about the usefulness of thought experiments, but
             | there are many other much more useful thought experiments,
             | even in the domain of hashes.
             | 
             | A large number of people have thought for several years
             | about how to improve MD5, and the results were SHA-1 and
             | RIPEMD-160.
             | 
             | It is very instructive to study the evolution from MD4 to
             | MD5 and to SHA-1 and RIPEMD-160, but it is very unlikely
             | that attempting 30 years later to do something better than
             | those, without completely changing a hash algorithm
             | structure that is now well understood as being
             | inappropriate, can teach you anything.
        
               | bombcar wrote:
               | The useful thought experiments are often to realize WHY
               | the "trick to fix MD5" you came up with doesn't work.
        
             | bawolff wrote:
             | People are giving you shit, but people thinking about this
             | sort of thing has yielded useful and interesting things
             | like https://marc-stevens.nl/research/papers/C13-S.pdf
             | 
             | Of course in production, just use sha256, but there is
             | nothing wrong with thinking about unorthodox solutions.
        
         | AgentME wrote:
         | >- Output the length of the input (or a hash of the length if
         | you must have a constant-length output)
         | 
         | I think all of the known MD5 collisions (and SHA-1 collisions
         | even) are of inputs that are the same length.
        
           | jrootabega wrote:
           | Interesting. I submitted by original comment with the
           | misconception that it second preimage attacks against MD5
           | were feasible and were being used by the article. So I see
           | how in the case where you want to defend against collisions
           | (e.g. a malicious certificate signing request), length
           | awareness might not help. I'm not sure if that would apply to
           | a second preimage attack, though.
        
         | bawolff wrote:
         | For sha1, people made a system where you can detect the
         | patterns that lead to a collision, and (for example) replace it
         | with a different hash only for inputs that would be a problem.
         | https://github.com/cr-marcstevens/sha1collisiondetection i
         | think git does this to eek more life out of sha1.
         | 
         | I imagine you could take a similar counter-cryptnalysis
         | approach to md5. (I am out of my depth here, so there could be
         | reasons this doesnt work for md5 im unaware of)
        
           | jrootabega wrote:
           | Very cool! I searched for it here, and found some comments
           | that imply it's possible to detect these attacks in MD5, but
           | nothing too detailed. Much more advanced than I was thinking,
           | but still hacky enough that it satisfies me!
        
         | dingosity wrote:
         | Might be useful to look up the prior research. Search for
         | Merkle-Damgard. And the IACR ePrint archive has quite a few
         | interesting reads: https://eprint.iacr.org/
         | 
         | I get it that you're just hacking around, but reading what the
         | olds said about MD5 is useful.
        
           | dingosity wrote:
           | Oh. Also. What you're doing here is "finding a second pre-
           | image" which is slightly different from finding a collision.
           | A second pre-image is where you already have a message and a
           | hash and you want to find a second message that hashes to the
           | same thing. Technically speaking, a collision is just finding
           | two messages that hash to the same thing, but you don't
           | necessarily get to pick what the two messages are.
           | 
           | I mention it not to be an academic snob, but because when you
           | read the literature, there are plenty of examples where you
           | sort of have to know the two things are distinct, at least to
           | the academics.
        
             | jrootabega wrote:
             | Thanks for pointing that out. It's definitely important to
             | understand the difference, and not just academically. It
             | helps us understand that the consequences of MD5 being
             | broken have only been proven to work for collision attacks,
             | and not preimage attacks. (But I still don't think using
             | MD5 when you can use more secure alternatives is a good
             | idea.)
        
         | masklinn wrote:
         | ... why are you trying to "season MD5 to make it more secure"?
         | Calls to switch off of MD5 started in 1996, and MD5 has been
         | considered broken for some 15 years.
         | 
         | There are non-compromised hashes, use that.
        
           | chrismarlow9 wrote:
           | Speed and resource limitations. But overall I agree with you.
        
             | Retr0id wrote:
             | > BLAKE2b is faster than MD5, SHA-1, SHA-2, and SHA-3, on
             | 64-bit x86-64 and ARM architectures. BLAKE2 provides better
             | security than SHA-2 and similar to that of SHA-3
             | 
             | https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2
        
               | yencabulator wrote:
               | And BLAKE3 is a lot faster than that.
        
               | waynesonfire wrote:
               | so after reading your comment I was curious why SHA-3 won
               | the NIST competition,
               | 
               | https://csrc.nist.gov/csrc/media/projects/hash-
               | functions/doc...
               | 
               | "NIST chose KECCAK over the four other excellent
               | finalists for its elegant design, large security margin,
               | good general performance, excellent efficiency in
               | hardware implementations, and for its flexibility"
        
               | Retr0id wrote:
               | The real reason is that SHA-3 is more different to SHA-2
               | in terms of fundamental design. The SHA-3 contest was
               | launched shortly after SHA-1 was broken, and was partly
               | just a hedge against SHA-2 being broken too.
        
               | xani_ wrote:
               | Or NSA decided to backdoor SHA3
        
               | chrismarlow9 wrote:
               | Cool I was not aware of this. Sha2 is just my immediate
               | go to and haven't re evaluated any other options in a
               | long time.
        
           | jrootabega wrote:
           | Because it's interesting to think about. It's hacking - doing
           | something with technology that was not intended, and
           | sometimes ill-advised, to see what kind of interesting
           | results we can get.
        
             | groby_b wrote:
             | Then go hack.
             | 
             | The point is, it's not an interesting hack for the vast
             | majority of the audience. Because they've already seen this
             | play out over and over and over.
             | 
             | It's like sticking your hand in a grinder and yelling "for
             | science". The result might be new for you, and it's hacking
             | by your definition, but I'm fairly certain the rest of the
             | audience is OK not investing time into the experiment.
        
               | jrootabega wrote:
               | I have no intention of sticking my metaphorical hand, or
               | the hand of anyone, into a metaphorical grinder.
               | 
               | But I will point out that it was someone who thought
               | about, and experimented with, sticking fleshy pieces into
               | power tools, that invented the SawStop. :)
        
             | masklinn wrote:
             | You're just throwing random shit at the wall, it's not
             | hacking, it's just coldfusion.
        
               | [deleted]
        
         | LawTalkingGuy wrote:
         | There's a difference between bringing MD5 up to modern strength
         | and simply blocking the attempts of attackers who've attained
         | this MD5 forging kit. The later _is_ an interesting though
         | experiment.
         | 
         | The point of a hash is to avoid feeding malicious data further
         | into your pipeline so solutions that involve parsing the file
         | and hashing its actual data-streams aren't a good idea. We'll
         | focus on detecting the change before looking at the data.
         | 
         | To make the attack harder, use file formats without expandable
         | or optional sections. This attack works by parsing known file
         | formats and making allowable changes. If you had encryption as
         | one of your coconuts it would make this easier by making the
         | whole file opaque, but the encryption could also be used to
         | generate a hash-like construction so if you have good enough
         | encryption you wouldn't be stuck with MD5 and there wouldn't be
         | much thought experiment left...
         | 
         | To stop this specific attack, assuming you had to use these
         | formats, using MD5(file+reversed_file) or
         | MD5(file)+MD5(reversed_file) or MD5('secret'+file) would work.
         | This is massively beyond what a most people could make the
         | tools do but it's probably not that much cryptographically
         | harder.
         | 
         | If your solution was a secret it would be pretty effective. The
         | problem is that if you don't have encryption then they're
         | watching you communicate. They can see the hash sum you tell
         | the recipient to expect, and if this is the first time you've
         | used this scheme, the steps to use to check it. But even if
         | they don't observe this though, you're only using a small set
         | of non-crypto operations and they can just try millions of
         | combos (reverse the file, append a second copy, interleave
         | bytes, etc.) and see if anything produces the same hash. Then
         | they have discovered your algorithm and they can plan to modify
         | their tools to perform the attack.
        
           | jrootabega wrote:
           | Oh, I think secrecy would be against the spirit of the
           | experiment. And yeah, I wonder what kind of mods they would
           | need to make to their tools in response. If it takes 10 years
           | to generate the collisions, then it seems like it might be a
           | worthwhile approach (in the thought experiment universe
           | only).
           | 
           | But we should also recognize that the article demonstrates
           | collision attacks, not preimage attacks, which would be the
           | attack you want to worry about if you are using a trusted
           | hash to verify a file you received over an untrusted channel.
        
         | Retr0id wrote:
         | The simplest way to improve MD5 would be to add more rounds
         | (but as others have mentioned, there are much better choices
         | for any practical purpose)
         | 
         | RE your last point, I'm not quite sure what attack you're
         | defending against there, but most file formats do not have a
         | well-defined "canonical interpretation", much less so one that
         | is serializable into bytes. (If you think it sounds easy, you
         | haven't thought about it hard enough :P )
        
           | jrootabega wrote:
           | Well, for images I still think that there is such a thing,
           | even if it's the least interesting example: the bitmap they
           | ultimately render. (For formats that are allowed to produce
           | different bitmaps, it gets more difficult and maybe
           | impossible.)
           | 
           | For the general case of any file format: I agree this
           | approach is the least simple/dumb/trivial and might even
           | violate the spirit of my original comment itself. But it's
           | still interesting. By "[canonical] interpretation", I just
           | meant some way to fingerprint the content while understanding
           | the format. e.g., if it's a tarball, sum the total number of
           | files and directories inside it. Concatenate all their names
           | in a well-defined order and hash that. I know you can't
           | prevent collisions entirely, but it may be relatively cheap
           | to make it so that 2 files that are different (with respect
           | to the file format) are likely to be represented differently.
        
             | Retr0id wrote:
             | If for a moment we assume that you _can_ do it reliably
             | (which I personally doubt, even for  "simple" formats) -
             | what's the point? Why not just hash the original file?
             | What's the benefit here?
             | 
             | For example, I could create two different PNGs that decode
             | to the same bitmap. Or I could create one PNG that decodes
             | to multiple _different_ bitmaps, depending on which
             | implementation decodes it (due to implementation bugs and
             | /or under-defined areas of the specification). Or I could
             | create a PNG that is also a valid ZIP archive.
        
               | jrootabega wrote:
               | There's no security benefit, and I would have a hard time
               | coming up with a practicality benefit. It's mostly just
               | interesting to think about, especially in response to the
               | article. The article is demonstrating fast MD5 second
               | preimage attacks for various file formats (EDIT:
               | apparently not preimage attacks, just collision attacks),
               | so in response to that I'm wondering how these
               | MD5-specific attacks might be mitigated, for fun.
               | Consider it alternate history fiction in which we never
               | discovered anything better than MD5 :)
               | 
               | In your examples, though, :
               | 
               | > two different PNGs that decode to the same bitmap
               | 
               | But would the the PNGs also have the same MD5 hash?
               | 
               | > one PNG that decodes to multiple different bitmaps,
               | depending on which implementation decodes it
               | 
               | Yeah, that would be a challenge. Relying on
               | implementation details, or results which are allowed to
               | vary, wouldn't work. But since this is meant to
               | supplement an existing MD5 hash, the idea is that the
               | format consumer/interpreter would be in a good position
               | to produce some format-aware fingerprint that is
               | statistically likely enough to be different when the
               | inputs are different.
        
               | Retr0id wrote:
               | > But would the the PNGs also have the same MD5 hash?
               | 
               | Yes, I could construct this trivially, because MD5 is
               | broken.
        
               | jrootabega wrote:
               | Ah, OK, I think I misunderstood the article. If you are
               | supplying both images to me, you could do that with the
               | MD5 hashes. Although, I think if you could get them to
               | generate the same bitmap, then the attack has been at
               | least partially mitigated, by definition. Not completely,
               | I admit, but I think it wouldn't qualify as the same
               | attack shown in the article.
        
       ___________________________________________________________________
       (page generated 2022-09-20 23:01 UTC)