Post AWeZRTwDmTCgiJoJKy by brohee@pouet.chapril.org
(DIR) Post #AWeZRTwDmTCgiJoJKy by brohee@pouet.chapril.org
2023-06-13T15:33:53Z
0 likes, 0 repeats
I'm floored. DVC (https://dvc.org/), which I'm told is "Git for data", is using MD5 to uniquely identify files. In 2023. An issue has been open since early 2020 to at least offer the choice of a strong hash, but the maintainers don't seem to understand the problem, which makes me seriously question their sanity (https://github.com/iterative/dvc/issues/3069). I guess I'm gonna import a bunch of files generated by @retr0id's monomorph (https://github.com/DavidBuchanan314/monomorph) to see how it behaves.
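To make the concern concrete, here is a minimal sketch of identifying content by its hash. It is not DVC's code; `content_id` is a hypothetical helper built only on Python's standard `hashlib`. If two different files can be made to share an MD5 digest, a store keyed this way can no longer tell them apart.

```python
import hashlib


def content_id(data: bytes, algorithm: str = "sha256") -> str:
    """Return a hex digest used as the identifier of `data`.

    Hypothetical helper for illustration; not part of DVC.
    """
    return hashlib.new(algorithm, data).hexdigest()


payload = b"example file contents"

# MD5 gives a 128-bit identifier (32 hex characters); practical collision
# attacks against it are public.
print("md5   :", content_id(payload, "md5"))

# SHA-256 gives a 256-bit identifier (64 hex characters) with no known
# practical collision attack.
print("sha256:", content_id(payload, "sha256"))
```

Making the algorithm a parameter, as the linked issue asks for, keeps the rest of a content-addressed store unchanged.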
(DIR) Post #AWeZRUnkZS1POKr3lw by ignaloidas@not.acu.lt
2023-06-13T15:55:34.116Z
0 likes, 0 repeats
@brohee@pouet.chapril.org @retr0id@retr0.id wow, that's surprising, especially since a 128-bit hash is short enough that there's a decent-ish chance of a non-malicious hash collision at current big-data database sizes, e.g. assuming 4k record sizes, Facebook has about a 1-2% chance of an MD5 hash collision.
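An estimate like this can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n uniformly random b-bit hashes. The sketch below is only that approximation; the function names are made up for illustration, and the record count in the usage lines is an arbitrary placeholder, not a measured size of any real database.

```python
import math


def collision_probability(n_items: int, hash_bits: int = 128) -> float:
    """Approximate chance of at least one accidental collision among
    n_items uniformly random hashes of hash_bits bits."""
    exponent = -n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


def items_for_probability(p: float, hash_bits: int = 128) -> float:
    """Approximate number of items needed to reach collision probability p."""
    return math.sqrt(2 ** (hash_bits + 1) * -math.log1p(-p))


# Placeholder assumption: 10**15 records of 4 KiB each (about 4 exabytes).
n_records = 10**15
print("collision probability:", collision_probability(n_records))

# Number of records, and the matching volume at 4 KiB per record,
# needed for a 1% chance of an accidental MD5 collision.
n_needed = items_for_probability(0.01)
print("records for 1%:", n_needed)
print("bytes for 1% at 4 KiB/record:", n_needed * 4096)
```

The result is very sensitive to the assumed number of records, since the probability grows with the square of n, so the 1-2% figure holds only for whatever total data volume it was derived from.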