[HN Gopher] A Git repository with 2^28 commits--one for every 7-...
___________________________________________________________________
A Git repository with 2^28 commits--one for every 7-character
shorthash
Author : breck
Score : 59 points
Date : 2021-07-09 20:39 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| detaro wrote:
| > _The repository has so many commits that git push hangs and
| runs out of memory, presumably because it tries to regenerate a
| packfile on the fly._
|
| Any guesses if this would also happen if you tried to push it
| bit-by-bit? (although you'd of course need reasonably large
| groups of commits still, to not end up with an impossible number
| of pushes)
| sverhagen wrote:
| As the README.md acknowledges, the usefulness may be limited,
| except for the fun in experimentation. What may not be obvious to
| basic Git users is that, while it may take 2^28 commits to fill
| up the entire address space of the 7-character shorthash, they
| are not designed to be unique (they are just the first part of
| the longer, unique hash). As a result, even relatively small
| repositories often already have _some_ duplicate shorthashes. And
| people scripting around their Git shorthashes must be prepared to
| deal with longer shorthashes, like 8 characters, 9, 10, 11,
| whatever it takes to disambiguate. My random Git repository of a
| mere 16865 commits (well, that's just "master") that I'm looking
| at over here, nothing out of the ordinary, needs shorthashes up
| to 11 characters to disambiguate all of them. (Not all the
| clashes may be on the same or main branch.)
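A minimal sketch of the disambiguation point above, assuming you already have a list of full hex hashes (for example, the output of `git rev-list --all`). The helper names `common_prefix_len` and `min_unique_abbrev` are made up for illustration. Git itself checks abbreviations against every object in the repository, not only commits, so the length it reports can be higher than what this computes from commits alone.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def min_unique_abbrev(hashes, floor: int = 7) -> int:
    """Smallest prefix length (at least `floor`) that keeps every hash distinct.

    After sorting, the pair with the longest shared prefix is always adjacent,
    so one pass over neighbours is enough.
    """
    ordered = sorted(set(hashes))
    longest_shared = 0
    for a, b in zip(ordered, ordered[1:]):
        longest_shared = max(longest_shared, common_prefix_len(a, b))
    return max(floor, longest_shared + 1)


if __name__ == "__main__":
    # Hypothetical sample data: the first two share a 9-character prefix.
    sample = [
        "1234567890abcdef1234567890abcdef12345678",
        "123456789fabcdef1234567890abcdef12345678",
        "fedcba9876543210fedcba9876543210fedcba98",
    ]
    print(min_unique_abbrev(sample))
```

A script that consumes shorthashes could use a check like this to decide whether 7 characters are still enough, instead of assuming they always are.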
| emerged wrote:
| To be clear, it's code to generate a repository, not a repository.
| app4soft wrote:
| Yeah.
|
| Linked repo has only 2 commits.[0]
|
| [0] https://github.com/not-an-aardvark/every-git-commit-
| shorthas...
| Ericson2314 wrote:
| And those two commits are
| 00000002bdd056473559d2bd0eb835561b3c874b
| 00000002f7c605501165ee5e3c2db20ffe178848
|
| What the hell?!
| surye wrote:
| Hah, that's clever. The author is using their other toy
| research tool/project: https://github.com/not-an-
| aardvark/lucky-commit
| redler wrote:
| It's commit mining.
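For readers wondering what "commit mining" looks like, here is a rough sketch. It relies on the fact that a git object ID is the SHA-1 of a `<type> <size>\0` header followed by the object body. Appending a `nonce` line to the message is a simplification chosen for this example and is not necessarily how lucky-commit varies the commit; the commit body below is a placeholder, not a real commit from the linked repository.

```python
import hashlib
from itertools import count


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID as git computes it: header, NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def mine_commit(body: bytes, prefix: str):
    """Try nonces until the commit's object ID starts with `prefix`.

    Assumption for this sketch: the nonce is appended as an extra message line.
    Expected work grows as 16 ** len(prefix), so keep the prefix short.
    """
    for nonce in count():
        candidate = body + f"\nnonce {nonce}\n".encode()
        oid = git_object_id("commit", candidate)
        if oid.startswith(prefix):
            return candidate, oid


if __name__ == "__main__":
    # Placeholder commit body (empty tree, made-up author and dates).
    body = (
        b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        b"author Example <example@example.com> 0 +0000\n"
        b"committer Example <example@example.com> 0 +0000\n"
        b"\n"
        b"mined commit\n"
    )
    _, oid = mine_commit(body, "000")
    print(oid)
```

Each extra hex digit of prefix multiplies the expected number of attempts by 16, which is why a 7-digit prefix calls for a dedicated, optimized tool rather than a loop like this one.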
| [deleted]
| pronoiac wrote:
| Doing the math, 2 to the 28th is 268,435,456, or around 268 million.
| 988747 wrote:
| I can imagine a big company with a monorepo (i.e. Google)
| reaching that number in a few years.
| whatshisface wrote:
| Generously, there are about 40,000 people at Google who might
| commit to the monorepo. That's only 6,700 or so commits per
| person, a fairly achievable number. Although since they're
| not purposely generating every shorthash, it would take
| significantly longer for the absolute last unique hash to be
| created.
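The "significantly longer" part is the coupon collector problem. As a rough estimate, assuming every commit's 7-character shorthash is an independent uniform draw from 2^28 values, the expected number of commits needed to see every shorthash at least once is N times the N-th harmonic number. The function name below is made up for this sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expected_draws_to_cover(n: int) -> float:
    """Approximate n * H_n, the expected number of uniform draws to see all n values.

    Uses H_n ~ ln(n) + gamma + 1/(2n), which is very accurate for large n.
    """
    return n * (math.log(n) + EULER_GAMMA) + 0.5


if __name__ == "__main__":
    space = 2 ** 28
    draws = expected_draws_to_cover(space)
    print(f"{draws:.3e} commits, about {draws / space:.1f}x the size of the space")
```

Under that uniformity assumption the estimate comes out at roughly 20 times 2^28, so organic commits would take far longer to fill the space than the per-person figure suggests.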
| yellow_lead wrote:
| I wonder what the number of commits is before you need to
| start worrying about 7 character collisions. (Birthday
| problem anyone?)
| charcircuit wrote:
| Don't forget the commits made by programs.
| pronoiac wrote:
| From the "unexpectedly useful for security research" link:
|
| > Due to the birthday problem, any repository that has at
| least 19291 commits is likely to have a pair of ambiguous
| commits somewhere.
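The quoted threshold can be checked with a short birthday-problem calculation. This sketch assumes shorthashes are uniformly distributed over 2^28 values and takes "likely" to mean a probability of at least 50%; the function names are made up for illustration.

```python
import math


def collision_probability(commits: int, space: int = 2 ** 28) -> float:
    """Probability that at least two of `commits` uniform draws from `space` coincide."""
    # Sum logs instead of multiplying probabilities to avoid rounding drift.
    log_no_collision = sum(math.log1p(-i / space) for i in range(commits))
    return -math.expm1(log_no_collision)


def commits_for_probability(target: float = 0.5, space: int = 2 ** 28) -> int:
    """Smallest number of commits whose collision probability reaches `target`."""
    log_no_collision = 0.0
    n = 0
    while -math.expm1(log_no_collision) < target:
        # Adding the n-th term accounts for commit number n + 1.
        log_no_collision += math.log1p(-n / space)
        n += 1
    return n


if __name__ == "__main__":
    print(commits_for_probability(0.5))       # 7-character shorthashes
    print(commits_for_probability(0.5, 365))  # classic birthday problem
    print(collision_probability(16865))       # the 16865-commit repo mentioned earlier
```

The result for 2^28 should land at or right next to the 19291 figure in the quote, which is close to the usual square-root rule of thumb, sqrt(2 * N * ln 2).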
| ghoward wrote:
| I can't find the link. :(
|
| Edit: never mind; I am stupid.
| codetrotter wrote:
| Note that collisions in short hash are not actually a problem
| as such.
|
| > Git can figure out a short, unique abbreviation for your
| SHA-1 values. If you pass --abbrev-commit to the git log
| command, the output will use shorter values but keep them
| unique; it defaults to using seven characters but makes them
| longer if necessary to keep the SHA-1 unambiguous
|
| https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
|
| and also
|
| > Git doesn't really truncate anything, internally everything
| will be handled with the complete value.
|
| https://stackoverflow.com/questions/7128444/how-does-
| github-...
| IgorPartola wrote:
| I wonder how many tools out there hard-code the 7-character
| length for a git commit hash and would break upon a
| collision.
| wruza wrote:
| This number may be even lower if you take the birthday
| problem into account. I'm not a statistics guy to confirm
| that or to make proper calculations, but I believe it applies
| to this case as well, because first few bits of a hash are
| like what a birthday is to an otherwise unique person.
|
| https://en.wikipedia.org/wiki/Birthday_problem
| mrkramer wrote:
| So this is basically a proof of work algorithm.
| posnet wrote:
| An assignment in one of my university security courses was to
| mine "gitcoin".
|
| It was a git-based proof of work: the server would only accept
| a pushed commit if its hash had more leading zeros than the
| previous commit on that branch.
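The acceptance rule described here could be sketched as follows. The comment does not say whether zeros were counted in hex digits or bits, so counting hex digits is an assumption, and both function names are hypothetical.

```python
def leading_zeros(oid: str) -> int:
    """Count leading '0' hex digits in a commit ID (assumed unit: hex digits, not bits)."""
    return len(oid) - len(oid.lstrip("0"))


def accept_push(new_oid: str, previous_oid: str) -> bool:
    """Accept only if the new commit has strictly more leading zeros than the previous one."""
    return leading_zeros(new_oid) > leading_zeros(previous_oid)


if __name__ == "__main__":
    # One of the commit IDs quoted earlier in the thread, plus a made-up one.
    mined = "00000002bdd056473559d2bd0eb835561b3c874b"
    ordinary = "0a1b2c3d4e5f60718293a4b5c6d7e8f901234567"
    print(leading_zeros(mined), accept_push(mined, ordinary))
```

Because each accepted commit must beat the previous one, the expected work per push grows by a factor of 16 for every extra hex zero under this rule.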
| distrill wrote:
| That sounds like a ton of fun, and tbh way cooler than
| anything I built in school.
| colejohnson66 wrote:
| Git, but on a Blockchain? /s
| Cerium wrote:
| Git is a Blocktree - a type of directed acyclic graph based
| proof of nothing crypto product that is invulnerable to fork
| based attacks by supporting it out of the box. /s
| gopherbro wrote:
| It seems someone wrote a script for it.
___________________________________________________________________
(page generated 2021-07-09 23:00 UTC)