[HN Gopher] Git's database internals II: commit history queries
___________________________________________________________________
Git's database internals II: commit history queries
Author : todsacerdoti
Score : 79 points
Date : 2022-08-30 16:01 UTC (6 hours ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| ajross wrote:
| FTA:
|
| > Graph databases need not apply
|
| The previous blog entry had some of the same apologia about
| database theory and why git seems to be breaking rules. This
| seems misplaced to me. Git is a foundational technology of the
| modern internet, used and relied on by... basically everyone to
| do exactly the job for which it was engineered. It's _exactly_
| the place where you 'd expect to see a special-purpose data store
| in use, tuned to exactly the application space where it resides.
|
| That's not to say it's not interesting to talk about graph
| databases or B trees (from the last post) and how they might be
| applied to source control. But please don't stop in the middle of
| your discussion about git to do it. We know git stands alone,
| tell us how git works.
| ruuda wrote:
| The comment in part I that b-trees are not appropriate because
| Git does not do "live updating" of the packfiles also felt a
| bit misplaced to me. Writing small objects and then aggregating
| them into larger ones when we get too many of them, especially
| with geometric repacks, that sounds a lot like LSM trees, which
| underpin LevelDB and descendants, which in turn are used as the
| storage engine in many newer relational databases. It's not
| b-trees, sure, but there is something that updates with every
| commit, and you want to balance fast lookups, disk usage, and
| write amplification. In that sense Git's use case does not
| sound so special to me at all.
| cryptonector wrote:
| Fossil uses SQL and is very Git-like in many ways (but with an
| opinionated, merge-hap-hap-happy UI), and definitely is graph-
| like in its schema. That tends to prove that, indeed, there's
| nothing that special about Git that makes it not like other
| databases.
|
| Commit metadata is decidedly relational (or, rather, can be
| modeled as such). Commits have zero, one, or two parent commits
| (thinking of root commits as parentless), they have an author,
| a date, a commit synopsis and a commit message, a root object,
| and maybe some other metadata (e.g., who pushed, who signed
| off, etc.). That's all perfectly appropriate for a relational
| DB.
|
| The main issue that has come up over time with Fossil-style
| DVCSes is the size of the metadata you must keep _and_ examine
| to do things like `git log -- some-file`. The metadata size
| issue is always going to be an issue, but maybe an ad-hoc DB
| can wring more compression out than a general purpose DB. Git
| itself had the second problem, and it had to get solved by
| using a Bloom filter to reduce the set of commits that need to
| be examined.
| rzzzt wrote:
| Are octopus merges modeled as successive commits with two
| parents each?
| snvsn wrote:
| One commit with as many parents as merged branches. (A git
| commit can have an arbitrary number of parent commits.)
|
| Try git cat-file -p $SHA
| cryptonector wrote:
| Ok, zero, one, two, or any number of parents. That's even
| easier.
___________________________________________________________________
(page generated 2022-08-30 23:00 UTC)