[HN Gopher] Git's database internals II: commit history queries
       ___________________________________________________________________
        
       Git's database internals II: commit history queries
        
       Author : todsacerdoti
       Score  : 79 points
       Date   : 2022-08-30 16:01 UTC (6 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | ajross wrote:
       | FTA:
       | 
       | > Graph databases need not apply
       | 
       | The previous blog entry had some of the same apologia about
       | database theory and why git seems to be breaking rules. This
       | seems misplaced to me. Git is a foundational technology of the
       | modern internet, used and relied on by... basically everyone to
       | do exactly the job for which it was engineered. It's _exactly_
       | the place where you 'd expect to see a special-purpose data store
       | in use, tuned to exactly the application space where it resides.
       | 
       | That's not to say it's not interesting to talk about graph
       | databases or B trees (from the last post) and how they might be
       | applied to source control. But please don't stop in the middle of
       | your discussion about git to do it. We know git stands alone,
       | tell us how git works.
        
         | ruuda wrote:
         | The comment in part I that b-trees are not appropriate because
         | Git does not do "live updating" of the packfiles also felt a
         | bit misplaced to me. Writing small objects and then aggregating
         | them into larger ones when we get too many of them, especially
         | with geometric repacks, that sounds a lot like LSM trees, which
         | underpin LevelDB and descendants, which in turn are used as the
         | storage engine in many newer relational databases. It's not
         | b-trees, sure, but there is something that updates with every
         | commit, and you want to balance fast lookups, disk usage, and
         | write amplification. In that sense Git's use case does not
         | sound so special to me at all.
        
         | cryptonector wrote:
         | Fossil uses SQL and is very Git-like in many ways (but with an
         | opinionated, merge-hap-hap-happy UI), and definitely is graph-
         | like in its schema. That tends to prove that, indeed, there's
         | nothing that special about Git that makes it not like other
         | databases.
         | 
         | Commit metadata is decidedly relational (or, rather, can be
         | modeled as such). Commits have zero, one, or two parent commits
         | (thinking of root commits as parentless), they have an author,
         | a date, a commit synopsis and a commit message, a root object,
         | and maybe some other metadata (e.g., who pushed, who signed
         | off, etc.). That's all perfectly appropriate for a relational
         | DB.
         | 
         | The main issue that has come up over time with Fossil-style
         | DVCSes is the size of the metadata you must keep _and_ examine
         | to do things like `git log -- some-file`. The metadata size
         | issue is always going to be an issue, but maybe an ad-hoc DB
         | can wring more compression out than a general purpose DB. Git
         | itself had the second problem, and it had to get solved by
         | using a Bloom filter to reduce the set of commits that need to
         | be examined.
        
           | rzzzt wrote:
           | Are octopus merges modeled as successive commits with two
           | parents each?
        
             | snvsn wrote:
             | One commit with as many parents as merged branches. (A git
             | commit can have an arbitrary number of parent commits.)
             | 
             | Try git cat-file -p $SHA
        
             | cryptonector wrote:
             | Ok, zero, one, two, or any number of parents. That's even
             | easier.
        
       ___________________________________________________________________
       (page generated 2022-08-30 23:00 UTC)