[HN Gopher] Commits are snapshots not diffs (2020)
       ___________________________________________________________________
        
       Commits are snapshots not diffs (2020)
        
       Author : warpech
       Score  : 258 points
       Date   : 2021-04-08 18:02 UTC (4 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | aarchi wrote:
       | Whereas in Pijul and Darcs, commits (called patches) are diffs,
       | not snapshots. They are based on a sound theory of patches, which
       | allows for operations not supported by Git like commuting, as
       | long as the commits aren't interdependent. Plus, language-
       | specific tools can extend the notion of dependency from line-
       | based to semantic.
        
         | luhn wrote:
         | > They are based on a sound theory of patches, which allows for
         | operations not supported by Git like commuting, as long as the
         | commits aren't interdependent.
         | 
         | This is definitely supported by git. Even though commits may
         | technically be snapshots, you can build a diff from snapshots
         | (and vice versa). `git diff` will get you the diff for any
         | given commit, and `git rebase` will happily reorder commits for
         | you by reapplying the diffs.
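
A minimal sketch of "building a diff from snapshots", assuming each snapshot is simply a dict mapping path to file text. The helper name `diff_snapshots` is hypothetical and this is not how Git implements `git diff`; it only shows that a diff can be derived on demand from two complete states.

```python
import difflib

def diff_snapshots(old, new):
    """Derive a unified diff from two full snapshots (dicts of path -> text)."""
    out = []
    for path in sorted(set(old) | set(new)):
        a = old.get(path, "").splitlines(keepends=True)
        b = new.get(path, "").splitlines(keepends=True)
        if a != b:  # unchanged files contribute nothing to the diff
            out.extend(difflib.unified_diff(
                a, b, fromfile="a/" + path, tofile="b/" + path))
    return "".join(out)

parent = {"README": "hello\n", "main.py": "print('hi')\n"}
child = {"README": "hello\nworld\n", "main.py": "print('hi')\n"}
patch = diff_snapshots(parent, child)
print(patch)
```

Only `README` shows up in the output, even though both snapshots carry every file.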
        
         | rnhmjoj wrote:
         | When reading a bit about Pijul, a few months back, I had
         | assumed _every_ two patches would commute, and I couldn't
         | imagine how that could possibly work.
         | 
         | Does it really have this limitation? If so, it doesn't look
         | much of an improvement compared to git: I can shuffle "patches"
         | all right using `git rebase -i`. I concede it can be quite
         | slow, though.
        
           | dan-robertson wrote:
           | So not every patch can commute with every other patch:
           | "delete foo" doesn't make sense until after "add foo" has
           | happened. So patches have dependencies that they must come
           | after, but for lots of vc situations, patches are
           | independent. Sets of patches make rebasing a branch trivial,
           | for example, because adding the patches from the master
           | _after_ your patches is equivalent to adding them _before_.
           | If you would get a merge conflict, you get the same merge
           | conflict whether they are added before or after.
           | 
           | But nailing down the logic behind commuting patches can be
           | important too as it can catch subtle problems that might
           | happen with normal snapshot-based merging. Consider some
           | people independently editing branches:
           | 
           |     Bob adds a file with line "foo"
           |     Alice pulls Bob's patch
           |     Bob changes "foo" to "bar"
           |     Alice changes "foo" to "bar"
           |     Bob changes "bar" back to "foo"
           | 
           | In Pijul or Darcs you should get a consistent result pulling
           | changes from Bob and Alice no matter what order you do it.
           | But if you use something like git, the order you pull and
           | merge, and if you do it at any intermediate times, might
           | change the resulting snapshot (as well as just the history).
           | The start and end state of Bob's repo look the same as
           | snapshots but they are different because Bob changed his mind
           | about the line "bar"--maybe the change didn't work.
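
The dependency idea ("delete foo" must come after "add foo") can be sketched with a toy patch model. This is an illustration under loud assumptions: a patch is a hypothetical `(key, old, new)` triple on a dict, which is far simpler than the actual patch theory in Pijul or Darcs.

```python
def apply(state, patch):
    """Apply a toy patch (key, old, new); None means the key is absent."""
    key, old, new = patch
    if state.get(key) != old:
        raise ValueError(f"patch does not apply: {patch}")
    result = dict(state)
    if new is None:
        result.pop(key, None)
    else:
        result[key] = new
    return result

def commutes(state, p, q):
    """True if p-then-q and q-then-p both apply and give the same state."""
    try:
        return apply(apply(state, p), q) == apply(apply(state, q), p)
    except ValueError:
        return False  # one order is impossible, so the patches are dependent

add_foo = ("foo", None, "x")
del_foo = ("foo", "x", None)
add_bar = ("bar", None, "y")

print(commutes({}, add_foo, add_bar))  # independent patches
print(commutes({}, add_foo, del_foo))  # del_foo depends on add_foo
```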
        
       | tsimionescu wrote:
       | It's nice to understand this, but I fail to see it helping much
       | in practice. Sure, you'll know why the thing you want to do is
       | hard for git to do, but that won't make it much easier.
       | 
       | And without knowing even further implementation details, it's a
       | bad idea to rely on this knowledge. For example, the article
       | states that committing a rename separately from edits in the
       | renamed files helps git track the renames. But that's not
       | obviously true from the discussion above, because it's not
       | obvious if, when computing a diff between two commits, git will
       | follow the entire history or just apply the diff algorithm on the
       | two commits.
       | 
       | If it were the latter, then it doesn't really matter which order
       | you commit things in, git would simply see commit1: fileA, fileB
       | with contents cA and cB; commit2: fileD, fileE with contents cD
       | and cE, and would do the quadratic work anyway, even if commit1.5
       | had fileE, fileD with contents cA, cB.
        
       | Tomminn wrote:
       | Great article but:
       | 
       | "one of my favorite analogies is to think of commits as having a
       | wave/particle duality.."
       | 
       | is a hilariously misguided object to build an analogy from.
       | Theoretical physicist checking in, and my community has been
       | searching for about 100 years for an analogy to explain that
       | shit, so it's hilarious to see someone try to use it as a
       | concrete object people can use as a touchstone to better
       | understand a purely classical database.
        
         | bombcar wrote:
         | Read it as "think of commits as <unintelligible bullshit you
         | have to take on faith because nobody really understands it>"
        
       | fraculus wrote:
       | I think merge commits are key to why "snapshots" are a better
       | model than "diffs", and a stronger argument would emphasize this
       | more.
       | 
       | Like people have said, the two models:
       | 
       | - a commit is a snapshot plus a pointer to a parent commit
       | 
       | - a commit is a pointer to a parent commit plus a diff
       | 
       | are sort of isomorphic. And some commands in the git porcelain
       | (like git cherry-pick, or git rebase) indeed make more sense if
       | you think of commits as diffs.
       | 
       | But this isomorphism becomes really strained when you have
       | commits with more than one parent (or even zero parents). (And I
       | think it's telling that those commands don't play very nicely
       | with merge commits or the root commit.)
       | 
       | If you really want to incorporate merge commits and the root
       | commit, the alternatives become:
       | 
       | - a commit is a snapshot, together with a list of zero or more
       | pointers to parent commits
       | 
       | - a commit is a list of M >= 0 pointers to parent commits,
       | together with N > 0 diffs, subject to the invariant that:
       | 
       | a) M = N, except that for exactly one commit, which we will call
       | the "root" we are allowed to have M = 0 but N = 1
       | 
       | b) starting from any commit, if you traverse a path back to the
       | root commit by following parent pointers, and then sequentially
       | (in reverse order) apply, for each commit in the path, the diff
       | that corresponds to the parent pointer chosen, then the result of
       | composing all those diffs is independent of the path chosen.
       | 
       | And when you put it like that, it's pretty clear that the "diffs"
       | model is really impractical, and that's why it's a lot better to
       | think of commits as snapshots.
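
The snapshot model described above can be sketched as a small data structure. This is a toy under stated assumptions (the class `SnapshotCommit` and the function `diffs_against_parents` are hypothetical names, and a tree is just a dict of path to content); it shows that one snapshot plus M parents yields M derived diffs, with no invariant to maintain.

```python
from dataclasses import dataclass

@dataclass
class SnapshotCommit:
    tree: dict           # path -> content: the complete state
    parents: tuple = ()  # zero (root), one, or several (merge) parents

def diffs_against_parents(commit):
    """Derive one diff per parent; a root commit has none."""
    return [
        {path: (parent.tree.get(path), commit.tree.get(path))
         for path in set(parent.tree) | set(commit.tree)
         if parent.tree.get(path) != commit.tree.get(path)}
        for parent in commit.parents
    ]

root = SnapshotCommit({"a": "1"})
left = SnapshotCommit({"a": "1", "b": "2"}, (root,))
right = SnapshotCommit({"a": "3"}, (root,))
merge = SnapshotCommit({"a": "3", "b": "2"}, (left, right))

for d in diffs_against_parents(merge):
    print(d)
```

The merge commit stores a single tree; the two diffs (one against each parent) fall out of it rather than having to agree with each other by construction.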
        
       | slumpt_ wrote:
       | Most developers think of commits as diffs and they can for all
       | intents and purposes be thought of as such. It's actually best
       | for the understanding of how to practically get things done to
       | think of them in this way.
       | 
       | Odd semantic argument to make.
        
       | divbzero wrote:
       | This is a good overview of Git internals. If this stuff interests
       | you, Chapter 10 of _Pro Git_ offers similar descriptions of Git
       | objects [1] and Git references [2], and then continues onto Git
       | packfiles [3] which are not covered by OP.
       | 
       | [1]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
       | 
       | [2]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
       | 
       | [3]: https://git-scm.com/book/en/v2/Git-Internals-Packfiles
        
       | aequitas wrote:
       | This article goes into a little too much detail imho. I have had
       | great success explaining Git to coworkers using post-its,
       | permanent marker and a flip board (no computer!) and going
       | through the steps Git would take (abstractly, not exactly) when
       | performing certain commands. All commits (and their relations)
       | are written down on the board with the marker because they don't
       | change (eg: rebasing just creates a new line of commits). The
       | branches are written down on post-its and can move around (like
       | this article explains, they are just pointers). You can use a
       | whiteboard with non-permanent marking for the working directory
       | and index if you want to go that deep.
        
       | grawprog wrote:
       | >Commits are snapshots....commits are diffs....
       | 
       | Neither model really encompasses commits for me.
       | 
       | I prefer...
       | 
       | Commits are a point in history I can return to after I inevitably
       | fuck up or look back on so I can convince myself, yes I am indeed
       | making progress.
        
       | davesque wrote:
       | Neat overview of some of the core concepts in Git that often go
       | unnoticed. Although I'll say that the fact that commits are
       | technically not diffs doesn't seem to matter much in day to day
       | use. Git does a decent job of abstracting that detail away to the
       | point that you could just as well believe commits are diffs.
       | Also, I want to say that technically I believe Git _does_ use
       | deltas to compress an object's history in the blob store. But
       | the different blobs that comprise an object's history can be
       | thought of somewhat as being separate. Git could just as easily
       | not perform this internal, space-saving optimization and things
       | would all work the same. The SHA hashes would be the same and
       | based on the same input.
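
To illustrate why the hashes would not change: in a SHA-1 repository, a blob's ID is the SHA-1 of a short header plus the file content, so it depends only on the content and not on how the object is later packed. A minimal sketch (the function name `git_blob_id` is made up for this example):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over 'blob <size>\\0' followed by the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # the well-known ID of the empty blob
print(git_blob_id(b"same bytes") == git_blob_id(b"same bytes"))
```

For real files, the result should match what `git hash-object <file>` prints.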
        
       | breck wrote:
       | I think this is incorrect, no?
       | 
       | Can't all commits be turned into patches? Thus, aren't commits
       | isomorphic to diffs?
        
         | lann wrote:
         | > aren't commits isomorphic to diffs?
         | 
         | Nearly, though renames are only approximately extracted from
         | the snapshots.
        
           | jepler wrote:
           | If your VC isn't plumbed all the way into your editor, you
           | can't tell changing a typo from deleting and re-typing the
           | whole file when it comes time to create a delta.
        
             | lann wrote:
             | There is a `git mv` command that means "rename". Git
             | _could_ (even with its current data model) explicitly
             | annotate commits with this intent, but doesn't. I don't
             | know how useful that would be compared to the current
             | heuristics, but it does mean that git commit "snapshots"
             | are not (quite) isomorphic to a diff format (like posix
             | diff's) that can explicitly encode renames.
        
         | tomtomtom777 wrote:
         | Consider a small commit that fixes a spelling error. If I turn
         | this into a patch and apply it to another branch, it will be a
         | _different_ commit even though it is the same patch.
         | 
         | As such, the concept of a "commit" in Git refers to a complete
         | state of everything; a snapshot.
        
         | jepler wrote:
         | Yes-ish. However, there's also the question of what operations
         | are efficient. (Diff feels very performant in git but) maybe
         | having the diffs as the first-class objects enables doing
         | something efficiently that git doesn't do. (perhaps,
         | identifying when identical patches have occurred in different
         | portions of the history?)
         | 
         | I've used several patch-based VCs (RCS and CVS) but I think
         | they pre-date this "sound theory of patches" and instead the
         | use of patch-style representation was for optimizing storage.
         | (just as git uses packs and deltas to optimize storage and
         | performance, f'rinstance) So I don't really know what I'm
         | missing.
         | 
         | (If the sound theory of patches would let me better understand
         | what occurred at a merge commit than git's tooling, that'd be
         | just about enough to sell me on switching. except for the
         | network effects of git & github.)
        
         | johntb86 wrote:
         | That's true, but git doesn't natively have a way to refer to a
         | single diff. You can use a hash to refer to a commit, but that
         | depends on the entire history up to that point. If you rebase a
         | commit then the hash changes, even if the new commit is
         | semantically the same.
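
A toy illustration of why the same change gets a different hash on a different base. This is a simplified sketch, not Git's real commit format (real commit objects also record author, committer, and timestamps); `toy_commit_id` is a hypothetical helper that hashes the tree, the parent ID, and the message together.

```python
import hashlib

def toy_commit_id(tree: dict, parent_id: str, message: str) -> str:
    """Hash the full state plus the parent ID, so history feeds into the ID."""
    payload = repr((sorted(tree.items()), parent_id, message)).encode()
    return hashlib.sha1(payload).hexdigest()

base_a = toy_commit_id({"f": "teh cat"}, "", "start A")
base_b = toy_commit_id({"f": "teh cat", "g": "other"}, "", "start B")

# The same one-word fix, applied on top of two different histories.
fix_on_a = toy_commit_id({"f": "the cat"}, base_a, "fix typo")
fix_on_b = toy_commit_id({"f": "the cat", "g": "other"}, base_b, "fix typo")

print(fix_on_a == fix_on_b)
```

The diff is identical in both cases, yet the IDs differ because each ID covers the whole snapshot and its ancestry.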
        
           | jdoliner wrote:
           | It kinda does, `git show <commit>` does exactly what one
           | would expect if commits were actually diffs. That is it shows
           | the diff between <commit> and its parent.
        
         | jdoliner wrote:
         | It's technically correct. The key thing here is that it's
         | isomorphic. You can either have a system of commits in which
         | diffs are computed between commits or a system of diffs in
         | which commits are computed by applying diffs. The trade-offs
         | are in performance of various operations, not in the user
         | exposed semantics. Git chooses to have its first class object
         | be commits and diffs are computed on the fly. So again it's
         | technically correct, but in practice commands like cherry-pick,
         | which treats a commit like a diff between that commit and its
         | parent really blur the line. I think in reality you can be a
         | really advanced git user and not even realize that there's a
         | difference between a commit-based and diff-based version control
         | system, because in practice there really isn't much of a
         | difference.
        
         | divbzero wrote:
         | If you inspect the files in the .git directory, you'll find
         | commits stored as trees of directory and file objects. But it
         | is true that they can be converted to diffs on the fly, which is
         | exactly what the git show command does.
        
       | maweki wrote:
       | I think we're running into a naming issue here. It's useful to
       | think of a single commit in itself as a diff. The DAG is a useful
       | model for an accumulation of changes. The question is, what
       | changes and operations make up a node in the DAG (i.e. what code
       | is in this branch, compared to that? What code do they have in
       | common)?
       | 
       | To answer this: take the node and follow along the predecessor
       | until you get one (or more) roots. All commits along the way to the root are
       | contained in the commit at hand. That's the history.
       | 
       | Adding changes is, I think, the most useful mental model, even if
       | it is not the implementation.
       | 
       | Now what the author is saying is: A commit is not only the diff,
       | but also the whole tree/history that the diff is based on. And
       | that is also true and then the commit (the adding plus the past)
       | is a snapshot.
       | 
       | Do we have a good naming convention for the single node in the
       | tree with its changes, compared to the single node in the tree
       | with its changes AND the references to the parents with all their
       | changes etc.?
        
         | derriz wrote:
         | > It's useful to think of a single commit in itself as a diff.
         | 
         | Except if it has multiple parents like a merge commit.
         | 
         | Actually I don't agree even in general. It took me an
         | unreasonably long time to become unafraid of git because I
         | clung to the common VCS mental model where commits were
         | actually diffs.
        
           | maweki wrote:
           | But the snapshot-model also doesn't really make a lot of
           | sense for merges. It's a snapshot of what then? A merge of
           | all parent trees? What's a merge of two files then? Defining
           | this merge-operation on trees is at least as mentally taxing
           | as the alternative.
           | 
           | Accumulating all the diffs from two (or more) ends (until
           | they are common again) is at least as useful.
        
             | cesarb wrote:
             | > But the snapshot-model also doesn't really make a lot of
             | sense for merges. It's a snapshot of what then?
             | 
             | It's a snapshot of the final result.
             | 
             | That's the beauty of the "commit as snapshot" model: each
             | commit always contains the final result of the commit. It
             | doesn't matter if the commit is a normal commit with a
             | single parent, a merge commit with multiple parents, or
             | even an initial commit with zero parents. It doesn't matter
             | if the parent commits are unavailable (shallow
             | repositories). It doesn't matter if the parent commits have
             | been changed (grafts).
        
             | derriz wrote:
             | For me, a merge commit in git is just a snapshot like any
             | other except that its metadata contains links to more than
             | one parent.
             | 
             | The parent-child relationship acts as nothing more than a
             | remark that the child was derived from both parents in some
             | way.
             | 
             | Of course, commonly the child is derived by finding the
             | most recent common parent, using heuristics to guess file
             | identities after any renaming and then performing a 3-way
             | line-based diff between what it thinks are corresponding
             | files.
             | 
             | But actually git doesn't really care - it's just another
             | snapshot you've created and added to the DAG.
             | 
             | I haven't found it helpful to think of what's going on in
             | git in terms of an "accumulated file diffs" abstraction
             | because git has no notion of file identity (across
             | commits).
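
The merge procedure described above (find a common ancestor, then do a 3-way comparison) can be sketched at whole-file granularity. This is an assumption-laden simplification: Git merges line by line within files and applies rename heuristics, neither of which is modeled here, and `three_way_merge` is a hypothetical name.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots against their common ancestor, file by file."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o                 # both sides agree
        elif o == b:
            pick = t                 # only theirs changed
        elif t == b:
            pick = o                 # only ours changed
        else:
            conflicts.append(path)   # both changed, differently
            pick = o                 # keep ours as a placeholder
        if pick is not None:         # None means the file is absent
            merged[path] = pick
    return merged, conflicts

base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2"}
theirs = {"a": "1", "b": "3", "c": "4"}
print(three_way_merge(base, ours, theirs))
```

Whatever the procedure produces is then recorded as one more snapshot with two parents, which matches the point that git itself does not care how the merged tree was derived.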
        
           | viraptor wrote:
           | You can have a diff to multiple parents - you get multiple
           | status columns then. Similar to what you see in the diff in
           | merge issues.
        
       | zwieback wrote:
       | Cherry-pick is what messes up the commit-as-snapshot idea for me.
       | If I see a small commit that I feel I can merge into my branch
       | then that commit feels like a diff and I don't want to care about
       | the rest of the stuff that commit snapshots. I guess that's a
       | good thing.
        
         | SamBam wrote:
         | I tend to agree.
         | 
         | I am not someone who has a deep understanding of the inner-
         | workings of git by any means, yet I am perfectly comfortable
         | with rebasing and cherry-picking.
         | 
         | For me, git is so much easier to intuit if I only think of it
         | as diffs. When I rebase, I'm just rearranging diffs, or
         | squashing them together, or whatever. If I try and think of
         | everything as snapshots it actually gets more confusing for me.
        
           | jasonwatkinspdx wrote:
           | So, I think a useful simple way to think of it is "git
           | creates diffs when it needs to on demand."
           | 
           | When you're doing a cherry pick of say commit ce123, what
           | you're asking git to do is:
           | 
           | 1. Diff ce123 against its parent
           | 
           | 2. Go apply that diff to some other branch
           | 
           | Likewise rebasing is the same, but with an extra step to
           | apply the inverse of the diff to the original commit first,
           | then rewrite the history.
           | 
           | One of the big advantages of this on demand diffing approach
           | is it's much more robust vs conflicts. Back in the subversion
           | days I wrote some shell scripts that did the equivalent of
           | git cherry pick and rebase. I'd keep a couple extra copies of
           | a checkout, would use the switch command to quickly put them
           | into a specific state, then would just generate a diff
           | manually to apply to my main working copy. It worked, and was
           | often faster than manually copying text around between editor
           | windows, but it was extremely conflict prone.
           | 
           | So this distinction, of whether you store snapshots and diff
           | on demand, or store diffs and snapshot on demand, is somewhat
           | subtle but has important consequences.
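
The two cherry-pick steps above (diff the commit against its parent, then apply that diff elsewhere) can be sketched as follows. This assumes snapshots are dicts of path to content and works on whole files only; the real `git cherry-pick` uses a line-level three-way merge, so treat the helper names here as hypothetical.

```python
def diff(old, new):
    """Changes from old to new as path -> (before, after); None means absent."""
    return {p: (old.get(p), new.get(p))
            for p in set(old) | set(new) if old.get(p) != new.get(p)}

def apply_diff(tree, changes):
    """Apply changes to a snapshot, refusing if the 'before' side does not match."""
    result = dict(tree)
    for path, (before, after) in changes.items():
        if result.get(path) != before:
            raise ValueError(f"conflict in {path}")
        if after is None:
            result.pop(path, None)
        else:
            result[path] = after
    return result

def cherry_pick(commit_tree, parent_tree, target_tree):
    # Step 1: diff the commit against its parent. Step 2: apply it to the target.
    return apply_diff(target_tree, diff(parent_tree, commit_tree))

parent = {"a": "1", "b": "x"}
commit = {"a": "1", "b": "y"}             # the commit only changes b
target = {"a": "2", "b": "x", "c": "new"}
print(cherry_pick(commit, parent, target))
```

The result is a new full snapshot of the target branch; the diff exists only for the duration of the operation.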
        
             | caterama wrote:
             | Since you can go from diffs to snapshots, and snapshots to
             | diffs, aren't they basically equivalent? I'm struggling to
             | see the important consequences at the user level.
        
               | viraptor wrote:
               | You can't go from diffs to snapshots. Two identical diffs
               | can be applied on different branches - looking just at
               | the diffs, you don't know which branch it is.
        
             | zwieback wrote:
             | Yeah, good summary.
        
       | ChrisMarshallNY wrote:
       | That's a cool explanation.
       | 
       | I'm a bit slow on the uptake, so I had to re-read a couple of
       | sections, but it was helpful.
        
       | whack wrote:
       | From a storage perspective, describing commits as snapshots seems
       | like a bad mental model. Suppose I have a directory that is 100MB
       | in size. If I take a snapshot of it, my snapshot would be 100MB
       | in size. If I take a 2nd snapshot of it tomorrow, my 2nd snapshot
       | would also be 100MB in size. My total storage needs would now be
       | 300MB.
       | 
       | Whereas if I had used git, and created 2 additional commits, each
       | making a change to a small text file, my total storage size would
       | be barely larger than 100MB. Describing the commits as a diff, as
       | opposed to a snapshot, leads to a better intuitive understanding
       | of why this would be the case.
       | 
       | Not to mention other features the article discussed, such as
       | cherry-picking. What does it even mean to "cherry-pick a
       | snapshot"? In comparison, cherry-picking a diff and applying it
       | to your current state, is far more intuitive.
       | 
       | And let's not forget commit messages. If a commit is a snapshot,
       | I would expect the commit-message to be descriptive of the entire
       | snapshot. Whereas if a commit is a diff, I would expect the
       | commit message to be descriptive of the diff. Which is exactly
       | how most people use commit messages.
       | 
       | Obviously both "diffs" and "snapshots" are leaky abstractions. If
       | you insist on using the "snapshot" abstraction, you will need to
       | resolve all of the above points of confusion by adding more
       | complexity to your abstraction. And if you prefer to use the
       | "diff" abstraction, you will eventually need to explain that a
       | commit is actually a combination of diffs, along with some other
       | metadata like a pointer to a parent commit. As a teaching tool,
       | you can make either abstraction work. But I find it far more
       | intuitive and useful to think of commits as "diffs + some
       | metadata".
        
         | jayd16 wrote:
         | Depends on the diff. If the diff is not aligned by bits, a
         | single-bit offset might double the size, i.e. a full file
         | delete plus a full file add.
         | 
         | >If you insist on using the "snapshot" abstraction
         | 
         | But it's not insisted upon. Both abstractions are used as needed.
        
         | NTARelix wrote:
         | After going through the "Git Internals"[0] docs, I found that
         | the snapshot mental model has been much more helpful in
         | understanding what my Git commands are doing, how someone's
         | history got into a confusing state, etc. The primary model is
         | that of the Merkle tree, and subsequently hashing, which are
         | very simple and powerful concepts.
         | 
         | [0]: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-
         | Po...
        
         | goerz wrote:
         | You can still think of them as snapshots. Git just does
         | compression on the entire folder of snapshots, including de-
         | duplication of data that doesn't change between snapshots.
         | 
         | In fact, when I teach git to students, I don't even bother with
         | the trees/blobs, which in my view are just an implementation
         | detail. I just tell them to think of git zipping up their
         | working directory together with some metadata (commit message,
         | reference to parents), and putting that zip file into its own
         | "compressed" storage inside the .git directory. That seems to
         | be sufficient for a good mental model of how to work with git
         | (independently of git's somewhat baroque command line
         | interface, which just takes getting used to)
        
           | gilbetron wrote:
           | So it only stores the _diff_erence between the two
           | snapshots? ;)
        
             | planckscnst wrote:
             | No, it stores an entirely new set of references to objects,
             | as well as some of those objects themselves (any that are
             | not identical to previously stored objects).
             | 
             | You cannot look at a commit on its own and know exactly how
             | it's different from the previous commit, but you do have
             | the complete new state. You have to look at the parent
             | commit's references and do an object-by-object comparison
             | to identify exact changes. On the other hand, when you look
             | at a diff, you can see exactly what has changed, but you
             | cannot produce the version that came before without also
             | having a complete copy of the current version.
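
A rough sketch of "a new set of references, plus only the objects that are new", assuming a content-addressed store keyed by a hash of the file bytes. `ObjectStore` is a hypothetical class, not Git's object database, and it skips trees, headers, and compression.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: identical content is stored only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self.blobs[key] = content  # re-adding identical content is a no-op
        return key

    def snapshot(self, files: dict) -> dict:
        """Record a complete snapshot as path -> object key."""
        return {path: self.put(data) for path, data in files.items()}

store = ObjectStore()
big = b"x" * 1_000_000
snap1 = store.snapshot({"big.bin": big, "notes.txt": b"v1"})
snap2 = store.snapshot({"big.bin": big, "notes.txt": b"v2"})

print(len(store.blobs))                      # stored objects across both snapshots
print(snap1["big.bin"] == snap2["big.bin"])  # the large file is shared
```

Each snapshot lists every file, yet the unchanged large file is stored once, which is why two small commits barely grow the repository.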
        
             | dbt00 wrote:
             | The implementation-specific compression doesn't store
             | deltas or diffs, it stores unique blocks of text.
             | 
             | Git allows for shallow clones, which would be impossible if
             | the protocol or implementation were based solely around
             | diffs.
        
             | detaro wrote:
             | No. _If_ it _chooses to_ compress the commits (which, if I
             | remember correctly, it does not do automatically for each
             | commit, but rather occasionally as a larger step), it uses
             | the difference to whatever it deems to be a good candidate,
             | if it finds one. E.g. if you have a file in commit A,
             | change it massively in later commit B, and then on a
             | different branch create commit C that also changes the file
             | to one very similar to the one in B, git might very well
             | compress C by storing the difference from B to C, despite
             | those having no direct relationship in the commit graph. It
             | can also choose to not use a delta to a different version
             | entirely, and this is 100% an internal implementation
             | detail of the storage system in git (afaik one of those
             | implementation details is that it prefers candidates that
             | are in the same commit chain, but it doesn't have to - and
             | it can easily jump multiple commits if that works better).
             | If you ask git to show you a diff to the previous commit,
             | it does not pull a diff from storage, but pulls two file
             | versions from its storage backend (which if deltas have
             | been used to store will resolve those) and diffs them.
        
           | redisman wrote:
           | I don't know that you need to teach them any of that. Version
           | control is an abstraction. I have no clue what happens under
           | the hood and I don't care.
        
             | shuntress wrote:
             | To some extent, this is true. I don't feel the need to
             | totally understand git's packing logic or the specific
             | mechanics of the various diff/merge algorithms.
             | 
             | But some knowledge of how/why your tools work the way they
             | do can be very helpful.
             | 
             | Some knowledge of a tool's internal workings can be
             | fundamental to efficient use of that tool. At the very
             | least it can allow you to understand or derive your useful
             | interactions with that tool rather than simply memorize how
             | it is used.
        
           | hmsimha wrote:
           | This is the thing though. You're talking about snapshots
           | which actually have duplication removed... in my mind this
           | really fits more with the 'diff' model. I've already done the
           | exploratory diving-into-git-internals thing years ago, so I
           | could develop a better understanding of how things actually
           | work.
           | 
           | But for newcomers who want to understand how git is working,
           | it really makes more sense to tell them it's 'like a diff.
           | Not exactly under the hood, but think of it like a diff for
           | now'. This is what I've been telling people as I've mentored
           | a number of people in getting acquainted with git over the
           | years, and if they're curious enough to look under the hood,
           | they'll get a better understanding of the internals.
           | 
           | As a programmer, what you're working with is essentially the
           | diff. This is the easiest way to think about things
           | initially. The fact that git is storing blobs under the hood,
           | shallowly deduplicating blobs but still storing large chunks
           | separately that may contain duplicate data, until it
           | generates packfiles which do a deeper
           | deduplication/compression, is really not that helpful.
           | Telling people it's more like zipping is a bit disingenuous
           | because it doesn't really explain how things are compressed
           | more efficiently over the course of _many_ changes.
           | 
           | If I have a 1MB code file and make 1000 commits of one-line
           | changes then sure, git is initially storing large blobs
           | representing those, but then will compress over the change
           | set when it generates the packfile.
           | 
           | Compared to making a zip of the file for every change (say
           | these are 100KB compressed) and now you have people thinking
           | the 1000 one-line changes generate 100MB in the .git
           | directory.
           | 
           | You may think that a 1MB file with many smaller changes is a
           | fabricated example, but consider that dependency lockfiles
           | (package-lock.json I'm looking at you) can easily grow to
           | this size, and contain this many changes.
        
             | formerly_proven wrote:
             | The "snapshots which are stored as deltas, if that works"
             | part is _unrelated_ to the diffs the git porcelain
             | generates for you when you do a git-diff or git-show. The
             | former is purely an implementation detail of the storage
             | (albeit an important one), while the latter is entirely
             | virtual, calculated from the snapshots every single time
             | you view the data. That's why operations like git-diff and
             | git-blame can take some time on large trees or histories
             | (and why e.g. git-blame has various options to tweak how it
             | tracks files across revisions, because that is not
             | something git does), while git-log is fast.
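A minimal sketch of the "entirely virtual" diff described above, assuming two full snapshots of one hypothetical file held as plain strings. It uses Python's standard `difflib` rather than git's own diff machinery, so it only illustrates the idea that a diff is derived from two snapshots on demand and never stored.

```python
import difflib

# Two complete snapshots of the same (hypothetical) file, as two commits
# might record it. Neither one is expressed in terms of the other.
parent_snapshot = "alpha\nbeta\ngamma\n"
child_snapshot = "alpha\nBETA\ngamma\n"


def diff_between(old: str, new: str) -> list[str]:
    """Compute a unified diff on demand from two full snapshots."""
    return list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a/file.txt",  # illustrative labels only
            tofile="b/file.txt",
        )
    )


# The patch is recomputed every time it is asked for.
patch = diff_between(parent_snapshot, child_snapshot)
print("".join(patch), end="")
```

Because the patch is recomputed on every call, the cost grows with the size of the snapshots being compared, which is consistent with the point above about diff and blame taking time on large trees or histories.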
        
             | ako wrote:
             | Not really: if you do a checkout of a snapshot into an
             | empty directory, you expect the entire state at the time of
             | the snapshot, not just the diffs.
        
             | goerz wrote:
             | It may depend on the background of who you're talking to.
             | Programmers may be very comfortable with diffs, but non-
             | programmers (in my case, physics graduate students) usually
             | aren't. On the other hand, everybody is familiar with
             | snapshots: even high school students will end up with
             | "report_v1.docx", "report_v2.docx", etc, which are
             | snapshots at the file level (and work reasonably well as
             | long as you have a consistent scheme and don't need
             | branches). I've also routinely seen less-technical people
             | organize their research / paper writing by making a weekly
             | snapshot of their work folder ("project-2020-04-1").
             | Telling these people that git basically does the same thing
             | for them automatically with a tree-like "labeling scheme"
             | that allows for branches tends to go over quite well, in my
             | experience. For actual programmers, I'd be inclined to
             | give them a more technical introduction to git's internals.
             | I'd still point out that git stores compressed snapshots,
             | not diffs (especially if they're older and may have
             | previous SVN experience).
        
               | hmsimha wrote:
               | Those non-programmers are likely going to have a worse
               | understanding of what is happening when you zip/compress
               | something anyway, but I concede this is probably the most
               | straightforward path if they have some understanding of
               | what a zip is, and can't understand what a diff is. But
               | even then I question if they should be using git, since
               | `git diff`, `git show`, basically everything git exposes,
               | is going to show them diffs.
        
               | sagonar wrote:
               | A storage format made of pure diffs would be impossible to
               | recover if you get an error in any commit. It would also be
               | much slower to examine the data, and newer version control
               | systems do not use pure diffs.
               | 
               | The version control system Mercurial had a description of
               | these problems on its homepage, "behind the scenes", which
               | was good reading.
               | 
               | I am not sure if Git is the best solution, but at least a
               | "pure snapshot" design is workable, whereas a diff storage
               | must in practice include some snapshot logic as well.
        
             | gowld wrote:
             | As a programmer I care about diffs only when I am comparing
             | two versions. A commit creates a new version. "Snapshot" is
             | a distraction.
        
             | goerz wrote:
             | Also (for less-technical audiences), I don't exactly dwell
             | on the de-duplication. It's just "Git makes snapshots and
             | puts them into .git in some efficient way. Don't worry
             | about it. Or, if you want the details, read the Git SCM
             | book."
        
             | bosswipe wrote:
             | The diff mental model doesn't work for things like `git
             | checkout <commit>`.
        
               | hmsimha wrote:
               | I actually haven't had a problem with this, though
               | perhaps it's because I understand what's happening at a
               | deeper level. You're generally referencing commits which
               | exist somewhere in this family of commits you can view
               | with `git log --graph`. You can easily think of checkout
               | as the path of diffs to get there. Files at commits are
               | still whole objects, mentally, but the thing we care
               | about as programmers working with multiple versions are
               | the diffs.
               | 
               | I have had it break down a bit more when working with
               | stash though, because now the object you're referencing
               | can exist outside of that graph-like commit family.
        
         | dfox wrote:
         | If you commit a 100MB file, change a few bytes in it and commit
         | it again, your .git/objects will almost certainly contain two
         | 100MB objects. The fact that it is somewhat likely that running
         | "git gc" or something similar will convert one of them into a
         | reference to the other one and some compact representation of
         | the difference is an implementation detail.
         | 
         | While the commit object does represent the snapshot, it also
         | references the previous state; thus the commit message usually
         | describes what was changed between the referenced snapshot and
         | the parent(s) that are also referenced from the commit object.
         | 
         | As for the overall model and leakage between implementation
         | details and how people use it, an interesting approach is used
         | by SCCS/BitKeeper with its internal "weave" format, which
         | essentially is both snapshot and diff at the same time.
        
         | jedimastert wrote:
         | I prefer to think of a repo as a whole as a tree, where the
         | nodes are snapshots and the edges between nodes are diffs.
         | This sort of lands us in both places.
        
         | cesarb wrote:
         | > From a storage perspective, describing commits as snapshots
         | seems like a bad mental model. Suppose I have a directory that
         | is 100MB in size. If I take a snapshot of it, my snapshot would
         | be 100MB in size. If I take a 2nd snapshot of it tomorrow, my
         | 2nd snapshot would also be 100MB in size. My total storage
         | needs would now be 300MB.
         | 
         | That's not what one would expect. Suppose I have a directory
         | that is 100MB in size. If I take a snapshot of it ("btrfs
         | subvolume snapshot"), my snapshot would be 100MB in size, but
         | the storage needed for the original and the snapshot together
         | would still be 100MB (plus a few kilobytes of overhead). If I
         | take a second snapshot of it tomorrow ("btrfs subvolume
         | snapshot" again), my second snapshot would also be 100MB in
         | size, and my total storage needs would still be 100MB (plus a
         | few kilobytes of overhead).
         | 
         | If I made a change to a small text file before each snapshot,
         | my total storage size would still be barely larger than 100MB.
         | 
         | That is, when creating a snapshot, one would expect it to be
         | copy-on-write. While not exactly what git does (it's a content-
         | addressable storage instead of a copy-on-write storage), the
         | end effect is similar enough for most purposes (the main
         | difference being that undoing a change in git would not need
         | extra storage, while a copy-on-write storage would store a new
         | copy of the contents).
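A minimal sketch of content-addressable storage as described above. It assumes git's default SHA-1 object format, where a blob's ID is the hash of a `blob <size>\0` header followed by the file content. The `ObjectStore` class is a hypothetical toy and leaves out compression, trees, and packfiles.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """ID of a git blob (SHA-1 format): hash of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressable store: identical content is kept only once."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Storing content that is already present adds nothing new.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
v1 = store.put(b"version one\n")
v2 = store.put(b"version two\n")
v3 = store.put(b"version one\n")  # "undoing" the change: same bytes as v1

print(v1 == v3, len(store.objects))
```

The third `put` reuses the object written by the first, which is the difference noted above: reverting to earlier content costs no extra storage in a content-addressed store, while a copy-on-write store would write a new copy.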
        
           | barrkel wrote:
           | Copy on write filesystems describe changes as a structural
           | diff, effectively.
        
             | klodolph wrote:
             | That's not really true. The copy-on-write filesystems just
             | allow multiple files to reference the same blocks, and only
             | allow modifications to blocks if the refcount is 1. At
             | least, at its simplest, that's how copy-on-write works. To
             | copy a file, you copy the block references and increment
             | the reference counts. You won't end up with a diff or
             | deltas stored anywhere.
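The refcount scheme described above can be sketched roughly as follows. `BlockStore` and its methods are hypothetical names for illustration, not the API of any real filesystem, and the model ignores freeing blocks and on-disk layout.

```python
class BlockStore:
    """Toy copy-on-write block store with per-block reference counts."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}   # block id -> data
        self.refcount: dict[int, int] = {}   # block id -> number of references
        self._next_id = 0

    def _alloc(self, data: bytes) -> int:
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refcount[bid] = 1
        return bid

    def create(self, chunks: list[bytes]) -> list[int]:
        """A 'file' is just a list of block references."""
        return [self._alloc(chunk) for chunk in chunks]

    def copy(self, file: list[int]) -> list[int]:
        """Copy a file: copy the references and bump the refcounts."""
        for bid in file:
            self.refcount[bid] += 1
        return list(file)

    def write(self, file: list[int], index: int, data: bytes) -> None:
        bid = file[index]
        if self.refcount[bid] == 1:
            self.blocks[bid] = data          # sole owner: modify in place
        else:
            self.refcount[bid] -= 1          # shared: leave the old block alone
            file[index] = self._alloc(data)  # and point this file at a new one


store = BlockStore()
original = store.create([b"aaaa", b"bbbb", b"cccc"])
snapshot = store.copy(original)
blocks_after_copy = len(store.blocks)        # copying allocated no new blocks

store.write(original, 1, b"BBBB")            # shared block, so a new one is made
blocks_after_write = len(store.blocks)
```

Nothing resembling a diff is stored at any point: the snapshot and the live file each hold a complete list of block references, and they differ only in which block the second entry points to.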
        
           | whack wrote:
           | I've learnt something new today, thanks for sharing. Looks
           | like I had a naive understanding of how snapshotting actually
           | works.
           | 
           | I still think that it's more intuitive to describe commits as
           | diffs, in the context of things like cherry-picking a commit
           | or rebasing/reordering a series of commits.
           | 
           | But given that you can also "check out" a commit, in order to
           | get a specific snapshot of the repo, I can see the parallels
           | between commits and snapshots. Maybe both analogies are
           | equally useful in describing the different features that git
           | provides.
        
             | diroussel wrote:
             | The point of the article is not an analogy. Git is based on
             | snapshots. And diffs are computed from snapshots as needed.
             | 
             | The snapshots are also de-duplicated and compressed, but
             | that is not important.
             | 
             | The article is a good one. And if you spend the time to
             | understand git it gets easier to use.
        
             | breischl wrote:
             | >I still think that it's more intuitive to describe commits
             | as diffs, in the context of things like cherry-picking a
             | commit or rebasing/reordering a series of commits.
             | 
             | If I understood the article correctly, those things
             | actually are implemented via diffs. It's just that the
             | diffs are calculated on-the-fly, used to create a new
             | snapshot, and then discarded.
        
             | AaronFriel wrote:
             | It's helpful to understand git in terms of the "porcelain"
             | and the "plumbing".
             | 
             | The git commands you know and love are largely the
             | porcelain, nice fixtures over other things. When you "git
             | cherry-pick", under the hood what it's actually doing is
             | querying that commit's parent(s), finding the diff the
             | commit introduced relative to its parent(s), and then
             | applies those same changes to the index and your working
             | tree.
             | 
             | Cherry-pick is porcelain on top of the plumbing.
             | 
             | There are a few "write git yourself" tutorials out there,
             | of which "Write yourself a Git!" is I think the most
             | popular. In it, you'll learn how git really stores data,
             | and you'll write a (fairly basic) git client that can do
             | several things to locally manage a repository.
             | 
             | Write yourself a Git!: https://wyag.thb.lt/
        
             | spuz wrote:
             | The correct way to think about snapshots and diffs when it
             | comes to cherry-picking and rebasing is to realise that
             | diffs are always derived from snapshots. I.e. the
             | fundamental data-structure is the snapshot and from those
             | we can build diffs. Those diffs are necessary to implement
             | cherry-picking and rebasing but it's also possible to
             | imagine an implementation of git that has those features
             | missing. It would still fundamentally work in the same way
             | - it would just be slightly less useful.
             | 
             | Edit: If you think this is just splitting hairs, I
             | encourage you to look at the differences between git and
             | pijul which is a VCS where the fundamental building block
             | is diffs: https://pijul.com/
        
           | throwaway894345 wrote:
           | Copy-on-write is an implementation detail that allows for
           | lower storage. The snapshot is still the full copy. One could
           | try to argue that the same is true for git in that diffs (or
           | content addressable storage) are just an internal
           | implementation detail, but as the parent pointed out that's
           | not quite true--our commits document the diff, not the
           | materialized snapshot.
        
           | crazygringo wrote:
           | Clearly people are using two diametrically opposed
           | definitions of snapshot.
           | 
           | If a snapshot is defined as _opposed_ to a diff, then it's
           | clear snapshot means "full copy". If I snapshot the state of
           | my cloud server, it creates a full copy of its disk in block
           | storage somewhere, and takes several minutes to complete.
           | 
           | You are describing snapshots that exist _as part of a diff
           | system_ or copy-on-write system, where they use virtually no
           | storage at all, because further changes are assumed to be
           | applied _as diffs_ rather than overwriting previous data.
           | Where the snapshot is a "marked" diff that can specifically
           | be rewound to, as opposed to a general ongoing stream of
           | diffs.
           | 
           | But that's a more advanced and system-specific definition of
           | snapshot.
           | 
           | As a general mental model, when you say "think of it as a
           | snapshot not a diff", I think it's clear that the former
           | definition is being used, and that the expectation is a full
           | copy that takes up disk space. Because otherwise, in the
           | second case, all the snapshots _are_ just the most recent
           | diff (on top of the entire prior history), so the sentence
           | "think of it as a snapshot not a diff" doesn't really mean
           | anything. The snapshot and the diff are the same.
        
             | klodolph wrote:
             | > If I snapshot the state of my cloud server, it creates a
             | full copy of its disk in block storage somewhere, and takes
             | several minutes to complete.
             | 
             | Which cloud provider are you using? Neither Amazon nor
             | Google take snapshots this way. Amazon EBS and Google
             | Persistent Disk both use copy-on-write semantics for
             | snapshots. If you take a hundred snapshots of a 100 GB
             | disk, your total usage is 100 GB plus metadata. When you
             | run a VM instance from that disk, the storage usage will
             | increase as blocks change, to a maximum of 200 GB total
             | storage (for live disk + out of date snapshot).
             | 
             | When I use QEMU or VirtualBox at home, I also get copy-on-
             | write snapshots of disks, although it's certainly possible
             | to get a full copy if you want. I think the feature is
             | pretty standard.
        
               | crazygringo wrote:
               | Digital Ocean. It absolutely takes snapshots by making a
               | full copy:
               | 
               | https://docs.digitalocean.com/products/images/snapshots/
               | 
               | So this is a perfect example of what I mean by the word
               | "snapshot" being used in two different ways by different
               | people.
               | 
               | Snapshot meaning "full copy" is one usage (Digital
               | Ocean), snapshot meaning "diff checkpoint" is another
               | usage (Google, AWS).
        
               | klodolph wrote:
               | Those aren't different definitions of "snapshot", though.
        
               | crazygringo wrote:
               | Of course they're different. They have different
               | meanings, so they're different definitions.
               | 
               | It's not like it's the same concept with different hidden
               | implementation details.
               | 
               | On Digital Ocean, I can delete the server but I still
               | have the snapshot. On the others, you can't. One copies,
               | the other bookmarks.
               | 
               | They're _entirely_ different concepts, therefore
               | different definitions.
        
         | Ericson2314 wrote:
         | > Obviously both "diffs" and "snapshots" are leaky
         | abstractions.....
         | 
         | Joel Spolsky wrote many great things, but "all abstractions
         | leak" was not one of them (edit: it is his, but not good). I
         | am very tired of programmers excusing their poor imagination
         | with appeals to this nonsense.
         | 
         | ------
         | 
         | Commits store snapshots. Full stop.
         | 
         | The "bad mental model" is not commits being snapshots, but
         | things being stored individually, i.e.
         | 
         | > Sum |things| = |Product things|
         | 
         | This comes up in many other contexts, especially when storage
         | quotas are involved and it's unclear what to do when storage is
         | deduped across quotas.
         | 
         | -----
         | 
         | git packfiles do use a delta encoding, but it's important to
         | understand that there isn't necessarily any correspondence
         | between the history and the delta encoding. In fact, commands
         | like `git repack` exist _precisely_ to avoid path dependency
         | issues from the repacks matching the history too much.
         | 
         | Saying commits are diffs to explain the delta-encoding storage
         | characteristics is wrong and confuses, not clarifies.
         | 
         | ------
         | 
         | > And let's not forget commit messages. If a commit is a
         | snapshot, I would expect the commit-message to be descriptive
         | of the entire snapshot. Whereas if a commit is a diff, I would
         | expect the commit message to be descriptive of the diff. Which
         | is exactly how most people use commit messages.
         | 
         | It's git tree objects that are snapshots; commit objects have
         | a tree child and a previous-commit child, so it is natural for
         | them to describe the relationship between two states without
         | appealing to hypothetical alternatives.
         | 
         | > Not to mention other features the article discussed, such as
         | cherry-picking. What does it even mean to "cherry-pick a
         | snapshot"? In comparison, cherry-picking a diff and applying it
         | to your current state, is far more intuitive.
         | 
         | I might `git checkout somethingelse .` mid-rebase. What does
         | that mean if commits are diffs? Nothing very clear. The better
         | thing to teach people is about darcs and patch theory and those
         | other models. I think the git model and the patch theory model
         | both have uses, and the fact that git makes people always work
         | in the git model is a fundamental issue that cannot be fixed
         | with analogies.
         | 
         | - Patch theory is good for the things you are still working on
         | 
         | - merkle dag of states is good for the things you've already
         | done / agreed upon.
        
           | gowld wrote:
           | > All non-trivial abstractions, to some degree, are leaky.
           | 
           | You look a bit silly making grandiose comments that take one
           | web search to disprove
           | 
           | https://www.joelonsoftware.com/2002/11/11/the-law-of-
           | leaky-a...
           | 
           | > All non-trivial abstractions, to some degree, are leaky.
        
             | heinrich5991 wrote:
             | I think the emphasis was on "great". I.e. your parent
             | wants to say that this thing Joel wrote was not great.
        
             | breischl wrote:
             | I'm fairly certain he was disagreeing with the content of
             | the statement, not that Joel Spolsky wrote it.
             | 
             | ie, yes Spolsky said that, but he was wrong.
        
               | Ericson2314 wrote:
               | Yes, thanks
        
         | LukeShu wrote:
         | _> Suppose I have a directory that is 100MB in size. If I take
         | a snapshot of it, my snapshot would be 100MB in size._
         | 
         | Not with `btrfs subvolume snapshot`, it won't. If that's not a
         | snapshot, I don't know what is.
         | 
         | From a storage perspective, no dammit, Git commits _are
         | snapshots_ , look at the bits on disk if you don't believe it.
         | This isn't something that people who like to write blog posts
         | about Git made up for pedagogical purposes, it's how Git
         | _actually works_.
         | 
         | As you point out, it's wonky for pedagogical purposes; what
         | does it mean to "cherry-pick" a snapshot? When thinking about
         | cherry-picking, yeah, a diff makes more sense than a snapshot.
         | But saying a diff is better pedagogically doesn't change the
         | fact that a commit _actually is_ a snapshot (and when cherry-
         | picking, it diffs two snapshots to create a patch, then applies
         | that patch).
        
           | jgraham wrote:
           | > From a storage perspective, no dammit, Git commits are
           | snapshots, look at the bits on disk if you don't believe it
           | 
           | Except they're not. They're (often) packfiles, which are a
           | delta encoding i.e. a diff. It's not necessarily the same as
           | a specific commit, but appealing to "the bits on disk" is
           | wrong.
           | 
           | It is certainly true that in the git object model each commit
           | object refers to a tree that represents the complete state of
           | the repository at that commit.
           | 
           | It is also true that _many_ git commands implicitly treat a
           | commit as being the diff between the state of the tree in
           | that commit and the state in the parent. For example git
           | show, git rebase and git cherry-pick.
           | 
           | It is simultaneously true that the on-disk storage system is
           | optimised for performance and so doesn't map onto the object
           | model in a trivial way.
        
             | LukeShu wrote:
             | _> They're (often) packfiles, which are a delta encoding
             | i.e. a diff. ...appealing to "the bits on disk" is wrong._
             | 
             | That's fair. The diffs in a packfile have no relation to
             | the "diff" that a commit would be if the commit were a
             | diff; so it's wrong to use "but packfiles" when arguing
             | that commits are diffs and not snapshots; but you're right,
             | packfiles make my "bits on disk" argument not quite right.
             | 
             | The way I look at it is that packfiles are a compression
             | mechanism; and they don't alter the fact that fundamentally
             | it's snapshots that are being compressed. But that's not
             | the only way of looking at it.
        
             | nyanpasu64 wrote:
             | > It is also true that many git commands implicitly treat a
             | commit as being the diff between the state of the tree in
             | that commit and the state in the parent. For example git
             | show, git rebase and git cherry-pick.
             | 
             | A commit is a snapshot, and you can compute the diff
             | between a commit and any of its parents. If a commit has
             | multiple parents, git cherry-pick bails out unless you pick
             | a parent (usually -m 1), and git rebase, I think, implicitly
             | assumes the first parent.
             | 
             | (EDIT: a commit's tree, its parents' trees)
        
               | LukeShu wrote:
               | _> If a commit has multiple parents, ... git rebase, I
               | think implicitly assumes the first parent._
               | 
               | `git rebase`'s behavior regarding merge commits is
               | shockingly complicated, but much of the time: Because by
               | default it linearizes the history, it actually just skips
               | merge commits because it assumes that the merge has
               | already happened implicitly by applying one of the
               | merge's parents on top of the other parent.
        
         | kazinator wrote:
         | > _What does it even mean to "cherry-pick a snapshot"?_
         | 
         | It means to do something like a three-way diff among three
         | snapshots: the cherry-picked baseline, the target, and a common
         | ancestor.
         | 
         | You can do something similar with the diff3 tool, which takes
         | three files (snapshots) as input, not diffs.
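A rough sketch of a three-way merge over snapshots, under loud simplifying assumptions: each snapshot is a plain dict from path to file content, merging is done per whole file instead of per line, and `three_way_merge` is a hypothetical helper, not git's algorithm. For the cherry-pick case the picked commit's parent is used as the base, which is how I understand git to set it up.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge three snapshots file by file; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree (or both deleted the file)
        elif b == o:
            result = t            # only "theirs" changed it: take their version
        elif b == t:
            result = o            # only "ours" changed it: keep our version
        else:
            conflicts.append(path)
            result = o            # both changed it differently: flag, keep ours
        if result is not None:    # None means the file is absent
            merged[path] = result
    return merged, conflicts


# Cherry-pick framed as a merge of three snapshots (all contents hypothetical).
parent_of_picked = {"a.txt": "1", "b.txt": "x"}
picked = {"a.txt": "1", "b.txt": "y"}               # the pick changed b.txt
head = {"a.txt": "2", "b.txt": "x", "c.txt": "new"}  # where we apply it

merged, conflicts = three_way_merge(parent_of_picked, head, picked)
print(merged, conflicts)
```

Only the change the picked commit made relative to its own parent (`b.txt`) carries over, while the changes already on `head` are preserved, even though every input and the output is a full snapshot.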
        
         | Twirrim wrote:
         | > From a storage perspective, describing commits as snapshots
         | seems like a bad mental model. Suppose I have a directory that
         | is 100MB in size. If I take a snapshot of it, my snapshot would
         | be 100MB in size. If I take a 2nd snapshot of it tomorrow, my
         | 2nd snapshot would also be 100MB in size. My total storage
         | needs would now be 300MB.
         | 
         | That's not the way storage snapshots work under most (all?)
         | storage-targeted file systems, filers, etc. What you're talking
         | about there is a backup.
         | 
         | Snapshots are not backups. Snapshots work on a "copy on write"
         | basis.
         | 
         | Roughly speaking, when you take a snapshot you draw a line in
         | the sand. "These were the files at this time". Snapshot
         | operations as a result are super cheap and super fast. Future
         | changes to those files result in the filer/file system writing
         | the modified blocks to new locations, not overwriting the
         | original data.
         | 
         | So take a 100MB directory. I create a snapshot. That results in
         | almost no new storage usage, just a small amount of metadata. I
         | write/modify 10MB of data, now the total storage cost is 110MB.
         | If I take another snapshot after writing that 10MB, it's still
         | only 110MB of storage usage.
        
         | dbt00 wrote:
         | if your filesystem was copy on write and implemented snapshot
         | semantics internally (like WAFL for example, over 20 years old
         | now), then the second snapshot would not take 100MB, it would
         | just cost the metadata.
         | 
         | A commit is a snapshot of a tree with a reference to its prior
         | ancestors. It's important to know that because it becomes
         | extremely relevant when trying to do things like merges
         | properly.
        
         | towergratis wrote:
         | Look up copy-on-write. ZFS and Btrfs do it.
        
         | outworlder wrote:
         | Commits are snapshots.
         | 
         | How to represent those snapshots, and fix the storage bloat a
         | naive implementation would cause, is a completely different
         | problem.
         | 
         | One of the things that makes Git smart is that it doesn't try
         | to optimize things prematurely. SVN and co. would store actual
         | diff data, but this made some operations really hard to
         | implement (and, in many cases, slow).
         | 
         | Git has commits conceptually as snapshots. It's up to the
         | storage code to figure out how to deal with this.
         | 
         | > But I find it far more intuitive and useful to think of
         | commits as "diffs + some metadata".
         | 
         | Except that this is not what's happening. I wouldn't even call
         | it an abstraction, it's how things actually work. What you call
         | abstractions are actually operations. If we run a diff we are
         | interested in the changes, but if you ask git to show you the
         | commit it will show you just that.
         | 
         | If you think a commit is a diff, you have a mismatch between
         | the mental model and what's actually happening behind the
         | scenes. This will make it difficult to understand concepts
         | later on.
        
           | slavik81 wrote:
           | > If we run a diff we are interested in the changes, but if
           | you ask git to show you the commit it will show you just
           | that.
           | 
           | git show <commit SHA-1> will output a diff.
        
             | trulyme wrote:
             | I think this is more a sign that git (porcelain) is not
             | aligned with the underlying model.
             | 
             | It is actually a pity that so little effort went into git
             | UI. I find the OP explanation of git model awesome and the
             | presented concepts beautiful, but the cli utility has
             | countless naming and consistency problems which make me sad
             | that hg didn't win over git. Life would be much simpler for
             | many developers if it did, imho.
        
           | munk-a wrote:
           | > If you think a commit is a diff, you have a mismatch
           | between the mental model and what's actually happening behind
           | the scenes. This will make it difficult to understand
           | concepts later on.
           | 
           | I don't think those concepts are as distinct as you're
           | painting them. At a user-visible level commits will almost
           | always be visualized as diffs, which puts us at a place
           | where, at the highest level and lowest level, they're defined
           | as pretty close to diffs, while at an intermediary level
           | they're defined closer to snapshots.
           | 
           | I honestly think they're neither, each expression method
           | (diff vs. snapshot) can be translated pretty easily and both
           | are trying to represent the same end goal. It can be helpful
           | to know that commits are representative of the full state of
           | the codebase that exists at a time, but that view can be at
           | odds with merging and rebasing which use actual change sets
           | to calculate - when a commit is being manipulated it's
           | helpful to view it as a diff (and git does this) - whereas,
           | when a commit is being read, we're using it as a snapshot.
        
             | mewse wrote:
             | Structure purist, ingredient rebel: A snapshot between two
             | levels of diffs is a sandwich.
        
           | xmprt wrote:
           | One way I like to think about this is that when you rebase a
           | branch, the diffs are the same (barring any conflicts) but
           | the commits are different. Just another reason commits aren't
           | the same as diffs.
        
             | klodolph wrote:
             | The diffs are often different, even without conflicts. Try
             | comparing them some time, and look closely at the diff...
             | look at the lines starting with @. People usually ignore
             | those lines but "patch" does NOT.
             | 
             | This is not an irrelevant detail, but it's the result of a
             | three-way merge. The three-way merge can update those @
             | lines if it has a complete set of inputs (all three
             | inputs). If you make a patch from one branch and then
             | apply it to a different branch without using the three-way
             | merge algorithm (stripping the diff of all its context),
             | the patch may fail to apply even if the three-way merge
             | succeeded without conflicts.
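
The point about the `@` lines can be reproduced without git at all. The following is a minimal sketch, assuming Python's standard `difflib` as a stand-in for git's diff machinery; the file contents and the names `old_parent` and `new_parent` are made up for illustration. The same one-line change is diffed against two different parents, as would happen before and after a rebase.

```python
import difflib

def unified(before, after):
    """Return the unified diff of two line lists as a list of strings."""
    return list(difflib.unified_diff(before, after, "a/f.txt", "b/f.txt", lineterm=""))

def hunk_headers(diff):
    """The lines starting with @, which record where each hunk applies."""
    return [line for line in diff if line.startswith("@@")]

def changed_lines(diff):
    """Only the added and removed lines, without the file headers."""
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

# The commit as originally written: it changes "beta" to "BETA".
old_parent = ["alpha", "beta", "gamma"]
old_commit = ["alpha", "BETA", "gamma"]

# After a rebase, the new parent has three extra lines at the top.
new_parent = ["one", "two", "three"] + old_parent
new_commit = ["one", "two", "three"] + old_commit

diff_before = unified(old_parent, old_commit)
diff_after = unified(new_parent, new_commit)

print(hunk_headers(diff_before))
print(hunk_headers(diff_after))
print(changed_lines(diff_before) == changed_lines(diff_after))
```

The added and removed lines are identical in both diffs, but the hunk headers and surrounding context are not, which is why the two patches are not interchangeable as text.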
        
           | hibbelig wrote:
           | > _If you think a commit is a diff, you have a mismatch
           | between the mental model and what's actually happening
           | behind the scenes. This will make it difficult to understand
           | concepts later on._
           | 
           | I find that thinking of commits as snapshots is not so
           | useful. I prefer to think of them as a pair of parent commit
           | and diff.
           | 
           | With that in mind, things like rebase become obvious: Take
           | the same diff and attempt to apply it to a different parent.
           | 
           | It's not clear to me how thinking of commits as snapshots
           | helps me to explain operations such as rebase.
           | 
           | I do concede, however, that "git cat" (I think that's the
           | command) seems more closely related to a snapshot: you
           | identify a commit and a file, and it will give you the
           | content of that file at that commit. Clearly in this case the
           | concept of a snapshot works well. But I need this very
           | rarely.
        
             | tlb wrote:
             | Rebase doesn't work that way, though [0]. It first extracts
             | the 3 versions (2 leafs and their common ancestor) and then
             | does a diff & patch.
             | 
             | This allows git to store the deltas between versions in the
             | most efficient way on disk, while also letting it use
             | contextual diffs to minimize the chance of spurious merge
             | conflicts. Patching algorithms have various heuristics that
             | make sense for programming languages, like special
             | treatment for lines with only changes in whitespace.
             | 
             | (Edited to add:) also, minimal diff algorithms have to do a
             | lot of work to detect large blocks of text being moved
             | around. This is part of what made Subversion, which used
             | the same diff algorithm for storage compression and
             | merging, painfully slow.
             | 
             | [0] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
        
               | hibbelig wrote:
               | Here is the paragraph that describes what rebase does:
               | 
               | > _This operation works by going to the common ancestor
               | of the two branches (the one you're on and the one you're
               | rebasing onto), getting the diff introduced by each
               | commit of the branch you're on, saving those diffs to
               | temporary files, resetting the current branch to the same
               | commit as the branch you are rebasing onto, and finally
               | applying each change in turn._
               | 
               | Is "applying the diff to a different parent" not a good
               | way to describe this?
        
               | tlb wrote:
               | You're using the word 'diff' for 2 different things:
               | 
               | - an efficient way to store 2 very similar files
               | 
               | - the minimal set of changes made by a programmer to a
               | file.
               | 
               | Subversion uses the same diff algorithm for these 2
               | functions, which is why people conflate them. But git
               | uses different algorithms. The first one (which it calls
               | deltas) is optimized for speed and compression ratio.
               | The second set of algorithms (you can choose from a few,
               | some of which are better at identifying rearrangements of
               | large blocks of text) are optimized for merging 2
               | programmers' changes without conflicts.
        
             | haberman wrote:
             | > With that in mind, things like rebase become obvious:
             | Take the same diff and attempt to apply it to a different
             | parent.
             | 
             | You can think of it that way if you want. But it's not what
             | Git actually does.
             | 
             | Personally I much prefer to have my mental model match the
             | actual reality of things.
             | 
             | You may not use "git cat" very often, but what about "git
             | checkout <SHA>"? If commits were stored as diffs, then Git
             | would have to rebuild a tree of the very first commit, then
             | replay every single diff up to the SHA you asked for.
             | 
             | What it does in actuality is find the snapshot of that SHA
             | and change the working tree to match it.
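
The cost difference described above can be sketched with a toy model. This is not git's storage format: trees are plain dicts, commit ids like `c3` are invented, and `make_diff` is a hypothetical helper. It only illustrates why a snapshot lookup stays cheap while replaying diffs grows with the length of the history.

```python
# Toy history: each version of the project is a dict {path: content}.
history = [
    {"README": "v1"},
    {"README": "v1", "main.py": "print(1)"},
    {"README": "v2", "main.py": "print(1)"},
    {"README": "v2"},
]

# Snapshot model: every commit id maps straight to a full tree.
snapshots = {f"c{i}": tree for i, tree in enumerate(history)}

def checkout_snapshot(commit_id):
    """One lookup, no matter how long the history is."""
    return dict(snapshots[commit_id])

def make_diff(old, new):
    """Describe a change as (paths to set, paths to delete)."""
    set_paths = {p: c for p, c in new.items() if old.get(p) != c}
    deleted = [p for p in old if p not in new]
    return set_paths, deleted

# Diff model: every commit stores only the change from its parent.
diffs = []
previous = {}
for tree in history:
    diffs.append(make_diff(previous, tree))
    previous = tree

def checkout_by_replay(index):
    """Rebuild the tree by replaying every diff from the first commit."""
    tree, steps = {}, 0
    for set_paths, deleted in diffs[: index + 1]:
        tree.update(set_paths)
        for path in deleted:
            del tree[path]
        steps += 1
    return tree, steps

tree, steps = checkout_by_replay(3)
print(tree == checkout_snapshot("c3"), steps)
```

Both routes end at the same tree, so the two models agree on results; they differ in how much work each operation takes.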
        
               | hibbelig wrote:
               | > _You may not use "git cat" very often, but what about
               | "git checkout <SHA>"? If commits were stored as diffs,
               | then Git would have to rebuild a tree of the very first
               | commit, then replay every single diff up to the SHA you
               | asked for._
               | 
               | Yes, this is true. I don't know why it never bothers me.
               | Maybe it's because you could also store the diffs in the
               | opposite direction (i.e. store the tip of each branch in
               | the clear, then store diffs from each commit to its
               | parent). Computing the inverse of a diff should be a
               | quick operation. Usually, when you check out something,
               | it's the tip of a branch or near the tip of a branch.
               | 
               | Anyway.
               | 
               | Of course I know that storing trees makes it easy to
               | compute diffs. Computing diffs will become slower with
               | larger trees. On the other hand, storing diffs makes it
               | slow to compute trees, and the more commits we've got,
               | the slower the tree computation goes.
        
               | kevincox wrote:
               | > Computing diffs will becomes slower with larger trees
               | 
               | Not usually. Computing a diff is roughly O(n) with the
               | size of a diff. This is because unchanged leaves of the
               | tree can be seen as identical (because they are content
               | addressed) and are skipped. So to compute the diff you
               | only need to recurse into changed directories.
               | 
               | So having a million files in the root directory and one
               | has changed is very fast to diff as you just diff that
               | one file. The worst case is the diff happening in a very
               | deeply nested directory with lots of files in each of the
               | subdirectories but even that is quite cheap as diffing a
               | sorted directory listing is O(n) with the size of the
               | listing.
               | 
               | (The actual worst case is diffing large files as most
               | text diff algorithms are worse than O(n))
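
The skipping of unchanged subtrees can be sketched as follows. This is a simplified model, not git's real tree format (which also records file modes and uses a binary encoding); `blob`, `tree`, `diff`, and the `visited` list are hypothetical names used only to count how many nodes the walk touches.

```python
import hashlib

def blob(content):
    """A file node identified by a hash of its content."""
    return {"id": hashlib.sha1(b"blob:" + content.encode()).hexdigest(), "entries": None}

def tree(entries):
    """A directory node identified by a hash of its sorted listing."""
    listing = "".join(f"{name} {node['id']}\n" for name, node in sorted(entries.items()))
    return {"id": hashlib.sha1(b"tree:" + listing.encode()).hexdigest(), "entries": entries}

visited = []  # every path the walk looks at, for illustration only

def diff(old, new, path=""):
    """Yield changed paths, skipping any subtree whose ids are equal."""
    visited.append(path or "/")
    if old is not None and new is not None and old["id"] == new["id"]:
        return  # identical hash: nothing below this point can differ
    old_is_dir = old is not None and old["entries"] is not None
    new_is_dir = new is not None and new["entries"] is not None
    if old_is_dir and new_is_dir:
        for name in sorted(set(old["entries"]) | set(new["entries"])):
            yield from diff(old["entries"].get(name), new["entries"].get(name),
                            f"{path}/{name}")
    else:
        # Simplification: a file/directory swap is reported as one changed path.
        yield path

big_dir = tree({f"file{i}": blob(str(i)) for i in range(1000)})
old_root = tree({"vendor": big_dir, "src": tree({"main.py": blob("print(1)")})})
new_root = tree({"vendor": big_dir, "src": tree({"main.py": blob("print(2)")})})

changed = list(diff(old_root, new_root))
print(changed, len(visited))
```

The thousand files under `vendor` are never visited individually, because the directory's id matches on both sides.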
        
               | yxhuvud wrote:
               | > If commits were stored as diffs, then Git would have to
               | rebuild a tree of the very first commit, then replay
               | every single diff
               | 
               | Well, it would usually be more efficient to figure out
               | where the currently checked-out branch differs from the
               | branch being checked out, and then unapply and apply
               | diffs as needed.
        
               | wazari972 wrote:
               | what about "git cherry-pick <commit>"?
               | 
               | with this command you don't import a snapshot, but only
               | the diff between <commit~>..<commit>, so the model
               | parent+diff makes sense to me
        
               | taberiand wrote:
               | If git did rebuild the graph, right from the very first
               | commit, the end result of the operation would look
               | identical to the user as it does now.
               | 
               | It seems to me the two mental models are interchangeable
               | when it comes to the use of git from the user's point of
               | view. What is missing, from the user's point of view, when
               | they model commits as diffs+parents vs as snapshots?
               | 
               | Now I think about it, it's probably that users have a bad
               | understanding of the commit-as-diff models; they could
               | similarly have a bad understanding of the commit-as-
               | snapshot model I expect, I don't know that thinking in
               | snapshots helps to understand git from a user's point of
               | view better than thinking (properly) in diffs.
               | 
               | The article for example explains that any two commits can
               | be differenced because the underlying snapshot trees can
               | be compared, but the commit-as-diff model can as easily
               | explain why comparing two commits works by tracing each
               | commit back to the common base commit - so the commit-as-
               | diff mental model just needs to remember that commits are
               | fundamentally tied to the path they have back to the root
               | commit.
               | 
               | It seems to me if you take the diagrams from the article
               | and remove the under-the-covers stuff leaving just the
               | circles, the commits-as-diffs and commits-as-snapshots
               | models look exactly the same.
        
               | JoshuaDavid wrote:
               | Merge commits are a bit hard to understand from the
               | perspective of "a commit is basically just a parent
               | commit plus diff".
               | 
               | On the flip side, cherry-picking is hard to understand
               | from the perspective of "a commit is basically just a
               | snapshot, nothing more" (it's _also_ weird from the
               | parent-commit-plus-diff perspective -- cherry-pick is
               | kind of a weird operation, but useful enough that we keep
               | it anyway despite it not fitting quite as cleanly into
               | the git model as other operations).
               | 
               | Outside those edge cases, though, people with "snapshot"
               | and "parent + diff" mental models will make basically
               | identical predictions about what the results of various
               | operations with git will be.
        
               | haberman wrote:
               | > What is missing, from the users point of view, when
               | they model commits as diffs+parents vs as snapshots?
               | 
               | With the wrong mental model it's harder to predict what
               | operations are expensive. If "git checkout <SHA>" truly
               | did have to replay all diffs from the beginning of time,
               | it would be a very expensive operation that is best
               | avoided unless you absolutely need it. But in practice it
               | is a very fast operation (one of the fastest) that there
               | is no need to shy away from.
        
             | [deleted]
        
             | [deleted]
        
             | klodolph wrote:
             | The way you try to apply a diff to a different parent is by
             | doing a three-way merge... the vast majority of tools do
             | this by taking three files as arguments and producing a
             | fourth as output. The three-way merge is the underlying
             | process which makes merge, rebase, cherry-pick, and revert
             | work. They are all just "three-way merge, shuffle the
             | arguments around, and adjust metadata".
             | 
             | The parent + diff storage is not isomorphic to snapshot
             | storage. Snapshot storage reflects the actual usage of VCS
             | tools... people make changes, and record the final state.
             | Parent + diff does not do this, it records the changes,
             | which requires creating a diff, and there are multiple ways
             | to create a diff between two snapshots.
             | 
             | Git postpones the "which diff is correct" question until
             | you actually care about the answer.
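
The "three-way merge, shuffle the arguments around" description can be sketched at whole-file granularity. This is a deliberately coarse model: snapshots are dicts of path to content, and a file changed on both sides is simply reported as a conflict, whereas real merge tools go on to merge line by line. The function names are illustrative, not git internals.

```python
def merge3(base, ours, theirs):
    """Three-way merge of snapshots ({path: content}), one whole file at a time."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or both deleted the file)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts

def cherry_pick(head, commit, parent_of_commit):
    """Bring the change parent -> commit onto head."""
    return merge3(base=parent_of_commit, ours=head, theirs=commit)

def revert(head, commit, parent_of_commit):
    """Undo the change parent -> commit on head: same merge, arguments swapped."""
    return merge3(base=commit, ours=head, theirs=parent_of_commit)

parent = {"a": "1", "b": "1"}
commit = {"a": "2", "b": "1"}            # the commit changes file a
head = {"a": "1", "b": "9", "c": "new"}  # an unrelated line of development

picked, conflicts = cherry_pick(head, commit, parent)
reverted, _ = revert(picked, commit, parent)
print(picked, conflicts, reverted == head)
```

Only snapshots go in and a snapshot comes out; no stored diff is needed, and a rebase in this model is a sequence of such cherry-picks.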
        
       | iudqnolq wrote:
       | Why have so many people written long thoughtful explanations
       | about how the author is wrong to suggest snapshots are a better
       | mental model, and that you think all abstractions are leaky, but
       | you find diffs a better mental model?
       | 
       | The entire article is literally about how commits are literally
       | snapshots. I would say people didn't read TFA, but a lot of
       | people are quoting lines from TFA and then going on to argue
       | with/expand on them in a way that is directly contradicted by the
       | next few lines.
       | 
       | I think it's because most of the people here have spent years
       | working with git, and are so deeply attached to their
       | understanding that they didn't hear most of what the article
       | said.
       | 
       | (Some commentators have pointed out specific oversimplifications
       | the author makes like glossing over pack files, I'm referring to
       | the people who say a git blob is a diff when the entire point of
       | TFA is that it isn't)
        
         | mekkkkkk wrote:
         | What does TFA mean in this context?
        
           | dekerta wrote:
           | TFA = The F**ing Article
        
             | mekkkkkk wrote:
             | Thanks. I had a hunch. I'm familiar with "RTFM", but would
             | probably get equally confused if "TFM" was used as a noun.
        
               | hunter2_ wrote:
               | I suspect the chronology is something like RTFM -> TFM ->
               | RTFA -> TFA, but the second and third might be switched.
               | Dropping the R does introduce obscurity, but being able
               | to convey the underlying sentiment (that while the
               | content could/should have been consulted, it seems as
               | though it was not) without a verb allows for a
               | nonconfrontational syntax similar to passive voice, but
               | even more so, and often without obvious "weasel" effect,
               | to boot!
        
             | nayuki wrote:
             | And I believe this slang came from Slashdot, which is like
             | the Hacker News forum in the decade before Hacker News
        
           | doublerabbit wrote:
           | The same context as RTFM
        
           | iudqnolq wrote:
           | it's an abbreviation to refer to the article being discussed
           | on a site like this.
        
         | Agingcoder wrote:
         | Agreed, commits are snapshots, whether we like it or not. For
         | obvious storage efficiency reasons, the implementation then
         | diffs/packs/etc, but this is a different issue altogether.
         | 
         | I have found that I can't work with git with a different mental
         | model (diffs). Every time things get messy, the diff model is
         | not enough, whereas snapshots + commit graph + names/pointers
         | make things natural.
         | 
         | Interestingly enough, when migrating people from svn to git,
         | explaining the actual model makes the transition much smoother,
         | so it would seem I'm not the only one.
        
         | koolba wrote:
         | > Why have so many people written long thoughtful explanations
         | about how the author is wrong to suggest snapshots are a better
         | mental model, and that you think all abstractions are leaky,
         | but you find diffs a better mental model?
         | 
         | Once you remember (learn?) that a commit can have N parents, it
         | becomes apparent that it cannot be a single diff.
        
         | nightpool wrote:
         | > Why have so many people written long thoughtful explanations
         | about how the author is wrong to suggest snapshots are a better
         | mental model, and that you think all abstractions are leaky,
         | but you find diffs a better mental model?
         | 
         | Probably because, to take their words at face value, they find
         | diffs a better mental model? I think impugning "people [...]
         | are so deeply attached to their understanding that they didn't
         | hear most of what the article said" is a real bad faith
         | reading, especially when you even acknowledge that central to
         | people's arguments is "all mental models are leaky". This
         | article may be technically correct about the way git internals
         | are structured, but it makes cherry-pick and rebase _more_
         | mentally complex for users to understand (you first have to go
         | from commit => patch), not _less_.
         | 
         | Saying "Commits are collections of files + a parent commit, but
         | you can diff it to generate a patch" and saying "Commits are a
         | patch + a parent commit, and you can apply it to generate a
         | collection of files" are isomorphic mental models--the fact
         | that #1 is "correct" (for some value of correct that doesn't
         | include the actual files stored on disk) is really beside the
         | point.
        
           | iudqnolq wrote:
           | My point is that people criticizing TFA's proposed mental
           | model are missing the fact that TFA doesn't propose a mental
           | model, it explains how things work. Both have value, but
           | they're distinct.
        
             | nightpool wrote:
             | I disagree. TFA is _explaining the mental model Git uses to
             | structure their codebase_. If you're writing code for Git,
             | this is obviously very useful to understand, but if you're
             | just using it, this is only one of several mental models
             | available to you. In this case, I think it's right to say
             | that the distinction the author is attempting to draw is
             | immaterial to those not working on the Git codebase.
        
               | iudqnolq wrote:
               | If your code is written in a certain way that's a model,
               | not a mental model.
        
               | iainmerrick wrote:
               | Yes! It just seems so strange not to care about how
               | things _actually are_ in software. Is it a way of coping
               | with the fact that so much software is so deeply layered
               | and complex now?
               | 
               | Maybe I'm misremembering, but I feel like I didn't see
               | this usage of "mental model" much until fairly recently.
               | The first I recall being surprised at was a discussion of
               | a "mental model of Javascript" -- why would you need a
               | mental model of something with a very detailed spec and
               | multiple compatible implementations to study? If you want
               | to understand how some aspect works, just look up how it
               | actually does work.
        
         | smallnamespace wrote:
         | People are disagreeing with the author, not because they didn't
         | necessarily read the article, but because _they don't agree
         | about how things should be defined_.
         | 
         | At the root, this is a disagreement about semantics and
         | philosophy, not about git itself. I'm going to refer to
         | Aristotle here: _we think we have knowledge of a thing only
         | when we have grasped its cause_ , and there are four general
         | 'causes' [1]:
         | 
         | - The material cause: 'What is it made of?'
         | 
         | - The formal cause: 'What is the _ideal_ of this thing?',
         | e.g. what's its abstract nature?
         | 
         | - The efficient cause: 'How did this thing come to be?'
         | 
         | - The final cause: 'What is its purpose?' How is it actually
         | used? What role does it play in the world?
         | 
         | Here we can see that commits are used (at least in the git
         | internals) as 'snapshots' -- they refer to bytes, not changes
         | in bytes. That's pretty close to the formal and efficient
         | causes -- the abstraction inside of git is closest to a
         | snapshot, and that comes from the history of what Linus wanted
         | when he wrote it.
         | 
         | But! The underlying storage uses deltas (which are diffs) to
         | save space. That's the material cause.
         | 
         | But also, when we actually _use_ commits, git often creates
         | diffs for us as a convenience (cherry-picking, rebasing), and
         | hides the fact that they're snapshots under the hood (final
         | cause).
         | 
         | So there's an inherent tension between the different ways to
         | answer 'what is a thing?'. For commits, this is especially bad,
         | since there's an even split between 'causes'.
         | 
         | This tension never goes away because the most useful definition
         | really depends on the context.
         | 
         | [1] https://plato.stanford.edu/entries/aristotle-causality/#FouC...
        
           | cesarb wrote:
           | > The underlying storage uses deltas (which are diffs) to
           | save space.
           | 
           | Not necessarily! The base git storage stores each object
           | individually, not as deltas ("disk space is cheap"); it's
           | only after a "git gc" that they are stored as deltas to other
           | (potentially unrelated) objects. The original implementation
           | of git didn't even have the delta storage (pack files), it
           | was added later as an optional optimization.
           | 
           | So answering "what is it made of?" with "deltas" comes with
           | a huge caveat, that it's often partially or completely
           | untrue.
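
To make "each object stored individually" concrete, here is a minimal sketch of how a loose blob is identified and stored, assuming a repository that uses the default SHA-1 object format. The helper names `blob_id` and `loose_object_bytes` are made up; the header layout (`blob`, a space, the byte length, a NUL byte) and zlib compression are the documented loose-object format.

```python
import hashlib
import zlib

def blob_id(content: bytes) -> str:
    """Object id of a blob: SHA-1 over a small header plus the full content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_bytes(content: bytes) -> bytes:
    """What is written to disk for a loose blob: header + content, zlib-compressed."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return zlib.compress(header + content)

v1 = b"hello\n"
v2 = b"hello, world\n"

# Two versions of a file are two independent objects, each stored whole.
print(blob_id(v1) != blob_id(v2))
print(zlib.decompress(loose_object_bytes(v2)))
```

Nothing in either object refers to the other version, so there is no delta until a later repack chooses to create one.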
        
           | haberman wrote:
           | > But! The underlying storage uses deltas (which are diffs)
           | to save space. That's the material cause.
           | 
           | This does not make the "commits are stored as diffs" story
           | much more true:
           | 
           | 1. This is only true of pack files, but pack files are only
           | created once the repository exceeds a certain size.
           | 
           | 2. Nothing about the pack file format requires that deltas
           | follow the chronology of commits at all. The deltas could be
           | stored in reverse order or even random order compared to the
           | chain of commits.
           | 
           | 3. The deltas in a pack file do not correspond to a change in
           | a given commit, they are just the data to create a particular
           | snapshot. If you find that a commit's file blob is stored in
           | a pack file as a delta, that does not tell you anything about
           | whether the file changed in _that particular commit_. You
           | have to look at two commits and diff them to determine which
           | files actually changed.
           | 
           | If a person wants to think about version control in an
           | abstract way, then yes the two views (commits vs diffs) are
           | somewhat interchangeable. If a person wants to understand
           | what actually happens when you run a Git command, the answer
           | to that question is less open to interpretation.
        
           | efaref wrote:
           | The true zen of source control is that they are _both_.
        
           | iudqnolq wrote:
           | This is exactly what I'm talking about. A person posts "this
           | is literally how this works", and someone replies
           | "philosophically I would prefer to think it works
           | differently, therefore you're wrong".
        
       | ndand wrote:
       | I used to think of commits as snapshots, but it was confusing.
       | Then I read "Git Internals".
       | 
       | A commit contains the "whole" content of each file that we've
       | committed. But since a commit has a pointer to a root tree, it
       | also represents a working directory. Even though a commit
       | contains "whole" files, git internally stores only parts of the
       | files as an optimization.
       | 
       | When we diff two commits, we see the difference of the file
       | contents in the corresponding working directories that the
       | commits represent.
        
       | dmuth wrote:
       | If anyone does want to get more into the internals of Git without
       | playing with a production repo, I built a "playground" awhile ago
       | which creates a simple Git repo of synthetic commits which you
       | can then play around with:
       | 
       | https://github.com/dmuth/git-rebase-i-playground
       | 
       | I know it says "rebase -i", which is originally what I built it
       | for (and what the exercises in the README are for), but you can
       | really do whatever you want in it, and blow away/rebuild the repo
       | with the included script.
       | 
       | Enjoy!
        
       | d_tr wrote:
       | My first tutorial was the Pro Git book, and this fact was
       | stressed well there so it stuck. Thinking of commits as snapshots
       | also has the small advantage of making the first commit less
       | special.
        
       | Tomminn wrote:
       | It strikes me as bizarre that something as old and as important
       | as git is to the general version control problem, doesn't have a
       | beautiful, complete and helpful user interface.
       | 
       | With the status quo how it is, I definitely love articles like
       | this because every time I use git I get a kind of anxiety that
       | fades only in proportion to the depth with which I understand
       | actual git mechanics.
       | 
       | The thing I find strange is that when I interact with databases
       | that have beautiful, helpful user interfaces, I have almost none
       | of this anxiety, and just kind of accept "black box that handles
       | things", and move on with my life.
       | 
       | I figure I must not be alone in this psychological niche. Which
       | again, makes it bizarre that the problem of giving git a
       | beautiful, complete, helpful front end has not been solved.
        
         | motoboi wrote:
         | > It strikes me as bizarre that something as old and as
         | important as git is to the general version control problem,
         | doesn't have a beautiful, complete and helpful user interface.
         | 
         | It has several.
         | 
         | Tower is a wonderful interface on macOS, Sublime Merge too.
         | 
         | GitHub is another, and GitLab is also very good. Gog is a free
         | as in beer option too.
         | 
         | There are several. None has dominated the market, though.
        
         | marcodave wrote:
         | Would you like to talk about our lord and saviour IntelliJ ?
        
         | nwatson wrote:
         | I like SourceTree from Atlassian, I dip into the command-line
         | from time to time but it meets many needs.
         | 
         | Only problem is, no Linux version, only macOS and Windows. But
         | that's now solved with WSL2 ... code in Linux/Docker/PyCharm
         | etc on Windows WSL2, SourceTree on Windows.
        
         | adamnew123456 wrote:
         | I guess I'll be the one to make the obligatory "magit is
         | awesome and if you use Emacs you should definitely check it
         | out" comment.
         | 
         | Other than being horribly slow on Windows I can't think of any
         | downsides. Aside from the very rare black magic incantations it
         | does everything I've needed from a Git frontend.
         | 
         | If something like it existed for SVN ($JOB VCS of choice,
         | sadly) I would abandon Tortoise in a heartbeat. IntelliJ is
         | nice but the overhead of the VCS add-ons kills my startup time.
        
         | SCLeo wrote:
         | I can't agree with you more. git commands are definitely not
         | designed for the current mainstream usage (i.e. with services
         | like GitHub/GitLab). Simple things like forking a repo from
         | another user and editing it locally take >10 non-straightforward
         | steps, which is far from ideal.
        
           | mekkkkkk wrote:
           | There are so many tools to help with this though? If you want
           | to work with Github, there is an official Github CLI tool
           | that makes forking easy peasy. Gitlab doesn't have an
           | official one AFAIK, but there are unofficial ones. And if you
           | want GUI there's a myriad of those as well. I don't
           | understand this complaint at all.
        
       | rhabarba wrote:
       | Darcs users disagree.
        
       | siawyoung wrote:
       | Commits are conceptually snapshots, and everything else Git does
       | is just an optimization over the naive "keep all versions of all
       | files ever" (imagine implementing a version of Git that is just
       | zipping the entire folder). Diffs are isomorphic to commits and
       | are generated as needed.
       | 
       | I wrote about it (albeit imprecisely) here:
       | https://siawyoung.com/git-intuition
        
       | samatman wrote:
       | This blog post is the most compelling argument I've yet seen for
       | pijul.
       | 
       | Git _should work the way we think it does_! It's confusing that
       | snapshots are being converted into a few different forms of
       | change object, which can be reconciled with merges or rebases or
       | applying patches.
       | 
       | Pijul (and darcs before it) actually works on the basis of
       | patches, pijul with a robust theory of patches. A cherry-pick
       | just moves a patch from one history-of-patches (branch) to
       | another history-of-patches. One can share just a patch, and
       | applying it is guaranteed to be the same action everywhere if
       | that's possible, which it often is.
       | 
       | I'm patiently waiting for pijul to be mature enough that I can
       | move everything over to using it, it's one of the more exciting
       | projects in the last ten years.
        
         | jayd16 wrote:
         | Can I make a shallow clone in Pijul?
        
         | volta83 wrote:
         | > I'm patiently waiting for pijul to be mature enough that I
         | can move everything over to using it
         | 
         | Pijul is super slow. I've tried it a couple of times, and it is
         | too slow to be usable.
        
         | smichel17 wrote:
         | I don't view git as a series of diffs. I view it as a logical
         | extension of my file system to include a time dimension (or in
         | fewer words, as snapshots).
         | 
         | It replaces file-v1, file-v2, file-v2-with-changes-from-Alex,
         | etc, that you commonly find on the hard drives of people not
         | familiar with version control. That it can generate meaningful
         | diffs is a product of the type of data we're storing.
        
         | ausbin wrote:
         | > Git should work the way we think it does!
         | 
         | Hold on, who is "we"? Personally speaking, git works the way I
         | think it does. Granted, I've written my own (simple) libgit2
         | frontend, so I understand the git internals fairly well, on a
         | high level at least.
         | 
         | I haven't looked into pijul, but why is teaching people a new
         | tool more helpful than teaching people how the tool they
         | already use works? (Like the OP blog post does.)
         | 
         | Am I blinded by the knowledge I gained from writing my little
         | tool and learning about git internals? I get that a tool you
         | need to learn the internals of to use is probably a bad tool,
         | but is asking git users to understand the contents of the OP
         | blog post really too much? Maybe I'm just a git fanboy...
        
           | notdonspaulding wrote:
           | >> Git should work the way we think it does!
           | 
           | > Hold on, who is "we"?
           | 
           | I'm not the GP, but I agree that git should work the way "we"
           | think it does, and I think a reasonable definition of "we" in
           | the context of Git Users is probably SaaS/Startup/SMB
           | software engineers.
           | 
           | Git is popular enough to have many thousands of different use
           | cases, but I would speculate that the distribution of use
           | cases probably follows the distribution of public
           | Github/Gitlab repos pretty closely.
           | 
           | > Personally speaking, git works the way I think it does.
           | Granted, I've written my own (simple) libgit2 frontend,
           | ...snip...
           | 
           | > Am I blinded by the knowledge I gained from writing my
           | little tool and learning about git internals?
           | 
           | Yes.
           | 
           | > I get that a tool you need to learn the internals of to use
           | is probably a bad tool, but is asking git users to understand
           | the contents of the OP blog post really too much?
           | 
           | Yes. Or rather, knowing git's internals is incredibly helpful
           | if you've already decided to use git and now you're deciding
           | _what workflow to use to develop software_, because you can
           | match your mental model of how to use git to the way git
           | naturally wants to represent your stored work.
           | 
           | However, if you come to git with an existing mental model of
           | software development, and that existing mental model includes
           | the idea of "branches" or "diffs" or "immutable history",
           | then you're going to quickly and repeatedly run into
           | stumbling blocks as your mental model doesn't match git's
           | internal model. Git can _do_ branches and diffs and immutable
           | history, of course, but they're a leaky abstraction on top
           | of the concepts git really cares about.
           | 
           | > Maybe I'm just a git fanboy...
           | 
           | Sure, nothing wrong with that!
        
         | klodolph wrote:
         | > Git _should work the way we think it does!_
         | 
         | I think it works using snapshots... or are you saying that Git
         | should work the way that _you_ think it does, and not how _I_
         | think it does?
         | 
         | It's clear that Git is not the final evolution of version
         | control systems, that we are just currently in the "Git era"
         | and at some point we're going to be in the "post-Git era" of
         | VCS. It's unclear what that looks like, but I am skeptical when
         | I hear these claims about Pijul.
         | 
         | > One can share just a patch, and applying it is guaranteed to
         | be the same action everywhere if that's possible, which it
         | often is.
         | 
         | My understanding is that you need to define a very weak version
         | of "same version everywhere" which is useless. With Git, you
         | can merge and get no conflicts, but that is no guarantee that
         | the patch applied successfully... it just means that the merge
         | operation didn't run into any obstacles. It's not just the
         | patch that needs to be vetted by humans, it's the _state_ which
         | must also be vetted, and that's one of the problems that Git
         | solves well.
        
         | cmeacham98 wrote:
         | No idea what Pijul is, but how does this not describe git?
         | 
         | Unless your complaint is that a commit is really a set of
         | diffs/patches?
        
           | chriswarbo wrote:
           | Pijul (and Darcs) operate on sets of patches. As a simple
           | example, git commits have at least one 'parent', which
           | imposes an order, e.g. let's say I edit file X in commit x
           | and file Y in commit y; if I want both of those changes, git
           | forces me to apply them in a particular order, e.g. [x, y].
           | If someone else applied those same two commits in a different
           | order, they'll get a different commit ID, which may cause
           | problems e.g. when trying to merge their changes with ours.
           | 
           | If we treat x and y as (sets of) patches instead, then the
           | set {x, y} is the same as the set {y, x}; the order doesn't
           | matter (we say those patches _commute_).
           | 
           | The idea of commuting patches is _really_ useful, since we
           | can rearrange patches to a more convenient form. For example,
           | if we commit something we shouldn't (like a password, or a
           | huge binary), then later remove it, a system like git makes
           | it hard to remove that file from the history. If we're
           | dealing with sets of patches, we can simply swap them around
           | until the 'add file' and 'remove file' patches are next to
           | each other, then merge those two patches. Voila, the file no
           | longer appears, the rest of the history remains intact, the
           | branch's content is guaranteed to remain unchanged (since we
           | only swapped commuting patches, which doesn't change
           | anything; and merged two patches, which doesn't change
           | anything).
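           | 
           | A toy sketch of the ordering point, not Git's or Pijul's
           | real format: `commit_id` is a hypothetical stand-in that
           | hashes a parent ID together with a change, the way a Git
           | commit ID covers its parent. The same two changes applied in
           | a different order yield different IDs, while a plain set of
           | patches has no order to disagree about.

```python
import hashlib


def commit_id(parent_id: str, change: str) -> str:
    """Toy commit ID: a hash over the parent ID and this commit's change."""
    return hashlib.sha1(f"{parent_id}\n{change}".encode()).hexdigest()


def history_id(changes) -> str:
    """Chain the changes in order and return the ID of the last commit."""
    head = ""
    for change in changes:
        head = commit_id(head, change)
    return head


x = "edit file X"
y = "edit file Y"

# Parent-linked history: the order is baked into the final ID.
chain_xy = history_id([x, y])
chain_yx = history_id([y, x])
print(chain_xy == chain_yx)

# Set of patches: {x, y} and {y, x} are the same object.
set_xy = frozenset([x, y])
set_yx = frozenset([y, x])
print(set_xy == set_yx)
```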
        
           | dan-robertson wrote:
           | People using git think that commits are patches. But that
           | isn't how git works. Git sometimes tries to let you treat a
           | commit like the diff between it and its parent and lets you
           | try to rewrite history but these are really making new
           | commits with new ids and this confuses people.
           | 
           | In pijul, the objects you interact with _actually are_ diffs
           | (aka patches) and then snapshots are well-formed sets of
           | patches. Here, well-formed means that if a patch is in the
           | set then so are its dependencies (these dependencies aren't
           | like parent commits in git, they're more like you need to add
           | line 3 before you can delete it). So removing or modifying a
           | patch in a branch isn't a horrific interactive rebase
           | operation anymore.
           | 
           | When you move a patch in pijul it doesn't affect any of the
           | patches written before or after it (unless they depend on
           | it). When you "move a patch" in git you rewrite the history
           | and create new commits, so if I was talking about a commit
           | (id) before the move, I would be talking about some dangling
           | commit after the move and would need to update my id to the
           | corresponding new post-move commit.
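           | 
           | A rough sketch of "well-formed" as described above, with
           | hypothetical patch names and a hand-written dependency
           | table (this is not Pijul's data model): a set of patches is
           | well-formed when every patch's dependencies are also in the
           | set, and a patch can be dropped only if the remainder stays
           | well-formed.

```python
# Hypothetical patches mapped to the patches they depend on.
deps = {
    "add-line-3": set(),
    "delete-line-3": {"add-line-3"},  # cannot delete a line never added
    "edit-readme": set(),
}


def is_well_formed(patches: set) -> bool:
    """True if every patch in the set has all of its dependencies present."""
    return all(deps[p] <= patches for p in patches)


def can_remove(patches: set, patch: str) -> bool:
    """True if dropping `patch` still leaves a well-formed set."""
    return is_well_formed(patches - {patch})


branch = {"add-line-3", "delete-line-3", "edit-readme"}

print(is_well_formed(branch))
print(can_remove(branch, "edit-readme"))  # independent patch
print(can_remove(branch, "add-line-3"))   # something depends on it
```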
        
         | diegocg wrote:
         | I have visited the pijul site two or three times; every time I
         | would start reading about a "sound mathematical theory", get
         | bored, and close the tab. To this date I still don't know what
         | pijul is trying to do and why I should be interested in it.
         | 
         | They really should improve their documentation (hint, in case
         | someone reads this: nobody except a few geeks gives a shit about
         | sound mathematical models. Show me how pijul makes my life
         | easier compared to git, that's all I need)
        
           | avodonosov wrote:
           | Have you ever rebased a long chain of git commits onto a new
           | branch, where one of the first of those commits has a
           | conflict with the new base, and after resolving this conflict
           | for that commit you have the same conflict over and over
           | again for all the subsequent commits, even if they did not
           | modify that place in the code, and you need to manually
           | resolve it again and again?
           | 
           | Pijul will, as I understand, save us from those unnecessary
           | repeated "conflicts".
           | 
           | See also the answer by @chriswarbo about removing undesired
           | changes from history.
        
             | zemo wrote:
             | you know about rerere right?
             | https://www.git-scm.com/book/en/v2/Git-Tools-Rerere
        
               | MattIPv4 wrote:
               | I feel like this is missing something about the
               | drawbacks? Or are there truly no drawbacks beyond disk
               | usage for the cache, and folks should just enable it once
               | they're aware it exists?
        
               | mekkkkkk wrote:
               | I guess it involves a bit of assumptions and guesswork to
               | automatically replay your previous actions to files that
               | in turn may have changed. It probably slightly increases
               | the chances of Git doing something you didn't expect, and
               | not tell you about it. Hence why it isn't default. Maybe?
        
               | avodonosov wrote:
               | No, I did not know that, repeated everything manually.
               | Will try that next time, thank you.
               | 
               | BTW, pijul docs mention rerere as helping "in some
               | cases":
               | 
               | > This is why in these systems, conflicts are often
               | painful, as there is no real way to solve a conflict once
               | and for all (for example, Git has the rerere command to
               | try and simulate that in some cases).
               | 
               | https://pijul.org/manual/why_pijul.html
        
               | bombcar wrote:
               | I feel their example (with the ABGX) just makes me think
               | "merging can result in weirdness silently and git and
               | pijul do it different but silent" - it doesn't really
               | argue that one is better than the other.
               | 
               | (Most people probably use git as an effectively infinite
               | string of zip files anyway. https://xkcd.com/1597/ )
        
             | jayd16 wrote:
             | Could one not add a new rebasing strategy to git by
             | generating patches from the git history? Are the concepts
             | non-translatable?
        
               | avodonosov wrote:
               | I think rebase already works by generating patches, but
               | for some reason the repeated conflicts happen...
        
         | mdnahas wrote:
         | I strongly disagree.
         | 
         | Snapshots are a useful concept for programming. Each snapshot
         | represents a compilable program with a certain set of features.
         | So snapshot A has a certain set of features and B has another.
         | 
         | Diffs are not a useful concept. Does the diff between A and B
         | represent the new features in B? No. Because if it did, it
         | would mean I could take any other compilable snapshot C,
         | apply the diff of A and B to it, and end up with a snapshot D
         | that is compilable and has all the features of C plus the new
         | features in B. And that doesn't work with any programming
         | language I know.
         | 
         | It doesn't even work with the most trivial features.
         | 
         | Diffs may be a useful concept when working with some data
         | formats. But for programming languages, snapshots are the right
         | concept.
        
       | masukomi wrote:
       | this... seems so very flawed and disprovable to me. Ignoring the
       | obvious storage issues that have been discussed: if commits were
       | snapshots, you could rebase and reorder them without ever
       | worrying about conflicts. In reality you very much DO have to
       | worry about conflicts, because they are change instructions that
       | transform a file from A->B->C. If you try to reorder it as
       | A->C->B you're going to have serious issues (assuming these all
       | touch the same code), because C is a transformation from the B
       | state to the C state. It blows up attempting to convert A->C
       | because the instructions in that transformation describe going
       | from B->C.
       | 
       | > A commit is a snapshot in time. Each commit contains a pointer
       | to its root tree,
       | 
       | it so... _so_ very much isn't. It's not even a snapshot in time
       | of a section of a file.
       | 
       | It's a change instruction. No, it's not a "diff" but it also
       | isn't a snapshot.
        
         | nyanpasu64 wrote:
         | commits are snapshots. cherry-picking/rebasing diffs a commit
         | and its (first?) parent, and applies the diff on a base commit
         | to create a new commit.
         | 
         | if you `git replace` a single commit and change its contents,
         | its children do not change their contents, so `git show`ing any
         | direct child will show a new diff, not previously present,
         | reverting the actions you've performed in `git replace`.
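         | 
         | A small sketch of that behaviour under simplifying
         | assumptions: commits are modelled as plain dicts holding a
         | full snapshot, and `show` is a hypothetical helper that
         | computes a file-level diff against the parent on demand.
         | Swapping the parent's content (loosely analogous to `git
         | replace`) leaves the child's snapshot alone, so the child's
         | computed diff grows a change that undoes the swap.

```python
# Each commit stores a complete snapshot: path -> file content.
commits = {
    "A": {"parent": None, "files": {"f.txt": "1\n"}},
    "B": {"parent": "A", "files": {"f.txt": "1\n", "g.txt": "new\n"}},
}


def show(commit_id: str) -> dict:
    """Compute {path: (old, new)} for a commit relative to its parent."""
    commit = commits[commit_id]
    parent_id = commit["parent"]
    parent_files = commits[parent_id]["files"] if parent_id else {}
    files = commit["files"]
    diff = {}
    for path in sorted(set(parent_files) | set(files)):
        if parent_files.get(path) != files.get(path):
            diff[path] = (parent_files.get(path), files.get(path))
    return diff


before = show("B")  # only g.txt was added

# Replace A's content; B's stored snapshot is not touched.
commits["A"]["files"] = {"f.txt": "changed\n"}

after = show("B")   # now also "reverts" f.txt back to its old content
print(before)
print(after)
```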
        
       | cryptonector wrote:
       | Yes, exactly, this is a very good post on the nature of Git.
       | 
       | > Branches are pointers
       | 
       | Yes. I would say they are named pointers. Commit hashes are weak,
       | unnamed pointers.
        
       | karmakaze wrote:
       | This comes up from time to time and each time the comments debate
       | the correctness/effectiveness of the title.
       | 
       | The contents of the post do shed much light on how git operates
       | and introduces a view that can help in navigating how to use git.
       | 
       | Whether or not you want to think of a commit as a snapshot or a
       | diff isn't material. It's best to think of it as a dual, since a
       | diff on any base can create a snapshot, and a snapshot can create
       | a diff from a snapshot.
       | 
       | This very much mirrors the idea of a transaction log (of diffs)
       | and a 'current' state. The current state is convenient, can
       | benefit performance, but is not absolutely necessary. It doesn't
       | even have to be the most recent, e.g. key frames in video
       | compression. These are all just ideas, getting used to them and
       | being able to move viewpoints between them is better than
       | clinging to any one of them.
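       | 
       | The duality can be sketched in a few lines, assuming a toy
       | state that is just a dict of key -> string (the helper names
       | are made up for illustration): a list of snapshots converts to
       | one "key frame" plus a log of diffs, and replaying the log
       | rebuilds every snapshot.

```python
def make_diff(old: dict, new: dict) -> dict:
    """Record changed or added keys; None marks a deleted key."""
    diff = {k: v for k, v in new.items() if old.get(k) != v}
    diff.update({k: None for k in old if k not in new})
    return diff


def apply_diff(state: dict, diff: dict) -> dict:
    """Return a new state with the diff applied; the input is not mutated."""
    result = dict(state)
    for key, value in diff.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


snapshots = [
    {"a": "1"},
    {"a": "1", "b": "2"},
    {"b": "3"},
]

# Snapshots -> one key frame plus a log of diffs.
keyframe = snapshots[0]
log = [make_diff(p, n) for p, n in zip(snapshots, snapshots[1:])]

# Key frame plus log -> snapshots again.
state = keyframe
rebuilt = [state]
for d in log:
    state = apply_diff(state, d)
    rebuilt.append(state)

print(rebuilt == snapshots)
```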
        
       | ashton314 wrote:
       | I really liked this video: the guy first walks you through how to
       | build your own git-like utility with a handful of shell commands,
       | then goes and walks through an actual git repo:
       | 
       | https://youtu.be/qq_s2Hh--aQ
       | 
       | Even the first 20 minutes was enough for me to have a
       | substantially better understanding of how git works.
        
       | mberning wrote:
       | Am I the only person that doesn't want to understand the inner
       | workings of my VCS in lurid detail? I don't have to know as much
       | about any other developer tool in order to use it effectively.
        
         | outworlder wrote:
         | > I don't have to know as much about any other developer tool
         | in order to use it effectively.
         | 
         | You don't? How do you debug problems?
        
           | mberning wrote:
           | I just use the debugger and it mostly works how I expect. I
           | don't have to go and study the data structures and other
           | intricacies of the debugger itself to puzzle out why it works
           | the way it does. Git is terrible in that way, as evidenced by
           | the thousands of blog posts of people trying to describe the
           | inner workings of it and how it will "make more sense" once
           | you understand it as well.
        
           | bombcar wrote:
           | Think about ZFS - ZFS works perfectly fine for me and I don't
           | have the foggiest idea how it works beyond "copy on write"
           | magic.
        
         | detaro wrote:
         | Do you feel like you need to know this to use git? What did the
         | blog post change about your use of git?
        
       | [deleted]
        
       | necovek wrote:
       | > I believe that Git becomes understandable if we peel back the
       | curtain and look at how Git stores your repository data.
       | 
       | I agree, and like many, I have been saying that for years (nay,
       | for more than a decade): and that's exactly the problem!
       | 
       | You don't need to understand how an internal combustion engine
       | works to drive a car... You don't need to understand how your
       | graphics card renders stuff to develop a web page... You don't
       | need to know how a brushless motor works to use a drill...
       | 
       | There is a pattern there, and it's the one that makes sense.
       | 
       | I've read up on the internals of git a dozen times by now. But I
       | only occasionally need to do something weird that makes me go
       | back to it, so I usually forget the relevant bits.
       | 
       | The trouble is that I've used a distributed VCS that did not ask
       | me to understand internals and it had a sane UI, and good model
       | (like tree-like commit history, so a top-level commit log would
       | only have merges, but you could dive deeper into individual
       | commits if you so pleased). It wasn't perfect, but it's hard for
       | me to accept that we have gone with a subpar solution where every
       | "tutorial" starts with how you need to understand the internals!
       | But you also need to memorise them, dammit!
       | 
       | Just like I keep forgetting the Emacs rectangle editing shortcuts
       | since I seldom use them, I'll keep forgetting the specifics of
       | git internals that I might need once every 12 months.
       | 
       | And it's not me, it's _you_, git!
        
         | mdnahas wrote:
         | Sadly, the bad part is git's user interface. It hides the
         | pretty parts underneath.
         | 
         | There is a concept of "the next commit" or, equivalently, "the
         | pending commit". In the documentation, this gets called
         | "indexed" or "cached" or "staged" --- three different names!
         | And if you want to diff with it, you can't refer to it by name.
         | You need to use an option, so it's "git diff --cached <other
         | commit>".
         | 
         | I know git's internals, mostly because it lets me navigate its
         | bad user interface.
        
       | gpspake wrote:
       | I think it's a tragedy that just about every developer uses git
       | but most learn add, commit, branch, and merge and then just stop
       | learning.
       | 
       | A lot of people are scared of rebase and cherrypick and shut down
       | or get defensive when you mention them or try to encourage their
       | use.
       | 
       | The result is, because developers only have a hammer, they brute
       | force merge everything which results in grotesque conflict
       | resolutions and commit histories and makes it hard to untangle
       | problems.
       | 
       | At a previous job, another developer was kind enough to walk
       | through rebasing on the command line with vim. I was receptive
       | and in about 10 minutes, I realized there was a significant set
       | of standard features and day to day Git use I was previously just
       | oblivious to.
       | 
       | These days, the UI for rebasing and cherry picking in Gitkraken
       | is state of the art and effortless and I use them every day
       | without hesitation and without the fear that comes from not
       | understanding or knowing what I'm doing. Still, I constantly
       | struggle with coworkers merging feature branches from 100 commits
       | ago in to new feature branches and brute force resolving
       | conflicts across half a dozen files in one commit without any
       | context.
       | 
       | I see it all because I have visibility in to the history and
       | branch relationships but I still get shrugs and eye rolls when I
       | bring it up. I don't necessarily want to dictate nitpicky git
       | usage but I have a hard time accepting when people just refuse to
       | learn how rebasing and cherrypicking work when they're both core
       | basic features of a tool we all use every day. Proper Git use is
       | one of those hills I'll die on, though, so I don't intend to shut
       | up about it any time soon :)
       | 
       | Edit: My practical advice: If you use git every day and you don't
       | know how to rebase, reset, cherrypick, and stash from the command
       | line, make it a goal. Then, once you're comfortable, learn how to
       | do it in a visual tool like Gitkraken and make an effort to
       | incorporate them in to your daily workflow. My guess is things
       | will become a lot less tedious and confusing when things get
       | messy.
        
         | 9dev wrote:
         | Honestly, I don't really see your point. Yes, we keep our
         | commit messages as clean and descriptive as possible. Yes, if
         | we have the time, we split our commits into logical groups of
         | changes. Yes, we work on feature branches for mature projects.
         | We do all this with the git integration of IntelliJ, and I
         | don't see the slightest reason to waste any time with the
         | syntax of our version control tool! I'd gladly force everyone
         | on the team to use "stuff" as the single, exclusive commit
         | message, if that improved velocity (which it obviously
         | doesn't). Because all this discussion about proper git usage is
         | nothing but bike-shedding.
        
         | dr-detroit wrote:
         | NOBODY on my team learned ANYTHING about git. They stopped
         | using it on new projects; it's like everyone considers their
         | (shabby) work proprietary. Thank you for coming to my TED talk:
         | life is suffering
        
         | forrestthewoods wrote:
         | > I think it's a tragedy that just about every developer uses
         | git but most learn add, commit, branch, and merge and then just
         | stop learning.
         | 
         | This is because Git is too hard to use.
         | 
         | How do I know that Git is too hard to use? Because there are
         | literally thousands of blog post tutorials explaining how easy
         | Git is to learn. Things that are easy do not need thousands of
         | different guides telling you how easy it is.
        
           | JadeNB wrote:
           | > This is because Git is too hard to use.
           | 
           | > How do I know that Git is too hard to use? Because there
           | are literally thousands of blog post tutorials explaining how
           | easy Git is to learn. Things that are easy do not need
           | thousands of different guides telling you how easy it is.
           | 
           | I'm not sure that's convincing. I think that a lot of guides
           | about how easy it is indicate that it's _slightly_ difficult
           | to learn. That results in a lot of people struggling for a
           | little bit, overcoming the struggle, and feeling a sense of
           | accomplishment and enlightenment, which they then want to
           | share.
           | 
           | (There's also a difference between how hard something is to
           | _use_ and how hard it is to _learn_. I'd argue that there's
           | often a trade-off to be made, where some sacrifice on
           | difficulty learning results in a reward in ease of use--in
           | the sense that, for example, vim is far easier to use than
           | any other editor for a seasoned vimmer.)
        
           | gkoberger wrote:
           | The counter argument would be that maybe git is so basic and
           | easy to learn that everyone feels comfortable enough with it
           | to write a tutorial?
           | 
           | I think a large amount of content is more a factor of Git's
           | ubiquity than its difficulty.
        
             | Supermancho wrote:
             | > git is so basic and easy to learn that everyone feels
             | comfortable enough with it to write a tutorial?
             | 
             | Nobody writes tutorials on how easy Lyft or Uber apps are
             | to use. Easy interfaces don't need lots and lots of
             | tutorials. That's exclusively the result of poorly designed
             | interfaces AND complicated systems.
        
               | gkoberger wrote:
               | Lyft is an app. Git is a tool for developers. I have no
               | clue how they're related.
               | 
               | That being said, I googled "How to use Lyft" and there's
               | a ton of results.
        
           | zwieback wrote:
           | My SCM journey: RCS-PVCS-cvs-VSS-MKSSI-svn-Perforce-git
           | 
           | I hate all of them but learned to use them because what's the
           | alternative?
        
           | gpspake wrote:
           | I respectfully disagree on the basis that, yes revision
           | control is hard, but git provides a relatively beautifully
           | simple api and vocabulary on top of an inherently complex and
           | absolutely necessary set of concepts.
           | 
           | When you're working on a codebase with multiple people, there
           | are going to be changes and the changes have to be
           | consolidated and the conflicts have to be resolved. I
           | believe, with a reasonable amount of time and effort,
           | developers can learn that API and vocabulary and I have yet
           | to encounter anything comparable in terms of ease of use and
           | "grok-ability" - especially with modern GUI tools.
           | 
           | git is one of the most ubiquitous and unavoidable
           | technologies in software development and it's 100% worth the
           | time and effort to understand and be good at it.
        
           | ajross wrote:
           | > This is because Git is too hard to use.
           | 
           | Git is hard to _LEARN_. It is objectively very easy to _USE_
           | for those who have learned it, so much so that the population
           | of "I used to use git until I found ..." evangelists is
           | effectively zero. Tools like mercurial exist in the
           | marketplace of ideas mostly by peeling off users who haven't
           | yet started using git productively by promising 80% of the
           | features for 10% of the effort.
           | 
           | In fact, I don't know that there has been a new tool since
           | vim or emacs that so well illustrated this dichotomy between
           | ease of learning and ease of use.
           | 
           | But to be honest: it really is needlessly hard to learn. The
           | content of the linked article is that git is built on an
           | extremely simple foundation of data structures and operations
           | that anyone can understand. But the takeaway from the article
           | is that _no one does understand it_, because that layer is
           | hidden behind a facade of tools that completely obscure it.
           | Where are the "blobs" in git reset? What is the "index"? Is
           | it a "tree" (it's not, IIRC)? I definitely agree with people
           | who complain about the porcelain layer's design. But I still
           | use git every day and love it.
        
           | morsch wrote:
           | You have evidence that git is hard to use, but no evidence
           | establishing that it's _too_ hard. Some things just aren't
           | easy.
        
         | outworlder wrote:
         | > Edit: My practical advice: If you use git every day and you
         | don't know how to rebase, reset, cherrypick, and stash from the
         | command line, make it a goal. Then, once you're comfortable,
         | learn how to do it in a visual tool like Gitkraken and make an
         | effort to incorporate them in to your daily workflow. My guess
         | is things will become a lot less tedious and confusing when
         | things get messy.
         | 
         | I would add git bisect to the list. It's incredibly useful (if
         | your codebase is sane).
        
           | trulyme wrote:
           | I read some description of what that is and it looks like
           | checking out different commits (via bisection) until you
           | figure out where in the history some change happened. Is
           | there some other benefit I am missing?
        
             | iudqnolq wrote:
             | It automatically does a binary search, and you can use it
             | completely automatically if you can write a script that
             | determines if the bug was present.
             | 
             | The other day I used it to write a good bug report. I first
             | used it to find the earliest commit I could compile on my
             | machine, then I used it to find the commit where a certain
             | command would fail.
        
             | jfengel wrote:
             | It's not about finding a particular change, it's about
             | which change broke something.
             | 
             | In the best cases, it's totally automatic. You know that it
             | worked at commit A and is broken by commit Z. So it checks
             | out commit M and runs the tests. If they succeed, then it
             | broke somewhere between M and Z. If they fail, then it
             | broke somewhere between A and M. So it checks out either H
             | or S, depending, and repeat.
             | 
             | It's not always that easy, especially when your tests and
             | environment are complicated. There's often manual
             | intervention, which is tedious. Still, log2 N steps is
             | often manageable, especially if the computer is taking care
             | of the tracking for you.
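             | 
             | A minimal Python sketch of that search, assuming a linear
             | history where every commit after the first bad one is also
             | bad. The `first_bad` helper and the A..Z history with a bug
             | at Q are made up for illustration; this is not how git
             | bisect is implemented, just the same halving idea.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and how many commits were tested.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the breakage is at mid or earlier
        else:
            good = mid   # the breakage is after mid
    return commits[bad], tested


# Hypothetical history: commits "A".."Z", with the bug introduced at "Q".
history = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
culprit, steps = first_bad(history, lambda c: c >= "Q")
print(culprit, steps)
```

             | The first commit tested is M, then S, and the search
             | settles on Q after 4 checks instead of walking all 26.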
        
               | trulyme wrote:
               | Thank you (and the sibling) for the explanation, sounds
               | useful!
        
         | zwieback wrote:
         | True, but you could also argue the opposite view that it's a
         | sign of git's usability that beginners can get by with just
         | those commands. The problem doesn't crop up until those lazy
         | users start doing things that make the repo messy.
        
           | phailhaus wrote:
           | That's honestly the opposite of good design. It's hiding the
           | complexity to make it seem easy for beginners, then slamming
           | them with inscrutable error messages when they "don't use it
           | right." It leads to a system with a deceptively gentle
           | learning curve that requires you to suddenly learn everything
           | all at once when you hit an issue.
        
             | zwieback wrote:
             | Good point. I'm not sure which side I'm on, to me git feels
             | like a good core with an atrocious UI, even after years of
             | use I have to look up whether to use this or that option
             | and whether it's uppercase or lowercase, do I use "--" or
             | ":" and on and on.
             | 
             | Mind you, I'm not complaining, most utilities I've written
             | for my internal users are worse! It's when something gets
             | out and used by the masses that you wish you had had the
             | time to put together a coherent user interface.
        
           | redisman wrote:
           | It's really just using the old generic version control
           | commands everyone's used to.
        
         | phendrenad2 wrote:
         | I know how to rebase, reset, cherry-pick, stash, reflog,
         | assume-unchanged, and many other advanced techniques.
         | 
         | I still prefer to add/commit/branch/merge. I often copy-paste
         | changes into a new branch, just because I don't enjoy recalling
         | arcane commands from memory or googling them for the umpteenth
         | time.
         | 
         | I suspect that git is a leaky abstraction that doesn't fit the
         | corporate software development workflow. I think that git is a
         | hammer and non-distributed development is the screw we're
         | hitting with it.
        
         | robaato wrote:
         | How do you track which cherry picks have been done?
        
         | chriswarbo wrote:
         | I avoid rebase because it rewrites history, and I'd rather have
         | an accurate log of what happened. Cherry-pick can certainly be
         | useful for grabbing particular _commits_; although I find
         | myself more often using `format-patch`+`am` to grab particular
         | _files_ (which also works across repos).
        
           | iudqnolq wrote:
           | I never understood the "rewrites history" argument. The
           | original commits don't necessarily faithfully represent my
           | thought process, I might have a larger commit because I got
           | distracted in the middle of the day or a shorter commit
           | because I wanted to make sure my code was backed up at the
           | end of the day.
        
           | bentcorner wrote:
           | I use rebase often _because_ it rewrites history - it lets me
           | squash commits into a conceptual single commit, or re-order
           | commits together that chronologically make more sense next to
           | each other.
        
         | secondcoming wrote:
         | We've never had an issue with git merge'ing all over. I've
         | never looked at our git graph because I've never had to.
         | 
         | Maybe rebase is a tool to help poor software development
         | practices? (and your colleagues letting branches go stale is
         | one of those)
        
           | Sn0wCoder wrote:
           | I agree: if you merge early and often, everything goes
           | smoothly, and if you have a stale branch you're doing
           | something else wrong. At least for the work I do, web dev.
           | Some of our pipeline folks are working under a different
           | paradigm and they run into these problems often but also
           | understand how to rebase and cherry pick. As always, it just
           | depends.
        
           | kazinator wrote:
           | > _help poor software development practices_
           | 
           | Says someone who's "never looked at our git graph".
        
             | secondcoming wrote:
             | Entertain me then, why do you need to look at it?
        
         | f154hfds wrote:
         | Be safe out there everyone. Squash with caution, don't force
         | push master. And remember, reflog is always there to help if
         | you get into trouble.
        
         | kgwgk wrote:
         | > about every developer uses git but most learn add, commit,
         | branch, and merge and then just stop learning.
         | 
         | What are branch and merge? /j
        
         | bluedino wrote:
         | Git has over-complicated source control for the majority of
         | developers. Things were much simpler with svn.
        
           | MereInterest wrote:
           | Svn's complete inability to handle merges at a file level may
           | have been simpler, but was by no means better. Needing to
           | coordinate who is allowed to edit each file in order to avoid
           | a painful merge is common with svn, and nearly unheard of
           | with git. Svn looks at the inherent complexity of multiple
           | people working on the same code base, shrugs, and figures
           | that it is somebody else's problem.
        
           | kevin_thibedeau wrote:
           | SVN is a versioned storage engine pretending to be source
           | control. Branching and tagging? We'll just expect everyone to
           | obey an implied policy in our filesystem tree. Oh you think
           | you want to merge these things that have a clear ancestor in
           | the DAG? Not so fast buddy.
        
             | chipotle_coyote wrote:
             | IIRC, merging in SVN is... basically something you never
             | wanted to do. :)
        
           | tehbeard wrote:
           | > Things were much simpler with svn.
           | 
           | Not from my experience as a newbie with SVN at the start of
           | uni and in the beginning at $job.
           | 
           | In both cases, it was temperamental, prone to network issues
           | (this was in both student accom. -> uni server, and LAN at
           | $job) and did not like users working on the same files.
           | 
           | Git took some learning, and it took reading Git Magic
           | <http://www-cs-students.stanford.edu/~blynn/gitmagic/> to go
           | from <https://xkcd.com/1597/> to the friend mentioned in the
           | alt text.
           | 
           | SVN still feels like I'm pulling teeth all these years later.
        
         | ravenstine wrote:
         | Rebasing and cherry-picking are awesome tools once you know
         | what they're actually doing. I think people avoid rebase for a
         | few reasons; the term "rebase" doesn't mean anything outside of
         | Git so it's not obvious what it is doing under the hood, and
         | inexperienced Git users might use it to change the history on
         | the main branch, which I see as an antipattern.
         | 
         | There's nothing inherently wrong with merging, but I personally
         | don't like it because I find merge commits harder to understand
         | than regular commits. Better to use things like rebasing and
         | cherry-picking to move commits arbitrarily and then squash some
         | commits into units of work that make sense.
         | 
         | Stash is crappy though, IMO, because it's not branch-specific.
         | Instead of stash, I like to fork the branch I'm working on and
         | create a "WIP" commit. That way I don't lose track of work I
         | had in progress that only belongs in a certain branch.
        
         | breck wrote:
         | Agreed. Git should be treated as a deep skill, as important to
         | practice and train with as unit testing and regular
         | expressions.
         | 
         | Think of your git history as a product and art form in itself.
         | If you don't enjoy writing your commit history, readers will
         | not enjoy reading it.
         | 
         | On a tactical level, I highly recommend buying Sublime Merge 2.
        
           | blacktriangle wrote:
           | Alternative interpretation: git is a terrible terrible tool.
           | It solves Linus' problem, but he also wrote the damn thing.
           | Had GitHub not entered the scene we'd likely all be using
           | something else, maybe even SVN still.
           | 
           | Maybe distributed source control really is this complicated
           | and treating git as a deep skill is justified, but having
           | also used Darcs and Mercurial I have a hard time believing
           | that git's usability issues are inherent complexity rather
           | than an artifact of git itself.
        
           | codyb wrote:
           | Regular expressions? I really enjoy using them and futzing
           | around but I encourage the people I work with to stay away
           | and to avoid using in production when possible.
        
             | trulyme wrote:
             | Whoa, why? That sounds like awful advice to me. By all
             | means, use regexes, but make sure you understand the theory
             | behind them (state machines) so you will know not to parse
             | HTML with them. They are really pretty easy to do right
             | once you grasp the concepts.
        
               | breck wrote:
               | You can write correct ad hoc regex parsers for many
               | subsets of HTML depending upon your needs.
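               | 
               | A small Python sketch of both points, using made-up
               | input strings. The `HREF` and `DIV` patterns are
               | illustrative assumptions: the first handles a flat
               | subset (anchors with a double-quoted href), the second
               | shows a pattern that cannot pair nested tags.

```python
import re

# Flat subset: anchors with a double-quoted href and no nested tags.
HREF = re.compile(r'<a\s+[^>]*?href="([^"]*)"')

flat = '<p><a href="/one">1</a> <a class="x" href="/two">2</a></p>'
links = HREF.findall(flat)
print(links)

# Nested structure: the non-greedy pattern stops at the first closing tag,
# so it pairs the outer opening tag with the inner closing tag.
DIV = re.compile(r'<div>(.*?)</div>')

nested = '<div>outer <div>inner</div> tail</div>'
inner = DIV.findall(nested)
print(inner)
```

               | The first pattern returns both links; the second
               | returns a fragment that cuts across the nesting.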
        
             | turminal wrote:
             | Regular expressions are useful as tools for searching
             | through code, filtering logs, searching through the
             | filesystem..., even if you never commit them into your
             | codebase.
        
               | breck wrote:
               | Yes, good clarification. I probably write 10x (maybe
               | 100x? more?) more throwaway regexes than regexes I
               | commit.
        
           | chipotle_coyote wrote:
           | I found Sublime Merge -- although it may have been version 1
           | when I tried it! -- very unintuitive and fiddly, a lot like
           | using Vim for doing three-way merges. Definitely one of those
           | "YMMV" kinds of tools. (I mean, I'm sure Vim is terrific at
           | it once you get the hang of it.)
           | 
           | Personally, I've settled on
           | 
           | * Getting pretty familiar with the git command line
           | 
           | * Using a decent GUI diff and three-way merge tool (I use
           | Kaleidoscope)
           | 
           | * Using GitUp, an open source Mac git client, on occasions
           | where I want to get kind of arcane: committing individual
           | lines of files in separate commits, re-ordering commits (on
           | an unmerged feature branch because I'm not a complete
           | monster), etc.
           | 
           | I suspect having already discovered GitUp is a good chunk of
           | why I didn't get into Sublime Merge; it can do a lot of
           | advanced stuff in ways I personally find easier to grok.
        
         | Vinnl wrote:
         | It's also one of the few tools that is likely to be a constant
         | factor in your job for a long time. Yes, it's not super easy,
         | and you can sort of get by with minimal effort, but it's not
         | that much time to invest compared to how much benefit you'll
         | reap.
         | 
         | And I don't mean memorising commands and their arguments, but
         | rather _understanding_ Git from first principles.
         | 
         | (I wrote this visual tutorial for that purpose, takes about
         | fifteen minutes to go through:
         | https://agripongit.vincenttunru.com/)
        
           | bombcar wrote:
           | This looks nice but scrolling on Safari on Big Sur is causing
           | weird artifacts. Maybe if I scroll really slowly.
        
         | gregmac wrote:
         | > My practical advice: If you use git every day and you don't
         | know how to rebase, reset, cherrypick, and stash from the
         | command line, make it a goal. Then, once you're comfortable,
         | learn how to do it in a visual tool like Gitkraken and make an
         | effort to incorporate them in to your daily workflow.
         | 
         | I do agree you should learn rebase, reset, cherrypick and
         | stash, but I don't agree that you need to learn on the CLI. I
         | mean, use the CLI if you prefer that, but the git GUIs are
         | perfectly adequate for performing any of these operations.
         | 
         | I used to use git CLI heavily, but in the past few years I have
         | simply not needed to, to the point that aside from a small
         | handful of the most common operations I don't even remember a
         | lot of it anymore. Partly this is due to maturity of the GUIs,
         | and partly because old practices like SSHing to a dev server to
         | edit+commit something there are just totally obsolete and
         | unnecessary these days.
         | 
         | Even for a simple commit, there's a massive convenience of
         | seeing the timeline and being able to interactively stage and
         | look at diffs that is just miles ahead of CLI, and lets me
         | break down commits into better units of work and write better
         | messages.
         | 
         | There's a stupid gatekeeping thing some developers still do
         | about git CLI, I don't get it. It's as valid as dictating what
         | text editors, color schemes, input devices, or OSes "real
         | developers" use. Judge people on their output, not their tools.
        
           | trulyme wrote:
           | I don't force my choice on others, but I'm firmly in "git
           | cli" camp. The reason is simple - cli is available
           | everywhere. I'm sure GitKraken & co. are great once you get
           | used to them, but apart from GitLab graph view (which GitHub
           | sadly lacks, and cli tool afaik also) I don't miss anything.
           | But again, this is just my personal preference and I agree
           | that developers should be free to use gui if they prefer it.
        
             | gregmac wrote:
             | > cli is available everywhere
             | 
             | I guess that's part of what I was getting at though. Where
             | are you doing your development that isn't your workstation?
             | 
             | I work on a whole bunch of different things -- from
             | personal stuff running on my laptop or VMs in my house to
             | cloud services deployed across dozens of AWS services --
             | and things have just got to a point where I have no need to
             | do a commit anywhere but my workstation. (Well,
             | technically, I have two: one personal, one work).
             | 
             | I definitely used to do it years ago, but now I don't
             | remember the last time I had to use git on a remote system.
        
               | trulyme wrote:
               | I use two at the moment but they change over time, along
               | with the OS. I am not talking about developing remotely,
               | though it does happen that I use git push to deploy for
               | some smaller projects. But I could have used GUI for that
               | too. I guess I just don't want to get used to a tool I
               | won't be able to keep using.
        
         | leokennis wrote:
         | I think the issue here is that using git is not a goal unto
         | itself. Git is a tool/system that should get out of your way as
         | much as possible. Instead it has arcane commands and options
         | making anything but the most basic operations Shakespeare
         | novels on the command line.
         | 
         | My goal is to have my code in the repo. So if git starts being
         | a pain, it's much easier to store my edits locally, pull a
         | fresh copy of the repo, copy over my edits again and commit +
         | push.
         | 
         | If you have a good cook, let him/her cook dishes and let
         | someone else care about sharpening the knives and cleaning the
         | dishes.
         | 
         | If you have a brilliant programmer, let him/her write good
         | code. Don't bother them with understanding binary trees and
         | hashes of snapshots of diffs of local repo's of pointers of
         | objects in a blob graph lalalalala.
        
         | fshbbdssbbgdd wrote:
         | My usual workflow is to frequently merge master into my feature
         | branch during development, then I squash before merging back to
         | master. As far as I can tell, this gives a clean history so
         | shouldn't bother the rebase fans (who prioritize a simple
         | commit graph), and shouldn't bother bisect fans (who get
         | confused by fake historical commits). Is there something bad
         | about this approach?
        
         | mschuster91 wrote:
         | > A lot of people are scared of rebase and cherrypick and shut
         | down or get defensive when you mention them or try to encourage
         | their use.
         | 
         | Because a _lot_ of people have been burned and way too many
         | hours been lost due to a rebase gone wrong. Cherry-pick and
         | stash are trivial operations, reset (outside of "undo a git
         | add") and especially rebase are not.
         | 
         | The learning curve for both is _steep_, the potential for
         | failure extremely high, so I understand why organizations go as
         | far as entirely banning rebase.
        
           | zo1 wrote:
           | People are scared of rebase because of the constant scare-
           | mongering around it. "Rebase is Evil" and "Never use Rebase",
           | etc. Then we end up with junior devs that are too scared to
           | even use their git IDE's built-in rebase-remote branch onto
           | remote so they end up littering the entire repo with "Merged
           | branch-A from origin into branch-A".
           | 
           | It's so bad that even seasoned developers that haven't delved
           | deeply into git have no idea that this sort of rebase is
           | practically harmless. Instead they parrot "Rebase is Evil"
           | without thinking twice.
        
         | Stratoscope wrote:
         | What puzzles me is how resistant many developers are to using
         | or even considering a Git GUI. I prefer SmartGit, but GitKraken
         | is nice too.
         | 
         | People tell me, "I'm so much more productive on the command
         | line" and then it turns out all they know is pull/commit/push
         | and using a local branch. Anything outside that brings terror:
         | "I never use rebase. What if something goes wrong in the
         | rebase? Now I've lost all my work and I have to pull a fresh
         | copy of the repo from scratch."
         | 
         | Yes, I have heard exactly that.
         | 
         | One thing I love about SmartGit is how it unifies features that
         | the Git command line presents as separate and unrelated
         | concepts. The reflog? Click the Recyclable Commits checkbox and
         | now all of your reflog commits show up as ordinary commits just
         | like any other.
         | 
         | Stashes? Same thing. Turn on the checkbox to make a stash or
         | all stashes visible and now they show up as ordinary commits,
         | which is all they are under the hood.
         | 
         | Want a diff between two commits, whether they be normal commits
         | or stash or reflog commits? Click one commit, ctrl+click the
         | other, and you instantly see the differences between the two.
         | No need to check out a reflog commit temporarily just to have a
         | look at it.
         | 
         | Yet I have only had a 5-10% success rate in getting anyone to
         | take a look at any Git GUI, much less use one. I would be
         | really interested in understanding why so many developers are
         | reluctant to use anything other than the Git command line.
        
           | kevincox wrote:
           | I agree with your assessment. I think GUIs are great for
           | things that you don't do often enough to memorize (or for
           | things that are inherently visual, but not relevant here) but
           | they are often looked down on.
           | 
           | There are many people who do enough git to know how it works
           | well and be familiar enough with enough of the commands that
           | they don't need a GUI and are likely faster on the command
           | line. But for every one of those people there are at least 2
           | who would work faster, and more accurately with a good GUI.
        
         | Solvitieg wrote:
         | In practice, rebasing increases conflicts, requires teams to
         | time their merges, and obfuscates the history of the project.
         | 
         | I never understood why people think this is a good pattern.
        
           | mekkkkkk wrote:
           | This. I don't know if it's because of a lack of understanding
           | or bad workflow, but almost all of the times I've seen bugs
           | caused by git operations to slip through the cracks, it's
           | been because someone decided that a pretty history was of
           | utmost priority. Rebasing is probably necessary in some
           | cases, but it can be a real foot gun as well.
        
           | Espressosaurus wrote:
           | In a large repo with many people merging, it helps keep
           | things organized.
           | 
           | In my experience you can make an argument for a merge-based
           | workflow up to around 6 people. By 12 it's painful and hard
           | to track what's going on, doubly so when you have a dev
           | branch and multiple sustaining branches or something more
           | complicated.
           | 
           | By the time you get to 100 people or more committing to the
           | same repo, it just becomes absolute chaos, and at least you
           | can maintain a semblance of sanity in your official branches
           | by forcing a rebase-based workflow on them.
        
           | TheLocehiliosan wrote:
           | What you need to understand is people use rebase on their
           | unshared branches. It's part of crafting your commit history
           | to be a coherent set of atomic changes instead of the path
           | you took while developing it all.
           | 
           | You rebase BEFORE you merge into the mainline branch.
        
             | fshbbdssbbgdd wrote:
             | Do you run your test suite against each of the commits you
             | create when rebasing? If not, isn't this "coherent set of
             | atomic changes" misleading? It seems like a lot of effort
             | to make a fake clean-looking history.
        
               | xoudini wrote:
               | There shouldn't be an issue in doing so. During a rebase
               | you'll either have no conflicts -- in which case there
               | isn't an issue -- or you'll have to stop to resolve
               | conflicts, and you might as well run tests before
               | continuing the rebase. In both cases I'd argue that the
               | statement "coherent set of atomic changes" applies.
        
               | fshbbdssbbgdd wrote:
               | Correct me if I'm missing something here - but a lack of
               | conflicts during rebase only means that the few lines
               | surrounding your changes weren't changed in the upstream.
               | The rest of the repo changed, and this will often cause
               | some kind of inconsistent state. I've encountered this
               | situation frequently when using git bisect.
        
               | xoudini wrote:
               | When you rebase, you basically replay the history of your
               | branch since it diverged from the branch you're rebasing
               | onto. Thus, the branch is always in a consistent state
               | (or equally consistent to when you originally authored
               | the commit you're replaying). And of course this assumes
               | the target branch is already in a consistent state.
        
               | fshbbdssbbgdd wrote:
               | If the upstream is like this:
               | 
               | A -> B
               | 
               | And you branch off B and start making changes, then the
               | upstream continues on its own:
               | 
               | A -> B -> C -> D
               | 
               | Now you rebase your dev branch off D. Your changes get
               | replayed on top of D and create new commits. Some of
               | those commits might not be valid, because they take code
               | that worked in the context of B and put it in the context
               | of D. The history seems clean if all you do is look at
               | the diffs, but if you bisect and try to use the repo in
               | one of the rewritten commits, you may find it doesn't
               | even compile (even if that commit was fully functional
               | before rebasing).
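               | 
               | A toy Python model of that situation. Everything in
               | it is an assumption for illustration: a "repo" is just
               | the set of names it defines and the set it calls,
               | `builds` stands in for "it compiles", and the upstream
               | change between B and D is a rename of `helper` to
               | `util`. It is not how git stores or replays commits.

```python
from typing import Callable, Dict, Set

Repo = Dict[str, Set[str]]  # {"defs": names defined, "calls": names used}
Patch = Callable[[Repo], Repo]


def apply(repo: Repo, patch: Patch) -> Repo:
    """Apply a patch to a copy of the repo, leaving the input untouched."""
    copy = {key: set(names) for key, names in repo.items()}
    return patch(copy)


def builds(repo: Repo) -> bool:
    """Toy stand-in for 'it compiles': every called name must be defined."""
    return repo["calls"] <= repo["defs"]


def add_call(name: str) -> Patch:
    def patch(repo: Repo) -> Repo:
        repo["calls"].add(name)
        return repo
    return patch


def rename(old: str, new: str) -> Patch:
    """Upstream rename that also updates the callers existing at that time."""
    def patch(repo: Repo) -> Repo:
        for key in ("defs", "calls"):
            if old in repo[key]:
                repo[key].remove(old)
                repo[key].add(new)
        return repo
    return patch


b: Repo = {"defs": {"helper"}, "calls": set()}  # upstream at B
dev_patch = add_call("helper")                  # written and tested against B
d = apply(b, rename("helper", "util"))          # upstream moves on to D

original = apply(b, dev_patch)  # the commit as first authored
replayed = apply(d, dev_patch)  # the same diff replayed on top of D

print(builds(original), builds(replayed))
```

               | The two changes touch different things, so nothing
               | conflicts, yet the replayed commit no longer builds
               | while the original one did.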
        
               | xoudini wrote:
               | Hm, you're right. The simplest example I could think of
               | right now is the upstream having renamed/deleted
               | something that the dev branch depends on, but didn't
               | directly touch. That would definitely cause a "broken"
               | history during the rebased commits, and is technically
               | unavoidable.
        
               | breischl wrote:
               | When I've done this, your private/dev branch may be a
               | series of broken commits. Then you rebase onto main,
               | squash to one commit, and test (if necessary). So what
               | shows up on the main branch is a single, squashed, tested
               | commit that contains one logical unit of code (usually a
               | feature or fix).
               | 
               | In this model the main branch history is "real" in that
               | it records the sequence of changes to the production
               | code. It's "fake" in that it doesn't record the exact
               | sequence of fumbling steps and backtracks you took to get
               | there. But IME the latter is usually not very useful
               | anyway.
        
               | xoudini wrote:
               | In some cases I agree, but squashes can end up so large
               | that doing a `git bisect` (which is quite useful in
               | finding the comparatively small commit which introduced a
               | bug) becomes unfeasible.
        
               | fshbbdssbbgdd wrote:
               | I like the squashed commit approach. I get there by
               | merging upstream into my dev branch when developing, then
               | squashing before I merge my changes into the upstream. As
               | far as I can tell, that has the same outcome as rebase
               | with squash. Both approaches create a simple commit
               | graph, and both avoid fake intermediate commits.
        
             | Solvitieg wrote:
             | For sure, I understand that.
             | 
             | It tends not to be an issue when a developer is working on
             | an isolated feature that only he or she cares about, that
             | is reviewed in a timely manner, and gets directly committed
             | to main.
             | 
             | Often this is not the case.
        
         | greggman3 wrote:
         | I don't have a solution and maybe the problem is just not
         | solvable but ...
         | 
         | The tragedy is that git is so hard to learn. Start a github
         | project (I know github is not git). Take a PR, have the PR have
         | a conflict, now, try to explain to the new user how they can
         | fix their PR via git to not conflict. You'll be stuck giving
         | them a giant lesson, probably an hour to write the
         | instructions, then several back and forths.
         | 
         | Mostly, either they already know git and fix it themselves OR I
         | give up and merge it by hand myself since it's easier than
         | becoming a git teacher for them.
        
           | outworlder wrote:
           | > Take a PR, have the PR have a conflict, now, try to explain
           | to the new user how they can fix their PR via git to not
           | conflict.
           | 
           | Is this a Git problem? I recall entire workdays being wasted
           | on SVN and CVS back in the day with multiple people trying to
           | make sense of a merge.
           | 
           | In Git this is actually easier to do (and easier to do
           | repeatedly, with git rerere and similar).
        
             | iudqnolq wrote:
             | It's a problem, and the place to fix it is in git. That
             | makes it a git problem. Just because things were worse back
             | in the day doesn't mean we can't have nice things.
        
         | wruza wrote:
         | _I think it's a tragedy that just about every developer uses
         | git but most learn add, commit, branch, and merge and then just
         | stop learning._
         | 
         | This implies that they think in a wrong way, not that they have a
         | wrong tool. A real tragedy is that git took over the world (in
         | minds of lovers of shiny-new things and in saas) without most
         | of the world realizing that they don't even need it, because
         | they wouldn't even like to think in its way. The world wanted
         | quick subversion and instead got this in-all-regards UX
         | monster.
        
       ___________________________________________________________________
       (page generated 2021-04-08 23:00 UTC)