[HN Gopher] Commits are snapshots not diffs (2020)
___________________________________________________________________
Commits are snapshots not diffs (2020)
Author : warpech
Score : 258 points
Date : 2021-04-08 18:02 UTC (4 hours ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| aarchi wrote:
| Whereas in Pijul and Darcs, commits (called patches) are diffs,
| not snapshots. They are based on a sound theory of patches, which
| allows for operations not supported by Git like commuting, as
| long as the commits aren't interdependent. Plus, language-
| specific tools can extend the notion of dependency from line-
| based to semantic.
| luhn wrote:
| > They are based on a sound theory of patches, which allows for
| operations not supported by Git like commuting, as long as the
| commits aren't interdependent.
|
| This is definitely supported by git. Even though commits may
| technically be snapshots, you can build a diff from snapshots
| (and vice versa). `git diff` will get you the diff for any
| given commit, and `git rebase` will happily reorder commits for
| you by reapplying the diffs.
| rnhmjoj wrote:
| When reading a bit about Pijul, a few months back, I had
| assumed _every_ two patches would commute, and I couldn't
| imagine how that could possibly work.
|
| Does it really have this limitation? If so, it doesn't look
| much of an improvement compared to git: I can shuffle "patches"
| all right using `git rebase -i`. I concede it can be quite
| slow, though.
| dan-robertson wrote:
| So not every patch can commute with every other patch:
| "delete foo" doesn't make sense until after "add foo" has
| happened. So patches have dependencies that they must come
| after, but for lots of vc situations, patches are
| independent. Sets of patches make rebasing a branch trivial,
| for example, because adding the patches from master
| _after_ your patches is equivalent to adding them _before_.
| If you would get a merge conflict, you get the same merge
| conflict whether they are added before or after.
|
| But nailing down the logic behind commuting patches can be
| important too as it can catch subtle problems that might
| happen with normal snapshot-based merging. Consider some
| people independently editing branches:
|
| 1. Bob adds a file with line "foo"
| 2. Alice pulls Bob's patch
| 3. Bob changes "foo" to "bar"; Alice changes "foo" to "bar"
| 4. Bob changes "bar" back to "foo"
|
| In Pijul or Darcs you should get a consistent result pulling
| changes from Bob and Alice no matter what order you do it.
| But if you use something like git, the order you pull and
| merge, and if you do it at any intermediate times, might
| change the resulting snapshot (as well as just the history).
| The start and end state of Bob's repo look the same as
| snapshots but they are different because Bob changed his mind
| about the line "bar"--maybe the change didn't work.
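|
| A minimal sketch of the scenario above, assuming a toy three-way
| merge over a single line (the function and variable names are
| hypothetical, not Git's or Pijul's API). It only illustrates how a
| merge that compares end states can give a different result
| depending on when the merge happens:
|

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same line differently."""


def three_way_merge(base: str, ours: str, theirs: str) -> str:
    """Merge one line by comparing two end states against a common base."""
    if ours == theirs:
        return ours      # both sides agree
    if ours == base:
        return theirs    # only "theirs" differs from the base
    if theirs == base:
        return ours      # only "ours" differs from the base
    raise MergeConflict(f"{ours!r} vs {theirs!r}")


# Bob: "foo" -> "bar" -> "foo" (he changed his mind). Alice: "foo" -> "bar".
base, alice = "foo", "bar"

# Order 1: Bob merges Alice only after reverting. His end state equals the
# base, so his branch looks untouched and Alice's edit wins.
merge_late = three_way_merge(base, ours="foo", theirs=alice)

# Order 2: Bob merges Alice while his line is "bar", then reverts on top.
merge_early = three_way_merge(base, ours="bar", theirs=alice)
after_revert = "foo"  # Bob's revert is applied after the merge

print(merge_late, merge_early, after_revert)
```

| In this toy model the late merge ends at "bar" while the early
| merge followed by the revert ends at "foo", which is the kind of
| order dependence the comment describes.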
| tsimionescu wrote:
| It's nice to understand this, but I fail to see it helping much
| in practice. Sure, you'll know why the thing you want to do is
| hard for git to do, but that won't make it much easier.
|
| And without knowing even further implementation details, it's a
| bad idea to rely on this knowledge. For example, the article
| states that committing a rename separately from edits in the
| renamed files helps git track the renames. But that's not
| obviously true from the discussion above, because it's not
| obvious if, when computing a diff between two commits, git will
| follow the entire history or just apply the diff algorithm on the
| two commits.
|
| If it were the latter, then it doesn't really matter which order
| you commit things in, git would simply see commit1: fileA, fileB
| with contents cA and cB; commit2: fileD, fileE with contents cD
| and cE, and would do the quadratic work anyway, even if commit1.5
| had fileE, fileD with contents cA, cB.
| Tomminn wrote:
| Great article but:
|
| "one of my favorite analogies is to think of commits as having a
| wave/particle duality.."
|
| is a hilariously misguided object to build an analogy from.
| Theoretical physicist checking in, and my community has been
| searching for about 100 years for an analogy to explain that
| shit, so it's hilarious to see someone try to use it as a
| concrete object people can use as a touchstone to better
| understand a purely classical database.
| bombcar wrote:
| Read it as "think of commits as <unintelligible bullshit you
| have to take on faith because nobody really understands it>"
| fraculus wrote:
| I think merge commits are key to why "snapshots" are a better
| model than "diffs", and a stronger argument would emphasize this
| more.
|
| Like people have said, the two models:
|
| - a commit is a snapshot plus a pointer to a parent commit
|
| - a commit is a pointer to a parent commit plus a diff
|
| are sort of isomorphic. And some commands in the git porcelain
| (like git cherry-pick, or git rebase) indeed make more sense if
| you think of commits as diffs.
|
| But this isomorphism becomes really strained when you have
| commits with more than one parent (or even zero parents). (And I
| think it's telling that those commands don't play very nicely
| with merge commits or the root commit.)
|
| If you really want to incorporate merge commits and the root
| commit, the alternatives become:
|
| - a commit is a snapshot, together with a list of zero or more
| pointers to parent commits
|
| - a commit is a list of M >= 0 pointers to parent commits,
| together with N > 0 diffs, subject to the invariant that:
|
| a) M = N, except that for exactly one commit, which we will call
| the "root" we are allowed to have M = 0 but N = 1
|
| b) starting from any commit, if you traverse a path back to the
| root commit by following parent pointers, and then sequentially
| (in reverse order) apply, for each commit in the path, the diff
| that corresponds to the parent pointer chosen, then the result of
| composing all those diffs is independent of the path chosen.
|
| And when you put it like that, it's pretty clear that the "diffs"
| model is really impractical, and that's why it's a lot better to
| think of commits as snapshots.
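|
| The snapshot alternative above can be sketched roughly as follows.
| This is an illustrative toy, not Git's object format: `Commit`,
| `diff`, and `diffs_on_demand` are hypothetical names, and a
| snapshot is reduced to a mapping from path to file content. It
| shows that root and merge commits need no special invariant,
| because a diff is just computed per parent when asked for:
|

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]  # path -> file content
FileDiff = Dict[str, Tuple[Optional[str], Optional[str]]]  # path -> (old, new)


@dataclass
class Commit:
    snapshot: Snapshot
    parents: Tuple["Commit", ...] = ()  # zero (root), one, or many (merge)


def diff(old: Snapshot, new: Snapshot) -> FileDiff:
    """File-level diff: every path whose content differs between snapshots."""
    paths = sorted(set(old) | set(new))
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}


def diffs_on_demand(commit: Commit) -> List[FileDiff]:
    """One diff per parent; a root commit is diffed against an empty tree."""
    parents = commit.parents or (Commit(snapshot={}),)
    return [diff(parent.snapshot, commit.snapshot) for parent in parents]


root = Commit({"a.txt": "1"})
left = Commit({"a.txt": "1", "b.txt": "2"}, (root,))
right = Commit({"a.txt": "9"}, (root,))
merge = Commit({"a.txt": "9", "b.txt": "2"}, (left, right))

print(diffs_on_demand(root))
print(diffs_on_demand(merge))
```

| The merge commit yields two diffs, one against each parent, and
| both are derived from the same stored snapshot.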
| slumpt_ wrote:
| Most developers think of commits as diffs and they can for all
| intents and purposes be thought of as such. It's actually best
| for the understanding of how to practically get things done to
| think of them in this way.
|
| Odd semantic argument to make.
| divbzero wrote:
| This is a good overview of Git internals. If this stuff interests
| you, Chapter 10 of _Pro Git_ offers similar descriptions of Git
| objects [1] and Git references [2], and then continues onto Git
| packfiles [3] which are not covered by OP.
|
| [1]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
|
| [2]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
|
| [3]: https://git-scm.com/book/en/v2/Git-Internals-Packfiles
| aequitas wrote:
| This article goes into a little too much detail imho. I have had
| great success explaining Git to coworkers using post-its,
| permanent marker and a flip board (no computer!) and going
| through the steps Git would take (abstractly, not exactly) when
| performing certain commands. All commits (and their relations)
| are written down on the board with the marker because they don't
| change (eg: rebasing just creates a new line of commits). The
| branches are written down on post-its and can move around (like
| this article explains, they are just pointers). You can use a
| whiteboard with non-permanent marking for the working directory
| and index if you want to go that deep.
| grawprog wrote:
| >Commits are snapshots....commits are diffs....
|
| Neither model really encompasses commits for me.
|
| I prefer...
|
| Commits are a point in history I can return to after I inevitably
| fuck up or look back on so I can convince myself, yes I am indeed
| making progress.
| davesque wrote:
| Neat overview of some of the core concepts in Git that often go
| unnoticed. Although I'll say that the fact that commits are
| technically not diffs doesn't seem to matter much in day to day
| use. Git does a decent job of abstracting that detail away to the
| point that you could just as well believe commits are diffs.
| Also, I want to say that technically I believe Git _does_ use
| deltas to compress an object's history in the blob store. But
| the different blobs that comprise an object's history can be
| thought of somewhat as being separate. Git could just as easily
| not perform this internal, space-saving optimization and things
| would all work the same. The SHA hashes would be the same and
| based on the same input.
| breck wrote:
| I think this is incorrect, no?
|
| Can't all commits be turned into patches? Thus, aren't commits
| isomorphic to diffs?
| lann wrote:
| > aren't commits isomorphic to diffs?
|
| Nearly, though renames are only approximately extracted from
| the snapshots.
| jepler wrote:
| If your VC isn't plumbed all the way into your editor, you
| can't tell changing a typo from deleting and re-typing the
| whole file when it comes time to create a delta.
| lann wrote:
| There is a `git mv` command that means "rename". Git
| _could_ (even with its current data model) explicitly
| annotate commits with this intent, but doesn't. I don't
| know how useful that would be compared to the current
| heuristics, but it does mean that git commit "snapshots"
| are not (quite) isomorphic to a diff format (like posix
| diff's) that can explicitly encode renames.
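|
| To illustrate what "approximately extracted" means, here is a
| rough sketch of rename detection between two snapshots. It is an
| assumption-laden toy, not Git's algorithm: `detect_renames` is a
| hypothetical helper, similarity comes from Python's `difflib`, and
| the 0.5 threshold is an illustrative choice:
|

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def detect_renames(old: Dict[str, str], new: Dict[str, str],
                   threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each deleted path with the most similar added path, if any."""
    deleted = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    renames = []
    for src in deleted:
        best, best_score = None, threshold
        for dst in added:
            score = SequenceMatcher(None, old[src], new[dst]).ratio()
            if score >= best_score:
                best, best_score = dst, score
        if best is not None:
            renames.append((src, best))
            added.remove(best)  # each added file is matched at most once
    return renames


before = {"a.py": "def f():\n    return 1\n"}
after = {"b.py": "def f():\n    return 2\n"}
print(detect_renames(before, after))
```

| Nothing in either snapshot records that a rename happened; the
| pairing is a guess based on content similarity, so a heavily
| edited file can look like a delete plus an add.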
| tomtomtom777 wrote:
| Consider a small commit with a spelling error. If I turn this
| into a patch and apply it to another branch it will be a
| _different_ commit even though it is the same patch.
|
| As such, the concept of a "commit" in Git refers to a complete
| state of everything; a snapshot.
| jepler wrote:
| Yes,-ish. However, there's also the question of what operations
| are efficient. (Diff feels very performant in git but) maybe
| having the diffs as the first-class objects enables doing
| something efficiently that git doesn't do. (perhaps,
| identifying when identical patches have occurred in different
| portions of the history?)
|
| I've used several patch-based VCs (RCS and CVS) but I think
| they pre-date this "sound theory of patches" and instead the
| use of patch-style representation was for optimizing storage.
| (just as git uses packs and deltas to optimize storage and
| performance, f'rinstance) So I don't really know what I'm
| missing.
|
| (If the sound theory of patches would let me better understand
| what occurred at a merge commit than git's tooling, that'd be
| just about enough to sell me on switching. except for the
| network effects of git & github.)
| johntb86 wrote:
| That's true, but git doesn't natively have a way to refer to a
| single diff. You can use a hash to refer to a commit, but that
| depends on the entire history up to that point. If you rebase a
| commit then the hash changes, even if the new commit is
| semantically the same.
| jdoliner wrote:
| It kinda does, `git show <commit>` does exactly what one
| would expect if commits were actually diffs. That is it shows
| the diff between <commit> and its parent.
| jdoliner wrote:
| It's technically correct. The key thing here is that it's
| isomorphic. You can either have a system of commits in which
| diffs are computed between commits or a system of diffs in
| which commits are computed by applying diffs. The trade-offs
| are in performance of various operations, not in the user
| exposed semantics. Git chooses to have its first class object
| be commits and diffs are computed on the fly. So again it's
| technically correct, but in practice commands like cherry-pick,
| which treats a commit like a diff between that commit and its
| parent really blur the line. I think in reality you can be a
| really advanced git user and not even realize that there's a
| difference between a commit-based and diff-based version control
| system, because in practice there really isn't much of a
| difference.
| divbzero wrote:
| If you inspect the files in the .git directory, you'll find
| commits stored as trees of directory and file objects. But it
| is true that they can be converted to diffs on the fly, which is
| exactly what the git show command does.
| maweki wrote:
| I think we're running into a naming issue here. It's useful to
| think of a single commit in itself as a diff. The DAG is a useful
| model for an accumulation of changes. The question is, what
| changes and operations make up a node in the DAG (i.e. what code
| is in this branch, compared to that? What code do they have in
| common)?
|
| To answer this: take the node and follow along the predecessor
| until you get one (or more) roots. All commits along the root are
| contained in the commit at hand. That's the history.
|
| Adding changes is, I think, the most useful mental model, even if
| it is not the implementation.
|
| Now what the author is saying is: A commit is not only the diff,
| but also the whole tree/history that the diff is based on. And
| that is also true and then the commit (the adding plus the past)
| is a snapshot.
|
| Do we have a good naming convention for the single node in the
| tree with its changes, compared to the single node in the tree
| with its changes AND the references to the parents with all their
| changes etc.?
| derriz wrote:
| > It's useful to think of a single commit in itself as a diff.
|
| Except if it has multiple parents like a merge commit.
|
| Actually I don't agree even in general. It took me an
| unreasonably long time to become unafraid of git because I
| clung to the common VCS mental model where commits were actually
| diffs.
| maweki wrote:
| But the snapshot-model also doesn't really make a lot of
| sense for merges. It's a snapshot of what then? A merge of
| all parent trees? What's a merge of two files then? Defining
| this merge-operation on trees is at least as mentally taxing
| as the alternative.
|
| Accumulating all the diffs from two (or more) ends (until
| they are common again) is at least as useful.
| cesarb wrote:
| > But the snapshot-model also doesn't really make a lot of
| sense for merges. It's a snapshot of what then?
|
| It's a snapshot of the final result.
|
| That's the beauty of the "commit as snapshot" model: each
| commit always contains the final result of the commit. It
| doesn't matter if the commit is a normal commit with a
| single parent, a merge commit with multiple parents, or
| even an initial commit with zero parents. It doesn't matter
| if the parent commits are unavailable (shallow
| repositories). It doesn't matter if the parent commits have
| been changed (grafts).
| derriz wrote:
| For me, a merge commit in git is just a snapshot like any
| other except that its metadata contains links to more than
| one parent.
|
| The parent-child relationship acts as nothing more than a
| remark that the child was derived from both parents in some
| way.
|
| Of course, commonly the child is derived by finding the
| most recent common parent, using heuristics to guess file
| identities after any renaming and then performing a 3-way
| line-based diff between what it thinks are corresponding
| files.
|
| But actually git doesn't really care - it's just another
| snapshot you've created and added to the DAG.
|
| I haven't found it helpful to think of what's going on in
| git in terms of an "accumulated file diffs" abstraction
| because git has no notion of file identity (across
| commits).
| viraptor wrote:
| You can have a diff to multiple parents - you get multiple
| status columns then. Similar to what you see in the diff in
| merge issues.
| zwieback wrote:
| Cherry-pick is what messes up the commit-as-snapshot idea for me.
| If I see a small commit that I feel I can merge into my branch
| then that commit feels like a diff and I don't want to care about
| the rest of the stuff that commit snapshots. I guess that's a
| good thing.
| SamBam wrote:
| I tend to agree.
|
| I am not someone who has a deep understanding of the inner-
| workings of git by any means, yet I am perfectly comfortable
| with rebasing and cherry-picking.
|
| For me, git is so much easier to intuit if I only think of it
| as diffs. When I rebase, I'm just rearranging diffs, or
| squashing them together, or whatever. If I try and think of
| everything as snapshots it actually gets more confusing for me.
| jasonwatkinspdx wrote:
| So, I think a useful simple way to think of it is "git
| creates diffs when it needs to on demand."
|
| When you're doing a cherry pick of say commit ce123, what
| you're asking git to do is: 1. Diff ce123 against its parent
| 2. Go apply that diff to some other branch
|
| Likewise rebasing is the same, but with an extra step to
| apply the inverse of the diff to the original commit first,
| then rewrite the history.
|
| One of the big advantages of this on demand diffing approach
| is it's much more robust vs conflicts. Back in the subversion
| days I wrote some shell scripts that did the equivalent of
| git cherry pick and rebase. I'd keep a couple extra copies of
| a checkout, would use the switch command to quickly put them
| into a specific state, then would just generate a diff
| manually to apply to my main working copy. It worked, and was
| often faster than manually copying text around between editor
| windows, but it was extremely conflict prone.
|
| So this distinction, of whether you store snapshots and diff
| on demand, or store diffs and snapshot on demand, is somewhat
| subtle but has important consequences.
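|
| The two cherry-pick steps above can be sketched at file
| granularity. This is a simplified illustration with hypothetical
| names (`make_patch`, `apply_patch`, `cherry_pick`); Git's real
| cherry-pick goes through its merge machinery and works on line
| content, which this toy does not attempt:
|

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]  # path -> file content
Patch = Dict[str, Tuple[Optional[str], Optional[str]]]  # path -> (before, after)


def make_patch(parent: Snapshot, commit: Snapshot) -> Patch:
    """Step 1: diff the commit's snapshot against its parent's snapshot."""
    return {p: (parent.get(p), commit.get(p))
            for p in set(parent) | set(commit)
            if parent.get(p) != commit.get(p)}


def apply_patch(target: Snapshot, patch: Patch) -> Snapshot:
    """Step 2: apply that diff to another snapshot, refusing on mismatch."""
    result = dict(target)
    for path, (before, after) in patch.items():
        if result.get(path) != before:
            raise ValueError(f"conflict in {path}")
        if after is None:
            result.pop(path, None)   # the commit deleted this file
        else:
            result[path] = after     # the commit added or changed this file
    return result


def cherry_pick(target: Snapshot, parent: Snapshot, commit: Snapshot) -> Snapshot:
    return apply_patch(target, make_patch(parent, commit))


parent = {"a": "1", "b": "x"}
commit = {"a": "2", "b": "x"}
branch = {"a": "1", "c": "y"}
print(cherry_pick(branch, parent, commit))
```

| Only the stored snapshots are inputs; the patch exists just long
| enough to be applied.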
| caterama wrote:
| Since you can go from diffs to snapshots, and snapshots to
| diffs, aren't they basically equivalent? I'm struggling to
| see the important consequences at the user level.
| viraptor wrote:
| You can't go from diffs to snapshots. Two identical diffs
| can be applied on different branches - looking just at
| the diffs, you don't know which branch it is.
| zwieback wrote:
| Yeah, good summary.
| ChrisMarshallNY wrote:
| That's a cool explanation.
|
| I'm a bit slow on the uptake, so I had to re-read a couple of
| sections, but it was helpful.
| whack wrote:
| From a storage perspective, describing commits as snapshots seems
| like a bad mental model. Suppose I have a directory that is 100MB
| in size. If I take a snapshot of it, my snapshot would be 100MB
| in size. If I take a 2nd snapshot of it tomorrow, my 2nd snapshot
| would also be 100MB in size. My total storage needs would now be
| 300MB.
|
| Whereas if I had used git, and created 2 additional commits, each
| making a change to a small text file, my total storage size would
| be barely larger than 100MB. Describing the commits as a diff, as
| opposed to a snapshot, leads to a better intuitive understanding
| of why this would be the case.
|
| Not to mention other features the article discussed, such as
| cherry-picking. What does it even mean to "cherry-pick a
| snapshot"? In comparison, cherry-picking a diff and applying it
| to your current state, is far more intuitive.
|
| And let's not forget commit messages. If a commit is a snapshot,
| I would expect the commit-message to be descriptive of the entire
| snapshot. Whereas if a commit is a diff, I would expect the
| commit message to be descriptive of the diff. Which is exactly
| how most people use commit messages.
|
| Obviously both "diffs" and "snapshots" are leaky abstractions. If
| you insist on using the "snapshot" abstraction, you will need to
| resolve all of the above points of confusion by adding more
| complexity to your abstraction. And if you prefer to use the
| "diff" abstraction, you will eventually need to explain that a
| commit is actually a combination of diffs, along with some other
| metadata like a pointer to a parent commit. As a teaching tool,
| you can make either abstraction work. But I find it far more
| intuitive and useful to think of commits as "diffs + some
| metadata".
| jayd16 wrote:
| Depends on the diff. If the diff is not aligned by bits a
| single bit offset might cause double the size, ie the full file
| to delete and a full file add.
|
| >If you insist on using the "snapshot" abstraction
|
| But it's not insisted. Both abstractions are used as needed.
| NTARelix wrote:
| After going through the "Git Internals"[0] docs, I found that
| the snapshot mental model has been much more helpful in
| understanding what my Git commands are doing, how someone's
| history got into a confusing state, etc. The primary model is
| that of the Merkle tree, and subsequently hashing, which are
| very simple and powerful concepts.
|
| [0]: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-
| Po...
| goerz wrote:
| You can still think of them as snapshots. Git just does
| compression on the entire folder of snapshots, including de-
| duplication of data that doesn't change between snapshots.
|
| In fact, when I teach git to students, I don't even bother with
| the trees/blobs, which in my view are just an implementation
| detail. I just tell them to think of git zipping up their
| working directory together with some metadata (commit message,
| reference to parents), and putting that zip file into its own
| "compressed" storage inside the .git directory. That seems to
| be sufficient for a good mental model of how to work with git
| (independently of git's somewhat baroque command line
| interface, which just takes getting used to)
| gilbetron wrote:
| So it only stores the _diff_ erence between the two
| snapshots? ;)
| planckscnst wrote:
| No, it stores an entirely new set of references to objects,
| as well as some of those objects themselves (any that are
| not identical to previously stored objects).
|
| You cannot look at a commit on its own and know exactly how
| it's different from the previous commit, but you do have
| the complete new state. You have to look at the parent
| commit's references and do an object-by-object comparison
| to identify exact changes. On the other hand, when you look
| at a diff, you can see exactly what has changed, but you
| cannot produce the version that came before without also
| having a complete copy of the current version.
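|
| A small sketch of "new references plus only the new objects",
| assuming a toy content-addressed store. `ObjectStore` and its
| `commit` method are hypothetical and skip tree and commit objects
| entirely; the blob id, however, is computed the way Git names
| blobs (SHA-1 over a "blob <size>\0" header followed by the data):
|

```python
import hashlib
from typing import Dict


def blob_id(data: bytes) -> str:
    """Hash content the way Git names a blob: SHA-1 over header plus data."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Toy store: objects are keyed by the hash of their content."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def commit(self, files: Dict[str, bytes]) -> Dict[str, str]:
        """Store a snapshot; return its full path -> object id mapping."""
        tree = {}
        for path, data in files.items():
            oid = blob_id(data)
            self.objects.setdefault(oid, data)  # identical content stored once
            tree[path] = oid
        return tree


store = ObjectStore()
first = store.commit({"a": b"one\n", "b": b"two\n"})
second = store.commit({"a": b"one\n", "b": b"TWO\n"})
third = store.commit({"a": b"one\n", "b": b"two\n"})  # undo the change
print(len(store.objects))
```

| Each snapshot lists every path, yet three snapshots leave only
| three objects in the store, and undoing the change adds none.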
| dbt00 wrote:
| The implementation-specific compression doesn't store
| deltas or diffs, it stores unique blocks of text.
|
| Git allows for shallow clones, which would be impossible if
| the protocol or implementation were based solely around
| diffs.
| detaro wrote:
| No. _If_ it _chooses to_ compress the commits, which, if I
| remember correctly, it does not do automatically for each
| commit, but rather occasionally as a larger step, it uses
| the difference to whatever it deems to be a good candidate,
| if it finds one. E.g. if you have a file in commit A,
| change it massively in later commit B, and then on a
| different branch create commit C that also changes the file
| to one very similar to the one in B, git might very well
| compress C by storing the difference from B to C, despite
| those having no direct relationship in the commit graph. It
| can also choose to not use a delta to a different version
| entirely, and this is 100% an internal implementation
| detail of the storage system in git (afaik one of those
| implementation details is that it prefers candidates that
| are in the same commit chain, but it doesn't have to - and
| it can easily jump multiple commits if that works better).
| If you ask git to show you a diff to the previous commit,
| it does not pull a diff from storage, but pulls two file
| versions from its storage backend (which if deltas have
| been used to store will resolve those) and diffs them.
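|
| The idea that a delta base is picked by similarity, not by
| position in the commit graph, can be sketched like this. It is a
| loose illustration under stated assumptions: `pick_base`,
| `make_delta`, and `apply_delta` are hypothetical helpers built on
| Python's `difflib`, and the copy/insert instruction list only
| resembles the spirit of a packfile delta, not its byte format:
|

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, str]]  # ("copy", i, j) or ("insert", text)


def pick_base(target: str, candidates: Dict[str, str]) -> Optional[str]:
    """Choose the most similar stored version, ignoring history entirely."""
    best, best_ratio = None, 0.0
    for name, text in candidates.items():
        ratio = SequenceMatcher(None, text, target, autojunk=False).ratio()
        if ratio > best_ratio:
            best, best_ratio = name, ratio
    return best


def make_delta(base: str, target: str) -> List[Op]:
    """Describe target as copies from base plus literal inserted text."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": carry the new text literally
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: that range is simply never copied
    return ops


def apply_delta(base: str, delta: List[Op]) -> str:
    """Rebuild the full target from the base and the instruction list."""
    parts = []
    for op in delta:
        parts.append(base[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(parts)


a = "line1\nline2\nline3\n"
b = "alpha\nbeta\ngamma\ndelta\n"
c = "alpha\nbeta\ngamma\nDELTA\n"  # lives on another branch, but resembles b

base_name = pick_base(c, {"A": a, "B": b})
delta = make_delta(b, c)
print(base_name, apply_delta(b, delta) == c)
```

| Reading version C back means resolving the delta into the full
| file first; any diff shown to the user is then computed from full
| file versions, as the comment says.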
| redisman wrote:
| I don't know that you need to teach them any of that. Version
| control is an abstraction. I have no clue what happens under
| the hood and I don't care.
| shuntress wrote:
| To some extent, this is true. I don't feel the need to
| totally understand git's packing logic or the specific
| mechanics of the various diff/merge algorithms.
|
| But some knowledge of how/why your tools work the way they
| do can be very helpful.
|
| Some knowledge of a tool's internal workings can be
| fundamental to efficient use of that tool. At the very
| least it can allow you to understand or derive your useful
| interactions with that tool rather than simply memorize how
| it is used.
| hmsimha wrote:
| This is the thing though. You're talking about snapshots
| which actually have duplication removed... in my mind this
| really fits more with the 'diff' model. I've already done the
| exploratory diving-into-git-internals thing years ago, so I
| could develop a better understanding of how things actually
| work.
|
| But for newcomers who want to understand how git is working,
| it really makes more sense to tell them it's 'like a diff.
| Not exactly under the hood, but think of it like a diff for
| now'. This is what I've been telling people as I've mentored
| a number of people in getting acquainted with git over the
| years, and if they're curious enough to look under the hood,
| they'll get a better understanding of the internals.
|
| As a programmer, what you're working with is essentially the
| diff. This is the easiest way to think about things
| initially. The fact that git is storing blobs under the hood,
| shallowly deduplicating blobs but still storing large chunks
| separately that may contain duplicate data, until it
| generates packfiles which do a deeper
| deduplication/compression, is really not that helpful.
| Telling people it's more like zipping is a bit disingenuous
| because it doesn't really explain how things are compressed
| more efficiently over the course of _many_ changes.
|
| If I have a 1MB code file and make 1000 commits of one-line
| changes then sure, git is initially storing large blobs
| representing those, but then will compress over the change
| set when it generates the packfile.
|
| Compared to making a zip of the file for every change (say
| these are 100KB compressed) and now you have people thinking
| the 1000 one-line changes generate 100MB in the .git
| directory.
|
| You may think that a 1MB file with many smaller changes is a
| fabricated example, but consider that dependency lockfiles
| (package-lock.json I'm looking at you) can easily grow to
| this size, and contain this many changes.
| formerly_proven wrote:
| The "snapshots which are stored as deltas, if that works"
| part is _unrelated_ to the diffs the git porcelain
| generates for you when you do a git-diff or git-show. The
| former is purely an implementation detail of the storage
| (albeit an important one), while the latter is entirely
| virtual, calculated from the snapshots every single time
| you view the data. That's why operations like git-diff and
| git-blame can take some time on large trees or histories
| (and why e.g. git-blame has various options to tweak how it
| tracks files across revisions, because that is not
| something git does), while git-log is fast.
| ako wrote:
| Not really: if you do a checkout of a snapshot into an
| empty directory, you expect the entire state at the time of
| the snapshot, not just the diffs.
| goerz wrote:
| It may depend on the background of who you're talking to.
| Programmers may be very comfortable with diffs, but non-
| programmers (in my case, physics graduate students) usually
| aren't. On the other hand, everybody is familiar with
| snapshots: even high school students will end up with
| "report_v1.docx", "report_v2.docx", etc, which are
| snapshots at the file level (and work reasonably well as
| long as you have a consistent scheme and don't need
| branches). I've also routinely seen less-technical people
| organize their research / paper writing by making a weekly
| snapshot of their work folder ("project-2020-04-1").
| Telling these people that git basically does the same thing
| for them automatically with a tree-like "labeling scheme"
| that allows for branches tends to go over quite well, in my
| experience. For actual programmers, I'd be inclined to
| give them a more technical introduction to git's internals.
| I'd still point out that git stores compressed snapshots,
| not diffs (especially if they're older and may have
| previous SVN experience)
| hmsimha wrote:
| Those non-programmers are likely going to have a worse
| understanding of what is happening when you zip/compress
| something anyway, but I concede this is probably the most
| straightforward path if they have some understanding of
| what a zip is, and can't understand what a diff is. But
| even then I question if they should be using git, since
| `git diff`, `git show`, basically everything git exposes,
| is going to show them diffs.
| sagonar wrote:
| A storage format based on pure diffs would be impossible to
| recover if you get an error in any commit. It would also be much
| slower to examine the data, and newer version control systems do
| not use pure diffs.
|
| The Mercurial version control system had a description of
| these problems on its homepage, "behind the scenes", which was
| good reading.
|
| I am not sure if Git is the best solution, but at least a
| "pure snapshot" approach is okay, whereas diff storage must in
| practice include some snapshot logic as well.
| gowld wrote:
| As a programmer I care about diffs only when I am comparing
| two versions. A commit creates a new version. "Snapshot" is
| a distraction.
| goerz wrote:
| Also (for less-technical audiences), I don't exactly dwell
| on the de-duplication. It's just "Git makes snapshots and
| puts them into .git in some efficient way. Don't worry
| about it. Or, if you want the details, read the Git SCM
| book."
| bosswipe wrote:
| The diff mental model doesn't work for things like `git
| checkout <commit>`.
| hmsimha wrote:
| I actually haven't had a problem with this, though
| perhaps it's because I understand what's happening at a
| deeper level. You're generally referencing commits which
| exist somewhere in this family of commits you can view
| with `git log --graph`. You can easily think of checkout
| as the path of diffs to get there. Files at commits are
| still whole objects, mentally, but the thing we care
| about as programmers working with multiple versions are
| the diffs.
|
| I have had it break down a bit more when working with
| stash though, because now the object you're referencing
| can exist outside of that graph-like commit family.
| dfox wrote:
| If you commit a 100MB file, change a few bytes in it and commit it
| again your .git/objects will almost certainly contain two 100MB
| objects. The fact that it is somewhat likely that running "git
| gc" or something similar will convert one of them into
| reference to the other one and some compact representation of
| the difference is an implementation detail.
|
| While a commit object does represent the snapshot it also
| references the previous state, thus the commit message usually
| describes what was changed between the referenced snapshot and
| the parent(s) that are also referenced from the commit object.
|
| As for the overall model and leakage between implementation
| details and how people use it, an interesting approach is used by
| SCCS/BitKeeper with its internal "weave" format that
| essentially is both snapshot and diff at the same time.
| jedimastert wrote:
| I prefer to think of a repo as a whole as a tree, where the
| nodes are snapshots and the edge between each pair of nodes is a
| diff. This sort of lands us in both places.
| cesarb wrote:
| > From a storage perspective, describing commits as snapshots
| seems like a bad mental model. Suppose I have a directory that
| is 100MB in size. If I take a snapshot of it, my snapshot would
| be 100MB in size. If I take a 2nd snapshot of it tomorrow, my
| 2nd snapshot would also be 100MB in size. My total storage
| needs would now be 300MB.
|
| That's not what one would expect. Suppose I have a directory
| that is 100MB in size. If I take a snapshot of it ("btrfs
| subvolume snapshot"), my snapshot would be 100MB in size, but
| the storage needed for the original and the snapshot together
| would still be 100MB (plus a few kilobytes of overhead). If I
| take a second snapshot of it tomorrow ("btrfs subvolume
| snapshot" again), my second snapshot would also be 100MB in
| size, and my total storage needs would still be 100MB (plus a
| few kilobytes of overhead).
|
| If I made a change to a small text file before each snapshot,
| my total storage size would still be barely larger than 100MB.
|
| That is, when creating a snapshot, one would expect it to be
| copy-on-write. While not exactly what git does (it's a content-
| addressable storage instead of a copy-on-write storage), the
| end effect is similar enough for most purposes (the main
| difference being that undoing a change in git would not need
| extra storage, while a copy-on-write storage would store a new
| copy of the contents).
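|
| A minimal sketch of that last difference, assuming a toy in-memory
| store (the `ObjectStore` class is made up for illustration; only
| the blob naming, SHA-1 over a "blob <size>\0" header plus the
| content, follows what git does for blobs):

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name content the way git names a blob: sha1(b"blob <len>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
v1 = store.put(b"hello\n")
v2 = store.put(b"hello, world\n")  # a change adds one new object
v3 = store.put(b"hello\n")         # undoing the change adds nothing new
```

| After the three `put` calls the store holds two objects, because
| the reverted content hashes to the same id as the first version.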
| barrkel wrote:
| Copy on write filesystems describe changes as a structural
| diff, effectively.
| klodolph wrote:
| That's not really true. The copy-on-write filesystems just
| allow multiple files to reference the same blocks, and only
| allow modifications to blocks if the refcount is 1. At
| least, at its simplest, that's how copy-on-write works. To
| copy a file, you copy the block references and increment
| the reference counts. You won't end up with a diff or
| deltas stored anywhere.
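|
| Roughly, and only as a sketch of the simplest scheme described
| above (the `BlockStore` class and its method names are
| hypothetical, not any real filesystem's API):

```python
class BlockStore:
    """Toy copy-on-write block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.refcount = {}   # block id -> number of files referencing it
        self.next_id = 0

    def alloc(self, data: bytes) -> int:
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refcount[bid] = 1
        return bid

    def copy_file(self, block_ids: list) -> list:
        # Copying a file copies only the references and bumps the refcounts.
        for bid in block_ids:
            self.refcount[bid] += 1
        return list(block_ids)

    def write(self, block_ids: list, index: int, data: bytes) -> None:
        bid = block_ids[index]
        if self.refcount[bid] == 1:
            # Sole owner: modifying the block in place is allowed.
            self.blocks[bid] = data
        else:
            # Shared block: drop our reference and write to a fresh block.
            self.refcount[bid] -= 1
            block_ids[index] = self.alloc(data)


store = BlockStore()
original = [store.alloc(b"aaaa"), store.alloc(b"bbbb")]
snapshot = store.copy_file(original)   # no block data is duplicated
store.write(original, 0, b"AAAA")      # only the touched block is copied
```

| The snapshot still sees the old first block, the untouched second
| block stays shared, and nothing resembling a diff is stored.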
| whack wrote:
| I've learnt something new today, thanks for sharing. Looks
| like I had a naive understanding of how snapshotting actually
| works.
|
| I still think that it's more intuitive to describe commits as
| diffs, in the context of things like cherry-picking a commit
| or rebasing/reordering a series of commits.
|
| But given that you can also "check out" a commit, in order to
| get a specific snapshot of the repo, I can see the parallels
| between commits and snapshots. Maybe both analogies are
| equally useful in describing the different features that git
| provides.
| diroussel wrote:
| The point of the article is not an analogy. Git is based on
| snapshots. And diffs are computed from snapshots as needed.
|
| The snapshots are also de-duplicated and compressed, but
| that is not important.
|
| The article is a good one. And if you spend the time to
| understand git it gets easier to use.
| breischl wrote:
| >I still think that it's more intuitive to describe commits
| as diffs, in the context of things like cherry-picking a
| commit or rebasing/reordering a series of commits.
|
| If I understood the article correctly, those things
| actually are implemented via diffs. It's just that the
| diffs are calculated on-the-fly, used to create a new
| snapshot, and then discarded.
| AaronFriel wrote:
| It's helpful to understand git in terms of the "porcelain"
| and the "plumbing".
|
| The git commands you know and love are largely the
| porcelain, nice fixtures over other things. When you "git
| cherry-pick", under the hood what it's actually doing is
| querying that commit's parent(s), finding the diff the
| commit introduced relative to its parent(s), and then
| applying those same changes to the index and your working
| tree.
|
| Cherry-pick is porcelain on top of the plumbing.
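|
| A heavily simplified sketch of that flow, assuming snapshots are
| plain dicts of path -> content and changes are tracked per whole
| file (the function names are hypothetical; real git merges file
| contents and detects conflicts, which this does not):

```python
def diff_snapshots(parent: dict, commit: dict) -> dict:
    """File-level diff: path -> new content, or None if the file was deleted."""
    changes = {}
    for path in parent.keys() | commit.keys():
        if parent.get(path) != commit.get(path):
            changes[path] = commit.get(path)
    return changes


def apply_changes(snapshot: dict, changes: dict) -> dict:
    """Return a new snapshot with the changes applied; the input is untouched."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def cherry_pick(head: dict, parent: dict, commit: dict) -> dict:
    # The diff is derived from two snapshots and exists only during this call.
    return apply_changes(head, diff_snapshots(parent, commit))


parent = {"a.txt": "one", "b.txt": "two"}
commit = {"a.txt": "one", "b.txt": "TWO", "c.txt": "three"}
head = {"a.txt": "uno", "b.txt": "two"}
picked = cherry_pick(head, parent, commit)
```

| The result is a new snapshot: `head` plus whatever `commit` changed
| relative to `parent`. No diff is stored anywhere afterwards.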
|
| There are a few "write git yourself" tutorials out there,
| of which "Write yourself a Git!" is I think the most
| popular. In it, you'll learn how git really stores data,
| and you'll write a (fairly basic) git client that can do
| several things to locally manage a repository.
|
| Write yourself a Git!: https://wyag.thb.lt/
| spuz wrote:
| The correct way to think about snapshots and diffs when it
| comes to cherry-picking and rebasing is to realise that
| diffs are always derived from snapshots. I.e. the
| fundamental data-structure is the snapshot and from those
| we can build diffs. Those diffs are necessary to implement
| cherry-picking and rebasing but it's also possible to
| imagine an implementation of git that has those features
| missing. It would still fundamentally work in the same way
| - it would just be slightly less useful.
|
| Edit: If you think this is just splitting hairs, I
| encourage you to look at the differences between git and
| pijul which is a VCS where the fundamental building block
| is diffs: https://pijul.com/
| throwaway894345 wrote:
| Copy-on-write is an implementation detail that allows for
| lower storage. The snapshot is still the full copy. One could
| try to argue that the same is true for git in that diffs (or
| content addressable storage) are just an internal
| implementation detail, but as the parent pointed out that's
| not quite true--our commits document the diff, not the
| materialized snapshot.
| crazygringo wrote:
| Clearly people are using two diametrically opposed
| definitions of snapshot.
|
| If a snapshot is defined as _opposed_ to a diff, then it's
| clear snapshot means "full copy". If I snapshot the state of
| my cloud server, it creates a full copy of its disk in block
| storage somewhere, and takes several minutes to complete.
|
| You are describing snapshots that exist _as part of a diff
| system_ or copy-on-write system, where they use virtually no
| storage at all, because further changes are assumed to be
| applied _as diffs_ rather than overwriting previous data.
| Where the snapshot is a "marked" diff that can specifically
| be rewound to, as opposed to a general ongoing stream of
| diffs.
|
| But that's a more advanced and system-specific definition of
| snapshot.
|
| As a general mental model, when you say "think of it as a
| snapshot not a diff", I think it's clear that the former
| definition is being used, and that the expectation is a full
| copy that takes up disk space. Because otherwise, in the
| second case, all the snapshots _are_ just the most recent
| diff (on top of the entire prior history), so the sentence
| "think of it as a snapshot not a diff" doesn't really mean
| anything. The snapshot and the diff are the same.
| klodolph wrote:
| > If I snapshot the state of my cloud server, it creates a
| full copy of its disk in block storage somewhere, and takes
| several minutes to complete.
|
| Which cloud provider are you using? Neither Amazon nor
| Google take snapshots this way. Amazon EBS and Google
| Persistent Disk both use copy-on-write semantics for
| snapshots. If you take a hundred snapshots of a 100 GB
| disk, your total usage is 100 GB plus metadata. When you
| run a VM instance from that disk, the storage usage will
| increase as blocks change, to a maximum of 200 GB total
| storage (for live disk + out of date snapshot).
|
| When I use QEMU or VirtualBox at home, I also get copy-on-
| write snapshots of disks, although it's certainly possible
| to get a full copy if you want. I think the feature is
| pretty standard.
| crazygringo wrote:
| Digital Ocean. It absolutely takes snapshots by making a
| full copy:
|
| https://docs.digitalocean.com/products/images/snapshots/
|
| So this is a perfect example of what I mean by the word
| "snapshot" being used in two different ways by different
| people.
|
| Snapshot meaning "full copy" is one usage (Digital
| Ocean), snapshot meaning "diff checkpoint" is another
| usage (Google, AWS).
| klodolph wrote:
| Those aren't different definitions of "snapshot", though.
| crazygringo wrote:
| Of course they're different. They have different
| meanings, so they're different definitions.
|
| It's not like it's the same concept with different hidden
| implementation details.
|
| On Digital Ocean, I can delete the server but I still
| have the snapshot. On the others, you can't. One copies,
| the other bookmarks.
|
| They're _entirely_ different concepts, therefore
| different definitions.
| Ericson2314 wrote:
| > Obviously both "diffs" and "snapshots" are leaky
| abstractions.....
|
| Joel Spolsky wrote many great things, but "all abstractions
| leak" was not one of them (edit: his, but not good). I am very
| tired of programmers excusing their poor imagination with
| appeals to this nonsense.
|
| ------
|
| Commits store snapshots. Full stop.
|
| The "bad mental model" is not commits being snapshots, but
| things being stored individually, i.e.
|
| > Sum |things| = |Product things|
|
| This comes up in many other contexts, especially when storage
| quotas are involved and it's unclear what to do when storage is
| deduped across quotas.
|
| -----
|
| git packfiles do use a delta encoding, but it's important to
| understand that there isn't necessarily any correspondence
| between the history and the delta encoding. In fact, commands
| like `git repack` exist _precisely_ to avoid path dependency
| issues from the repacks matching the history too much.
|
| Saying commits are diffs to explain the delta-encoding storage
| characteristics is wrong and confuses, not clarifies.
|
| ------
|
| > And let's not forget commit messages. If a commit is a
| snapshot, I would expect the commit-message to be descriptive
| of the entire snapshot. Whereas if a commit is a diff, I would
| expect the commit message to be descriptive of the diff. Which
| is exactly how most people use commit messages.
|
| It's git tree objects that are snapshots; commit objects have
| a tree child and a previous-commit child, so it is natural for them
| to describe the relationship between two states without
| appealing to hypothetical alternatives.
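|
| As a rough sketch of that shape (the `Commit` class is
| hypothetical and the serialization is simplified: real git commit
| objects also carry author and committer lines, so these ids will
| not match real git ids, and "t0"/"t1" are placeholder tree ids):

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    tree: str        # id of the snapshot (tree object)
    parents: tuple   # ids of zero or more parent commits
    message: str     # usually describes tree vs. the parents' trees

    def object_id(self) -> str:
        # Simplified: header lines, a blank line, then the message.
        lines = ["tree " + self.tree]
        lines += ["parent " + p for p in self.parents]
        body = ("\n".join(lines) + "\n\n" + self.message).encode()
        return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


root = Commit(tree="t0", parents=(), message="initial import")
child = Commit(tree="t1", parents=(root.object_id(),), message="fix typo in README")
```

| The commit holds no diff at all, only pointers: one to a snapshot
| and one to each parent. The message is free to describe the
| relationship between the two states.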
|
| > Not to mention other features the article discussed, such as
| cherry-picking. What does it even mean to "cherry-pick a
| snapshot"? In comparison, cherry-picking a diff and applying it
| to your current state, is far more intuitive.
|
| I might `git checkout somethingelse .` mid-rebase. What does
| that mean if commits are diffs? Nothing very clear. The better
| thing to teach people is about darcs and patch theory and those
| other models. I think the git model and the patch theory model
| both have uses, and the fact that git makes people always work
| in the git model is a fundamental issue that cannot be fixed
| with analogies.
|
| - Patch theory is good for the things you are still working on
|
| - merkle dag of states is good for the things you've already
| done / agreed upon.
| gowld wrote:
| > All non-trivial abstractions, to some degree, are leaky.
|
| You look a bit silly making grandiose comments that take one
| web search to disprove
|
| https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...
|
| > All non-trivial abstractions, to some degree, are leaky.
| heinrich5991 wrote:
| I think the emphasis was on "great". I.e. your parent
| wants to say that this thing Joel wrote was not great.
| breischl wrote:
| I'm fairly certain he was disagreeing with the content of
| the statement, not that Joel Spolsky wrote it.
|
| ie, yes Spolsky said that, but he was wrong.
| Ericson2314 wrote:
| Yes, thanks
| LukeShu wrote:
| _> Suppose I have a directory that is 100MB in size. If I take
| a snapshot of it, my snapshot would be 100MB in size._
|
| Not with `btrfs subvolume snapshot`, it won't. If that's not a
| snapshot, I don't know what is.
|
| From a storage perspective, no dammit, Git commits _are
| snapshots_, look at the bits on disk if you don't believe it.
| This isn't something that people who like to write blog posts
| about Git made up for pedagogical purposes, it's how Git
| _actually works_.
|
| As you point out, it's wonky for pedagogical purposes; what
| does it mean to "cherry-pick" a snapshot? When thinking about
| cherry-picking, yeah, a diff makes more sense than a snapshot.
| But saying a diff is better pedagogically doesn't change the
| fact that a commit _actually is_ a snapshot (and when cherry-
| picking, it diffs two snapshots to create a patch, then applies
| that patch).
| jgraham wrote:
| > From a storage perspective, no dammit, Git commits are
| snapshots, look at the bits on disk if you don't believe it
|
| Except they're not. They're (often) packfiles, which are a
| delta encoding i.e. a diff. It's not necessarily the same as
| a specific commit, but appealing to "the bits on disk" is
| wrong.
|
| It is certainly true that in the git object model each commit
| object refers to a tree that represents the complete state of
| the repository at that commit.
|
| It is also true that _many_ git commands implicitly treat a
| commit as being the diff between the state of the tree in
| that commit and the state in the parent. For example git
| show, git rebase and git cherry-pick.
|
| It is simultaneously true that the on-disk storage system is
| optimised for performance and so doesn't map onto the object
| model in a trivial way.
| LukeShu wrote:
| _> They're (often) packfiles, which are a delta encoding
| i.e. a diff. ...appealing to "the bits on disk" is wrong._
|
| That's fair. The diffs in a packfile have no relation to
| the "diff" that a commit would be if the commit were a
| diff; so it's wrong to use "but packfiles" when arguing
| that commits are diffs and not snapshots; but you're right,
| packfiles make my "bits on disk" argument not quite right.
|
| The way I look at it is that packfiles are a compression
| mechanism; and they don't alter the fact that fundamentally
| it's snapshots that are being compressed. But that's not
| the only way of looking at it.
| nyanpasu64 wrote:
| > It is also true that many git commands implicitly treat a
| commit as being the diff between the state of the tree in
| that commit and the state in the parent. For example git
| show, git rebase and git cherry-pick.
|
| A commit is a snapshot, and you can compute the diff
| between a commit and any of its parents. If a commit has
| multiple parents, git cherry-pick bails out unless you pick
| a parent (usually -m 1), and git rebase, I think implicitly
| assumes the first parent.
|
| (EDIT: a commit's tree, its parents' trees)
| LukeShu wrote:
| _> If a commit has multiple parents, ... git rebase, I
| think implicitly assumes the first parent._
|
| `git rebase`'s behavior regarding merge commits is
| shockingly complicated, but much of the time: Because by
| default it linearizes the history, it actually just skips
| merge commits because it assumes that the merge has
| already happened implicitly by applying one of the
| merge's parents on top of the other parent.
| kazinator wrote:
| > _What does it even mean to "cherry-pick a snapshot"?_
|
| It means to do something like a three-way diff among three
| snapshots: the cherry-picked baseline, the target, and a common
| ancestor.
|
| You can do something similar with the diff3 tool, which takes
| three files (snapshots) as input, not diffs.
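|
| A minimal sketch of the idea at whole-file granularity, assuming
| snapshots are dicts of path -> content (the function names are
| hypothetical; diff3 and git do this line by line within each
| file, which this sketch does not attempt):

```python
def merge_file(base, ours, theirs):
    """Three-way merge of one file's content; None means the file is absent.

    Returns a (content, conflict) pair.
    """
    if ours == theirs:
        return ours, False      # both sides agree
    if ours == base:
        return theirs, False    # only their side changed
    if theirs == base:
        return ours, False      # only our side changed
    return ours, True           # both changed differently: conflict


def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge three snapshots; returns (merged snapshot, conflicting paths)."""
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        content, conflict = merge_file(
            base.get(path), ours.get(path), theirs.get(path)
        )
        if conflict:
            conflicts.append(path)
        if content is not None:
            merged[path] = content
    return merged, conflicts


base = {"f": "1", "g": "x"}      # common starting point
ours = {"f": "1", "g": "y"}      # the branch being picked onto
theirs = {"f": "2", "g": "x"}    # the snapshot being picked
merged, conflicts = three_way_merge(base, ours, theirs)
```

| Every input here is a snapshot. Which snapshot plays the role of
| the base is what distinguishes merge, cherry-pick and revert.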
| Twirrim wrote:
| > From a storage perspective, describing commits as snapshots
| seems like a bad mental model. Suppose I have a directory that
| is 100MB in size. If I take a snapshot of it, my snapshot would
| be 100MB in size. If I take a 2nd snapshot of it tomorrow, my
| 2nd snapshot would also be 100MB in size. My total storage
| needs would now be 300MB.
|
| That's not the way storage snapshots work under most (all?)
| storage-targeted file systems, filers, etc. What you're talking
| about there is a backup.
|
| Snapshots are not backups. Snapshots work on a "copy on write"
| basis.
|
| Roughly speaking, when you take a snapshot you draw a line in
| the sand. "These were the files at this time". Snapshot
| operations as a result are super cheap and super fast. Future
| changes to those files result in the filer/file system writing
| the modified blocks to new locations, not overwriting the
| original data.
|
| So take a 100MB directory. I create a snapshot. That results in
| almost no new storage usage, just a small amount of metadata. I
| write/modify 10MB of data, now the total storage cost is 110MB.
| If I take another snapshot after writing that 10MB, it's still
| only 110MB of storage usage.
| dbt00 wrote:
| If your filesystem was copy on write and implemented snapshot
| semantics internally (like WAFL for example, over 20 years old
| now), then the second snapshot would not take 100MB, it would
| just cost the metadata.
|
| A commit is a snapshot of a tree with a reference to its prior
| ancestors. It's important to know that because it becomes
| extremely relevant when trying to do things like merges
| properly.
| towergratis wrote:
| Look up Copy-On-Write. ZFS and BtrFS do it.
| outworlder wrote:
| Commits are snapshots.
|
| How to represent those snapshots, and fix the storage bloat a
| naive implementation would cause, is a completely different
| problem.
|
| One of the things that makes Git smart is that it doesn't try
| to optimize things prematurely. SVN and co. would store actual
| diff data, but this made some operations really hard to
| implement (and, in many cases, slow).
|
| Git has commits conceptually as snapshots. It's up to the
| storage code to figure out how to deal with this.
|
| > But I find it far more intuitive and useful to think of
| commits as "diffs + some metadata".
|
| Except that this is not what's happening. I wouldn't even call
| it an abstraction, it's how things actually work. What you call
| abstractions are actually operations. If we run a diff we are
| interested in the changes, but if you ask git to show you the
| commit it will show you just that.
|
| If you think a commit is a diff, you have a mismatch between
| the mental model and what's actually happening behind the
| scenes. This will make it difficult to understand concepts
| later on.
| slavik81 wrote:
| > If we run a diff we are interested in the changes, but if
| you ask git to show you the commit it will show you just
| that.
|
| git show <commit SHA-1> will output a diff.
| trulyme wrote:
| I think this is more a sign that git (porcelain) is not
| aligned with the underlying model.
|
| It is actually a pity that so little effort went into git
| UI. I find the OP explanation of git model awesome and the
| presented concepts beautiful, but the cli utility has
| countless naming and consistency problems which make me sad
| that hg didn't win over git. Life would be much simpler for
| many developers if it did, imho.
| munk-a wrote:
| > If you think a commit is a diff, you have a mismatch
| between the mental model and what's actually happening behind
| the scenes. This will make it difficult to understand
| concepts later on.
|
| I don't think those concepts are as distinct as you're painting
| them. At a user visible level commits will almost always be
| visualized as diffs, which puts us at a place where - at the
| highest level and lowest level they're defined as pretty
| close to diffs, while at an intermediary level they're
| defined closer to snapshots.
|
| I honestly think they're neither, each expression method
| (diff vs. snapshot) can be translated pretty easily and both
| are trying to represent the same end goal. It can be helpful
| to know that commits are representative of the full state of
| the codebase that exists at a time, but that view can be at
| odds with merging and rebasing which use actual change sets
| to calculate - when a commit is being manipulated it's
| helpful to view it as a diff (and git does this) - whereas,
| when a commit is being read, we're using it as a snapshot.
| mewse wrote:
| Structure purist, ingredient rebel: A snapshot between two
| levels of diffs is a sandwich.
| xmprt wrote:
| One way I like to think about this is that when you rebase a
| branch, the diffs are the same (barring any conflicts) but
| the commits are different. Just another reason commits aren't
| the same as diffs.
| klodolph wrote:
| The diffs are often different, even without conflicts. Try
| comparing them some time, and look closely at the diff...
| look at the lines starting with @. People usually ignore
| those lines but "patch" does NOT.
|
| This is not an irrelevant detail, but it's the result of a
| three-way merge. The three-way merge can update those @
| lines if it has a complete set of inputs (all three
| inputs). If you make a patch from one branch and then
| apply it to a different branch without using the three-way
| merge algorithm (stripping the diff of all its context),
| the patch may fail to apply even if the three-way merge
| succeeded without conflicts.
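|
| A small way to see this without git, using Python's difflib to
| stand in for the diff machinery (an assumption of this sketch;
| git's own diff output can differ in detail, and the file contents
| are invented):

```python
import difflib


def hunks(old: list, new: list) -> list:
    """Unified diff of two line lists, without the ---/+++ file header."""
    return list(difflib.unified_diff(old, new, lineterm=""))[2:]


def changed_lines(diff: list) -> list:
    """Keep only the added and removed lines of a diff."""
    return [line for line in diff if line[0] in "+-"]


base = ["a", "b", "c", "d", "e"]
feature = ["a", "b", "C", "d", "e"]   # the commit changes c -> C

# Upstream added two lines at the top; the rebase replays the change there.
new_base = ["x", "y"] + base
rebased = ["x", "y"] + feature

before = hunks(base, feature)
after = hunks(new_base, rebased)
```

| `changed_lines(before)` and `changed_lines(after)` are the same,
| but `before[0]` and `after[0]` (the @@ lines) are not, and the
| surrounding context lines differ as well.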
| hibbelig wrote:
| > _If you think a commit is a diff, you have a mismatch
| between the mental model and what's actually happening
| behind the scenes. This will make it difficult to understand
| concepts later on._
|
| I find that thinking of commits as snapshots is not so
| useful. I prefer to think of them as a pair of parent commit
| and diff.
|
| With that in mind, things like rebase become obvious: Take
| the same diff and attempt to apply it to a different parent.
|
| It's not clear to me how thinking of commits as snapshots
| helps me to explain operations such as rebase.
|
| I do concede, however, that "git cat" (I think that's the
| command) seems more closely related to a snapshot: you
| identify a commit and a file, and it will give you the
| content of that file at that commit. Clearly in this case the
| concept of a snapshot works well. But I need this very
| rarely.
| tlb wrote:
| Rebase doesn't work that way, though [0]. It first extracts
| the 3 versions (2 leafs and their common ancestor) and then
| does a diff & patch.
|
| This allows git to store the deltas between versions in the
| most efficient way on disk, while also letting it use
| contextual diffs to minimize the chance of spurious merge
| conflicts. Patching algorithms have various heuristics that
| make sense for programming languages, like special
| treatment for lines with only changes in whitespace.
|
| (Edited to add:) also, minimal diff algorithms have to do a
| lot of work to detect large blocks of text being moved
| around. This is part of what made Subversion, which used
| the same diff algorithm for storage compression and
| merging, painfully slow.
|
| [0] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
| hibbelig wrote:
| Here is the paragraph that describes what rebase does:
|
| > _This operation works by going to the common ancestor
| of the two branches (the one you're on and the one you're
| rebasing onto), getting the diff introduced by each
| commit of the branch you're on, saving those diffs to
| temporary files, resetting the current branch to the same
| commit as the branch you are rebasing onto, and finally
| applying each change in turn._
|
| Is "applying the diff to a different parent" not a good
| way to describe this?
| tlb wrote:
| You're using the word 'diff' for 2 different things:
|
| - an efficient way to store 2 very similar files
|
| - the minimal set of changes made by a programmer to a
| file.
|
| Subversion uses the same diff algorithm for these 2
| functions, which is why people conflate them. But git
| uses different algorithms. The first one (which it calls
| deltas) are optimized for speed and compression ratio.
| The second set of algorithms (you can choose from a few,
| some of which are better at identifying rearrangements of
| large blocks of text) are optimized for merging 2
| programmers' changes without conflicts.
| haberman wrote:
| > With that in mind, things like rebase become obvious:
| Take the same diff and attempt to apply it to a different
| parent.
|
| You can think of it that way if you want. But it's not what
| Git actually does.
|
| Personally I much prefer to have my mental model match the
| actual reality of things.
|
| You may not use "git cat" very often, but what about "git
| checkout <SHA>"? If commits were stored as diffs, then Git
| would have to rebuild a tree of the very first commit, then
| replay every single diff up to the SHA you asked for.
|
| What it does in actuality is find the snapshot of that SHA
| and change the working tree to match it.
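|
| To illustrate the cost difference only, here is a toy comparison,
| assuming dict-based snapshots and a hypothetical diff-only store
| that keeps one file-level diff per commit (neither function is
| how git is implemented):

```python
def checkout_from_snapshots(snapshots: dict, commit_id: str) -> dict:
    # One lookup, regardless of how long the history is.
    return dict(snapshots[commit_id])


def checkout_from_diffs(diffs: list, upto: int) -> dict:
    # Start from an empty tree and replay every diff up to the target.
    tree = {}
    for changes in diffs[: upto + 1]:
        for path, content in changes.items():
            if content is None:
                tree.pop(path, None)   # None marks a deletion
            else:
                tree[path] = content
    return tree


# The same three-commit history in both representations.
snapshots = {
    "c0": {"a": "1"},
    "c1": {"a": "1", "b": "2"},
    "c2": {"b": "2"},
}
diffs = [{"a": "1"}, {"b": "2"}, {"a": None}]

same = checkout_from_diffs(diffs, 2) == checkout_from_snapshots(snapshots, "c2")
```

| Both produce the same tree; the second has to walk the whole
| history to get there.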
| hibbelig wrote:
| > _You may not use "git cat" very often, but what about
| "git checkout <SHA>"? If commits were stored as diffs,
| then Git would have to rebuild a tree of the very first
| commit, then replay every single diff up to the SHA you
| asked for._
|
| Yes, this is true. I don't know why it never bothers me.
| Maybe it's because you could also store the diffs in the
| opposite direction (i.e. store the tip of each branch in
| the clear, then store diffs from each commit to its
| parent). Computing the inverse of a diff should be a
| quick operation. Usually, when you check out something,
| it's the tip of a branch or near the tip of a branch.
|
| Anyway.
|
| Of course I know that storing trees makes it easy to
| compute diffs. Computing diffs will become slower with
| larger trees. On the other hand, storing diffs makes it
| slow to compute trees, and the more commits we've got,
| the slower the tree computation goes.
| kevincox wrote:
| > Computing diffs will become slower with larger trees
|
| Not usually. Computing a diff is roughly O(n) with the
| size of a diff. This is because unchanged leaves of the
| tree can be seen as identical (because they are content
| addressed) and are skipped. So to compute the diff you
| only need to recurse into changed directories.
|
| So having a million files in the root directory and one
| has changed is very fast to diff as you just diff that
| one file. The worst case is the diff happening in a very
| deeply nested directory with lots of files in each of the
| subdirectories but even that is quite cheap as diffing a
| sorted directory listing is O(n) with the size of the
| listing.
|
| (The actual worst case is diffing large files as most
| text diff algorithms are worse than O(n))
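|
| A sketch of that recursion, assuming a toy tree whose nodes cache
| a content hash (the `Node` class and its hashing scheme are
| hypothetical, not git's tree format, and a path is assumed never
| to switch between file and directory):

```python
import hashlib


class Node:
    """A file (content is bytes) or a directory (children: name -> Node)."""

    def __init__(self, content=None, children=None):
        self.content = content
        self.children = children or {}
        if content is not None:
            payload = b"blob\x00" + content
        else:
            listing = "".join(
                f"{name} {child.oid}\n"
                for name, child in sorted(self.children.items())
            )
            payload = b"tree\x00" + listing.encode()
        # The id is computed once, when the object is created.
        self.oid = hashlib.sha1(payload).hexdigest()


def changed_paths(old, new, prefix=""):
    """List file paths that differ; old or new may be None (absent)."""
    if old is not None and new is not None and old.oid == new.oid:
        return []                      # identical subtree: do not descend
    old_kids = old.children if old is not None else {}
    new_kids = new.children if new is not None else {}
    if not old_kids and not new_kids:
        return [prefix]                # a file was added, removed or modified
    paths = []
    for name in sorted(old_kids.keys() | new_kids.keys()):
        paths += changed_paths(
            old_kids.get(name), new_kids.get(name), prefix + "/" + name
        )
    return paths


docs = Node(children={"readme.md": Node(b"hi")})
old = Node(children={"docs": docs, "src": Node(children={"main.py": Node(b"v1")})})
new = Node(children={"docs": docs, "src": Node(children={"main.py": Node(b"v2")})})
changed = changed_paths(old, new)
```

| The `docs` subtree is never entered, however large it is, because
| its id is the same on both sides.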
| yxhuvud wrote:
| > If commits were stored as diffs, then Git would have to
| rebuild a tree of the very first commit, then replay
| every single diff
|
| Well, it would usually be more efficient to figure out
| where the currently checked-out branch differs from the
| branch that is being checked out, and then unapply and apply
| diffs as needed.
| wazari972 wrote:
| what about "git cherry-pick <commit>"?
|
| with this command you don't import a snapshot, but only
| the diff between <commit~>..<commit>, so the model
| parent+diff makes sense to me
| taberiand wrote:
| If git did rebuild the graph, right from the very first
| commit, the end result of the operation would look
| identical to the user as it does now.
|
| It seems to me the two mental models are interchangeable
| when it comes to the use of git from the user's point of
| view. What is missing, from the user's point of view, when
| they model commits as diffs+parents vs as snapshots?
|
| Now I think about it, it's probably that users have a bad
| understanding of the commit-as-diff model; they could
| similarly have a bad understanding of the commit-as-
| snapshot model, I expect. I don't know that thinking in
| snapshots helps to understand git from a user's point of
| view better than thinking (properly) in diffs.
|
| The article for example explains that any two commits can
| be differenced because the underlying snapshot trees can
| be compared, but the commit-as-diff model can as easily
| explain why comparing two commits works by tracing each
| commit back to the common base commit - so the commit-as-
| diff mental model just needs to remember that commits are
| fundamentally tied to the path they have back to the root
| commit.
|
| It seems to me if you take the diagrams from the article
| and remove the under-the-covers stuff leaving just the
| circles, the commits-as-diffs and commits-as-snapshots
| models look exactly the same.
| JoshuaDavid wrote:
| Merge commits are a bit hard to understand from the
| perspective of "a commit is basically just a parent
| commit plus diff".
|
| On the flip side, cherry-picking is hard to understand
| from the perspective of "a commit is basically just a
| snapshot, nothing more" (it's _also_ weird from the
| parent-commit-plus-diff perspective -- cherry-pick is
| kind of a weird operation, but useful enough that we keep
| it anyway despite it not fitting quite as cleanly into
| the git model as other operations).
|
| Outside those edge cases, though, people with "snapshot"
| and "parent + diff" mental models will make basically
| identical predictions about what the results of various
| operations with git will be.
| haberman wrote:
| > What is missing, from the user's point of view, when
| they model commits as diffs+parents vs as snapshots?
|
| With the wrong mental model it's harder to predict what
| operations are expensive. If "git checkout <SHA>" truly
| did have to replay all diffs from the beginning of time,
| it would be a very expensive operation that is best
| avoided unless you absolutely need it. But in practice it
| is a very fast operation (one of the fastest) that there
| is no need to shy away from.
| [deleted]
| [deleted]
| klodolph wrote:
| The way you try to apply a diff to a different parent is by
| doing a three-way merge... the vast majority of tools do
| this by taking three files as arguments and producing a
| fourth as output. The three-way merge is the underlying
| process which makes merge, rebase, cherry-pick, and revert
| work. They are all just "three-way merge, shuffle the
| arguments around, and adjust metadata".
|
| The parent + diff storage is not isomorphic to snapshot
| storage. Snapshot storage reflects the actual usage of VCS
| tools... people make changes, and record the final state.
| Parent + diff does not do this, it records the changes,
| which requires creating a diff, and there are multiple ways
| to create a diff between two snapshots.
|
| Git postpones the "which diff is correct" question until
| you actually care about the answer.
| iudqnolq wrote:
| Why have so many people written long thoughtful explanations
| about how the author is wrong to suggest snapshots are a better
| mental model, and that you think all abstractions are leaky, but
| you find diffs a better mental model?
|
| The entire article is literally about how commits are literally
| snapshots. I would say people didn't read TFA, but a lot of
| people are quoting lines from TFA and then going on to argue
| with/expand on them in a way that is directly contradicted by the
| next few lines.
|
| I think it's because most of the people here have spent years
| working with git, and are so deeply attached to their
| understanding that they didn't hear most of what the article
| said.
|
| (Some commentators have pointed out specific oversimplifications
| the author makes like glossing over pack files, I'm referring to
| the people who say a git blob is a diff when the entire point of
| TFA is that it isn't)
| mekkkkkk wrote:
| What does TFA mean in this context?
| dekerta wrote:
| TFA = The F**ing Article
| mekkkkkk wrote:
| Thanks. I had a hunch. I'm familiar with "RTFM", but would
| probably get equally confused if "TFM" was used as a noun.
| hunter2_ wrote:
| I suspect the chronology is something like RTFM -> TFM ->
| RTFA -> TFA, but the second and third might be switched.
| Dropping the R does introduce obscurity, but being able
| to convey the underlying sentiment (that while the
| content could/should have been consulted, it seems as
| though it was not) without a verb allows for a
| nonconfrontational syntax similar to passive voice, but
| even moreso, and often without obvious "weasel" effect,
| to boot!
| nayuki wrote:
| And I believe this slang came from Slashdot, which is like
| the Hacker News forum in the decade before Hacker News
| doublerabbit wrote:
| The same context as RTFM
| iudqnolq wrote:
| it's an abbreviation to refer to the article being discussed
| on a site like this.
| Agingcoder wrote:
| Agreed, commits are snapshots, whether we like it or not. For
| obvious storage efficiency reasons, the implementation then
| diffs/packs/etc, but this is a different issue altogether.
|
| I have found that I can't work with git with a different mental
| model (diffs). Every time things get messy, the diff model is
| not enough, whereas snapshots + commit graph + names/pointers
| make things natural.
|
| Interestingly enough, when migrating people from svn to git,
| explaining the actual model makes the transition much smoother,
| so it would seem I'm not the only one.
| koolba wrote:
| > Why have so many people written long thoughtful explanations
| about how the author is wrong to suggest snapshots are a better
| mental model, and that you think all abstractions are leaky,
| but you find diffs a better mental model?
|
| Once you remember (learn?) that a commit can have N parents, it
| becomes apparent that it cannot be a single diff.
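|
| A minimal sketch of that point, assuming a toy model in which a
| snapshot is just a dict of path -> content. The Commit class and
| the diff helper are made up for illustration; they are not Git's
| API.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    tree: dict           # full snapshot: path -> content
    parents: tuple = ()  # zero, one, or many parent commits


def diff(old, new):
    """Return {path: (old_content, new_content)} for every path that differs."""
    paths = set(old) | set(new)
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}


root = Commit({"a.txt": "1"})
left = Commit({"a.txt": "1", "b.txt": "2"}, (root,))
right = Commit({"a.txt": "9"}, (root,))
merge = Commit({"a.txt": "9", "b.txt": "2"}, (left, right))

# One snapshot, but one diff per parent, and the diffs disagree.
diffs = [diff(parent.tree, merge.tree) for parent in merge.parents]
```

| The merge commit stores one tree; which "diff" it represents
| depends entirely on which parent you compare it against.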
| nightpool wrote:
| > Why have so many people written long thoughtful explanations
| about how the author is wrong to suggest snapshots are a better
| mental model, and that you think all abstractions are leaky,
| but you find diffs a better mental model?
|
| Probably because, to take their words at face value, they find
| diffs a better mental model? I think impugning "people [...]
| are so deeply attached to their understanding that they didn't
| hear most of what the article said" is a real bad faith
| reading, especially when you even acknowledge that central to
| people's arguments is "all mental models are leaky". This
| article may be technically correct about the way git internals
| are structured, but it makes cherry-pick and rebase _more_
| mentally complex for users to understand (you first have to go
| from commit => patch), not _less_.
|
| Saying "Commits are collections of files + a parent commit, but
| you can diff it to generate a patch" and saying "Commits are a
| patch + a parent commit, and you can apply it to generate a
| collection of files" are isomorphic mental models--the fact
| that #1 is "correct" (for some value of correct that doesn't
| include the actual files stored on disk) is really beside the
| point.
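|
| The round trip between the two descriptions can be sketched as
| follows. This assumes whole-file granularity and snapshots as
| plain dicts; make_patch and apply_patch are hypothetical helper
| names, not anything Git exposes.

```python
def make_patch(parent, child):
    """Derive a patch from two snapshots (dicts of path -> content)."""
    changed = {p: c for p, c in child.items() if parent.get(p) != c}
    deleted = sorted(p for p in parent if p not in child)
    return {"set": changed, "delete": deleted}


def apply_patch(parent, patch):
    """Rebuild the child snapshot from the parent snapshot plus a patch."""
    child = {p: c for p, c in parent.items() if p not in patch["delete"]}
    child.update(patch["set"])
    return child


parent = {"README": "hello", "old.py": "x = 1"}
child = {"README": "hello world", "new.py": "y = 2"}

patch = make_patch(parent, child)
rebuilt = apply_patch(parent, patch)
```

| Given the parent, either representation can be recovered from
| the other, which is the sense in which the two models are
| interchangeable.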
| iudqnolq wrote:
| My point is that people criticizing TFA's proposed mental
| model are missing the fact that TFA doesn't propose a mental
| model, it explains how things work. Both have value, but
| they're distinct.
| nightpool wrote:
| I disagree. TFA is _explaining the mental model Git uses to
| structure their codebase_. If you're writing code for Git,
| this is obviously very useful to understand, but if you're
| just using it, this is only one of several mental models
| available to you. In this case, I think it's right to say
| that the distinction the author is attempting to draw is
| immaterial to those not working on the Git codebase.
| iudqnolq wrote:
| If your code is written in a certain way that's a model,
| not a mental model.
| iainmerrick wrote:
| Yes! It just seems so strange not to care about how
| things _actually are_ in software. Is it a way of coping
| with the fact that so much software is so deeply layered
| and complex now?
|
| Maybe I'm misremembering, but I feel like I didn't see
| this usage of "mental model" much until fairly recently.
| The first I recall being surprised at was a discussion of
| a "mental model of Javascript" -- why would you need a
| mental model of something with a very detailed spec and
| multiple compatible implementations to study? If you want
| to understand how some aspect works, just look up how it
| actually does work.
| smallnamespace wrote:
| People are disagreeing with the author, not necessarily because
| they didn't read the article, but because _they don't agree
| about how things should be defined_.
|
| At the root, this is a disagreement about semantics and
| philosophy, not about git itself. I'm going to refer to
| Aristotle here: _we think we have knowledge of a thing only
| when we have grasped its cause_, and there are four general
| 'causes' [1]:
|
| - The material cause: 'What is it made of?'
|
| - The formal cause: 'What is the _ideal_ of this thing?',
| e.g. what's its abstract nature?
|
| - The efficient cause: 'How did this thing come to be?'
|
| - The final cause: 'What is its purpose?' How is it actually
| used? What role does it play in the world?
|
| Here we can see that commits are used (at least in the git
| internals) as 'snapshots' -- they refer to bytes, not changes
| in bytes. That's pretty close to the formal and efficient
| causes -- the abstraction inside of git is closest to a
| snapshot, and that comes from the history of what Linus wanted
| when he wrote it.
|
| But! The underlying storage uses deltas (which are diffs) to
| save space. That's the material cause.
|
| But also, when we actually _use_ commits, git often creates
| diffs for us as a convenience (cherry-picking, rebasing), and
| hides the fact that they're snapshots under the hood (final
| cause).
|
| So there's an inherent tension between the different ways to
| answer 'what is a thing?'. For commits, this is especially bad,
| since there's an even split between 'causes'.
|
| This tension never goes away because the most useful definition
| really depends on the context.
|
| [1] https://plato.stanford.edu/entries/aristotle-
| causality/#FouC...
| cesarb wrote:
| > The underlying storage uses deltas (which are diffs) to
| save space.
|
| Not necessarily! The base git storage stores each object
| individually, not as deltas ("disk space is cheap"); it's
| only after a "git gc" that they are stored as deltas to other
| (potentially unrelated) objects. The original implementation
| of git didn't even have the delta storage (pack files), it
| was added later as an optional optimization.
|
| So answering "what is it made of?" with "deltas" comes with
| a huge caveat, that it's often partially or completely
| untrue.
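|
| A small sketch of the loose-object case, assuming a SHA-1
| repository: a blob is the header "blob <size>\0" plus the full
| file content, hashed for its ID and zlib-compressed on disk. The
| loose_blob helper name is made up for illustration.

```python
import hashlib
import zlib


def loose_blob(content: bytes):
    """Return (object id, compressed bytes) for a blob in loose-object form."""
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


v1 = b"line one\nline two\n"
v2 = b"line one\nline 2\n"

id1, packed1 = loose_blob(v1)
id2, packed2 = loose_blob(v2)

# Each version is a complete object; v2 is not stored relative to v1.
full_v2 = zlib.decompress(packed2)
```

| Two versions of a file that differ by a few bytes still become
| two independent objects until something like "git gc" packs
| them.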
| haberman wrote:
| > But! The underlying storage uses deltas (which are diffs)
| to save space. That's the material cause.
|
| This does not make the "commits are stored as diffs" story
| much more true:
|
| 1. This is only true of pack files, but pack files are only
| created once the repository exceeds a certain size.
|
| 2. Nothing about the pack file format requires that deltas
| follow the chronology of commits at all. The deltas could be
| stored in reverse order or even random order compared to the
| chain of commits.
|
| 3. The deltas in a pack file do not correspond to a change in
| a given commit, they are just the data to create a particular
| snapshot. If you find that a commit's file blob is stored in
| a pack file as a delta, that does not tell you anything about
| whether the file changed in _that particular commit_. You
| have to look at two commits and diff them to determine which
| files actually changed.
|
| If a person wants to think about version control in an
| abstract way, then yes the two views (commits vs diffs) are
| somewhat interchangeable. If a person wants to understand
| what actually happens when you run a Git command, the answer
| to that question is less open to interpretation.
| efaref wrote:
| The true zen of source control is that they are _both_.
| iudqnolq wrote:
| This is exactly what I'm talking about. A person posts "this
| is literally how this works", and someone replies
| "philosophically I would prefer to think it works
| differently, therefore you're wrong".
| ndand wrote:
| I used to think of commits as snapshots, but it was confusing. Then
| I read "Git Internals".
|
| A commit contains the "whole" content of each file that we've
| committed. But since a commit has a pointer to a root tree, it
| also represents a working directory. Even though a commit contains
| "whole" files, Git internally stores only parts of the files
| as an optimization.
|
| When we diff two commits, we see the difference of the file
| contents in the corresponding working directories that the
| commits represent.
| dmuth wrote:
| If anyone does want to get more into the internals of Git without
| playing with a production repo, I built a "playground" awhile ago
| which creates a simple Git repo of synthetic commits which you
| can then play around with:
|
| https://github.com/dmuth/git-rebase-i-playground
|
| I know it says "rebase -i", which is originally what I built it for
| (and what the exercises in the README are for), but you can
| really do whatever you want in it, and blow away/rebuild the repo
| with the included script.
|
| Enjoy!
| d_tr wrote:
| My first tutorial was the Pro Git book, and this fact was
| stressed well there so it stuck. Thinking of commits as snapshots
| also has the small advantage of making the first commit less
| special.
| Tomminn wrote:
| It strikes me as bizarre that something as old and as important
| as git is to the general version control problem, doesn't have a
| beautiful, complete and helpful user interface.
|
| With the status quo how it is, I definitely love articles like
| this because every time I use git I get a kind of anxiety that
| fades only in proportion to the depth with which I understand
| actual git mechanics.
|
| The thing I find strange is that when I interact with databases
| that have beautiful, helpful user interfaces, I have almost none
| of this anxiety, and just kind of accept "black box that handles
| things", and move on with my life.
|
| I figure I must not be alone in this psychological niche. Which
| again, makes it bizarre that the problem of giving git a
| beautiful, complete, helpful front end has not been solved.
| motoboi wrote:
| > It strikes me as bizarre that something as old and as
| important as git is to the general version control problem,
| doesn't have a beautiful, complete and helpful user interface.
|
| It has several.
|
| Tower is a wonderful interface on macOS, Sublime Merge too.
|
| GitHub is another, and GitLab is also very good. Gog is a free as in
| beer option too.
|
| There are several. None has dominated the market, though.
| marcodave wrote:
| Would you like to talk about our lord and saviour IntelliJ ?
| nwatson wrote:
| I like SourceTree from Atlassian, I dip into the command-line
| from time to time but it meets many needs.
|
| Only problem is, no Linux version, only macOS and Windows. But
| that's now solved with WSL2 ... code in Linux/Docker/PyCharm
| etc on Windows WSL2, SourceTree on Windows.
| adamnew123456 wrote:
| I guess I'll be the one to make the obligatory "magit is
| awesome and if you use Emacs you should definitely check it
| out" comment.
|
| Other than being horribly slow on Windows I can't think of any
| downsides. Aside from the very rare black magic incantations it
| does everything I've needed from a Git frontend.
|
| If something like it existed for SVN ($JOB VCS of choice,
| sadly) I would abandon Tortoise in a heartbeat. IntelliJ is
| nice but the overhead of the VCS add-ons kills my startup time.
| SCLeo wrote:
| I can't agree with you more. git commands are definitely not
| designed for the current mainstream usage (i.e. with services
| like GitHub/GitLab). Simple things like forking a repo from
| another user and editing it locally require >10 non-straightforward
| steps, which is far from ideal.
| mekkkkkk wrote:
| There are so many tools to help with this though? If you want
| to work with Github, there is an official Github CLI tool
| that makes forking easy peasy. Gitlab doesn't have an
| official one AFAIK, but there are unofficial ones. And if you
| want GUI there's a myriad of those as well. I don't
| understand this complaint at all.
| rhabarba wrote:
| Darcs users disagree.
| siawyoung wrote:
| Commits are conceptually snapshots, and everything else Git does
| is just an optimization over the naive "keep all versions of all
| files ever" (imagine implementing a version of Git that is just
| zipping the entire folder). Diffs are isomorphic to commits and
| are generated as needed.
|
| I wrote about it (albeit imprecisely) here:
| https://siawyoung.com/git-intuition
| samatman wrote:
| This blog post is the most compelling argument I've yet seen for
| pijul.
|
| Git _should work the way we think it does_! It's confusing that
| snapshots are being converted into a few different forms of
| change object, which can be reconciled with merges or rebases or
| applying patches.
|
| Pijul (and darcs before it) actually works on the basis of
| patches, pijul with a robust theory of patches. A cherry-pick
| just moves a patch from one history-of-patches (branch) to
| another history-of-patches. One can share just a patch, and
| applying it is guaranteed to be the same action everywhere if
| that's possible, which it often is.
|
| I'm patiently waiting for pijul to be mature enough that I can
| move everything over to using it, it's one of the more exciting
| projects in the last ten years.
| jayd16 wrote:
| Can I make a shallow clone in Pijul?
| volta83 wrote:
| > I'm patiently waiting for pijul to be mature enough that I
| can move everything over to using it
|
| Pijul is super slow. I've tried it a couple of times, and it is
| too slow to be usable.
| smichel17 wrote:
| I don't view git as a series of diffs. I view it as a logical
| extension of my file system to include a time dimension (or in
| fewer words, as snapshots).
|
| It replaces file-v1, file-v2, file-v2-with-changes-from-Alex,
| etc, that you commonly find on the hard drives of people not
| familiar with version control. That it can generate meaningful
| diffs is a product of the type of data we're storing.
| ausbin wrote:
| > Git should work the way we think it does!
|
| Hold on, who is "we"? Personally speaking, git works the way I
| think it does. Granted, I've written my own (simple) libgit2
| frontend, so I understand the git internals fairly well, on a
| high level at least
|
| I haven't looked into pijul, but why is teaching people a new
| tool more helpful than teaching people how the tool they
| already use works? (Like the OP blog post does.)
|
| Am I blinded by the knowledge I gained from writing my little
| tool and learning about git internals? I get that a tool you
| need to learn the internals of to use is probably a bad tool,
| but is asking git users to understand the contents of the OP
| blog post really too much? Maybe I'm just a git fanboy...
| notdonspaulding wrote:
| >> Git should work the way we think it does!
|
| > Hold on, who is "we"?
|
| I'm not the GP, but I agree that git should work the way "we"
| think it does, and I think a reasonable definition of "we" in
| the context of Git Users is probably SaaS/Startup/SMB
| software engineers.
|
| Git is popular enough to have many thousands of different use
| cases, but I would speculate that the distribution of use
| cases probably follows the distribution of public
| Github/Gitlab repos pretty closely.
|
| > Personally speaking, git works the way I think it does.
| Granted, I've written my own (simple) libgit2 frontend,
| ...snip...
|
| > Am I blinded by the knowledge I gained from writing my
| little tool and learning about git internals?
|
| Yes.
|
| > I get that a tool you need to learn the internals of to use
| is probably a bad tool, but is asking git users to understand
| the contents of the OP blog post really too much?
|
| Yes. Or rather, knowing git's internals is incredibly helpful
| if you've already decided to use git and now you're deciding
| _what workflow to use to develop software_, because you can
| match your mental model of how to use git to the way git
| naturally wants to represent your stored work.
|
| However, if you come to git with an existing mental model of
| software development, and that existing mental model includes
| the idea of "branches" or "diffs" or "immutable history",
| then you're going to quickly and repeatedly run into
| stumbling blocks as your mental model doesn't match git's
| internal model. Git can _do_ branches and diffs and immutable
| history, of course, but they're a leaky abstraction on top
| of the concepts git really cares about.
|
| > Maybe I'm just a git fanboy...
|
| Sure, nothing wrong with that!
| klodolph wrote:
| > Git _should work the way we think it does!_
|
| I think it works using snapshots... or are you saying that Git
| should work the way that _you_ think it does, and not how _I_
| think it does?
|
| It's clear that Git is not the final evolution of version
| control systems, that we are just currently in the "Git era"
| and at some point we're going to be in the "post-Git era" of
| VCS. It's unclear what that looks like, but I am skeptical when
| I hear these claims about Pijul.
|
| > One can share just a patch, and applying it is guaranteed to
| be the same action everywhere if that's possible, which it
| often is.
|
| My understanding is that you need to define a very weak version
| of "same version everywhere" which is useless. With Git, you
| can merge and get no conflicts, but that is no guarantee that
| the patch applied successfully... it just means that the merge
| operation didn't run into any obstacles. It's not just the
| patch that needs to be vetted by humans, it's the _state_ which
| must also be vetted, and that's one of the problems that Git
| solves well.
| cmeacham98 wrote:
| No idea what Pijul is, but how does this not describe git?
|
| Unless your complaint is that a commit is really a set of
| diffs/patches?
| chriswarbo wrote:
| Pijul (and Darcs) operate on sets of patches. As a simple
| example, git commits have at least one 'parent', which
| imposes an order, e.g. let's say I edit file X in commit x
| and file Y in commit y; if I want both of those changes, git
| forces me to apply them in a particular order, e.g. [x, y].
| If someone else applied those same two commits in a different
| order, they'll get a different commit ID, which may cause
| problems e.g. when trying to merge their changes with ours.
|
| If we treat x and y as (sets of) patches instead, then the
| set {x, y} is the same as the set {y, x}; the order doesn't
| matter (we say those patches _commute_).
|
| The idea of commuting patches is _really_ useful, since we
| can rearrange patches to a more convenient form. For example,
| if we commit something we shouldn't (like a password, or a
| huge binary), then later remove it, a system like git makes
| it hard to remove that file from the history. If we're
| dealing with sets of patches, we can simply swap them around
| until the 'add file' and 'remove file' patches are next to
| each other, then merge those two patches. Voila, the file no
| longer appears, the rest of the history remains intact, the
| branch's content is guaranteed to remain unchanged (since we
| only swapped commuting patches, which doesn't change
| anything; and merged two patches, which doesn't change
| anything).
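|
| A toy illustration of the ordering point, assuming an ID that
| hashes the parent's ID together with the change. This is a
| simplification, not Git's real commit format, and chain_id and
| history_id are made-up names.

```python
import hashlib


def chain_id(parent_id, change):
    """ID of a commit-like object: depends on the change AND on its parent."""
    return hashlib.sha1((parent_id + "\n" + change).encode()).hexdigest()


def history_id(changes):
    """Apply changes in order and return the ID of the final commit."""
    current = ""
    for change in changes:
        current = chain_id(current, change)
    return current


xy = history_id(["edit X", "edit Y"])
yx = history_id(["edit Y", "edit X"])

# As a chain, order is baked into the ID; as a set, it is not.
same_chain = xy == yx
same_set = frozenset(["edit X", "edit Y"]) == frozenset(["edit Y", "edit X"])
```

| The same two changes give different chain IDs depending on
| order, while the set view treats both orders as one state.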
| dan-robertson wrote:
| People using git think that commits are patches. But that
| isn't how git works. Git sometimes tries to let you treat a
| commit like the diff between it and its parent and lets you
| try to rewrite history but these are really making new
| commits with new ids and this confuses people.
|
| In pijul, the objects you interact with _actually are_ diffs
| (aka patches) and then snapshots are well-formed sets of
| patches. Here, well-formed means that if a patch is in the
| set then so are its dependencies (these dependencies aren't
| like parent commits in git, they're more like you need to add
| line 3 before you can delete it). So removing or modifying a
| patch in a branch isn't a horrific interactive rebase
| operation anymore.
|
| When you move a patch in pijul it doesn't affect any of the
| patches written before or after it (unless they depend on
| it). When you "move a patch" in git you rewrite the history
| and create new commits, so if I was talking about a commit
| (id) before the move, I would be talking about some dangling
| commit after the move and would need to update my id to the
| corresponding new post-move commit.
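|
| The "well-formed set" idea can be sketched as a closure check.
| This is only an illustration of the definition given above, with
| hypothetical patch names; it says nothing about how Pijul
| actually stores patches.

```python
def is_well_formed(patches, deps):
    """True if every patch in the set has all of its dependencies in the set."""
    return all(deps.get(p, set()) <= patches for p in patches)


# Hypothetical patches: deleting line 3 only makes sense after adding it.
deps = {"delete line 3": {"add line 3"}}

full = {"add line 3", "delete line 3", "fix typo"}
without_typo_fix = full - {"fix typo"}       # independent patch removed
without_add = full - {"add line 3"}          # a dependency removed

results = (
    is_well_formed(full, deps),
    is_well_formed(without_typo_fix, deps),
    is_well_formed(without_add, deps),
)
```

| Dropping an independent patch leaves a valid state; dropping a
| patch that others depend on does not.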
| diegocg wrote:
| I have visited the pijul site two or three times, every time I
| would start reading about a "sound mathematical theory", get
| bored, and close the tab. To this date I still don't know what
| pijul is trying to do and why I should be interested in it.
|
| They really should improve their documentation (hint, in case
| someone reads this: nobody except a few geeks gives a shit about
| sound mathematical models. Show me how pijul makes my life
| easier compared to git, that's all I need)
| avodonosov wrote:
| Have you ever rebased a long chain of git commits onto a new
| branch, where one of the first of those commits has a
| conflict with the new base, and after resolving this conflict
| for that commit you have the same conflict over and over
| again for all the subsequent commits, even if they did not
| modify that place in the code, and you need to manually
| resolve it again and again?
|
| Pijul will, as I understand, save us from those unnecessary
| repeated "conflicts".
|
| See also the answer by @chriswarbo about removing undesired
| changes from history.
| zemo wrote:
| you know about rerere right?
| https://www.git-scm.com/book/en/v2/Git-Tools-Rerere
| MattIPv4 wrote:
| I feel like this is missing something about the
| drawbacks? Or are there truly no drawbacks beyond disk
| usage for the cache, and folks should just enable it once
| they're aware it exists?
| mekkkkkk wrote:
| I guess it involves a bit of assumptions and guesswork to
| automatically replay your previous actions to files that
| in turn may have changed. It probably slightly increases
| the chances of Git doing something you didn't expect, and
| not tell you about it. Hence why it isn't default. Maybe?
| avodonosov wrote:
| No, I did not know that, repeated everything manually.
| Will try that next time, thank you.
|
| BTW, pijul docs mention rerere as helping "in some
| cases":
|
| > This is why in these systems, conflicts are often
| painful, as there is no real way to solve a conflict once
| and for all (for example, Git has the rerere command to
| try and simulate that in some cases).
|
| https://pijul.org/manual/why_pijul.html
| bombcar wrote:
| I feel their example (with the ABGX) just makes me think
| "merging can result in weirdness silently and git and
| pijul do it different but silent" - it doesn't really
| argue that one is better than the other.
|
| (Most people probably use git as an effectively infinite
| string of zip files anyway. https://xkcd.com/1597/ )
| jayd16 wrote:
| Could one not add a new rebasing strategy to git by
| generating patches from the git history? Are the concepts
| non-translatable?
| avodonosov wrote:
| I think rebase already works by generating patches, but
| for some reason the repeated conflicts happen...
| mdnahas wrote:
| I strongly disagree.
|
| Snapshots are a useful concept for programming. Each snapshot
| represents a compilable program with a certain set of features.
| So snapshot A has a certain set of features and B has another.
|
| Diffs are not a useful concept. Does the diff between A and B
| represent the new features in B? No. Because if it did, it
| would mean I could take any other compilable snapshot C and
| apply the diff of A and B to it, and I should end up with a
| snapshot D that is compilable and has all the features of C with the
| new features in B. And that doesn't work with any programming
| language I know.
|
| It doesn't even work with the most trivial features.
|
| Diffs may be a useful concept when working with some data
| formats. But for programming languages, snapshots are the right
| concept.
| masukomi wrote:
| this... seems so very flawed and disprovable to me. Ignoring the
| obvious storage issues that have been discussed, if commits were
| snapshots you could rebase and reorder them without ever worrying
| about conflicts. In reality you very much DO have to worry about
| conflicts because they are change instructions that transform a
| file from A->B->C. If you try to reorder it as A->C->B you're
| going to have serious issues (assuming these all touch the same
| code) because C is a transformation from the B state to the C
| state. It blows up attempting to convert A->C because the
| instructions in that transformation describe going from B->C.
|
| > A commit is a snapshot in time. Each commit contains a pointer
| to its root tree,
|
| it so... _so_ very much isn't. It's not even a snapshot in time
| of a section of a file.
|
| It's a change instruction. No, it's not a "diff" but it also
| isn't a snapshot.
| nyanpasu64 wrote:
| commits are snapshots. cherry-picking/rebasing diffs a commit
| and its (first?) parent, and applies the diff on a base commit
| to create a new commit.
|
| if you `git replace` a single commit and change its contents,
| its children do not change their contents, so `git show`ing any
| direct child will show a new diff, not previously present,
| reverting the actions you've performed in `git replace`.
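|
| A rough sketch of that cherry-pick description, assuming
| whole-file granularity and snapshots as dicts of path ->
| content. Real Git merges at the line level; the cherry_pick
| function here is a made-up illustration, not Git's algorithm.

```python
def cherry_pick(commit_tree, parent_tree, base_tree):
    """Diff a commit against its parent, then replay that diff onto base_tree."""
    new_tree = dict(base_tree)
    for path in set(commit_tree) | set(parent_tree):
        before, after = parent_tree.get(path), commit_tree.get(path)
        if before == after:
            continue  # this commit did not touch the path
        if base_tree.get(path) != before:
            # The base no longer looks like what the diff expects.
            raise ValueError(f"conflict in {path}")
        if after is None:
            del new_tree[path]
        else:
            new_tree[path] = after
    return new_tree  # the snapshot for a brand-new commit on top of base


parent = {"a": "1", "b": "2"}
commit = {"a": "1", "b": "3"}   # the commit changed only "b"
base = {"a": "7", "b": "2"}     # another branch changed only "a"

picked = cherry_pick(commit, parent, base)
```

| The conflict arises from the derived diff not fitting the new
| base, even though every stored object is still a full snapshot.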
| cryptonector wrote:
| Yes, exactly, this is a very good post on the nature of Git.
|
| > Branches are pointers
|
| Yes. I would say they are named pointers. Commit hashes are weak,
| unnamed pointers.
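|
| As a minimal sketch of "branches are named pointers", assume a
| toy store where commits live in one dict and branch names in
| another. The sequential IDs like "c1" are hypothetical stand-ins
| for real hashes.

```python
commits = {}            # commit id -> (tree, parent id)
refs = {"main": None}   # branch name -> commit id


def commit(branch, tree):
    """Record a new snapshot and move the branch pointer to it."""
    commit_id = f"c{len(commits) + 1}"
    commits[commit_id] = (tree, refs[branch])
    refs[branch] = commit_id
    return commit_id


first = commit("main", {"a.txt": "1"})

# Creating a branch copies a pointer; no commit data is duplicated.
refs["feature"] = refs["main"]
second = commit("feature", {"a.txt": "2"})
```

| Only the name moves when you commit; the commits themselves are
| never altered.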
| karmakaze wrote:
| This comes up from time to time and each time the comments debate
| the correctness/effectiveness of the title.
|
| The contents of the post do shed much light on how git operates
| and introduce a view that can help in navigating how to use git.
|
| Whether or not you want to think of a commit as a snapshot or a
| diff isn't material. It's best to think of it as a dual, since a
| diff on any base can create a snapshot, and a snapshot can create
| a diff from a snapshot.
|
| This very much mirrors the idea of a transaction log (of diffs)
| and a 'current' state. The current state is convenient, can
| benefit performance, but is not absolutely necessary. It doesn't
| even have to be the most recent, e.g. key frames in video
| compression. These are all just ideas, getting used to them and
| being able to move viewpoints between them is better than
| clinging to any one of them.
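|
| The log-plus-key-frames idea can be sketched like this, assuming
| each log entry is a dict of key -> new value and a key frame is
| a stored full state. state_at and the sample data are made up
| for illustration.

```python
def state_at(n, log, keyframes):
    """State after the first n log entries, replayed from the nearest key frame."""
    start = max(k for k in keyframes if k <= n)
    state = dict(keyframes[start])
    for change in log[start:n]:
        state.update(change)
    return state


log = [{"x": 1}, {"y": 2}, {"x": 3}, {"z": 4}]

only_origin = {0: {}}                          # replay everything from scratch
with_keyframe = {0: {}, 2: {"x": 1, "y": 2}}   # stored full state after 2 entries

slow = state_at(4, log, only_origin)
fast = state_at(4, log, with_keyframe)
```

| Both calls give the same state; the key frame only changes how
| much of the log has to be replayed.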
| ashton314 wrote:
| I really liked this video: the guy first walks you through how to
| build your own git-like utility with a handful of shell commands,
| then goes and walks through an actual git repo:
|
| https://youtu.be/qq_s2Hh--aQ
|
| Even the first 20 minutes was enough for me to have a
| substantially better understanding of how git works.
| mberning wrote:
| Am I the only person that doesn't want to understand the inner
| workings of my VCS in lurid detail? I don't have to know as much
| about any other developer tool in order to use it effectively.
| outworlder wrote:
| > I don't have to know as much about any other developer tool
| in order to use it effectively.
|
| You don't? How do you debug problems?
| mberning wrote:
| I just use the debugger and it mostly works how I expect. I
| don't have to go and study the data structures and other
| intricacies of the debugger itself to puzzle out why it works
| the way it does. Git is terrible in that way, as evidenced by
| the thousands of blog posts of people trying to describe the
| inner workings of it and how it will "make more sense" once
| you understand it as well.
| bombcar wrote:
| Think about ZFS - ZFS works perfectly fine for me and I don't
| have the foggiest idea how it works beyond "copy on write"
| magic.
| detaro wrote:
| Do you feel like you need to know this to use git? What did the
| blog post change about your use of git?
| [deleted]
| necovek wrote:
| > I believe that Git becomes understandable if we peel back the
| curtain and look at how Git stores your repository data.
|
| I agree, and like many, I have been saying that for years (nay,
| for more than a decade): and that's exactly the problem!
|
| You don't need to understand how an internal combustion engine
| works to drive a car... You don't need to understand how your
| graphics card renders stuff to develop a web page... You don't
| need to know how a brushless motor works to use a drill...
|
| There is a pattern there, and it's the one that makes sense.
|
| I've read up on the internals of git a dozen times by now. But I
| only occasionally need to do something weird that makes me go
| back to it, so I usually forget the relevant bits.
|
| The trouble is that I've used a distributed VCS that did not ask
| me to understand internals and it had a sane UI, and good model
| (like tree-like commit history, so a top-level commit log would
| only have merges, but you could dive deeper into individual
| commits if you so pleased). It wasn't perfect, but it's hard for
| me to accept that we have gone with a subpar solution where every
| "tutorial" starts with how you need to understand the internals!
| But you also need to memorise them, dammit!
|
| Just like I keep forgetting the Emacs rectangle editing shortcuts
| since I seldom use them, I'll keep forgetting the specifics of
| git internals that I might need once every 12 months.
|
| And it's not me, it's _you_, git!
| mdnahas wrote:
| Sadly, the bad part is git's user interface. It hides the
| pretty parts underneath.
|
| There is a concept of "the next commit" or, equivalently, "the
| pending commit". In the documentation, this gets called
| "indexed" or "cached" or "staged" --- three different names!
| And if you want to diff with it, you can't refer to it by name.
| You need to use an option, so it's "git diff --cached <other
| commit>".
|
| I know git's internals, mostly because it lets me navigate its
| bad user interface.
| gpspake wrote:
| I think it's a tragedy that just about every developer uses git
| but most learn add, commit, branch, and merge and then just stop
| learning.
|
| A lot of people are scared of rebase and cherrypick and shut down
| or get defensive when you mention them or try to encourage their
| use.
|
| The result is, because developers only have a hammer, they brute
| force merge everything which results in grotesque conflict
| resolutions and commit histories and makes it hard to untangle
| problems.
|
| At a previous job, another developer was kind enough to walk
| through rebasing on the command line with vim. I was receptive
| and in about 10 minutes, I realized there was a significant set
| of standard features and day to day Git use I was previously just
| oblivious to.
|
| These days, the UI for rebasing and cherry picking in Gitkraken
| is state of the art and effortless and I use them every day
| without hesitation and without the fear that comes from not
| understanding or knowing what I'm doing. Still, I constantly
| struggle with coworkers merging feature branches from 100 commits
| ago in to new feature branches and brute force resolving
| conflicts across half a dozen files in one commit without any
| context.
|
| I see it all because I have visibility in to the history and
| branch relationships but I still get shrugs and eye rolls when I
| bring it up. I don't necessarily want to dictate nitpicky git
| usage but I have a hard time accepting when people just refuse to
| learn how rebasing and cherrypicking work when they're both core basic
| features of a tool we all use every day. Proper Git use is one of
| those hills I'll die on, though so I don't intend to shut up
| about it any time soon :)
|
| Edit: My practical advice: If you use git every day and you don't
| know how to rebase, reset, cherrypick, and stash from the command
| line, make it a goal. Then, once you're comfortable, learn how to
| do it in a visual tool like Gitkraken and make an effort to
| incorporate them in to your daily workflow. My guess is things
| will become a lot less tedious and confusing when things get
| messy.
| 9dev wrote:
| Honestly, I don't really see your point. Yes, we keep our
| commit messages as clean and descriptive as possible. Yes, if
| we have the time, we split our commits into logical groups of
| changes. Yes, we work on feature branches for mature projects.
| We do all this with the git integration of IntelliJ, and I
| don't see the slightest reason to waste any time with the
| syntax of our version control tool! I'd gladly force everyone
| on the team to use "stuff" as the single, exclusive commit
| message, if that improved velocity (which it obviously
| doesn't). Because all this discussion about proper git usage is
| nothing but bike-shedding.
| dr-detroit wrote:
| NOBODY on my team learned ANYTHING about git. They stopped
| using it on new projects; it's like everyone considers their
| (shabby) work proprietary. Thank you for coming to my TED talk:
| life is suffering
| forrestthewoods wrote:
| > I think it's a tragedy that just about every developer uses
| git but most learn add, commit, branch, and merge and then just
| stop learning.
|
| This is because Git is too hard to use.
|
| How do I know that Git is too hard to use? Because there are
| literally thousands of blog post tutorials explaining how easy
| Git is to learn. Things that are easy do not need thousands of
| different guides telling you how easy it is.
| JadeNB wrote:
| > This is because Git is too hard to use.
|
| > How do I know that Git is too hard to use? Because there
| are literally thousands of blog post tutorials explaining how
| easy Git is to learn. Things that are easy do not need
| thousands of different guides telling you how easy it is.
|
| I'm not sure that's convincing. I think that a lot of guides
| about how easy it is indicate that it's _slightly_ difficult
| to learn. That results in a lot of people struggling for a
| little bit, overcoming the struggle, and feeling a sense of
| accomplishment and enlightenment, which they then want to
| share.
|
| (There's also a difference between how hard something is to
| _use_ and how hard it is to _learn_. I'd argue that there's
| often a trade-off to be made, where some sacrifice on
| difficulty learning results in a reward in ease of use--in
| the sense that, for example, vim is far easier to use than
| any other editor for a seasoned vimmer.)
| gkoberger wrote:
| The counter argument would be that maybe git is so basic and
| easy to learn that everyone feels comfortable enough with it
| to write a tutorial?
|
| I think a large amount of content is more a factor of Git's
| ubiquity than its difficulty.
| Supermancho wrote:
| > git is so basic and easy to learn that everyone feels
| comfortable enough with it to write a tutorial?
|
| Nobody writes tutorials on how easy Lyft or Uber apps are
| to use. Easy interfaces don't need lots and lots of
| tutorials. That's exclusively the result of poorly designed
| interfaces AND complicated systems.
| gkoberger wrote:
| Lyft is an app. Git is a tool for developers. I have no
| clue how they're related.
|
| That being said, I googled "How to use Lyft" and there's
| a ton of results.
| zwieback wrote:
| My SCM journey: RCS-PVCS-cvs-VSS-MKSSI-svn-Perforce-git
|
| I hate all of them but learned to use them because what's the
| alternative?
| gpspake wrote:
| I respectfully disagree on the basis that, yes revision
| control is hard, but git provides a relatively beautifully
| simple api and vocabulary on top of an inherently complex and
| absolutely necessary set of concepts.
|
| When you're working on a codebase with multiple people, there
| are going to be changes and the changes have to be
| consolidated and the conflicts have to be resolved. I
| believe, with a reasonable amount of time and effort,
| developers can learn that API and vocabulary and I have yet
| to encounter anything comparable in terms of ease of use and
| "grok-ability" - especially with modern GUI tools.
|
| git is one of the most ubiquitous and unavoidable
| technologies in software development and it's 100% worth the
| time and effort to understand and be good at it.
| ajross wrote:
| > This is because Git is too hard to use.
|
| Git is hard to _LEARN_. It is objectively very easy to _USE_
| for those who have learned it, so much so that the population
| of "I used to use git until I found ..." evangelists is
| effectively zero. Tools like mercurial exist in the
| marketplace of ideas mostly by peeling off users who haven't
| yet started using git productively by promising 80% of the
| features for 10% of the effort.
|
| In fact, I don't know that there has been a new tool since
| vim or emacs that so well illustrated this dichotomy between
| ease of learning and ease of use.
|
| But to be honest: it really is needlessly hard to learn. The
| content of the linked article is that git is built on an
| extremely simple foundation of data structures and operations
| that anyone can understand. But the takeaway from the article
| is that _no one does understand it_, because that layer is
| hidden behind a facade of tools that completely obscure it.
| Where are the "blobs" in git reset? What is the "index"? Is
| it a "tree" (it's not, IIRC)? I definitely agree with people
| who complain about the porcelain layer's design. But I still
| use git every day and love it.
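To make the "extremely simple foundation" point concrete, here is a minimal sketch of how a blob's object id is derived in a default (SHA-1) repository: the id is the SHA-1 of a short header followed by the file content. The function name `git_blob_id` is made up for this illustration, and repositories created with the SHA-256 object format would produce different ids.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id a SHA-1 Git repository gives a blob with this content.

    Assumption: the default SHA-1 object format. The id is computed over the
    header b"blob <size>\\0" followed by the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The id depends only on the content, never on the file name or path.
    print(git_blob_id(b""))
    print(git_blob_id(b"hello\n"))
```

The result should agree with what `git hash-object` reports for the same bytes. Because the id is a pure function of the content, two files with identical bytes share a single blob, which is part of why storing commits as full snapshots stays cheap.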
| morsch wrote:
| You have evidence that git is hard to use, but no evidence
| establishing that it's _too_ hard. Some things just aren't
| easy.
| outworlder wrote:
| > Edit: My practical advice: If you use git every day and you
| don't know how to rebase, reset, cherrypick, and stash from the
| command line, make it a goal. Then, once you're comfortable,
| learn how to do it in a visual tool like Gitkraken and make an
| effort to incorporate them in to your daily workflow. My guess
| is things will become a lot less tedious and confusing when
| things get messy.
|
| I would add git bisect to the list. It's incredibly useful (if
| your codebase is sane).
| trulyme wrote:
| I read some description of what that is and it looks like
| checking out different commits (via bisection) until you
| figure out where in the history some change happened. Is
| there some other benefit I am missing?
| iudqnolq wrote:
| It automatically does a binary search, and you can use it
| completely automatically if you can write a script that
| determines if the bug was present.
|
| The other day I used it to write a good bug report. I first
| used it to find the earliest commit I could compile on my
| machine, then I used it to find the commit where a certain
| command would fail.
| jfengel wrote:
| It's not about finding a particular change, it's about
| which change broke something.
|
| In the best cases, it's totally automatic. You know that it
| worked at commit A and is broken by commit Z. So it checks
| out commit M and runs the tests. If they succeed, then it
| broke somewhere between M and Z. If they fail, then it
| broke somewhere between A and M. So it checks out either H
| or S, depending, and repeat.
|
| It's not always that easy, especially when your tests and
| environment are complicated. There's often manual
| intervention, which is tedious. Still, log2 N steps is
| often manageable, especially if the computer is taking care
| of the tracking for you.
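The A-to-Z walk described above can be sketched as a plain binary search. This is only an illustration over a linear list of commits; real `git bisect` works on the commit graph, and the names `first_bad_commit` and `is_bad` are invented for this example.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Find the first bad commit in a linear history.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad. Returns the first bad commit and the number of commits tested.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the breakage happened at mid or earlier
        else:
            good = mid   # the breakage happened after mid
    return commits[bad], tested


if __name__ == "__main__":
    history = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Pretend the bug was introduced in commit R; the lambda stands in for
    # "check out this commit and run the tests".
    print(first_bad_commit(history, lambda commit: commit >= "R"))
```

With 26 commits and the bug introduced at R, the search tests M, S, P, Q and R in that order, so five checks are enough instead of up to 24.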
| trulyme wrote:
| Thank you (and the sibling) for the explanation, sounds
| useful!
| zwieback wrote:
| True, but you could also argue the opposite view that it's a
| sign of git's usability that beginners can get by with just
| those commands. The problem doesn't crop up until those lazy
| users start doing things that make the repo messy.
| phailhaus wrote:
| That's honestly the opposite of good design. It's hiding the
| complexity to make it seem easy for beginners, then slamming
| them with inscrutable error messages when they "don't use it
| right." It leads to a system with a deceptively gentle
| learning curve that requires you to suddenly learn everything
| all at once when you hit an issue.
| zwieback wrote:
| Good point. I'm not sure which side I'm on, to me git feels
| like a good core with an atrocious UI, even after years of
| use I have to look up whether to use this or that option
| and whether it's uppercase or lowercase, do I use "--" or
| ":" and on and on.
|
| Mind you, I'm not complaining, most utilities I've written
| for my internal users are worse! It's when something gets
| out and used by the masses that you wish you had had the
| time to put together a coherent user interface.
| redisman wrote:
| It's really just using the old generic version control
| commands everyone's used to.
| phendrenad2 wrote:
| I know how to rebase, reset, cherry-pick, stash, reflog,
| assume-unchanged, and many other advanced techniques.
|
| I still prefer to add/commit/branch/merge. I often copy-paste
| changes into a new branch, just because I don't enjoy recalling
| arcane commands from memory or googling them for the umpteenth
| time.
|
| I suspect that git is a leaky abstraction that doesn't fit the
| corporate software development workflow. I think that git is a
| hammer and non-distributed development is the screw we're
| hitting with it.
| robaato wrote:
| How do you track which cherry picks have been done?
| chriswarbo wrote:
| I avoid rebase because it rewrites history, and I'd rather have
| an accurate log of what happened. Cherry-pick can certainly be
| useful for grabbing particular _commits_; although I find
| myself more often using `format-patch`+`am` to grab particular
| _files_ (which also works across repos).
| iudqnolq wrote:
| I never understood the "rewrites history" argument. The
| original commits don't necessarily faithfully represent my
| thought process, I might have a larger commit because I got
| distracted in the middle of the day or a shorter commit
| because I wanted to make sure my code was backed up at the
| end of the day.
| bentcorner wrote:
| I use rebase often _because_ it rewrites history - it lets me
| squash commits into a conceptual single commit, or re-order
| commits together that chronologically make more sense next to
| each other.
| secondcoming wrote:
| We've never had an issue with git merge'ing all over. I've
| never looked at our git graph because I've never had to.
|
| Maybe rebase is a tool to help poor software development
| practices? (and your colleagues letting branches go stale is
| one of those)
| Sn0wCoder wrote:
| I agree: if you merge early and often, everything goes smoothly,
| and if you have a stale branch you're doing something else
| wrong. At least for the work I do, web dev. Some of our
| pipeline folks are working under a different paradigm and
| they run into these problems often but also understand how to
| rebase and cherry pick. As always, it just depends.
| kazinator wrote:
| > _help poor software development practices_
|
| Says someone who's "never looked at our git graph".
| secondcoming wrote:
| Entertain me then, why do you need to look at it?
| f154hfds wrote:
| Be safe out there everyone. Squash with caution, don't force
| push master. And remember, reflog is always there to help if
| you get into trouble.
| kgwgk wrote:
| > about every developer uses git but most learn add, commit,
| branch, and merge and then just stop learning.
|
| What are branch and merge? /j
| bluedino wrote:
| Git has over-complicated source control for the majority of
| developers. Things were much simpler with svn.
| MereInterest wrote:
| Svn's complete inability to handle merges at a file level may
| have been simpler, but was by no means better. Needing to
| coordinate who is allowed to edit each file in order to avoid
| a painful merge is common with svn, and nearly unheard of
| with git. Svn looks at the inherent complexity of multiple
| people working on the same code base, shrugs, and figures
| that it is somebody else's problem.
| kevin_thibedeau wrote:
| SVN is a versioned storage engine pretending to be source
| control. Branching and tagging? We'll just expect everyone to
| obey an implied policy in our filesystem tree. Oh you think
| you want to merge these things that have a clear ancestor in
| the DAG? Not so fast buddy.
| chipotle_coyote wrote:
| IIRC, merging in SVN is... basically something you never
| wanted to do. :)
| tehbeard wrote:
| > Things were much simpler with svn.
|
| Not from my experience as a newbie with SVN at the start of
| uni and in the beginning at $job.
|
| In both cases, it was temperamental, prone to network issues
| (this was in both student accom. -> uni server, and LAN at
| $job) and did not like users working on the same files.
|
| Git took some learning, and it took reading Git Magic
| <http://www-cs-students.stanford.edu/~blynn/gitmagic/> to go
| from <https://xkcd.com/1597/> to the friend mentioned in the
| alt text.
|
| SVN still feels like I'm pulling teeth all these years later.
| ravenstine wrote:
| Rebasing and cherry-picking are awesome tools once you know
| what they're actually doing. I think people avoid rebase for a
| few reasons; the term "rebase" doesn't mean anything outside of
| Git so it's not obvious what it is doing under the hood, and
| inexperienced Git users might use it to change the history on
| the main branch, which I see as an antipattern.
|
| There's nothing inherently wrong with merging, but I personally
| don't like it because I find merge commits harder to understand
| than regular commits. Better to use things like rebasing and
| cherry-picking to move commits arbitrarily and then squash some
| commits into units of work that make sense.
|
| Stash is crappy though, IMO, because it's not branch-specific.
| Instead of stash, I like to fork the branch I'm working on and
| create a "WIP" commit. That way I don't lose track of work I
| had in progress that only belongs in a certain branch.
| breck wrote:
| Agreed. Git should be treated as a deep skill, as important to
| practice and train with as unit testing and regular
| expressions.
|
| Think of your git history as a product and art form in itself.
| If you don't enjoy writing your commit history, readers will
| not enjoy reading it.
|
| On a tactical level, I highly recommend buying Sublime Merge 2.
| blacktriangle wrote:
| Alternative interpretation: git is a terrible terrible tool.
| It solves Linus' problem, but he also wrote the damn thing.
| Had GitHub not entered the scene we'd likely all be using
| something else, maybe even SVN still.
|
| Maybe distributed source control really is this complicated
| and treating git as a deep skill is justified, but having
| also used Darcs and Mercurial I have a hard time believing
| that git's usability issues are inherent complexity rather
| than an artifact of git itself.
| codyb wrote:
| Regular expressions? I really enjoy using them and futzing
| around but I encourage the people I work with to stay away
| and to avoid using in production when possible.
| trulyme wrote:
| Whoa, why? That sounds like awful advice to me. By all
| means, use regexes, but make sure you understand the theory
| behind them (state machines) so you will know not to parse
| HTML with them. They are really pretty easy to do right
| once you grasp the concepts.
| breck wrote:
| You can write correct ad hoc regex parsers for many
| subsets of HTML depending upon your needs.
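As a hedged illustration of such a subset: if you only need `href` values from anchor tags that use double-quoted attributes, a regular expression can be enough. The pattern and the name `extract_links` are assumptions made for this sketch. It deliberately ignores single-quoted or unquoted attributes, comments, and script content, so it is not a general HTML parser.

```python
import re
from typing import List

# Matches <a ... href="..."> where href is preceded by whitespace, so that
# attributes such as data-href are not picked up by mistake.
HREF_RE = re.compile(r'<a\s+(?:[^>]*?\s)?href="([^"]*)"', re.IGNORECASE)


def extract_links(html: str) -> List[str]:
    """Return href values of anchor tags with double-quoted attributes."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    sample = '<p><a href="https://example.com/a">A</a> and <a class="x" href="/b">B</a></p>'
    print(extract_links(sample))
```

Anything that needs to track nesting, such as matching an opening tag with its closing tag, is beyond what a regular language can express and calls for a real parser.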
| turminal wrote:
| Regular expressions are useful as tools for searching
| through code, filtering logs, searching through the
| filesystem..., even if you never commit them into your
| codebase.
| breck wrote:
| Yes, good clarification. I probably write 10x (maybe
| 100x? more?) more throwaway regexes than regexes I
| commit.
| chipotle_coyote wrote:
| I found Sublime Merge -- although it may have been version 1
| when I tried it! -- very unintuitive and fiddly, a lot like
| using Vim for doing three-way merges. Definitely one of those
| "YMMV" kinds of tools. (I mean, I'm sure Vim is terrific at
| it once you get the hang of it.)
|
| Personally, I've settled on
|
| * Getting pretty familiar with the git command line
|
| * Using a decent GUI diff and three-way merge tool (I use
| Kaleidoscope)
|
| * Using GitUp, an open source Mac git client, on occasions
| where I want to get kind of arcane: committing individual
| lines of files in separate commits, re-ordering commits (on
| an unmerged feature branch because I'm not a complete
| monster), etc.
|
| I suspect having already discovered GitUp is a good chunk of
| why I didn't get into Sublime Merge; it can do a lot of
| advanced stuff in ways I personally find easier to grok.
| Vinnl wrote:
| It's also one of the few tools that is likely to be a constant
| factor in your job for a long time. Yes, it's not super easy,
| and you can sort of get by with minimal effort, but it's not
| that much time to invest compared to how much benefit you'll
| reap.
|
| And I don't mean memorising commands and their arguments, but
| rather _understanding_ Git from first principles.
|
| (I wrote this visual tutorial for that purpose, takes about
| fifteen minutes to go through:
| https://agripongit.vincenttunru.com/)
| bombcar wrote:
| This looks nice but scrolling on Safari on Big Sur is causing
| weird artifacts. maybe if I scroll really slow
| gregmac wrote:
| > My practical advice: If you use git every day and you don't
| know how to rebase, reset, cherrypick, and stash from the
| command line, make it a goal. Then, once you're comfortable,
| learn how to do it in a visual tool like Gitkraken and make an
| effort to incorporate them in to your daily workflow.
|
| I do agree you should learn rebase, reset, cherrypick and
| stash, but I don't agree that you need to learn on the CLI. I
| mean, use the CLI if you prefer that, but the git GUIs are
| perfectly adequate for performing any of these operations.
|
| I used to use git CLI heavily, but in the past few years I have
| simply not needed to, to the point that aside from a small
| handful of the most common operations I don't even remember a
| lot of it anymore. Partly this is due to maturity of the GUIs,
| and partly because old practices like SSHing to a dev server to
| edit+commit something there are just totally obsolete and
| unnecessary these days.
|
| Even for a simple commit, there's a massive convenience of
| seeing the timeline and being able to interactively stage and
| look at diffs that is just miles ahead of CLI, and lets me
| break down commits into better units of work and write better
| messages.
|
| There's a stupid gatekeeping thing some developers still do
| about git CLI, I don't get it. It's as valid as dictating what
| text editors, color schemes, input devices, or OSes "real
| developers" use. Judge people on their output, not their tools.
| trulyme wrote:
| I don't force my choice on others, but I'm firmly in "git
| cli" camp. The reason is simple - cli is available
| everywhere. I'm sure GitKraken & co. are great once you get
| used to them, but apart from GitLab graph view (which GitHub
| sadly lacks, and cli tool afaik also) I don't miss anything.
| But again, this is just my personal preference and I agree
| that developers should be free to use gui if they prefer it.
| gregmac wrote:
| > cli is available everywhere
|
| I guess that's part of what I was getting at though. Where
| are you doing your development that isn't your workstation?
|
| I work on a whole bunch of different things -- from
| personal stuff running on my laptop or VMs in my house to
| cloud services deployed across dozens of AWS services --
| and things have just got to a point where I have no need to
| do a commit anywhere but my workstation. (Well,
| technically, I have two: one personal, one work).
|
| I definitely used to do it years ago, but now I don't
| remember the last time I had to use git on a remote system.
| trulyme wrote:
| I use two at the moment but they change over time, along
| with the OS. I am not talking about developing remotely,
| though it does happen that I use git push to deploy for
| some smaller projects. But I could have used GUI for that
| too. I guess I just don't want to get used to a tool I
| won't be able to keep using.
| leokennis wrote:
| I think the issue here is that using git is not a goal unto
| itself. Git is a tool/system that should get out of your way as
| much as possible. Instead it has arcane commands and options
| making anything but the most basic operations Shakespeare
| novels on the command line.
|
| My goal is to have my code in the repo. So if git starts being
| a pain, it's much easier to store my edits locally, pull a
| fresh copy of the repo, copy over my edits again and commit +
| push.
|
| If you have a good cook, let him/her cook dishes and let
| someone else care about sharpening the knives and cleaning the
| dishes.
|
| If you have a brilliant programmer, let him/her write good
| code. Don't bother them with understanding binary trees and
| hashes of snapshots of diffs of local repo's of pointers of
| objects in a blob graph lalalalala.
| fshbbdssbbgdd wrote:
| My usual workflow is to frequently merge master into my feature
| branch during development, then I squash before merging back to
| master. As far as I can tell, this gives a clean history so
| shouldn't bother the rebase fans (who prioritize a simple
| commit graph), and shouldn't bother bisect fans (who get
| confused by fake historical commits). Is there something bad
| about this approach?
| mschuster91 wrote:
| > A lot of people are scared of rebase and cherrypick and shut
| down or get defensive when you mention them or try to encourage
| their use.
|
| Because a _lot_ of people have been burned and way too many
| hours been lost due to a rebase gone wrong. Cherry-pick and
| stash are trivial operations, reset (outside of "undo a git
| add") and especially rebase are not.
|
| The learning curve for both is _steep_, the potential for
| failure extremely high, so I understand why organizations go as
| far as entirely banning rebase.
| [deleted]
| zo1 wrote:
| People are scared of rebase because of the constant scare-
| mongering around it. "Rebase is Evil" and "Never use Rebase",
| etc. Then we end up with junior devs that are too-scared to
| even use their git IDE's built-in rebase-remote branch onto
| remote so they end up littering the entire repo with "Merged
| branch-A from origin into branch-A".
|
| It's so bad that even seasoned developers that haven't delved
| deeply into git have no idea that this sort of rebase is
| practically harmless. Instead they parrot "Rebase is Evil"
| without thinking twice.
| Stratoscope wrote:
| What puzzles me is how resistant many developers are to using
| or even considering a Git GUI. I prefer SmartGit, but GitKraken
| is nice too.
|
| People tell me, "I'm so much more productive on the command
| line" and then it turns out all they know is pull/commit/push
| and using a local branch. Anything outside that brings terror:
| "I never use rebase. What if something goes wrong in the
| rebase? Now I've lost all my work and I have to pull a fresh
| copy of the repo from scratch."
|
| Yes, I have heard exactly that.
|
| One thing I love about SmartGit is how it unifies features that
| the Git command line presents as separate and unrelated
| concepts. The reflog? Click the Recyclable Commits checkbox and
| now all of your reflog commits show up as ordinary commits just
| like any other.
|
| Stashes? Same thing. Turn on the checkbox to make a stash or
| all stashes visible and now they show up as ordinary commits,
| which is all they are under the hood.
|
| Want a diff between two commits, whether they be normal commits
| or stash or reflog commits? Click one commit, ctrl+click the
| other, and you instantly see the differences between the two.
| No need to check out a reflog commit temporarily just to have a
| look at it.
|
| Yet I have only had a 5-10% success rate in getting anyone to
| take a look at any Git GUI, much less using one. I would be
| really interested in understanding why so many developers are
| reluctant to doing anything other than the Git command line.
| kevincox wrote:
| I agree with your assessment. I think GUIs are great for things
| that you don't do often enough to memorize (or for things
| that are inherently visual, but not relevant here) but they
| are often looked down on.
|
| There are many people who do enough git to know how it works
| well and be familiar enough with enough of the commands that
| they don't need a GUI and are likely faster on the command
| line. But for every one of those people there are at least 2
| who would work faster, and more accurately with a good GUI.
| Solvitieg wrote:
| In practice, rebasing increases conflicts, requires teams to
| time their merges, and obfuscates the history of the project.
|
| I never understood why people think this is a good pattern.
| mekkkkkk wrote:
| This. I don't know if it's because of a lack of understanding
| or bad workflow, but almost all of the times I've seen bugs
| caused by git operations to slip through the cracks, it's
| been because someone decided that a pretty history was of
| utmost priority. Rebasing is probably necessary in some
| cases, but it can be a real foot gun as well.
| Espressosaurus wrote:
| In a large repo with many people merging, it helps keep
| things organized.
|
| In my experience you can make an argument for a merge-based
| workflow up to around 6 people. By 12 it's painful and hard
| to track what's going on, doubly so when you have a dev
| branch and multiple sustaining branches or something more
| complicated.
|
| By the time you get to 100 people or more committing to the
| same repo, it just becomes absolute chaos, and at least you
| can maintain a semblance of sanity in your official branches
| by forcing a rebase-based workflow on them.
| TheLocehiliosan wrote:
| What you need to understand is people use rebase on their
| unshared branches. It's part of crafting your commit history
| to be a coherent set of atomic changes instead of the path
| you took while developing it all.
|
| You rebase BEFORE you merge into the mainline branch.
| fshbbdssbbgdd wrote:
| Do you run your test suite against each of the commits you
| create when rebasing? If not, isn't this "coherent set of
| atomic changes" misleading? It seems like a lot of effort
| to make a fake clean-looking history.
| xoudini wrote:
| There shouldn't be an issue in doing so. During a rebase
| you'll either have no conflicts -- in which case there
| isn't an issue -- or you'll have to stop to resolve
| conflicts, and you might as well run tests before
| continuing the rebase. In both cases I'd argue that the
| statement "coherent set of atomic changes" applies.
| fshbbdssbbgdd wrote:
| Correct me if I'm missing something here - but a lack of
| conflicts during rebase only means that the few lines
| surrounding your changes weren't changed in the upstream.
| The rest of the repo changed, and this will often cause
| some kind of inconsistent state. I've encountered this
| situation frequently when using git bisect.
| xoudini wrote:
| When you rebase, you basically replay the history of your
| branch since it diverged from the branch you're rebasing
| onto. Thus, the branch is always in a consistent state
| (or equally consistent to when you originally authored
| the commit you're replaying). And of course this assumes
| the target branch is already in a consistent state.
| fshbbdssbbgdd wrote:
| If the upstream is like this:
|
| A -> B
|
| And you branch off B and start making changes, then the
| upstream continues on its own:
|
| A -> B -> C -> D
|
| Now you rebase your dev branch off D. Your changes get
| replayed on top of D and create new commits. Some of
| those commits might not be valid, because they take code
| that worked in the context of B and put it in the context
| of D. The history seems clean if all you do is look at
| the diffs, but if you bisect and try to use the repo in
| one of the rewritten commits, you may find it doesn't
| even compile (even if that commit was fully functional
| before rebasing).
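The B-to-D scenario above can be modeled with a toy sketch in which snapshots are dictionaries and a rebase replays a diff onto a new base. Everything here is an assumption made for illustration: the file names, the `builds` check standing in for a compiler, and conflict detection at whole-file level (Git itself compares hunks within files).

```python
from typing import Dict, Optional

Snapshot = Dict[str, str]        # path -> file content
Diff = Dict[str, Optional[str]]  # path -> new content, None means "deleted"


def diff(old: Snapshot, new: Snapshot) -> Diff:
    """Derive the change set that turns snapshot `old` into snapshot `new`."""
    return {
        path: new.get(path)
        for path in old.keys() | new.keys()
        if old.get(path) != new.get(path)
    }


def apply(base: Snapshot, changes: Diff) -> Snapshot:
    """Replay a change set on top of a (possibly different) base snapshot."""
    result = dict(base)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def builds(snapshot: Snapshot) -> bool:
    """Toy compiler: feature.py only works if util.py still defines helper()."""
    if "feature.py" not in snapshot:
        return True
    return "def helper(" in snapshot.get("util.py", "")


# Upstream snapshots B and D; D renamed helper() without touching feature.py.
B: Snapshot = {"util.py": "def helper(): pass\n"}
D: Snapshot = {"util.py": "def helper_v2(): pass\n"}

# The dev commit was authored on top of B and adds a caller of helper().
dev = apply(B, {"feature.py": "from util import helper\nhelper()\n"})

dev_diff = diff(B, dev)
upstream_diff = diff(B, D)
conflict = bool(dev_diff.keys() & upstream_diff.keys())  # no shared paths
rebased = apply(D, dev_diff)  # the "same" change replayed on top of D

if __name__ == "__main__":
    print("conflict:", conflict)
    print("original commit builds:", builds(dev))
    print("rebased commit builds:", builds(rebased))
```

In this model the replay reports no conflict because the two sides touched different files, yet the rebased snapshot no longer builds. A clean rebase says nothing about whether each rewritten commit still works, which is why testing the rewritten commits matters if you rely on bisect.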
| xoudini wrote:
| Hm, you're right. The simplest example I could think of
| right now is the upstream having renamed/deleted
| something that the dev branch depends on, but didn't
| directly touch. That would definitely cause a "broken"
| history during the rebased commits, and is technically
| unavoidable.
| breischl wrote:
| When I've done this, your private/dev branch may be a
| series of broken commits. Then you rebase onto main,
| squash to one commit, and test (if necessary). So what
| shows up on the main branch is a single, squashed, tested
| commit that contains one logical unit of code (usually a
| feature or fix).
|
| In this model the main branch history is "real" in that
| it records the sequence of changes to the production
| code. It's "fake" in that it doesn't record the exact
| sequence of fumbling steps and backtracks you took to get
| there. But IME the latter is usually not very useful
| anyway.
| xoudini wrote:
| In some cases I agree, but squashes can end up so large
| that doing a `git bisect` (which is quite useful in
| finding the comparatively small commit which introduced a
| bug) becomes unfeasible.
| fshbbdssbbgdd wrote:
| I like the squashed commit approach. I get there by
| merging upstream into my dev branch when developing, then
| squashing before I merge my changes into the upstream. As
| far as I can tell, that has the same outcome as rebase
| with squash. Both approaches create a simple commit
| graph, and both avoid fake intermediate commits.
| Solvitieg wrote:
| For sure, I understand that.
|
| It tends not to be an issue when a developer is working on
| an isolated feature that only he or she cares about, that
| is reviewed in a timely manner, and gets directly committed
| to main.
|
| Often this is not the case.
| greggman3 wrote:
| I don't have a solution and maybe the problem is just not
| solvable but ...
|
| The tragedy is that git is so hard to learn. Start a github
| project (I know github is not git). Take a PR, have the PR have
| a conflict, now, try to explain to the new user how they can
| fix their PR via git to not conflict. You'll be stuck giving
| them a giant lesson, probably an hour to write the
| instructions, then several back and forths.
|
| Mostly, either they already know git and fix it themselves OR I
| give up and merge it by hand myself since it's easier than
| becoming a git teacher for them.
| outworlder wrote:
| > Take a PR, have the PR have a conflict, now, try to explain
| to the new user how they can fix their PR via git to not
| conflict.
|
| Is this a Git problem? I recall entire workdays being wasted
| on SVN and CVS back in the day with multiple people trying to
| make sense of a merge.
|
| In Git this is actually easier to do (and easier to do
| repeatedly, with git rerere and similar).
| iudqnolq wrote:
| It's a problem, and the place to fix it is in git. That
| makes it a git problem. Just because things were worse back
| in the day doesn't mean we can't have nice things.
| wruza wrote:
| _I think it's a tragedy that just about every developer uses
| git but most learn add, commit, branch, and merge and then just
| stop learning._
|
| This implies that they think in a wrong way rather than have a
| wrong tool. A real tragedy is that git took over the world (in
| minds of lovers of shiny-new things and in saas) without most
| of the world realizing that they don't even need it, because
| they wouldn't even like to think in its way. The world wanted
| quick subversion and instead got this in-all-regards UX
| monster.
___________________________________________________________________
(page generated 2021-04-08 23:00 UTC)