https://itoshkov.github.io/git-tutorial

itoshkov.github.io

Follow me on GitHub

Overview

    A conceptual model is an explanation, usually highly simplified,
    of how something works. It doesn't have to be complete or even
    accurate as long as it is useful.

    Don Norman - The Design of Everyday Things

In this tutorial I'll try to describe how git works, without using
git. Instead, we'll create a simple, git-like system using just zip
files diff and patch.

The idea is to build a good model of how git works conceptually.

You can go here if you prefer to watch this tutorial instead of
reading it.

I'll be using some simple Linux commands like mkdir, cd, ls, cp and
of course zip.

Version Control Without Git

The Humble Beginning

Let's start by creating our new project:

mkdir ProjectX
cd ProjectX

And let's put our top-secret program in main.c:

#include <stdio.h>

void main() {
        printf("ALL YOUR BASE ARE BELONG TO US.");
}

This is our starting point. This is our first version and we want to
save it and be able to come back to it. One way we can do that is to
copy the whole ProjectX directory, like this:

cd ..
cp --recusive ProjectX ProjectX-v0
cd ProjectX

Nice! Well, sort of.

When we run our program we see that we forgot to print new line at
the end. Let's fix this.

#include <stdio.h>

void main() {
        printf("ALL YOUR BASE ARE BELONG TO US.\n");
}

We can see the differences between this version and v0 like this:

diff --recusive --unified ../ProjectX-v0 .

The --recusive option tells diff go compare the directories
recursively, and the --unified - to show the differences in the
so-called unified format.

--- ../ProjectX-v0/main.c       2021-09-06 13:58:33.718024787 +0300
+++ ./main.c    2021-09-06 13:58:46.690096675 +0300
@@ -1,5 +1,5 @@
 #include <stdio.h>

 void main() {
-       printf("ALL YOUR BASE ARE BELONG TO US.");
+       printf("ALL YOUR BASE ARE BELONG TO US.\n");
 }

As you can see, the file main.c has one line removed and one line
added.

OK, it's time to save this new version of our software:

cd ..
cp --recusive ProjectX ProjectX-v1
cd ProjectX

Archives

One of the problems with copying folders like that is, that they tend
to take a lot of space. Also, it's easy to accidentally change things
in the "archive" folder, instead of in the working one.

We can fix these by using zip files instead of folders. Zip files
still can be changed, but at least it's harder to do so by mistake.

Let's also start calling these archives commits. For now a commit is
just a snapshot of the current sources, stored in a zip file.

Here's how to "convert" the old directories to the new format:

cd ..
cd ProjectX-v0
zip -r ../ProjectX-v0.zip .
cd ..
cd ProjectX-v1
zip -r ../ProjectX-v1.zip .
cd ..
rm -rf ProjectX-v0 ProjectX-v1 # remove the old directories

One thing we can't do directly anymore is to compare versions. But we
can just unzip the version(s) that we need in some temporary folders
and then compare. It's a bit of a hassle, but we can automate it with
a small script if we want to.

Another thing I don't like is, that the version archives are just
lying around and littering the parent directory. Let's move them in a
new folder inside the project:

cd ProjectX
mkdir -p .repo/commits
mv ../ProjectX-v0.zip repo/commits/c0.zip
mv ../ProjectX-v1.zip repo/commits/c1.zip

In Linux, UNIX and macOS files and folders starting with dot . are
"hidden". If you type ls you won't see them, but if you type ls -a
you will. Besides that, they are just normal files and folders.

We created a commits folder inside, because we'll proably want to put
other things in the .repo in the future.

To see how our current system works, let's modify the main.c file
again and make a new version:

#include <stdio.h>

void main() {
        printf("CATS: ALL YOUR BASE ARE BELONG TO US.\n");
}

zip -r .repo/commits/c2.zip .

  adding: main.c (stored 0%)
  adding: .repo/ (stored 0%)
  adding: .repo/commits/ (stored 0%)
  adding: .repo/commits/c1.zip (stored 0%)
  adding: .repo/commits/c0.zip (stored 0%)
  adding: main (deflated 85%)
  adding: main.o (deflated 67%)

Oops! We added the previous commits in the zip file too. Notice that
we are also storing the main.o and main files. These are not needed
as they are generated by the compiler.

To fix this, let's create a list of files that we want to track.
We'll put it in another hidden folder, called .commit.

mkdir .commit

Put the following in a .commit/track file:

.commit/*
main.c

rm .repo/commits/c2.zip
zip -r -i@.commit/track .repo/commits/c2.zip .

The -i@.commit/track option tells zip to include only the files
mentioned in the .commit/track file.

Branches

So far our versions are named sequentially c0, c1, c2 and so on. And
this order is the only thing that tells us that c2 was created from
c1, and that c1 was created from c0. That's good enough for linear
development, but is very insufficient for branching development.

But what are branches and why do we need them? Suppose that we have
released c2 in the world, and are working on new features. We have
created new commits c3 and c4, but overall the feature is not ready
yet.

While we are working on it, we receive complaints about some major
problem in c2. And our customers can't wait for us to finish with our
new feature. They need a fixed c2 now!

Luckily, we do have our c2 source code. We can "switch" to it, fix
the bug, and release the good version. But where should we put the
fix itself?

We might be tempted to temporary save the changes somewhere, switch
back to our c4 version, then apply the fix and make a c5 commit.
After all, we do want this fix to be part of the next version that we
are going to release, right?

But what if there's yet another problem with the "fixed" c2? We
wouldn't be able to go back to exactly this code, as we didn't save
it anywhere. No, we need a better solution.

We can modify the commits, so they "know" which their parrent commit
is. Let's add this in another file, called .commit/info. For c2 it
will look like this:

parent: c1

Since it is in the .commit folder, it will be tracked automatically.

The first commit is a bit special as it doesn't have a parent. Its
.commit/info will look like this:

parent:

We can also add more information to this file like:

  * author of the commit
  * date and time of the commit
  * one line summary
  * bigger, multi-line description of the changes

But we won't do that in this tutorial, to keep things simple.

We went back and fixed all our commits to have this file. Now we can
switch back to c2, implement the fix and create a new commit, let's
say c5. The commit history will now look like this:

c0 -- c1 -- c2 -- c3 -- c4
             \
              c5

Good! Now we can go back to c5 when we need to.

Like when we needed to properly fix the broken fix from before. Let's
assume we added one more commit to c4, we then went back to c5 and
implemented the new fix on top of it. The picture now will look like
this:

c0 -- c1 -- c2 -- c3 -- c4 -- c6
             \
              c5 -- c7

Names

So far so good, but we have just 2 branches and it's getting a bit
tedious to remember the top commit for each of them. Let's name them!

We can have a file for each branch, containing the name of the most
recent commit of this branch. For the main branch this will be c6 and
for release-1 - c7:

mkdir .repo/branches
echo 'c6' > .repo/branches/main
echo 'c7' > .repo/branches/release-1

The echo 'c6' > .repo/branches/main command will just create a new
file called .repo/branches/main and make its content be c6. If the
file already exists, it will overwrite it.

Let's create another commit in our main branch to see how it works:

# Set the parent info:
echo 'parent: c6' > .commit/info

# Make the commit:
zip -r -i@.commit/track .repo/commits/c8.zip .

# Update the branch pointer:
echo 'c8' > .repo/branches/main

This is tedious and errorprone. I hope somebody will create a program
to automate it ...

Switching

Above I was saying things like "let's switch to c2", or "let's switch
to main", but I never explained how. Here is how:

# Remove all the files and subfolders except for `.repo`:
rm -rf * .commit

# Unzip the relevant commit
unzip .repo/commits/c2.zip

It's not that bad, if we ignore the ugly way we clean up the working
folder, right?

But what if we now go to lunch, and when we come back next day we
forget which commit we switched to? Yes, sometimes lunches are that
long. We can then write some code and then commit:

# Set the parent info:
echo 'parent: c8' > .repo

# Make the commit:
zip -r -i@.track .repo/commits/c9.zip .

# Update the branch pointer:
echo 'c9' > .repo/branches/main

Oops! We committed in the wrong branch! The commit was based off of
c2 but we forgot that and treated it as if it was based on c8
instead.

It's time to add another level of indirection. We can create a file
.repo/HEAD, which will contain the name of the current branch or
commit.

Switching to a branch will look like this:

# Remove all the files and subfolders except for `.repo`:
rm -rf * .track

cat .repo/branches/main
# This will tell us which commit the main brach points to

# Unzip the relevant commit
unzip .repo/commits/c8.zip

# Update HEAD to point to  a branch
echo 'branches/main` > .repo/HEAD

Switching to a commit - like this:

# Remove all the files and subfolders except for `.repo`:
rm -rf * .track

# Unzip the relevant commit
unzip .repo/commits/c2.zip

# Update HEAD to point to specific commit
echo 'commits/c2` > .repo/HEAD

You might be wondering why would you want to switch to a specific
commit instead to a branch? A common case is if you want to go back a
couple of commits to see if a bug was present there or not.

In git, this situation when the HEAD points to a specific commit
instead of a branch, is known as a "detached HEAD state". I mention
it here, because it sounds scary (and a bit gruesome), but it's
really nothing to worry about.

Let's switch back to main as shown above and make a commit there:

cat .repo/HEAD
# --> branches/main (ok, we're on the main branch)

cat .branches/main
# --> c8 (the top commit of main is c8)

# Set the parent info:
echo 'parent: c8' > .commit/info

# Make the commit:
zip -r -i@.commit/track .repo/commits/c9.zip .

# Update the branch pointer:
echo 'c9' > .repo/branches/main

Notice that, even though we made a new commit, we didn't have to
change .repo/HEAD as we are still on the same branch.

Merging

Here is the current situation:

                                 HEAD -> main
                                          |
                                          v
c0 -- c1 -- c2 -- c3 -- c4 -- c6 -- c8 -- c9
             \
              c5 -- c7
                    ^
                    |
                release-1

We want to create a new release, but first we want to grab the fixes
from release-1 and apply them to our main branch.

One way to do that, is the following:

 1. Find the most recent commit, which is parent to both c9 and c7.
    This is c2.
 2. Find the changes between c2 and c7 (the release-1 branch). We can
    unzip both the commits in 2 temporary folders and then diff -r -u
    c2-temp c7-temp > changes.diff them.
 3. Patch the code in the working folder (c9): patch < changes.diff
 4. Manually fix any conflicts
 5. Commit

There's one more thing. This new commit should have two parents. So
let's make its .commit/info file read

parent: c9 c7

And the full picture now looks this way:

                                       HEAD -> main
                                                |
                                                v
c0 -- c1 -- c2 -- c3 -- c4 -- c6 -- c8 -- c9 -- c10
             \                                 /
              c5 -- c7 ------------------------
                    ^
                    |
                release-1

As you can see, we moved only main, but left release-1 unchanged.
That is because we merged the latter into the former.

Collaboration and remote repositories

TODO

Git

I will now make what is known as a pro move, and direct you to the
excellent Pro Git book! It is free and easy to read, and now,
hopefully, even easier to understand!

It might look like a cop-out and it is. But also, there are already
quite good tutorials, which describe how to use git. My main
objective here was to describe how git works and to some extent to
answer the question "why is it like that".

This page was generated by GitHub Pages.