diff --git a/bib/bibliography.bib b/bib/bibliography.bib index c72659c..4a1b3b6 100644 --- a/bib/bibliography.bib +++ b/bib/bibliography.bib @@ -469,3 +469,14 @@ {https://doi.org/10.1146/annurev-statistics-060116-054148}, DATE_ADDED = {Tue Nov 17 08:59:07 2020}, } + +@Book{brown2012_volum_ii, + author = {Brown, Amy and Wilson, Greg}, + title = {The architecture of open source applications, Volume + {II}}, + year = 2012, + publisher = {Creative Commons}, + url = {https://www.aosabook.org/en/index.html}, + isbn = 9781105571817, +} + diff --git a/images/git-graphs/repo_labels_alice.m4 b/images/git-graphs/repo_labels_alice.m4 index 0acecb8..17e2fca 100644 --- a/images/git-graphs/repo_labels_alice.m4 +++ b/images/git-graphs/repo_labels_alice.m4 @@ -10,10 +10,10 @@ digraph repo_labels { m9 -> b1; include(`feature.dot') m7 -> f1; + node[color=cyan]; + edge[color=cyan]; f3 -> f4; - node[color=green]; - edge[color=green]; node[group=feature]; bobf1 -> bobf2 -> bobf3 -> bobf4; m7 -> bobf1; diff --git a/images/git-graphs/repo_labels_alice.svg b/images/git-graphs/repo_labels_alice.svg index 7a406d1..b58068a 100644 --- a/images/git-graphs/repo_labels_alice.svg +++ b/images/git-graphs/repo_labels_alice.svg @@ -115,13 +115,13 @@ bobf1 - + m7->bobf1 - - + + @@ -231,13 +231,13 @@ f4 - + f3->f4 - - + + @@ -253,35 +253,35 @@ bobf2 - + bobf1->bobf2 - - + + bobf3 - + bobf2->bobf3 - - + + bobf4 - + bobf3->bobf4 - - + + diff --git a/images/git-graphs/repo_labels_bob2.m4 b/images/git-graphs/repo_labels_bob2.m4 index bada7b7..7651c37 100644 --- a/images/git-graphs/repo_labels_bob2.m4 +++ b/images/git-graphs/repo_labels_bob2.m4 @@ -10,8 +10,8 @@ digraph repo_labels { m9 -> b1; include(`feature.dot') m7 -> f1; - node[color=red]; - edge[color=red]; + node[color=cyan]; + edge[color=cyan]; f3 -> f4; node[color=green]; diff --git a/images/git-graphs/repo_labels_bob2.svg b/images/git-graphs/repo_labels_bob2.svg index f8c0dc9..b510228 100644 --- a/images/git-graphs/repo_labels_bob2.svg +++ 
b/images/git-graphs/repo_labels_bob2.svg @@ -231,13 +231,13 @@ f4 - + f3->f4 - - + + diff --git a/posts/from-graphs-to-git.org b/posts/from-graphs-to-git.org index fae6f15..4be8001 100644 --- a/posts/from-graphs-to-git.org +++ b/posts/from-graphs-to-git.org @@ -1,7 +1,7 @@ --- title: "From graphs to Git" -date: 2021-03-01 -tags: git +date: 2021-03-08 +tags: git, graphs toc: true --- @@ -11,21 +11,24 @@ This is an introduction to Git from a graph theory point of view. In my view, most introductions to Git focus on the actual commands or on Git internals. In my day-to-day work, I realized that I consistently rely on an internal model of the repository as a directed acyclic -graph. This is not something very original, many people have said the -same thing, to the point that it is a running joke (TODO: insert links -here). However, I have not seen a comprehensive introduction to Git -from this point of view. +graph. I also tend to use this point of view when explaining my +workflow to other people, with some success. This is definitely not +original; many people have said the same thing, to the point that it +is a [[https://xkcd.com/1597/][running joke]]. However, I have not seen a comprehensive +introduction to Git from this point of view, without clutter from the +Git command line itself. How to actually use the command line is not the topic of this article, -you can refer to the man pages or the excellent [[https://git-scm.com/book/en/v2][/Pro Git/]] book. I will -reference the relevant Git commands as margin notes. +you can refer to the man pages or the excellent [[https://git-scm.com/book/en/v2][/Pro Git/]] book[fn::See +"Further reading" below.]. I will reference the relevant Git commands +as margin notes. My target audience is basically myself a few years ago: background in -maths and computer science, but no direct experience of large-scale -codebases in Git. 
I also assume that we are curious about the internal -model of Git: if you only want a quick fix for your latest mistake but -don't care about understanding what's going on, this post is not for -you. +maths and/or computer science, but no direct experience of large-scale +codebases in Git. I also assume that you are curious about the +internal model of Git and the related data structures: if you only +want a quick fix for your latest mistake but don't care about +understanding what's going on, this post is probably not for you. This post is also highly opinionated about what I consider important when working on production codebases in a professional setting. Of @@ -43,7 +46,7 @@ commit), a diff representing changes (some lines are removed, some are added), and a commit message. It also has a name[fn:hash], so that we can refer to it if needed. -[fn:hash] Actually, each commit gets a SHA-1 hash that identifies it +[fn:hash] Actually, each commit gets a [[https://en.wikipedia.org/wiki/SHA-1][SHA-1]] hash that identifies it uniquely. The hash is computed from the parents, the messages, and the diff. @@ -63,11 +66,18 @@ of it, using [[https://git-scm.com/docs/git-log][=git log=]]. Here is an example of a repo: + [[file:/images/git-graphs/repo.svg]] -In this representation, each commit points to its children, and they -were organized from left to right as in a timeline. The /initial -commit/ is the first one, the root of the graph, on the far left. +In this representation, each commit points to its +children[fn:parent-child], and they were organized from left to right +as in a timeline. The /initial commit/ is the first one, the root of +the graph, on the far left. + +[fn:parent-child] In the actual implementation, the edges are the +other way around: each commit points to its parents. But I feel like +it is clearer to visualize the graph ordered with time. 
+ Note that a commit can have multiple children, and multiple parents (we'll come back to these specific commits later). @@ -100,7 +110,9 @@ an alias, in order to have meaningful names when navigating the graph. In this example, we have three branches: =master=, =feature=, and =bugfix=[fn::Do not name your real branches like this! Find a -meaningful name describing what changes you are making.]. +meaningful name describing what changes you are making.]. Note that +there is nothing special about the names: we can use any name we want, +and the =master= branch is not special in any way. /Tags/[fn:branch-tag] are another kind of label, once again pointing to a particular commit. The main difference with branches is that branches may move @@ -152,14 +164,18 @@ than one parent (for example, the fifth commit from the left in the graph above).[fn:merge:{-} As can be expected, the command is [[https://git-scm.com/docs/git-merge][=git merge=]].] -At this point, we need to talk about /conflicts/. Until now, every -action was simple: we can move around, add names, and add some -changes. But now we are trying to reconcile two different versions -into a single one. These two versions can be incompatible, and in this -case the merge commit will have to choose which lines of each version -to keep. If however, there is no conflict, the merge commit will be -empty: it will have two parents, but will not contain any changes -itself. +At this point, we need to talk about /conflicts/.[fn:merge-conflicts] +Until now, every action was simple: we can move around, add names, and +add some changes. But now we are trying to reconcile two different +versions into a single one. These two versions can be incompatible, +and in this case the merge commit will have to choose which lines of +each version to keep. If, however, there is no conflict, the merge +commit will be empty: it will have two parents, but will not contain +any changes itself. 
+ +[fn:merge-conflicts] {-} See /Pro Git/'s [[https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging][chapter on merging and basic +conflict resolution]] for the details on managing conflicts in practice. + ** Moving commits: rebasing and squashing @@ -265,7 +281,7 @@ Now Bob can see Alice's work, and has some idea to improve on it. So he wants to make a new commit on top of Alice's changes. But the =alice/feature= branch is here to track the state of Alice's repository, so he just creates a new branch just for him named -=feature=, where he add a commit: +=feature=, where he adds a commit: [[file:/images/git-graphs/repo_labels_bob2.svg]] @@ -292,4 +308,61 @@ about all of these in /Pro Git/. * Internals +*Note:* This section is /not/ needed to use Git every day, or even to +understand the concepts behind it. However, it can quickly show you + +Let's dive a little bit into Git's internal representations to better +understand the concepts. The entire Git repository is contained in a +=.git= folder. + +Inside the =.git= folder, you will find a simple text file called +=HEAD=, which contains a reference to a location in the graph. For +instance, it could contain =ref: refs/heads/master=. As you can see, +=HEAD= really is just a pointer, to somewhere called +=refs/heads/master=. Let's look into the =refs= directory to +investigate: +#+begin_src sh + $ cat refs/heads/master + f19bdc9bf9668363a7be1bb63ff5b9d6bfa965dd +#+end_src + +This is just a pointer to a specific commit! You can also see that all +the other branches are represented the exact same way.[fn:head:You +must have noticed that our graphs above were slightly misleading: +=HEAD= does not point directly to a commit, but to a branch, which +itself points to a commit. If you make =HEAD= point to a commit +directly, this is called a [[https://git-scm.com/docs/git-checkout#_detached_head]["detached HEAD"]] state.] + +Remotes and tags are similar: they are in =refs/remotes= and +=refs/tags=. 
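The chain from =HEAD= to a branch file to a commit hash can be followed by hand. Here is a minimal sketch in POSIX shell, assuming that refs are stored as plain files under =refs/= (Git can also keep them in a =packed-refs= file, which this sketch ignores). The name =resolve_head= is a hypothetical helper, not a Git command; in practice, =git rev-parse HEAD= is the robust way to get the same answer.

```shell
#!/bin/sh
# Sketch only: resolve_head is a hypothetical helper, not part of Git.
# Assumption: the branch that HEAD names exists as a plain file under refs/.

# Print the commit hash that HEAD refers to, given the path to a .git folder.
resolve_head() {
    git_dir=$1
    head=$(cat "$git_dir/HEAD")
    case $head in
        "ref: "*)
            # Symbolic reference: HEAD names a branch file that holds the hash.
            cat "$git_dir/${head#ref: }"
            ;;
        *)
            # Detached HEAD: the file contains the commit hash itself.
            printf '%s\n' "$head"
            ;;
    esac
}
```

Calling =resolve_head .git= at the root of a repository should print the same hash as =git rev-parse HEAD=, as long as the current branch has not been moved into =packed-refs=.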
+ +Commits are stored in the =objects= directory, in subfolders named +after the first two characters of their hashes. So the commit above is +located at =objects/f1/9bdc9bf9668363a7be1bb63ff5b9d6bfa965dd=. Objects +are compressed, and Git may later bundle them into binary files called +[[https://git-scm.com/book/en/v2/Git-Internals-Packfiles][packfiles]] for efficiency. Either way, if you inspect the commit (with [[https://git-scm.com/docs/git-show][=git show=]]), you will see the +entire contents (parents, message, diff). + +* Further reading + +To learn more about Git, specifically how to use it in practice, I +recommend going through the excellent [[https://git-scm.com/book/en/v2][/Pro Git/]] book, which covers +everything there is to know about the various Git commands and +workflows. + +The [[https://git-scm.com/docs][Git man pages]] (also available via =man= on your system) have a +reputation for being hard to read, but once you have understood the +concepts behind repos, commits, branches, and remotes, they are an +invaluable resource for exploiting all the power of the command-line +interface and its various commands and options.[fn:magit:Of course, +you could also use the awesome [[https://magit.vc/][Magit]] in Emacs, which will greatly +facilitate your interactions with Git with the additional benefit of +helping you discover Git's capabilities.] + +Finally, if you are interested in the implementation details of Git, +you can follow [[https://wyag.thb.lt/][Write yourself a Git]] and implement Git yourself! (This +is surprisingly straightforward, and you will end up with a much +better understanding of what's going on.) The [[https://www.aosabook.org/en/git.html][chapter on Git]] in +cite:brown2012_volum_ii is also excellent. + * References