I don’t know exactly how GitHub does it, but here is a possible way. This requires some knowledge of how git stores its data.
The short answer is that repositories can share the objects database, but each has its own refs.
We can even imitate it locally to prove the concept.
There are three things in the bare repo directory (or in the .git/ subdirectory if it's not bare) that are minimal for the repo to work:
- the objects/ subdirectory, in which all objects are stored (commits, trees, blobs ...). They are saved either as files named after the object's hash, or in .pack files.
- the refs/ subdirectory, which stores simple files, such as refs/heads/master, whose contents are the hash of the object each one refers to.
- a HEAD file that says what the current commit is. Its value is either a raw hash (which corresponds to a detached HEAD, that is, we are not on any named branch), or a textual pointer to a ref where the actual hash can be found (for example ref: refs/heads/master, which means we are on the master branch).
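To make the three pieces concrete, here is a small sketch that builds a throwaway repository and looks at each of them. The directory name demo and the demo identity are placeholders of mine, and reading the ref as a plain file assumes git's default "files" ref storage and that the ref has not been packed.

```shell
set -eu
cd "$(mktemp -d)"   # scratch directory so nothing existing is touched

git init --quiet demo
cd demo
echo zzz > file
git add file
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m initial

# HEAD is a text file pointing at the current branch ref.
head_content=$(cat .git/HEAD)
branch_ref=$(git symbolic-ref HEAD)

# The branch ref is a text file holding the hash of a commit.
ref_hash=$(cat ".git/$branch_ref")
head_hash=$(git rev-parse HEAD)

# A freshly created (loose) object lives at objects/<first 2 hex chars>/<rest>.
obj_path=".git/objects/$(echo "$head_hash" | cut -c1-2)/$(echo "$head_hash" | cut -c3-)"
obj_type=$(git cat-file -t "$head_hash")

echo "HEAD:        $head_content"
echo "$branch_ref: $ref_hash"
echo "object file: $obj_path ($obj_type)"
```

The branch name printed depends on the local init.defaultBranch setting, so it may be master or something else.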
Suppose someone creates their original (not forked) repo orig on GitHub.
To simulate this locally, we run
$ git init --bare github_orig
We assume this is happening on GitHub's servers. Now there is an empty GitHub repository. Then we imagine cloning that repository from our own computer:
$ git clone github_orig local_orig
Of course, in real life, instead of github_orig we would use https://github... Now we have cloned the GitHub repository into local_orig.
$ cd local_orig/
$ echo zzz > file
$ git add file
$ git commit -m initial
$ git push
$ cd ..
After that, the github_orig objects directory will contain the commit object we pushed, one blob for file and one tree object. The refs/heads/master file will contain the commit hash.
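This claim can be checked with a short script. It is only a sketch: it repeats the steps above in a scratch directory, pins the branch name to master so it matches the text, and uses a demo identity that is a placeholder of mine.

```shell
set -eu
cd "$(mktemp -d)"   # scratch directory so nothing existing is touched

git init --quiet --bare github_orig
# Pin the branch name to master (as in the text), whatever init.defaultBranch says.
git -C github_orig symbolic-ref HEAD refs/heads/master
git clone --quiet github_orig local_orig 2>/dev/null   # hides the "empty repository" warning
(
  cd local_orig
  git symbolic-ref HEAD refs/heads/master
  echo zzz > file
  git add file
  git -c user.name=demo -c user.email=demo@example.com commit --quiet -m initial
  git push --quiet origin master
)

# List every object stored in the bare repo, reduced to its type.
object_types=$(git -C github_orig cat-file --batch-all-objects \
  --batch-check='%(objecttype)' | sort | paste -s -d ' ' -)
# The ref file in the bare repo holds the hash of the pushed commit.
pushed_hash=$(cat github_orig/refs/heads/master)
local_hash=$(git -C local_orig rev-parse HEAD)

echo "object types: $object_types"
echo "refs/heads/master: $pushed_hash"
```

If everything went as described, the listing shows exactly one blob, one commit and one tree, and the hash in refs/heads/master equals the local HEAD.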
Now let's picture what might happen when someone presses the Fork button. We will create a git repository, but manually:
$ mkdir github_fork
$ cd github_fork/
$ cp ../github_orig/HEAD .
$ cp -r ../github_orig/refs .
$ ln -s ../github_orig/objects
$ cd ..
Please note that we copy HEAD and refs, but make a symbolic link for objects. As we can see, making a fork is very cheap. Even if we have dozens of branches, each of them is just a file in the refs/heads directory that contains a simple hexadecimal hash (40 bytes). For objects, we only link to the original's objects directory - we do not copy anything!
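Now we mimic the user who created the fork cloning the forked repo to their own machine. (The --no-local option makes git use its regular transport, as it would over the network; recent git versions refuse a plain local clone when objects is a symbolic link.)
$ git clone --no-local github_fork local_fork
$ cd local_fork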
We see that the clone succeeded, although the repo we cloned from does not have its own objects, only a link to those of the original repo. The fork's user can now create branches, commit, and then push to github_fork. The objects will be placed in the objects directory, which is the same one github_orig uses! But refs and HEAD will be changed and will no longer match the values in github_orig.
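The whole experiment, including a push from the fork's user, can be sketched as one script. The branch name feature, the file name file2 and the demo identity are placeholders of mine. The script clones the fork with --no-local because newer git versions refuse to do a file-level local clone when objects is a symbolic link.

```shell
set -eu
cd "$(mktemp -d)"   # scratch directory so nothing existing is touched
commit() { git -c user.name=demo -c user.email=demo@example.com commit --quiet "$@"; }

# The original repo, with one pushed commit on master.
git init --quiet --bare github_orig
git -C github_orig symbolic-ref HEAD refs/heads/master
git clone --quiet github_orig local_orig 2>/dev/null
(
  cd local_orig
  git symbolic-ref HEAD refs/heads/master
  echo zzz > file
  git add file
  commit -m initial
  git push --quiet origin master
)

# The "fork": own HEAD and refs, objects shared through a symlink.
mkdir github_fork
cp github_orig/HEAD github_fork/
cp -r github_orig/refs github_fork/
ln -s ../github_orig/objects github_fork/objects

# The fork's user clones, works on a new branch, and pushes it to the fork.
git clone --quiet --no-local github_fork local_fork
(
  cd local_fork
  git checkout --quiet -b feature
  echo yyy > file2
  git add file2
  commit -m "work in fork"
  git push --quiet origin feature
)

fork_commit=$(git -C local_fork rev-parse HEAD)
# The new commit is physically stored in github_orig's objects directory ...
type_in_orig=$(git -C github_orig cat-file -t "$fork_commit")
# ... but only github_fork has a ref pointing at it.
orig_refs=$(git -C github_orig for-each-ref --format='%(refname)' | paste -s -d ' ' -)
fork_refs=$(git -C github_fork for-each-ref --format='%(refname)' | paste -s -d ' ' -)

echo "fork commit seen from github_orig: $type_in_orig"
echo "github_orig refs: $orig_refs"
echo "github_fork refs: $fork_refs"
```

Pushing a new branch rather than updating master keeps the sketch simple; the point is only that the two repos end up with different refs over the same objects.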
So, the bottom line is that all repositories belonging to the same fork tree share a common pool of objects, while each repo keeps its own refs. Anyone pushing to their forked repo modifies their own refs, but puts objects into the shared pool.
Of course, to be really usable, a few more things need to be taken care of. The most important is that the git garbage collector must not be run unless the repo it runs in knows about all the refs, not only its own. Otherwise, it can drop objects from the shared pool that are unreachable from its own refs but still reachable from other repos' refs.
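The hazard can be made visible without actually running the garbage collector. The sketch below creates commits directly with plumbing commands (on an empty tree, to keep it short) and then asks each repo which of its refs can reach the fork's commit. The helper make_commit, the branch name feature and the demo identity are hypothetical names of mine.

```shell
set -eu
cd "$(mktemp -d)"   # scratch directory so nothing existing is touched

# Hypothetical helper: write an empty-tree commit into repo $1 with message $2
# and print its hash.
make_commit() {
  tree=$(git -C "$1" mktree </dev/null)
  git -C "$1" -c user.name=demo -c user.email=demo@example.com \
    commit-tree -m "$2" "$tree"
}

git init --quiet --bare github_orig
orig_commit=$(make_commit github_orig "initial")
git -C github_orig update-ref refs/heads/master "$orig_commit"

# The "fork": own HEAD and refs, objects shared through a symlink.
mkdir github_fork
cp github_orig/HEAD github_fork/
cp -r github_orig/refs github_fork/
ln -s ../github_orig/objects github_fork/objects

# A commit that only the fork refers to.
fork_commit=$(make_commit github_fork "fork only")
git -C github_fork update-ref refs/heads/feature "$fork_commit"

# The object exists in the shared pool, so github_orig can read it ...
type_in_orig=$(git -C github_orig cat-file -t "$fork_commit")
# ... but none of github_orig's refs reaches it, while the fork's ref does.
orig_refs_reaching=$(git -C github_orig for-each-ref --contains "$fork_commit" --format='%(refname)')
fork_refs_reaching=$(git -C github_fork for-each-ref --contains "$fork_commit" --format='%(refname)')

echo "type in github_orig: $type_in_orig"
echo "github_orig refs reaching it: ${orig_refs_reaching:-<none>}"
echo "github_fork refs reaching it: $fork_refs_reaching"
```

From github_orig's point of view the fork's commit is unreferenced, which is exactly the kind of object a garbage collection that only looks at github_orig's refs would treat as disposable.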