How does git track file changes internally?

Can someone explain how git knows that files X, Y and Z have changed? What is the process behind the scenes that recognizes when a file has not yet been added or has been modified? I ask because with Subversion it is easy to understand how it keeps track of these things, since there is a .svn directory under each folder. For git I can't find a description of the internal workings. I doubt that it scans all the subdirectories for changes, since it is pretty fast.

So, out of curiosity, what are the inner workings?

+8
git version-control
4 answers

The mechanism by which the status of a file is determined is quite simple. To find out which files are staged, simply diff the HEAD tree against the index. Any entries that appear only in the index are staged additions, any entries that appear only in HEAD have been deleted, and any entries that differ between the two have been modified.

Similarly, unstaged changes can be detected by diffing the index against the working directory.
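The two comparisons described above can be sketched in a few lines of Python. This is only an illustration, assuming each snapshot (HEAD, index, working directory) is modeled as a plain mapping from path to content hash; `diff_snapshots` and the short hash strings are made-up names and placeholders, not anything git itself exposes.

```python
def diff_snapshots(old, new):
    """Compare two {path: content_hash} mappings and classify the paths."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    return {"added": added, "deleted": deleted, "modified": modified}


# Hypothetical snapshots; the two-character hashes are placeholders.
head = {"README": "a1", "src/main.c": "b2", "old.txt": "c3"}
index = {"README": "a1", "src/main.c": "b9", "new.txt": "d4"}
worktree = {"README": "a7", "src/main.c": "b9", "new.txt": "d4"}

staged = diff_snapshots(head, index)        # HEAD vs. index
unstaged = diff_snapshots(index, worktree)  # index vs. working directory

print("staged:  ", staged)
print("unstaged:", unstaged)
```

Here `src/main.c`, `new.txt` and `old.txt` show up as staged changes, while `README` shows up as an unstaged modification.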

In particular, your question asks how this can be so fast (after all, computing the SHA-1 hash of a file is not exactly cheap). This is where the index, also known as the cache, comes in again. The index also has fields for the file size and the file modification time. That way, git can simply stat(2) the file on disk and compare its size and modification time with the values stored in the index to decide whether the file needs to be hashed at all.
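A rough Python sketch of that shortcut follows. It assumes a cached entry holds only the size, the modification time and the content hash; the real index stores more than this, and `make_entry` and `is_modified` are hypothetical helper names chosen for the illustration.

```python
import hashlib
import os
import tempfile


def blob_hash(path):
    """Hash file contents the way git hashes a blob (header + data)."""
    with open(path, "rb") as f:
        data = f.read()
    header = b"blob " + str(len(data)).encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def make_entry(path):
    """Record the stat data and hash, like a (simplified) index entry."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha1": blob_hash(path)}


def is_modified(path, entry):
    """Cheap stat comparison first; hash only if the stat data differs."""
    st = os.stat(path)
    if st.st_size == entry["size"] and st.st_mtime_ns == entry["mtime_ns"]:
        return False  # stat data matches: assume unchanged, skip hashing
    return blob_hash(path) != entry["sha1"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w") as f:
        f.write("version 1\n")
    entry = make_entry(path)
    unchanged = is_modified(path, entry)   # nothing touched yet
    with open(path, "w") as f:
        f.write("version 2, now longer\n")  # different size
    changed = is_modified(path, entry)

print(unchanged, changed)
```

The point of the design is that the common case (nothing changed) costs one stat call per file and no reads of file contents.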

+10

You can find your answer in the free Pro Git book, in the chapter on Git Internals.

This chapter explains how git works under the hood.

As Leo said, git checks the SHA-1 of a file's contents to see whether it has changed. You can check this yourself (taken from Git Internals):

    $ echo 'version 1' > test.txt
    $ git hash-object -w test.txt
    83baae61804e65cc73a7201a7252750c76066a30

Then write the new contents to the file and save it again:

    $ echo 'version 2' > test.txt
    $ git hash-object -w test.txt
    1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
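
The two object IDs in the transcripts can be reproduced without git. The sketch below assumes the blob format described in the Git Internals chapter: the SHA-1 is taken over a header `blob <size>` followed by a NUL byte and then the content. `git_blob_hash` is a name made up for this example, and unlike `git hash-object -w` it only computes the ID and stores nothing.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the file contents end in "\n".
v1 = git_blob_hash(b"version 1\n")
v2 = git_blob_hash(b"version 2\n")

print(v1)
print(v2)
```

Changing a single character of the content gives a completely different ID, which is why comparing IDs is enough to tell whether a file has changed.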
+4

If the answer in the possible duplicate is not enough, you can take a look at this article: http://www.geekgumbo.com/2011/07/19/git-basics-how-git-saves-your-work/

In short, Git uses the SHA-1 hash of file contents to track changes. Git stores four types of objects: blob, tree, commit, and tag.

To answer your question about how it tracks changes, here is a quote from that link:

A tree object is how Git tracks file names and directories. There is a tree object for each directory. A tree object points to the SHA-1 blobs (the files in that directory) and to other trees (the subdirectories) at the time of the commit. Each tree object is named by, you guessed it, the SHA-1 hash of its contents and is stored in .git/objects. Because the names of trees are SHA-1 hashes, Git can quickly see whether any file or directory has changed by comparing the name with the previous name. Pretty slick.
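
The effect described in the quote, that a change deep inside a directory changes the name of every tree above it, can be sketched as follows. This is a simplified model: a directory is a nested Python dict, every file gets mode `100644`, and entries are sorted by plain name, which is an assumption that does not match git's ordering in every case. `hash_object` and `hash_tree` are names invented for the example.

```python
import hashlib


def hash_object(kind: bytes, body: bytes) -> bytes:
    """SHA-1 over '<kind> <size>' + NUL + body, returned as raw bytes."""
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).digest()


def hash_tree(directory: dict) -> bytes:
    """directory maps name -> bytes (file content) or dict (subdirectory)."""
    body = b""
    for name in sorted(directory):  # simplified ordering (assumption)
        child = directory[name]
        if isinstance(child, dict):
            mode, child_id = b"40000", hash_tree(child)
        else:
            mode, child_id = b"100644", hash_object(b"blob", child)
        body += mode + b" " + name.encode() + b"\x00" + child_id
    return hash_object(b"tree", body)


before = {"README": b"hello\n", "data": {"number.txt": b"1\n"}}
after = {"README": b"hello\n", "data": {"number.txt": b"2\n"}}

root_changed = hash_tree(before) != hash_tree(after)
subtree_changed = hash_tree(before["data"]) != hash_tree(after["data"])
empty_tree_id = hash_tree({}).hex()

print(root_changed, subtree_changed, empty_tree_id)
```

If two root tree IDs are equal, nothing underneath needs to be compared at all; if they differ, only the subtrees whose IDs differ have to be descended into.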

+3

I found this article very helpful.

https://codewords.recurse.com/issues/two/git-from-the-inside-out

Git is built on a graph. Almost every Git command manipulates this graph. To understand Git deeply, focus on the properties of this graph, not workflows or commands.

Excerpt: make a commit that is not the first commit

The user sets the content of data/number.txt to 2. This updates the working copy, but leaves the index and the HEAD commit as they are.

The user adds the file to Git. This adds a blob containing 2 to the objects directory. It points the index entry for data/number.txt at the new blob.
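
The two steps in the excerpt can be modeled with a toy in-memory repository. This is only a sketch: `ToyRepo`, `write` and `add` are hypothetical names, the three dicts stand in for the working copy, the index and the objects directory, and commits are left out entirely.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Blob ID as SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    def __init__(self):
        self.working_copy = {}  # path -> content
        self.index = {}         # path -> blob ID
        self.objects = {}       # blob ID -> content

    def write(self, path, content):
        """Edit a file: only the working copy changes."""
        self.working_copy[path] = content

    def add(self, path):
        """Stage a file: store a blob and point the index entry at it."""
        content = self.working_copy[path]
        oid = blob_id(content)
        self.objects[oid] = content
        self.index[path] = oid


repo = ToyRepo()
repo.write("data/number.txt", b"1")
repo.add("data/number.txt")
index_before = dict(repo.index)

repo.write("data/number.txt", b"2")   # edit only
index_after_edit = dict(repo.index)

repo.add("data/number.txt")           # stage the edit
index_after_add = dict(repo.index)

print(index_after_edit == index_before, index_after_add != index_before)
```

After the edit the index still points at the old blob; only `add` creates the new blob and moves the index entry, and the old blob stays in the object store.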

+1