When exactly does git prune objects: why doesn't git gc delete commits?

I am working on a git course and wanted to mention that lost refs are not really lost until git gc runs. But when I checked this, I found that it is not the case. Even after running git gc --prune=all --aggressive, the lost refs are still there.

Clearly I have misunderstood something, and before I say something wrong in the course, I want to get my facts straight! Here is an example script illustrating the effect:

    #!/bin/bash
    git init

    # add 10 dummy commits
    for i in {1..10}; do
        date > foo.txt
        git add foo.txt
        git commit -m "bump" foo.txt
        sleep 1
    done

    CURRENT=$(git rev-parse HEAD)
    echo HEAD before reset: ${CURRENT}

    # rewind
    git reset --hard HEAD~5

    # add another 10 commits
    for i in {1..10}; do
        date > foo.txt
        git add foo.txt
        git commit -m "bump" foo.txt
        sleep 1
    done

This script will add 10 dummy commits, reset to 5 commits in the past and add another 10 commits. Just before resetting, it will print the hash of the current HEAD.

I expected to lose the object whose hash is stored in CURRENT after running git gc --prune=all. However, I can still run git show on that hash.

I understand that by running git reset and then adding new commits, I essentially created a new branch. But my original branch no longer has any reference pointing to it, so it does not appear in git log --all. I also assume that it will not be pushed to any remote.

My understanding was that git gc would delete these objects. This does not seem to be the case.

Why? And when exactly does git gc delete objects?

1 answer

For an object to be pruned, it must meet two criteria. One of them is related to date/time: it must have been created [1] long enough ago to be ripe for collection. The "long enough ago" part is what you set with --prune=all: you override the default setting of "at least two weeks old."

The second criterion is where your experiment goes wrong. To be pruned, the object must also be unreachable. As twalberg noted in a comment, each of your supposedly abandoned commits (and therefore their corresponding trees and blobs) is actually still referenced through Git's "reflog" entries.

For each such commit, there are two reflog entries: one for HEAD and one for the branch that HEAD itself pointed to at the time of the commit (in this case, the reflog for refs/heads/master, i.e. the master branch). Each reflog entry has its own timestamp, and git gc also expires reflog entries for you, but with a more complex set of rules than the simple "14 days" default for object expiry. [2]

Therefore, git gc could first expire all the reflog entries that keep the old objects alive, and then prune the objects. That just doesn't happen here, because the reflog entries are still too young to expire.
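The reflog entries that keep an abandoned commit alive can be counted directly. The following is a minimal sketch, not part of the original script: it builds a throwaway repository under mktemp, and the commit helper, the identity values, and the variable names are all hypothetical. It relies on git log -g printing the commit each reflog entry points to.

```shell
#!/bin/bash
set -e

# Throwaway repository so no real work is touched (assumption: mktemp -d is available).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.com"

# Hypothetical helper: one commit with distinct content.
commit() {
    echo "$1" > foo.txt
    git add foo.txt
    git commit -q -m "$1"
}

commit one
commit two
commit three
abandoned=$(git rev-parse HEAD)          # the commit we are about to "lose"
branch=$(git symbolic-ref --short HEAD)  # master or main, depending on configuration

git reset -q --hard HEAD~1
commit four

# Count the reflog entries that still point at the abandoned commit.
head_hits=$(git log -g --format=%H HEAD | grep -c "$abandoned" || true)
branch_hits=$(git log -g --format=%H "$branch" | grep -c "$abandoned" || true)
echo "HEAD reflog entries: $head_hits, $branch reflog entries: $branch_hits"
```

As long as either count is non-zero, the commit counts as reachable and git gc leaves it alone.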

To view or delete reflog entries manually, use git reflog. Note that git reflog displays entries by running git log with the -g / --walk-reflogs option (plus some additional display formatting options). You can run git reflog expire --expire=all --all to clear everything, although this is a bludgeon when a scalpel may be more appropriate. Use --expire-unreachable for more selectivity. For more on this, see the git log documentation and, of course, the git reflog documentation.
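Putting both criteria together, here is a hedged sketch of the experiment from the question with the missing step added. It again uses a throwaway repository, and the commit helper, identity values, and variable names are hypothetical. It passes now to both --expire and --prune, which the Git documentation describes for these options; all, as used above, is expected to behave the same way.

```shell
#!/bin/bash
set -e

repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example"               # hypothetical identity
git config user.email "example@example.com"

commit() {                               # hypothetical helper
    echo "$1" > foo.txt
    git add foo.txt
    git commit -q -m "$1"
}

commit one
commit two
commit three
abandoned=$(git rev-parse HEAD)

git reset -q --hard HEAD~1
commit four

# First attempt: the age criterion is overridden, but the reflog still references the commit.
git gc --quiet --prune=now
if git cat-file -e "$abandoned" 2>/dev/null; then before=present; else before=pruned; fi

# Second attempt: expire every reflog entry first, then collect.
git reflog expire --expire=now --all
git gc --quiet --prune=now
if git cat-file -e "$abandoned" 2>/dev/null; then after=present; else after=pruned; fi

echo "after gc alone: $before"
echo "after reflog expire + gc: $after"
```

In a real repository this throws away the safety net the reflog provides, so it is best demonstrated on a scratch clone.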


[1] Some Unix-y file systems do not store file creation ("birth") time: the st_ctime field of a stat structure is the inode change time, not the creation time. If there is a creation time, it is in st_birthtime or st_birthtimespec. [3] However, each Git object is read-only, so the file's creation time is also its modification time. Therefore, st_mtime, which is always available, gives the time the object was created.

[2] The exact rules are described in the git gc documentation, but I think that "by default, 30 days for unreachable commits and 90 days for reachable commits" is a decent summary. However, the definition of reachable here is unusual: it means reachable from the current value of the reference for which this reflog holds old values. That is, if we look at the reflog for master, we find the commit that master identifies (for example, 1234567), and then see whether each reflog entry for master (for example, master@{27}) is reachable from that particular commit (1234567 again).
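The two reflog expiry periods are controlled by the gc.reflogExpire and gc.reflogExpireUnreachable configuration keys. A small sketch of inspecting and overriding them in a scratch repository follows; the "7 days" value is only an illustration, not a recommendation.

```shell
#!/bin/bash
set -e

repo=$(mktemp -d)   # throwaway repository
git init -q "$repo"
cd "$repo"

# The defaults are built in, so an unset key prints nothing.
git config --get gc.reflogExpire || echo "gc.reflogExpire not set (built-in default applies)"

# Hypothetical override: expire unreachable reflog entries after a week, for this repository only.
git config gc.reflogExpireUnreachable "7 days"
unreachable_expiry=$(git config --get gc.reflogExpireUnreachable)
echo "gc.reflogExpireUnreachable = $unreachable_expiry"
```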

[3] This particular naming confusion is brought to you by the POSIX standardization folks. :-) The st_birthtimespec field is a struct timespec, which records both seconds and nanoseconds.
