How to reduce the depth of an existing git clone?

I have a clone. I want to reduce the history on it, without re-cloning from scratch at a reduced depth. Worked example:

    $ git clone git@github.com:apache/spark.git
    # ...
    $ cd spark/
    $ du -hs .git
    193M    .git

OK, so not that big, but it will serve for this discussion. If I try `gc`, it gets smaller:

    $ git gc --aggressive
    Counting objects: 380616, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (278136/278136), done.
    Writing objects: 100% (380616/380616), done.
    Total 380616 (delta 182748), reused 192702 (delta 0)
    Checking connectivity: 380616, done.
    $ du -hs .git
    108M    .git

Still quite large, though (and `git pull` says it is up to date with the remote). How about repack?

    $ git repack -a -d --depth=5
    Counting objects: 380616, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (95388/95388), done.
    Writing objects: 100% (380616/380616), done.
    Total 380616 (delta 182748), reused 380616 (delta 182748)
    Pauls-MBA:spark paul$ du -hs .git
    108M    .git

Yup, no smaller. `--depth` for `repack` is not the same thing as `--depth` for `clone`:

    $ git clone --depth 1 git@github.com:apache/spark.git
    Cloning into 'spark'...
    remote: Counting objects: 8520, done.
    remote: Compressing objects: 100% (6611/6611), done.
    remote: Total 8520 (delta 1448), reused 5101 (delta 710), pack-reused 0
    Receiving objects: 100% (8520/8520), 14.82 MiB | 3.63 MiB/s, done.
    Resolving deltas: 100% (1448/1448), done.
    Checking connectivity... done.
    Checking out files: 100% (13386/13386), done.
    $ cd spark
    $ du -hs .git
    17M     .git

`git pull` says it is in step with the remote, which surprises no one.

OK - so how do I change an existing clone into a shallow clone, without blowing it away and cloning it all over again?

+16
git
4 answers
    git clone --mirror --depth=5 file://$PWD ../temp
    rm -rf .git/objects
    mv ../temp/{shallow,objects} .git
    rm -rf ../temp

This really isn't cloning from scratch: it is a purely local operation, and it creates practically nothing but the shallow pack files, probably only tens of kilobytes. I'd venture you won't get more efficient than this; you are reaching the point where the job costs more space in the form of scripts and test runs than it saves in the form of a few KB of repo overhead.
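As a sanity check, the same trick can be run end-to-end on a tiny throwaway repository (all paths, names, and the `--depth=1` value below are invented for the demo): after swapping in the shallow object store, only the tip commit remains reachable.

```shell
# Build a disposable repo with 3 commits, then apply the answer's swap.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q full
cd full
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3; do echo "$i" > f; git add f; git commit -qm "commit $i"; done

# Shallow-mirror-clone from the local path, then swap in its object store.
git clone -q --mirror --depth=1 "file://$PWD" ../temp
rm -rf .git/objects
mv ../temp/shallow ../temp/objects .git
rm -rf ../temp

git rev-list --count HEAD   # only the tip commit is still reachable
```

The `file://` URL matters: it forces the normal transfer protocol (which supports shallow clones) instead of the hardlinking fast path used for plain local paths.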

+11

Since at least git version 2.14.1, you can use:

 git fetch --depth 10 

This fetches the latest commits from the origin (if there are any), then truncates (or lengthens) the local history to a depth of 10.

The trimmed commits are no longer reachable in the usual way, but they still linger in the repository (via the reflog). If no other refs hold on to them, they will eventually be deleted automatically by `git gc`.

You can also delete the old commits immediately. To do that, you must remove all refs that may still point at them: mainly the reflog and tags. Then run `git gc`.

Note that the reflog expires after a while, but tags stay forever. So if you want to free the disk space taken by old commits, you need to remove the tags manually.

If you have deleted the tags, a later `git fetch` will bring back only the tags that point at commits currently in the repository.

Clear the reflog:

 git reflog expire --expire=all --all 

Remove all tags:

 git tag -l | xargs git tag -d 

Delete all dangling objects:

 git gc --prune=all 
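The whole sequence can be exercised locally. Below is a minimal sketch (repo names invented, git ≥ 2.14 assumed, and `--prune=now` used in place of `--prune=all`): it shrinks a full local clone to depth 2 and then garbage-collects the trimmed commits.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
cd upstream
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do echo "$i" > f; git add f; git commit -qm "c$i"; done
cd ..

git clone -q "file://$tmp/upstream" work
cd work
echo "before: $(git rev-list --count HEAD)"   # full history: 5 commits

git fetch -q --depth 2 origin                 # truncate local history to 2
echo "after: $(git rev-list --count HEAD)"

git reflog expire --expire=all --all          # drop reflog refs to trimmed commits
git gc -q --prune=now                         # then actually delete the objects
```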
+7

Edit, February 2017: this answer is now outdated / wrong. Git can make a shallow clone shallower, at least internally. Git 2.11 also has `--deepen` to increase a clone's depth, and there appear to be plans to allow negative values (though for now they are rejected). It's not clear how well this works in the real world, and your best bet is to clone the clone, as in jthill's answer.


You can only deepen a repository. That is mostly because Git is built around adding new stuff. The way shallow clones work is that your (receiving) Git gets the sender (another Git) to stop sending "new stuff" once it hits the depth argument, and to cooperate with your Git on recording where it stopped, even though more history obviously exists. The two then write the IDs of the "truncated" commits into a special `.git/shallow` file, which both marks the repository as shallow and notes which commits are truncated.

Note that during this process, your Git is still only adding new stuff. (Also, once the clone is finished, Git forgets what the depth was, and over time it can become impossible even to work out what it was. All Git can say is that it is a shallow clone, because the `.git/shallow` file, containing commit IDs, still exists.)
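You can look at this file in any shallow clone. A minimal sketch on a throwaway local repo (names invented): with `--depth 1` and a linear history, the single "truncated" commit recorded in `.git/shallow` is the tip itself, since its parents were never sent.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
cd src
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3; do echo "$i" > f; git add f; git commit -qm "c$i"; done
cd ..

git clone -q --depth 1 "file://$tmp/src" shallow-copy
cd shallow-copy
cat .git/shallow      # one commit ID: the truncated boundary...
git rev-parse HEAD    # ...which here is HEAD itself
```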

The rest of Git continues to be built around this "add new stuff" concept, so you can deepen a clone, but not make it shallower. (There is no good, consistent verb for this: the opposite of deepening a hole is filling it, but "fill" has the wrong connotation. "Diminish" might work; I'll use that.)

In theory `git gc`, which is the only part of Git that ever actually throws anything away,¹ could shrink the repository, even turning a full clone into a shallow one, but no one has written the code for it. There are some tricky bits; for instance, do you drop tags? Shallow clones start out tag-less for implementation reasons, so converting a repository to a shallow one, or diminishing an existing shallow repository, might require dropping at least some tags. Certainly, any tag pointing at a commit destroyed by the diminishing action would have to go.


Meanwhile, the `--depth` argument to `git pack-objects` (passed through by `git repack`) means something else entirely: it is the maximum length of a delta chain when Git applies its modified xdelta compression to the objects stored in each pack file. It has nothing to do with the depth of any part of the commit DAG (as computed from each branch tip).
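A quick way to convince yourself of the difference (throwaway repo, names and values invented for the demo): repacking with a tiny `--depth` changes how objects are delta-compressed, but the number of reachable commits stays the same.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4; do echo "$i" > f; git add f; git commit -qm "c$i"; done

git repack -a -d -q --depth=1   # delta chains of length <= 1; history untouched
git rev-list --count HEAD       # still 4 commits
```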


¹ Well, `git repack` winds up throwing things away as a side effect, depending on which flags it is given, but it is `git gc` that invokes it that way. The same goes for `git prune`. For those two commands to really do their jobs properly, `git reflog expire` has to run first. The "normal user" end of the cleanup sequence is `git gc`; it handles all of this. So we can say that `git gc` is how you discard accumulated "new stuff" that turned out to be unwanted after all.

+2

OK, here's a bash attempt that ignores non-default branches, and also assumes the remote is called "origin":

    #!/bin/sh
    set -e
    mkdir .git_slimmer
    cd "$1"
    changed_lines=$(git status --porcelain | wc -l)
    ahead_of_remote=$(git status | grep "Your branch is ahead" | wc -l)
    remote_url=$(git remote show origin | grep Fetch | cut -d' ' -f5)
    latest_sha=$(git log | head -n 1 | cut -d' ' -f2)
    cd ..
    if [ "$changed_lines" -gt "0" ]
    then
        echo "Untracked changes - won't make the clone slimmer in that situation"
        exit 1
    fi
    if [ "$ahead_of_remote" -gt "0" ]
    then
        echo "Local commits not in the remote - won't make the clone slimmer in that situation"
        exit 1
    fi
    cd .git_slimmer
    git clone "$remote_url" --no-checkout --depth 1 foo
    cd foo
    latest_sha_for_new=$(git log | head -n 1 | cut -d' ' -f2)
    cd ../..
    if [ "$latest_sha" = "$latest_sha_for_new" ]
    then
        mv "$1/.git" "$1/.gitOLD"
        mv ".git_slimmer/foo/.git" "$1/"
        rm -rf "$1/.gitOLD"
        cd "$1"
        git add .
        cd ..
    else
        echo "SHA from head of existing git clone does not match the latest one from the remote: do a git pull first"
        exit 1
    fi
    rm -rf .git_slimmer

Usage: `git-slimmer.sh <folder_containing_git_repo>`

0
