How to reduce the size of a bloated Git repository by non-interactively squashing all commits except the most recent ones?

Question


My Git repo contains hundreds of gigabytes of data, so I'm trying to get rid of old, outdated commits, because they make the repository larger and slower. I need a quick solution; the faster, the better.

How do I squash all commits except the most recent ones, without having to manually mark each one for squashing in an interactive rebase? In particular, I do not want to use

git rebase -i --root

My repo

I have these commits:

 A .. B .. C ... ... H .. I .. J .. K .. L

I want this (squashing everything between A and H into A):

 A .. H .. I .. J .. K .. L

There is an answer on how to squash all commits, but I want to keep some of the more recent commits. I do not want to squash the most recent commits. (In particular, I need to keep the first two commits counting from the top.)

+7

git git-rebase git-rewrite-history rebase

sanmai Jun 11 '14 at 2:00


3 answers

The original poster comments:

if we take a snapshot of commit 10004, remove all the commits before it, and make commit 10004 the root commit, I'll be just fine

Here is one way to do this, assuming your current work is on a branch called branchname. I like to use a temporary tag whenever I do a big rebase, both to double-check that nothing changed and to mark a point I can reset back to if something goes wrong (I'm not sure whether this is standard procedure, but it works for me):

 git tag temp
 git checkout 10004
 git checkout --orphan new_root
 git commit -m "set new root 10004"
 git rebase --onto new_root 10004 branchname
 git diff temp   # verification that it worked with no changes
 git tag -d temp
 git branch -D new_root

To get rid of the old history, you need to delete every tag and branch that still points into it; then

 git prune
 git gc

will clean it out of your repo.
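The steps above can be tried safely on a throwaway repository first. The following is only a sketch: the temporary directory, the toy history A .. L, the file name file.txt, and the demo identity are assumptions made for illustration, and commit H stands in for commit 10004.

```shell
#!/bin/sh
# Throwaway demo of the orphan-root approach (all names here are hypothetical).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branchname
git config user.name "Demo User"
git config user.email "demo@example.com"

# Build a linear toy history A .. L and remember commit H as the cut point.
for c in A B C D E F G H I J K L; do
    echo "$c" >> file.txt
    git add file.txt
    git commit -q -m "$c"
    if [ "$c" = H ]; then cut=$(git rev-parse HEAD); fi
done

git tag temp                            # safety marker to diff against later
git checkout -q "$cut"                  # detached HEAD at the cut point
git checkout -q --orphan new_root       # unborn branch; index still holds H's content
git commit -q -m "set new root"
git rebase -q --onto new_root "$cut" branchname
git diff --exit-code temp               # aborts the script if any content changed
git tag -d temp
git branch -D new_root

count=$(git rev-list --count branchname)
echo "commits left on branchname: $count"
```

In this toy history the new root replaces A through H, so only the root plus I, J, K and L remain on branchname.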

Note that you will temporarily have two copies of everything until you have run gc, but this is unavoidable; even with a standard squash and rebase you still have two copies of everything until the rebase finishes.

+3

MM Jun 11 '14 at 2:27

The XY problem

Note that the original poster has an XY problem: he is trying to figure out how to squash his older commits (problem Y), when his real problem is reducing the size of his Git repository (problem X), as I mentioned in the comments:

Having a lot of commits does not necessarily bloat the size of your Git repository. Git is very efficient at compressing text files. Are you sure that the number of commits is the actual problem behind your large repo size? A more likely candidate is that you have too many versions of binary assets, which Git does not compress as well (or at all) compared to text files.

Despite this, for completeness, I will also add an alternative solution to Matt McNabb's answer for problem Y.
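One way to check whether large blobs, rather than the number of commits, account for the size is to list the biggest objects in the whole history. This is a minimal sketch; the function name largest_blobs and its two optional arguments (repository path, number of rows) are made up for this example.

```shell
#!/bin/sh
# largest_blobs [repo-path] [rows]: print the largest blobs reachable from any ref.
# Output columns: object type, size in bytes, object id, path.
largest_blobs() {
    git -C "${1:-.}" rev-list --objects --all |
        git -C "${1:-.}" cat-file \
            --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2 -n -r |
        head -n "${2:-10}"
}
```

Calling largest_blobs . 20 inside the repository prints the twenty largest blobs. If a handful of binary files dominate the list, squashing commits that never touched them will not shrink the repository much.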

Squashing (hundreds or thousands) of old commits

As the original poster has already noted, an interactive rebase with the --root flag can be impractical when there are many commits (hundreds or thousands), particularly because an interactive rebase will not run efficiently on such a large number of them.

As Matt McNabb noted in his answer, one solution is to use an orphan branch as the new (squashed) root and then rebase on top of it. Another solution is to use a few different resets of the branch to achieve the same effect:

 # Save the current state of the branch in a couple of other branches
 git branch beforeReset
 git branch verification

 # Also mark where we want to start squashing commits
 git branch oldBase <most_recent_commit_to_squash>

 # Temporarily remove the most recent commits from the current branch,
 # because we don't want to squash those:
 git reset --hard oldBase

 # Using a soft reset to the root commit will keep all of the changes
 # staged in the index, so you just need to amend those changes to the
 # root commit:
 git reset --soft <root_commit>
 git commit --amend

 # Rebase onto the new amended root,
 # starting from oldBase and going up to beforeReset
 git rebase --onto master oldBase beforeReset

 # Switch back to master and (fast-forward) merge it with beforeReset
 git checkout master
 git merge beforeReset

 # Verify that master still contains the same state as before all of the resets
 git diff verification

 # Cleanup
 git branch -D beforeReset oldBase verification

 # As part of cleanup, since the original poster mentioned that
 # he has a lot of commits that he wants to remove to reduce
 # the size of his repo, garbage collect the old, dangling commits too
 git gc --prune=all

The --prune=all option for git gc ensures that all dangling commits are garbage collected, not just those older than 2 weeks, which is the default for git gc.
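One caveat worth checking on a scratch repository: commits that are still recorded in a reflog are not dangling as far as git gc is concerned, so they survive pruning until the reflog entries are expired. The sketch below shows this with a single amended commit; the temporary repository, the file f, and the demo identity are assumptions for illustration.

```shell
#!/bin/sh
# Demo: a rewritten commit survives gc until its reflog entries are expired.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > f
git add f
git commit -q -m "original"
old=$(git rev-parse HEAD)

# Rewrite the commit; the old one is now referenced only by the reflog.
echo two > f
git commit -q -a --amend -m "rewritten"

git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all    # drop every reflog entry
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after=present; else after=gone; fi

echo "before expiring reflogs: $before, after: $after"
```

Expiring the reflogs also removes the safety net for undoing the rewrite, so it is best done only after the new history has been verified.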

+2

user456814 Jun 11 '14 at 15:02


jthill · Accepted Answer · 2014-07-23T18:46:58+0000

The quickest option, counting implementation time, will almost certainly be grafts plus filter-branch, although you might get faster execution with a handwritten commit-tree sequence working off rev-list output.

Rebase is built to apply changes onto different content. What you are doing here is preserving the content and intentionally discarding the change history that produced it, so almost all of rebase's most tedious and slow work is wasted.

The payload here, working from your picture, is

 echo `git rev-parse H; git rev-parse A` > .git/info/grafts
 git filter-branch -- --all

Documentation for git rev-parse and git filter-branch.
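Newer Git versions print a deprecation warning for .git/info/grafts and point to git replace instead. The same splice can be sketched with git replace --graft, here on a throwaway repository. The toy history, the branch name work, and the demo identity are assumptions; FILTER_BRANCH_SQUELCH_WARNING only silences the warning that recent versions of filter-branch print before starting.

```shell
#!/bin/sh
# Demo: make A the parent of H with a replacement ref, then bake it in.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/work
git config user.name "Demo User"
git config user.email "demo@example.com"

# Build a linear toy history A .. L and remember commits A and H.
for c in A B C D E F G H I J K L; do
    echo "$c" >> file.txt
    git add file.txt
    git commit -q -m "$c"
    case "$c" in
        A) A=$(git rev-parse HEAD) ;;
        H) H=$(git rev-parse HEAD) ;;
    esac
done
tree_before=$(git rev-parse 'work^{tree}')

git replace --graft "$H" "$A"           # pretend H's only parent is A
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f -- --all >/dev/null 2>&1
git replace -d "$H" >/dev/null          # the rewrite made the splice permanent

count=$(git rev-list --count work)
tree_after=$(git rev-parse 'work^{tree}')
echo "commits on work: $count"
```

The content at the tip is untouched (tree_before equals tree_after); only the chain of parents behind it has changed.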

Filter-branch is very careful to be recoverable after a failure at any point, which is certainly safe... but that only really helps when recovering by simply redoing the whole thing would not be faster and easier if things go south on you. With failures being rare and restarts usually cheap, the thing to do is an un-"safe" but very fast operation that is all but certain to work. For that, the best option is to do it on a tmpfs (the closest equivalent I know of on Windows would be a ramdisk such as ImDisk), which will be blazing fast and won't touch your main repo until you are sure you have the results you want.

So, on Windows, say T:\wip is on a ramdisk, and note that the clone here copies nothing. As well as reading the docs for git clone's --shared option, do examine the clone's innards to see the real effect; it's very straightforward.
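What "copies nothing" means can be seen by looking inside a shared clone. This is a small sketch on temporary directories; the paths, the file f, and the demo identity are assumptions for illustration.

```shell
#!/bin/sh
# Demo: a --shared clone borrows objects from the source instead of copying them.
set -eu
src=$(mktemp -d)
wip=$(mktemp -d)/filterwork
git init -q "$src"
git -C "$src" config user.name "Demo User"
git -C "$src" config user.email "demo@example.com"
echo data > "$src/f"
git -C "$src" add f
git -C "$src" commit -q -m "one"

git clone -q --shared --no-checkout "$src" "$wip"

# The clone's object store is empty; it only records where to find the source's.
cat "$wip/.git/objects/info/alternates"
git -C "$wip" count-objects -v
```

Because the clone depends on the source repository's objects, the source should not be pruned while the work clone is still in use.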

 # switch to a lightweight wip clone on a tmpfs
 git clone --shared --no-checkout . /t/wip/filterwork
 cd !$

 # graft out the unwanted commits
 # ($H and $A hold the ids of commits H and A from the picture)
 echo `git rev-parse $H; git rev-parse $A` >.git/info/grafts
 git filter-branch -- --all

 # check that the repo history looks right
 git log --graph --decorate --oneline --all

 # all done with the splicing, filter-branch has integrated it
 rm .git/info/grafts

 # push the rewritten histories back
 git push origin --all --force

There are enough possible variations on what you might want to do, and on what might be in your repo, that almost any of the options on these commands could be useful. The above is tested and will do what it says it does, but that might not be exactly what you want.
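The handwritten commit-tree alternative mentioned at the start of this answer could look roughly like the sketch below, again on a throwaway repository. It is an illustration under simplifying assumptions rather than a tested replacement for the filter-branch recipe: the history is linear, the names are made up, and the rebuilt commits get the current identity and timestamp instead of the original author data.

```shell
#!/bin/sh
# Demo: rebuild H .. L on top of A by reusing each commit's tree unchanged.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/work
git config user.name "Demo User"
git config user.email "demo@example.com"

# Build a linear toy history A .. L and remember commits A and H.
for c in A B C D E F G H I J K L; do
    echo "$c" >> file.txt
    git add file.txt
    git commit -q -m "$c"
    case "$c" in
        A) A=$(git rev-parse HEAD) ;;
        H) H=$(git rev-parse HEAD) ;;
    esac
done

# Walk H .. L oldest first; each new commit keeps the old tree and message
# but takes the previously rebuilt commit (initially A) as its parent.
parent=$A
for c in $(git rev-list --reverse "$H^..work"); do
    parent=$(git log -1 --format=%B "$c" |
        git commit-tree "$c^{tree}" -p "$parent")
done
git update-ref refs/heads/work "$parent"

count=$(git rev-list --count work)
echo "commits on work: $count"
```

No diffs are computed or applied at any point, which is why this style of rewrite avoids the slow part of a rebase.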
