git filter-branch leads to a disconnected history: how to get rid of old commits?

The scenario is as follows:

I have a large CVS repository that I want to convert into 14 separate git repositories. The cvs2git part of the process works fine and produces one large repo.git repository.

For each of the 14 git repositories, I clone the main repo and run the following command:

 git filter-branch -d /tmp/rep --tag-name-filter cat --prune-empty --subdirectory-filter "sub/directory" -- --all 

However, for some of the repositories I first need to run another git filter-branch command, because I have to rewrite the commits to move a file from one directory to another. I use the --tree-filter option for this. Here is an example of the commands that are executed:

 script_tree_filter="if test -f rep/to/my/file && test -d another/rep ; then echo Moving my file ; mv rep/to/my/file another/rep; fi"
 git filter-branch -d /tmp/rep --tag-name-filter cat --prune-empty --tree-filter "$script_tree_filter" -- --all
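Because a --tree-filter pass over thousands of commits is slow, it can help to exercise the move logic on a scratch directory first. The following is only a sketch: the function name move_if_present and the temporary directory are made up for illustration, and the code merely mirrors the test/mv logic of the filter shown above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the tree-filter logic: move a file only when
# both the file and the destination directory exist in the checked-out tree.
move_if_present() {
    src=$1
    dest_dir=$2
    if test -f "$src" && test -d "$dest_dir"; then
        echo "Moving $src"
        mv "$src" "$dest_dir"
    fi
}

# Try it on a throwaway tree laid out like the paths in the question.
scratch=$(mktemp -d)
mkdir -p "$scratch/rep/to/my" "$scratch/another/rep"
: > "$scratch/rep/to/my/file"
( cd "$scratch" && move_if_present rep/to/my/file another/rep )
```

Running the function a second time on the same tree does nothing, which is the behaviour the filter needs on commits where the file is absent.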

At the end of the process (14500 commits: it takes about 1 hour!) I delete the backup refs and run git gc:

 git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
 git reflog expire --expire=now --all
 git gc --prune=now
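After a cleanup like this, it may be worth checking which refs are still present, since any remaining ref keeps its commits alive through git gc. A minimal sketch, assuming it is run inside the filtered repository; the function name count_refs_by_namespace is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the refs in the current repository, grouped by
# their first two path components (refs/heads, refs/tags, refs/original, ...).
count_refs_by_namespace() {
    git for-each-ref --format='%(refname)' |
        awk -F/ '{ n[$1 "/" $2]++ } END { for (k in n) print n[k], k }' |
        sort -k2
}

# Usage (inside the filtered repository):
#   count_refs_by_namespace
#   git count-objects -vH    # the size-pack field shows the packed size
```

If refs/original no longer appears in the output but the repository is still large, the remaining branches and tags are the place to look.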

In the end, I get a 1.2 GB repository (which is still clearly too large), and looking at the commits, I see that many old ones are still present. They touch files and directories that should no longer be there after --subdirectory-filter.

There is a gap in the commit history between the unwanted commits and the good ones, as shown in gitk --all:

(Screenshot: discontinuity seen in gitk)

I am fairly sure that these commits are still present because some of them are tagged. If so, is it possible to remove those tags without deleting any of the good commits?

If tags are not the cause, any idea what is?
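One way to test the tag hypothesis is to list the tags whose commits are not contained in any branch; those are the tags that keep otherwise unreachable history alive. This is a sketch under that assumption, not a definitive diagnosis, and the function name tags_outside_branches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every tag whose commit is not contained in any
# local branch of the current repository.
tags_outside_branches() {
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        # "git branch --contains" prints nothing when no branch has the commit.
        if [ -z "$(git branch --contains "$ref" 2>/dev/null)" ]; then
            echo "${ref#refs/tags/}"
        fi
    done
}

# Usage (inside the filtered repository):
#   tags_outside_branches
```

Tags that show up here are candidates for removal; tags that do not show up sit on commits that a branch already keeps, so deleting them would not drop any commit.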

For more information, the refs directory (in the git repository obtained with the subdirectory filter) contains only empty subdirectories:

 $ ls -R refs/
 refs/:
 heads original tags
 refs/heads:
 refs/original:
 refs
 refs/original/refs:
 heads tags
 refs/original/refs/heads:
 refs/original/refs/tags:
 refs/tags:

I found that the branches and tags are listed in the packed-refs file of the git repository:

 d0c675d8f198ce08bb68f368b6ca83b5fea70a2b refs/tags/v03-rev-04
 95c3f91a4e92e9bd11573ff4bb8ed4b61448d8f7 refs/tags/v03-rev-05

The file contains 817 tags and 219 branches.
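Counts like these can be reproduced directly from the packed-refs file. A minimal sketch: the function name count_packed_refs is hypothetical, and it assumes the usual layout of one "SHA refname" pair per line, with a "#" header line and "^" lines for peeled annotated tags.

```shell
#!/bin/sh
# Hypothetical helper: count branch and tag entries in a packed-refs file.
# Lines starting with '#' (header) or '^' (peeled tag targets) are skipped.
count_packed_refs() {
    awk '
        /^#/ || /^\^/ { next }
        $2 ~ /^refs\/heads\// { heads++ }
        $2 ~ /^refs\/tags\//  { tags++ }
        END { printf "branches=%d tags=%d\n", heads, tags }
    ' "$1"
}

# Usage: count_packed_refs packed-refs        (bare repository)
#        count_packed_refs .git/packed-refs   (repository with a work tree)
```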

+7
2 answers

I managed to solve my problem by changing the way I used cvs2git: instead of converting the entire CVS repository and then running --subdirectory-filter, I converted each of the modules I wanted separately. In my case, this meant running 18 different cvs2git commands:

Before

 cvs2git --blobfile=blob --dump=dump /path/to/cvs/base
 # Module 1
 git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module1" -- --all
 # Module 2
 git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module2" -- --all

Now

 # Module 1
 cvs2git --blobfile=blob_module1 --dump=dump_module1 /path/to/cvs/base/path/to/module1
 # Module 2
 cvs2git --blobfile=blob_module2 --dump=dump_module2 /path/to/cvs/base/path/to/module2
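With many modules, the per-module commands can be generated rather than typed by hand. The sketch below only prints the commands so they can be reviewed before anything is run. The function name print_module_commands is hypothetical, the paths are the placeholders from the example above, and the git init and fast-import lines are an assumption about how each dump is loaded afterwards, since the answer does not show that step.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the conversion commands for one module.
# $1 = CVS base path, $2 = module path inside it, $3 = short module name.
print_module_commands() {
    cvs_base=$1
    module_path=$2
    name=$3
    echo "cvs2git --blobfile=blob_$name --dump=dump_$name $cvs_base/$module_path"
    # Assumed import step: load the blob and dump files into a fresh bare repo.
    echo "git init --bare $name.git"
    echo "cat blob_$name dump_$name | git --git-dir=$name.git fast-import"
}

# Placeholder paths and names taken from the example above.
print_module_commands /path/to/cvs/base path/to/module1 module1
print_module_commands /path/to/cvs/base path/to/module2 module2
```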

Each repository now has a clean history.

Why didn't the previous method work? I assume that cvs2git was confused by all the modules (some of them changed their directory name during their history).

@Michael @CharlesB Thank you for taking the time to respond and help me.

+5

I bet you are running into this:

  • Differences between the CVS and git branch/tag models: CVS allows a branch or tag to be created from arbitrary combinations of source revisions from several source branches. It even allows file revisions that never existed at the same time to be added to a single branch or tag. Git, on the other hand, only allows a complete source tree, as it existed at some point in history, to be branched or tagged as a unit. Moreover, the ancestry of a git revision has implications for the contents of that revision. This difference means that it is fundamentally impossible to represent an arbitrary CVS history in a git repository 100% faithfully. cvs2git uses the following workarounds:

    • cvs2git tries to create a branch from a single source, but if it cannot figure out how to do this, it creates the branch using a "merge" of several source branches. In pathological situations, the number of merge sources for a branch can be arbitrarily large. The resulting history implies that whenever a file was added to a branch, the entire source branch was merged into the destination branch, which is clearly incorrect. (The alternative, omitting the merge, would discard the information that some content was moved from one branch to another.)

    • If cvs2git cannot determine that a CVS tag can be created from a single revision, it creates a tag fixup branch named TAG.FIXUP and then tags that branch. (This is a necessary workaround for the fact that git only allows existing revisions to be tagged.) The TAG.FIXUP branch is created as a merge between all branches that contain file revisions included in the tag, which involves the same tradeoff described above for branches. The TAG.FIXUP branch is cleared at the end of the conversion, but (due to a technical limitation of the git fast-import file format) it is not deleted. There are some situations where a tag could be created from a single revision, but cvs2git does not realize this and creates a superfluous tag fixup branch. After the conversion, you can remove the superfluous tag fixup branches by running the contrib/git-move-refs.py script in the resulting git repository.

  • There is no check that CVS branch and tag names are legal git names. There may be other git constraints that should also be checked. See the cvs2git documentation.

Is the refs directory you show from one of the new split repositories or from the big repository after conversion? You can remove the tags in the big converted repository before filtering and splitting it.

You can remove a tag in the big repo by simply deleting its file in the refs/tags directory; the file is just a pointer to a SHA.
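Since the question shows the tags stored in packed-refs rather than as individual files, deleting files alone would not remove them; git tag -d handles both loose and packed tags. A minimal sketch, where the function name delete_tags_matching and the example pattern are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: delete every tag whose name matches a glob pattern.
# Works for loose and packed tags alike because it goes through "git tag -d".
delete_tags_matching() {
    pattern=$1
    git for-each-ref --format='%(refname)' "refs/tags/$pattern" |
    while read -r ref; do
        git tag -d "${ref#refs/tags/}"
    done
}

# Usage (inside the big converted repository, before splitting):
#   delete_tags_matching 'v03-*'
```

Listing the matches first with git for-each-ref "refs/tags/PATTERN" is a cheap way to confirm the pattern before deleting anything.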

+2