Migrating from SVN to Git: which approach is better, a giant trunk, submodules, or subtrees?

I know there are many questions on this topic, but I still need more information. I am exploring the possibility of migrating our SVN repo to git and trying to figure out which approach (monolithic trunk, submodules, subtrees, etc.) would be best for our repo.

Below is information about our project and SVN repository:

  • The project is a Java application packaged as a war.
  • It is a modular application. Each module is developed by a separate team and packaged as a jar.
  • The war depends on these jars.

Basically, our structure is as follows:

repo
|-application (war)
|-module1 (for example, ui stuff)
|--module1Submodule1
|--module1Submodule2
|-module2 (for example, database access stuff)
|-...

Each module has its own tags and branches.

Size of svn repo on my local machine with all branches, tags, etc:

  • more than 2.5 million files
  • more than 20 GB of space
  • 311,615 revisions
  • Files are mostly source code, blobs

Typical conditions:

  • 200+ developers and QA in total.
  • Different teams commit to their own modules / submodules. (This may be a problem with a monolithic git repo, since git requires all upstream changes to be pulled before a push, whereas svn only complains about files that actually changed.)
  • Branching
  • Nested application

Future operations:

  • Gerrit
  • A developer commits, the commit is reviewed, the tests are run against the commit, and if they are green the commit is approved for merging into the master branch

Questions:

  • Can this repo be considered large for git? (Many posts say that git does not scale well to large repositories, but what counts as "large"?)
  • What are the pros and cons of each approach:
    • Monolithic repo (using git just like svn; an anti-pattern?)
    • Submodules
    • Subtrees (Am I right that every change to a module requires a commit to the subtree's own repository and then bringing that change into the subtree inside the aggregate repo?)
    • Separate repositories for each module
    • Any others..
  • Is it possible to preserve the SVN history with each of them?
  • I would appreciate as many references as possible (I could not find any official source for the "slow for large repos" claim)

Thank you in advance!

+7
git svn migration
3 answers

History

History can be preserved with all of these approaches using git svn: http://git-scm.com/book/en/Git-and-Other-Systems-Migrating-to-Git It is even possible to check out earlier commits.
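When converting with git svn, SVN usernames are usually mapped to git author identities through a file passed with --authors-file. Below is a minimal sketch of building that file from `svn log --quiet` output. The function name authors_map, the example.com mail domain and the sample log lines are assumptions made for illustration, not something from our repo.

```shell
set -eu

# Turn `svn log --quiet` output (read from stdin) into git-svn authors-file
# lines of the form "svnuser = svnuser <svnuser@DOMAIN>".
# DOMAIN is a placeholder; real names and addresses still need manual review.
authors_map() {
    domain="$1"
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' \
        | sort -u \
        | while read -r user; do
              printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
          done
}

# Hypothetical sample of what `svn log --quiet SVN_URL` prints.
sample_log='------------------------------------------------------------------------
r3 | alice | 2014-03-01 10:00:00 +0000 (Sat, 01 Mar 2014)
------------------------------------------------------------------------
r2 | bob | 2014-02-27 09:30:00 +0000 (Thu, 27 Feb 2014)
------------------------------------------------------------------------
r1 | alice | 2014-02-26 08:15:00 +0000 (Wed, 26 Feb 2014)
------------------------------------------------------------------------'

authors=$(printf '%s\n' "$sample_log" | authors_map example.com)
printf '%s\n' "$authors"

# Against a real repository the flow would be roughly as follows
# (--stdlayout assumes the usual trunk/branches/tags layout):
#   svn log --quiet SVN_URL | authors_map example.com > authors.txt
#   git svn clone --stdlayout --authors-file=authors.txt SVN_URL REPO_NAME
```

Each SVN user appears once in the generated file, however many revisions they committed.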

However, there have been suggestions not to migrate the history at all: just keep the svn repository frozen for about 6 months while new history accumulates in the git repository. I do not agree with that advice, because history is important for our project. I bet nobody would actually make that decision.

Giant trunk approach

  • You have to clone the entire large tree, even if you only plan to work in one subdirectory (the most common use case)
  • Some git commands will be slow (for example git status, since it has to check the whole tree)
  • Even if Jenkins is configured to run builds only for certain parts of the repo (this can be done with the "include" property of the Jenkins git plugin), the whole repo still has to be checked out to run the build. This will probably affect the whole job, because a "clean" checkout takes a long time even when building small modules.

Concern: with 200+ developers and QA in total, I suspect it will be quite hard to get changes pushed.

  • Changes are merged into the master branch only after the review is approved in Gerrit and the tests have passed, so we won't have a continuous cycle of push, get rejected, pull, push again
  • However, Gerrit may refuse to merge if the master branch has changed since the commit was pushed to Gerrit; you then need to click "Rebase" and rerun the tests.
  • The Linux kernel has a monolithic repo because C/C++ has no dependency management like Java does: a kernel is not built the way a war is built from dependency jars.

Questions

What are the steps, their costs, and the total cost of migrating with this approach?

  • git svn clone SVN_URL REPO_NAME
  • Jenkins stuff

How can it support code review? What changes are needed from the VCS / tools perspective? Assume a full CI run takes 15 minutes.

  • Jenkins jobs need an "include" filter on the SCM trigger to pick up only changes for a specific part of the project. This is not hard, but it still takes some effort to configure and verify. With "wipe out workspace before build" enabled, the entire repo has to be cloned every time. This can increase the overall time from commit to "tests passed", because the checkout will be rather slow.
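The idea behind such a filter can be sketched outside Jenkins: given a commit in the monolithic repo, list the top-level module directories it touches and build only those. This is a minimal sketch and does not reproduce the Jenkins plugin's own logic. The function name changed_modules, the throwaway repo and the file names are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# Print the top-level directories (modules) touched by a commit.
changed_modules() {
    git diff-tree --no-commit-id --name-only -r "$1" | cut -d/ -f1 | sort -u
}

# Throwaway monolithic repo with hypothetical modules.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p application module1/module1Submodule1 module2
echo a > application/pom.xml
echo b > module1/module1Submodule1/Ui.java
echo c > module2/Dao.java
git add .
git commit -q -m "initial import"

# A commit that only touches module1.
echo changed > module1/module1Submodule1/Ui.java
git commit -q -am "ui fix"

modules=$(changed_modules HEAD)
echo "modules to rebuild: $modules"
```

Note that the whole repo still had to be present locally to answer the question, which is the cost described above.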

What are the effective developer workflows?

  • Developers use local / remote feature branches
  • Push changes to Gerrit
  • Gerrit verifies the change by running the tests
  • The change is merged into the master branch

Submodules

Most of the caveats are described here: http://git-scm.com/book/en/Git-Tools-Submodules and here: http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/

The main problem is that you have to commit twice:

  • To the submodule itself
  • To the aggregate repo, to update the submodule reference

This makes no sense. Why would you ever need an aggregate repo if dependencies are managed through an artifact repository?

In fact, submodules were designed for the case where a library is reused by different projects and you want to depend on a specific tag of that library, with the option of updating the reference later. However, we are not going to tag every commit (only releases), and changing the versions of (released) dependencies in the war will be easier than maintaining submodules. Java dependency management makes things easier.

Pointing a submodule at the head of a branch is not recommended and causes problems, so submodules are a dead end for snapshots. And again, we don't need that, because Java dependency management handles it for us.

Questions

What are the steps, their costs, and the total cost of migrating with this approach?

  • git svn clone SVN_URL REPO_NAME for each module
  • Create an aggregate git repo
  • Add the module repositories as submodules of the aggregate repo

How can it support code review? What changes are needed from the VCS / tools perspective? Assume a full CI run takes 15 minutes.

  • Gerrit supports submodules, both for merging and for commits, so it should be fine.
  • Jenkins stuff: triggers for both submodule changes and aggregate repo changes (argh, two places for no good reason!)

What are the effective developer workflows? (The Gerrit process is omitted)

  • Developer commits to the submodule
  • Creates a tag for that commit
  • Developer goes to the aggregate repo
  • cd into the submodule and check out the tag
  • Commits the aggregate repo with the changed submodule hash

Or

  • Developer changes the submodule
  • Pushes the submodule changes so that they are not lost
  • Commits the aggregate repo with the changed submodule hash

As you can see, the developer’s workflow is cumbersome (you always need to update two places) and does not meet our needs.
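The double commit can be seen in a throwaway setup. This is a minimal sketch with local repositories. The repo names module1 and aggregate, the tag v1 and the file names are assumptions, and protocol.file.allow=always is only there because the submodule is added from a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"

# The module lives in its own repository.
git init -q module1
(cd module1 && echo v1 > Ui.java && git add . && git commit -q -m "module1: initial")

# The aggregate repo references module1 as a submodule.
git init -q aggregate
cd aggregate
git -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git commit -q -m "aggregate: add module1 submodule"
commits_before=$(git rev-list --count HEAD)

# Commit 1: the actual change, made and tagged inside the submodule.
# (Pushing it to the module's own remote is a further step, omitted here.)
(cd module1 && echo v2 > Ui.java && git commit -q -am "module1: ui fix" && git tag v1)

# Commit 2: the aggregate repo has to record the new submodule hash.
git add module1
git commit -q -m "aggregate: bump module1 to v1"

commits_after=$(git rev-list --count HEAD)
recorded=$(git rev-parse HEAD:module1)   # hash stored in the aggregate repo
tagged=$(git -C module1 rev-parse v1)    # hash of the tagged module commit
echo "extra aggregate commits for one module change: $((commits_after - commits_before))"
```

The aggregate repo stores nothing but the hash of the module commit, yet it still needs a commit of its own for every module change that should become visible there.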

Subtrees

The main problem is that you have to commit twice:

  • To the subdirectory that holds the merged subtree
  • Push the changes to the original repo

Subtrees are a better alternative to submodules: they are more robust and merge the source code of the modules into the aggregate repo instead of just referencing it. This makes such an aggregate repo easier to maintain, but the problem with subtrees is the same as with submodules: the double commits serve no purpose. You are not forced to commit changes to the original module repo and can commit them to the aggregate repo only, which can lead to inconsistency between the repos...

The differences are well explained here: http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/

Questions

What are the steps, their costs, and the total cost of migrating with this approach?

  • git svn clone SVN_URL REPO_NAME for each module
  • Create an aggregate repo
  • Perform a subtree merge for each module
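A subtree merge for one module can be sketched with core git commands only, using the merge -s ours plus read-tree --prefix recipe (git subtree add would be an alternative where that command is installed). The repo names, the directory prefix and the file names are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for a module repo produced by `git svn clone`.
git init -q module1
(cd module1 && echo v1 > Ui.java && git add . && git commit -q -m "module1: initial")

# Aggregate repo with some content of its own.
git init -q aggregate
cd aggregate
echo war > pom.xml
git add .
git commit -q -m "aggregate: initial"

# Subtree merge: join the two histories, then place module1's tree
# under the module1/ directory of the aggregate repo.
git fetch -q "$work/module1"
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=module1/ -u FETCH_HEAD
git commit -q -m "aggregate: merge module1 as subtree"

merged_file=$(cat module1/Ui.java)
parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "merge commit has $parents parents; module1/Ui.java contains: $merged_file"
```

Unlike a submodule, the module's files and history now live inside the aggregate repo, which is why changes can silently diverge from the original module repo.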

How can it support code review? What changes are needed from the VCS / tools perspective? Assume a full CI run takes 15 minutes.

  • Gerrit does not seem to support subtree merges very well ( https://www.google.com/#q=Gerrit+subtrees )
  • But we cannot be sure until we try
  • Jenkins stuff: triggers for both subtree repo changes and aggregate repo changes (argh, two places for no good reason!)

What are the effective developer workflows? (The Gerrit process is omitted)

  • Developer changes something in the subtree (inside the aggregate repo)
  • Developer commits to the aggregate repo
  • Developer must not forget to push the changes to the original repo (makes no sense!)
  • Developer must not forget NOT to mix subtree changes with aggregate repo changes in one commit

Again, as with submodules, it makes no sense to have two places (repos) where the code and its changes live. Not for our case.

Separate repositories

Separate repositories look like the best solution and follow the original intent of git. Repo granularity may vary. The most fine-grained option is one repo per Maven release group, but this can lead to too many repositories. We also need to look at how often a single svn commit affects multiple modules or release groups. If we see that a commit typically touches 3-4 release groups, then those groups should form one repo.
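One way to measure this is to count, over the converted history, how many commits touch more than one top-level module. This is a minimal sketch against a throwaway repo. The function name cross_module_commits, the module names and the commits are assumptions, and in practice it would be run on a git svn clone of the real trunk.

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# Count commits whose changed files span more than one top-level directory.
cross_module_commits() {
    git log --format='@%H' --name-only \
        | awk -F/ '
            /^@/   { if (n > 1) multi++; n = 0; split("", seen); next }
            NF > 0 { if (!($1 in seen)) { seen[$1] = 1; n++ } }
            END    { if (n > 1) multi++; print multi + 0 }'
}

# Throwaway history: two commits span both modules, one stays inside module1.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir module1 module2
echo 1 > module1/A.java
echo 1 > module2/B.java
git add .
git commit -q -m "touches module1 and module2"
echo 2 > module1/A.java
git commit -q -am "touches module1 only"
echo 3 > module1/A.java
echo 3 > module2/B.java
git commit -q -am "touches module1 and module2 again"

count=$(cross_module_commits)
total=$(git rev-list --count HEAD)
echo "$count of $total commits span several modules"
```

Modules that keep showing up together in such commits are the candidates for sharing one repo.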

I also think that it is worth at least decoupling the api modules from the implementation modules.

Questions

What are the steps, their costs, and the total cost of migrating with this approach?

  • git svn clone SVN_URL REPO_NAME for each more or less fine-grained group of modules

How can it support code review? What changes are needed from the VCS / tools perspective? Assume a full CI run takes 15 minutes.

  • Jenkins jobs for each repo separately. No "include" filters. Just checkout, build, deploy.

What are the effective developer workflows?

  • Developers use local / remote feature branches in each repo
  • Push changes to Gerrit
  • Gerrit verifies the change by running the tests
  • The change is merged into the master branch
+7

I will give you a short answer. It is simplistic and may leave much to be desired, but it may also help.

  • Forget the history. When will you need it? You always have the old svn for reference, and after a few months the need for it will drop even further. This is not always practical, but please take a hard look at what your real needs for the old code are.

  • Make extensive use of branches.

  • Use different git repositories for different modules.

  • Forget about svn models when deciding what to do in git.

BTW, if you do need the history: $ git svn clone http://svn/repo/here/trunk

+1

I also don't think you should move the history from SVN to Git. Keep your old SVN repository in read-only mode if you really need to keep the history. IME, SVN is different enough from Git that carrying the history over can actually produce a misleading history.

Use a separate Git repository for every thing that has an independent build process. This may be at the module level or more or less fine-grained. Then, if you really need it, you can “stitch” these repositories together with a “super” repository that has only a directory structure and submodules.

Use hooks and configuration to prevent force pushes to master and any other shared branches. A force push is rarely needed, and only as part of some recovery process, so it should be left to administrators, not developers. However, provide a well-known "namespace" for branches that developers can use to share commits and branches with themselves or with others, and let those branches be force-pushable.
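Such a policy could be enforced on the server with an update hook. This is a minimal sketch, assuming a hypothetical refs/heads/dev/ namespace where rewriting history stays allowed while every other branch accepts fast-forwards only. The repo and branch names are illustrative.

```shell
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare central.git
mkdir -p central.git/hooks

# update hook: git calls it once per pushed ref as `update <ref> <old> <new>`.
cat > central.git/hooks/update <<'EOF'
#!/bin/sh
ref=$1 old=$2 new=$3
is_zero() { case "$1" in *[!0]*) return 1 ;; *) return 0 ;; esac; }
case "$ref" in refs/heads/dev/*) exit 0 ;; esac       # personal namespace: anything goes
is_zero "$old" && exit 0                               # creating a new ref is fine
is_zero "$new" && { echo "deleting $ref is not allowed" >&2; exit 1; }
git merge-base --is-ancestor "$old" "$new" && exit 0   # plain fast-forward
echo "non-fast-forward push to $ref rejected" >&2
exit 1
EOF
chmod +x central.git/hooks/update

# A developer clone that publishes master and a personal branch.
git init -q dev1
cd dev1
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/central.git"
echo 1 > f
git add f
git commit -q -m "one"
git push -q origin master
git push -q origin master:refs/heads/dev/alice/wip

# Rewrite history locally, then try to force-push it to both branches.
echo 2 > f
git commit -q --amend -am "two"
if git push -q --force origin master 2>/dev/null; then shared=accepted; else shared=rejected; fi
if git push -q --force origin master:refs/heads/dev/alice/wip 2>/dev/null; then personal=accepted; else personal=rejected; fi
echo "force push to master: $shared, to dev/alice/wip: $personal"
```

Git's own receive.denyNonFastForwards setting blocks non-fast-forward pushes too, but it applies to every ref, so a hook is used here to keep the personal namespace open.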

Encourage developers to use many private branches, but have a clear, managed process for creating (branching) and retiring (merging) shared branches. Merge vs. rebase vs. rebase with squash is an open question for moving commits from a developer's branch to a shared branch; just make a decision as a team and have everyone use the same style. (I prefer merging, but the others are acceptable too.) Deploy Gerrit or something like it as soon as possible, so that code is reviewed before it lands on a shared branch and the movement of commits from developers to the shared branch is automated (turn the policy into a process).

HTH

+1