Converting hg to git, including subrepos

Question


Although this involves two sub-problems, I'm asking it as one combined question, because the particular way it breaks into parts is not what matters. I'm open to different ways of achieving what I want, as long as the final result retains all the meaningful history and the ability to check out, examine, and build/test historical versions. The goal is to retire hg and the subrepo model used so far and move to a single tree in git, without sacrificing history.

What I'm starting with is a Mercurial repository that consists of some top-level code and several subrepos, where the bulk of the interesting history lies. The subrepos have some branching/merging, but nothing crazy. The end result I want is a single git repository, without submodules, such that:

For each commit in the original top-level hg repository, there is a git commit that checks out exactly the same tree as the corresponding hg commit would with all of its subrepos.
The git commits corresponding to successive top-level hg commits are descendants of one another, with commits corresponding to all the intervening subrepo commits between them.

The main idea I have for achieving this is to iterate over all top-level hg commits and, for each top-level commit that changes .hgsubstate, also iterate over all paths from the old revision to the new revision of the subrepo (possibly involving branching). At each step:

Check out the relevant hg revisions for the top level and all subrepos.
Remove everything from the git index.
Add everything from the hg checkout to the git index.
Use git-write-tree and git-commit-tree to create the commit with the desired parents, using the authorship, date, and commit message from the corresponding hg commit.
Record the correspondence between the new git commit and the hg commit for use in generating future commits' parents.
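The per-step procedure above could be sketched with git plumbing roughly as follows. This is a minimal sketch under stated assumptions, not a tested converter: the map file name `hg2git.map` and the helpers `record_mapping`, `git_parents_for` and `commit_snapshot` are hypothetical, the hg checkout (with all subrepos) is assumed to double as the git work tree, and `.hg` is assumed to be listed in `.git/info/exclude` so that `git add` skips Mercurial's own metadata.

```shell
# Map file (hypothetical name): one "<hg-node> <git-sha>" pair per line.
MAP_FILE=${MAP_FILE:-hg2git.map}

# Record that hg changeset $1 was converted to git commit $2.
record_mapping() {
    printf '%s %s\n' "$1" "$2" >> "$MAP_FILE"
}

# Translate hg parent nodes into "-p <git-sha>" words for git commit-tree.
# Fails if a parent has not been converted yet.
git_parents_for() {
    for hg_node in "$@"; do
        git_sha=$(awk -v n="$hg_node" '$1 == n { print $2; exit }' "$MAP_FILE")
        [ -n "$git_sha" ] || { echo "unconverted parent: $hg_node" >&2; return 1; }
        printf '%s %s ' -p "$git_sha"
    done
}

# Snapshot the current work tree as a git commit.
# usage: commit_snapshot <hg-node> <name> <email> <date> <message> [<hg-parent>...]
# <date> must be in a form git accepts, e.g. RFC 2822 as produced by hg's
# rfc822date template filter.
commit_snapshot() {
    hg_node=$1 name=$2 email=$3 date=$4 message=$5
    shift 5
    parents=$(git_parents_for "$@") || return 1
    git read-tree --empty          # clear the index
    git add -A .                   # stage what hg checked out (assumes .hg is excluded)
    tree=$(git write-tree)         # write the index out as a tree object
    # $parents is deliberately unquoted so it splits into "-p <sha>" words.
    commit=$(GIT_AUTHOR_NAME=$name GIT_AUTHOR_EMAIL=$email GIT_AUTHOR_DATE=$date \
             GIT_COMMITTER_NAME=$name GIT_COMMITTER_EMAIL=$email GIT_COMMITTER_DATE=$date \
             git commit-tree $parents -m "$message" "$tree") || return 1
    record_mapping "$hg_node" "$commit"
}
```

Because parents are looked up in the map file, commits have to be created in an order where every parent is converted before its children.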

Should this work? Is there a better way to achieve what I want, perhaps by flattening the subrepos within hg first? The biggest thing I'm unclear on is how to perform the desired iteration, so practical advice on how to achieve it would be great.
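One possible shape for the iteration is to let Mercurial list the top-level changesets that touch `.hgsubstate` and, for each subrepo whose pinned revision changed, ask for the DAG range `OLD::NEW` inside that subrepo. The sketch below assumes `.hgsubstate` holds one `<node> <path>` pair per line, that each subrepo's history is available at `<top>/<path>`, and that comparing against the first parent only is acceptable; the function names are made up for illustration, and the hg-driving part has not been run against a real repository.

```shell
# Print "<path> <old-node> <new-node>" for every subrepo whose pinned revision
# differs between two .hgsubstate snapshots. Subrepos absent from the old
# snapshot are reported with "null" as the old node. Paths containing spaces
# are not handled.
changed_subrepos() {
    awk -v oldfile="$1" '
        BEGIN {
            while ((getline line < oldfile) > 0) {
                split(line, field, " ")
                old[field[2]] = field[1]
            }
        }
        !($2 in old)  { print $2, "null", $1; next }
        old[$2] != $1 { print $2, old[$2], $1 }
    ' "$2"
}

# For each top-level changeset touching .hgsubstate, print
# "<top-node> <path> <sub-node>" for the subrepo changesets lying between the
# pin recorded in its first parent and its own pin.
walk_substate_changes() {
    top=$1
    prev=$(mktemp)
    cur=$(mktemp)
    hg --cwd "$top" log -r "file('.hgsubstate')" --template '{node}\n' |
    while read -r node; do
        hg --cwd "$top" cat -r "p1($node)" .hgsubstate > "$prev" 2>/dev/null < /dev/null || : > "$prev"
        hg --cwd "$top" cat -r "$node" .hgsubstate > "$cur" 2>/dev/null < /dev/null || : > "$cur"
        changed_subrepos "$prev" "$cur" |
        while read -r path old new; do
            # OLD::NEW is the DAG range; it is empty if NEW does not descend from OLD.
            if [ "$old" = null ]; then range="::$new"; else range="$old::$new"; fi
            hg --cwd "$top/$path" log -r "$range" --template "$node $path {node}\n" < /dev/null
        done
    done
    rm -f "$prev" "$cur"
}
```

A call such as `walk_substate_changes /path/to/top` would then produce the list of (top-level, subrepo) revision pairs that need a converted commit.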

One added constraint: the source repositories include content that cannot be published (stripping it is an additional git-filter-branch step after the basic conversion is complete), so solutions that involve uploading a repo for third-party processing are not viable.


git version-control mercurial mercurial-subrepos

R .. May 10, '16 at 17:43


6 answers

khrm · Answer 1 · 2016-05-13T21:57:56+0000

What you wrote may or may not solve the problem, but it is not easy. The main difficulty is that you need to do the conversion in a way that keeps your subrepos and the main repo consistent. I recreated this problem on a small scale and was able to keep the subrepos consistent as well.

My solution:

Using the hg convert extension, I converted the main repo to a repo without subrepos (and related information).

    cd main
    # Build a filemap that excludes every subrepo path plus the subrepo metadata
    awk '{ print $1 }' .hgsub | xargs -n 1 echo 'exclude' > ../filemap
    echo exclude .hgsub >> ../filemap
    echo exclude .hgsubstate >> ../filemap
    cd ..
    hg convert --filemap filemap main mainConv
    cd mainConv
    hg update

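Convert the subrepo, using a rename directive in the filemap so that its files end up under the subrepo's path.

    cd ..
    # Move everything in the subrepo under subRepo/
    echo rename . subRepo > subFileMap
    hg convert --filemap subFileMap main/subRepo subRepoConv
    cd subRepoConv
    hg update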

Pull the subrepos onto the converted main repo.
```
cd ../mainConv
hg pull -f ../subRepoConv
```
You will notice multiple heads in the repo after pulling (because the subrepo has its own head). Merge them:
```
hg heads
hg merge <RevID from subrepo (not main repo)>
hg ci -m MergeOfSubRepo
```

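You need to repeat steps 3 and 4 (the pull and the merge) for each subrepo.

The commits will not be in order, however. Put them in order as described here https://stackoverflow.com/a/318960/ :

    cd ..
    hg clone -r 0 mainConv mainOrdered
    cd mainOrdered
    # Pull revisions one at a time in date order from the converted repo
    # (the flattened original read ../main here, which is unrelated to the clone source)
    for REV in $(hg log -R ../mainConv -r 'sort(1:tip, date)' --template '{rev}\n')
    do
        hg pull ../mainConv -r $REV
    done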

Now convert this ordered Mercurial repo to git with http://repo.or.cz/w/fast-export.git :

    cd ..
    git clone git://repo.or.cz/fast-export.git
    git init mainGit
    cd mainGit
    ../fast-export/hg-fast-export.sh -r ../mainOrdered
    git checkout HEAD

Felipec · Answer 2 · 2016-05-18T18:03:18+0000

Yes. It is best to create the commits manually using git commit-tree . There are many conversion tools, but they will never give you exactly what you want. A hand-written script, on the other hand, gives you all the flexibility you need.

I have written many of these scripts, including git remote-hg .

Lazy badger · Answer 3 · 2016-05-11T05:20:04+0000

Unrelated off-topic

I'm sure you have chosen the worst migration idea (from Mercurial to Git), but it is your choice and, in the end, your responsibility.

Migration path

My knowledge of Git is pretty weak, so for Mercurial + subrepo -> monolithic Git I can only see and describe this way:

Mercurial + subrepo → monolithic Mercurial → monolithic Git repo

To combine the history of the subrepos with the history of the wrapper repo, you can (corrected per alexis's comment) use my idea from an earlier question about the Convert extension.
A monolithic Mercurial repo with additionally polished history (one root, no anonymous heads, bookmarks at least where relevant) can easily be pushed to an empty Git repo using hg-git.

Giacomo tesio · Answer 4 · 2016-05-19T16:08:36+0000

This is what I did to solve a similar problem:

Convert every Mercurial repository with fast-export.
Add the subrepositories as remotes in the parent repo.
In the parent repo, use git checkout -b to create a named branch for each subrepository.
Run git read-tree --prefix=pathsubrepo/ -u subrepobranch for each subrepository.

This is more or less what I did, in a bit more detail (adapted from my bash history ... but not actually tested):

Step 1

    cd ~
    git clone git://repo.or.cz/fast-export.git

    # Convert the parent repo
    git init parent_repo
    cd parent_repo
    ~/fast-export/hg-fast-export.sh -r /path/to/old/mercurial/parent
    git checkout HEAD

    # Convert each subrepo into its own git repo
    cd ~
    git init subrepo1
    cd subrepo1
    ~/fast-export/hg-fast-export.sh -r /path/to/old/mercurial/parent/subrepo1
    git checkout HEAD

    cd ~
    git init subrepo2
    cd subrepo2
    ~/fast-export/hg-fast-export.sh -r /path/to/old/mercurial/parent/subrepo2
    git checkout HEAD

Step 2

    cd ~/parent_repo
    git remote add sub1 $HOME/subrepo1/
    git remote add sub2 $HOME/subrepo2/
    # Fetch so that sub1/master and sub2/master exist for the next step
    git fetch sub1
    git fetch sub2

Step 3

    cd ~/parent_repo
    git checkout -b sub1master sub1/master
    git checkout -b sub2master sub2/master

Step 4

    cd ~/parent_repo
    # Read each subrepo branch into its own directory prefix
    git read-tree --prefix=subrepo1/ -u sub1master
    git read-tree --prefix=subrepo2/ -u sub2master

After that, you can git branch -D sub1master and git branch -D sub2master , since you no longer need them.

R .. · Answer 5 · 2016-08-15T19:19:22+0000

It seems that what was missing from my question, and from the discussion of possible solutions, was a proper understanding of the graph theory involved. Ideas like "iterating over all paths from the old revision to the new revision" were not really well defined, or at least did not yield what I expected of them. Approaching this from a more rigorous standpoint, I think I have an approach that works.

To begin with, the problem is this: subrepo revisions represent only the state of their own subtrees at a given point in history. I want to map them to revisions that represent the state of the entire combined tree. Then the subrepo DAG can be merged with the top-level DAG in a meaningful way.

For a given subrepo revision R, we can ask which revisions of the top-level repo (or parent repo, if there are several levels of subrepos) include R or any descendant of R. Assuming a single root, this set of revisions has a lowest common ancestor (or maybe more than one), which seems like a good candidate. Indeed, if the top-level revision S that we pair with R is not a common ancestor of the revisions that use R or its descendants (but the mapping is otherwise reasonable), then R will have a descendant R' whose associated top-level revision S' is not a descendant of S. In other words, the history derived from the subrepo will have confusing/meaningless jumps between revisions of the top-level tree.

Now, if we want to pick a common ancestor, the lowest one makes sense both in terms of making these revisions something that can be checked out, built, and tested, and in terms of giving a reasonable idea of what state the top-level repo (and the other subrepos) were in at the time the changes were made in the subrepo. Of course, the root of the whole top-level DAG would also work, but it would not yield meaningful, usable revisions that could be checked out; choosing the root would be equivalent (in terms of usability) to a naive repo merge, with a separate root for each subrepo and the subrepo history simply merged in whenever the top-level repo updates the revisions it uses.
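As a rough illustration of the "lowest common ancestor (or maybe more than one)" computation, the sketch below works on an edge list of the top-level DAG. The edge-list format (one `<child> <parent>` pair per line, exported beforehand by some means) and the function name are assumptions made for this example; the arguments after the file are the top-level revisions that use R or one of its descendants. Each revision counts as an ancestor of itself.

```shell
# usage: lowest_common_ancestors <edges-file> <rev>...
# Prints every lowest common ancestor of the given revisions, one per line.
lowest_common_ancestors() {
    edges_file=$1
    shift
    awk -v targets="$*" '
        NF == 2 { parents[$1] = parents[$1] " " $2 }

        # Mark node and all of its ancestors as reachable from target number tag.
        function mark(node, tag,    list, n, i) {
            if ((tag, node) in seen) return
            seen[tag, node] = 1
            reached[node]++
            n = split(parents[node], list, " ")
            for (i = 1; i <= n; i++) mark(list[i], tag)
        }

        END {
            total = split(targets, target, " ")
            for (t = 1; t <= total; t++) mark(target[t], t)
            # A common ancestor is reachable from every target. It is not
            # "lowest" if it is the parent of another common ancestor.
            for (node in reached) {
                if (reached[node] != total) continue
                n = split(parents[node], list, " ")
                for (i = 1; i <= n; i++) shadowed[list[i]] = 1
            }
            for (node in reached)
                if (reached[node] == total && !(node in shadowed)) print node
        }
    ' "$edges_file" | sort
}
```

The recursion depth grows with the length of the history, so a very long linear history may call for an iterative rewrite; the sketch favours brevity.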

So, if we can use the LCA to assign a top-level revision T(R) to each subrepo revision R, how does this translate to

Whenever a subrepo revision R has a T(R) different from T(P) for each parent P of R, it is effectively merging new changes from the top-level repo (and the other subrepos) into the subrepo's history. The conversion should represent this as two commits:

(1) The actual subrepo commit R, using the old top-level revision. If R has a single parent P (i.e. it is not a merge commit), this is T(P). If R has several parents, it is unclear whether there is an ideal choice, but T(P) for any parent P should be reasonable.
(2) A merge commit that merges in the conversion C(T(R)) of the top-level revision T(R) associated with R, where C(T(R)) itself merges (1) above.

Aside from C(T(R)), which references (1) as a merge parent, all other references to R in the conversion should use (2). This includes the conversions of any descendants of T(R) in the top-level repo that use revision R of this subrepo, and the conversions of direct children of R.

I believe the above (albeit poorly worded) description specifies everything needed to merge the top-level and subrepo DAGs. Each subrepo revision gets a full version of the tree and ends up connected into a single DAG for the converted repo via "merge commits" (when the subrepo merges in the new associated top-level revision, and when the top level merges in the subrepo changes).

Finally, the last step in creating the git repo is simply to replay the combined DAG, either in topologically sorted order or via a depth-first walk, so that every git commit-tree already has all the needed parent commits present.
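For the topological ordering, the standard `tsort` utility may be enough. The sketch below assumes the combined DAG has been written out as an edge list with one `<child> <parent>` pair per line and each root on a line of its own; that format and the function name are illustrative choices only.

```shell
# usage: conversion_order <edges-file>
# Prints every revision after all of its parents, one per line.
conversion_order() {
    # tsort expects "<before> <after>" pairs, so swap child and parent.
    # A root is emitted as "<root> <root>", which registers the node
    # without imposing any ordering.
    awk 'NF == 2 { print $2, $1 }
         NF == 1 { print $1, $1 }' "$1" | tsort
}
```

The output can be fed to a loop that converts one revision at a time, for example `conversion_order edges.txt | while read -r rev; do ...; done`, where the loop body stands for whatever per-revision conversion is used.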

Devy · Answer 6 · 2016-05-20T17:20:24+0000

Try Facebook's Hg ↔ Git converter: FbShipIt . Most of what you described should work well with this conversion tool, which copies commits between Mercurial and Git.

FbShipIt has a caveat: it does not understand merge commits, but this can be worked around with git rebase .
