Advanced Subversion to Git Migration Guide
Scope of This Guide
This article assumes that you have decided to replace an existing Subversion (svn) repository with a Git one and that you want more than the straightforward mirror of the repository that git-svn can easily create. In particular, it explains how to:
- Transform svn branches and tags into Git branches and tags.
- Update commit messages to replace the mentions of svn revisions, which don't make sense in a Git repository, with corresponding Git commit IDs.
- Exclude some parts of the old svn repository, possibly extracting them into their own independent Git repositories and using them as submodules of the main one.
- Configure email notifications for Git commits in a way similar to svn notifications.
- Provide a replacement for ViewVC.
The migration is meant to be done on a Unix system and was tested under Linux. Other Unix versions, including OS X, should work with only minor changes, if any, but using Microsoft Windows is not recommended, if only because Git is much slower there. In the code snippets below, $ represents the shell prompt and the shell is assumed to be Bourne-compatible (i.e. not [t]csh).
Creating Git Repository
We are going to use the svn2git script to import the svn repository into Git. It is based on git-svn, but it not only imports all commits but also converts the existing svn branches and tags to the corresponding native Git objects, and so should be used even if you already have an existing git-svn clone.
Before starting the conversion, it is recommended to create an authors file mapping the Subversion account names to the full author names, including email, used by Git. For repositories with a small number of contributors it's simple enough to create this file manually, as it has a very simple structure: each line contains the svn account name on the left of an equals sign and the Git author name on the right, e.g.:
svnname = A.U. Thor <email@example.com>
For repositories with a large number of contributors, it is usually preferable to create this file automatically (e.g. using this script) and then edit it.
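As an illustration of the automatic approach (this is not the script referred to above, just a minimal sketch), the account names can be collected from the output of svn log --quiet and turned into a template to edit by hand. The function name authors_template and the example.com address are placeholders chosen for this example:

```shell
# Turn `svn log --quiet` output (read from stdin) into an authors-file
# template. Each log entry line is expected to look like:
#   r123 | jdoe | 2013-03-31 12:00:00 +0200 (Sun, 31 Mar 2013)
# The real name and address are unknown at this point, so the account
# name and a placeholder domain are used; edit the result afterwards.
authors_template() {
    awk -F ' [|] ' '/^r[0-9]+ [|] / {
        print $2 " = " $2 " <" $2 "@example.com>"
    }' | sort -u
}
```

It could then be used as svn log --quiet repository-url | authors_template > authors-file, followed by replacing each right-hand side with the real name and email.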
svn2git is written in Ruby, so the simplest way to install it is with RubyGems. If the gem command is not already on your system, you should be able to install it from a package called "rubygems" or similar. By default, gems are installed in a system-wide directory and installing them requires superuser permissions. If you don't have them or don't want to install the script globally, you can do
$ export GEM_HOME=~/ruby/gems
$ PATH=~/ruby/gems/bin:$PATH
to install the gems in the specified directory. To install the script itself, just do
$ gem install svn2git
Doing the conversion
Notice that this step can take a long time (up to several days), so, if possible, prefer to run it on a local Subversion repository, i.e. on the same server that hosts it. If this is impossible or impractical, consider using svnadmin dump and load commands to make a local copy of the repository.
Converting the repository is rather anticlimactic, as it's done with this simple command:
$ svn2git repository-url -m --authors authors-file
As mentioned above, ideally "repository-url" should be a file:// one, e.g. file:///var/git/repo, but any kind of URL is accepted here if you are prepared to wait long enough. The authors file is the full path to the file containing the author name mappings created above; if you don't have it, simply omit this option.
Finally, note that the -m option is only necessary if you intend to follow the post-processing step below that replaces svn revision numbers with the corresponding Git commit IDs; otherwise you can omit it in order to get rid of the "git-svn-id" metadata lines in the commit messages.
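For reference, each "git-svn-id" metadata line has the form "git-svn-id: URL@REVISION UUID". A minimal sketch of listing which commit corresponds to which svn revision, assuming that format, is shown here. The function name svn_rev_map and the "commit " marker in the log format are choices made for this example, and a commit message that itself contains a line looking like "commit <hex>" would confuse it:

```shell
# Print one "REVISION SHA" pair per imported commit.
# Expects on stdin the output of:
#   git log --all --format='commit %H%n%B'
svn_rev_map() {
    awk '
        # Remember the ID of the commit whose message follows.
        /^commit [0-9a-f]+$/ { sha = $2; next }
        # The second field of the metadata line is URL@REVISION.
        /^git-svn-id: / {
            n = split($2, parts, "@")
            print parts[n], sha
        }
    '
}
```

Run from inside the new repository, git log --all --format='commit %H%n%B' | svn_rev_map would print the revision-to-commit table.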
The new Git repository is now fully operational and you could start using it immediately. But in practice it may be useful to apply some transformations to the new repository first and the sections below explain how to perform some common ones. Again, these steps are purely optional and you could decide to do all, one or none of them.
If you do decide to modify the repository further, it is strongly recommended to make a backup, i.e. a separate git clone, of the repository you have just created, so that you can return to it if the history rewriting goes wrong.
Updating svn revisions mentions
It is common to have mentions of svn revisions in the repository history, e.g. a commit message could be "Fix regression introduced in r12345.". These revision numbers don't make sense in a Git repository, as there is no good way to find the commit corresponding to them and, if you decide to remove the ugly "git-svn-id" metadata lines from the commit messages, simply no way at all. So, to preserve the information encoded in them, you can replace the svn revisions with the corresponding Git SHA-1 commit IDs in the commit message text, so that the message becomes "Fix regression introduced in 1a2b3c." instead.
To do this, download the msgfilter-rev2sha script, put it somewhere in your PATH and run the following command:
$ git filter-branch --msg-filter msgfilter-rev2sha \
      --tag-name-filter cat -- --date-order --all
This may result in some false positives, as the script treats any run of 5 or more digits that represents an svn revision earlier than the one currently being processed as a revision number, but in practice this seems to be very rare (only 1 known false positive in more than 100,000 revisions in the 2 big repositories the script was tested with).
Also note that it is important to specify the --date-order option above to ensure that the revision mentioned in a commit message has already been rewritten by the time the commit mentioning it is filtered. Of course, this assumes that the commit messages only refer to revisions in their past, but then if you have a functional time machine you probably don't need to use a version control system anyhow.
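To make the matching heuristic concrete, here is a rough sketch of the kind of substitution involved. It is not the msgfilter-rev2sha script itself: the function name rewrite_revs, the map file of "REVISION SHA" lines and the abbreviation to 7 characters are all assumptions made for this illustration, and the check that a revision is earlier than the one being processed is left out.

```shell
# Hypothetical illustration only, not the actual msgfilter-rev2sha.
# Usage: rewrite_revs MAPFILE < commit-message
# MAPFILE contains one "REVISION SHA" pair per line.
rewrite_revs() {
    awk -v mapfile="$1" '
        BEGIN {
            # Load the revision -> commit ID table.
            while ((getline line < mapfile) > 0) {
                split(line, f, " ")
                sha[f[1]] = f[2]
            }
        }
        {
            out = ""
            rest = $0
            # An optional "r" followed by at least 5 digits.
            while (match(rest, /r?[0-9][0-9][0-9][0-9][0-9]+/)) {
                tok = substr(rest, RSTART, RLENGTH)
                rev = tok
                sub(/^r/, "", rev)
                # Replace only known revisions; any other number
                # is left untouched.
                if (rev in sha)
                    repl = substr(sha[rev], 1, 7)
                else
                    repl = tok
                out = out substr(rest, 1, RSTART - 1) repl
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    '
}
```

A number that happens to coincide with a known revision, such as a bug ID, would be rewritten too, which is exactly the kind of false positive mentioned above.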
Another common task is to exclude some of the contents of the old svn repository when transitioning to Git. For example, it is relatively common to store big binary files in svn, but this is not recommended when using Git, so you may decide to use "git filter-branch" again to get rid of them and reduce the size of your new repository.
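To decide what is worth removing, it can help to list the largest files anywhere in the history first. The following is a minimal sketch, assuming a Git version recent enough to support format strings in git cat-file --batch-check; the helper name largest_blobs is made up for this example:

```shell
# Keep the N largest blobs (default 10), printed as "SIZE PATH".
# Expects on stdin the output of:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)'
largest_blobs() {
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -k1,1nr |
        head -n "${1:-10}"
}
```

Appending | largest_blobs 20 to the pipeline shown in the comment would print the 20 biggest files ever committed, with sizes in bytes.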
Note that if you followed the instructions above and replaced the references to svn revisions in the commit messages with references to Git commit IDs, you now need to update these references again, as commit IDs change when history is rewritten. To do this, download the msgfilter-updatesha script and put it alongside the previously downloaded msgfilter-rev2sha. Otherwise, i.e. if you skipped the revision rewriting step, you don't need to rewrite anything here either and should simply omit the --msg-filter option from the command below.
Finally, to remove the contents and history of a subdirectory "assets" from the repository, run
$ git filter-branch --prune-empty \
      --index-filter 'git rm -q -r --cached --ignore-unmatch assets' \
      --msg-filter msgfilter-updatesha \
      --tag-name-filter cat -- --date-order --all
Note that you can list several directories in the "git rm" command, if needed.
Splitting directories in a different repository
Instead of removing the contents of a subdirectory completely from Git, you can also extract it into a standalone Git repository and use it as a submodule. Somewhat surprisingly, given that subtrees are the alternative to submodules, the easiest way to create a repository for such a submodule is to use the git subtree command. This command has been included in Git itself since 1.7.11, but can also be retrieved directly from the Git sources and installed by just copying this file anywhere into your PATH.
Consider, for example, that you want to split the 3rdparty/foo directory into its own repository. This can be done with the following steps:
$ git subtree split --prefix=3rdparty/foo --branch=foo-only
$ mkdir new-foo
$ cd new-foo
$ git init
$ git pull ../ foo-only
The next step is to push the new repository to its final location. Once this is done, the "new-foo" subdirectory can be entirely removed, and the original "3rdparty/foo" subdirectory of the main repository can be removed from history as explained in the previous section and replaced with a submodule pointing to the new repository containing only "foo".
Replacing Subversion Tools
Commit Notification Emails
Git comes with a simple but fully functional post-receive-email script that can be used for sending an email notification whenever some commits are pushed to the repository. This works perfectly well but is different from the usual Subversion commit notification emails where one email is generated for each commit instead. To ease the transition, it is possible to use git-notifier which sends one email per change, i.e. a commit or a tag, instead of only a single email for each push.
Setting the notifier up is straightforward and well described on its home page. However, there is a huge potential pitfall which is not mentioned there as of this writing (2013-03-31): if you install the script as a hook for an existing repository, then the first time a commit is pushed it will try to send an email for each and every one of the already existing commits and tags. As a typical Subversion repository can have tens of thousands of revisions, this is clearly not ideal. To prevent this, first run the script from the command line in the repository directory with the --updateonly option, before installing it as a hook!
Web Repository Viewer
There are many Web viewers for Git repositories but only a couple of them support showing commits as side-by-side diffs which can be important to provide the same functionality as available in ViewVC with Subversion. The most actively maintained and widely used of them is currently cgit, so this is the recommended choice.
Unfortunately, the cgit instructions are not as exhaustive as they could be, so installing and configuring it can be somewhat challenging. The following tips may help if you run into any trouble with it:
- If you get an error about missing cryptographic functions when linking, edit Makefile and ensure that git/libgit.a appears before -lcrypto.
- If you specify the repositories to serve manually, be warned that the project-list option value must be a file containing the repository paths and not the paths themselves.
- If you use the scan-path option to find the repositories automatically, be even more warned that this option must be the last one specified in the cgitrc file and that not following this advice will result in all options following it being silently ignored.
- If you want to use the cache, it needs to be explicitly enabled using the cache-size option as, in spite of being mentioned many times in the description, the cache is not enabled by default.
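The configuration tips above can be summarized in a small cgitrc. The following sketch writes one from the shell; all paths, the cache size and the function name write_cgitrc are hypothetical values chosen for this example, so adjust them to your installation:

```shell
# Write a minimal cgitrc to the file given as the first argument.
# All paths and values below are placeholders.
write_cgitrc() {
    cat > "$1" <<'EOF'
# The cache is disabled unless cache-size is set to a non-zero value.
cache-size=1000
# project-list names a FILE that lists the repository paths;
# it is not itself a list of paths.
project-list=/srv/git/projects.list
# scan-path comes last, so that no option ends up after it
# and gets silently ignored.
scan-path=/srv/git
EOF
}
```

For example, write_cgitrc ./cgitrc creates the file in the current directory, from where it can be reviewed and copied to wherever your cgit build expects its configuration.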
Final advice: if there are any other problems and you want to debug the program, edit the Makefile to put -O0 into it, as it doesn't respect CFLAGS passed on the command line. Also remember that you can debug any CGI script from the command line by passing it its parameters in QUERY_STRING, e.g.
$ QUERY_STRING='url=...&id=...' ./cgit
After verifying that your cgit installation works, don't forget to enable the view filters, as they make cgit significantly more pleasant to use. The commit links example filter can be used as is after simply replacing the URL in it with the correct URL of your bug tracker. In the syntax highlighting filter, uncomment the line using the highlight v3 syntax, as this is the current version of the program and the one used by most Linux systems by now.