Migration from SVN to Git

Recently I had the task of migrating a large SVN repository to Git, and I would like to share how we did it.

Step 1, size

There is a fundamental difference between SVN and Git. In SVN, a checkout downloads only one revision. In Git, a clone downloads the whole repository with its history. A bigger repository means longer clone times and, in general, a slower repository. Experts in this area advise keeping a repository under 1 GB in size, not counting LFS.

As a result, you need to do some homework and reduce the repository size in SVN. Try to find everything that is not needed anymore. There is no penalty for garbage in SVN if you are not using it, but in Git it becomes dead data that everyone has to download. If you are using a service like GitHub, it may incur extra cost. For example, GitHub charges for LFS traffic and storage separately. You can save a lot of money by doing this step.

Step 2, binary files

Git does not work well with binary files. If you have a .png file in the repository and then you change it and commit, you effectively have two .png files in history forever, and everyone has to download both.

The Git way of solving this problem is to use LFS. Many people mistake LFS for binary file storage. That is not true. It is storage for large files, any kind of large files. If you have a large file that does not change from version to version very often, it is a very good candidate for LFS.

So, in this step, you must go over your repository and find all your large files. Some files are easy to find, for example executables or images. Some files are just big. For example, in our repository we had a few huge .json files. Make sure that you need all these files. The fewer files, the better. Build a list of files and patterns to move. You will need it in Step 6.
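As a rough starting point for that list, here is a minimal sketch that prints every file above a size threshold in an SVN working copy. The function name list_large_files and the threshold are my own placeholders, not something we used in the migration, and it only looks at the checked-out revision, not at SVN history.

```shell
#!/bin/sh
# list_large_files DIR MIN_KIB
# Prints files under DIR that are larger than MIN_KIB kibibytes.
# The .svn metadata directory is skipped (hypothetical helper, adjust as needed).
list_large_files() {
    find "$1" -type d -name .svn -prune -o -type f -size +"$2"k -print | sort
}

# Example: everything above roughly 10 MB in the current working copy.
list_large_files . 10240
```

The output is one path per line, so it is easy to group by extension afterwards and decide which patterns should go to LFS.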

Step 3, preparation

We did the migration using the git svn fetch command. This command is very slow on Windows, so I recommend using Linux. The easiest way is to create a virtual machine with any cloud provider. I used Ubuntu Server 20.04 LTS for this purpose.

After you log in, install the necessary Git packages (on Ubuntu, git svn comes in a separate git-svn package):

sudo apt-get update
sudo apt-get install git
sudo apt-get install git-svn
sudo apt-get install git-lfs

Step 4, history

Decide how much history you need. I know many of you would like to keep all of it, but if your repository is quite old, importing the whole history may not be practical. You rarely need to go back more than a few years. Remember, a long history will be dead weight in 99.99% of cases. Your SVN repository will still be there, and in an emergency you can go back to it to check the details. Ideally, start the history after some major release and keep about 2-3 years of it. Of course, if your repository is not that big, you can import all history.

For my needs I created a virtual machine with 36 cores, and on that machine the git svn fetch command was able to import 5-10 revisions per second. This gives you an estimate of how long the import will take. In your case it could be different, because it depends on how big the changed files are, the speed of your disk, the speed of your SVN server and many other variables.

We didn’t import any branches, and I don’t know what to do in that case. I’m not sure whether Git is able to migrate the SVN merge properties, and without them branches are mostly useless.
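To turn an import rate into a time estimate, the arithmetic is simply revisions divided by revisions per second. A minimal sketch follows. The function name and the revision count of 90000 are hypothetical, and the rate is something you should measure in your own test run.

```shell
#!/bin/sh
# estimate_import_minutes REVISIONS REVS_PER_SECOND
# Integer estimate of the import time in minutes (hypothetical helper).
estimate_import_minutes() {
    echo $(( $1 / $2 / 60 ))
}

# A hypothetical repository with 90000 revisions:
estimate_import_minutes 90000 5     # prints 300 (about 5 hours)
estimate_import_minutes 90000 10    # prints 150 (about 2.5 hours)
```

Treat the result as a lower bound. The real rate can drop a lot on revisions that touch big files.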

Step 5, before you start import

Before you start the final import, you will need a few test runs. You need to know how long the import takes and plan accordingly. For example, if the import takes 10 hours, it is probably a good idea to do the final run on a weekend. If it takes 1 hour, you can run it almost any time.

A test run also lets you build the final list of commands to execute. I will provide the commands we used, but your case could be slightly different, and I cannot provide the start revision and ignore list, because they are specific to your case. So, keep a file with all the commands you executed, or even create a script that runs them for you.

Lastly, a test run lets you and your team see whether the size of the resulting repository is reasonable and how practical it is to work with. Perhaps you will need to repeat Step 1, Step 2 and Step 3.

Step 5, import

Now it is time to import. Execute the following commands:

mkdir import
cd import
git svn init svn://server_name/repo_name/trunk
git lfs install

As you can see, I used the svn protocol. For some reason, when I used the http and https protocols, our server returned an error on one of the revisions during import.

Now it is time for the actual import:

git svn fetch -r <rev_start>:HEAD --username=<user_name> --ignore-paths="<ignore_paths>"

Obviously, instead of <rev_start> you should put the revision to start the import from. To import the whole repository, remove -r and its value. <user_name> is the username used to log in to the SVN server. <ignore_paths> is a Perl regular expression of paths to ignore. For example, you can use this:

--ignore-paths="^xyz|^\.abc|.*\/obj\/"

to skip the xyz and .abc entries in the root of your repository and to skip all directories named obj. I suggest finding a regular expression tester on the internet and testing your expression before using it. There are many non-trivial details.

Normally this parameter contains files and directories that you don’t need, or that you removed from the SVN repository in Step 1 or even before. In the example above, xyz could be a directory that you don’t need, .abc could be a file you want to ignore, and obj could be directories you just removed from SVN. The last one is very important: even though you removed the obj directories in SVN, they are still stored in the SVN history, and without the ignore pattern they would end up in the Git history as well.

The git svn fetch command is restartable. For example, if the connection to your SVN server drops, you can restart the command without the -r parameter (and its value). I did some testing and everything looked OK, but in our case I decided to restart the whole process, just in case.

At the end you will have a local Git repository created from SVN. Before you continue, I strongly recommend making a copy of this repository. That way, if you mess up the further steps, you still have the copy and don’t need to repeat the whole import.
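The commands from this step, including the backup copy, can be collected into a small script so that every test run executes exactly the same thing. This is only a sketch: the revision number, the backup directory name import.backup and the DRY_RUN switch are my assumptions, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of an import script. Every value below is a placeholder.
SVN_URL="svn://server_name/repo_name/trunk"
REV_START="1000"                        # hypothetical first revision to import
SVN_USER="user_name"
IGNORE_PATHS='^xyz|^\.abc|.*\/obj\/'
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = run them

run() {
    # Print the command, then execute it unless this is a dry run.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

import_repo() {
    run mkdir import
    run cd import
    run git svn init "$SVN_URL"
    run git lfs install
    run git svn fetch -r "$REV_START:HEAD" --username="$SVN_USER" --ignore-paths="$IGNORE_PATHS"
    run cd ..
    # Keep an untouched copy before rewriting history in later steps.
    run cp -a import import.backup
}

import_repo
```

Run it once as is to review the printed commands, then run it with DRY_RUN=0 to do the real import. Note that the printed lines do not show shell quoting, so do not copy and paste them back into a terminal.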

Step 6, migrate binary files to LFS

The next step is to rewrite history and move the large files into LFS. To do this, execute the following command:

git lfs migrate import --include="*.lib,*.dll,*.nupkg"

Obviously, in your case the include list will be different. You can specify file names or masks. There is one caveat: they are case-sensitive. As a result, if you have a.lib and b.Lib, that mask will only include a.lib. To fix this, you may need a command line like this:

git lfs migrate import --include="*.[Ll][Ii][Bb]"

I wrote a small app that converts the list from the first variant into the second one, just to be sure.

After that, the migration is complete, and you can push the repository to your Git server:
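The conversion itself is simple. This is not the app I used, just a minimal sketch of the same idea: every letter in the include list is replaced by a bracket pair with its upper-case and lower-case forms. The function name case_insensitive_glob is made up.

```shell
#!/bin/sh
# case_insensitive_glob PATTERNS
# Turns "*.lib,*.dll" into "*.[Ll][Ii][Bb],*.[Dd][Ll][Ll]".
# Only ASCII letters are expanded; everything else is copied as is.
case_insensitive_glob() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/)
                out = out "[" toupper(c) tolower(c) "]"
            else
                out = out c
        }
        print out
    }'
}

case_insensitive_glob "*.lib,*.dll,*.nupkg"
# prints *.[Ll][Ii][Bb],*.[Dd][Ll][Ll],*.[Nn][Uu][Pp][Kk][Gg]
```

You can then pass the result to --include, for example --include="$(case_insensitive_glob '*.lib,*.dll,*.nupkg')".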

git remote add origin https://my_git_server/repo.git
git branch -M master
git push -u origin master

I suggest pushing to a test repository first and double-checking everything: that there are no unwanted files in the repository or its history, that files that are supposed to be in LFS are in LFS, and so on. After that, ask a few members of your team to play with it. For example, make some changes, commit and push them, then pull these changes on another computer. Check the clone speed, the size on disk, etc. We found that many Git tools that our team used for previous projects are simply not able to handle large repositories, and we had to change them.

If you are not happy with the result, repeat from Step 1.
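One way to check the history for unwanted big files is to list the largest blobs Git stores. Here is a minimal sketch built on git rev-list and git cat-file. The function name largest_blobs is made up, and this is a generic recipe rather than the exact check we ran.

```shell
#!/bin/sh
# largest_blobs REPO COUNT
# Prints the COUNT largest blobs in the whole history of REPO
# as "size-in-bytes blob path", biggest first (hypothetical helper).
largest_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
        awk '$2 == "blob"' |
        sort -rn |
        head -n "$2"
}

# Example: the 20 largest blobs of the repository in the current directory.
# largest_blobs . 20
```

Files that were migrated to LFS should show up here only as tiny pointer files. For the LFS side, git lfs ls-files lists the files that are tracked by LFS in the current checkout.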

Some statistics for our case

First, we checked out our SVN repository and added it to Git without any import or history. We added only files over 100 MB to LFS, because the system simply rejected them otherwise. That repository was 20 GB after clone.

After that, we did the migration with history and basic LFS, and we got a 32 GB Git repository after clone. We migrated about 30 000 revisions, and it took around 9 hours. A checkout from the SVN repository was 36 GB on disk, so the Git repository was a little bit smaller.

Then we spent a few days actively removing unused large files and moving many other files into NuGet packages on our private NuGet server. After that, the cloned repository became less than 12 GB. Without the working files and the lfs directory, the repository is less than 800 MB. We decided that it was good enough and stopped there.

I hope this helps someone.