AWS DevOps & Developer Productivity Blog
Migrating Subversion repositories to AWS CodeCommit
In this post, we walk you through migrating Subversion (SVN) repositories to AWS CodeCommit. Before diving into the migration, we briefly review SVN and Git-based systems such as CodeCommit.
About SVN
SVN is an open-source version control system. Created in 2000 by CollabNet, Inc., it was originally designed to be a better Concurrent Versions System (CVS), and is now developed as a project of the Apache Software Foundation. SVN is the third in a line of revision control systems: Revision Control System (RCS), then CVS, and finally SVN.
SVN is the leader in centralized version control. Systems such as CVS and SVN have a single remote server of versioned data with individual users operating locally against copies of that data’s version history. Developers commit their changes directly to that central server repository.
All the files and commit history are stored on a central server, which makes that server a single point of failure. SVN offers few offline features: a developer has to connect to the SVN server to make a commit, which makes commits slower. The single point of failure, security, maintenance, and scaling of the SVN infrastructure are major concerns for any organization.
About DVCS
Distributed Version Control Systems (DVCSs) address the concerns and challenges of SVN. In a DVCS (such as Git or Mercurial), you don’t just check out the latest snapshot of the files; rather, you fully mirror the repository, including its full history. If any server dies, and these systems are collaborating via that server, you can copy any of the client repositories back up to the server to restore it. Every clone is a full backup of all the data.
DVCSs such as Git are built with speed, non-linear development, simplicity, and efficiency in mind. Git works very efficiently with large projects, which is one of the biggest reasons for its popularity with customers.
A significant reason to migrate to Git is branching and merging. Creating a branch is very lightweight, which allows you to work faster and merge easily.
About CodeCommit
CodeCommit is a version control system that is fully managed by AWS. CodeCommit can host secure and highly scalable private Git repositories, which eliminates the need to operate your source control system and scale its infrastructure. You can use it to securely store anything, from source code to binaries. CodeCommit features like collaboration, encryption, and easy access control make it a great choice. It works seamlessly with most existing Git tools and provides free private repositories.
Understanding the repository structure of SVN and Git
SVN has a tree model with one branch where the revisions are stored, whereas Git uses a graph structure in which each commit is a node that knows its parent. When comparing the two, consider the following features:
- Trunk – An SVN trunk is like a primary branch in a Git repository, and contains tested and stable code.
- Branches – In SVN, branches are treated as separate entities, each with its own history. You can merge revisions between branches, but they remain different entities. Because of SVN's centralized nature, all branches are remote. In Git, branches are very cheap; a branch is a pointer to a particular commit on the tree. It can be local or be pushed to a remote repository for collaboration.
- Tags – A tag is just another folder in the main repository in SVN and remains static. In Git, a tag is a static pointer to a specific commit.
- Commits – To commit in SVN, you need access to the main repository, and each commit creates a new revision in the remote repository. In Git, the commit happens locally, so you don't need access to the remote. You can commit the work locally and then push all the commits at one time.
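The commit and branch behavior described in this list is easy to see in a scratch repository. The following is a minimal sketch, assuming Git is installed; the directory, the identity, and the branch name feature-x are placeholders chosen for illustration. It makes two commits with no remote configured, then creates a branch, which is only a new ref pointing at the current commit.

```shell
#!/bin/sh
set -eu
# Scratch repository; nothing here touches a remote.
demo_dir=$(mktemp -d)
git init -q "$demo_dir"

# Placeholder identity so the commits succeed on a machine without Git config.
commit() {
  git -C "$demo_dir" -c user.name="Example User" -c user.email="user@example.com" \
    commit -q --allow-empty -m "$1"
}

commit "first local commit"
commit "second local commit"

# Both commits exist locally even though no remote is configured.
commit_count=$(git -C "$demo_dir" rev-list --count HEAD)
remote_count=$(git -C "$demo_dir" remote | wc -l | tr -d ' ')

# Creating a branch only adds a ref that points at the current commit.
git -C "$demo_dir" branch feature-x
head_commit=$(git -C "$demo_dir" rev-parse HEAD)
branch_commit=$(git -C "$demo_dir" rev-parse feature-x)

echo "commits=$commit_count remotes=$remote_count"
```

In SVN, each of those commits would have required a round trip to the central server.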
So far, we have covered how SVN is different from Git-based version control systems and illustrated the layout of SVN repositories. Now it’s time to look at how to migrate SVN repositories to CodeCommit.
Planning for migration
Planning is always a good thing. Before starting your migration, consider the following:
- Identify SVN branches to migrate.
- Come up with a branching strategy for CodeCommit and document how you can map SVN branches.
- Prepare build, test scripts, and test cases for system testing.
If the SVN repository is large, consider running all migration commands on the SVN server itself. This saves time because it eliminates network bottlenecks.
Migrating the SVN repository to CodeCommit
When you’re done with the planning aspects, it’s time to start migrating your code.
Prerequisites
You must have an active AWS account, and both the AWS Command Line Interface (AWS CLI) and Git must be installed on the machine that you're planning to use for the migration.
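Before starting, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch; the helper name missing_tools is our own, and including svn in the list is an assumption based on the svn log command used later in this post.

```shell
#!/bin/sh
# Print the names of any tools from the argument list that are not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools git svn aws)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  # git svn ships separately on some platforms, so check the subcommand too.
  git svn --version >/dev/null 2>&1 || echo "git is installed, but git svn is unavailable"
fi
```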
Listing all SVN users for an SVN repository
SVN uses a user name for each commit, whereas Git stores the real name and email address. In this step, we map SVN users to their corresponding Git names and email.
To list all the SVN users, run the following PowerShell command from the root of your local SVN checkout:
svn.exe log --quiet | ? { $_ -notlike '-*' } | % { "{0} = {0} <{0}>" -f ($_ -split ' \| ')[1] } | Select-Object -Unique | Out-File 'authors-transform.txt'
On a Linux-based machine, run the following command from the root of your local SVN checkout:
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
The authors-transform.txt file content looks like the following code:
ikhan = ikhan <ikhan>
fbar = fbar <fbar>
abob = abob <abob>
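To see how the Linux command arrives at this format, you can run the same awk program over canned input. The following sketch uses a hypothetical svn log -q excerpt (the revisions and dates are made up) instead of a real repository.

```shell
#!/bin/sh
# Hypothetical `svn log -q` output for three revisions by two users.
sample_log='------------------------------------------------------------------------
r3 | ikhan | 2020-05-01 10:12:13 +0000 (Fri, 01 May 2020)
------------------------------------------------------------------------
r2 | abob | 2020-04-30 09:00:00 +0000 (Thu, 30 Apr 2020)
------------------------------------------------------------------------
r1 | ikhan | 2020-04-29 08:30:00 +0000 (Wed, 29 Apr 2020)
------------------------------------------------------------------------'

# Same awk program as the Linux command: keep revision lines, take the second
# "|"-separated field, trim one space on each side, and print the mapping stub.
authors=$(printf '%s\n' "$sample_log" |
  awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' |
  sort -u)

printf '%s\n' "$authors"
```

The separator lines are skipped because they don't start with r, and sort -u collapses the two ikhan revisions into one entry.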
After you transform the SVN user to a Git user, it should look like the following code:
ikhan = ifti khan <ikhan@abc.com>
fbar = foo bar <fbar@abc.com>
abob = aaron bob <abob@abc.com>
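Because the file is edited by hand, it's easy to leave an entry untouched. The following is a minimal sketch of a sanity check; the function name invalid_author_lines and the pattern are our own, and the pattern is a loose format check rather than a real email validator.

```shell
#!/bin/sh
# Print lines that do not look like "svnuser = Full Name <user@domain>".
invalid_author_lines() {
  grep -v -E '^[^ =]+ = [^<>]+ <[^<>@ ]+@[^<>@ ]+>$' "$1" || true
}

# Example run against a throwaway file; the second entry was never filled in.
authors_file=$(mktemp)
cat > "$authors_file" <<'EOF'
ikhan = ifti khan <ikhan@abc.com>
fbar = fbar <fbar>
abob = aaron bob <abob@abc.com>
EOF

bad=$(invalid_author_lines "$authors_file")
if [ -n "$bad" ]; then
  echo "Entries still needing a real name and email:"
  echo "$bad"
fi
```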
Importing SVN contents to a Git repository
The next step in the migration from SVN to Git is to import the contents of the SVN repository into a new Git repository. We do this with the git svn utility, which is included with most Git distributions. The conversion process can take a significant amount of time for larger repositories.
The git svn clone command transforms the trunk, branches, and tags in your SVN repository into a new Git repository. The exact command depends on the layout of the SVN repository.
git svn clone may not be available in all installations; you might consider using an AWS Cloud9 environment or using a temporary Amazon Elastic Compute Cloud (Amazon EC2) instance.
If your SVN layout is standard, use the following command:
git svn clone --stdlayout --authors-file=authors-transform.txt <svn-repo>/<project> <temp-dir/project>
If your SVN layout isn’t standard, you need to map the trunk, branches, and tags folder in the command as parameters:
git svn clone <svn-repo>/<project> --prefix=svn/ --no-metadata --trunk=<trunk-dir> --branches=<branches-dir> --tags=<tags-dir> --authors-file "authors-transform.txt" <temp-dir/project>
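If you migrate several projects with different layouts, a small wrapper can help keep the two forms straight. The following sketch only prints the command instead of running it; the function name clone_command, the URL, and the directory names are hypothetical, and it assumes none of the arguments contain spaces.

```shell
#!/bin/sh
# Print (not run) a git svn clone command for a standard or custom layout.
# Usage: clone_command <svn-url> <target-dir> [trunk-dir branches-dir tags-dir]
clone_command() {
  url=$1
  target=$2
  if [ "$#" -eq 2 ]; then
    layout="--stdlayout"
  else
    layout="--trunk=$3 --branches=$4 --tags=$5"
  fi
  echo "git svn clone $layout --authors-file=authors-transform.txt $url $target"
}

# Hypothetical URL and directory names, for illustration only.
clone_command https://svn.example.com/repos/project /tmp/project
clone_command https://svn.example.com/repos/project /tmp/project main releases labels
```

Printing the command first gives you a chance to review the layout mapping before starting a clone that might run for hours.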
Creating a bare Git repository and pushing the local repository
In this step, we create a bare repository and match its default branch with the SVN trunk name.
To create the .gitignore file, enter the following code:
cd <temp-dir/project>
git svn show-ignore > .gitignore
git add .gitignore
git commit -m 'Adding .gitignore.'
To create the bare Git repository, enter the following code:
git init --bare <git-project-dir>\local-bare.git
cd <git-project-dir>\local-bare.git
git symbolic-ref HEAD refs/heads/trunk
To update the local bare Git repository, enter the following code:
cd <temp-dir/project>
git remote add bare <git-project-dir\local-bare.git>
git config remote.bare.push 'refs/remotes/*:refs/heads/*'
git push bare
You can also add tags:
cd <git-project-dir\local-bare.git>
For Windows, enter the following code:
git for-each-ref --format='%(refname)' refs/heads/tags | % { $_.Replace('refs/heads/tags/','') } | % { git tag $_ "refs/heads/tags/$_"; git branch -D "tags/$_" }
For Linux, enter the following code:
for t in $(git for-each-ref --format='%(refname:short)' refs/heads/tags); do git tag ${t/tags\//} refs/heads/$t && git branch -D $t; done
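To try the tag conversion safely before running it on the migrated repository, you can reproduce it in a scratch repository. The following sketch assumes, as the Windows command does, that SVN tags arrived as branches under refs/heads/tags; the repository, identity, and tag name v1.0 are placeholders.

```shell
#!/bin/sh
set -eu
# Scratch repository standing in for the bare repository after the push:
# one real branch plus one SVN tag that arrived as a branch named tags/v1.0.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name="Example User" -c user.email="user@example.com" \
  commit -q --allow-empty -m "imported history"
git -C "$repo" branch tags/v1.0

# For each branch under refs/heads/tags, create a tag with the short name,
# then delete the branch so that only the tag remains.
git -C "$repo" for-each-ref --format='%(refname)' refs/heads/tags |
while read -r ref; do
  tag=${ref#refs/heads/tags/}
  git -C "$repo" tag "$tag" "$ref"
  git -C "$repo" branch -q -D "${ref#refs/heads/}"
done

tags=$(git -C "$repo" tag -l)
leftover=$(git -C "$repo" for-each-ref refs/heads/tags)
echo "tags: $tags"
```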
You can also add branches:
cd <git-project-dir\local-bare.git>
For Windows, enter the following code:
git for-each-ref --format='%(refname)' refs/remotes | % { $_.Replace('refs/remotes/','') } | % { git branch "$_" "refs/remotes/$_"; git branch -r -d "$_"; }
For Linux, enter the following code:
for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done
As a final touch-up, enter the following code:
cd <git-project-dir\local-bare.git>
git branch -m trunk master
Creating a CodeCommit repository
You can now create a CodeCommit repository with the following code (make sure that the AWS CLI is configured with your preferred Region and credentials):
aws configure
aws codecommit create-repository --repository-name MySVNRepo --repository-description "SVN Migration repository" --tags Team=Migration
You get the following output:
{
"repositoryMetadata": {
"repositoryName": "MySVNRepo",
"cloneUrlSsh": "ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MySVNRepo",
"lastModifiedDate": 1446071622.494,
"repositoryDescription": "SVN Migration repository",
"cloneUrlHttp": "https://git-codecommit.us-east-2.amazonaws.com/v1/repos/MySVNRepo",
"creationDate": 1446071622.494,
"repositoryId": "f7579e13-b83e-4027-aaef-650c0EXAMPLE",
"Arn": "arn:aws:codecommit:us-east-2:111111111111:MySVNRepo",
"accountId": "111111111111"
}
}
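The cloneUrlHttp value in this output is what you need for the push step. The following is a minimal sketch for pulling it out of saved output in a script; the function name clone_url_http is our own, and the sed expression is only adequate for this flat output, not a general JSON parser.

```shell
#!/bin/sh
# Print the cloneUrlHttp value from create-repository JSON read on stdin.
clone_url_http() {
  sed -n 's/.*"cloneUrlHttp": *"\([^"]*\)".*/\1/p'
}

# Trimmed copy of the sample output.
sample_output='{
  "repositoryMetadata": {
    "repositoryName": "MySVNRepo",
    "cloneUrlHttp": "https://git-codecommit.us-east-2.amazonaws.com/v1/repos/MySVNRepo"
  }
}'

clone_url=$(printf '%s\n' "$sample_output" | clone_url_http)
echo "$clone_url"
```

Alternatively, the AWS CLI's global --query 'repositoryMetadata.cloneUrlHttp' and --output text options return the field directly when you run the command.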
Pushing the code to CodeCommit
To push your code to the new CodeCommit repository, enter the following code:
cd <git-project-dir\local-bare.git>
git remote add origin https://git-codecommit.us-east-2.amazonaws.com/v1/repos/MySVNRepo
git push origin --all
git push origin --tags
(Optional if tags are mapped)
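After pushing, it's worth confirming that every branch and tag arrived. The following sketch shows one way to compare refs, using two local repositories as stand-ins: source plays the migrated repository and remote.git plays the CodeCommit repository. All paths, the identity, and the tag name are placeholders.

```shell
#!/bin/sh
set -eu
# Local stand-ins for the migrated repository and the CodeCommit remote.
work=$(mktemp -d)
git init -q "$work/source"
git init -q --bare "$work/remote.git"
git -C "$work/source" -c user.name="Example User" -c user.email="user@example.com" \
  commit -q --allow-empty -m "imported history"
git -C "$work/source" tag v1.0

git -C "$work/source" remote add origin "$work/remote.git"
git -C "$work/source" push -q origin --all
git -C "$work/source" push -q origin --tags

# List "<commit> <ref>" pairs for branches and tags on each side and compare.
local_refs=$(git -C "$work/source" for-each-ref \
  --format='%(objectname) %(refname)' refs/heads refs/tags)
remote_refs=$(git -C "$work/remote.git" for-each-ref \
  --format='%(objectname) %(refname)' refs/heads refs/tags)

if [ "$local_refs" = "$remote_refs" ]; then
  echo "All branches and tags were pushed."
else
  echo "Ref mismatch between local and remote." >&2
fi
```

Against a real CodeCommit remote you can't run for-each-ref on the server side; git ls-remote --heads --tags origin lists the remote refs instead, although its output is tab-separated and needs reformatting before a direct comparison.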
Troubleshooting
When migrating SVN repositories, you might encounter a few SVN errors, which are displayed as error codes on the console. For more information, see Subversion client errors caused by inappropriate repository URL.
For more information about the git-svn utility, see the git-svn documentation.
Conclusion
In this post, we described the straightforward process of using the git-svn utility to migrate SVN repositories to Git or Git-based systems like CodeCommit. After you migrate an SVN repository to CodeCommit, you can use any Git-based client and start using CodeCommit as your primary version control system without worrying about securing and scaling its infrastructure.