AWS DevOps Blog

Replicating and Automating Sync-Ups for a Repository with AWS CodeCommit

by Chenwei (Cherry) Zhou, Software Development Engineer


 

Many of our customers have expressed interest in the following scenarios:

  • Backing up or replicating an AWS CodeCommit repository to another AWS region.
  • Automatically backing up repositories currently hosted on other services (for example, GitHub or Bitbucket) to AWS CodeCommit.

In this blog post, we’ll show you how to automate the replication of a source repository to a repository in AWS CodeCommit. Your source repository could be another AWS CodeCommit repository, a local repository, or a repository hosted on other Git services.

To replicate your repository, you’ll first need to set up a repository in AWS CodeCommit to use as your backup/replica repository. After replicating the contents in your source repository to the backup repository, we’ll demonstrate how you can set up a scheduled job to periodically sync up your source repository with the backup/replica.

Where do I host this?

You can host your local repository and schedule your task on your own machine or on an Amazon EC2 instance. For an example of how to set up an EC2 instance for access to an AWS CodeCommit repository, including a sample AWS CloudFormation template for launching the instance, see Launch an Amazon EC2 Instance to Access the AWS CodeCommit Repository in the AWS for DevOps Guide.

 

Part 1: Set Up a Replica Repository

In this section, we’ll create an AWS CodeCommit repository and replicate your source repository to it.

  1. If you haven’t already done so, set up for AWS CodeCommit. Then follow the steps to create a CodeCommit repository in the region of your choice. Choose a name that will help you remember that this repository is a replica or backup repository. For example, you could create a repository in the US East (Ohio) region and name it MyReplicaRepo. This is the name and region we’ll use in this post.
  2. Use the git clone --mirror command to clone the source repository to your local computer, specifying the directory where you want to create the local repo. Because of --mirror, the clone is a bare repository (it has no working tree) that copies every ref in the source. You are not cloning the repository you just created in AWS CodeCommit; you are cloning the repository you want to replicate or back up to that AWS CodeCommit repository. For example, to clone a sample application created for AWS demonstration purposes and hosted on GitHub (https://github.com/awslabs/aws-demo-php-simple-app.git) to a local repo in a directory named my-repo-replica:
git clone --mirror https://github.com/awslabs/aws-demo-php-simple-app.git my-repo-replica

IMPORTANT

  • DO NOT use your working directory as the local clone repository. Your work-in-progress commits would also be pushed for backup.
  • DO NOT make local changes to this local repository. It should be used for sync-up operations only.
  • DO NOT manually push any changes to this replica repository. Because git push --mirror force-updates the replica, anything you push there directly will be overwritten or deleted the next time your scheduled job mirrors the source repository. Treat it as a read-only repository, and push all of your development changes to your source repository.
  3. Change directories to the directory where you made the clone:
cd my-repo-replica
  4. Use the git remote add RemoteName RemoteRepositoryURL command to add the AWS CodeCommit repository you created as a remote repository for the local repo. Use an appropriate nickname, such as sync. (The clone already uses the default nickname, origin, for your source repository.) For example, to add your AWS CodeCommit repository MyReplicaRepo as a remote for my-repo-replica with the nickname sync:
git remote add sync ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyReplicaRepo

When you push large repositories, consider using SSH instead of HTTPS. Long-running HTTPS connections, such as those used to push a large change, a large number of changes, or a large repository, are often terminated prematurely because of networking issues or firewall settings. For more information about setting up AWS CodeCommit for SSH, see For SSH Connections on Linux, macOS, or Unix or For SSH Connections on Windows.

Tip

Use the git remote show command to review the list of remotes set for your local repo, or git remote -v to see each remote’s URL as well.

  5. Run the git push sync --mirror command to push to your replica repository.
  • If you named your remote for the replica repository something else, replace sync with your remote name.
  • The --mirror option specifies that all refs under refs/ (including, but not limited to, refs/heads/, refs/remotes/, and refs/tags/) are mirrored to the remote repository. New refs are created, updated refs are force-updated, and refs deleted locally are removed from the remote. If you only need to push branches and don’t need other references such as tags, you can use the --all option instead.
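The steps above can be sketched as a single shell function. This is a minimal sketch, not an official AWS script. The function name setup_replica and its parameters are hypothetical. It assumes Git 1.8.5 or later (for git -C) and that your SSH or HTTPS access to AWS CodeCommit is already configured.

```shell
#!/bin/sh
# Sketch of Part 1: mirror-clone a source repository and push it to a replica.
# setup_replica and its arguments are hypothetical; replace them with your values.
set -eu

setup_replica() {
    source_url=$1            # repository to back up, e.g. a GitHub or CodeCommit URL
    replica_url=$2           # empty CodeCommit repository from step 1
    local_dir=$3             # sync-only directory, never your working directory
    remote_name=${4:-sync}   # nickname for the replica; "origin" is already taken

    # Step 2: bare mirror clone that copies every ref in the source.
    git clone --mirror "$source_url" "$local_dir"

    # Step 4: register the CodeCommit repository as a second remote.
    git -C "$local_dir" remote add "$remote_name" "$replica_url"

    # Tip: list the remotes and their URLs to confirm the setup.
    git -C "$local_dir" remote -v

    # Step 5: mirror all refs under refs/ to the replica.
    git -C "$local_dir" push "$remote_name" --mirror
}
```

For the example in this post, you might call it as setup_replica https://github.com/awslabs/aws-demo-php-simple-app.git ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyReplicaRepo my-repo-replica sync.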

 

Your replica repository is now ready for sync-up operations. To do a manual sync, run git fetch -p origin to fetch from your original repository, and then run git push sync --mirror to push to the replica repository. A mirror clone is a bare repository with no working tree, so git pull doesn’t work in it. Again, do not push any local changes to your replica repository at any time.
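The following self-contained demo shows what a manual sync actually propagates. It uses throwaway local bare repositories as hypothetical stand-ins for the source repository and the AWS CodeCommit replica, so it needs no network access or AWS account. It uses git fetch -p origin rather than git pull, because a mirror clone is bare and git pull requires a working tree. The demo shows that a branch deleted in the source also disappears from the replica after the next sync.

```shell
#!/bin/sh
# Demo: what one manual sync (fetch, then push --mirror) propagates.
# Everything lives in a throwaway temp directory; source.git and replica.git are
# hypothetical local stand-ins for your source and CodeCommit repositories.
set -eu

demo=$(mktemp -d)
commit() {
    # Empty commits keep the demo small; the identity is a placeholder.
    git -C "$demo/work" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

# Source repository with two branches, main and feature.
git init -q --bare "$demo/source.git"
git init -q "$demo/work"
commit "first"
git -C "$demo/work" push -q "$demo/source.git" HEAD:refs/heads/main HEAD:refs/heads/feature

# Part 1 setup: mirror clone, "sync" remote, initial push --mirror.
git init -q --bare "$demo/replica.git"
git clone -q --mirror "$demo/source.git" "$demo/mirror"
git -C "$demo/mirror" remote add sync "$demo/replica.git"
git -C "$demo/mirror" push -q sync --mirror

# Upstream changes: a new commit on main, and the feature branch is deleted.
commit "second"
git -C "$demo/work" push -q "$demo/source.git" HEAD:refs/heads/main :refs/heads/feature

# Manual sync. The mirror is bare, so fetch (with -p to prune deleted refs)
# instead of git pull, then mirror every ref, including the deletion.
git -C "$demo/mirror" fetch -q -p origin
git -C "$demo/mirror" push -q sync --mirror

# Branches left in the replica: only main, at the source's latest commit.
git -C "$demo/replica.git" for-each-ref --format='%(refname) %(objectname:short)' refs/heads
```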

 

Part 2: Create a Periodic Sync Job

You can use a number of tools to set up an automated sync job. In this section, we’ll briefly cover four common tools: a cron job (Linux), a task in Windows Task Scheduler (Windows), a launchd job (macOS), and, for those users who already have a Jenkins server set up, a Freestyle project with build triggers. Feel free to use whatever tools work best for you.

Note

Some hosted repositories offer options for syncing repositories, such as Git hooks, notifications, and other triggers. To learn more about those options, consult the documentation for your source repository system.

 

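All of the following approaches rely on commands that fetch the latest changes from the source repository into your local clone repo, and then mirror those changes to your AWS CodeCommit repository. Because the local clone is a bare mirror, use git fetch -p origin rather than git pull, which requires a working tree. The -p option prunes refs that were deleted in the source. The commands can be summed up as follows:

cd /path/to/your/local/repo
git fetch -p origin
git push sync --mirror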

Where and how you save and schedule these commands depends on your operating system and tool(s). We’ve included just a few options/examples from a variety of approaches.
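You might prefer to keep these commands in one script that any of the schedulers below can call. The following is a minimal sketch with some assumptions. The file name sync-replica.sh and the function sync_replica are hypothetical. The local repo is assumed to be a bare mirror clone, so the script fetches with git fetch -p origin instead of git pull, which needs a working tree. The replica remote uses the nickname sync unless you pass another one. The bare-repository check is an extra safeguard for the IMPORTANT notes in Part 1.

```shell
#!/bin/sh
# sync-replica.sh (hypothetical name): one sync pass for a mirror clone.
# Usage: sync-replica.sh /path/to/your/local/repo [remote-name]
# Assumes the repo was created with "git clone --mirror" (so it is bare) and
# that the replica remote was added as in Part 1 (default nickname: sync).
set -eu

sync_replica() {
    repo=$1
    remote=${2:-sync}

    # Refuse to run against anything that is not a bare mirror clone, so a
    # working directory with work-in-progress commits is never pushed.
    if [ "$(git -C "$repo" rev-parse --is-bare-repository)" != "true" ]; then
        echo "error: $repo is not a bare mirror clone" >&2
        return 1
    fi

    # Fetch everything from the source; -p removes refs deleted upstream.
    git -C "$repo" fetch -p origin

    # Mirror all refs (including deletions) to the replica repository.
    git -C "$repo" push "$remote" --mirror
}

# Run only when a repository path is given, e.g. from cron or launchd.
if [ "$#" -ge 1 ]; then
    sync_replica "$@"
fi
```

For example, sh sync-replica.sh ~/tmp/my-repo-replica sync runs one sync pass. Because of set -e, the script stops at the first failing Git command and exits with a nonzero status.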

 

In Linux:

  1. At the terminal, run the crontab -e command to edit your crontab file in your default editor.
  2. Add a line for a new cron job that will change directories to your local clone repo, fetch from your source repository, and mirror any changes to your AWS CodeCommit repository on the schedule you specify. For example, to run a daily job at 2:45 A.M. for a local repo named my-repo-replica in the ~/tmp directory where you nicknamed your remote (the AWS CodeCommit repository) sync, your new line might look like this:
45 2 * * * cd ~/tmp/my-repo-replica && git fetch -p origin && git push sync --mirror
  3. Save the crontab file and exit your editor.
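If you’d rather not edit the crontab interactively, a sketch like the following builds the same entry and appends it to your existing crontab. The function name cron_line and the log file are hypothetical. The entry groups the commands in a subshell so that the output of all of them, not just the last, is appended to the log. It uses git fetch -p origin because the mirror clone is bare and has no working tree for git pull. Keep in mind that cron runs jobs with a minimal environment, so your CodeCommit credentials (for example, an SSH key) must work without an interactive prompt.

```shell
#!/bin/sh
# Sketch: print the step 2 cron entry so it can be appended non-interactively.
# cron_line, its arguments, and the log path are hypothetical examples.
set -eu

cron_line() {
    repo_dir=$1               # absolute path to the mirror clone (no double quotes)
    remote=${2:-sync}         # nickname of the CodeCommit remote
    log_file=${3:-/dev/null}  # where to append the output of each run
    # Daily at 2:45 A.M. The subshell sends output from every command to the log.
    printf '45 2 * * * (cd "%s" && git fetch -p origin && git push %s --mirror) >> %s 2>&1\n' \
        "$repo_dir" "$remote" "$log_file"
}

# Example: append the entry to your current crontab (crontab -l fails harmlessly
# if you don't have one yet):
#   ( crontab -l 2>/dev/null; cron_line "$HOME/tmp/my-repo-replica" sync "$HOME/tmp/sync.log" ) | crontab -
```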

 

In Windows:

  1. Create a batch file that contains the commands to change directories to your local clone repo, fetch from your source repository, and mirror any changes up to your AWS CodeCommit repository. For example, if you created your local repo my-repo-replica in a c:\temp directory, and you nicknamed your remote (the AWS CodeCommit repository) sync, your file might look like this:
cd /d c:\temp\my-repo-replica
git fetch -p origin
git push sync --mirror
  2. Save the batch file with a name like my-repo-backup.bat.
  3. Open Task Scheduler. (Not sure how? The simplest way is to open a command prompt and run taskschd.msc.)
  4. In the Actions pane, choose Create Basic Task, and then follow the steps in the wizard to run your batch file on the schedule you want.

 

In macOS:

  1. Create a shell script that contains the commands to change directories to your local clone repo, fetch from your source repository, and mirror any changes up to your AWS CodeCommit repository. For example, if you created your local repo my-repo-replica in a ~/Documents directory, and you nicknamed your remote (the AWS CodeCommit repository) sync, your file might look like this:
#!/bin/sh
cd ~/Documents/my-repo-replica
git fetch -p origin
git push sync --mirror
  2. Save the shell script with a name like my-repo-backup.sh, and make it executable, for example with chmod +x ~/Documents/my-repo-backup.sh.
  3. Create a launchd property list file that runs the shell script on the schedule you specify. launchd does not expand ~ in ProgramArguments, so give the full path to the script. For example, if you stored my-repo-backup.sh in /Users/your-user-name/Documents (replace your-user-name with your macOS user name), to run the script daily at 2:45 A.M., your plist file might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.codecommit.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/your-user-name/Documents/my-repo-backup.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>45</integer>
        <key>Hour</key>
        <integer>2</integer>
    </dict>
</dict>
</plist>
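  4. Save your plist file in the ~/Library/LaunchAgents folder (a per-user agent that runs while you are logged in), /Library/LaunchAgents (an agent for every user who logs in), or /Library/LaunchDaemons (a system-wide daemon that runs without a logged-in user), depending on how you want the job to run.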
  5. Run the launchctl command to load your job. For example, if you want to load a plist file named codecommit.sync.plist in ~/Library/LaunchAgents, your command might look like this:
launchctl load ~/Library/LaunchAgents/codecommit.sync.plist
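Because launchd does not expand ~ in ProgramArguments, it can be safer to generate the plist with the shell so that $HOME is expanded to an absolute path. The sketch below is an example built on assumptions, not an Apple-provided tool. write_plist is a hypothetical helper, and the label com.example.codecommit.backup is the example from this post. Running the script through /bin/sh avoids depending on its execute bit. The paths must not contain XML special characters such as & or <.

```shell
#!/bin/sh
# Sketch: write the launchd job from step 3 with absolute paths.
# write_plist is hypothetical; the label and schedule match the example above.
set -eu

write_plist() {
    script_path=$1   # absolute path to my-repo-backup.sh
    plist_path=$2    # e.g. $HOME/Library/LaunchAgents/com.example.codecommit.backup.plist
    # Unquoted EOF so that $script_path is expanded into the file.
    cat > "$plist_path" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.codecommit.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/sh</string>
        <string>$script_path</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>45</integer>
        <key>Hour</key>
        <integer>2</integer>
    </dict>
</dict>
</plist>
EOF
}

# Example on macOS (paths are placeholders):
#   write_plist "$HOME/Documents/my-repo-backup.sh" \
#       "$HOME/Library/LaunchAgents/com.example.codecommit.backup.plist"
#   plutil -lint "$HOME/Library/LaunchAgents/com.example.codecommit.backup.plist"
#   launchctl load "$HOME/Library/LaunchAgents/com.example.codecommit.backup.plist"
```

plutil -lint checks the file’s syntax before you load it with launchctl.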

 

For Jenkins:

  1. Open Jenkins.
  2. Create a new job as a Freestyle project.

[Screenshot: creating a new Freestyle project in Jenkins]

  3. In the Build Triggers section, select Build periodically, and set up a schedule for the task. Jenkins uses cron expressions to run periodic tasks. For more information, see the Jenkins documentation for the syntax of cron.

If you are replicating a GitHub or Bitbucket repository, you can also configure the job to build when a webhook is triggered.

The following example builds once a day between midnight and 1 A.M.

[Screenshot: Build Triggers section with a Build periodically schedule]

  4. In the Build section, add a build step and choose Execute Windows batch command or Execute shell. Then enter a script that performs the Git operations:
cd /path/to/your/local/repo
git fetch -p origin
git push sync --mirror

Note: Jenkins may require the full path for Git.

The following example is a Windows batch command file, with the full path for Git on the host.

[Screenshot: Windows batch command build step that calls Git by its full path]

  5. Save the configuration for the task.
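Jenkins marks an Execute shell build step as failed when it exits with a nonzero status, so you can make a drifting replica visible in the build history. The following sketch uses a hypothetical function, jenkins_sync_step, with an optional Git-path argument. It syncs as described above, using git fetch -p origin because the mirror clone is bare. It then compares the branch and tag refs that git ls-remote reports for both remotes. If someone pushes to the source between the fetch and the comparison, the check can fail spuriously, so treat a single failure as a prompt to investigate rather than proof of a broken replica.

```shell
#!/bin/sh
# Sketch of an Execute shell build step: sync, then verify the replica.
# jenkins_sync_step is hypothetical; pass the full path to git if Jenkins
# cannot find it on its PATH.
set -eu

jenkins_sync_step() {
    local_repo=$1        # the bare mirror clone from Part 1
    git_bin=${2:-git}    # e.g. /usr/bin/git

    "$git_bin" -C "$local_repo" fetch -p origin
    "$git_bin" -C "$local_repo" push sync --mirror

    # Compare branch and tag refs on both remotes; any difference fails the build.
    src=$("$git_bin" -C "$local_repo" ls-remote --heads --tags origin | sort)
    dst=$("$git_bin" -C "$local_repo" ls-remote --heads --tags sync | sort)
    if [ "$src" != "$dst" ]; then
        echo "Replica differs from source after sync" >&2
        return 1
    fi
    echo "Replica matches source"
}
```

In the job’s Execute shell box, you would paste the function and then call it, for example jenkins_sync_step /path/to/your/local/repo /usr/bin/git.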

 

Your AWS CodeCommit replica repository will now be automatically updated with any changes to your source repository as scheduled.

We hope you’ve enjoyed this blog post. If you have questions or suggestions for future blog posts, please leave them in the comments below or visit our user forum!