AWS DevOps Blog

Build Serverless AWS CodeCommit Workflows using Amazon CloudWatch Events and JGit

Sam Dengler is a Solutions Architect at Amazon Web Services

Summary

Amazon CloudWatch Events now supports AWS CodeCommit Repository State Changes event types for activities like pushing new code to a repository. Using these new event types, customers can build Amazon CloudWatch Event rules to match AWS CodeCommit events and route them to one or more targets like an Amazon SNS Topic, AWS Step Functions state machine, or AWS Lambda function to trigger automated workflows to process repository changes.

In this blog, I will provide three examples for using AWS Lambda and JGit to build cost-effective serverless solutions to securely process AWS CodeCommit repository state changes:

  • Replicate CodeCommit Repository
  • Enforce Git Commit Message Policy
  • Backup Git Archive to Amazon S3

Source code and AWS CloudFormation templates for the examples are located in the following GitHub repository: https://github.com/awslabs/serverless-codecommit-examples.

AWS CodeCommit CloudWatch Events

Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Below is an example Amazon CloudWatch Events event for one of the new AWS CodeCommit Repository State Changes event types, referenceUpdated. Any update to an existing reference in the repository triggers a referenceUpdated event. Rules can be narrowed to particular branches by filtering on the referenceType and referenceName fields in the event detail.

{
    "version": "0",
    "id": "01234567-0123-0123-0123-012345678901",
    "detail-type": "CodeCommit Repository State Change",
    "source": "aws.codecommit",
    "account": "123456789012",
    "time": "2017-06-12T10:23:43Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:codecommit:us-east-1:123456789012:myRepo"
    ],
    "detail": {
        "event": "referenceUpdated",
        "repositoryName": "myRepo",
        "referenceType": "branch",
        "referenceName": "myBranch",
        "commitId": "3e5983EXAMPLE",
        "oldCommitId": "1a7813EXAMPLE"
    }
}

We will use the fields of the Amazon CloudWatch Events event to create a pattern that matches the events for which a target, in this case an AWS Lambda function, should be triggered.
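To make the matching idea concrete, here is a minimal local sketch of exact-value matching on the detail fields. It is only an illustration of the concept and not how the service is implemented; the class name EventPatternSketch and the matches method are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, local illustration of exact-value pattern matching.
// This is NOT the CloudWatch Events implementation.
class EventPatternSketch {

    // A pattern maps a field name to the list of accepted values. Every field
    // named in the pattern must be present in the detail with one of those values.
    static boolean matches(Map<String, List<String>> pattern, Map<String, String> detail) {
        for (Map.Entry<String, List<String>> entry : pattern.entrySet()) {
            String actual = detail.get(entry.getKey());
            if (actual == null || !entry.getValue().contains(actual)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Match pushes that create or update the branch named "myBranch".
        Map<String, List<String>> pattern = Map.of(
                "event", List.of("referenceCreated", "referenceUpdated"),
                "referenceName", List.of("myBranch"));

        Map<String, String> detail = Map.of(
                "event", "referenceUpdated",
                "repositoryName", "myRepo",
                "referenceName", "myBranch");

        System.out.println(matches(pattern, detail));
    }
}
```

Fields that the pattern does not mention, such as repositoryName here, are ignored, which is why a short pattern can select a whole family of events.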

Accessing AWS CodeCommit using HTTPS URLs

The HTTPS URL method for accessing an AWS CodeCommit repository is particularly suited to a serverless solution because an AWS Lambda execution container already provides temporary AWS IAM credentials associated with the function’s AWS IAM Execution Role. The function’s Execution Role is associated with one or more AWS IAM policies, in which you specify permissions allowing the function to access AWS resources. For each function, we will limit the AWS IAM policies to only the required AWS CodeCommit repository, Amazon S3 bucket, or Amazon SNS topics, following the AWS best practice of granting least privilege.

For example, the AWS IAM policy snippet below restricts the AWS Lambda function to pulling from the source repository and pushing to the target repository.

Policies:
  - Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Resource: !Sub 'arn:aws:codecommit:${AWS::Region}:${AWS::AccountId}:${SourceRepositoryName}'
        Action:
          - 'codecommit:GetRepository'
          - 'codecommit:GitPull'
      - Effect: Allow
        Resource: !Sub 'arn:aws:codecommit:${TargetRepositoryRegion}:${AWS::AccountId}:${TargetRepositoryName}'
        Action:
          - 'codecommit:GetRepository'
          - 'codecommit:GitPush'

When using the HTTPS URL access method, a credential helper is configured for the Git client. The helper executes the “aws codecommit credential-helper” command to provide a SigV4-compatible user name and password derived from AWS IAM credentials. When using JGit as the Git client, a CredentialsProvider can be supplied to Git commands to achieve the same result.

The Spring Cloud Config project provides an implementation of the JGit CredentialsProvider for AwsCodeCommit (source), which conveniently uses the AWS DefaultAWSCredentialsProviderChain to discover AWS credentials in the standard priority order supported by AWS Lambda. The AwsCodeCommit.calculateCodeCommitPassword method is particularly interesting to review as an example of SigV4 transformation logic.
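As background for reading that method, the sketch below shows only the general SigV4 signing-key derivation step, written against the JDK alone. It is not the Spring Cloud Config code and does not compute a complete AWS CodeCommit password; the class name SigningKeySketch and the sample secret are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the SigV4 signing-key derivation step only.
class SigningKeySketch {

    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The signing key is a chain of HMACs over the date (yyyyMMdd), the region,
    // the service name, and the fixed terminator "aws4_request".
    static byte[] deriveSigningKey(String secretKey, String dateStamp,
                                   String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Placeholder secret; real credentials come from the execution role.
        byte[] key = deriveSigningKey("EXAMPLE-SECRET-KEY", "20170612", "us-east-1", "codecommit");
        System.out.println(toHex(key));
    }
}
```

Because the date and region are mixed into the key, a derived key is only useful for one day, one region, and one service, which limits the impact of a leaked signature.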

Cloning a repository is repeated across the examples, so the functionality has been delegated to the supporting CloneCommandBuilder class below.

public class CloneCommandBuilder {

    private File directory;

    public CloneCommandBuilder() throws IOException {
        directory = Files.createTempDirectory(null).toFile();
    }

    public CloneCommand buildCloneCommand(String sourceUrl) {
        return buildCloneCommand(sourceUrl, new AwsCodeCommitCredentialProvider());
    }

    public CloneCommand buildCloneCommand(String sourceUrl,
            AwsCodeCommitCredentialProvider credentialsProvider) {

        return new CloneCommand().setDirectory(directory)
                .setURI(sourceUrl)
                .setCredentialsProvider(credentialsProvider)
                .setBare(true);
    }
}

Next, we’ll look at some examples that process the repository events using AWS Lambda.

Example 1: Replicate CodeCommit Repository

Customers often need to replicate commits from one repository to another to support disaster recovery or cross-region CI/CD pipelines. In this example, the AWS Lambda function will clone a repository from the source and push to the target. This example is intended to update an existing target repository, which should not be empty before replication is configured.

Please note that, at the time of writing, AWS Lambda functions are limited to 1.5 GB of memory, 512 MB of ephemeral disk capacity (“/tmp” space), and a 5-minute execution time. If your repository cannot be processed within these limits, please see the Replicating and Automating Sync-Ups for a Repository with AWS CodeCommit blog article for an alternative approach that replicates repositories using an Amazon EC2 instance.
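Because an execution container can be reused between invocations, files left in the temporary directory may count against the ephemeral disk limit on a later run. The sketch below shows one way to remove a cloned directory tree when a handler finishes. It uses only the JDK; the class name TempDirectoryCleanup is hypothetical, and the example handlers may already take care of this in code not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for removing a temporary clone directory.
class TempDirectoryCleanup {

    // Deletes a directory tree. Sorting in reverse order visits children
    // before their parents, so each directory is empty when it is deleted.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a bare clone: a temp directory with one file in it.
        Path dir = Files.createTempDirectory(null);
        Files.createFile(dir.resolve("HEAD"));
        deleteRecursively(dir);
        System.out.println(Files.exists(dir));
    }
}
```

Calling a helper like this from a finally block keeps the cleanup in place even when the clone or push throws.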

Let’s take a look at some code!

public class ReplicateRepositoryHandler
        implements RequestHandler<CodeCommitEvent, HandlerResponse> {

    private static Logger logger = Logger.getLogger(ReplicateRepositoryHandler.class);
    private final String targetUrl;
    private final AwsCodeCommitCredentialProvider credentialsProvider;

    public ReplicateRepositoryHandler() {
        String targetName = System.getenv("TARGET_REPO_NAME");
        String targetRegion = System.getenv("TARGET_REPO_REGION");

        CodeCommitMetadata target = new CodeCommitMetadata(targetName, targetRegion);
        targetUrl = target.getCloneUrlHttp();
        credentialsProvider = new AwsCodeCommitCredentialProvider();
    }

    // ...

On instantiation, the AWS Lambda function discovers the target AWS CodeCommit repository HTTPS URL by querying the repository metadata using the target repository name and region. This discovery process is repeated across the examples, so the code has been delegated to the CodeCommitMetadata class below.

public class CodeCommitMetadata {

    private RepositoryMetadata repositoryMetadata;

    public CodeCommitMetadata(String repoName, String repoRegion) {
        AWSCodeCommitClientBuilder builder = AWSCodeCommitClientBuilder.standard();
        AWSCodeCommit client = builder.withRegion(repoRegion).build();

        GetRepositoryRequest request = new GetRepositoryRequest();
        request.withRepositoryName(repoName);

        GetRepositoryResult result = client.getRepository(request);
        repositoryMetadata = result.getRepositoryMetadata();
    }

    public String getCloneUrlHttp() {
        return repositoryMetadata.getCloneUrlHttp();
    }
}
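For orientation, the value returned by getCloneUrlHttp has a predictable shape. The sketch below builds a URL of that shape and reads the region back out of the host name. It assumes the standard public endpoint naming, and the class name CloneUrlSketch is hypothetical; querying the metadata, as the class above does, remains the safer way to obtain the URL.

```java
import java.net.URI;

// Hypothetical illustration of the HTTPS clone URL shape.
// Assumes the standard public endpoint naming.
class CloneUrlSketch {

    static String cloneUrlHttp(String repoName, String region) {
        return "https://git-codecommit." + region + ".amazonaws.com/v1/repos/" + repoName;
    }

    // The region is the second label of the host name.
    static String regionOf(String cloneUrl) {
        String host = URI.create(cloneUrl).getHost();
        return host.split("\\.")[1];
    }

    public static void main(String[] args) {
        String url = cloneUrlHttp("myRepo", "us-east-1");
        System.out.println(url);
        System.out.println(regionOf(url));
    }
}
```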

When the AWS Lambda function is triggered by the Amazon CloudWatch Events event, the source repository name and region in the event are used to discover the source repository HTTPS URL. We use JGit to clone the source repository from this URL into a local repository stored in a temporary directory in the AWS Lambda execution container.

public HandlerResponse handleRequest(CodeCommitEvent event, Context context) {
    try {
        String sourceName = event.getDetail().getRepositoryName();
        String sourceRegion = event.getRegion();
   
        // clone source repository
        CodeCommitMetadata source = new CodeCommitMetadata(sourceName, sourceRegion);
        String sourceUrl = source.getCloneUrlHttp();
        Git git = new CloneCommandBuilder().buildCloneCommand(sourceUrl).call();

        // ...

Once we’ve cloned the repository to the AWS Lambda execution container, the last step is to set the target AWS CodeCommit repository as the remote location and push the local references using the reference specification “+refs/*:refs/*”.

// push target repository
git.push().setCredentialsProvider(credentialsProvider)
          .setRemote(targetUrl)
          .setRefSpecs(new RefSpec("+refs/*:refs/*"))
          .call();
// ...
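The reference specification has three parts: a leading “+” that allows non-fast-forward updates, a source pattern, and a destination pattern. The sketch below is a simplified stand-in for how such a specification maps local references to remote ones. It is not JGit’s RefSpec class; the name RefSpecSketch is hypothetical, and it assumes a single colon with a trailing wildcard on both sides or on neither.

```java
// Hypothetical, simplified model of a Git reference specification.
class RefSpecSketch {

    final boolean force;
    final String source;
    final String destination;

    RefSpecSketch(String spec) {
        force = spec.startsWith("+");
        String body = force ? spec.substring(1) : spec;
        int colon = body.indexOf(':');
        source = body.substring(0, colon);
        destination = body.substring(colon + 1);
    }

    // Returns the remote reference a local reference maps to,
    // or null if the specification does not select it.
    String mapRef(String localRef) {
        if (!source.endsWith("*")) {
            return source.equals(localRef) ? destination : null;
        }
        String prefix = source.substring(0, source.length() - 1);
        if (!localRef.startsWith(prefix)) {
            return null;
        }
        String suffix = localRef.substring(prefix.length());
        return destination.substring(0, destination.length() - 1) + suffix;
    }

    public static void main(String[] args) {
        RefSpecSketch mirror = new RefSpecSketch("+refs/*:refs/*");
        System.out.println(mirror.force);
        System.out.println(mirror.mapRef("refs/heads/myBranch"));
        System.out.println(mirror.mapRef("refs/tags/v1"));
    }
}
```

With “+refs/*:refs/*”, every branch and tag maps to the same name on the target, which is what makes the push behave like a mirror.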

In the next example, we’ll review how we can build a Lambda function to enforce commit message policies.

Example 2: Enforce Git Commit Message Policy

Some customers choose to enforce policies on a Git repository to maintain code quality. In this example, we use the same tools described above to clone a repository and validate the commit messages from the Git log using a regular expression.

public class PolicyEnforcerHandler implements RequestHandler<CodeCommitEvent, HandlerResponse> {

    private static Logger logger = LoggerFactory.getLogger(PolicyEnforcerHandler.class);

    private final String mainBranch;
    private final String snsTopicArn;
    private final Pattern pattern;
    private final AmazonSNS snsClient;

    public PolicyEnforcerHandler() {
        mainBranch = System.getenv("MAIN_BRANCH_NAME");
        snsTopicArn = System.getenv("SNS_TOPIC_ARN");

        String messageRegex = System.getenv("MESSAGE_REGEX");
        pattern = Pattern.compile(messageRegex);

        String snsRegion = snsTopicArn.split(":")[3];
        snsClient = AmazonSNSClientBuilder.standard().withRegion(snsRegion).build();
    }

    // ...

On instantiation, the AWS Lambda function compiles the regular expression for message validation and creates an Amazon SNS client to send notifications.
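The constructor reads the region from the fourth colon-separated field of the topic ARN. A slightly more defensive version of that lookup might look like the sketch below; the class name ArnRegionSketch is hypothetical.

```java
// Hypothetical helper: extract the region from an ARN of the form
// arn:partition:service:region:account-id:resource
class ArnRegionSketch {

    static String regionOf(String arn) {
        // Limit the split so that colons inside the resource part are preserved.
        String[] parts = arn.split(":", 6);
        if (parts.length < 6 || !"arn".equals(parts[0])) {
            throw new IllegalArgumentException("Not a valid ARN: " + arn);
        }
        return parts[3];
    }

    public static void main(String[] args) {
        System.out.println(regionOf("arn:aws:sns:us-east-1:123456789012:myTopic"));
    }
}
```

Failing fast in the constructor surfaces a misconfigured SNS_TOPIC_ARN at cold start instead of on the first policy violation.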

@Override
public HandlerResponse handleRequest(CodeCommitEvent event, Context context) {
    String sourceName = event.getDetail().getRepositoryName();
    String sourceRegion = event.getRegion();
    String commitId = event.getDetail().getCommitId();
    String oldCommitId = event.getDetail().getOldCommitId();

    try {
        // clone source repository
        CodeCommitMetadata source = new CodeCommitMetadata(sourceName, sourceRegion);
        String sourceUrl = source.getCloneUrlHttp();
        Git git = new CloneCommandBuilder().buildCloneCommand(sourceUrl).call();

        // ...

When the AWS Lambda function is triggered by the Amazon CloudWatch Events event, the process to clone the repository is the same: the AWS CodeCommit HTTPS URL is discovered from the event, and a bare Git repository is cloned to the AWS Lambda execution container.

// use the OldCommitId, or default to the main branch
String toGitReference = Optional.ofNullable(oldCommitId).orElse(mainBranch);
Repository repository = git.getRepository();
ObjectId to = repository.resolve(toGitReference);
ObjectId from = repository.resolve(commitId);

// ...

JGit RevWalk is used to determine the range of commits over which to validate the message policy. When commits are added to an existing branch, AWS CodeCommit will emit a referenceUpdated event, which includes commitId and oldCommitId fields that establish the range of commits.

When commits are added to a new branch, AWS CodeCommit will emit a referenceCreated event, which includes a commitId but not the oldCommitId. In this case, we will use the main branch name to determine the common ancestry of the commit chains, called the merge base, in order to establish the range of commits.
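The range selection can be pictured on a toy commit graph: start from one commit, follow parent links, and discard everything that is also reachable from the other commit. The sketch below does this with plain collections. It is only a model of the idea, not JGit’s RevWalk, and the class name CommitRangeSketch and the single-letter commit IDs are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of commit-range selection on a toy graph.
class CommitRangeSketch {

    // Collects every commit reachable from start by following parent links.
    static Set<String> reachable(Map<String, List<String>> parents, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String commit = stack.pop();
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    stack.push(parent);
                }
            }
        }
        return seen;
    }

    // Commits reachable from "from" but not from "to",
    // which is what "git log to..from" would list.
    static Set<String> range(Map<String, List<String>> parents, String from, String to) {
        Set<String> result = reachable(parents, from);
        result.removeAll(reachable(parents, to));
        return result;
    }

    public static void main(String[] args) {
        // Main branch: a <- b <- c. New branch from b: b <- d <- e.
        Map<String, List<String>> parents = Map.of(
                "a", List.of(),
                "b", List.of("a"),
                "c", List.of("b"),
                "d", List.of("b"),
                "e", List.of("d"));

        // New branch head is e, main branch head is c.
        System.out.println(range(parents, "e", "c")); // prints [e, d]
    }
}
```

In the toy graph, commit b is the merge base, so only the two commits added on the new branch are selected for validation.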

// create a RevWalk and set the range of commits
try (RevWalk walk = new RevWalk(repository)) {
    walk.markStart(walk.parseCommit(from));
    walk.markUninteresting(walk.parseCommit(to));

    // iterate the list of commits and validate each message
    for (RevCommit commit : walk) {
        Matcher matcher = pattern.matcher(commit.getShortMessage());

        // publish a message to the topic if the message does not match
        if (!matcher.find()) {
            String message = buildMessage(commit);
            logger.info(message);
            snsClient.publish(snsTopicArn, message);
        }
    }

    walk.dispose();
}

// ...

Once the range has been established, we iterate over the commits, testing each short message against the message policy regular expression. If a message does not match the regular expression, the commit is out of compliance with the policy, and a message is published to the Amazon SNS topic for notification.
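To show the validation step in isolation, the sketch below applies a sample policy to a list of short messages using only java.util.regex. The policy shown, a ticket ID prefix such as “PROJ-42: ”, is a made-up example of a value for MESSAGE_REGEX, and the class name MessagePolicySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, stand-alone version of the message check.
class MessagePolicySketch {

    // Returns the messages that do not satisfy the policy pattern.
    static List<String> violations(Pattern pattern, List<String> shortMessages) {
        List<String> failed = new ArrayList<>();
        for (String message : shortMessages) {
            // find() looks for a match anywhere in the message, so the pattern
            // needs its own anchors if the whole line must conform.
            if (!pattern.matcher(message).find()) {
                failed.add(message);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Made-up policy: messages must start with a ticket ID and a colon.
        Pattern policy = Pattern.compile("^[A-Z]+-\\d+: .+");
        List<String> messages = List.of("PROJ-42: fix null check", "fixed stuff");
        System.out.println(violations(policy, messages)); // prints [fixed stuff]
    }
}
```

Because the handler uses find() rather than matches(), an unanchored expression would accept any message that merely contains a conforming fragment.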

In the next example, I’ll review how to back up an archive of the files in a Git repository.

Example 3: Backup Git Archive to Amazon S3

The previous examples have focused on the bare Git repository objects, but some use cases call for processing the files in the Git repository at a particular reference. In this example, I’ll build a Lambda function to create a zip of the files in the repository and store it in Amazon S3 as a backup.

public class ArchiveRepositoryHandler
        implements RequestHandler<CodeCommitEvent, HandlerResponse> {

    private static Logger logger = LoggerFactory.getLogger(ArchiveRepositoryHandler.class);

    private static final String ZIP_FORMAT = "zip";
    private final String targetS3Bucket;
    private final AmazonS3 s3Client;

    public ArchiveRepositoryHandler() {
        targetS3Bucket = System.getenv("TARGET_S3_BUCKET");
        s3Client = AmazonS3ClientBuilder.defaultClient();
        ArchiveCommand.registerFormat(ZIP_FORMAT, new ZipFormat());
    }

    // ...

On instantiation, the AWS Lambda function creates an Amazon S3 client and registers the ZipFormat with JGit.

@Override
public HandlerResponse handleRequest(CodeCommitEvent event, Context context) {
    String sourceName = event.getDetail().getRepositoryName();
    String sourceRegion = event.getRegion();
    String commitId = event.getDetail().getCommitId();

    try {
        // clone source repository
        CodeCommitMetadata source = new CodeCommitMetadata(sourceName, sourceRegion);
        String sourceUrl = source.getCloneUrlHttp();
        Git git = new CloneCommandBuilder().buildCloneCommand(sourceUrl).call();

        // ...

When the AWS Lambda function is triggered by the Amazon CloudWatch Events event, the process to clone the repository is the same: the AWS CodeCommit HTTPS URL is discovered from the event, and a bare Git repository is cloned to the AWS Lambda execution container.

// create and upload archive for commitId
File file = Files.createTempFile(null, null).toFile();
try (OutputStream out = new FileOutputStream(file)) {
    ObjectId objectId = git.getRepository().resolve(commitId);
    git.archive().setTree(objectId)
                 .setFormat(ZIP_FORMAT)
                 .setOutputStream(out)
                 .call();

    String key = sourceName + "." + commitId + "." + ZIP_FORMAT;
    s3Client.putObject(targetS3Bucket, key, file);
}

// ...

Once the repository has been cloned, we use a JGit ArchiveCommand to create a zip artifact representing the working files of the repository at the commit that triggered the event. The generated zip artifact is then uploaded to Amazon S3 using the repository name and commit ID as the key.
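The key naming and the archive step can be tried without JGit or Amazon S3. The sketch below builds the same style of object key and writes a small zip with the JDK’s ZipOutputStream. The class name ArchiveSketch and the sample file contents are hypothetical, and the in-memory zip is only a stand-in for the archive that JGit produces.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-in for the archive and key-naming steps.
class ArchiveSketch {

    // Same key layout as the handler: <repository>.<commitId>.<format>
    static String objectKey(String repositoryName, String commitId, String format) {
        return repositoryName + "." + commitId + "." + format;
    }

    // Writes the given path-to-content pairs into an in-memory zip archive.
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The archive is complete only after the zip stream has been closed.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = zip(Map.of("README.md", "hello"));
        System.out.println(objectKey("myRepo", "3e5983EXAMPLE", "zip"));
        System.out.println(archive.length > 0);
    }
}
```

Including the commit ID in the key means each push produces a new object rather than overwriting the previous backup.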

Conclusion

Amazon CloudWatch Events support for AWS CodeCommit Repository State Changes event types opens possibilities to build event-driven source code workflow automation using the same Amazon CloudWatch Events service that acts as an event bus across many AWS services. Combining this new capability with AWS Lambda, the JGit client, and AWS IAM policy controls provides builders with a set of tools to build serverless solutions that securely access AWS resources, scale on demand, and are cost effective.

In this blog, I’ve demonstrated three example solutions built using these tools. AWS CodeCommit’s integration with Amazon CloudWatch Events also allows you to integrate with other Amazon CloudWatch Events targets, like Amazon SQS or AWS Step Functions.

I encourage you to visit the GitHub repository (https://github.com/awslabs/serverless-codecommit-examples), which has instructions to launch these examples in your own AWS account. Please share your ideas and questions in the comments below, or submit pull requests and issues to the GitHub repository!