AWS DevOps & Developer Productivity Blog
Build Serverless AWS CodeCommit Workflows using Amazon CloudWatch Events and JGit
Sam Dengler is a Solutions Architect at Amazon Web Services
Summary
Amazon CloudWatch Events now supports AWS CodeCommit Repository State Changes event types for activities like pushing new code to a repository. Using these new event types, customers can build Amazon CloudWatch Events rules that match AWS CodeCommit events and route them to one or more targets, such as an Amazon SNS topic, an AWS Step Functions state machine, or an AWS Lambda function, to trigger automated workflows that process repository changes.
In this post, I provide three examples that use AWS Lambda and JGit to build cost-effective serverless solutions that securely process AWS CodeCommit repository state changes:
- Replicate CodeCommit Repository
- Enforce Git Commit Message Policy
- Backup Git Archive to Amazon S3
Source code and AWS CloudFormation templates for the examples are located in the following GitHub repository: https://github.com/awslabs/serverless-codecommit-examples.
AWS CodeCommit CloudWatch Events
Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Below is an example Amazon CloudWatch Event for one of the new AWS CodeCommit Repository State Changes event types, referenceUpdated. Any update to an existing branch or tag emits a referenceUpdated event; events for particular branches can be selected using the referenceType and referenceName fields in the event detail.
We will use the event’s fields to create a pattern that matches the events for which we want to trigger a target, in this case an AWS Lambda function.
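To make the matching rule concrete: a pattern lists the accepted values for each field, and an event matches when every listed field holds one of those values. The sketch below mimics that rule in plain Java for the flat detail fields named above. The class name ReferencePattern and the branch name in the usage note are hypothetical, and in a real deployment the matching is performed by CloudWatch Events itself, not by your code.

```java
import java.util.List;
import java.util.Map;

/** Toy model of event pattern matching for flat string fields (hypothetical helper). */
final class ReferencePattern {
    // Each key maps to the values the pattern accepts for that field.
    private final Map<String, List<String>> accepted;

    ReferencePattern(Map<String, List<String>> accepted) {
        this.accepted = accepted;
    }

    /** True when every field in the pattern is present in the detail with an accepted value. */
    boolean matches(Map<String, String> detail) {
        for (Map.Entry<String, List<String>> field : accepted.entrySet()) {
            String actual = detail.get(field.getKey());
            if (actual == null || !field.getValue().contains(actual)) {
                return false;
            }
        }
        // Fields that appear in the event but not in the pattern are ignored.
        return true;
    }

    /** Pattern that accepts referenceUpdated events for a single branch. */
    static ReferencePattern forBranch(String branchName) {
        return new ReferencePattern(Map.of(
                "event", List.of("referenceUpdated"),
                "referenceType", List.of("branch"),
                "referenceName", List.of(branchName)));
    }
}
```

For example, ReferencePattern.forBranch("master") accepts a detail map whose referenceType is branch and whose referenceName is master, and rejects the same event for a tag.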
Accessing AWS CodeCommit using HTTPS URLs
The HTTPS URL method for accessing an AWS CodeCommit repository is particularly suited to a serverless solution because an AWS Lambda execution container already provides temporary AWS IAM credentials associated with the function’s AWS IAM Execution Role. The function’s Execution Role is associated with one or more AWS IAM policies, in which you specify permissions allowing the function to access AWS resources. For each function, we will limit the AWS IAM policies to only the AWS CodeCommit repository, Amazon S3 bucket, or Amazon SNS topics it needs, following the AWS best practice of granting least privilege.
For example, the AWS IAM policy snippet below restricts the AWS Lambda function to pulling from the source repository and pushing to the target repository.
With the HTTPS URL access method, the Git client is configured with a credential helper that runs the “aws codecommit credential-helper” command to supply a SigV4-compatible user name and password derived from AWS IAM credentials. When using JGit as the Git client, a CredentialsProvider can be supplied to Git commands to achieve the same result.
The Spring Cloud Config project provides an implementation of the JGit CredentialsProvider for AwsCodeCommit, which conveniently uses the AWS DefaultAWSCredentialsProviderChain to discover AWS credentials in the standard priority order supported by AWS Lambda. The AwsCodeCommit.calculateCodeCommitPassword method is particularly interesting to review as an example of SigV4 transformation logic.
Cloning a repository is repeated across the examples, so the functionality has been delegated to a supporting CloneCommandBuilder class below.
Next we’ll look at some examples that process the repository events using AWS Lambda.
Example 1: Replicate CodeCommit Repository
Customers often need to replicate commits from one repository to another to support disaster recovery or cross-region CI/CD pipelines. In this example, the AWS Lambda function will clone a repository from the source and push to the target. This example is intended to update an existing target repository, which should not be empty before replication is configured.
Please note that, at the time of writing, AWS Lambda functions are limited to 1.5GB memory, 512MB ephemeral disk capacity (“/tmp” space), and a 5 minute execution time. If your repository cannot be processed within these limits, please see the Replicating and Automating Sync-Ups for a Repository with AWS CodeCommit blog article for an alternative approach to replicating repositories using an Amazon EC2 instance.
Let’s take a look at some code!
On instantiation, the AWS Lambda function discovers the target AWS CodeCommit repository HTTPS URL by querying the repository metadata using the target repository name and region. This discovery process is repeated across the examples, so the code has been delegated to the CodeCommitMetadata class below.
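For readers who only want to see what such a URL looks like, the sketch below builds it from the standard CodeCommit HTTPS endpoint convention instead of querying the repository metadata. This is a swapped-in simplification rather than the approach used in the examples. The class name CodeCommitUrls is hypothetical, and the format assumes the standard amazonaws.com domain, which is one reason querying the service is the more robust choice.

```java
import java.net.URI;

/** Builds a CodeCommit HTTPS clone URL by convention (hypothetical helper, not a metadata query). */
final class CodeCommitUrls {
    private CodeCommitUrls() {}

    static URI httpsCloneUrl(String region, String repositoryName) {
        if (region == null || region.isEmpty()
                || repositoryName == null || repositoryName.isEmpty()) {
            throw new IllegalArgumentException("region and repositoryName are required");
        }
        // Assumed format: https://git-codecommit.<region>.amazonaws.com/v1/repos/<name>
        return URI.create("https://git-codecommit." + region
                + ".amazonaws.com/v1/repos/" + repositoryName);
    }
}
```

The region and repository name are the same two inputs the metadata lookup takes, so a helper like this could serve as a fallback or as a sanity check on the value the service returns.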
When the AWS Lambda function is triggered by the Amazon CloudWatch Event, the source repository name and region in the event are used to discover the source repository HTTPS URL. We use JGit to clone the source repository from this URL into a local repository stored in a temporary directory in the AWS Lambda execution container.
Once we’ve cloned the repository into the AWS Lambda execution container, the last step is to set the target AWS CodeCommit repository as a new remote location and push the local references using the reference specification “+refs/*:refs/*”.
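The reference specification is worth unpacking: the leading “+” asks for a forced update, the part before the colon selects source references, and the part after it names where they land on the remote. The sketch below is a toy parser for that shape, written without JGit so it can run on its own. The class name SimpleRefSpec is hypothetical, and it handles only specifications with a trailing wildcard on both sides.

```java
import java.util.Optional;

/** Toy model of a wildcard refspec such as "+refs/*:refs/*" (hypothetical, not JGit's RefSpec). */
final class SimpleRefSpec {
    final boolean force;
    final String sourcePrefix;
    final String destinationPrefix;

    private SimpleRefSpec(boolean force, String sourcePrefix, String destinationPrefix) {
        this.force = force;
        this.sourcePrefix = sourcePrefix;
        this.destinationPrefix = destinationPrefix;
    }

    static SimpleRefSpec parse(String spec) {
        boolean force = spec.startsWith("+");
        String body = force ? spec.substring(1) : spec;
        int colon = body.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected <source>:<destination> in " + spec);
        }
        String source = body.substring(0, colon);
        String destination = body.substring(colon + 1);
        if (!source.endsWith("/*") || !destination.endsWith("/*")) {
            throw new IllegalArgumentException("this sketch only handles trailing wildcards");
        }
        // Drop the "*" and keep the prefix, e.g. "refs/*" becomes "refs/".
        return new SimpleRefSpec(force,
                source.substring(0, source.length() - 1),
                destination.substring(0, destination.length() - 1));
    }

    /** Maps a local reference name to its remote name, or empty if the spec does not select it. */
    Optional<String> destinationFor(String refName) {
        if (!refName.startsWith(sourcePrefix)) {
            return Optional.empty();
        }
        return Optional.of(destinationPrefix + refName.substring(sourcePrefix.length()));
    }
}
```

With “+refs/*:refs/*”, every branch and tag under refs/ maps to the same name on the target, and the forced update lets the target follow the source even after a history rewrite. That is the behavior you want for a replica, but it also means the target should not receive independent pushes.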
In the next example, we’ll review how we can build a Lambda function to enforce commit message policies.
Example 2: Enforce Git Commit Message Policy
Some customers choose to enforce policies on a Git repository to maintain code quality. In this example, we use the same tools described above to clone a repository and validate the commit messages from the Git log using a regular expression.
On instantiation, the AWS Lambda function compiles the regular expression for message validation and creates an Amazon SNS client to send notifications.
When the AWS Lambda function is triggered by the Amazon CloudWatch Event, the process to clone the repository is the same: we discover the AWS CodeCommit HTTPS URL from the Amazon CloudWatch Event and clone a bare Git repository to the AWS Lambda execution container.
JGit RevWalk is used to determine the range of commits over which to validate the message policy. When commits are added to an existing branch, AWS CodeCommit will emit a referenceUpdated event, which includes commitId and oldCommitId fields that establish the range of commits.
When commits are added to a new branch, AWS CodeCommit will emit a referenceCreated event, which includes a commitId but not the oldCommitId. In this case, we will use the main branch name to determine the common ancestry of the commit chains, called the merge base, in order to establish the range of commits.
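The idea behind both cases is the same: keep the commits reachable from the new tip and drop those already reachable from the old one. The sketch below shows that set difference on a toy in-memory commit graph. It is an illustration of the concept, not JGit’s RevWalk, and the class name CommitRange and the map-of-parents representation are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Commit ranges over a toy graph where each commit id maps to its parent ids (hypothetical). */
final class CommitRange {
    private CommitRange() {}

    /** All commits reachable from start by following parent links, start included. */
    static Set<String> reachable(Map<String, List<String>> parents, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String commit = queue.remove();
            if (seen.add(commit)) {
                queue.addAll(parents.getOrDefault(commit, List.of()));
            }
        }
        return seen;
    }

    /** Commits reachable from newTip but not from oldTip, in breadth-first order from newTip. */
    static List<String> between(Map<String, List<String>> parents, String oldTip, String newTip) {
        Set<String> excluded = reachable(parents, oldTip);
        List<String> range = new ArrayList<>();
        for (String commit : reachable(parents, newTip)) {
            if (!excluded.contains(commit)) {
                range.add(commit);
            }
        }
        return range;
    }
}
```

For a referenceUpdated event, oldTip and newTip correspond to oldCommitId and commitId. For a referenceCreated event, passing the tip of the main branch as oldTip selects the same commits as starting from the merge base, at least in the common case where the two branches share a single merge base.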
Once the range has been established, we iterate over the list of commit messages, testing each against the message policy regular expression. If a message does not match the regular expression, it is out of compliance with the policy, and a message is published to the Amazon SNS topic for notification.
In the next example, I’ll review how to back up an archive of the files in a Git repository.
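The validation step itself needs nothing beyond the Java standard library. Here is a minimal sketch, assuming a policy that requires each message to start with a ticket identifier. The class name CommitMessagePolicy and the sample expression in the usage note are hypothetical, and the SNS publish is left out so the block stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Checks commit messages against a regular expression (hypothetical helper). */
final class CommitMessagePolicy {
    private final Pattern pattern;

    CommitMessagePolicy(String regex) {
        // Compile once, for example when the function is instantiated, and reuse per invocation.
        this.pattern = Pattern.compile(regex);
    }

    /** Uses find() so an anchored pattern only has to describe the start of a multi-line message. */
    boolean isCompliant(String message) {
        return pattern.matcher(message).find();
    }

    /** Returns the messages that fail the policy, in their original order. */
    List<String> violations(List<String> messages) {
        List<String> failed = new ArrayList<>();
        for (String message : messages) {
            if (!isCompliant(message)) {
                failed.add(message);
            }
        }
        return failed;
    }
}
```

A policy created with the expression ^[A-Z]+-\d+: \S would accept “PROJ-12: fix login redirect” and reject “wip”. Each entry returned by violations would then become one notification to the topic.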
Example 3: Backup Git Archive to Amazon S3
The previous examples have focused on the bare Git repository objects, but some use cases require processing the files in the Git repository at a particular reference. In this example, I’ll build a Lambda function to create a zip of the files in the repository and store it in Amazon S3 as a backup.
On instantiation, the AWS Lambda function creates an Amazon S3 client and registers the ZipFormat with JGit.
When the AWS Lambda function is triggered by the Amazon CloudWatch Event, the process to clone the repository is the same: we discover the AWS CodeCommit HTTPS URL from the Amazon CloudWatch Event and clone a bare Git repository to the AWS Lambda execution container.
Once the repository has been cloned, we use a JGit ArchiveCommand to create a zip artifact representing the working files of the repository at the commit that triggered the event. The generated zip artifact is then uploaded to Amazon S3 using the repository name and commit shortId as the key.
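To show the shape of the artifact and its key without pulling in JGit or the AWS SDK, the sketch below zips an in-memory set of files with java.util.zip and derives an object key. This is a stand-in for the ArchiveCommand step, not a reproduction of it. The class name ArchiveSketch, the key layout of repository name, slash, short ID, and the seven-character short ID are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Builds a zip in memory and an S3-style key for it (hypothetical helper). */
final class ArchiveSketch {
    private ArchiveSketch() {}

    /** Assumed key layout: <repositoryName>/<first 7 characters of the commit id>.zip */
    static String objectKey(String repositoryName, String commitId) {
        String shortId = commitId.substring(0, Math.min(7, commitId.length()));
        return repositoryName + "/" + shortId + ".zip";
    }

    /** Zips path-to-content pairs; in the real function the entries come from the commit's tree. */
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            // Writing to memory should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Keying by commit means each push produces a new object rather than overwriting the previous backup. Whether that is desirable depends on how much history you want to retain in the bucket.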
Conclusion
Amazon CloudWatch Events support for AWS CodeCommit Repository State Changes event types opens possibilities to build event-driven source code workflow automation using the same Amazon CloudWatch Events service that acts as an event bus across many AWS services. Combining this new capability with AWS Lambda, the JGit client, and AWS IAM policy controls provides builders with a set of tools to build serverless solutions that securely access AWS resources, scale on demand, and are cost effective.
In this post, I’ve demonstrated three example solutions built using these tools. AWS CodeCommit’s integration with Amazon CloudWatch Events also allows you to integrate with other Amazon CloudWatch Events targets, like Amazon SQS or AWS Step Functions.
I encourage you to visit the GitHub repository (https://github.com/awslabs/serverless-codecommit-examples), which has instructions to launch these examples in your own AWS account. Please share your ideas and questions in the comments below, or submit pull requests and issues to the GitHub repository!