Blue Green Deployments with the ECS External Deployment Controller

Introduction

Continuous integration (CI) and continuous delivery (CD) are vital practices in modern software development. They streamline software delivery so that teams can deliver business value quickly. Along with rapid delivery, today's business environment also demands near-zero downtime for applications. Blue/green deployments address both needs: they let teams deliver fast, and they help eliminate the “maintenance windows” that were once considered essential for releasing features. At a high level, the approach deploys a new application stack (green) in parallel to the production version (blue). Initially, the green stack is isolated from production load and is put through various tests to ensure stability. Once it is deemed stable, all production traffic is moved from the blue stack to the green stack. A canary release is a variation of this technique in which traffic shifts to the green version gradually over time. This technique will be covered in a future post.

For Amazon ECS, blue/green (b/g) deployments require less custom work when an engineering team uses AWS CodeDeploy. For teams that use other tools for their CD needs, however, b/g deployments remain a challenge. One solution used by such teams deploys separate ECS services for the blue and green versions of the app, each registered with its own target group behind a load balancer. Switching from blue to green then requires swapping the target group on the listener rule. Although practical, this method is less than ideal because it requires a separate ECS service for each version. With the introduction of ECS task sets, the external deployment controller, and enhanced request routing for ALBs, this post shows how an external tool, Jenkins, can be used to build CD pipelines that implement the b/g and canary patterns.

First, let’s define the key components of this solution.

Solution components

Task set

A task set contains the information that lets an ECS service run multiple versions of an application, each through its own task definition. Logically, it represents a unit of deployment, with parameters for task count, network configuration, and load balancer settings, among others. This lets a team change significant configuration, and affect how a new version of a task operates, without affecting the production version. The status parameter indicates which task set is currently serving production (“PRIMARY”) and which is not (“ACTIVE”). With AWS CodeDeploy, a task set maps 1:1 to a deployment and carries the deployment ID in its externalId field. Once a task set is created, only its scale parameter can be changed. This update constraint reflects the intent of a task set as an isolation boundary: changing any other parameter could cause disruptions and therefore requires creating a new task set, which is marked “ACTIVE” when it is first provisioned.
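To make the status and update rules concrete, the following Python sketch works on task sets shaped like the "taskSets" list returned by `aws ecs describe-task-sets`. It finds the task set with a given status and builds an UpdateTaskSet request, which carries only a new scale. The helper names and the sample ARNs are hypothetical; this is an illustration, not code from the sample repository.

```python
from typing import Any, Dict, List, Optional

TaskSet = Dict[str, Any]


def find_task_set(task_sets: List[TaskSet], status: str) -> Optional[TaskSet]:
    """Return the first task set whose status matches, e.g. "PRIMARY" or "ACTIVE"."""
    for task_set in task_sets:
        if task_set.get("status") == status:
            return task_set
    return None


def build_update_task_set_request(cluster: str, service: str,
                                  task_set_arn: str, percent: float) -> Dict[str, Any]:
    """UpdateTaskSet accepts only a new scale; every other setting is fixed at creation."""
    return {
        "cluster": cluster,
        "service": service,
        "taskSet": task_set_arn,
        "scale": {"value": percent, "unit": "PERCENT"},
    }


if __name__ == "__main__":
    # Hypothetical describe-task-sets output with one production and one candidate task set.
    sets = [
        {"taskSetArn": "arn:example:task-set/blue", "status": "PRIMARY"},
        {"taskSetArn": "arn:example:task-set/green", "status": "ACTIVE"},
    ]
    green = find_task_set(sets, "ACTIVE")
    print(build_update_task_set_request("demo-cluster", "demo-service", green["taskSetArn"], 100.0))
```

Any change beyond scale, such as a new image or a different target group, would go into a new task set rather than an update request.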

External controller

Of the deployment types offered by ECS, the external deployment type gives full control over the deployment process. That process is executed by a third-party deployment controller of your choice; in this case, Jenkins. A service opts into a custom deployment process and a third-party controller by setting the type of its deploymentController parameter to “EXTERNAL” in the service definition.
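As an illustration, the following Python sketch builds a CreateService request body with the external controller enabled. The output could be passed to `aws ecs create-service --cli-input-json`. In the sample, the service is provisioned by the CloudFormation stacks instead, so the cluster and service names here are hypothetical. Leaving out the task definition is an assumption of this sketch, based on each task set carrying its own.

```python
import json
from typing import Any, Dict


def build_external_service_request(cluster: str, service_name: str,
                                   desired_count: int) -> Dict[str, Any]:
    """Build a CreateService request that hands deployments to an external controller."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "desiredCount": desired_count,
        # EXTERNAL tells ECS that a third-party tool (here, Jenkins) manages task sets.
        "deploymentController": {"type": "EXTERNAL"},
    }


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    request = build_external_service_request("demo-cluster", "demo-service", 2)
    print(json.dumps(request, indent=2))
```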

Jenkins

Jenkins is a popular CI/CD server with wide adoption in the industry. Its plugin and extensibility model makes it versatile, and its open-source nature means plugins are available for many different languages and frameworks.

An engineering team creates pipelines that build the delivery unit(s) of their system. Each pipeline is composed of stages that, ideally, perform small and granular steps such as dependency resolution, compiling, linking, and versioning. Once the system is built, a different set of stages, in the same pipeline or a separate one, deploys the artifacts to the appropriate environment.

In this post, Jenkins acts as the “External Controller” that orchestrates the deployment of an ECS service. Using Groovy scripting, the pipeline stages implement custom logic to create a new task definition and task set, and to perform the other steps needed to deploy the new version with a b/g or canary strategy.

Using the sample pipeline

Prerequisites

For the pipeline to run, a Jenkins agent with the following dependencies is required:

Docker
GNU Make
Python
Pip
- AWS CLI
Jenkins plugins
- Git plugin
- Pipeline Utility Steps

The AWS CLI must also be configured with valid credentials. The name of the AWS CLI profile is passed to the Jenkins pipeline as a parameter.
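One way a pipeline step can honor that profile parameter is to append the AWS CLI's global `--profile` and `--region` options to every command. The Python sketch below builds such an argument list. The helper name and the sample values are hypothetical, and the sample Jenkinsfile may apply the profile differently, for example through the AWS_PROFILE environment variable.

```python
import shlex
from typing import List


def aws_cli_args(profile: str, region: str, service: str, operation: str,
                 *extra: str) -> List[str]:
    """Build an AWS CLI argv list that pins the named profile and region."""
    return ["aws", service, operation, *extra, "--profile", profile, "--region", region]


if __name__ == "__main__":
    # Hypothetical profile and cluster names.
    args = aws_cli_args("jenkins", "us-east-1", "ecs", "describe-services", "--cluster", "demo")
    print(shlex.join(args))  # prints: aws ecs describe-services --cluster demo --profile jenkins --region us-east-1
```

Building an argument list, rather than concatenating strings, keeps ARNs and other values from being split or reinterpreted by the shell.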

Sample repository details

For the example, we use a simple Flask-based API, available on GitHub. The repository contains the application code, infrastructure as code assets (AWS CloudFormation), and the build pipeline (Jenkinsfile). A Makefile is used to build the application, reflecting common industry practice. The Makefile.env file contains all the variables required by the pipeline steps. Where appropriate, for example the listener and target group ARNs, the values in this file must be updated with the outputs of the CloudFormation stacks deployed using 1-InfraStack.yaml and 2-AppStack.yaml.

To run this sample in your own setup, follow these steps:

1. Fork the GitHub repository.
2. Using your own AWS account, deploy the 1-InfraStack.yaml and 2-AppStack.yaml stacks located in the infrastructure folder.
3. Update the following values in Makefile.env using the outputs of the two stacks created in step 2:
- REGION
- SERVICE_ARN
- CLUSTER_ARN
- TASK_ROLE_ARN
- EXECUTION_ROLE_ARN
- BLUE_TARGET_GROUP_ARN
- GREEN_TARGET_GROUP_ARN
- LIVE_LISTENER_ARN
- TEST_LISTENER_ARN
- LOG_GROUP
- TASK_SECURITY_GROUPS
- TASK_SUBNETS
4. Commit and push the changes to your forked repository.
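A quick pre-flight check can catch values that were left empty before committing. The Python sketch below parses simple Make assignments (`KEY=value`, `KEY ?= value`, `KEY := value`) and reports which of the listed variables are missing or empty. It assumes Makefile.env uses only such one-line assignments; that format is an assumption of this sketch, not something the repository guarantees.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = [
    "REGION", "SERVICE_ARN", "CLUSTER_ARN", "TASK_ROLE_ARN", "EXECUTION_ROLE_ARN",
    "BLUE_TARGET_GROUP_ARN", "GREEN_TARGET_GROUP_ARN", "LIVE_LISTENER_ARN",
    "TEST_LISTENER_ARN", "LOG_GROUP", "TASK_SECURITY_GROUPS", "TASK_SUBNETS",
]

# Assumption: one assignment per line, using "=", "?=" or ":=".
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:?]?=\s*(.*?)\s*$")


def parse_env_file(text: str) -> Dict[str, str]:
    """Collect KEY -> value pairs, skipping comment lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    with open("Makefile.env") as handle:
        print(missing_keys(parse_env_file(handle.read())))
```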

Pipeline setup

After the Jenkins agent has been set up with the appropriate dependencies, use the following steps to set up the pipeline in Jenkins.

Create pipeline

1. From the dashboard, click the “New Pipeline” button.

2. Click “Git” for “Where do you store your code?”

3. Provide the repo URL, of your fork, in “Connect to Git repository.”

4. Click “Create Pipeline.”

Jenkins creates the pipeline and scans all branches for a Jenkinsfile. In this case, it finds the Jenkinsfile in the master branch and starts a build. This first build fails, however, because values for the awsAccountNumber and awsProfile parameters were not provided in the initial run.

Pipeline implementation details

Let’s take a look at how the pipeline achieves the blue/green deployment for the sample application.

Deployment stages

For an existing “blue” deployment created as part of the initial stack provisioning, the following steps are executed to deploy a new, “green,” version.

Update application code

First, the application code is updated with the required changes, and the environment variable NEXT_ENV is set to green. These changes are then committed and pushed to the repository. To start a new build, click the pipeline and select the “Branches” tab, then click the “Run” button to trigger the build.
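The post doesn't show how the pipeline consumes NEXT_ENV. One plausible use, sketched below in Python, is to choose which target group the new task set registers with. The function name and this mapping are hypothetical; only the two target group variable names come from Makefile.env.

```python
from typing import Dict


def target_group_for(next_env: str, env: Dict[str, str]) -> str:
    """Hypothetical mapping from NEXT_ENV ("blue" or "green") to a target group ARN."""
    key = {"blue": "BLUE_TARGET_GROUP_ARN", "green": "GREEN_TARGET_GROUP_ARN"}.get(next_env.lower())
    if key is None:
        raise ValueError(f"NEXT_ENV must be 'blue' or 'green', got {next_env!r}")
    return env[key]


if __name__ == "__main__":
    # Hypothetical ARNs standing in for the Makefile.env values.
    values = {"BLUE_TARGET_GROUP_ARN": "arn:example:tg/blue",
              "GREEN_TARGET_GROUP_ARN": "arn:example:tg/green"}
    print(target_group_for("green", values))  # prints: arn:example:tg/green
```

Rejecting unknown values early keeps a typo in NEXT_ENV from silently registering the new tasks with the production target group.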

A pop-up opens requesting parameters for “awsAccountNumber” and “awsProfile.” Provide these values and click “Run.”

The build starts and proceeds through the following stages to expose the new version (green) through the “Test” listener.

Build image

The first stage builds the container image. The pipeline relies on GNU Make to perform the build according to the specification in the Makefile; although the sample app is simple, it uses this mechanism to mirror a common real-world build system. The image is built and tagged with the Git hash of the current commit. This lets deployments refer to a specific tag instead of the “latest” tag, which is a best practice.

build-image:
	docker build -t $(APP_NAME):$(GIT_COMMIT) $(APP_DIR)
	docker tag $(APP_NAME):$(GIT_COMMIT) $(AWS_ACCOUNT_NUMBER).dkr.ecr.$(REGION).amazonaws.com/$(REPO_NAME):$(GIT_COMMIT)
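The tag applied by the `docker tag` line follows a fixed pattern. The Python sketch below reproduces that pattern so later steps, such as the task definition update, can compute the same image URI. The function name and the sample account, repository, and commit values are hypothetical.

```python
def ecr_image_uri(account: str, region: str, repo: str, git_commit: str) -> str:
    """Mirror the Makefile tag: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<commit>."""
    if not git_commit:
        # Refuse to fall back to an implicit "latest" tag.
        raise ValueError("an explicit commit tag is required")
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{git_commit}"


if __name__ == "__main__":
    print(ecr_image_uri("123456789012", "us-east-1", "flask-api", "3f2c1ab"))
    # prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com/flask-api:3f2c1ab
```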

Push to Amazon ECR

The new image is then pushed to ECR. Because the image is tagged only with the Git commit hash, no “latest” tag exists in the repository, and deployments must refer to the specific commit tag.

push-image:
	@docker push $(AWS_ACCOUNT_NUMBER).dkr.ecr.$(REGION).amazonaws.com/$(REPO_NAME):$(GIT_COMMIT)
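`docker push` to a private ECR registry only succeeds after Docker has authenticated to that registry. The push-image target assumes this has already happened. The Python sketch below builds the standard `aws ecr get-login-password | docker login` command followed by the push. It is one way to do the login, and the helper name and sample values are hypothetical.

```python
from typing import List


def ecr_push_commands(account: str, region: str, repo: str, git_commit: str) -> List[str]:
    """Return shell commands that log Docker in to ECR and push the commit-tagged image."""
    registry = f"{account}.dkr.ecr.{region}.amazonaws.com"
    return [
        # Authenticate Docker to the private ECR registry (the token is valid for 12 hours).
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Push only the commit-tagged image; no "latest" tag is created or pushed.
        f"docker push {registry}/{repo}:{git_commit}",
    ]


if __name__ == "__main__":
    for command in ecr_push_commands("123456789012", "us-east-1", "flask-api", "3f2c1ab"):
        print(command)
```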

Register task definition

With the new application image available in ECR, a new task definition that refers to it is registered. This task definition allows ECS to launch tasks running the latest application code. The stage reads a task definition template, sets its values to the ones provided by Makefile.env, and then registers the result with the AWS CLI.

// Read template and update values
def taskDefinitionTemplate = readJSON(file: templateFile)
taskDefinitionTemplate.family = taskFamily
taskDefinitionTemplate.taskRoleArn = env.TASK_ROLE_ARN
taskDefinitionTemplate.executionRoleArn = env.EXECUTION_ROLE_ARN
…

// Uses the updated template as input to register the task definition,
// then parses the CLI's JSON output so later stages can read the new ARN.
def registerTaskDefinitionOutput = readJSON(text: sh (
    script: "aws ecs register-task-definition --cli-input-json file://${taskDefFile}",
    returnStdout: true
).trim())
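The same template-then-register flow is shown below as a Python sketch. It deep-copies the template, fills in the fields set in the Groovy snippet plus the image URI, and pulls taskDefinition.taskDefinitionArn out of the JSON printed by `aws ecs register-task-definition`. Treating the first entry in containerDefinitions as the application container is an assumption of this sketch, as are the function names.

```python
import copy
import json
from typing import Any, Dict


def render_task_definition(template: Dict[str, Any], family: str, task_role_arn: str,
                           execution_role_arn: str, image_uri: str) -> Dict[str, Any]:
    """Fill a task definition template without mutating the original."""
    rendered = copy.deepcopy(template)  # keep the template reusable for the next build
    rendered["family"] = family
    rendered["taskRoleArn"] = task_role_arn
    rendered["executionRoleArn"] = execution_role_arn
    # Assumption: the first container definition is the application container.
    rendered["containerDefinitions"][0]["image"] = image_uri
    return rendered


def task_definition_arn(register_stdout: str) -> str:
    """Extract the ARN from the JSON output of `aws ecs register-task-definition`."""
    return json.loads(register_stdout)["taskDefinition"]["taskDefinitionArn"]


if __name__ == "__main__":
    # Hypothetical template and values for illustration.
    template = {"family": "placeholder", "containerDefinitions": [{"name": "app", "image": "placeholder"}]}
    rendered = render_task_definition(template, "flask-api", "arn:example:role/task",
                                      "arn:example:role/exec",
                                      "123456789012.dkr.ecr.us-east-1.amazonaws.com/flask-api:3f2c1ab")
    print(json.dumps(rendered, indent=2))
```

Parsing the CLI output as JSON, rather than treating it as a plain string, is what allows the next stage to reference the new task definition ARN directly.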

Create task set

In this step, a new task set (that is, a new deployment) is created that refers to the task definition registered in the previous step. This step also starts from a base task set template and fills in values that reflect the new deployment. Once the task set is created, ECS provisions the requested number of tasks and registers them with the target group specified in the template.

def taskSetTemplateFile = env.TEMPLATE_BASE_PATH + '/' + env.TASK_SET_TEMPLATE_FILE
def taskSetTemplateJson = readJSON(file: taskSetTemplateFile)
taskSetTemplateJson.taskDefinition = registerTaskDefinitionOutput.taskDefinition.taskDefinitionArn
…

def createTaskSetOutput = sh (
script: "aws ecs create-task-set --service $SERVICE_ARN --cluster $CLUSTER_ARN --cli-input-json file://${taskSetFile}",
returnStdout: true
).trim()
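For reference, the Python sketch below shows the general shape a filled-in task set template could take. The service and cluster are passed on the command line in the step above, so they are left out of the JSON. The field names follow the ECS CreateTaskSet API. The FARGATE launch type, disabled public IPs, the container name and port, and using a build identifier as externalId are all assumptions of this sketch, not values taken from the sample's template.

```python
from typing import Any, Dict, List


def build_task_set_input(task_definition_arn: str, target_group_arn: str,
                         container_name: str, container_port: int,
                         subnets: List[str], security_groups: List[str],
                         external_id: str, scale_percent: float = 100.0) -> Dict[str, Any]:
    """Build a create-task-set --cli-input-json body (service and cluster passed as flags)."""
    return {
        "externalId": external_id,  # assumption: e.g. the Jenkins build number
        "taskDefinition": task_definition_arn,
        "launchType": "FARGATE",  # assumption: awsvpc tasks on Fargate
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
        # Register the new tasks with the (green) target group.
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": container_name,
            "containerPort": container_port,
        }],
        "scale": {"value": scale_percent, "unit": "PERCENT"},
    }
```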

Enable test listener

With the task set created, the “Test” listener is adjusted to direct traffic to the green target group. The stage does this by setting the green target group's weight to 100 in the “Test” listener's forward action. This change makes the listener forward 100% of the traffic it receives to the green target group.

def greenTG = ["Weight": 100, "TargetGroupArn": env.GREEN_TARGET_GROUP_ARN]
…

def modifyTestListenerResult = sh (
    script: "aws elbv2 modify-listener --listener-arn ${env.TEST_LISTENER_ARN} --cli-input-json file://${testDefaultActionsFile}",
    returnStdout: true
).trim()
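The actions file passed to `modify-listener` holds a weighted forward action. The Python sketch below builds that DefaultActions body from a map of target group ARN to weight, and enforces the 0 to 999 weight range that ALB accepts. The listener ARN stays on the command line, as in the step above. The function name and sample ARNs are hypothetical.

```python
import json
from typing import Any, Dict


def weighted_forward_input(weights: Dict[str, int]) -> Dict[str, Any]:
    """Build a modify-listener --cli-input-json body with one weighted forward action."""
    for arn, weight in weights.items():
        if not 0 <= weight <= 999:
            raise ValueError(f"weight for {arn} must be between 0 and 999")
    return {
        "DefaultActions": [{
            "Type": "forward",
            "ForwardConfig": {
                "TargetGroups": [
                    {"TargetGroupArn": arn, "Weight": weight}
                    for arn, weight in weights.items()
                ]
            },
        }]
    }


if __name__ == "__main__":
    # Test listener: send everything to green, nothing to blue (hypothetical ARNs).
    body = weighted_forward_input({"arn:example:tg/green": 100, "arn:example:tg/blue": 0})
    print(json.dumps(body, indent=2))
```

Listing both target groups with an explicit weight of 0 for blue makes the intended split visible in the file. Listing only green would also forward all traffic to it.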

From the perspective of the ALB and the listeners, the following diagram shows the setup at this point.

At this point, the pipeline pauses for user input prior to proceeding:

Validation

With the new version of the application available through the “Test” listener, the pipeline stops and awaits user input. This step represents validation and/or approval in the deployment workflow: the new version is validated and, if it is deemed fit, the pipeline proceeds to point the “Live” listener at it.
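An automated check could run before the manual approval. ECS reports a task set's stabilityStatus as STABILIZING until its tasks are running and healthy, and as STEADY_STATE afterward. The Python sketch below polls until STEADY_STATE is reached or a timeout expires. The `describe` callable is a hypothetical wrapper, for example one that runs `aws ecs describe-task-sets --cluster ... --service ... --task-sets <arn>` and returns the first entry of "taskSets". The sample pipeline relies on manual validation instead.

```python
import time
from typing import Any, Callable, Dict


def wait_for_steady_state(describe: Callable[[], Dict[str, Any]],
                          timeout_s: float = 600.0, poll_s: float = 15.0,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a task set until stabilityStatus is STEADY_STATE; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        task_set = describe()
        if task_set.get("stabilityStatus") == "STEADY_STATE":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)  # wait before asking ECS again
```

Injecting `describe` and `sleep` keeps the polling logic independent of the AWS CLI and easy to exercise without waiting in real time.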

Swap live listener

After validation and approval, the pipeline adjusts the “Live” listener to send the traffic it receives to the green target group. This step completes the cutover: the new version of the application now serves live traffic. The step also moves the previous deployment (blue) to the “Test” listener. From an ECS perspective, two deployments are now running at this stage, which essentially doubles the number of running tasks.

Logically, the ALB and listener configuration now looks like this:

The build pipeline pauses again for user input. Because the next stages delete the previous deployment, the new version must be considered fully stable before proceeding. If an issue is detected in the current deployment, rolling back is as simple as shifting the weight on the live listener back to the blue target group.
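Cutover and rollback are the same operation with the roles reversed. The Python sketch below returns the weight maps for the Live and Test listeners after moving live traffic to the currently idle target group. Calling it again with the arguments swapped produces the rollback. The function name and ARNs are hypothetical. Each map would then be turned into a weighted forward action and applied with `aws elbv2 modify-listener`.

```python
from typing import Dict, Tuple

Weights = Dict[str, int]


def swap_weights(live_tg: str, idle_tg: str) -> Tuple[Weights, Weights]:
    """Return (live_listener_weights, test_listener_weights) after cutting over to idle_tg."""
    live = {idle_tg: 100, live_tg: 0}  # production traffic moves to the new version
    test = {live_tg: 100, idle_tg: 0}  # the previous version stays reachable for testing
    return live, test


if __name__ == "__main__":
    blue, green = "arn:example:tg/blue", "arn:example:tg/green"
    print(swap_weights(blue, green))  # cutover: live -> green, test -> blue
    print(swap_weights(green, blue))  # rollback: live -> blue, test -> green
```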

Cleanup

Based on user input, the pipeline runs the final stages, which delete the previous deployment and mark its task definition inactive. Deleting the old task set also brings the number of running tasks back down to the intended level.

Delete previous task set.
Mark the previous task definition inactive.
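The two cleanup steps above map to real AWS CLI commands: `aws ecs delete-task-set` and `aws ecs deregister-task-definition`, which moves a revision to the INACTIVE state. The Python sketch below builds both as argument lists. A task set that still has a non-zero scale may need to be scaled down first, or deleted with `--force`, so that is an option here. The function name and its parameters are illustrative and do not come from the sample's Jenkinsfile.

```python
from typing import List


def cleanup_commands(cluster_arn: str, service_arn: str, old_task_set_arn: str,
                     old_task_definition_arn: str, force: bool = False) -> List[List[str]]:
    """Build CLI argv lists that delete the old task set and deregister its task definition."""
    delete_task_set = ["aws", "ecs", "delete-task-set", "--cluster", cluster_arn,
                       "--service", service_arn, "--task-set", old_task_set_arn]
    if force:
        # Allow deletion even if the task set has not been scaled down to zero.
        delete_task_set.append("--force")
    deregister = ["aws", "ecs", "deregister-task-definition",
                  "--task-definition", old_task_definition_arn]
    return [delete_task_set, deregister]
```

The order matches the list above: the old tasks are stopped first, and only then is their task definition marked inactive.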

Conclusion

AWS CodeDeploy has enabled teams to adopt blue/green deployments for ECS services with little custom work. Before the external deployment controller was introduced, teams using other CI/CD tools had to run an additional ECS service to achieve the same goal. This post showed how Jenkins can act as an external controller, giving teams the flexibility to keep using their existing CI/CD tools.
