Create a pipeline with canary deployments for Amazon EKS with AWS App Mesh

NOTICE: October 04, 2024 – This post no longer reflects the best guidance for configuring a service mesh with Amazon EKS and its examples no longer work as shown. Please refer to newer content on Amazon VPC Lattice.

In this post, we will demonstrate how customers can leverage different AWS services in conjunction with AWS App Mesh to implement a canary deployment strategy for applications running on Amazon Elastic Kubernetes Service (Amazon EKS).

As stated in the post “Getting started with App Mesh and EKS”, many customers are currently implementing microservices in a way where they can build scalable and resilient applications while reducing time to market.

By making use of container orchestrators such as Amazon EKS and Amazon ECS, customers across the globe are able to run millions of containers. However, when leveraging the power of these tools, customers still need to think about managing the connectivity, operations, and security between microservices in these distributed architectures to be able to run highly decentralized groups of services.

AWS App Mesh is a service mesh that makes it easy to monitor and control services. It standardizes the way your services communicate, giving you consistent end-to-end visibility and network traffic controls for every service in an application, and helping to ensure high availability.

1. Canary deployments

The process of deploying monolithic applications is often painful and risky, given that all the business needs exist in a single piece of software and the infrastructure provisioned to run these applications needs to be updated often. This approach can cause many problems when a deployment is not successful and the old version of the application no longer exists.

There are, of course, ways to deal with this, such as backups and “blue green deployments,” where a whole new environment is created and the traffic shifts to the new version of the application while the old environment stays up and running. Container technology also helps here by packaging all the assets needed to run the applications. One thing these approaches do not address, however, is the user experience, and this is where canary deployments start to make sense.

With a canary deployment (or canary release), you switch the traffic in small percentage increments after deploying a new version of a given application. The deployment process can also monitor the health of the new version, so if there is an issue, it can automatically switch the traffic back to the old version, drastically reducing the impact of bugs in new application versions. This approach helps not only with brand new implementations, but also addresses the need for testing complex and distributed microservices, where you can send a specific amount of traffic to newer versions in a controlled manner:
Gif that shows canary deployment taking place
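
To make the idea of shifting traffic in small increments concrete, here is a minimal Python sketch that lists the traffic split after each increment. The function name `canary_schedule` and the parameter `step_percent` are hypothetical and exist only for illustration; this is not the code used by the pipeline in this post.

```python
from typing import List, Tuple


def canary_schedule(step_percent: int) -> List[Tuple[int, int]]:
    """Return (old_version_weight, new_version_weight) after each traffic shift."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    schedule = []
    new_weight = 0
    while new_weight < 100:
        # Cap at 100 so a step that does not divide 100 evenly still ends cleanly.
        new_weight = min(new_weight + step_percent, 100)
        schedule.append((100 - new_weight, new_weight))
    return schedule


for old, new in canary_schedule(25):
    print(f"old version: {old}%   new version: {new}%")
```

A rollback is then simply a return to the starting split, with 100% of the traffic on the old version.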

2. Architecture overview

The architecture diagram below represents an overview of the AWS services that will be used to create the pipeline:

Architecture diagram of the canary deployment pipeline

To demonstrate this architecture, we will be using an application called Yelb. Yelb allows users to vote on a set of alternatives like restaurants and dynamically updates pie charts based on the votes. Additionally, Yelb keeps track of the number of page views and prints the hostname of the yelb-appserver instance serving the API request upon a vote or a page refresh. Yelb components include:

A frontend called yelb-ui that is responsible for vending the JS code to the browser.
An application server named yelb-appserver, a Sinatra application that reads and writes to a cache server (redis-server) and a Postgres backend database (yelb-db).
Redis stores the number of page views and Postgres stores the votes.

Yelb’s architecture looks like this:

Architecture diagram of the yelb application

Yelb’s configuration uses ephemeral disks for all the containers. Running databases in this way is only done for demonstration purposes.

To keep up with the latest Yelb version, the script provided in this blog gets the latest main branch of Yelb from GitHub via a git clone. To avoid errors due to the Docker Hub Rate Limiting policy, we place a copy of all the required source images in Amazon ECR for the build stages in AWS CodeBuild.

3. Set up the infrastructure

For the remaining steps in this post, we will use the US West (Oregon) Region (us-west-2).

To follow along, you will need to have an environment with some tooling. We have used an AWS Cloud9 instance to run this tutorial. If you want to create a Cloud9 instance in your account, follow the steps in the EKS workshop from chapter “Create a Workspace” to “Update IAM Settings for your Workspace”.

3.1 Requirements

There are some requirements to be installed and configured before you can create the pipeline. Start by cloning the GitHub repository:

git clone https://github.com/aws/aws-app-mesh-examples.git
cd aws-app-mesh-examples/blogs/eks-canary-deployments-pipeline/

For your convenience, we created some scripts that are placed inside of the setup folder of the resources you just downloaded. These scripts were created to be used with an AWS Cloud9 instance. Feel free to open and read them if you want to know what exactly they do.

The next steps require jq, eksctl, kubectl, aws-cli, and helm. Install them with:

./setup/install_dependencies.sh

You will also have to export some environment variables (you can customize these variables to your needs):

./setup/export_environment_variables.sh && source ~/.bash_profile

3.2 Amazon EKS cluster

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own Kubernetes control plane. Kubernetes is an open-source orchestration system for automating the deployment, scaling, and management of containerized applications.

Now let’s create an EKS cluster to be used during the next steps:

./setup/create_eks_cluster.sh

It takes around 15 minutes to create the cluster, so feel free to grab a coffee! For more details on how to create the cluster, you can refer to this documentation. Once the script completes, you can test the cluster connectivity like so:

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   14m

3.3 AWS App Mesh controller for Kubernetes

The App Mesh controller for Kubernetes lets you manage App Mesh resources, such as meshes, virtual services, virtual nodes, virtual routers, and routes, through Kubernetes. It also lets you automatically add the App Mesh sidecar container images to Kubernetes pod specifications. For more details you can refer to this documentation.

If you are using an existing EKS cluster, you will have to confirm that there is no pre-release version installed:

curl -o pre_upgrade_check.sh https://raw.githubusercontent.com/aws/eks-charts/master/stable/appmesh-controller/upgrade/pre_upgrade_check.sh
chmod +x pre_upgrade_check.sh && ./pre_upgrade_check.sh

The script must return “Your cluster is ready for upgrade. Please proceed to the installation instructions”.

Then install the App Mesh controller:

./setup/install_appmesh_controller.sh

3.4 Amazon CloudWatch Container Insights

CloudWatch Container Insights enables you to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices. The metrics include utilization for resources such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. For more details you can refer to this documentation.

./setup/install_cloudwatch_container_insights.sh

4. Creating the pipeline

Now that you have all requirements installed and configured, you can move forward with the creation of the pipeline.

4.1 AWS Lambda layer

A Lambda layer that has kubectl, awscli, helm, and jq will be used in our pipeline to communicate with the Amazon EKS cluster. For more information about the Lambda layer used, see this aws-samples repository.

./setup/aws_lambda_layer.sh

Only move to the next step if the last command has returned CREATE_COMPLETE.
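
If you would rather check the stack status from code than read the script output, the waiting logic can be sketched roughly as follows. This is a hedged illustration, not part of the provided scripts: `wait_for_stack` and its `get_status` callback are hypothetical names, and the callback is assumed to return the current CloudFormation stack status string.

```python
import time
from typing import Callable


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 15.0,
                   max_polls: int = 120) -> bool:
    """Poll until the stack reaches CREATE_COMPLETE; return False on failure or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "CREATE_COMPLETE":
            return True
        if status != "CREATE_IN_PROGRESS":
            # Any other status (for example CREATE_FAILED or ROLLBACK_COMPLETE)
            # means the creation did not succeed, so stop waiting.
            return False
        time.sleep(poll_seconds)
    return False  # still in progress after max_polls attempts


# Assumption: with boto3 installed, get_status could wrap a call such as
#   client.describe_stacks(StackName=name)["Stacks"][0]["StackStatus"]
statuses = iter(["CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
print(wait_for_stack(lambda: next(statuses), poll_seconds=0))
```

Passing the status lookup in as a callback keeps the sketch runnable without AWS credentials. In a real script, the built-in boto3 CloudFormation waiter `stack_create_complete` is likely the simpler choice.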

4.2 Shared AWS CloudFormation stack

The shared AWS CloudFormation stack contains multiple Lambda functions and one AWS Step Functions state machine that will be responsible for the deploy stage of the pipeline. Because the state machine accepts all configuration parameters as input, you can use the same resources for all deployments.

./setup/create_shared_cloudformation_stack.sh

Only move to the next step if the last command has returned CREATE_COMPLETE.

After the AWS CloudFormation stack is created, you will need to add the AWS Lambda IAM Role from the AWS CloudFormation outputs to Kubernetes RBAC to allow deployments into the Kubernetes cluster.

./setup/add_iam_role_to_rbac.sh

4.3 Pipeline AWS CloudFormation stack

The pipeline AWS CloudFormation stack was designed to be deployed for each microservice you have. It will create one AWS CodeCommit repository and the pipeline that will be triggered by new commits into the main branch of that repository.

For demonstration purposes, you will create four stacks with the Yelb microservices. Alternatively, you can change the environment variable called USE_SAMPLE_MICROSERVICES to 'False', which creates all the needed resources for the pipeline with an empty AWS CodeCommit repository to be used with your own source code.

Before you can deploy the microservices, you will need a Kubernetes Namespace and an AWS App Mesh mesh for the microservices. Let’s create a namespace and mesh called yelb:

NAMESPACE=yelb MESH=yelb envsubst < setup/namespace_and_mesh.yml | kubectl apply -f -
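
The command above uses envsubst to fill the NAMESPACE and MESH variables into the manifest before piping it to kubectl. The same substitution can be sketched in Python as shown below. The template text is a simplified, hypothetical stand-in and is not the actual content of setup/namespace_and_mesh.yml; the `render` helper is also illustrative.

```python
from string import Template

# Hypothetical, simplified manifest; the real setup/namespace_and_mesh.yml may differ.
MANIFEST_TEMPLATE = """\
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    mesh: ${MESH}
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: ${MESH}
"""


def render(template: str, variables: dict) -> str:
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(template).substitute(variables)


manifest = render(MANIFEST_TEMPLATE, {"NAMESPACE": "yelb", "MESH": "yelb"})
print(manifest)
```

One difference is worth noting: envsubst silently replaces an unset variable with an empty string, whereas `Template.substitute` raises an error, which makes a missing value easier to catch.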

Now we can create the microservices. Because the Yelb architecture contains four microservices, we have created a script that will create all the needed AWS CloudFormation stacks (you can open the script to see its details):

./setup/create_microservice_pipeline_stack.sh

Only move to the next step if the last command has returned “All four stacks created successfully!”

5. Testing the pipeline

Now that you have created the whole architecture, you can take a look at how it works.

5.1 Configuration Files

First, let’s take a look at the AWS CodeCommit repositories. Each repository has a folder called specfiles that contains three files.

build.yml has the steps to build the Docker image and save it in Amazon ECR. It also adds two parameters to deploy.json:
1. container_image is added to the deploy.json file with the image URI after the Docker image is pushed to Amazon ECR.
2. config_file is added to the deploy.json file with the base64-encoded content of kubernetes_template.yml.

deploy.json has the configuration parameters to deploy the microservice:
1. cluster_name is the name of the EKS cluster you want to deploy into.
2. kubernetes_namespace is the name of the Kubernetes namespace you want to deploy into.
3. microservice_name is the name of the microservice.
4. percentage_step is the percentage of traffic switched to the new version in each increment.
5. wait_time is the time in seconds to wait between traffic switches.
6. failure_threshold_value is optional and specifies the maximum number of 5xx HTTP response codes allowed before an automatic rollback is triggered. (default = 0)
7. failure_threshold_time is optional and specifies the time range in seconds over which 5xx HTTP response codes are counted. The minimum recommended value is 60 because the CloudWatch Metric Filter used aggregates and reports every minute. (default = 600)

kubernetes_template.yml is a template of the specfile that kubectl uses to create the new microservice. It contains some variables:
1. KUBERNETES_NAMESPACE will be substituted by the value of kubernetes_namespace from the deploy.json file.
2. MICROSERVICE_NAME will be substituted by the value of microservice_name from the deploy.json file.
3. CONTAINER_IMAGE will be substituted by the new container image built during the pipeline execution and stored inside Amazon ECR.
4. CANARY_VERSION will be substituted by the new version during deployment.
5. CANARY_ROUTES will be substituted by the traffic switch increments during deployment.
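
As a rough illustration of how these files fit together, the Python sketch below loads a deploy.json document, applies the two documented defaults, and then adds the two build-time parameters. The helper names, the sample cluster name, the image URI, and the template snippet are all hypothetical. Treating the five non-optional parameters as required is an assumption drawn from the list above, not confirmed behavior of the pipeline.

```python
import base64
import json

# Defaults documented above for the two optional parameters.
DEFAULTS = {"failure_threshold_value": 0, "failure_threshold_time": 600}
# Assumption: every parameter not marked optional is required.
REQUIRED = ("cluster_name", "kubernetes_namespace", "microservice_name",
            "percentage_step", "wait_time")


def load_deploy_config(raw_json: str) -> dict:
    """Parse deploy.json content and fill in defaults for optional parameters."""
    config = {**DEFAULTS, **json.loads(raw_json)}
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return config


def add_build_outputs(config: dict, image_uri: str, template_text: str) -> dict:
    """Mimic the build step: add container_image and a base64-encoded config_file."""
    enriched = dict(config)
    enriched["container_image"] = image_uri
    encoded = base64.b64encode(template_text.encode("utf-8"))
    enriched["config_file"] = encoded.decode("ascii")
    return enriched


# Illustrative values only; the cluster name and image URI are made up.
raw = json.dumps({
    "cluster_name": "my-cluster",
    "kubernetes_namespace": "yelb",
    "microservice_name": "yelb-appserver",
    "percentage_step": 10,
    "wait_time": 60,
})
deploy = add_build_outputs(
    load_deploy_config(raw),
    image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/yelb-appserver:example",
    template_text="metadata:\n  name: ${MICROSERVICE_NAME}\n",
)
print(json.dumps(deploy, indent=2))
```

Base64 encoding lets the multi-line YAML template travel inside a single JSON string value without any escaping issues.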

Everything else inside the AWS CodeCommit repositories is source code from the Yelb application.

5.2 Pipeline resources

The easiest way to look into the resources created is to open AWS CodePipeline. There you will see the four pipelines you created earlier:

AWS CodePipeline pipelines

You can open one of them and see the steps Source, Build and Deploy. These steps are all customizable. For example, you could add security checks or manual approvals before deploying.

AWS CodePipeline pipeline

1. The Source step monitors the main branch of the AWS CodeCommit repository for changes. You can get to the repository by clicking the AWS CodeCommit link inside the Source step.

2. The Build step builds the Docker image with AWS CodeBuild and stores it in Amazon ECR. You can see the build logs by clicking the Details link in the Build step.

3. The Deploy step triggers the AWS Step Functions state machine created by the shared AWS CloudFormation stack. You can see its status by clicking the Details link.

Click on the Details link from the Deploy step and you can check the Input and Output of each step as well as the AWS Lambda functions and the execution logs:

AWS Step Functions state machine

A successful deployment that switched 100% of the traffic to the new version and passed all health checks looks like the following images:

5.3 Deploy a new version

Now let’s see the Yelb application working and make some changes to see how the pipeline behaves.

echo "http://$(kubectl get -nyelb service/yelb-ui-1 -o json | jq -r '.status.loadBalancer.ingress[0].hostname')"

Open the URL returned by the command above in your preferred browser. You should see a page similar to the following (it might take some minutes for the DNS name to propagate):

Yelb application frontend
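
The jq filter in the command at the start of this section reads `.status.loadBalancer.ingress[0].hostname` from the service JSON. If the load balancer has not been assigned yet, jq prints `null` and the resulting URL is unusable. A small Python sketch of the same lookup, with an explicit error for that case, might look like this; the function name and the sample hostname are hypothetical.

```python
import json


def load_balancer_url(service_json: str) -> str:
    """Build the application URL from the JSON output of kubectl get service."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress or "hostname" not in ingress[0]:
        # The load balancer can take a little while to be provisioned.
        raise LookupError("load balancer hostname is not available yet")
    return "http://" + ingress[0]["hostname"]


# Made-up sample shaped like the relevant part of the kubectl output.
sample = json.dumps({
    "status": {"loadBalancer": {"ingress": [
        {"hostname": "example-1234567890.us-west-2.elb.amazonaws.com"}
    ]}}
})
print(load_balancer_url(sample))
```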

Click some of the vote buttons and see how the values are updated. Did you notice something wrong while voting?

We intentionally added an issue to the yelb-appserver microservice so that it increments by two each time there is one vote. Now let’s fix that issue and see how it gets applied.

Open AWS CodeCommit and navigate to the yelb-appserver repository. Open the file modules/restaurantsdbupdate.rb and click on the Edit button to fix the issue. Change the line:

con.prepare('statement1', 'UPDATE restaurants SET count = count +2 WHERE name = $1')

to:

con.prepare('statement1', 'UPDATE restaurants SET count = count +1 WHERE name = $1')

and commit the changes.

Go to AWS CodePipeline and after some seconds you will see that yelb-appserver-pipeline is In progress. Open it to see the progress of the deployment. Wait until it gets to the Deploy stage and then refresh the browser tab with the Yelb application a few times. You will see the App Server version (as shown in the image below) switching between yelb-appserver-1 and yelb-appserver-2. That is the canary deployment taking place.

Yelb application during canary deployment

Open the AWS App Mesh virtual router for the yelb-appserver microservice to see the current route weights. You can also open the deploy.json file in the yelb-appserver AWS CodeCommit repository and check the percentage_step and wait_time parameters to work out how long it will take to switch all the traffic. This example uses percentage_step: 10 and wait_time: 60, so it takes a total of 10 minutes to switch all the traffic.
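
The 10-minute figure follows from simple arithmetic: 100% of the traffic divided into steps of 10% gives 10 steps, and 10 steps of 60 seconds each add up to 600 seconds. The sketch below generalizes that calculation. It assumes one wait period per traffic switch, which matches the numbers in this post, and the function name is hypothetical.

```python
def total_shift_seconds(percentage_step: int, wait_time: int) -> int:
    """Estimate how long it takes to move 100% of the traffic to the new version."""
    if percentage_step <= 0:
        raise ValueError("percentage_step must be positive")
    # Round up so that a step which does not divide 100 evenly still reaches 100%.
    steps = (100 + percentage_step - 1) // percentage_step
    return steps * wait_time


seconds = total_shift_seconds(percentage_step=10, wait_time=60)
print(f"{seconds} seconds = {seconds // 60} minutes")  # 600 seconds = 10 minutes
```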

After the deployment completes, you can try to vote on the Yelb application again and see that now it increments by one.

5.4 Deploy a version that triggers rollback

Open the yelb-appserver.rb file from the yelb-appserver AWS CodeCommit repository and change the port for the application under the production configuration (line 33) from 4567 to 4568 and commit the changes.

That will trigger yelb-appserver-pipeline and deploy a new version of the yelb-appserver microservice that does not work. Open yelb-appserver-pipeline in AWS CodePipeline and wait until the Deploy stage is In progress. Then click Details under the Deploy stage. On this page, you will see the visual workflow during the deployment.

Try to refresh the Yelb application a few times. You will see that sometimes when you refresh there is no vote data returned and you are not able to vote. That’s because you are being redirected to the new version that is not working.

Wait a few minutes and you will see in the visual workflow that a rollback was triggered because the new version did not pass the health check. It’s important to note that the health check is an AWS Lambda function that can be customized if needed.

Deployment that triggered rollback
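
The rollback decision described above can be pictured with the two optional deploy.json parameters, failure_threshold_value and failure_threshold_time. The following sketch is a simplified stand-in and not the code of the actual health check Lambda function, which reads a CloudWatch metric. Here the 5xx responses are passed in as a plain list of timestamps, and `should_rollback` is a hypothetical name.

```python
from typing import Iterable


def should_rollback(error_timestamps: Iterable[float],
                    now: float,
                    failure_threshold_value: int = 0,
                    failure_threshold_time: int = 600) -> bool:
    """Return True when more 5xx responses than allowed occurred in the time window."""
    window_start = now - failure_threshold_time
    recent_errors = sum(1 for ts in error_timestamps if window_start <= ts <= now)
    # With the default threshold of 0, a single 5xx response triggers a rollback.
    return recent_errors > failure_threshold_value


# Two errors in the last 600 seconds with a threshold of 1: roll back.
print(should_rollback([900.0, 950.0], now=1000.0, failure_threshold_value=1))
```

Customizing the health check then comes down to changing how errors are counted or which signal is compared against the threshold.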

You can now refresh the Yelb application a few times again and it will work properly using the previous version (yelb-appserver-2).

6. Cleanup

As you can see in this deployment, many AWS resources interact, such as Docker images, S3 buckets, CodePipeline pipelines, Lambda functions, CloudWatch events, and SSM parameters, to name a few. We provide a convenient script in the cleanup folder to clean all of this up.

./cleanup/cleanup_env.sh

7. Conclusion

In this post, we demonstrated how you can leverage AWS App Mesh and implement a canary deployment strategy in conjunction with other AWS services such as AWS CodePipeline and AWS Step Functions.

Another way to implement canary deployments with AWS App Mesh and Amazon EKS is to use Weave Flagger. Flagger allows you to promote canary deployments using AWS App Mesh. It uses Prometheus metrics to determine canary deployment success or failure and uses the App Mesh routing controls to shift traffic between the current and canary deployment.

Here are some useful links if you want to dive deeper into AWS App Mesh:

Check out the AWS App Mesh official documentation.
Learn more about the AWS App Mesh capabilities in the AWS App Mesh workshop.

You can track upcoming features via the App Mesh roadmap and experiment with new features using the App Mesh preview channel. You can also join us on the App Mesh Slack community to share experiences and discuss with the team and your peers.
