Kubernetes cluster upgrade: the blue-green deployment strategy

This article was co-written by Sébastien Allamand (Sr. Solution Architect Specialist, Containers) and Michael Marie Julie and Quentin Bernard from TheFork, one of the leading online restaurant booking and discovery platforms in Europe and Australia.

In loving memory of our dear colleague Olivier Lebhard.

Introduction

Context

Kubernetes has become a new standard in our industry, with great built-in features and an incredible abstraction model. The standardization brought by Kubernetes is considerable, but its changing ecosystem requires constant adaptability. The Helm 2 chart repository deprecation, the 14 months of support for each Amazon Elastic Kubernetes Service (Amazon EKS) version, and Istio’s move from a microservices architecture to a monolith are a few examples of these changes.

An essential component of maintaining Kubernetes is keeping your cluster up-to-date, which requires defining an efficient process that reduces downtime and risk. This post will take a look at how we update our Kubernetes clusters while reducing risk to the platform at TheFork.

The following diagram illustrates the Amazon EKS Kubernetes release calendar:

The illustration shows the Amazon EKS Kubernetes version and upstream release, Amazon EKS release, and Amazon EKS end of support calendar details.

A new upgrade path

The upgrade process started in December 2020, almost a year after the previous upgrades occurred. Planning scenarios revealed that upgrading Amazon EKS from version 1.15 to 1.18 and Istio from version 1.4 to 1.8 is complicated because of the compatibility matrix and breaking changes in both products. Consequently, upgrades can’t be done in place by skipping versions, and the compatibility matrix dictates the sequence of upgrades. This sequence must be followed even when Kubernetes is several versions behind the current version.

The two solutions for upgrading Kubernetes are shown in the following illustration:

Solution One: Upgrade the Amazon EKS + Istio versions by following matrix compatibility, which uses the following upgrade process:

The previous illustration shows that seven upgrades are required for each of our seven Amazon EKS clusters, which is a cumulative number of 49 upgrades! This number of upgrades is high risk and time-consuming.
Solution Two: This solution was proposed by Sebastien Allamand (Containers Specialist Solution Architect at AWS) in December 2020. Sebastien asked, “Why don’t you create a new cluster with the target version, and roll out your deployment on this new cluster?” This blue-green deployment strategy is the way to go because we can’t afford close to 50 upgrades!

Solution Overview

The blue-green deployment strategy

Since we decided to create a brand new cluster to upgrade our Amazon EKS and Istio versions, we needed to rethink all of our processes.

Our initial upgrade processes for Amazon EKS and Istio were as follows:

For Amazon EKS: we upgrade the control plane using the AWS Management Console and trigger an instance refresh on our nodes. The only exception to these refreshes is our statefulsets, which need a manual action that depends on the application.
For Istio: we follow the upgrade process in the official documentation, which is based on the Helm and istioctl versions.

For this upgrade and for future upgrades, we distinguish the «non-breaking changes» as upgrades that occur when the planning phase does not reveal changes that can cause downtime; and «breaking changes» as upgrades that occur when Kubernetes or Istio upgrades can cause downtime. We then use a different deployment strategy according to this:

For the «non-breaking changes» upgrades, we do a standard upgrade where we upgrade our control plane before launching a rolling update on our nodes.
For the «breaking changes» upgrades, we do a blue-green deployment, where we create a second cluster with the latest version of Amazon EKS and Istio. We deploy our applications, test them before switching the traffic to them, and finalize the upgrade by shutting down our old cluster.

To facilitate our blue-green switch, we are introducing a new naming convention for our clusters, called spoon and knife. Our cluster naming convention uses the following pattern: aws-environment-{spoon,knife}. Only one of our clusters handles our production traffic; however, several clusters exist for dedicated purposes (e.g., tooling, staging, etc.). The addition of this naming convention adds this reference at the end of our cluster name and helps to identify them when creating new clusters during our blue-green upgrade.
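The naming convention can be illustrated with a small shell helper. This is only a sketch: the function name `counterpart_cluster` and the cluster name `aws-staging-spoon` are hypothetical and are not taken from TheFork's tooling. Given the name of the cluster currently in use, it prints the name the next cluster would receive.

```shell
# Given a cluster name ending in -spoon or -knife, print the name of the
# counterpart cluster to create for the next blue-green upgrade.
# Hypothetical helper: the function name and example name are assumptions.
counterpart_cluster() {
  case "$1" in
    *-spoon) printf '%s-knife\n' "${1%-spoon}" ;;
    *-knife) printf '%s-spoon\n' "${1%-knife}" ;;
    *) echo "unexpected cluster name: $1" >&2; return 1 ;;
  esac
}

counterpart_cluster "aws-staging-spoon"   # prints aws-staging-knife
```

Because the suffix simply alternates, the same helper works for every later blue-green upgrade of the same environment.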

This blue-green deployment strategy provides additional flexibility, but moving our applications to a new cluster takes a lot of time. Thus, we only use this strategy when it is necessary, as it is more time-consuming than a standard upgrade.

We can represent this workflow with the following illustration:

The illustration shows the workflow used by TheFork to upgrade the EKS version from 1.13 to 1.21 on their cluster.

Completing a blue-green deployment takes more time than a standard upgrade (e.g., 1-2 months for a blue-green deployment upgrade and 2-4 weeks for a classic upgrade). More investment is required because we have to set up the new clusters and redeploy all our applications. Consequently, we prefer the standard upgrade when possible. The blue-green deployment strategy is available to help us upgrade across a few versions or to improve our confidence with upgrades that contain breaking changes. To confirm whether breaking changes exist, we need to thoroughly read the changelogs of all the components that will be upgraded.

Additional guidance on upgrades is provided in the following details:

All our environments are isolated (except our tooling cluster that can reach them all and our integration cluster that can join the staging one). This topic is described in detail later in this post.
We are not using the “LoadBalancer” Service type for our ingress in our Kubernetes clusters. Historically, we avoided having Kubernetes manage our AWS resources in order to separate the responsibilities of our platform: our Terraform repository handles our AWS resources, not Kubernetes. This separation has many benefits, including complete control of our AWS resources and the ability to migrate our cluster without any impact on our customers.
Our architecture is similar in each environment. We use the Paris (eu-west-3) Region. An internet-facing Application Load Balancer (ALB) will receive the HTTP traffic and route it to the private-facing applications via the Kubernetes Istio ingress instances. All applications use Amazon RDS Databases, Amazon Elastic File System (Amazon EFS), Amazon Simple Storage Service (Amazon S3), and Amazon ElastiCache using Memcached or Redis.

Prerequisites

The first thing to do is to read the Istio and Kubernetes changelogs. These changelogs are usually very complete and will give you an overview of the changes introduced by the latest versions.

Once we have an idea about the changes brought by these versions, we can check the compatibility of our current objects. Fortunately, many tools are available to help during our quest:

For Kubernetes: we use a helpful tool named Kube-No-Trouble (kubent). You can find more details on its usage here: https://blog.doit-intl.com/kubernetes-how-to-automatically-detect-and-deal-with-deprecated-apis-f9a8fc23444c
For Istio: we download the latest version of istioctl locally and execute “istioctl analyze” and “istioctl experimental precheck” on our running cluster to get an overview of the changes introduced in the latest version.

If the required changes are compatible with our currently running cluster, we apply them there directly. This lets us iterate and reduces the risk of untested changes (especially for deprecated application programming interface (API) versions).

Fortunately, most of our applications were already compatible, and the few that needed an update were easily fixed by upgrading their Helm charts.

Kubent output:

__________________________________________________________________________
>>> Deprecated APIs removed in 1.16  <<<
------------------------------------------------------------------------------------------
KIND         NAMESPACE     NAME         API_VERSION
DaemonSet    kube-system   kube-proxy   extensions/v1beta1
Deployment   kube-system   coredns      extensions/v1beta1

Here, only kube-proxy and coredns use an outdated API version. Because the new Amazon EKS cluster will come with the proper versions of both, we don’t have to worry about them.
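Since the kube-system entries are replaced along with the cluster, it can help to filter them out and look only at the remaining rows. The following is a minimal sketch, assuming the table layout shown above (KIND, NAMESPACE, NAME, API_VERSION as the first four columns); the helper name `needs_fixing` and the `my-team/my-api` row are hypothetical.

```shell
# Keep only the kubent table rows whose namespace is not kube-system.
# Assumes the column layout shown above; header and banner lines are skipped.
needs_fixing() {
  awk 'NF >= 4 && $1 != "KIND" && $1 != ">>>" && $2 != "kube-system"'
}

# Example with a hypothetical extra row (my-team/my-api) added to the output:
needs_fixing <<'EOF'
>>> Deprecated APIs removed in 1.16  <<<
KIND         NAMESPACE     NAME         API_VERSION
DaemonSet    kube-system   kube-proxy   extensions/v1beta1
Deployment   kube-system   coredns      extensions/v1beta1
Deployment   my-team       my-api       extensions/v1beta1
EOF
```

Only the `my-team` row is printed. Against a real cluster, the kubent output would be piped into the function instead of the inline sample.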

Validate the upgrade process

One of our environments (called integration) is used to deploy instances of our front-end applications on-demand, connected to our staging APIs. This cluster is the perfect candidate to validate our upgrade process (logging, monitoring, error rate, etc.).

The upgrade process is as follows:

Create the new cluster inside the same Virtual Private Cloud (VPC)
Bootstrap the cluster
Deploy Istio and validate its installation
Update our deployment process to deploy on both clusters, and redeploy all the applications already deployed on the first cluster
Do the traffic switch

The cluster creation is relatively simple: a Pull Request in our Terraform repo is enough. We have to reproduce the code of the first cluster in new files, change the naming of the resources, and that’s it! At the end of the upgrade, we will remove the files containing the code of the old cluster, and Terraform will then remove all the resources that are no longer needed.

It is important to use an adequate naming convention for the Terraform code, and to separate the code into enough files to ease this kind of operation. We have been using Terraform modules since December 2021, but we exclude our staging and production environments for the moment. We prefer to have more insights before using modules in such critical environments.

More information on the cluster bootstrap step can be found in this article: https://medium.com/thefork/our-kubernetes-journey-at-thefork-d0964ec275f3.

The reference to follow for Istio is its official documentation. This includes an update of our manifests, a format change of our Envoy Filter, and the validation of our application behaviors (such as checking real client IP, sidecar injection, etc.).

Deploying the applications on two clusters in parallel is an easy process for our deployment workflow. We rely on Jenkins and Helmfile to do our deployments, and the only change needed is a new item in an array, followed by the redeployment of all the applications already deployed in the initial cluster.
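The idea of deploying to every cluster in a list can be sketched in shell as follows. This is not TheFork's Jenkins pipeline: the context names, the `deploy_all` function, and the `DRY_RUN` switch are assumptions made for illustration. By default the sketch only prints the Helmfile commands it would run.

```shell
# Hypothetical kube contexts; the second entry plays the role of the
# "new item in the array" added for the blue-green upgrade.
CLUSTERS="aws-staging-spoon aws-staging-knife"

# Deploy the same Helmfile to every cluster in the list.
# With DRY_RUN=1 (the default here) the commands are only printed.
deploy_all() {
  for ctx in $CLUSTERS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "helmfile --kube-context $ctx apply"
    else
      helmfile --kube-context "$ctx" apply
    fi
  done
}

deploy_all
```

Once the old cluster is shut down, removing its entry from the list brings the workflow back to a single target.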

To migrate without downtime, we will use the flexibility of the Amazon Application Load Balancer (Amazon ALB) and Target Groups. As you probably saw in our AWS article https://medium.com/thefork/creating-our-piece-of-cloud-in-aws-fd4e30571682, we use an Amazon ALB in front of our Amazon EKS cluster in each environment. Behind it, we send the traffic to a target group composed of the ingress instances.

When we first create the cluster, the new ingress instances register to the same target group. Because Istio is set up afterward, the targets are not healthy (they do not answer the health checks), so the new instances won’t receive any traffic. To avoid any issues with the setup of the new cluster, we also un-register these instances manually from the target group.

When the new cluster is ready, we register one new ingress instance in the target group. A percentage of the traffic will go to the new cluster and we can see if it functions appropriately. If it does, we un-register one old ingress instance. We continue one-by-one, alternating between registering a new instance and un-registering an old one, until there are only the new ingress instances in the target group. For example, with three ingress instances per cluster, the traffic grows in the new cluster like this: 25%, then 33%, then 50%, then 66%, then 75%, and finally 100%.
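The percentages above follow from the number of old and new instances registered at each step. The sketch below reproduces the arithmetic, assuming the load balancer spreads traffic evenly across all registered healthy targets; the function name `traffic_steps` is hypothetical, and the sketch only computes shares without registering anything.

```shell
# Print the share of traffic (in percent, rounded down) reaching the new
# cluster after each register/un-register step, for N ingress instances per
# cluster. Assumes traffic is spread evenly across registered targets.
traffic_steps() {
  old=$1
  new=0
  while [ "$old" -gt 0 ]; do
    new=$((new + 1))   # register one new ingress instance
    echo "$old old / $new new -> $((100 * new / (old + new)))%"
    old=$((old - 1))   # un-register one old ingress instance
    echo "$old old / $new new -> $((100 * new / (old + new)))%"
  done
}

traffic_steps 3
```

With three instances per cluster, the printed shares are 25%, 33%, 50%, 66%, 75%, and 100%, matching the sequence described above. In practice, each step would correspond to a target registration or de-registration on the target group (for example, with the AWS CLI `aws elbv2 register-targets` and `aws elbv2 deregister-targets` commands).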

We use the auto scaling feature to assess the instances. As the traffic arrives progressively in our new cluster, the Horizontal Pod Autoscaler (HPA) reacts correctly and scales our applications on the fly! (We have to make the switch progressively to wait for the detection threshold, which is set at five minutes.)

This is summarized by this illustration:

The illustration shows how the Application Load Balancer performs this operation.

Dealing with statefulsets

Unfortunately, we cannot deploy our statefulsets in both clusters at the same time to do the migration because of their backend storage (Amazon EBS). Indeed, an Amazon EBS volume can only be mounted on one instance, so we defined another strategy for such applications.

There is no magic formula here, because it depends on your application.

At TheFork, we have four main statefulsets:

Elasticsearch: our logging stack
Prometheus: for the monitoring part
Rundeck: to schedule some jobs
Solr: used by our search engine

Each one has its own migration process. To keep this post short, we will only describe the workflow for our Elasticsearch stack.

The architecture of the Elasticsearch stack is the following:

Filebeat is in charge of collecting logs from all containers and sending them to our Kafka. We use a daemonset so that it is available on all our servers.
Our Kafka cluster is in charge of storing our logs and acts as a buffer (We are using Amazon MSK inside the Tools environment).
Logstash, with pods running in the Amazon EKS Tools cluster, takes the logs from Kafka, processes them (we only use simple filters and some grok configurations), and sends them to Elasticsearch.
The Elasticsearch cluster consists of 3 master nodes, 12 data nodes, and 4 coordinating nodes. Each node is a pod in the Amazon EKS Tools cluster. Our master and data nodes use Amazon EBS volumes as a storage backend to store data.

The plan to migrate the cluster is:

1. Stop the Logstash instances to stop indexing logs in our Elasticsearch cluster. Logs continue to be ingested into Kafka, so once Logstash is back up, it will catch up on its lag. Stopping the indexing also eases our Elasticsearch rollout.

kubectl scale statefulset logstash-kubernetes -n logstash --replicas=0

2. Important step: Disable shard allocation to ensure that the cluster will not try to rebalance shards during the restart. The first thing to do is to launch a port-forward on one of the master nodes, to be able to reach the cluster directly from localhost:

kubectl port-forward pod/elastic-master-1  -n elasticsearch  9200:9200

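We restrict shard allocation to primaries, which stops replica shards from being reallocated during the restart, and then perform a synced flush: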

curl -XPUT -H "Content-Type: application/json" "http://127.0.0.1:9200/_cluster/settings?pretty" -d '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}';

curl -XPOST -H "Content-Type: application/json" "http://127.0.0.1:9200/_flush/synced";

3. Manually scale down our instances using the “kubectl scale” command in the “previous cluster” and set our autoscaling group to 0 once the cluster is fully down. This also releases the Amazon EBS volumes, so we can start Elasticsearch on the new cluster.

4. Deploy our Helm release in the new cluster.

You can check the status by using:

curl -H "Content-Type: application/json" "http://127.0.0.1:9200/_cluster/health";

Thanks to «shard allocation deactivation», the data nodes find all the indexes on their Amazon EBS volumes and the cluster becomes “Yellow.” As soon as the number of shards on every node is stable, we can re-enable shard allocation:

curl -XPUT -H "Content-Type: application/json" "http://127.0.0.1:9200/_cluster/settings?pretty" -d '{"persistent": {"cluster.routing.allocation.enable": null}}';

The cluster will be ready as soon as the state goes to “Green.”

5. Start Logstash in the new cluster to resume log indexing. Depending on the instance used, the bandwidth, and the number of logs, it can take some time for Logstash and the Elasticsearch cluster to catch up with the delay. We have access to our application logs using kubectl and New Relic to help us investigate if an operational event occurred during our migration.
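The health check in step 4 can also be polled instead of being run by hand. The following is a minimal sketch, assuming the port-forward from step 2 is still active; the function names, the `ES_URL` variable, the 10-second interval, and the limit of 60 attempts are illustrative choices and are not part of the original procedure.

```shell
# Extract the "status" field from the JSON returned by _cluster/health.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Poll the cluster until it reports the wanted status (default: green).
# ES_URL, the 10-second interval and the 60-try limit are assumptions.
wait_for_status() {
  wanted=${1:-green}
  tries=0
  while [ "$tries" -lt 60 ]; do
    status=$(curl -s "${ES_URL:-http://127.0.0.1:9200}/_cluster/health" | health_status)
    [ "$status" = "$wanted" ] && return 0
    tries=$((tries + 1))
    sleep 10
  done
  return 1
}

# Parsing example on a sample response (no cluster needed):
echo '{"cluster_name":"es","status":"yellow","timed_out":false}' | health_status
```

Calling `wait_for_status green` after re-enabling shard allocation would block until the cluster is ready, or return a non-zero status if it does not turn green within the attempt limit.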

Conclusion

At the beginning of this upgrade journey, we thought that carrying out such an operation would be a tall mountain to climb. Upgrading seven clusters from an older version to the current version, while still troubleshooting daily issues, appeared to be a very ambitious task. However, thanks to Sebastien Allamand’s advice and the team’s planning and preparatory efforts, we applied a consistent strategy to efficiently upgrade cluster by cluster, without any downtime for our customers.

To perform this kind of upgrade, we needed adequate preparation, time, and tremendous patience. These efforts came at a significant cost compared to a simple Kubernetes upgrade, which requires relatively minimal effort (for example, only a restart of the instances). We completed the initial upgrade, started in December 2020, in approximately three months. Our latest upgrade occurred at the beginning of 2022 and was completed in less than two months. The documentation written during the first blue-green deployment upgrade was very helpful in these efforts!

Implementations of this magnitude and complexity confirmed that our technological choices were accurate. We recreated a complete infrastructure from scratch very quickly and anticipate that the next versions won’t have any breaking changes that complicate the upgrade process. We expect to safely do a rolling update that should last less than one month.
