
Seamlessly migrate workloads from EKS self-managed node groups to EKS managed node groups

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane. When Amazon EKS was made generally available in 2018, it supported self-managed node groups. With self-managed node groups, customers are responsible for configuring the Amazon Elastic Compute Cloud (Amazon EC2) instances, attaching them to the EKS cluster, and managing the lifecycle of these worker nodes. In 2019, support for managed node groups with EKS-optimized AMIs was announced; with managed node groups, EKS handles the provisioning and lifecycle management of the worker nodes. To further simplify the use of managed node groups, support for custom AMIs with managed node groups was announced in 2020.

Many customers are currently running their workloads on EKS self-managed node groups and want to migrate them to EKS managed node groups to further simplify their operations. In this blog post, I am going to discuss:

  • Benefits of using EKS managed node groups
  • Zero-downtime migration from self-managed to managed node groups
  • Simplified EKS upgrade process with managed node groups

Benefits of using EKS managed node groups

  1. Managed node groups automate the provisioning and lifecycle management of EKS worker nodes. The managed nodes are always deployed with an Amazon EC2 Auto Scaling group and a launch template, which provides the flexibility to use either the Amazon EKS-optimized AMI or a custom AMI.
  2. Customers can create their own hardened AMIs based on their security requirements and then use launch templates to specify those custom AMIs in their managed node groups (see the sketch after this list).
  3. If the Amazon EKS-optimized AMI is used with managed node groups, an updated version of kubelet or any other critical node component first results in an updated EKS-optimized AMI. Once the updated AMI is available, EKS alerts the customer through the EKS console, and the customer can choose when to apply the update to their worker nodes.
  4. If Spot Instances are used with managed node groups, EKS handles Spot interruptions gracefully. When a Spot Instance receives its two-minute interruption notice, EKS drains the node so that the running pods are rescheduled onto other active instances, avoiding abrupt termination of pods when Spot capacity is reclaimed.
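
For point 2, a managed node group that uses a custom AMI is created by referencing a launch template whose AMI ID points to the hardened image. The following is only a sketch; the launch template ID, subnets, and node role ARN are placeholders, not values from this post:

# Illustrative only: create a managed node group from a launch template that
# references a custom (for example, hardened) AMI. All IDs and names are placeholders.
aws eks create-nodegroup \
  --cluster-name <<CLUSTER-NAME>> \
  --nodegroup-name <<MANAGED-NODE-GROUP-NAME>> \
  --launch-template id=<<LAUNCH-TEMPLATE-ID>>,version=1 \
  --subnets <<SUBNET-ID-1>> <<SUBNET-ID-2>> \
  --node-role <<NODE-INSTANCE-ROLE-ARN>>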

Zero-downtime migration from self-managed to managed node groups

A common EKS architecture is one in which customers run stateless workloads on EKS self-managed node groups and use the AWS Load Balancer Controller to expose the required applications through an Application Load Balancer (ALB). The following diagram depicts a snapshot of an EKS cluster with two EC2 instances in a self-managed node group.

Figure 1: EKS Cluster with two EC2 instances, along with the core components of the nodes

Name | Description | Type
Application pod | Containerized applications that are deployed on EKS | Deployment
coredns | DNS service for EKS | Deployment
aws-node | Amazon VPC Container Network Interface (CNI) plugin | DaemonSet
kube-proxy | Enables network communication for the pods | DaemonSet
ALB Controller | Manages AWS Elastic Load Balancers for a Kubernetes cluster | Deployment

Zero-downtime migration of the workload to managed node groups can be achieved using the following steps:

1. Create a managed node group. When it is created, the DaemonSets (aws-node and kube-proxy) will be available on the new nodes by default.
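
As one example of this step, assuming the cluster was created with eksctl (which the node labels used in the next step suggest), a managed node group could be created as follows; the instance type and node counts are placeholders to be adjusted to match the capacity of the existing self-managed node group:

# Illustrative only: create a managed node group with eksctl.
eksctl create nodegroup \
  --cluster <<CLUSTER-NAME>> \
  --name <<MANAGED-NODE-GROUP-NAME>> \
  --managed \
  --node-type m5.large \
  --nodes 2 --nodes-min 2 --nodes-max 4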

2. Taint all the nodes in the existing self-managed node group with “NoSchedule.” This instructs the Kubernetes scheduler not to schedule new pods onto these nodes. The existing pods will continue to run and serve traffic.

Self-managed nodes created by eksctl carry the node group name as a label. To apply the taint to all the nodes that are part of the self-managed node group, you can use the following command:

kubectl taint node -l "alpha.eksctl.io/nodegroup-name"="<<SELF-MANAGED-NODE-GROUP-NAME>>" key=value:NoSchedule

3. Scale the application to increase the number of replicas so that application pods are scheduled on the new managed nodes. Because the self-managed nodes were tainted in the previous step, the new pods will be placed on the managed nodes. This keeps the application running and serving requests throughout the migration. After the number of replicas is increased, the application will have pods running in both the self-managed and managed node groups, and the Kubernetes Service will load balance requests across all of them.

How many replicas to add depends on the application and its traffic pattern during the migration. For production applications, it is recommended to increase the replicas by 100 percent during the migration so that users of the application are not impacted by a reduced number of pods serving requests.

kubectl scale deployments/<<DEPLOYMENT-NAME>> --replicas=10

4. Next, scale the deployments in the “kube-system” namespace so that they also run on the managed nodes. The two deployments of interest in this setup are “coredns” and “aws-load-balancer-controller.” In the initial setup, both deployments have two replicas; increasing the replica count schedules additional pods on the managed nodes. Now all the components required for the application to run successfully are also present on the managed nodes.

kubectl scale deployments/aws-load-balancer-controller --replicas=4  -n kube-system

kubectl scale deployments/coredns --replicas=4 -n kube-system

5. Verify that the new pods of the application, coredns, and aws-load-balancer-controller are running:

kubectl get replicasets -A
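
To also confirm where these pods are running, you can list the pods with their node placement; the NODE column should show the new managed nodes for the scaled-out pods:

# Show pod-to-node placement across all namespaces.
kubectl get pods -A -o wide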

6. Drain the self-managed nodes and then remove the self-managed node group.

kubectl drain -l "alpha.eksctl.io/nodegroup-name"="<<SELF-MANAGED-NODE-GROUP-NAME>>" --ignore-daemonsets --delete-emptydir-data  
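
After the nodes are drained, the self-managed node group itself can be removed. As a sketch, assuming the node group was created with eksctl:

# Illustrative only: delete the drained self-managed node group.
eksctl delete nodegroup \
  --cluster <<CLUSTER-NAME>> \
  --name <<SELF-MANAGED-NODE-GROUP-NAME>>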

7. Scale the deployments back in to the required capacity.

kubectl scale deployments/<<DEPLOYMENT-NAME>> --replicas=5

kubectl scale deployments/aws-load-balancer-controller --replicas=2  -n kube-system

kubectl scale deployments/coredns --replicas=2 -n kube-system

Simplified EKS upgrade process with managed node groups

When you initiate the upgrade process for a managed node group, EKS will gracefully upgrade the nodes. If the managed node group is using the EKS-optimized AMI, EKS automatically applies the operating system updates and security patches to your nodes.

The complete EKS upgrade is a two-step process:

  1. The EKS cluster control plane components are upgraded to the new Kubernetes version.
  2. The worker nodes in the managed node group are updated.

When a newer version of Kubernetes is available for your EKS cluster, you will be notified in the console. During the update process, EKS launches new control plane nodes with the updated Kubernetes version and synchronizes data, such as the “etcd” data, to the new nodes. Once the new control plane components are launched, EKS performs readiness health checks to verify that they are working as expected. If any of these health checks fail, EKS reverts the update, and the cluster remains on the previous Kubernetes version. The workloads running inside the EKS cluster are not impacted by the upgrade process. However, newer Kubernetes versions might include significant changes, so the application manifest files might have to be updated to remain compatible with the newer Kubernetes version.
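
Before initiating an upgrade, it can be helpful to confirm the current control plane version, for example:

# Show the current Kubernetes version of the EKS control plane.
aws eks describe-cluster --name <<CLUSTER-NAME>> --query cluster.version --output text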

You can initiate the upgrade process in the console or through the CLI.

Open the EKS console home page and choose the cluster to upgrade:

Screenshot of UI homepage choosing cluster

Select the Update button.

Screenshot of UI showing updating to version 1.21

This will provide you the option to select the Kubernetes version to upgrade the cluster to. Because EKS runs a highly available control plane, you can update only one minor version at a time. Note that the upgrade process cannot be reversed. Select Update, and the upgrade process will begin.

From the CLI:

aws eks update-cluster-version --name <<CLUSTER-NAME>> --kubernetes-version 1.21

Once the upgrade process begins, the status of the EKS cluster changes from Active to Updating. Wait for the upgrade process to complete; the status of the cluster will then return to Active.
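
From the CLI, you can wait for the update to finish or check the status directly, for example:

# Block until the control plane update has completed and the cluster is Active again.
aws eks wait cluster-active --name <<CLUSTER-NAME>>

# Or inspect the current status.
aws eks describe-cluster --name <<CLUSTER-NAME>> --query cluster.status --output text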

The second step of the upgrade process is to upgrade the EKS managed node groups. After the EKS cluster control plane has been updated, you will see an update notification for the managed node groups.

The node groups can also be updated either from the console or from the command line.

When updating from the console, select Update, which will open a window as in the following image, then choose Update.

Screenshot showing updating from the console

When updating from the CLI, execute the following command:

aws eks update-nodegroup-version --cluster-name <<CLUSTER-NAME>> --nodegroup-name <<MANAGED-NODE-GROUP-NAME>> --kubernetes-version 1.21 --force

When you choose the rolling update strategy, EKS upgrades the managed nodes by incrementally replacing the old nodes with new ones running the updated AMI. Amazon EKS attempts to drain the nodes gracefully and will fail if it is unable to do so. You can force the update (by using the --force flag) if Amazon EKS is unable to drain the nodes because of a pod disruption budget issue. During the update, EKS updates the Auto Scaling group to use the latest launch template with the new AMI and increments the Auto Scaling group's maximum and desired sizes. Once the new managed nodes are up, the pods on the old nodes are evicted and scheduled onto the new nodes, and the old managed nodes are deleted after all of their pods have been evicted. The pods then run on the new nodes that are part of the managed node group. While EKS manages the worker node upgrade process, you are responsible for testing the process to ensure the availability of your applications during a rolling update.
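
To follow the node group update from the CLI, you can wait for the node group to become active again and then confirm that every node reports the new version, for example:

# Block until the managed node group update has completed.
aws eks wait nodegroup-active --cluster-name <<CLUSTER-NAME>> --nodegroup-name <<MANAGED-NODE-GROUP-NAME>>

# The VERSION column should now show the upgraded Kubernetes version on every node.
kubectl get nodes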

Summary

As we have observed in this blog post, using managed node groups removes the undifferentiated heavy lifting of managing and updating the worker nodes. Managed node groups automate the provisioning and lifecycle management of worker nodes in an EKS cluster, which greatly simplifies operational activities such as new Kubernetes version deployments and rolling updates for new AMIs.

A GitHub repository that contains all the sample code for migrating to EKS managed node groups will be made available soon.