
How to upgrade Amazon EKS worker nodes with Karpenter Drift

Introduction

Karpenter is an open-source cluster autoscaler that provisions right-sized nodes in response to unschedulable pods, based on aggregated CPU, memory, and volume requests and other Kubernetes scheduling constraints (e.g., affinities and pod topology spread constraints), which simplifies infrastructure management. When using Cluster Autoscaler as an alternative autoscaler, all Kubernetes nodes in a node group must have the same capacity (vCPU and memory) for autoscaling to work effectively. As a result, customers often run many node groups of different instance sizes, each backed by an Amazon EC2 Auto Scaling group, to meet the requirements of their workloads. As a workload continually evolves over time, its changing resource requirements can make picking right-sized Amazon Elastic Compute Cloud (Amazon EC2) instances challenging. In addition, because Karpenter doesn’t orchestrate capacity through external infrastructure such as node groups and Amazon EC2 Auto Scaling groups, it changes the operational processes used to keep worker node components and operating systems up to date with the latest security patches and features.

In this post, we’ll describe the mechanism for patching Kubernetes worker nodes provisioned with Karpenter through a gated Karpenter feature called Drift. If you have many worker nodes across multiple Amazon EKS clusters, then this mechanism can help you continuously patch at scale.

Solution overview

Karpenter node patching mechanisms

When Amazon EKS supports a new Kubernetes version, you can upgrade your Amazon Elastic Kubernetes Service (Amazon EKS) cluster control plane to the next version with a single API call. Upgrading the Kubernetes data plane involves updating the Amazon Machine Image (AMI) for the Kubernetes worker nodes. AWS releases AMIs for new Kubernetes versions as well as for patches and CVEs (Common Vulnerabilities and Exposures). You can choose from a wide variety of Amazon EKS-optimized AMIs, or you can use your own custom AMIs. Currently, the Karpenter AWSNodeTemplate resource supports the amiFamily values AL2, Bottlerocket, Ubuntu, Windows2019, Windows2022, and Custom. When an amiFamily of Custom is chosen, an amiSelector must be specified that tells Karpenter which custom AMIs to use. If no amiFamily is defined, then Karpenter defaults the amiFamily to AL2 and uses the Amazon EKS-optimized Linux AMI.
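
As a hedged illustration of how amiFamily is declared, the following minimal AWSNodeTemplate pins the Bottlerocket family. The template name and the discovery tag selectors are placeholders and are not part of this walkthrough:

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: bottlerocket-example        # placeholder name
spec:
  amiFamily: Bottlerocket           # omit this field to default to AL2
  subnetSelector:
    karpenter.sh/discovery: my-cluster      # placeholder tag selector
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster      # placeholder tag selector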

Karpenter uses Drift to upgrade Kubernetes nodes and replaces them through a rolling deployment. As nodes are de-provisioned, they are cordoned to prevent new pods from being scheduled, and pods are evicted using the Kubernetes Eviction API. The Drift mechanism works as follows:

Drift

For Kubernetes nodes provisioned with Karpenter that have drifted from their desired specification, Karpenter first provisions new nodes, then evicts pods from the old nodes, and finally terminates them. At the time of writing this post, the Drift interval is set to 5 minutes; however, if the Provisioner or AWSNodeTemplate is updated, then the Drift check is triggered immediately. Drift for AMIs has two behaviors: one when an AMI is specified by the user and one when it is not.

Drift with specified AMI values

You may consider this approach to control the promotion of AMIs through application environments for consistency. If you change the AMI(s) in the AWSNodeTemplate for a provisioner or associate a different node template with the provisioner, Karpenter detects that the existing worker nodes have drifted from the desired setting.

To trigger the upgrade, associate the new AMI(s) with the node template and Karpenter upgrades the worker nodes through a rolling deployment. AMIs can be specified explicitly by AMI ID, by AMI name, or even by specific tags. If multiple AMIs satisfy the criteria, then the latest AMI is chosen. You can track which AMIs are discovered by the AWSNodeTemplate from the amis values under its status field, for example by running kubectl describe on the AWSNodeTemplate. In certain scenarios, if both the old and the new AMIs are discovered by the AWSNodeTemplate, then running nodes with the old AMIs won’t be drifted, although new nodes are provisioned using the new AMI. To learn more about selecting AMIs in the node template, refer here.

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: amiwithid
spec:
  amiSelector:
    aws::ids: "ami-123"

Example 1 – Select AMIs by IDs

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: amiwithname
spec:
  amiSelector:
    aws::name: appA-ami
    aws::owners: ownerAccountID

Example 2 – Select AMIs named appA-ami that are owned by the account ownerAccountID
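
After applying a node template like the ones above, you can check which AMIs it has discovered in its status field. The commands below are an illustrative sketch using the amiwithid template from Example 1:

# Inspect the full node template, including the amis values under status
kubectl describe awsnodetemplate amiwithid

# Or extract just the discovered AMIs
kubectl get awsnodetemplate amiwithid -o jsonpath='{.status.amis}'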

Drift with no AMI specified

If no amiSelector is specified in the AWSNodeTemplate, then Karpenter monitors the SSM parameters published for the Amazon EKS-optimized AMIs. You can either specify an amiFamily (e.g., AL2, Bottlerocket, Ubuntu, etc.) for Karpenter to consider a specific AMI family, or leave it blank to default to AL2 (i.e., the Amazon EKS-optimized Amazon Linux AMI). Karpenter detects when a new AMI is released for the Kubernetes version of the cluster and drifts the existing nodes. The amis values under the status field of the AWSNodeTemplate reflect the newly discovered AMI. The drifted nodes are de-provisioned and replaced with worker nodes running the latest AMI. With this approach, nodes with older AMIs are recycled automatically (e.g., when a new AMI becomes available or after a Kubernetes control plane upgrade). With the previous approach of using amiSelector, you have more control over when the nodes are upgraded. Consider the difference and select the approach that suits your application. Karpenter currently doesn’t support custom SSM parameters.
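
To see the same signal Karpenter watches, you can query the public SSM parameter for the Amazon EKS-optimized Amazon Linux AMI yourself. This is an illustrative sketch; substitute your cluster’s Kubernetes version in the parameter path:

aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.25/amazon-linux-2/recommended/image_id \
  --query "Parameter.Value" --output text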

Walkthrough

We’ll walk through the following scenarios:

  1. Enabling the Karpenter Drift feature gate
  2. Automation of node upgrade with Drift
  3. Node upgrade with controlling promotion of AMIs

Prerequisites

You’ll need the following to complete the steps in this post:

  1. An existing Amazon EKS cluster. If you don’t have one, follow one of the methods described here to create a cluster (a minimal eksctl sketch follows this list).
  2. An existing deployment of the latest Karpenter version. Follow the getting started with Karpenter guide listed here to install Karpenter.
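
If you choose eksctl for the first prerequisite, a minimal command such as the following creates a cluster. The name, Region, and version are placeholders, and the getting started guide linked above covers the additional IAM roles and tags that Karpenter itself needs, so treat this as illustrative only:

eksctl create cluster --name karpenter-drift-demo --region us-west-2 --version 1.24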

We’ll first export the Amazon EKS cluster name to use throughout the walkthrough.

export CLUSTER_NAME=<your EKS cluster name>

Step 1. Enabling the Karpenter Drift feature gate

Drift is currently behind a feature gate. You can enable it in the karpenter-global-settings ConfigMap.

$ kubectl edit configmap -n karpenter karpenter-global-settings

Let’s find featureGates.driftEnabled and change the value from false to true.
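
If you prefer a non-interactive change, a merge patch like the following achieves the same result (a sketch using the featureGates.driftEnabled key shown above):

kubectl patch configmap karpenter-global-settings -n karpenter \
  --type merge -p '{"data":{"featureGates.driftEnabled":"true"}}'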

To apply the changed karpenter-global-settings ConfigMap configuration, you need to restart the Karpenter deployment:

$ kubectl rollout restart deploy karpenter -n karpenter

Step 2. Automate the worker node upgrade with Drift

First, we’ll create a default Karpenter Provisioner and a default AWSNodeTemplate with no amiSelector or amiFamily specified. When amiSelector isn’t specified, Karpenter matches the worker node AMI version to the Amazon EKS Kubernetes control plane version. As described in the overview section, when amiFamily isn’t set, the Amazon EKS-optimized Amazon Linux AMI is used.

mkdir -p ~/environment/karpenter
cd ~/environment/karpenter

cat <<EoF> basic.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default
    
  labels:
    team: my-team

  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r"]
    - key: "karpenter.k8s.aws/instance-generation"
      operator: Gt
      values: ["5"]

  limits:
    resources:
      cpu: "1000"
    
  ttlSecondsAfterEmpty: 60  
    
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  securityGroupSelector:
    alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    managed-by: "karpenter"
    intent: "apps"
EoF

kubectl apply -f basic.yaml

Note: Select your own subnets and security groups if your Amazon EKS cluster isn’t provisioned by eksctl. Refer to this page for more details on discovering subnets and security groups with the Karpenter AWSNodeTemplate.
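
For clusters not created by eksctl, one common pattern is to tag your own subnets and security groups with a discovery tag and select on that tag instead. The command below is a sketch: the resource IDs are placeholders, and the karpenter.sh/discovery key follows the convention used in the Karpenter getting started guide:

aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 sg-0123456789abcdef0 \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME

You would then replace the alpha.eksctl.io/cluster-name selectors in basic.yaml with karpenter.sh/discovery: $CLUSTER_NAME.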

Let’s deploy a sample deployment named inflate to scale out the worker nodes:

cd ~/environment/karpenter

cat <<EoF> sample-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
              memory: 128Mi
            limits:
              memory: 128Mi
      nodeSelector:
        team: my-team
EoF

kubectl apply -f sample-deploy.yaml

You can check the Karpenter logs to see that Karpenter found unschedulable (i.e., provisionable) pods, selected a number of Amazon EC2 instance types, and called the Amazon EC2 Fleet API to create a new worker node:

$ kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter

2023-08-03T00:25:19.114Z        INFO    controller.provisioner  found provisionable pod(s)      {"commit": "34d50bf-dirty", "pods": 2}
2023-08-03T00:25:19.114Z        INFO    controller.provisioner  computed new machine(s) to fit pod(s)   {"commit": "34d50bf-dirty", "machines": 1, "pods": 2}
2023-08-03T00:25:19.130Z        INFO    controller.provisioner  created machine {"commit": "34d50bf-dirty", "provisioner": "default", "requests": {"cpu":"2125m","pods":"4"}, "instance-types": "c6a.2xlarge, c6a.xlarge, c6i.2xlarge, c6i.xlarge, c6id.2xlarge and 27 other(s)"}
2023-08-03T00:25:19.286Z        DEBUG   controller.machine.lifecycle    created launch template {"commit": "34d50bf-dirty", "machine": "default-v45qb", "provisioner": "default", "launch-template-name": "karpenter.k8s.aws/4979454979243700410", "id": "lt-080a99f7b8d64f581"}
2023-08-03T00:25:21.125Z        INFO    controller.machine.lifecycle    launched machine        {"commit": "34d50bf-dirty", "machine": "default-v45qb", "provisioner": "default", "provider-id": "aws:///us-west-2a/i-071302eeaa77d43f6", "instance-type": "c6a.xlarge", "zone": "us-west-2a", "capacity-type": "on-demand", "allocatable": {"cpu":"3920m","ephemeral-storage":"17Gi","memory":"6584Mi","pods":"58"}}
2023-08-03T00:25:41.044Z        DEBUG   controller.machine.lifecycle    registered machine      {"commit": "34d50bf-dirty", "machine": "default-v45qb", "provisioner": "default", "provider-id": "aws:///us-west-2a/i-071302eeaa77d43f6", "node": "ip-192-168-27-17.us-west-2.compute.internal"}
2023-08-03T00:25:56.341Z        DEBUG   controller.machine.lifecycle    initialized machine     {"commit": "34d50bf-dirty", "machine": "default-v45qb", "provisioner": "default", "provider-id": "aws:///us-west-2a/i-071302eeaa77d43f6", "node": "ip-192-168-27-17.us-west-2.compute.internal"}

Next, check the AMI version of the newly deployed node. In this demonstration environment, the AMI version is v1.24:

$ kubectl get nodes -l team=my-team

NAME                                          STATUS   ROLES    AGE     VERSION
ip-192-168-27-17.us-west-2.compute.internal   Ready    <none>   4m41s   v1.24.15-eks-a5565ad

Now let’s check the Amazon EKS control plane version. In this walkthrough, we assume the control plane version matches the node version:

$ kubectl version --short

We’ll now upgrade the Amazon EKS control plane and validate that the worker node(s) are automatically updated to the new version that matches the control plane. You can use your preferred upgrade method, but we’ll use the AWS Command Line Interface (AWS CLI) as an example here. Replace the region-code with your own Region, and replace 1.25 with the Amazon EKS-supported version that you want to upgrade your cluster to. For best practices on Amazon EKS cluster upgrades, see the cluster upgrades section of the Amazon EKS Best Practices Guide.

$ aws eks update-cluster-version --region <region-code> --name $CLUSTER_NAME --kubernetes-version 1.25

Monitor the status of your cluster update with the following command. Use the update ID that the previous command returned, replacing <update-id> in the command below. When a Successful status is displayed, the upgrade is complete.

$ aws eks describe-update --region <region-code> --name $CLUSTER_NAME --update-id <update-id>

Once the upgrade is complete and the cluster status is back to Active, let’s check the Karpenter logs. You can see that Karpenter detected the drift, deprovisioned the drifted node, and replaced it with a new node.

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter | grep -i drift
 
2023-08-03T01:20:15.725Z        DEBUG   controller.machine.disruption   marking machine as drifted      {"commit": "34d50bf-dirty", "machine": "default-v45qb"}
2023-08-03T01:20:17.052Z        INFO    controller.deprovisioning       deprovisioning via drift replace, terminating 1 machines ip-192-168-27-17.us-west-2.compute.internal/c6a.xlarge/on-demand and replacing with on-demand machine from types m6a.2xlarge, r6a.2xlarge, r6id.2xlarge, c6in.xlarge, r6idn.2xlarge and 23 other(s)      {"commit": "34d50bf-dirty"}

Let’s check the AMI version of the node:

$ kubectl get nodes -l team=my-team

You’ll see that the v1.24 node’s status is Ready,SchedulingDisabled while the newly deployed v1.25 node is still NotReady.

$ kubectl get nodes -l team=my-team

NAME                                          STATUS                     ROLES    AGE   VERSION
ip-192-168-27-17.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   55m   v1.24.15-eks-a5565ad
ip-192-168-41-50.us-west-2.compute.internal   NotReady                   <none>   13s   v1.25.11-eks-a5565ad

After a few seconds, run kubectl get nodes -l team=my-team again to check that the new v1.25 node is Ready and the previous v1.24 node has been terminated.

$ kubectl get nodes -l team=my-team 
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-41-50.us-west-2.compute.internal   Ready    <none>   82s   v1.25.11-eks-a5565ad

Note: The actual amount of time for node upgrade varies by the environment.
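
Because the timing varies, you may find it convenient to watch the node list rather than re-running the command:

kubectl get nodes -l team=my-team --watch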

Step 3. Node upgrade with controlling promotion of AMIs

As we just saw, with the default settings (no amiSelector or amiFamily set), Karpenter Drift automatically upgrades the node AMI to the latest Amazon EKS-optimized Amazon Linux AMI when the Amazon EKS control plane is upgraded. However, there are use cases (e.g., promoting AMIs through environments) where you want more control over when to initiate the AMI update and which specific AMI to use. If you specify the AMI in the amiSelector (under AWSNodeTemplate), nodes are only updated when you explicitly change the AMI, rather than automatically following the control plane upgrade.

For this example, we use Ubuntu, but you could also consider a purpose-built container OS such as Bottlerocket. You can retrieve the AMI ID from Canonical at the following link.

Please note that the AMI ID varies depending on the Kubernetes version and the AWS Region being used.
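
As an alternative to looking up the ID manually, you could query Amazon EC2 for the latest matching image. This is a sketch only: the owner ID and name pattern are assumptions about how Canonical publishes its EKS images, so verify them against Canonical’s documentation:

aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu-eks/k8s_1.25/images/*" \
  --query "sort_by(Images, &CreationDate)[-1].{Id: ImageId, Name: Name}" \
  --output table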

cd ~/environment/karpenter

cat << EOF > ubuntu-nt.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: ubuntu-nt
spec:
  amiFamily: Ubuntu
  amiSelector:
    aws::ids: ami-0e4d51e5c9f7336cd
  subnetSelector:
    alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  securityGroupSelector:
    alpha.eksctl.io/cluster-name: $CLUSTER_NAME
  tags:
    managed-by: "karpenter"
    intent: "apps"
EOF

kubectl create -f ubuntu-nt.yaml

Note: Select your own subnets and security groups if your Amazon EKS cluster isn’t provisioned by eksctl. Refer to this page for more details on discovering subnets and security groups with the Karpenter AWSNodeTemplate.

Now, let’s edit the default provisioner to use this newly created AWSNodeTemplate, ubuntu-nt.

$ kubectl edit provisioner default

Search for providerRef under spec and change the name value from default to ubuntu-nt:

....
spec:
  labels:
    team: my-team
  limits:
    resources:
      cpu: "10"
  providerRef:
    name: ubuntu-nt
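
If you’d rather make this change non-interactively, a merge patch like the following should have the same effect (a sketch):

kubectl patch provisioner default --type merge \
  -p '{"spec":{"providerRef":{"name":"ubuntu-nt"}}}'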

Let’s check the Karpenter logs. You can see that Karpenter detected the drift, deprovisioned the node via drift replace, and provisioned a new node:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter | grep -i drift

2023-08-03T04:15:43.499Z        DEBUG   controller.machine.disruption   marking machine as drifted      {"commit": "34d50bf-dirty", "machine": "default-5jdcs"}
2023-08-03T04:15:53.480Z        INFO    controller.deprovisioning       deprovisioning via drift replace, terminating 1 machines ip-192-168-41-50.us-west-2.compute.internal/c6a.xlarge/on-demand and replacing with on-demand machine from types m6a.2xlarge, r6id.xlarge, r6a.xlarge, r6id.2xlarge, r6i.xlarge and 27 other(s)    {"commit": "34d50bf-dirty"} 

Let’s check the AMI version of the node:

$ kubectl get nodes -l team=my-team

You’ll see that the existing node running the Amazon EKS-optimized Linux v1.25 AMI (v1.25.11-eks-a5565ad) has a status of Ready,SchedulingDisabled while the newly deployed Ubuntu v1.25 node (v1.25.10) is still NotReady.

$ kubectl get nodes -l team=my-team

NAME                                          STATUS                     ROLES    AGE    VERSION
ip-192-168-41-50.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   176m   v1.25.11-eks-a5565ad
ip-192-168-7-119.us-west-2.compute.internal   NotReady                   <none>   4s     v1.25.10

After a few seconds, you can check that the new Ubuntu v1.25 node is Ready and the previous node with the Amazon EKS-optimized Linux v1.25 AMI has been terminated.

$ kubectl get nodes -l team=my-team

NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-7-119.us-west-2.compute.internal   Ready    <none>   44s   v1.25.10 

When using Karpenter, there are some additional design considerations that can help you achieve continuous operations:

  • Use Pod Topology Spread Constraints to spread workloads across fault domains for high availability – Similar to pod anti-affinity rules, pod topology spread constraints allow you to make your application available across different failure (or topology) domains like hosts or availability zones.
  • Consider Pod Readiness Gates – For workloads that receive traffic through an Elastic Load Balancer (ELB), consider using pod readiness gates to validate that pods are successfully registered to target groups before traffic shifts to them. See the Amazon EKS best practices guide for more information.
  • Consider Pod Disruptions Budgets – Use Pod disruption budgets to control the termination of pods during voluntary disruptions. Karpenter respects Pod disruption budgets (PDBs) by using a backoff retry eviction strategy.
  • Consider whether automatic AMI selection is the right approach – It is generally recommended to use the latest Amazon EKS-optimized AMIs; however, if you would like to control the rollout of AMIs across environments, then decide whether you’d let Karpenter pick the latest AMI or specify your own. By specifying your own AMI, you can control the promotion of AMIs through application environments.
  • Consider setting karpenter.sh/do-not-evict: "true" – For workloads that might not be interruptible (e.g., long-running batch jobs without checkpointing), consider annotating pods with the karpenter.sh/do-not-evict: "true" annotation. By opting pods out of eviction, you are telling Karpenter that it shouldn’t voluntarily remove nodes containing these pods. However, if a do-not-evict pod is added to a node while the node is draining, then the remaining pods still evict; that pod then blocks node termination until it is removed. In either case, the node is cordoned to prevent additional work from scheduling. A minimal example of a Pod disruption budget and the do-not-evict annotation follows this list.
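
The following sketch shows a PodDisruptionBudget for the inflate deployment used in this walkthrough, together with a hypothetical pod that opts out of voluntary eviction. The pod name and its image are illustrative only:

# PodDisruptionBudget: keep at least one inflate replica available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: inflate
---
# Hypothetical non-interruptible pod opted out of voluntary eviction by Karpenter
apiVersion: v1
kind: Pod
metadata:
  name: long-running-batch
  annotations:
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: worker
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7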

Cleaning up

To clean up the resources created, you can execute the following steps:

  1. Delete the Karpenter provisioner to deprovision the nodes, and clean up the node templates and the sample application:
    1. kubectl delete -f basic.yaml
    2. kubectl delete -f ubuntu-nt.yaml
    3. kubectl delete -f sample-deploy.yaml
  2. If you created a new Amazon EKS cluster for this walkthrough, then don’t forget to clean up its resources, or you will incur costs.

Conclusion

For customers with many Kubernetes clusters and node groups, adopting Karpenter simplifies infrastructure management. In this post, we described approaches for upgrading and patching Kubernetes nodes using the Karpenter feature called Drift. These patching strategies reduce undifferentiated heavy lifting and help you patch worker nodes at scale by moving from a point-in-time strategy to a continuous mechanism. The Karpenter Drift feature is still evolving; for the latest information, check out the Karpenter documentation.

Note: Karpenter is preparing to move to the v1beta1 API, described here. The Drift behavior itself won’t change with v1beta1, but there will be additional capabilities and CRD name changes; this post will be updated accordingly after they ship.

If you would like to learn more, come and discuss Karpenter in the #karpenter channel in the Kubernetes Slack or join the Karpenter working group calls.

To get hands-on experience, check out the Karpenter workshop.

Rajdeep Saha

Rajdeep Saha is a Principal Solutions Architect for Serverless and Containers at Amazon Web Services (AWS). He helps customers design scalable and secure applications on AWS. Rajdeep is passionate about helping and teaching newcomers about cloud computing. He is based out of New York City.

Ratnopam Chakrabarti

Ratnopam Chakrabarti is a Specialist Solutions Architect for Containers and Infrastructure modernization at Amazon Web Services (AWS). In his current role, Ratnopam helps AWS customers accelerate their cloud adoption and run secure and optimized container workloads at scale. You can connect with him on LinkedIn at https://www.linkedin.com/in/ratnopam-chakrabarti/.

Chance Lee

Chance Lee is a Sr. Container Specialist Solutions Architect at AWS based in the Bay Area. He helps customers architect highly scalable and secure container workloads with AWS container services and various ecosystem solutions. Prior to joining AWS, Chance was an IBM Lab Services consultant.

Robert Northard

Robert Northard is a Sr. Containers Specialist Solutions Architect at AWS. He has expertise in Container Technologies and DevOps practices.