Managing Pod Scheduling Constraints and Groupless Node Upgrades with Karpenter in Amazon EKS

Feb 2024: This blog has been updated for Karpenter version v0.33.1 and v1beta1 specification.

About Karpenter

Karpenter is an open-source node lifecycle management project built for Kubernetes. It observes the aggregate resource requests of unschedulable pods and decides when to launch new nodes and when to terminate them, sending commands to the underlying cloud provider, in order to reduce scheduling latency and infrastructure cost. Karpenter launches nodes with just enough compute resources to fit the unschedulable pods for efficient bin-packing, and it works in tandem with the Kubernetes scheduler to bind those pods to the newly provisioned nodes.

Diagram of pods using Karpenter to optimize capacity

Why Karpenter

Before Karpenter, Kubernetes users relied on Amazon EC2 Auto Scaling groups and the Kubernetes Cluster Autoscaler to dynamically adjust the compute capacity of their clusters. One challenge with Cluster Autoscaler is deployment latency: many pods must wait for a node group to scale up before they can be scheduled. Cluster Autoscaler does not bind pods to nodes; scheduling decisions are left to the kube-scheduler, so pods wait until the new nodes become available, which can take several minutes and increases pod scheduling latency for critical workloads.

One of the main objectives of Karpenter is to simplify capacity management. If you are familiar with other autoscalers, you will notice that Karpenter takes a different approach, referred to as group-less autoscaling. Traditionally, the node group has been the element of control that defines the characteristics of the capacity provided (for example On-Demand, EC2 Spot, or GPU nodes) and that controls the desired scale of the group in the cluster. In AWS, a node group is implemented as an Auto Scaling group. Over time, clusters that follow this paradigm and run different types of applications requiring different capacity types end up with a complex configuration and operational model in which node groups must be defined and provisioned in advance.

Configuring Nodepools

Karpenter’s job is to add nodes to handle unschedulable pods (pods with the status condition Unschedulable=True set by the kube-scheduler), schedule pods on those nodes, and remove the nodes when they are not needed. To configure Karpenter, you create nodepools that define how Karpenter manages unschedulable pods and expires nodes.

A NodePool sets constraints on the nodes that Karpenter can create and on the pods that can run on those nodes.

Pods can also request nodes by instance type, architecture, operating system, or other attributes by adding scheduling constraints to their Kubernetes pod specs. These pod scheduling constraints (resource requests, node selection, node affinity, and topology spread) must fall within the NodePool's constraints for the pods to be deployed on Karpenter-provisioned nodes; if they do not, the pods will not be scheduled.

In many scenarios a single NodePool can satisfy all requirements when its constraints are combined with pod scheduling constraints. This also covers cases where different teams have different constraints for running their workloads (for example, one team can use only nodes in a specific Availability Zone while another uses Arm64 nodes), as well as billing separation, different deprovisioning requirements, and so on.
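The idea that a pod's constraints must fall within the NodePool's constraints can be pictured with a toy check. This is only an illustrative sketch, not how Karpenter is implemented; the helper name in_values is hypothetical and models a single requirement with the In operator.

```shell
# in_values VALUE "v1,v2,...": succeed when VALUE is in the comma-separated
# list, mimicking one NodePool requirement that uses the "In" operator.
in_values() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# The NodePool allows amd64 and arm64, so a pod asking for arm64 fits.
if in_values arm64 "amd64,arm64"; then
  echo "arm64: within the NodePool constraints"
fi

# A pod asking for a value the NodePool does not list cannot be placed.
if ! in_values s390x "amd64,arm64"; then
  echo "s390x: outside the NodePool constraints, pod stays Pending"
fi
```

Every constraint on the pod has to pass a check like this against the NodePool before Karpenter will launch a node for it.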

Use cases for Nodepool Constraints

With Karpenter layered constraints, you can be sure that the precise type and amount of resources needed are available to your pods.

However, when a workload needs a specific instance type, Availability Zone, or similar attribute, you can tighten the constraints defined in a NodePool by adding scheduling constraints to the pod spec.

Below are some use cases for NodePool scheduling constraints, where specific requirements in the NodePool and the pod spec determine which nodes Karpenter launches for the unschedulable pods.

  1. Needing to run on a specific instance type in zones where dependent applications or storage are available
  2. Requiring certain kinds of processors or other hardware

Upgrading nodes

A straightforward way to upgrade nodes is to set spec.disruption.expireAfter: nodes are terminated after the configured period and replaced with newer nodes. The recommended method for patching your Kubernetes worker nodes, however, is Drift; see the blog post How to upgrade Amazon EKS worker nodes with Karpenter Drift. For more details, see Karpenter Disruption.
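As a small illustration of the expiry setting, the sketch below converts a node lifetime in days into the hour-based duration string and prints the matching NodePool fragment. The helper name days_to_expire_after is hypothetical, and the 30-day lifetime is only an example value.

```shell
# Convert a node lifetime in days into the hour-based duration string
# used by spec.disruption.expireAfter (30 days -> 720h).
days_to_expire_after() {
  echo "$(( $1 * 24 ))h"
}

# Print the NodePool fragment for a 30-day node lifetime.
cat <<EOF
spec:
  disruption:
    expireAfter: $(days_to_expire_after 30)
EOF
```

A shorter lifetime recycles nodes more often at the cost of more pod churn, so the value is a trade-off rather than a fixed recommendation.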


In this section, you will provision an EKS cluster, deploy Karpenter, deploy a sample application, and observe node scaling with Karpenter. You will also deploy pods whose scheduling constraints fit within the NodePool's requirements, as needed when different application workloads or teams require different instance capacity.



Karpenter Deployment Tasks

1) Set the following environment variables:

export KARPENTER_NAMESPACE=kube-system
export KARPENTER_VERSION=v0.33.1
export K8S_VERSION=1.27

export AWS_PARTITION="aws" # if you are not using standard partitions, you may need to configure to aws-cn / aws-us-gov
export CLUSTER_NAME="karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT=$(mktemp)


2) Create an Amazon EKS Cluster and IAM Role for KarpenterController

  • Create a cluster with eksctl. This example configuration file specifies a basic cluster with one initial node and sets up an IAM OIDC provider for the cluster to enable IAM roles for pods
curl -fsSL"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml  > $TEMPOUT \
&& aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
  withOIDC: true
  podIdentityAssociations:
  - namespace: "${KARPENTER_NAMESPACE}"
    serviceAccountName: karpenter
    roleName: ${CLUSTER_NAME}-karpenter
    permissionPolicyARNs:
    - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}

## Optionally run on fargate or on k8s 1.23
# Pod Identity is not available on fargate
# iam:
#   withOIDC: true
#   serviceAccounts:
#   - metadata:
#       name: karpenter
#       namespace: "${KARPENTER_NAMESPACE}"
#     roleName: ${CLUSTER_NAME}-karpenter
#     attachPolicyARNs:
#     - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
#     roleOnly: true

iamIdentityMappings:
- arn: "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
  ## If you intend to run Windows workloads, the kube-proxy group should be specified.
  # For more information, see
  # - eks:kube-proxy-windows

managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${CLUSTER_NAME}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10

addons:
- name: eks-pod-identity-agent

## Optionally run on fargate
# fargateProfiles:
# - name: karpenter
#  selectors:
#  - namespace: "${KARPENTER_NAMESPACE}"
EOF
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"

3) Install the Karpenter Helm chart

# Logout of helm registry to perform an unauthenticated pull against the public ECR
helm registry logout

helm upgrade --install karpenter oci:// --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi

Deploy the nodepool and application pods with layered constraints

Deploy the below Karpenter nodepool spec that has the following requirements:

  1. Architecture type (arm64 & amd64)
  2. Capacity type (Spot & On-demand)
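cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
EOF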

Run the application deployment on a specific capacity, instance type, hardware and availability zone using Pod scheduling constraints

  • The sample Deployment below uses a nodeSelector to request a specific instance type, the on-demand capacity type, a specific Availability Zone, and the arm64 architecture, so that Karpenter launches new nodes that satisfy these pod scheduling constraints.
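cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: r6gd.xlarge
        karpenter.sh/capacity-type: on-demand
        topology.kubernetes.io/zone: us-west-2a
        kubernetes.io/arch: arm64
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          resources:
            requests:
              cpu: 1
EOF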
  • Scale the Deployment to trigger node scaling. Karpenter selects capacity matching the constraints above through the EC2 CreateFleet API and launches it for the application pods.
kubectl scale deployment inflate --replicas 3
  • Review the Karpenter pod logs for events and more details.
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller
  • Example snippet of the logs.
eksadmin:~/environment $ kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller
{"level":"INFO","time":"2024-01-25T07:20:01.654Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"a70b39e","pods":"default/inflate-79c97d78f9-84hsw, default/inflate-79c97d78f9-fjhz6, default/inflate-79c97d78f9-jt4hj","duration":"12.0951ms"}
{"level":"INFO","time":"2024-01-25T07:20:01.654Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"a70b39e","nodeclaims":1,"pods":3}
{"level":"INFO","time":"2024-01-25T07:20:01.668Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"a70b39e","nodepool":"default","nodeclaim":"default-xqjhj","requests":{"cpu":"3150m","pods":"6"},"instance-types":"r6gd.xlarge"}
{"level":"INFO","time":"2024-01-25T07:20:04.328Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"a70b39e","nodeclaim":"default-xqjhj","provider-id":"aws:///us-west-2a/i-04a71190d4888e0e3","instance-type":"r6gd.xlarge","zone":"us-west-2a","capacity-type":"on-demand","allocatable":{"cpu":"3920m","ephemeral-storage":"17Gi","memory":"29258Mi","pods":"58","":"18"}}
{"level":"INFO","time":"2024-01-25T07:20:39.532Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"a70b39e","nodeclaim":"default-xqjhj","provider-id":"aws:///us-west-2a/i-04a71190d4888e0e3","node":""}

Validate the nodes and application pods with the commands below; the pods should be in the Running state.

kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type

kubectl get pods -o wide
  • Example snippet of the Node output and Pods output.
eksadmin:~/environment $ kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
NAME                                           STATUS   ROLES    AGE    VERSION               INSTANCE-TYPE   ARCH    CAPACITY-TYPE
                                               Ready    <none>   97s    v1.27.9-eks-5e0fdde   r6gd.xlarge     arm64   on-demand
                                               Ready    <none>   102m   v1.27.9-eks-5e0fdde   m5.large        amd64
                                               Ready    <none>   102m   v1.27.9-eks-5e0fdde   m5.large        amd64
eksadmin:~/environment $ 
eksadmin:~/environment $ kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP                NODE                                           NOMINATED NODE   READINESS GATES
inflate-79c97d78f9-84hsw   1/1     Running   0          2m15s   <none>           <none>
inflate-79c97d78f9-fjhz6   1/1     Running   0          2m15s   <none>           <none>
inflate-79c97d78f9-jt4hj   1/1     Running   0          2m15s   <none>           <none>
eksadmin:~/environment $

The demonstration above shows how Karpenter applies layered constraints to launch nodes that satisfy multiple scheduling constraints of a workload, such as instance type, Availability Zone, and hardware architecture.

Groupless Node upgrades

As mentioned earlier, when an EKS cluster uses node groups (self-managed or managed), upgrading the worker nodes to a newer Kubernetes version means either migrating to a new node group (self-managed) or launching new worker nodes through the node group's Auto Scaling group (managed), as described in Managed nodegroup upgrade behaviour. With Karpenter's group-less autoscaling, node upgrades are instead driven by Drift.

Drift handles changes to the NodePool and EC2NodeClass. Values in the NodePool and EC2NodeClass are reflected in the NodeClaimTemplateSpec and EC2NodeClassSpec in the same way that they are set, and nodes that no longer match these values are considered drifted. Karpenter uses Drift to upgrade Kubernetes nodes and replaces them in a rolling fashion. With Karpenter v0.33.x the Drift feature gate is enabled by default, so node upgrades follow the Drift workflow.
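One way to watch Drift in action is to list the NodeClaims that Karpenter has marked as drifted. The sketch below is a hedged example: the function name drifted_nodeclaims is hypothetical, and it assumes kubectl points at the cluster and that NodeClaims expose a status condition of type Drifted, as in the v1beta1 API.

```shell
# List NodeClaims whose Drifted status condition is currently True.
# Assumes kubectl is configured for the cluster running Karpenter.
drifted_nodeclaims() {
  kubectl get nodeclaims -o \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Drifted")].status}{"\n"}{end}' |
    awk '$2 == "True" { print $1 }'
}
```

Calling drifted_nodeclaims shortly after a control plane upgrade should print the NodeClaims that are about to be replaced; an empty result means nothing is currently marked as drifted.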

Note: Karpenter supports custom AMIs. You can specify amiSelectorTerms in the EC2NodeClass; doing so fully overrides the default AMIs that would otherwise be selected by the EC2NodeClass amiFamily.
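As a sketch of the custom AMI option, the helper below builds a JSON merge patch that selects a single AMI by ID through amiSelectorTerms. The function name ami_selector_patch and the AMI ID are hypothetical placeholders; substitute an AMI that exists in your account and Region.

```shell
# Build a JSON merge patch that pins an EC2NodeClass to one AMI ID.
ami_selector_patch() {
  printf '{"spec":{"amiSelectorTerms":[{"id":"%s"}]}}\n' "$1"
}

# Placeholder AMI ID, for illustration only.
ami_selector_patch "ami-0123456789abcdef0"

# To apply it to the EC2NodeClass named default:
#   kubectl patch ec2nodeclass default --type merge \
#     -p "$(ami_selector_patch ami-0123456789abcdef0)"
```

Because this changes the EC2NodeClass, existing nodes built from a different AMI would be expected to show up as drifted and be replaced.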

  • Check the current Kubernetes version of the EKS cluster with the command below.
aws eks describe-cluster --name ${CLUSTER_NAME}  | grep -i version
  • Example snippet of the above command.
eksadmin:~/environment $ aws eks describe-cluster --name ${CLUSTER_NAME}  | grep -i version
        "version": "1.27",
        "platformVersion": "eks.11",
            "": "0.167.0",
eksadmin:~/environment $ 
  • Deploy a PodDisruptionBudget for your application Deployment. A PodDisruptionBudget (PDB) limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions.
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: inflate
EOF

Note: With a PDB you can set minAvailable or maxUnavailable as an integer or as a percentage. Refer to the Kubernetes documentation on pod disruptions and how to configure them for more details.
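To illustrate the percentage form, the sketch below renders a PDB manifest for the inflate Deployment with either field. The function name render_pdb and the 25% value are assumptions for illustration; a PDB accepts only one of minAvailable or maxUnavailable at a time.

```shell
# Print a PodDisruptionBudget manifest for the inflate Deployment.
# $1 = field name (minAvailable or maxUnavailable)
# $2 = an integer or a percentage such as 25%
render_pdb() {
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  $1: $2
  selector:
    matchLabels:
      app: inflate
EOF
}

# Render the percentage variant; pipe it to "kubectl apply -f -" to use it.
render_pdb maxUnavailable 25%
```

Reusing the name inflate-pdb keeps a single budget on the application rather than adding a second one that selects the same pods.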

Example output for the above PDB and the sample application Deployment configured in the earlier section:

eksadmin:~/environment $ kubectl get pdb
NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
inflate-pdb   2               N/A               1                     9s
eksadmin:~/environment $ 
eksadmin:~/environment $ kubectl get deploy inflate
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
inflate   3/3     3            3           12m
eksadmin:~/environment $ 
  • Upgrade the EKS cluster to a newer Kubernetes version through the console or eksctl, as described in the EKS documentation.
    • The output below shows that the cluster was upgraded successfully to 1.28.
eksadmin:~/environment $ aws eks describe-cluster --name ${CLUSTER_NAME}  | grep -i version
        "version": "1.28",
        "platformVersion": "eks.7",
            "": "0.167.0",
eksadmin:~/environment $ 
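For the eksctl route, the upgrade step can be sketched as follows. This is a hedged example: the helper name next_minor is hypothetical, it assumes the control plane is upgraded one minor version at a time, and it only prints the eksctl command so that you can review it before running it.

```shell
# Compute the next Kubernetes minor version, for example 1.27 -> 1.28.
next_minor() {
  major=${1%%.*}
  minor=${1#*.}
  echo "${major}.$(( minor + 1 ))"
}

# Fall back to the values used in this walkthrough when the variables are unset.
target=$(next_minor "${K8S_VERSION:-1.27}")

# Print the control plane upgrade command instead of executing it.
echo "eksctl upgrade cluster --name \"${CLUSTER_NAME:-karpenter-demo}\" --version ${target} --approve"
```

Only the control plane is upgraded by this command; the nodes launched by Karpenter are replaced afterwards through Drift.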
  • Validate the nodes and application pods with the commands below. The node launched by Karpenter is now at 1.28, the same Kubernetes version as the EKS cluster.
kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type

kubectl get pods -o wide

kubectl get nodeclaims

Checking the workload and the node replaced through Drift, we can see that the new node runs version 1.28, because Karpenter used the latest EKS optimized AMI for the new EKS cluster version, 1.28. The Drift events are visible in the Karpenter controller logs.

eksadmin:~/environment $ kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
NAME                                           STATUS   ROLES    AGE    VERSION               INSTANCE-TYPE   ARCH    CAPACITY-TYPE
                                               Ready    <none>   159m   v1.27.9-eks-5e0fdde   m5.large        amd64
                                               Ready    <none>   159m   v1.27.9-eks-5e0fdde   m5.large        amd64
                                               Ready    <none>   44m    v1.28.5-eks-5e0fdde   r6gd.xlarge     arm64   on-demand
eksadmin:~/environment $ 
eksadmin:~/environment $ kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP               NODE                                           NOMINATED NODE   READINESS GATES
inflate-79c97d78f9-6csvw   1/1     Running   0          124m   <none>           <none>
inflate-79c97d78f9-9pjvc   1/1     Running   0          124m   <none>           <none>
inflate-79c97d78f9-w9fg2   1/1     Running   0          124m   <none>           <none>
eksadmin:~/environment $ 
eksadmin:~/environment $ kubectl get nodeclaims
NAME            TYPE          ZONE         NODE                                           READY   AGE
default-lmxcb   r6gd.xlarge   us-west-2a   True    47m
eksadmin:~/environment $ 
  • Review the Karpenter controller pod logs for events and more details.
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller
  • Example snippet of the logs.
{"level":"INFO","time":"2024-01-25T07:34:23.935Z","logger":"controller.disruption","message":"disrupting via drift replace, terminating 1 candidates and replacing with on-demand node from types r6gd.xlarge","commit":"a70b39e"}
{"level":"INFO","time":"2024-01-25T07:34:24.003Z","logger":"controller.disruption","message":"created nodeclaim","commit":"a70b39e","nodepool":"default","nodeclaim":"default-lmxcb","requests":{"cpu":"3150m","pods":"6"},"instance-types":"r6gd.xlarge"}
{"level":"INFO","time":"2024-01-25T07:34:25.893Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"a70b39e","nodeclaim":"default-lmxcb","provider-id":"aws:///us-west-2a/i-07c00e23630c2bf58","instance-type":"r6gd.xlarge","zone":"us-west-2a","capacity-type":"on-demand","allocatable":{"cpu":"3920m","ephemeral-storage":"17Gi","memory":"29258Mi","pods":"58","":"18"}}
{"level":"INFO","time":"2024-01-25T07:35:00.487Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"a70b39e","nodeclaim":"default-lmxcb","provider-id":"aws:///us-west-2a/i-07c00e23630c2bf58","node":""}
{"level":"INFO","time":"2024-01-25T07:35:10.361Z","logger":"controller.node.termination","message":"tainted node","commit":"a70b39e","node":""}
{"level":"INFO","time":"2024-01-25T07:35:20.387Z","logger":"controller.node.termination","message":"deleted node","commit":"a70b39e","node":""}
{"level":"INFO","time":"2024-01-25T07:35:20.744Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"a70b39e","nodeclaim":"default-xqjhj","node":"","provider-id":"aws:///us-west-2a/i-04a71190d4888e0e3"}

Note: In the above logs, we can see that Karpenter detected the node as drifted, launched a new node from the latest EKS optimized AMI for 1.28 for the workload, and then cordoned, drained, and deleted the old node.
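To follow a sequence like this without reading the full JSON, the log lines can be reduced to a timestamp and a message. The sketch below uses only sed; the function name karpenter_messages is hypothetical, and it assumes each log line is a single JSON object with "time" appearing before "message", as in the snippet above.

```shell
# Reduce Karpenter JSON log lines on stdin to "<time>  <message>".
# Lines that do not contain both fields are dropped.
karpenter_messages() {
  sed -n 's/.*"time":"\([^"]*\)".*"message":"\([^"]*\)".*/\1  \2/p'
}

# Example with one of the log lines shown above.
karpenter_messages <<'EOF'
{"level":"INFO","time":"2024-01-25T07:35:20.387Z","logger":"controller.node.termination","message":"deleted node","commit":"a70b39e","node":""}
EOF
```

The same filter can be placed at the end of the kubectl logs pipeline for the controller container to watch the replace, taint, and delete steps as they happen.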

The demonstration above shows that Karpenter respected the PDB while applying the Drift disruption workflow to upgrade the nodes it launched, giving group-less management of worker node upgrades.
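The PDB arithmetic behind this can be sketched in a few lines. With an integer minAvailable, the number of pods that may be evicted at once is the number of healthy pods minus minAvailable, never below zero. The function name allowed_disruptions is hypothetical, and percentage values are not handled in this sketch.

```shell
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Number of pods that can be voluntarily evicted at once under a PDB
# that uses an integer minAvailable.
allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Three healthy inflate pods with minAvailable: 2
allowed_disruptions 3 2   # prints 1
```

This matches the ALLOWED DISRUPTIONS value of 1 reported by kubectl get pdb earlier, which is why the pods are moved off the old node one at a time.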

In general, you can configure Karpenter to disrupt nodes through your NodePool in multiple ways, using spec.disruption.consolidationPolicy, spec.disruption.consolidateAfter, or spec.disruption.expireAfter. You can use node expiry to periodically recycle nodes for security reasons and Drift to upgrade the nodes. Please refer to Karpenter Disruption for more details.


Delete all the NodePools (custom resources) that were created.

kubectl delete nodepool default

Remove Karpenter and delete the infrastructure from your AWS account.

helm uninstall karpenter --namespace kube-system

eksctl delete cluster --name ${CLUSTER_NAME}
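The CloudFormation stack deployed at the start of the walkthrough is not removed by the commands above. A minimal sketch for deleting it follows; the function name delete_karpenter_stack is hypothetical, and it assumes CLUSTER_NAME is still exported and the stack name matches the one used when the stack was deployed.

```shell
# Delete the CloudFormation stack that holds the Karpenter IAM resources.
# Fails early with a message if CLUSTER_NAME is not set.
delete_karpenter_stack() {
  aws cloudformation delete-stack \
    --stack-name "Karpenter-${CLUSTER_NAME:?set CLUSTER_NAME first}"
}
```

Call delete_karpenter_stack as part of the cleanup, then confirm in the CloudFormation console that the stack is gone.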


In this blog, we demonstrated how nodes can be scaled with different options for each use case by using a NodePool that leverages well-known Kubernetes labels and taints, together with pod scheduling constraints in the Deployment, so that pods are deployed on Karpenter-provisioned nodes. This shows that different types of workloads can run on different capacity, each according to the requirements of its use case. We also looked at the upgrade behavior of nodes launched by Karpenter, using Drift with the NodePool.

Gowtham S

Gowtham S is a Container Specialist Technical Account Manager at AWS, based out of Bengaluru. Gowtham works with AWS Enterprise Support customers, helping them optimize Kubernetes workloads through proactive operations reviews. He is passionate about Kubernetes and open-source technologies.

Abhishek Nanda

Abhishek is a Containers Specialist Solutions Architect at AWS based out of Bengaluru, Karnataka, India, with over 7 years of IT experience. He is passionate about designing and architecting secure, resilient, and cost-effective containerized infrastructure and applications.