Maximize Amazon EKS efficiency: How Auto Mode, Graviton, and Spot work together

Amazon Elastic Kubernetes Service (Amazon EKS) Auto Mode streamlines the operation of your Amazon EKS clusters by automating key infrastructure components. This automation reduces the manual effort required to allocate and manage resources, so teams can focus on application development and higher-level strategic initiatives.

While our previous blog post covered the core concepts of Amazon EKS Auto Mode, this post dives deeper into optimizing Amazon EKS Auto Mode clusters using AWS Graviton and Amazon EC2 Spot instances. AWS customers adopt AWS Graviton instances to achieve up to 40% better price performance and to use up to 60% less energy, which helps them meet their sustainability goals. Additionally, AWS customers use Amazon EC2 Spot instances for eligible workloads to save up to 90% compared to Amazon Elastic Compute Cloud (Amazon EC2) On-Demand prices.

Solution overview

We will cover AWS Graviton and Amazon EC2 Spot implementations on Amazon EKS Auto Mode through the following two scenarios:

Scenario 1: Deploy the retail store application (referenced in the previous blog) using exclusively AWS Graviton (ARM64) instances.
Scenario 2: Deploy the retail store application using a mix of Spot and On-Demand Amazon EC2 instances with the following considerations:
1. Self-managed MySQL, a stateful application using persistent volumes, must run on On-Demand instances as it’s not suitable for Spot instances. While running a Relational Database Management System (RDBMS) in Kubernetes is not recommended, we’re using it here solely to demonstrate a stateful workload example.
2. All other applications in the retail store application are eligible to run on Amazon EC2 Spot instances.
3. All the applications in the retail store application can run on AMD64, ARM64, or a mix of both architectures.

Getting started

Follow these steps sequentially from the previous blog:

Complete prerequisites
Create cluster
Deploy Ingress Class by applying ingress.yaml (ALB Ingress for the retail store app)
Deploy EBS Storage Class by applying ebs-sc.yaml (Persistent Volume claim for the self-managed MySQL RDBMS)

Steps for scenario 1: Adopting AWS Graviton instances:

Create custom NodePool: After completing the steps in the “Getting started” section, create a custom NodePool named graviton-ondemand in your Amazon EKS Auto Mode cluster. While the predefined system NodePool supports ARM64 architecture and could theoretically be used (with tolerations for its taints) to deploy applications on AWS Graviton-based Amazon EC2 instances, this approach is not recommended. Instead, follow these best practices:
1. Reserve the system NodePool exclusively for:
  1. Critical cluster add-ons
  2. System-level components
  3. Core infrastructure services
2. Create dedicated NodePools for:
  1. Specific architecture requirements (like AWS Graviton’s ARM64)
  2. Different performance or scaling needs
  3. Prioritizing capacity covered by Savings Plans and/or Reserved Instances
3. This separation helps maintain:
  1. Clear operational boundaries
  2. Better resource allocation
  3. Improved capacity availability
  4. Easier maintenance and troubleshooting

The graviton-ondemand NodePool provides a dedicated environment exclusively for your ARM64-based workloads.

cat << EOF > graviton-ondemand.yaml 

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton-ondemand
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      expireAfter: 336h
      terminationGracePeriod: 24h
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values:
            - c
            - m
            - r
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
            - "6"
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: eks.amazonaws.com/instance-size
          operator: NotIn
          values: [nano, micro, small]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
    - nodes: 10%
  weight: 10

EOF

kubectl apply -f graviton-ondemand.yaml

Two important attributes of the above NodePool that differentiate it from the general-purpose NodePool created during cluster provisioning are:

ARM64 is the only supported architecture in this NodePool:

- key: kubernetes.io/arch
  operator: In
  values:
    - arm64

This NodePool has a weight of 10, whereas the general-purpose and system NodePools provisioned during Amazon EKS Auto Mode cluster creation have a weight of zero. The higher weight gives this NodePool priority over the general-purpose and system NodePools.

weight: 10

Deploy the retail store application using Helm in two steps: create a values.yaml file that defines the customization parameters for the Helm chart, then run the helm install command to deploy the application to your Amazon EKS cluster.

Create this values.yaml:

cat << EOF > values.yaml

catalog:
  mysql:
    secret:
      create: true
      name: catalog-db
      username: catalog
    persistentVolume:
      enabled: true
      accessMode:
        - ReadWriteOnce
      size: 30Gi
      storageClass: eks-auto-ebs-csi-sc

ui:
  endpoints:
    catalog: http://retail-store-app-catalog:80
    carts: http://retail-store-app-carts:80
    checkout: http://retail-store-app-checkout:80
    assets: http://retail-store-app-assets:80
  autoscaling:
    enabled: true
    minReplicas: 5
    maxReplicas: 10
    targetCPUUtilizationPercentage: 50
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: ui
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: ui
  ingress:
    enabled: true
    className: eks-auto-alb
    annotations:
      alb.ingress.kubernetes.io/healthcheck-path: /actuator/health

checkout:
  endpoints:
    orders: http://retail-store-app-orders:80

EOF

Run the helm install command to deploy the application to the Amazon EKS cluster:

helm install -f values.yaml retail-store-app oci://public.ecr.aws/aws-containers/retail-store-sample-chart --version 0.8.5

Wait a few minutes until all pods reach Running status:

kubectl get pods

NAME                                               READY   STATUS    RESTARTS        AGE
retail-store-app-assets-ff99f9c64-p95cx            1/1     Running   0               3m55s
retail-store-app-carts-6dc4cd6b79-btdj2            1/1     Running   0               3m55s
retail-store-app-carts-dynamodb-5958cf99cb-8dsh9   1/1     Running   0               3m55s
retail-store-app-catalog-5f44f6f487-6wrbx          1/1     Running   0               3m55s
retail-store-app-catalog-mysql-0                   1/1     Running   0               3m55s
retail-store-app-checkout-8448fb4cff-298tt         1/1     Running   0               3m55s
retail-store-app-checkout-redis-6977ff5b75-mvxb4   1/1     Running   0               3m55s
retail-store-app-orders-58cddb8dfb-s5mhx           1/1     Running   0               3m55s
retail-store-app-orders-postgresql-0               1/1     Running   0               3m55s
retail-store-app-orders-rabbitmq-0                 1/1     Running   0               3m55s
retail-store-app-ui-5c856459f-7n6xn                1/1     Running   0               3m55s
retail-store-app-ui-5c856459f-8frbb                1/1     Running   0               3m40s
retail-store-app-ui-5c856459f-95hpb                1/1     Running   0               3m40s
retail-store-app-ui-5c856459f-9sx87                1/1     Running   0               3m40s
retail-store-app-ui-5c856459f-kjwvt                1/1     Running   0               3m40s
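Instead of re-running kubectl get pods by hand, the wait can be scripted. The following is a minimal sketch, assuming the application is installed in the current namespace and that no unrelated pods share it. The function name wait_for_pods and the 300s default timeout are illustrative choices, not part of the original walkthrough.

```shell
# Hypothetical helper: block until every pod in the current namespace reports
# the Ready condition, or fail once the timeout expires.
wait_for_pods() {
  wait_timeout="${1:-300s}"   # the 300s default is an assumption; adjust as needed
  kubectl wait --for=condition=Ready pod --all --timeout="$wait_timeout"
}
```

For example, wait_for_pods 600s allows up to ten minutes for nodes to be provisioned and images to be pulled.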

Get the LoadBalancer URL:

kubectl get ingress retail-store-app-ui -o jsonpath="{.status.loadBalancer.ingress[*].hostname}"

Access the application using the URL: http://[result-from-above-command]/
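The two steps above (reading the hostname and building the URL) can be combined in a small helper. This is a sketch under stated assumptions: the function name store_url is hypothetical, the ingress is assumed to be in the current namespace, and only the first hostname entry is read.

```shell
# Hypothetical helper: print the retail store URL, or fail if the load
# balancer hostname has not been assigned to the ingress yet.
store_url() {
  # [0] reads only the first ingress entry; the walkthrough's command uses [*].
  host="$(kubectl get ingress retail-store-app-ui \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
  [ -n "$host" ] || { echo "ingress has no hostname yet" >&2; return 1; }
  printf 'http://%s/\n' "$host"
}
```

A quick smoke test could then be curl -sS -o /dev/null -w '%{http_code}\n' "$(store_url)". Keep in mind that a newly created load balancer can take a few minutes before it starts serving traffic.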

Verify that all worker nodes are AWS Graviton instances:

kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type

NAME                  STATUS   ROLES    AGE     VERSION               ARCH    INSTANCE-TYPE   CAPACITY-TYPE
i-0a47a83dc76f4e2a9   Ready    <none>   4m19s   v1.31.12-eks-e386d34  arm64   c7g.large       on-demand
i-0a7b99c1942edfc46   Ready    <none>   4m11s   v1.31.12-eks-e386d34  arm64   c7g.large       on-demand
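If you prefer a scripted check over reading the table, the architecture column can be tested directly. The sketch below is one possible approach, not part of the original walkthrough: the function name all_nodes_arm64 is hypothetical, and it assumes the input comes from kubectl get nodes with --no-headers and a single -L kubernetes.io/arch flag, so the architecture is the last column on each line.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers -L kubernetes.io/arch`
# output on stdin, reports any node whose last column is not arm64, and exits
# non-zero if at least one such node was found. Empty input exits 0.
all_nodes_arm64() {
  awk '$NF != "arm64" { bad = 1; print "non-Graviton node: " $1 }
       END { exit bad + 0 }'
}
```

Usage: kubectl get nodes --no-headers -L kubernetes.io/arch | all_nodes_arm64 && echo "all nodes are Graviton"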

Reset Amazon EKS Auto Mode cluster before proceeding to scenario 2

Execute the below commands sequentially to delete the retail-store-app suite and remove the PVCs (Persistent Volume Claims) associated with the stateful sets, which will also delete the underlying Amazon Elastic Block Store (Amazon EBS) volumes. This returns the Amazon EKS Auto Mode cluster to its original state before proceeding to scenario 2.

helm uninstall retail-store-app
kubectl delete pvc/data-retail-store-app-catalog-mysql-0

Before proceeding to scenario 2, let’s recap what we accomplished in scenario 1.

We created a new node pool exclusively for AWS Graviton (ARM64) and assigned it a weight of 10. This ensures that eligible workloads deployed in this Amazon EKS cluster are placed on AWS Graviton instances first for better price-performance. If a suitable AWS Graviton node cannot be found, the workload falls back to the x86_64 options defined in the default “general-purpose” node pool.

Steps for scenario 2: Adopting Spot instances and handling workload restrictions:

Create a custom node pool named spot in the Amazon EKS Auto Mode cluster.

cat << EOF > spot.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      expireAfter: 336h
      terminationGracePeriod: 24h
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: eks.amazonaws.com/instance-category
          operator: NotIn
          values:
            - t
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values:
            - "4"
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
            - amd64
        - key: eks.amazonaws.com/instance-size
          operator: NotIn
          values: [nano, micro, small]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
    - nodes: 10%
  weight: 20

EOF

kubectl apply -f spot.yaml

Key features of this NodePool:

Spot Pricing Options Support:

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]

Multi-Architecture Support:

        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
            - amd64

Priority Weighting: The NodePool has a weight of 20, giving it priority over the “general-purpose” (0 weight), “system” (0 weight), and “graviton-ondemand” (10 weight) NodePools.

weight: 20
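To see the resulting priority order at a glance, the NodePools can be listed from highest to lowest weight. The following is a small sketch, assuming the input is a two-column NAME/WEIGHT listing such as the one produced by kubectl custom-columns, where an unset weight appears as <none>. The function name rank_nodepools is hypothetical.

```shell
# Hypothetical helper: reads "NAME WEIGHT" pairs on stdin and prints them from
# highest to lowest weight. An unset weight (<none> or empty) is treated as 0;
# ties are listed alphabetically by name.
rank_nodepools() {
  awk '{ w = ($2 == "<none>" || $2 == "") ? 0 : $2; print w, $1 }' \
    | sort -k1,1nr -k2,2 \
    | awk '{ print $2 " (weight " $1 ")" }'
}
```

Usage: kubectl get nodepools --no-headers -o custom-columns=NAME:.metadata.name,WEIGHT:.spec.weight | rank_nodepools. With the NodePools described in this post, the spot NodePool is listed first, followed by graviton-ondemand, then the two zero-weight NodePools.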

Deploy the retail store app using Helm. Create the values_spot.yaml file to define the customization parameters for the Helm chart, and then execute the helm install command to deploy the application to your Amazon EKS cluster.

Create values_spot.yaml with the customization parameters:

cat << EOF > values_spot.yaml

catalog:
  mysql:
    nodeSelector:
      karpenter.sh/capacity-type: on-demand
    secret:
      create: true
      name: catalog-db
      username: catalog
    persistentVolume:
      enabled: true
      accessMode:
        - ReadWriteOnce
      size: 30Gi
      storageClass: eks-auto-ebs-csi-sc

ui:
  endpoints:
    catalog: http://retail-store-app-catalog:80
    carts: http://retail-store-app-carts:80
    checkout: http://retail-store-app-checkout:80
    assets: http://retail-store-app-assets:80
  autoscaling:
    enabled: true
    minReplicas: 5
    maxReplicas: 50
    targetCPUUtilizationPercentage: 80
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: ui
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: ui
  ingress:
    enabled: true
    className: eks-auto-alb
    annotations:
      alb.ingress.kubernetes.io/healthcheck-path: /actuator/health

checkout:
  endpoints:
    orders: http://retail-store-app-orders:80

EOF

Execute the Helm install command to deploy the application to Amazon EKS.

helm install -f values_spot.yaml retail-store-app oci://public.ecr.aws/aws-containers/retail-store-sample-chart --version 0.8.5

Key configuration aspects:

We will look into the key configuration aspects in values_spot.yaml.

MySQL Workload Restriction: Restricts stateful MySQL workloads to “on-demand” instances only:

catalog:
  mysql:
    nodeSelector:
      karpenter.sh/capacity-type: on-demand

Verification steps:

Confirm that everything is in Running state. This step takes a couple of minutes.

kubectl get pods -o wide

NAME                                               READY   STATUS    RESTARTS      AGE   IP                NODE                  NOMINATED NODE   READINESS GATES
retail-store-app-assets-78d4fd49cf-w67cs           1/1     Running   0             69s   192.168.112.224   i-071f5f98fa309963e   <none>           <none>
retail-store-app-carts-947659c4-zd2sc              1/1     Running   0             33m   192.168.187.65    i-00222cd2a51981fee   <none>           <none>
retail-store-app-carts-dynamodb-58f675c5c8-k7tfp   1/1     Running   0             33m   192.168.187.67    i-00222cd2a51981fee   <none>           <none>
retail-store-app-catalog-56bd6bbbd-bpzk6           1/1     Running   0             69s   192.168.112.226   i-071f5f98fa309963e   <none>           <none>
retail-store-app-catalog-mysql-0                   1/1     Running   0             33m   192.168.187.71    i-00222cd2a51981fee   <none>           <none>
retail-store-app-checkout-696f448554-8lwl8         1/1     Running   0             33m   192.168.187.64    i-00222cd2a51981fee   <none>           <none>
retail-store-app-checkout-redis-6f5947f4d8-7sbb6   1/1     Running   0             33m   192.168.187.68    i-00222cd2a51981fee   <none>           <none>
retail-store-app-orders-979cb5b4c-zrxtf            1/1     Running   0             69s   192.168.112.225   i-071f5f98fa309963e   <none>           <none>
retail-store-app-orders-postgresql-0               1/1     Running   0             68s   192.168.112.228   i-071f5f98fa309963e   <none>           <none>
retail-store-app-orders-rabbitmq-0                 1/1     Running   0             33m   192.168.187.66    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-sqvj9               1/1     Running   0             33m   192.168.187.70    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-tmqhw               1/1     Running   0             21m   192.168.187.73    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-vnpds               1/1     Running   0             33m   192.168.187.69    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-zhtkj               1/1     Running   0             21m   192.168.187.72    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-zhwfv               1/1     Running   0             69s   192.168.112.227   i-071f5f98fa309963e   <none>           <none>

Get the LoadBalancer URL:

kubectl get ingress retail-store-app-ui -o jsonpath="{.status.loadBalancer.ingress[*].hostname}"

Access the application using the URL: http://[result-from-above-command]/

Verify the node configuration.

kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type

NAME                  STATUS   ROLES    AGE   VERSION                ARCH    INSTANCE-TYPE   CAPACITY-TYPE
i-00222cd2a51981fee   Ready    <none>   33m   v1.31.12-eks-e386d34   arm64   c7g.large       on-demand
i-071f5f98fa309963e   Ready    <none>   94s   v1.31.12-eks-e386d34   arm64   c8gn.large      spot
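On larger clusters, a per-combination count is easier to read than the full node table. The sketch below is an illustration only: the function name summarize_nodes is hypothetical, and it assumes the input comes from the exact command above with --no-headers added, so that the last three columns are architecture, instance type, and capacity type, and that every node carries all three labels.

```shell
# Hypothetical helper: reads the node listing on stdin (no header row) and
# prints how many nodes exist per architecture/capacity-type combination.
# Column positions are counted from the end of each line: arch is third from
# last and capacity type is last.
summarize_nodes() {
  awk '{ key = $(NF-2) "/" $NF; count[key]++ }
       END { for (k in count) print k, count[k] }' | sort
}
```

Usage: kubectl get nodes --no-headers -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type | summarize_nodes. For the two nodes shown above, this prints one arm64/on-demand node and one arm64/spot node.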

Verifying workload placement

MySQL stateful application verification: Verify that the MySQL stateful application is running on “on-demand” capacity type as specified in values_spot.yaml:

kubectl get pods -l=app.kubernetes.io/component=mysql -o wide

NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE                  NOMINATED NODE   READINESS GATES
retail-store-app-catalog-mysql-0   1/1     Running   0          34m   192.168.187.71   i-00222cd2a51981fee   <none>           <none>
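The listing above shows which node hosts the MySQL pod, but not that node's capacity type. One way to close that gap is to look up the node's karpenter.sh/capacity-type label directly. This is a minimal sketch, assuming the pod runs in the current namespace. The function name pod_capacity_type is hypothetical.

```shell
# Hypothetical helper: print the karpenter.sh/capacity-type label of the node
# that hosts the given pod.
pod_capacity_type() {
  node="$(kubectl get pod "$1" -o jsonpath='{.spec.nodeName}')"
  # Dots inside a label key must be escaped in kubectl JSONPath expressions.
  kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.karpenter\.sh/capacity-type}'
}
```

Usage: pod_capacity_type retail-store-app-catalog-mysql-0. Given the nodeSelector in values_spot.yaml, the result should be on-demand.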

UI application distribution: You may notice that the UI application pods are distributed across mixed architectures (AMD64 and ARM64) and capacity types (on-demand and spot). Unlike Amazon DynamoDB Local or MySQL, UI pods have no specific restrictions, demonstrating how Amazon EKS Auto Mode efficiently bin-packs pods across available nodes according to scheduling rules.

Note: If the UI pods are not spread across CPU architectures, try increasing the number of replicas with the following command:

kubectl scale deployments.apps retail-store-app-ui --replicas 70

Scaling up still does not guarantee a mix of architectures, because Amazon EKS Auto Mode provisions cost-efficient nodes based on your node pool configuration.

kubectl get pods -l=app.kubernetes.io/name=ui -o wide 

NAME                                   READY   STATUS    RESTARTS   AGE     IP                NODE                  NOMINATED NODE   READINESS GATES
retail-store-app-ui-79d8cf795b-sqvj9   1/1     Running   0          35m     192.168.187.70    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-tmqhw   1/1     Running   0          23m     192.168.187.73    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-vnpds   1/1     Running   0          35m     192.168.187.69    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-zhtkj   1/1     Running   0          23m     192.168.187.72    i-00222cd2a51981fee   <none>           <none>
retail-store-app-ui-79d8cf795b-zhwfv   1/1     Running   0          3m21s   192.168.112.227   i-071f5f98fa309963e   <none>           <none>
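When the deployment is scaled to dozens of replicas, counting pods per node by eye becomes tedious. The following sketch tallies them instead. It assumes the input is one node name per line, for example from a kubectl custom-columns listing of .spec.nodeName. The function name pods_per_node is hypothetical.

```shell
# Hypothetical helper: reads one node name per line on stdin and prints
# "<node> <pod count>" for each distinct node, sorted by node name.
pods_per_node() {
  sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: kubectl get pods -l app.kubernetes.io/name=ui --no-headers -o custom-columns=NODE:.spec.nodeName | pods_per_node. For the five UI pods listed above, this reports four pods on i-00222cd2a51981fee and one on i-071f5f98fa309963e.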

In scenario 2, we created a node pool exclusively for Amazon EC2 Spot instances that supports both AMD64 and ARM64 CPU architectures. We assigned it a weight of 20, to make sure that stateless workloads are first scheduled on worker nodes using Spot pricing. Only when Spot capacity is unavailable do workloads fall back to the ‘graviton-ondemand’ node pool, followed by the ‘general-purpose’ node pool.

Cleaning up

To avoid incurring future charges, run the following commands in order to clean up the resources that were used for this blog post:

helm uninstall retail-store-app
kubectl delete pvc/data-retail-store-app-catalog-mysql-0
eksctl delete cluster --name eks-auto-mode-demo

Conclusion

Although AWS Graviton instances and Spot capacity are not the default choices defined in the “general-purpose” NodePool, customers can add their own custom NodePools by using one of the strategies outlined in this blog post. By incorporating AWS Graviton and Amazon EC2 Spot, customers can achieve improved compute efficiency and cost optimization. Additionally, Amazon EKS Auto Mode has a built-in Spot interruption handler and node consolidation mechanism to further enhance these optimizations.

For more information on Amazon EKS Auto Mode capabilities, visit the Amazon EKS documentation.

About the authors

Muru Bhaskaran is a Senior Solutions Architect at AWS, specializing in Graviton adoption and migration strategies. He partners with customers to achieve maximum compute optimization, delivering enhanced performance and cost savings through ARM64-based EC2 instances. A dedicated expert in EC2, EC2 Spot Instances, Amazon EKS, EKS Auto Mode, and Karpenter, Muru guides customers in harnessing these powerful AWS technologies to run their containerized workloads with greater efficiency and scale.

Zakiya Randall is a Senior Technical Account Manager at AWS and she helps customers optimize their cloud infrastructure. She specializes in containers, observability, and modernization. Outside of work, she loves to play golf and visit museums.
