Run Kubernetes clusters for less

with Amazon Elastic Kubernetes Service and Spot Instances

Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes service. It runs upstream Kubernetes and is certified Kubernetes-conformant, so you can leverage all the benefits of open-source tooling from the community, and migrate any standard Kubernetes application to EKS without refactoring your code.

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.

Spot Instances are a great fit for stateless containerized workloads running on your Kubernetes clusters, because containers and Spot Instances take a similar approach to capacity: both are ephemeral and autoscaled, so both can be added and removed while adhering to SLAs and without impacting the performance or availability of your applications.

In this tutorial you will learn how to add Spot Instances to your EKS clusters, while adhering to Spot Instance best practices that allow you to run your applications without compromising performance or availability. You will also deploy a sample Kubernetes deployment and autoscale it on your Spot Instance worker nodes by using Kubernetes Cluster-Autoscaler.

About this Tutorial
Time: 30 minutes
Cost: Less than $5
Use Case: Containers, Compute
Products: Amazon Elastic Kubernetes Service, Amazon EC2 Spot Instances
Level: 300
Last Updated: April 20, 2020


Step 1: Set up the AWS CLI, eksctl, and kubectl

If you already have an EKS cluster that was started using eksctl, jump to Step 3 to add Spot Instances to your existing cluster.

1.1 — Install version 2 of the AWS CLI by running the following commands if you’re using Linux, or follow the instructions in the AWS CLI installation guide for different operating systems.

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

1.2 — Both eksctl and the AWS CLI require that you have AWS credentials configured in your environment. The aws configure command is the fastest way to set up your AWS CLI installation for general use. Run the command and follow the prompts. In this tutorial you will create various resources, so it’s recommended to use credentials with an Administrator IAM policy.

aws configure
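To verify that your credentials are configured correctly, you can run the following command, which returns the identity of the IAM user or role your CLI is using.

aws sts get-caller-identity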

1.3 — Install eksctl, the official EKS command line tool, which you will use to deploy your EKS cluster and node groups.

Download and extract the latest release of eksctl with the following command.

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp

1.4 — Move the extracted binary to /usr/local/bin.

sudo mv /tmp/eksctl /usr/local/bin

1.5 — Test that your installation was successful with the following command.

eksctl version

1.6 — Install kubectl, the Kubernetes command line tool. Follow the instructions in the official Kubernetes documentation to get the latest version of kubectl for your operating system.
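As an example, a typical Linux installation per the Kubernetes docs looks like the following at the time of writing; check the documentation for the current instructions before running it.

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client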

Step 2: Create Amazon EKS cluster with an On-Demand node group

In this step you will create an Amazon EKS cluster using the eksctl command line tool. This will create an EKS control plane and one node group containing two t3.medium instances.

2.1 — Run the following command to create the cluster. This step will take approximately 15 minutes.

eksctl automatically creates your kubeconfig file, so once the cluster deployment is complete, you can use kubectl with your cluster without any further configuration.

eksctl create cluster --version=1.15 --name=eksspottutorial --nodes=2 --region=<your-desired-region> --node-type t3.medium --node-labels="lifecycle=OnDemand" --asg-access

Note that we are labelling our On-Demand node group with the appropriate lifecycle label, as well as passing the --asg-access parameter, so that when we run the Kubernetes Cluster-Autoscaler on our On-Demand nodes, it will have IAM access to scale our EC2 Auto Scaling groups.

2.2 — Once complete, run the following command to verify that the cluster was started, that the On-Demand node group with two instances was deployed, and that your kubectl tool is able to reach the cluster.

kubectl get nodes
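You can also filter for the On-Demand nodes using the lifecycle label that the cluster creation command applied.

kubectl get nodes --show-labels --selector=lifecycle=OnDemand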

Step 3: Add Spot node groups to your EKS cluster

To tap into multiple Spot capacity pools, you will create two node groups, each containing multiple instance types. This increases your chances of achieving your desired scale, and of keeping it if some of the capacity pools are interrupted (when EC2 needs the capacity back). Each node group (EC2 Auto Scaling group) will launch instances from Spot pools that are optimally chosen based on the available Spot capacity.

3.1 — Create a new file named spot_nodegroups.yaml.

3.2 — Copy the following node group configuration into the file, making sure to edit the parameters in <> to match your setup.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: <eksspottutorial or your existing cluster name>
    region: <AWS Region where you started your EKS cluster>
nodeGroups:
    - name: spot-node-group-2vcpu-8gb
      minSize: 3
      maxSize: 5
      desiredCapacity: 3
      instancesDistribution:
        instanceTypes: ["m5.large", "m5d.large", "m4.large","m5a.large","m5ad.large","m5n.large","m5dn.large"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: "capacity-optimized"
      labels:
        lifecycle: Ec2Spot
      iam:
        withAddonPolicies:
          autoScaler: true
    - name: spot-node-group-4vcpu-16gb
      minSize: 3
      maxSize: 5
      desiredCapacity: 3
      instancesDistribution:
        instanceTypes: ["m5.xlarge", "m5d.xlarge", "m4.xlarge","m5a.xlarge","m5ad.xlarge","m5n.xlarge","m5dn.xlarge"] 
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotAllocationStrategy: "capacity-optimized"
      labels:
        lifecycle: Ec2Spot
      iam:
        withAddonPolicies:
          autoScaler: true

Note the instance type selection: each node group includes instance types with the same number of vCPUs and amount of memory. Although some types will have performance variability, this is normally acceptable for many containerized applications.

Also, note the capacity-optimized allocation strategy used for each node group, which instructs the EC2 Auto Scaling group to launch instances from the Spot pools that are optimally chosen based on the available Spot capacity.
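If you want to discover additional instance types with matching vCPU and memory counts to diversify your node groups further, the EC2 describe-instance-types API can help. A possible query for 2 vCPU / 8 GiB instance types (filter names as documented for the EC2 API) might look like this.

aws ec2 describe-instance-types \
    --filters "Name=vcpu-info.default-vcpus,Values=2" "Name=memory-info.size-in-mib,Values=8192" \
    --query "InstanceTypes[].InstanceType" \
    --output text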

3.3 — After creating the file and modifying the name and region parameters as needed, deploy it using eksctl.

This step will take approximately 3 minutes.

eksctl create nodegroup -f spot_nodegroups.yaml

3.4 — After the previous step has completed, confirm that the new nodes were added to the cluster.

kubectl get nodes --show-labels --selector=lifecycle=Ec2Spot

3.5 — Use the AWS Management Console to inspect your newly deployed EC2 Auto Scaling groups (ASGs). You can see that the instance types from your spot_nodegroups.yaml are configured in each ASG, and that the Spot allocation strategy is set to capacity-optimized.

[Image: ASG configuration in the AWS Management Console]
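If you prefer to check from the command line, a query along these lines over the describe-auto-scaling-groups output shows each group's Spot allocation strategy.

aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].{Name:AutoScalingGroupName,SpotStrategy:MixedInstancesPolicy.InstancesDistribution.SpotAllocationStrategy}" \
    --output table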

3.6 — Congratulations! You now have Spot Instances connected to your EKS cluster, ready to run your containerized workloads at up to a 90% discount compared to the On-Demand price.

Step 4: Install the AWS Node Termination Handler

This tool is installed as a DaemonSet on your Kubernetes worker nodes. It catches the EC2 Spot Instance two-minute interruption notice and gracefully drains nodes that are about to be interrupted. You can install the tool only on Spot Instances using a nodeSelector; however, it’s recommended to run it on your On-Demand Instances as well, as it does more than just handle Spot interruptions. You can read more and check alternative installation methods (including Helm) in the aws-node-termination-handler GitHub repository.
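As a sketch of the Helm route, assuming Helm 3 and the eks-charts repository, the installation could look like the following; check the repository for current chart options before relying on it.

helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler --namespace kube-system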

4.1 — Install the AWS Node Termination Handler by applying the yaml file from the official repo.

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.2.0/all-resources.yaml

4.2 — Verify that the Node Termination Handler is running on your worker nodes.
kubectl get daemonsets --all-namespaces

Step 5: (Optional) Deploy the Kubernetes Cluster-Autoscaler

The Kubernetes Cluster-Autoscaler automatically adjusts the number of nodes in your cluster when pods fail to launch due to lack of resources or when nodes in the cluster are underutilized and their pods can be rescheduled onto other nodes in the cluster. 

5.1 — Download the latest Kubernetes Cluster-Autoscaler example manifest for multiple Auto Scaling groups.

curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-multi-asg.yaml

5.2 — To instruct the Cluster-Autoscaler to use your newly deployed node groups (EC2 Auto Scaling groups), open the file for editing and scroll down to the lines that start with:

- --nodes=

For both lines, replace the k8s-worker-asg-1 placeholder with the actual name of your ASG from the AWS Management Console, and set the min:max parameters before the ASG name to 3:5 to match your ASG configuration. For example:

       - --nodes=3:5:eksctl-eksspotworkshop-eksctl-nodegroup-spot-node-group-2vcpu-8gb-NodeGroup-1BSKTK0MZEJY4

       - --nodes=3:5:eksctl-eksspotworkshop-eksctl-nodegroup-spot-node-group-4vcpu-16gb-NodeGroup-1F9E4PUSAQW9R

Modify the expander from least-waste to random, in order to diversify between the Auto Scaling groups that the Cluster-Autoscaler will scale:

      - --expander=random

Save the file.

5.3 — Deploy the Cluster Autoscaler.

kubectl apply -f cluster-autoscaler-multi-asg.yaml
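Before moving on, you can confirm that the Cluster-Autoscaler deployment is up and running in the kube-system namespace.

kubectl get deployment cluster-autoscaler -n kube-system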

5.4 — Here is a visual representation of multiple node groups, each with similarly sized instance types, using the capacity-optimized allocation strategy for Spot Instances. This configuration increases the resilience of your Spot worker nodes by tapping into multiple spare capacity pools, and allows the Kubernetes Cluster-Autoscaler to make the right scaling decisions.

[Diagram: multiple node groups with similarly sized instance types, using the capacity-optimized Spot allocation strategy]

Step 6: Deploy sample app

6.1 — Create a new file named nginx-to-scaleout.yaml, paste the following specification into it, and save the file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 1000m
            memory: 1024Mi

6.2 — Apply the deployment file and confirm that it is deployed and running one replica of the NGINX web server.

kubectl apply -f nginx-to-scaleout.yaml
kubectl get deployment/nginx-to-scaleout

6.3 — Scale the deployment (increase the number of replicas).

kubectl scale --replicas=20 deployment/nginx-to-scaleout

6.4 — Check that some pods are in Status=Pending, since there are no free vCPUs or memory left on your existing worker nodes.

kubectl get pods
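To list only the pods that are still waiting for capacity, kubectl supports filtering by pod phase with a field selector.

kubectl get pods --field-selector=status.phase=Pending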

6.5 — Check the Cluster-Autoscaler logs to confirm that it has discovered the pending pods and is scaling up by increasing the size of a node group.

kubectl logs -f deployment/cluster-autoscaler -n kube-system | grep -i scale_up

You should see log lines similar to the following:

scale_up.go:263] Pod default/nginx-to-scaleout-84f9cdbd84-vn7px is unschedulable

Final scale-up plan:
[{eksctl-eksspottutorial-nodegroup-spot-node-group-4vcpu-16gb-NodeGroup-1B468RWE6OYH2 3->5 (max: 5)}]

6.6 — Confirm in the AWS Management Console that one of the EC2 Auto Scaling groups has launched more Spot Instances.
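If you prefer the command line, a query like this over the describe-auto-scaling-groups output shows each group's desired capacity and current instance count.

aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].{Name:AutoScalingGroupName,Desired:DesiredCapacity,Running:length(Instances)}" \
    --output table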

6.7 — Confirm that all the pending pods have been scheduled. This will take 1-3 minutes.
kubectl get pods

Step 7: Cleanup

7.1 — Remove the AWS Node Termination Handler.
kubectl delete daemonset aws-node-termination-handler -n kube-system

7.2 — Remove the two Spot node groups (EC2 Auto Scaling groups) that you deployed in the tutorial.
eksctl delete nodegroup spot-node-group-2vcpu-8gb --cluster eksspottutorial
eksctl delete nodegroup spot-node-group-4vcpu-16gb --cluster eksspottutorial

7.3 — If you used a new cluster for this tutorial, rather than an existing cluster, delete the EKS cluster.

eksctl will confirm the deletion of the cluster’s CloudFormation stack immediately, but the deletion itself can take up to 15 minutes. You can optionally track it in the CloudFormation console.

eksctl delete cluster --name eksspottutorial
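To follow the deletion from the command line instead of the console, you can poll the stack status. eksctl typically names the cluster stack eksctl-<cluster-name>-cluster, so for this tutorial the check would be:

aws cloudformation describe-stacks --stack-name eksctl-eksspottutorial-cluster --query "Stacks[0].StackStatus" --output text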

Congratulations

Congratulations! In this tutorial you learned how to deploy an Amazon EKS cluster and run Kubernetes deployments and services on Spot Instances, while adhering to Spot Instance best practices. By creating multiple node groups (EC2 Auto Scaling groups) with multiple similarly performant instance types and using the capacity-optimized allocation strategy, you can increase the resilience of your Spot worker nodes and run your applications on Spot Instances while meeting your application's SLA, without compromising performance or availability.


Advanced Workshop

For a more advanced Kubernetes tutorial that focuses on Spot Instances and dives deeper, visit this workshop.

Hands-on Spot Instances Tutorial

If you want to learn how to run other workload types on Spot Instances, such as web applications, big data processing workloads with EMR, machine learning training with SageMaker, and others, visit the EC2 Spot workshops website for self-paced labs.

Other Amazon EKS Labs

To get hands-on experience with other Amazon EKS features and capabilities, visit the EKS labs webpage.