What are some best practices for using EC2 Spot Instances with Amazon EKS?
Last updated: 2021-12-22
I want to use Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances with my Amazon Elastic Kubernetes Service (Amazon EKS) cluster. What are some best practices?
The following are best practices for using Amazon EC2 Spot Instances with your Amazon EKS cluster:
- Don't use Spot Instances for long-running jobs or stateful applications.
- Use managed node groups with Spot Instances.
- Add multiple instance types to node groups.
- Use the AWS Node Termination Handler for self-managed node groups.
Don't use Spot Instances for long-running jobs or stateful applications
Amazon EC2 can interrupt a Spot Instance with only a two-minute warning, so a long-running job might be terminated before it finishes. Interruptions also affect stateful applications, which often can't tolerate abrupt shutdowns. Instead, use On-Demand Instances for long-running jobs and stateful applications.
Use managed node groups with Spot Instances
You can set the capacity type of a managed node group to SPOT. The managed node group then configures its Auto Scaling group to use EC2 Auto Scaling Capacity Rebalancing. With Capacity Rebalancing activated, when a Spot node receives a rebalance recommendation, Amazon EKS attempts to launch a replacement Spot node.
After the new Spot node is ready, Amazon EKS cordons and drains the previous Spot node. Because pods are moved off the node before it's interrupted, this can help reduce the risk of corrupted Amazon Elastic Block Store (Amazon EBS) volumes or interrupted database connections.
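As a minimal sketch of the capacity type setting described above, the following Python function builds a request for the Amazon EKS CreateNodegroup API with the capacity type set to SPOT. The function name, cluster name, role ARN, subnet IDs, and scaling values are hypothetical placeholders. The dictionary keys are intended to match the parameters of the boto3 create_nodegroup call, so verify them against the boto3 version that you use.

```python
def spot_nodegroup_request(cluster_name, nodegroup_name, node_role_arn,
                           subnet_ids, instance_types,
                           min_size=1, max_size=5, desired_size=2):
    """Build keyword arguments for a Spot managed node group request."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if not (min_size <= desired_size <= max_size):
        raise ValueError("desired_size must be between min_size and max_size")
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "nodeRole": node_role_arn,
        "subnets": list(subnet_ids),
        # SPOT asks the managed node group to launch Spot Instances.
        "capacityType": "SPOT",
        "instanceTypes": list(instance_types),
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired_size,
        },
    }


# All values below are placeholders for illustration only.
request = spot_nodegroup_request(
    cluster_name="example-cluster",
    nodegroup_name="example-spot-nodes",
    node_role_arn="arn:aws:iam::111122223333:role/example-node-role",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    instance_types=["m5.large", "m5a.large"],
)

# To create the node group (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("eks").create_nodegroup(**request)
```

Building the request separately from the API call makes it possible to review or test the configuration before anything is created in your account.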
Add multiple instance types to node groups
Every Spot Instance pool consists of unused EC2 instance capacity for a specific instance type in a specific Availability Zone. When a node group tries to provision a new node, it uses one of the instance types that's defined in its configuration. If none of those instance types has Spot capacity in any of the Availability Zones, then the node group fails to scale and becomes degraded.
To avoid this issue, increase the number of similar instance types in the node group. For example, for an m5.large (2 vCPU/8 GiB RAM) instance type, add instance types with the same vCPU and RAM values, such as m5a.large, m5n.large, and m4.large.
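The selection rule above (same vCPU count, same memory) can be sketched in Python as follows. The CATALOG table is a small hand-written sample for illustration, not a complete or authoritative list, and the function name is hypothetical. In practice, you might populate the table from the EC2 DescribeInstanceTypes API instead.

```python
from typing import NamedTuple


class InstanceSpec(NamedTuple):
    vcpus: int
    memory_gib: int


# Hand-written sample data for illustration only.
CATALOG = {
    "m5.large": InstanceSpec(2, 8),
    "m5a.large": InstanceSpec(2, 8),
    "m5n.large": InstanceSpec(2, 8),
    "m4.large": InstanceSpec(2, 8),
    "c5.large": InstanceSpec(2, 4),
    "m5.xlarge": InstanceSpec(4, 16),
}


def similar_instance_types(reference, catalog=CATALOG):
    """Return the reference type followed by all types with equal vCPU and memory."""
    ref_spec = catalog[reference]
    matches = sorted(
        name for name, spec in catalog.items()
        if name != reference and spec == ref_spec
    )
    return [reference] + matches


instance_types = similar_instance_types("m5.large")
print(instance_types)
```

Keeping vCPU and memory identical across the list means that pods scheduled on any of the instance types see the same allocatable resources, which keeps scaling behavior predictable.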
Use the AWS Node Termination Handler for self-managed node groups
The AWS Node Termination Handler (available on GitHub) is deployed to an Amazon EKS cluster as a Deployment or a DaemonSet. It gives self-managed node groups capabilities that they otherwise lack: it helps them detect and appropriately respond to EC2 maintenance events, Spot interruption notices, Auto Scaling group scale-in events, and Availability Zone rebalances. Use the Queue Processor option to add every AWS Node Termination Handler feature to the self-managed node group.
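To illustrate what detecting a Spot interruption notice involves, the following Python sketch parses the JSON document that the instance metadata service returns at /latest/meta-data/spot/instance-action when an interruption is scheduled. This is not the AWS Node Termination Handler's implementation. The function name and the sample values are hypothetical, and the sketch assumes that the document contains an action field and a UTC time field in the format shown.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(instance_action_json, now):
    """Return (action, seconds remaining) from a Spot instance-action document."""
    doc = json.loads(instance_action_json)
    # The time field is assumed to be UTC, for example 2017-09-18T08:22:00Z.
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return doc["action"], (when - now).total_seconds()


# Sample notice and clock value for illustration only.
sample_notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
sample_now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)

action, remaining = seconds_until_interruption(sample_notice, sample_now)
print(action, remaining)
```

A handler that reacts to such a notice has only the remaining seconds to cordon the node and drain its pods, which is why automating the response matters for self-managed node groups.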