Containers
Leveraging Amazon EKS managed node group with placement group for low latency critical applications
Our customers have been asking how to host their low-latency, high-throughput applications, such as stock-trading applications and financial market workloads, on Amazon Elastic Kubernetes Service (Amazon EKS), particularly with the EKS managed node group offering.
In this blog post, we introduce the concept of Amazon Elastic Compute Cloud (Amazon EC2) placement groups and demonstrate how to set up an EKS managed node group with a launch template to enable a placement group. The post provides two ways to implement the solution: the AWS Cloud Development Kit (AWS CDK) and Terraform, two popular infrastructure as code (IaC) tools. Finally, we run a performance test to compare Amazon EKS worker nodes in an EC2 placement group against worker nodes without one.
| | |
| --- | --- |
| Time to read | 8 minutes |
| Time to complete | 45 minutes |
| Cost to complete | $3 |
| Learning Level | Advanced (300) |
| Services used | Amazon EKS, Amazon EC2, AWS CDK |
What is low latency?
Latency is the time that passes between a user action and the resulting response. Many processing workloads (for example, stock-trading applications and financial market workloads) require a low-latency response time of less than one millisecond. Low-latency computing usually requires very fast inter-process communication (IPC) and inter-computer communication. To achieve quick response times, the application also needs high throughput and strong compute capability.
Introduction to placement groups
What is a placement group?
When you launch a new EC2 instance, the Amazon EC2 service attempts to place the instance in such a way that all of your instances are spread out across underlying hardware to minimize correlated failures. You can use placement groups to influence the placement of a group of interdependent instances to meet the needs of your workload.
Types of placement groups
There are three types of placement groups:
- Cluster—packs instances close together inside an Availability Zone. The following image shows instances that are placed into a cluster placement group. This strategy enables workloads to achieve the low-latency network performance necessary for tightly coupled node-to-node communication that is typical of high performance computing (HPC) applications.
- Partition—spreads your instances across logical partitions, such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. When using partition placement groups, EC2 divides each group into logical segments called logical partitions. EC2 ensures that each partition within a placement group has its own set of racks. Each rack has its own network and power source. No two partitions within a placement group share the same racks, allowing you to isolate the impact of a hardware failure within your application. The following image shows instances that are placed into a partition placement group with three partitions: Partition 1, Partition 2, and Partition 3. Each partition comprises multiple instances which do not share racks with the instances in the other partitions. This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka.
- Spread—strictly places a small group of instances across distinct underlying hardware to reduce correlated failures. A spread placement group is a group of instances that are each placed on distinct racks, with each rack having its own network and power source. The following image shows seven instances in a single Availability Zone that are placed into a spread placement group. The seven instances are placed on seven different racks. This strategy is recommended for applications that have a small number of critical instances that should be kept separate from each other.
For the use case mentioned above, we choose a cluster placement group for our demo solution. A cluster placement group can span peered VPCs in the same Region. Instances in the same cluster placement group enjoy a higher per-flow throughput limit for TCP/IP traffic and are placed in the same high-bisection-bandwidth segment of the network to ensure high inter-instance performance. Besides applications that require low network latency and high network throughput, a cluster placement group is also recommended when the majority of network traffic flows between the EC2 instances in the group, which in Kubernetes terms is the pod-to-pod communication.
To provide the lowest latency and the highest packet-per-second network performance for your placement group, we also recommend choosing an EC2 instance type that supports enhanced networking for your EKS cluster worker nodes. Enhanced networking provides higher bandwidth, higher packet-per-second (PPS) performance, and consistently lower inter-instance latencies. Refer to our documentation for more details on enhanced networking.
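For reference, a cluster placement group can also be created directly with the AWS CLI before it is referenced from a launch template; the group name below is illustrative.

```bash
# Create a cluster placement group (the group name is illustrative)
aws ec2 create-placement-group \
  --group-name eks-demo-placement-group \
  --strategy cluster

# Confirm the placement group exists
aws ec2 describe-placement-groups --group-names eks-demo-placement-group
```

In this walkthrough, however, the placement group is created for you by the AWS CDK or Terraform code.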
Introduction to EKS managed node groups with custom launch templates
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (EC2 instances) for Amazon EKS Kubernetes clusters. With EKS managed node groups, you don’t need to separately provision or register the EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, automatically update, or terminate nodes for your cluster with a single operation. Node updates and terminations automatically and gracefully drain nodes to ensure that your applications stay available. In short, AWS manages the EKS node groups for you instead of you managing them yourself.
In August 2020, Amazon EKS began supporting EC2 launch templates and custom AMIs for managed node groups. This enables our customers to leverage the simplicity of EKS managed node provisioning and lifecycle management features while meeting any level of customization, compliance, or security requirements. Because placement groups are supported in launch templates, they are now an available option for EKS managed node groups.
Solution Overview
In this blog post, we create an Amazon EKS cluster with two managed node groups (one with a placement group enabled and the other without). Each node group contains two c5.large instances. The EKS cluster is attached to a newly created VPC. All application workloads run in the VPC's public subnets for demo purposes; for production workloads, we recommend using private subnets to host the workloads. When you create a new cluster, Amazon EKS creates an endpoint for the managed Kubernetes API server that you use to communicate with your cluster. For your convenience in this blog, we make the Amazon EKS control plane Kubernetes API server endpoint public so that it's easier for you to validate the solution in your AWS account. For production workloads, we recommend using a private-only endpoint for your EKS control plane. For more information, see our Best Practices Guide.
In the performance testing, we create two iperf3 deployments in the two different node groups and test the throughput and latency between the two nodes within the same node group. The following diagram shows the high-level architecture. `iperf3` is a popular network performance tool used to measure bandwidth and throughput.
As shown in the preceding diagram, pod01 and pod02 are created under the deployment `cluster-placementgroup-enabled` and are hosted on the two nodes with the placement group enabled. pod03 and pod04 are created under the deployment `cluster-placementgroup-disabled` and are hosted on the two nodes (VM3, VM4) without the placement group. This scheduling is achieved with the `podAntiAffinity` rules and `nodeSelector` described in the "Performance testing" section. The performance tests (`iperf3` and `ping`) run between pod01 and pod02, and between pod03 and pod04, to measure the inter-node pod-to-pod throughput and latency.
Walkthrough
Here are the high-level deployment steps:
1. Clone the code from the GitHub repo.
2. If you are using `cdk`, continue with steps 3, 4, and 6. Terraform users, continue with steps 5 and 6.
3. Run the `npm` commands to compile the code if you are using `cdk`.
4. Run the `cdk deploy` command to deploy all components, including the AWS resources and Kubernetes workloads.
5. If you are using `terraform`, run `terraform init`, `terraform plan`, and `terraform apply`.
6. Conduct performance testing.
Prerequisites
To deploy with AWS CDK/Terraform code, you need the following:
- A good understanding of Amazon EKS and Kubernetes. You also need basic knowledge of Amazon EC2 and either the AWS CDK and TypeScript, or Terraform.
- An AWS account with the permissions required to create and manage the EKS cluster and Amazon EC2. All those resources will be created by AWS CDK/Terraform automatically.
- The AWS Command Line Interface (AWS CLI) configured. For information about installing and configuring the AWS CLI, see Installing, updating, and uninstalling the AWS CLI version 2.
- A current version of Node/Terraform; in this blog post, we use npm version 8.0.0 and Terraform version 1.0.8.
- The Kubernetes command-line tool, `kubectl`. For installation and setup instructions, see Install and Set Up kubectl.
AWS CDK Deployment Steps
The AWS CDK Toolkit, the command line tool `cdk`, is the primary tool for interacting with your AWS CDK app. The code creates a new VPC with two public subnets and an EKS cluster with two managed node groups, one with a placement group enabled and the other without.
1. For information about installing the `cdk` command, see AWS CDK Toolkit (cdk command). In this example, we use AWS CDK 1.25.0 or later.
2. Use the `git clone` command to clone the repository that contains all the AWS CDK code used in this blog.
3. Install the required `npm` modules and then use `npm` commands to compile the AWS CDK code.
4. Use the `cdk deploy` command to deploy the AWS resources and Kubernetes workloads. It takes approximately 30 minutes to provision the cluster.
5. To set up `kubectl` access to the cluster, copy and run the `PlacementGroupDemoEKSConfigCommand53CB05BE` output from step 4 (an illustrative example follows this list).
6. Run a `kubectl` command to check the pod status.
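The copied output is an `aws eks update-kubeconfig` command; the cluster name and Region shown below are illustrative, not the exact values your stack emits.

```bash
# Configure kubectl for the new cluster (cluster name and Region are illustrative)
aws eks update-kubeconfig --name PlacementGroupDemoEKS --region us-west-2

# Check that the worker nodes registered and the demo pods are running
kubectl get nodes --show-labels
kubectl get pods -o wide
```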
AWS CDK code deep dive
- In `lib/cdk-stack.ts`, an EKS cluster is created using the `@aws-cdk/aws-eks` library. It also creates a VPC with two public subnets and private subnets by default. For this demo, all worker nodes are placed in the public subnets.
- Following the EKS cluster creation, two managed EKS worker node groups are provisioned. One node group uses launch template support for Amazon EKS to place worker nodes in a placement group with `strategy` set to `cluster`.
- Both node groups apply a Kubernetes node label, `placementGroup`, with a value of either `true` or `false` to identify whether the node is placed in the cluster placement group. This label is later used for scheduling the performance testing pods onto the relevant nodes. A simplified sketch of this setup follows the list.
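The following TypeScript sketch illustrates the wiring described above. It assumes a recent AWS CDK v1 release with the `@aws-cdk/aws-eks` and `@aws-cdk/aws-ec2` modules; construct IDs, the Kubernetes version, and sizes are illustrative rather than copied from the repository.

```typescript
import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as eks from '@aws-cdk/aws-eks';

export class PlacementGroupDemoStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // EKS cluster; the construct also provisions a VPC with public and private subnets
    const cluster = new eks.Cluster(this, 'PlacementGroupDemoEKS', {
      version: eks.KubernetesVersion.V1_21,
      defaultCapacity: 0, // we add our own managed node groups below
    });

    // Cluster placement group for low-latency, high-throughput node-to-node traffic
    const placementGroup = new ec2.CfnPlacementGroup(this, 'DemoPlacementGroup', {
      strategy: 'cluster',
    });

    // Launch template that pins worker nodes into the placement group
    const launchTemplate = new ec2.CfnLaunchTemplate(this, 'PlacementGroupLaunchTemplate', {
      launchTemplateData: {
        instanceType: 'c5.large',
        placement: { groupName: placementGroup.ref },
      },
    });

    // A cluster placement group lives in one Availability Zone, so keep the nodes in one subnet
    const singleSubnet: ec2.SubnetSelection = { subnets: [cluster.vpc.publicSubnets[0]] };

    // Managed node group that consumes the launch template; the label identifies it later
    cluster.addNodegroupCapacity('placement-group-enabled', {
      desiredSize: 2,
      launchTemplateSpec: {
        id: launchTemplate.ref,
        version: launchTemplate.attrLatestVersionNumber,
      },
      labels: { placementGroup: 'true' },
      subnets: singleSubnet,
    });

    // Comparison node group without a placement group, kept in the same Availability Zone
    cluster.addNodegroupCapacity('placement-group-disabled', {
      desiredSize: 2,
      instanceTypes: [new ec2.InstanceType('c5.large')],
      labels: { placementGroup: 'false' },
      subnets: singleSubnet,
    });
  }
}
```

Note that the node group referencing the launch template does not set `instanceTypes` again, because the instance type is already defined in the launch template.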
Terraform deployment steps
The Terraform code will create a new VPC with two public subnets and an EKS cluster with two managed node groups, one with placement group enabled and the other without placement group enabled. All the nodes in the same node group will stay in the same Availability Zone for performance testing purposes.
1. For information about installing the `terraform` command, see Terraform. In this example, we use Terraform v1.0.8. Follow the documentation to download the Terraform CLI and verify that the `terraform` command is available.
2. Use the `git clone` command to clone the repo that contains all the Terraform code used in this blog.
3. Run `terraform init` to install the required Terraform providers and modules. In this blog post example, the Terraform state `terraform.tfstate` is stored locally.
4. Use the `terraform apply` command and enter the Region name to deploy the AWS resources and Kubernetes workloads. It takes approximately 30 minutes to provision the cluster.
5. After the deployment succeeds, run the following command to get `kubectl` access to the EKS cluster (an illustrative example follows this list).
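The command is typically an `aws eks update-kubeconfig` call similar to the following; the cluster name and Region are illustrative.

```bash
# Point kubectl at the newly created cluster (cluster name and Region are illustrative)
aws eks update-kubeconfig --name placement-group-demo --region us-west-2

# Verify connectivity
kubectl get nodes
```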
Terraform code deep dive
- In `launch_template.tf`, the placement group is configured.
- In the placement section of `aws_launch_template`, we set `availability_zone` to the same Availability Zone and reference the `aws_placement_group` that uses the `cluster` strategy.
- The EKS managed node group defined in `eks-cluster.tf` uses this launch template.
We create two managed node groups in this example. One uses the launch template created in `launch_template.tf`, and the other uses no launch template but is pinned to a single Availability Zone. In addition, the labels `placementGroup="true"` and `placementGroup="false"` are passed through the two node groups to the Kubernetes nodes. This label allows us to schedule the Kubernetes deployments onto the two different node groups. A simplified sketch of the Terraform wiring follows.
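The sketch below shows the key resources and how they reference each other, assuming the EKS cluster and node IAM role are defined elsewhere in the configuration; all names, variables, and the referenced `aws_eks_cluster.demo` and `aws_iam_role.node` are illustrative rather than the exact identifiers used in the repository.

```hcl
# Cluster placement group for the low-latency node group (names are illustrative)
resource "aws_placement_group" "eks_demo" {
  name     = "eks-demo-placement-group"
  strategy = "cluster"
}

# Launch template that places worker nodes into the placement group,
# pinned to a single Availability Zone
resource "aws_launch_template" "placement_group_enabled" {
  name_prefix   = "eks-pg-enabled-"
  instance_type = "c5.large"

  placement {
    group_name        = aws_placement_group.eks_demo.name
    availability_zone = var.availability_zone
  }
}

# Managed node group that consumes the launch template and labels its nodes
resource "aws_eks_node_group" "placement_group_enabled" {
  cluster_name    = aws_eks_cluster.demo.name
  node_group_name = "placement-group-enabled"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = [var.subnet_id] # one public subnet in the chosen AZ

  labels = {
    placementGroup = "true"
  }

  launch_template {
    id      = aws_launch_template.placement_group_enabled.id
    version = aws_launch_template.placement_group_enabled.latest_version
  }

  scaling_config {
    desired_size = 2
    max_size     = 2
    min_size     = 2
  }
}
```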
Performance testing
The sample code mentioned previously creates a new VPC and deploys an EKS cluster with two node groups. Both node groups contain two c5.large instances; one has a placement group of the `cluster` type, and the other has no placement group. iperf3, a popular network bandwidth and performance testing tool, is deployed into both node groups to evaluate network performance. We also use the `ping` command to test round-trip latency in the two scenarios. The following steps show the performance testing process.
Install `iperf3` and `ping` performance testing tools
The following actions install the `iperf3` and `ping` tools on the node group with the placement group enabled and on the node group without the placement group but in the same Availability Zone.
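A typical way to do this is to apply the two demo deployment manifests from the cloned repository; the file names below are illustrative.

```bash
# Deploy the iperf3/ping pods onto both node groups (file names are illustrative)
kubectl apply -f cluster-placementgroup-enabled.yaml
kubectl apply -f cluster-placementgroup-disabled.yaml
```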
Review the YAML manifest
Let's take a look at the deployment `cluster-placementgroup-enabled`; similar settings are used for the deployment `cluster-placementgroup-disabled` as well.
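A minimal sketch of such a deployment manifest is shown below; the container image, port, and affinity weight are illustrative assumptions rather than the exact manifest from the repository.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-placementgroup-enabled
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cluster-placementgroup-enabled
  template:
    metadata:
      labels:
        app: cluster-placementgroup-enabled
    spec:
      # Schedule onto nodes that carry the placementGroup=true label
      nodeSelector:
        placementGroup: "true"
      # Prefer spreading the two replicas across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: cluster-placementgroup-enabled
                topologyKey: kubernetes.io/hostname
      containers:
        - name: iperf3
          image: networkstatic/iperf3   # illustrative iperf3 image
          args: ["-s"]                  # run iperf3 in server mode
          ports:
            - containerPort: 5201
```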
- nodeSelector – the `nodeSelector` schedules the two replicas onto nodes carrying the label `placementGroup=true`, which are the nodes with the placement group enabled.
- podAntiAffinity – the `podAntiAffinity` spec helps ensure that the two replicas are not hosted on the same node if possible.
With the preceding specs in place, the deployment pods are placed onto the right nodes for our performance testing.
Check the EKS Nodes
In this example, we can see the nodes with the placement group enabled and the nodes without it. They carry `true` and `false`, respectively, as the value of the `placementGroup` label, and each group can be listed with a label selector (sample commands follow the list below).
- The EC2 nodes with placement group enabled
- The EC2 nodes WITHOUT placement group enabled
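For example:

```bash
# Nodes in the cluster placement group
kubectl get nodes -l placementGroup=true

# Nodes outside the placement group (same Availability Zone)
kubectl get nodes -l placementGroup=false
```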
Return pods running in the default namespace
We can see that the deployment `cluster-placementgroup-enabled` runs on the placement-group-enabled nodes, and `cluster-placementgroup-disabled` runs on the placement-group-disabled nodes. An example command follows.
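```bash
# The NODE column shows which worker node each pod landed on
kubectl get pods -o wide
```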
Run performance test from one pod to the other
In this exercise, we use `kubectl exec` to get a shell into one of the pods and run a simple test.
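A sketch of the test commands is shown below, assuming the iperf3 server is already listening in the peer pod; the pod name and target IP address are illustrative, and the `ping` step assumes the container image includes the ping utility.

```bash
# Find the IP address of the peer pod (shown in the IP column)
kubectl get pods -o wide

# Throughput test from one pod to the other (pod name and target IP are illustrative)
kubectl exec -it pod01 -- iperf3 -c 10.0.1.25

# Round-trip latency between the two pods
kubectl exec -it pod01 -- ping -c 10 10.0.1.25
```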
- Result with placement group configured
- Result without placement group but in the same Availability Zone
Performance testing summary
From the sample performance test results shown above, we can see that the inter-node pod-to-pod throughput with the placement group is approximately double the throughput without the placement group (9.50 Gbits/sec vs. 4.94 Gbits/sec), and the latency is around 66 percent lower (0.155 ms vs. 0.457 ms). This shows better performance in both throughput and latency with the `cluster` placement group.
Note:
- The inter-node pod-to-pod performance without a placement group can occasionally match the performance with one, because the two underlying EC2 nodes may happen to sit on nearby racks in the same Availability Zone. However, this cannot be guaranteed. To achieve consistently high inter-node pod-to-pod performance, we recommend enabling a placement group for the underlying nodes in the Kubernetes cluster.
- For latency, during our tests we found latency was around 24 to 66 percent lower with the placement group enabled vs. disabled.
Cleaning Up
To avoid ongoing charges to your account, run the following commands to clean up resources. The cleanup process takes approximately 30 minutes.
AWS CDK
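From the cloned AWS CDK project directory, a typical cleanup is:

```bash
# Tear down the EKS cluster, node groups, and VPC created by the stack
cdk destroy
```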
Terraform
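From the Terraform project directory, a typical cleanup is:

```bash
# Destroy all resources tracked in the local Terraform state
terraform destroy
```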
Conclusion
Now that launch templates are officially supported, Amazon EKS managed node groups can support a variety of customized instance configurations, including the placement group option introduced in this blog post. With the benefits of lower latency and higher throughput, customers can leverage the placement group option for their low-latency container applications running on EKS managed node groups.