Automatically enable group metrics collection for Amazon EKS managed node groups
Amazon Elastic Kubernetes Service (Amazon EKS) managed node groups automate the provisioning and lifecycle management of Kubernetes nodes (Amazon Elastic Compute Cloud (Amazon EC2) instances) for Amazon EKS Kubernetes clusters.
Managed nodes are provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by Amazon EKS. Amazon EKS doesn’t enable group metrics collection for Auto Scaling groups created for managed nodes.
In issue 762 on the AWS Container Services roadmap, customers asked us to enable group metrics collection by default. This post provides a solution for enabling Auto Scaling group metrics collection using AWS Lambda and AWS CloudTrail.
Auto Scaling group metrics
Customers use Auto Scaling group metrics to track changes in an Auto Scaling group and to set alarms on threshold values. Auto Scaling group metrics are available in the Auto Scaling console or the Amazon CloudWatch console. Once enabled, the Auto Scaling group sends sampled data to Amazon CloudWatch every minute. There is no charge for enabling these metrics.
By enabling Auto Scaling group metrics collection, you can monitor the scaling of managed node groups. Auto Scaling group metrics report the minimum, maximum, and desired size of an Auto Scaling group. You can create an alarm that fires if the number of nodes in a node group falls below the minimum size, which would indicate an unhealthy node group. Tracking node group size is also useful for adjusting the maximum count so your data plane doesn't run out of capacity.
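As an illustration of the alarm described above, here is a minimal boto3 sketch. It assumes the alarm watches the GroupInServiceInstances group metric in the AWS/AutoScaling namespace; the alarm name, the five-minute evaluation window, and the function names are hypothetical choices, not part of the solution in this post.

```python
def build_min_size_alarm(asg_name: str, min_size: int) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the number of in-service instances stays
    below the node group's minimum size.
    """
    return {
        "AlarmName": f"{asg_name}-below-min-size",  # hypothetical naming scheme
        "Namespace": "AWS/AutoScaling",
        "MetricName": "GroupInServiceInstances",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Minimum",
        "Period": 60,            # group metrics are reported every minute
        "EvaluationPeriods": 5,  # assumption: tolerate five minutes of churn
        "Threshold": float(min_size),
        "ComparisonOperator": "LessThanThreshold",
    }


def create_min_size_alarm(asg_name: str, min_size: int) -> None:
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_min_size_alarm(asg_name, min_size))
```

Calling create_min_size_alarm with the Auto Scaling group name and the node group's minimum size creates the alarm. The metric only has data once group metrics collection is enabled on that Auto Scaling group.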
When you create a managed node group, AWS CloudTrail sends a CreateNodegroup event to Amazon EventBridge. By creating an Amazon EventBridge rule that matches the CreateNodegroup event, you trigger an AWS Lambda function to enable group metrics collection for the Auto Scaling group associated with the managed node group.
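A rule of this kind is driven by an event pattern. The sketch below shows what such a pattern might look like as a Python dictionary, assuming the standard shape of CloudTrail API call events delivered to EventBridge. The constant and function names are hypothetical, and the rule created by the CDK code in this post may differ in detail.

```python
import json

# Assumed pattern: match CloudTrail-recorded CreateNodegroup calls made to Amazon EKS.
CREATE_NODEGROUP_PATTERN = {
    "source": ["aws.eks"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["eks.amazonaws.com"],
        "eventName": ["CreateNodegroup"],
    },
}


def pattern_json() -> str:
    """Serialize the pattern in the JSON form that EventBridge rules expect."""
    return json.dumps(CREATE_NODEGROUP_PATTERN)
```

The resulting string can be passed as the EventPattern argument of the boto3 EventBridge put_rule call, or expressed as an equivalent event pattern in AWS CDK.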
The AWS Cloud Development Kit (AWS CDK) code provided in this post creates an Amazon EventBridge rule that forwards the CreateNodegroup event to an AWS Lambda function. The function extracts the cluster name and managed node group name from the event and uses the Amazon EKS describe-nodegroup application programming interface (API) to determine the associated Auto Scaling group. The function then enables group metrics collection on that Auto Scaling group.
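The flow described above can be sketched in Python as follows. This is not the function deployed by the CDK code, only an outline under stated assumptions: the cluster name and node group name are assumed to appear in the event under detail.requestParameters as name and nodegroupName, and all function names are hypothetical. The boto3 clients are passed in as arguments so the logic can be exercised without AWS access.

```python
def extract_names(event: dict) -> tuple:
    """Return (cluster_name, nodegroup_name) from a CreateNodegroup event.

    Assumption: CloudTrail records the cluster name as "name" and the
    node group name as "nodegroupName" in the request parameters.
    """
    params = event["detail"]["requestParameters"]
    return params["name"], params["nodegroupName"]


def enable_metrics_for_nodegroup(eks, autoscaling, cluster_name, nodegroup_name) -> list:
    """Enable group metrics on every Auto Scaling group behind a node group.

    Returns the names of the Auto Scaling groups that were updated.
    """
    nodegroup = eks.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )["nodegroup"]
    # The list may be empty while the node group is still being created.
    groups = nodegroup.get("resources", {}).get("autoScalingGroups", [])
    names = [group["name"] for group in groups]
    for name in names:
        # Omitting the Metrics argument enables all group metrics.
        autoscaling.enable_metrics_collection(
            AutoScalingGroupName=name, Granularity="1Minute"
        )
    return names


def lambda_handler(event, context):
    """Hypothetical Lambda entry point that wires the pieces together."""
    import boto3  # imported lazily; available by default in the Lambda Python runtime

    cluster_name, nodegroup_name = extract_names(event)
    return enable_metrics_for_nodegroup(
        boto3.client("eks"), boto3.client("autoscaling"), cluster_name, nodegroup_name
    )
```

Because the Auto Scaling group may not exist yet when the CreateNodegroup event arrives, a production function would likely need to wait or retry until describe_nodegroup reports it.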
The function looks for specific tags on the managed node group. By default, the function enables Auto Scaling group metrics collection when you create a new managed node group with the ASG_METRICS_COLLLECTION_ENABLED tag set to TRUE. You can customize the tag in the AWS Lambda function code.
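A tag check of this kind might look like the sketch below. It assumes the node group tags are available as a plain dictionary (for example, the tags field of a describe-nodegroup response) and that the comparison is an exact match. The constant and function names are hypothetical.

```python
# Tag key and value as given in this post; change these to customize the trigger.
TAG_KEY = "ASG_METRICS_COLLLECTION_ENABLED"
TAG_VALUE = "TRUE"


def metrics_requested(tags) -> bool:
    """Return True if the node group tags opt in to group metrics collection."""
    return (tags or {}).get(TAG_KEY) == TAG_VALUE
```

For example, metrics_requested({"ASG_METRICS_COLLLECTION_ENABLED": "TRUE"}) is True, while a node group with no tags, or with the tag set to any other value, is skipped.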
You need the following to complete the steps in this post:
- AWS CDK 2.19 or later
- AWS Command Line Interface (AWS CLI) version 2 (for testing only)
- Python 3.7 or later
- An EKS cluster
Clone the code repository:
Bootstrap AWS CDK if this is your first time using it:
Deploy the stack to create the following resources:
- An AWS Lambda function and an Amazon CloudWatch log group.
- An Amazon EventBridge rule that matches CreateNodegroup and sends the event to the AWS Lambda function.
- An AWS Identity and Access Management (AWS IAM) role that allows the AWS Lambda function to write to Amazon CloudWatch, describe Amazon EKS node groups, and enable Auto Scaling group metrics collection.
Create an Amazon EKS managed node group and enable Auto Scaling group metrics collection using tags. You can use the following reference AWS CLI command to create a managed node group.
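If you prefer to script this step in Python, a boto3 sketch of an equivalent request might look like the following. The scaling sizes are illustrative, the function names are hypothetical, and the subnet IDs and node role ARN must come from your own environment.

```python
def build_create_nodegroup_request(cluster_name, nodegroup_name, subnets, node_role_arn) -> dict:
    """Build keyword arguments for the EKS create_nodegroup call.

    The tag opts the new node group in to group metrics collection.
    """
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "subnets": list(subnets),
        "nodeRole": node_role_arn,
        # Illustrative sizes only; choose values that suit your workload.
        "scalingConfig": {"minSize": 1, "maxSize": 3, "desiredSize": 1},
        "tags": {"ASG_METRICS_COLLLECTION_ENABLED": "TRUE"},
    }


def create_tagged_nodegroup(cluster_name, nodegroup_name, subnets, node_role_arn) -> dict:
    """Create the node group. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    request = build_create_nodegroup_request(
        cluster_name, nodegroup_name, subnets, node_role_arn
    )
    return boto3.client("eks").create_nodegroup(**request)
```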
Wait about five minutes before verifying that the Auto Scaling group metrics are enabled. The Amazon EKS console shows Auto Scaling groups for managed node groups.
Navigate to the associated Auto Scaling group in the Amazon EC2 console and switch to the Monitoring tab. The option for Auto Scaling group metrics collection should now be enabled.
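The same check can be done programmatically. The sketch below assumes you already know the Auto Scaling group name and pass in a boto3 autoscaling client; the function name is hypothetical.

```python
def enabled_metrics(autoscaling, asg_name: str) -> list:
    """Return the sorted names of group metrics enabled on an Auto Scaling group.

    An empty list means metrics collection is not enabled
    (or the group was not found).
    """
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        return []
    return sorted(entry["Metric"] for entry in groups[0].get("EnabledMetrics", []))
```

With boto3 installed, enabled_metrics(boto3.client("autoscaling"), "<your Auto Scaling group name>") should return a non-empty list once the Lambda function has run.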
Enable Auto Scaling group metrics collection for existing managed node groups
The AWS Lambda function created by the CDK code enables Auto Scaling group metrics collection only for Amazon EKS managed node groups created after you deploy the stack. You can use the following Python script to enable Auto Scaling group metrics collection for existing managed node groups in all clusters in a Region.
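A minimal sketch of such a script, assuming boto3 is installed and your credentials and default Region are configured, might look like this. The function names are hypothetical, and unlike the Lambda function this sketch does not filter on tags: it enables metrics on every managed node group it finds.

```python
def enable_for_all_nodegroups(eks, autoscaling) -> list:
    """Enable group metrics for every managed node group in the client's Region.

    Returns the names of the Auto Scaling groups that were updated.
    """
    updated = []
    for cluster_page in eks.get_paginator("list_clusters").paginate():
        for cluster in cluster_page["clusters"]:
            nodegroup_pages = eks.get_paginator("list_nodegroups").paginate(
                clusterName=cluster
            )
            for nodegroup_page in nodegroup_pages:
                for nodegroup_name in nodegroup_page["nodegroups"]:
                    nodegroup = eks.describe_nodegroup(
                        clusterName=cluster, nodegroupName=nodegroup_name
                    )["nodegroup"]
                    groups = nodegroup.get("resources", {}).get("autoScalingGroups", [])
                    for group in groups:
                        autoscaling.enable_metrics_collection(
                            AutoScalingGroupName=group["name"],
                            Granularity="1Minute",
                        )
                        updated.append(group["name"])
    return updated


def main() -> None:
    """Run against the Region configured for the default boto3 session."""
    import boto3  # imported lazily so the function above can be tested offline

    for name in enable_for_all_nodegroups(
        boto3.client("eks"), boto3.client("autoscaling")
    ):
        print(f"Enabled group metrics collection on {name}")
```

Call main() to run the script. Enabling metrics on a group that already has them enabled is harmless, so the script can be re-run safely.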
Remove resources created in this post by running the following command:
Delete the node group created for testing:
In this post, I showed you how to enable Auto Scaling group metrics collection for Amazon EKS managed node groups. You can control group metrics collection for your managed node groups by adding a tag (ASG_METRICS_COLLLECTION_ENABLED=TRUE) to your node groups.
You can track the development of this feature on the AWS Container Services roadmap on GitHub.
This post includes contributions from Maksim Poletaev, Sr Solutions Architect, AWS.