End of support notifications and enhanced discoverability for Amazon EKS

This post was jointly authored by Praseeda Sathaye (Principal Solutions Architect, Containers & OSS), AJ Davis (AWS Enterprise Support) and Arvind Viswanathan (Principal Solutions Architect).

Introduction

In the rapidly evolving world of containerized applications, maintaining resilience and observability across Kubernetes environments has become a critical challenge. As organizations increasingly adopt Amazon Elastic Kubernetes Service (Amazon EKS) to manage their containerized workloads, the need for cluster version lifecycle management and discovery mechanisms becomes crucial. As Amazon EKS environments grow more complex and span multiple AWS Regions and accounts, users often struggle to track cluster versions, support lifecycles, and overall deployment status.

Proactive monitoring of EKS cluster lifecycles and end of support dates is crucial to keeping Kubernetes deployments secure, stable, and compliant. Furthermore, gaining visibility into EKS cluster deployments across an entire AWS Organization is essential for effective resource management, strategic planning, and maintaining an accurate inventory.

In this post, to address these pain points, we share two robust solutions that provide observability of EKS clusters:

  1. End of support notifications
  2. Discovery and reporting

The first solution uses AWS Health, Amazon EventBridge, and Amazon Simple Notification Service (Amazon SNS)/Amazon Simple Queue Service (Amazon SQS) to monitor Amazon EKS-specific events, particularly for clusters approaching end of support (standard and extended). By delivering early notifications when an EKS cluster is nearing the end of its support window, this solution gives you time to plan and update your cluster’s Kubernetes version.

Complementing this, the second solution is an automated discovery and reporting mechanism that identifies and aggregates detailed information about EKS clusters across all AWS Regions and accounts within your Organization. This comprehensive visibility into cluster versions, associated tags, and other key details facilitates compliance checks, accurate resource inventory management, and strategic upgrade planning.

Together, these two solutions provide a robust framework for effective EKS cluster lifecycle management, enabling organizations to stay ahead of potential issues, optimize resource usage, and make informed decisions that align with their long-term strategic goals.

Prerequisites

You need the following to complete the walkthrough:

Initial setup

The following steps guide you through the initial setup.

Enable AWS Health Organizational View within the management account

Enable Organizational View in AWS Health to obtain a centralized, aggregated view of AWS Health events across your entire Organization. You can verify that this is enabled through the console or by running the following command using the AWS Command Line Interface (AWS CLI):

aws health describe-health-service-status-for-organization

You should see the following output:

{"healthServiceAccessStatusForOrganization": "ENABLED"}

A Business, Enterprise On-Ramp, or Enterprise Support plan from AWS Support is necessary to use the AWS Health API and to complete this step.
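The same check can be scripted. The following is a minimal sketch, not part of the published solution: the function name organizational_view_enabled is hypothetical, and it assumes you pass in a boto3 AWS Health client (or any object exposing the same method).

```python
def organizational_view_enabled(health_client) -> bool:
    """Return True when AWS Health reports Organizational View as ENABLED.

    `health_client` is assumed to be a boto3 "health" client created in the
    management account (hypothetical wiring, not taken from the post).
    """
    response = health_client.describe_health_service_status_for_organization()
    return response.get("healthServiceAccessStatusForOrganization") == "ENABLED"
```

With boto3 installed, a client created with boto3.client("health", region_name="us-east-1") could be passed to this function; the Region shown is an assumption about where you call the AWS Health API.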

Delegate administration from management account to a central tooling account

Set up an AWS account within the Organization to be the tooling account for this solution. This account is used to centralize notifications and discovery.

From the management account, delegate AWS CloudFormation StackSets administration by following the steps described in this post: CloudFormation StackSets delegated administration.

The same result can also be achieved by running the following command from the management account. Replace 012345678901 with the AWS account ID of your tooling account.

aws organizations register-delegated-administrator \
  --service-principal=member.org.stacksets.cloudformation.amazonaws.com \
  --account-id="012345678901"
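If you prefer to script this step, the delegation could look roughly like the following Python sketch. The helper name register_stacksets_admin is hypothetical, and the function assumes it receives a boto3 AWS Organizations client created with management account credentials.

```python
import re

# Service principal used by CloudFormation StackSets, as in the CLI command above.
STACKSETS_PRINCIPAL = "member.org.stacksets.cloudformation.amazonaws.com"


def register_stacksets_admin(org_client, account_id: str) -> None:
    """Register `account_id` as delegated administrator for StackSets.

    `org_client` is assumed to be a boto3 "organizations" client.
    """
    # Reject values that are clearly not 12-digit AWS account IDs.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    org_client.register_delegated_administrator(
        AccountId=account_id,
        ServicePrincipal=STACKSETS_PRINCIPAL,
    )
```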

This is the only time we need to access the management account. The remaining steps are completed from within the tooling account.

Bootstrap AWS CDK

Choose a primary Region where all the reporting and events are consolidated within the central tooling account. Set the AWS_DEFAULT_REGION variable to this primary Region.

For the discovery and reporting solution, you must bootstrap AWS CDK in this primary Region across the entire Organization. Moreover, AWS CDK must also be bootstrapped in all AWS Regions where EKS clusters are deployed to receive end of support notifications. To streamline this walkthrough, we demonstrate the deployment of the resources to only the primary Region you have chosen.

The steps to bootstrap AWS CDK across multiple AWS Regions and accounts are available in this post: Bootstrapping multiple AWS accounts for AWS CDK using CloudFormation StackSets.

Download the AWS CDK stacks

We provide AWS CDK stacks for you to quickly deploy the solution in your environment. Download the code from our GitHub repository and set up the environment by running the following commands within the cdk directory:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Walkthrough

The following steps walk you through these solutions.

Solution 1: EKS cluster end of support notifications

Our first solution addresses the critical need for timely awareness of EKS cluster lifecycle events, particularly the approach of end-of-standard-support dates. Using AWS Health, EventBridge, and Amazon SNS (and optionally Amazon SQS) allowed us to create a centralized system that:

  • Monitors AWS Health events across multiple AWS Regions and accounts
  • Focuses on Amazon EKS-specific events, specifically the AWS_EKS_PLANNED_LIFECYCLE_EVENT
  • Provides early notifications when an EKS cluster is 180 days away from reaching the end of its standard support and extended support periods

This centralized approach makes sure that Amazon EKS users receive sufficient time to plan and execute version upgrades, maintaining the security and stability of their Kubernetes environments, as shown in the following figure.

Figure 1: Solution overview – end of support notifications
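To make the event filtering concrete, the following sketch shows what an EventBridge event pattern for these lifecycle events might look like. The exact pattern used by the CDK stacks is not shown in this post, so the field selection is an assumption; the matches function is a deliberately simplified, hypothetical stand-in for EventBridge matching that only handles exact values and nested objects.

```python
# Assumed event pattern: AWS Health events for Amazon EKS planned lifecycle changes.
EKS_LIFECYCLE_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["EKS"],
        "eventTypeCode": ["AWS_EKS_PLANNED_LIFECYCLE_EVENT"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist in the event, and the
    event value must be one of the listed values (or match a nested pattern)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

Running sample events through a local matcher like this is one way to sanity-check a pattern before deploying it; it does not replace testing the rule in EventBridge itself.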

Step 1: Deploy the eks-health-events AWS CDK stack

Deploy the eks-health-events AWS CDK stack to the central tooling account using the following command:

cdk deploy eks-health-events --app "python3 tooling_account.py" --require-approval never

This deploys the AWS CDK app in tooling_account.py, which provisions the following resources in the central tooling account:

  • Event bus
  • SNS topic and SQS queue to monitor events
  • EventBridge rule to forward planned lifecycle events for Amazon EKS to Amazon SNS
  • EventBridge rule to forward planned lifecycle events for Amazon EKS to Amazon SQS
  • Resource policies for the event rules to publish to Amazon SNS and Amazon SQS

Step 2: Deploy the eks-health-events-stack-set AWS CDK stack

Deploy the eks-health-events-stack-set AWS CDK stack.

cdk deploy eks-health-events-stack-set --app "python stack_sets.py" --require-approval never

This uses CloudFormation StackSets to deploy the following resources to the chosen primary Region across all the accounts in the Organization except the management account:

  • Local event bus
  • EventBridge rule to forward planned lifecycle events for Amazon EKS to the central event bus that was provisioned in Step 1
  • Resource policies for the event rules to publish to the central event bus
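For cross-account forwarding to work, the central event bus must accept events from the member accounts. The following sketch shows one common way to express that as a resource policy scoped to an AWS Organization. It is an illustration under assumptions, not the policy from the CDK stacks: the helper name central_bus_policy, the statement ID, and the sample ARN and Organization ID are all hypothetical.

```python
def central_bus_policy(bus_arn: str, org_id: str) -> dict:
    """Build a resource policy that lets any account in the given Organization
    put events onto the central event bus (assumed design, see lead-in)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgAccountsToPutEvents",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "events:PutEvents",
                "Resource": bus_arn,
                # Restrict the wildcard principal to members of the Organization.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }
```

Scoping by Organization ID avoids maintaining a list of account IDs as accounts are added or removed.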

Step 3: Configure SNS notifications

In the Amazon SNS console, locate the newly created topic named eks-health-events-EKSHealthEvents-<primary region> and create a subscription to it (for example, a group email address).

Step 4: Validate the solution

You can inspect and validate that the EventBridge rules, SQS queue, and SNS topic were created by the CloudFormation stacks named eks-health-events and eks-health-events-stack-set. From this point on, when an EKS cluster is 180 days away from reaching the end of support (standard or extended), the EventBridge rules match the AWS Health event and deliver it to Amazon SNS, Amazon SQS, or both, as shown in the following figures.

Figure 2: Validate EventBridge deployment

Figure 3: Validate SQS deployment

Figure 4: Validate SNS deployment

Figure 5: Sample end of support notification
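If you consume the events from the SQS queue rather than reading the email, a small parser can reduce each message to the fields you care about. The sketch below assumes the queue receives the EventBridge event as the JSON message body and that the event follows the usual AWS Health event layout (eventTypeCode, startTime, affectedEntities, eventDescription). The function name summarize_health_event is hypothetical.

```python
import json


def summarize_health_event(message_body: str) -> dict:
    """Extract key fields from an AWS Health event delivered as JSON text."""
    event = json.loads(message_body)
    detail = event.get("detail", {})
    descriptions = detail.get("eventDescription", [])
    # Use the first description entry when one is present.
    text = descriptions[0].get("latestDescription", "") if descriptions else ""
    return {
        "account": event.get("account"),
        "region": event.get("region"),
        "event_type": detail.get("eventTypeCode"),
        "start_time": detail.get("startTime"),
        # Affected entities identify the clusters the notification refers to.
        "clusters": [e.get("entityValue") for e in detail.get("affectedEntities", [])],
        "description": text,
    }
```

The resulting dictionary could feed a ticketing system or dashboard; missing fields come back as None or empty values rather than raising errors.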

Solution 2: EKS cluster discovery and reporting

Complementing the EKS cluster end of support notifications solution, our second solution offers a comprehensive view of EKS clusters across an entire Organization. This solution:

  • Identifies EKS clusters in all AWS Regions and accounts within an Organization
  • Collects detailed information about each cluster, such as account details, region, cluster name, version, and associated tags
  • Aggregates data on cluster versions, providing insights into version distribution
  • Generates both detailed and summary reports, stored centrally for direct access

This organization-wide visibility helps teams maintain an accurate inventory of Amazon EKS resources, facilitate compliance checks, and support strategic upgrade planning, as shown in the following figure.

Figure 6: Solution overview – discovery and reporting
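The core of the discovery step can be sketched in a few lines of Python. This is not the Lambda function from the repository: the function name list_cluster_details and the output field names are hypothetical, and it assumes you pass in a boto3 Amazon EKS client that is already scoped to one account and Region.

```python
def list_cluster_details(eks_client, account_id: str, region: str) -> list:
    """Return one record per EKS cluster visible to `eks_client`.

    `eks_client` is assumed to be a boto3 "eks" client for the given
    account and Region.
    """
    rows = []
    # Paginate so that accounts with many clusters are fully covered.
    paginator = eks_client.get_paginator("list_clusters")
    for page in paginator.paginate():
        for name in page.get("clusters", []):
            cluster = eks_client.describe_cluster(name=name)["cluster"]
            rows.append(
                {
                    "account_id": account_id,
                    "region": region,
                    "name": cluster["name"],
                    "version": cluster["version"],
                    "tags": cluster.get("tags", {}),
                }
            )
    return rows
```

Calling this once per account and Region, and concatenating the results, would give the organization-wide list that the reports are built from.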

Step 1: Deploy the eks-discovery-lambda AWS CDK stack

Deploy the eks-discovery-lambda AWS CDK stack to the central tooling account using the following command:

cdk deploy eks-discovery-lambda --require-approval never

This deploys the AWS CDK stack named eks-discovery-lambda in tooling_account.py, which provisions the following resources in the central tooling account:

  • Lambda function to discover EKS clusters across all AWS Regions and accounts
  • S3 bucket to store results
  • SNS topic for notifications
  • EventBridge scheduler for recurring execution
  • Necessary IAM roles and policies

The Lambda function collects cluster details, generates reports, and sends notifications.
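As an illustration of the reporting step, the following sketch turns a list of cluster records into a count by version and a CSV detail report. The record fields (account_id, region, name, version) and both function names are assumptions made for this example; the actual report format produced by the Lambda function may differ.

```python
import csv
import io
from collections import Counter

REPORT_FIELDS = ["account_id", "region", "name", "version"]  # assumed columns


def count_by_version(rows: list) -> dict:
    """Summary report: number of clusters per Kubernetes version."""
    return dict(Counter(row["version"] for row in rows))


def to_csv(rows: list) -> str:
    """Detail report: one CSV line per cluster, ignoring any extra keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=REPORT_FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The CSV text could then be written to the S3 bucket and the summary included in the SNS notification.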

Step 2: Modify the EventBridge scheduler as needed

If you would like to customize the EKS cluster discovery schedule, navigate to EventBridge and, under Schedules, find the newly created EKSDiscoveryWeeklySchedule. This is a cron-based schedule, as shown in the following figure.

Figure 7: Customize schedule for cluster discovery
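EventBridge cron expressions use six fields (minutes, hours, day of month, month, day of week, year), and one of the two day fields must be a question mark. The helper below is a hypothetical convenience for building a weekly expression; the schedule values in the usage note are examples, not the defaults of the deployed stack.

```python
VALID_DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def weekly_cron(day: str, hour: int, minute: int = 0) -> str:
    """Build an EventBridge cron expression that fires once a week."""
    day = day.upper()
    if day not in VALID_DAYS:
        raise ValueError(f"day must be one of {VALID_DAYS}, got {day!r}")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Day of month is '?' because the day of week is specified instead.
    return f"cron({minute} {hour} ? * {day} *)"
```

For example, weekly_cron("mon", 8) produces cron(0 8 ? * MON *), which runs every Monday at 08:00 in the time zone configured on the schedule.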

To receive notifications from Amazon SNS you must create a subscription to the topic. To do this, navigate to the Amazon SNS service, locate the newly created Topic named EKSDiscoverySNSTopic, and configure the protocol to meet your requirements (for example emailing to a group).

Step 3: Deploy the cross-account role that the Lambda function can assume to perform discovery

The Lambda function you deployed in Step 1 relies on a cross-account role in each of the accounts within the Organization to perform cluster discovery.

Deploy the eks-discovery-stack-set AWS CDK stack that rolls out this cross-account role.

cdk deploy eks-discovery-stack-set --app "python stack_sets.py" --require-approval never
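To illustrate how a function in the tooling account might use such a role, the following sketch assumes the role through AWS STS and returns credentials in the shape that boto3 clients accept. The role name, session name, and function names are hypothetical, and the ARN format assumes the standard aws partition.

```python
def discovery_role_arn(account_id: str, role_name: str) -> str:
    """Build the IAM role ARN for a member account (standard partition assumed)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assume_discovery_role(sts_client, account_id: str, role_name: str,
                          session_name: str = "eks-discovery") -> dict:
    """Assume the cross-account role and return keyword arguments for boto3.

    `sts_client` is assumed to be a boto3 "sts" client in the tooling account.
    """
    response = sts_client.assume_role(
        RoleArn=discovery_role_arn(account_id, role_name),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be unpacked into a client constructor, for example boto3.client("eks", region_name=region, **credentials), to query clusters in the member account.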

Step 4: Validate the solution

To validate the solution, navigate to the newly created Lambda function and run a test event with an empty JSON object ({}). When the Lambda function completes, verify that the S3 bucket receives the zip file and confirm that you received an SNS notification, as shown in the following figures.

Figure 8: Sample output of cluster discovery in S3 bucket

Figure 9: Sample contents of output file

Figure 10: Sample list of clusters

Figure 11: Sample count of clusters by version

Step 5: (Optional) Monitor the solution

You can monitor the solution by setting up Amazon CloudWatch alarms on the Lambda function’s executions and errors. Furthermore, regularly review the generated reports in the S3 bucket, and periodically review and update the IAM permissions if needed.
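One possible alarm definition is sketched below as a plain dictionary of CloudWatch put_metric_alarm parameters. The alarm name, period, and threshold are assumptions chosen for illustration, and the helper name lambda_error_alarm is hypothetical.

```python
def lambda_error_alarm(function_name: str, topic_arn: str) -> dict:
    """Parameters for an alarm that fires when the function reports any error."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # A weekly function emits no data most of the time; do not alarm on that.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }
```

With boto3, these parameters could be passed as cloudwatch_client.put_metric_alarm(**lambda_error_alarm(name, topic_arn)), where the topic ARN points to an SNS topic you already subscribe to.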

Troubleshooting

  • Make sure that all IAM roles and policies are correctly set up and have the necessary permissions.
  • Check CloudWatch Logs for any error messages in the Lambda functions or EventBridge rules.

Security considerations

  • Review and adjust the IAM roles and policies to adhere to the principle of least privilege and your environment.
  • Regularly audit access to the centralized event management system.

Cleaning up

Run the following commands to clean up the resources provisioned:

cdk destroy --app "python stack_sets.py" --all --force
cdk destroy --all --force

The first command deletes the CloudFormation StackSets that were deployed throughout the Organization using the AWS CDK App named stack_sets.py. The second command cleans up the resources provisioned within the central tooling account using the AWS CDK App named tooling_account.py.

Conclusion

This guide can help you set up a robust system using AWS services to provide proactive end of standard support notifications. This enables timely planning for upgrades, mitigating risks from outdated clusters while maintaining security, stability, and compliance. Moreover, the Amazon EKS cluster discovery and reporting solution marks a significant step forward in managing complex, multi-account Kubernetes environments on AWS. The solution enhances visibility, streamlines compliance efforts, facilitates strategic planning, and supports informed decision-making for cluster upgrades and resource allocation.

As organizations continue to scale their containerized applications, these solutions become invaluable assets. They enable teams to maintain a clear overview of their Amazon EKS landscape, optimize resource usage, and apply consistent management practices across diverse deployments. Implementing these solutions allows you to take a significant step forward in managing the observability, resilience, and governance of your Amazon EKS environments. In turn, this supports the long-term success and scalability of your Kubernetes initiatives on AWS.

As a final call to action, we recommend trying both solutions to begin enhancing your EKS cluster observability today!