AWS Cloud Operations Blog
Avoid patching failures due to low disk space with AWS Systems Manager Automation and CloudWatch alarms.
Every organization needs to keep its fleet patched while ensuring that patching does not disrupt business workloads. One of the challenges for operations teams is patching at scale without affecting production software. Common reasons that patching fails are insufficient disk space, a spike in CPU usage, and high memory utilization. To avoid these failures, you need to monitor the workloads while patching. You can achieve this with the combination of AWS Systems Manager Automation and Amazon CloudWatch alarms. The What's New post AWS Systems Manager adds CloudWatch Alarms to control tasks describes this feature, which allows an automation to stop when a CloudWatch alarm is activated. Multi-Region and multi-account configurations can use this feature too.
In this post, you'll learn how to use a CloudWatch alarm to check for available disk space before AWS Systems Manager Automation is executed on managed instances in your AWS account. If the CloudWatch alarm is activated, the patching automation does not run, which avoids a possible disk-full error. I'll demonstrate how to configure a CloudWatch alarm to monitor available disk space on the root volume. Then, you'll reference the alarm in AWS Systems Manager Automation as a safety control that prevents the automation from running while that CloudWatch alarm is active.
AWS Systems Manager Automation is a service that helps customers build automated solutions to deploy, configure, and manage AWS resources at scale. There are more than 300 predefined AWS Automation runbooks, and customers can also create custom Automation runbooks. You can access these runbooks in the Systems Manager Automation console. The Systems Manager Automation runbook reference lists the predefined runbooks.
Amazon CloudWatch is a monitoring and observability service that allows customers to monitor their applications and cloud resources on AWS. A CloudWatch alarm watches a metric over a specified period and performs one or more actions based on the value of the metric relative to a threshold.
Solution overview
Here are the steps that I follow in this post.
- Step 1: Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance with the Amazon CloudWatch Agent
- Step 2: Configure a CloudWatch alarm to monitor free disk space percentage
- Step 3: Add the CloudWatch alarm as a safety control and patch from the Automation console
- Step 4: Test and Validate
Step 1: Launch an EC2 instance
Launch an EC2 instance in your AWS account. This post uses Amazon Linux 2 on a small instance (t4.small). You incur charges until the instance is terminated; see https://aws.amazon.com/premiumsupport/knowledge-center/delete-terminate-ec2/ for termination instructions.
Step 2: Configure the CloudWatch alarm to monitor the disk usage
First, configure a CloudWatch alarm to monitor disk space.
- Install the CloudWatch agent on the EC2 instance. On Amazon Linux 2, the agent is available as a package that you can install with the command sudo yum install amazon-cloudwatch-agent. On Windows, download and run the agent installer package. You can also install the agent on either platform with AWS Systems Manager; see Installing the CloudWatch agent using AWS Systems Manager.
- Ensure that the IAM role attached to the instance has the CloudWatchAgentServerPolicy managed policy. For more information, see Create IAM roles to use with the CloudWatch agent on Amazon EC2 instances.
- Once installed, create a configuration file manually or by using the wizard.
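- Start the CloudWatch agent from the command line, replacing configuration-file-path with the path to the configuration file you created.
For Linux: sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path
For Windows (PowerShell): & "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:configuration-file-path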
- Create a CloudWatch alarm named LowDiskSpace that goes into the ALARM state when free disk space is less than or equal to 10% (or a threshold that suits your use case).
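The agent configuration and the alarm from the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the exact setup used in this post. On Linux the CloudWatch agent reports disk_used_percent rather than a free-space percentage, so "free space at or below 10%" is expressed here as "used space at or above 90%". The function names, the instance ID, and the device and fstype values are hypothetical. Check which dimensions the agent actually publishes in the CWAgent namespace, because an alarm only matches a metric when every dimension agrees.

```python
import json


def build_agent_config(path="/", interval_seconds=60):
    """Return a CloudWatch agent config that collects disk usage for one mount point."""
    return {
        "metrics": {
            # Attach the instance ID as a dimension on every collected metric.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": [path],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


def build_alarm_params(instance_id, device, fstype, path="/", min_free_percent=10.0):
    """Return put_metric_alarm arguments for an alarm on low free disk space."""
    return {
        "AlarmName": "LowDiskSpace",
        "AlarmDescription": f"Free space on {path} is at or below {min_free_percent}%",
        "Namespace": "CWAgent",  # default namespace used by the CloudWatch agent
        "MetricName": "disk_used_percent",
        # Assumption: these dimensions must match what the agent publishes.
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "path", "Value": path},
            {"Name": "device", "Value": device},
            {"Name": "fstype", "Value": fstype},
        ],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        # Free <= 10% is the same condition as used >= 90%.
        "Threshold": 100.0 - min_free_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


def create_alarm(params):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Print both documents for review; nothing is sent to AWS here.
    print(json.dumps(build_agent_config(), indent=2))
    print(json.dumps(build_alarm_params("i-0123456789abcdef0", "nvme0n1p1", "xfs"), indent=2))
```

Building the request as a plain dictionary first makes it easy to review the threshold and dimensions before calling create_alarm.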
Step 3: Add the CloudWatch alarm as an Automation safety control
- In the AWS Systems Manager console, select Automation under Change Management.
- Choose Execute automation, then select Patching under Document categories in the left-hand navigation.
- Select the AWS-PatchInstanceWithRollback Automation runbook and select Next.
- Choose Simple execution. You can also use the CloudWatch alarm safety control with other execution types.
- In the input parameters, select your instance(s).
- In the CloudWatch alarm section, select the LowDiskSpace alarm you created in Step 2. The alarm cannot be selected while it is in the ALARM state.
- Select Execute and confirm that the automation succeeds.
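The console steps above correspond to a single StartAutomationExecution call with an alarm configuration attached. The following is a hedged sketch using boto3, not the method used in this post. The function names and the instance ID are hypothetical, and the sketch assumes that the runbook takes its target in a parameter named InstanceId; check the runbook's parameter list before relying on it.

```python
def build_execution_request(instance_id, alarm_name="LowDiskSpace"):
    """Return start_automation_execution arguments guarded by a CloudWatch alarm."""
    return {
        "DocumentName": "AWS-PatchInstanceWithRollback",
        # Assumption: the runbook reads its target from an "InstanceId" parameter.
        "Parameters": {"InstanceId": [instance_id]},
        "AlarmConfiguration": {
            # Do not ignore failures to poll the alarm.
            "IgnorePollAlarmFailure": False,
            "Alarms": [{"Name": alarm_name}],
        },
    }


def start_patching(request):
    """Start the automation; requires boto3 and AWS credentials."""
    import boto3

    response = boto3.client("ssm").start_automation_execution(**request)
    return response["AutomationExecutionId"]


if __name__ == "__main__":
    # Print the request for review; nothing is sent to AWS here.
    print(build_execution_request("i-0123456789abcdef0"))
```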
Step 4: Test and validate
Use AWS Systems Manager State Manager to test the CloudWatch alarm safety control.
- Select State Manager under Node Management in the AWS Systems Manager console, and choose Create association.
- Name the Association TestCloudWatchAlarm and select the AWS-PatchInstanceWithRollback runbook.
- Select the instance and a Role that allows Automation to perform actions on your behalf.
- Under Specify schedule, select Rate schedule builder and choose every 30 minutes.
- Select the LowDiskSpace CloudWatch alarm, then choose Create Association.
- The association runs immediately. Monitor its status until it succeeds.
- Now go to the CloudWatch alarm and set the threshold to 100% to force an alarm. The TestCloudWatchAlarm association will not run while the LowDiskSpace alarm is in the ALARM state.
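The association from the steps above can also be created programmatically. This is a sketch under assumptions rather than the procedure used in this post: the function names and the instance ID are hypothetical, the target parameter is assumed to be InstanceId, and any role or other runbook inputs are passed through an optional runbook_parameters dictionary because their names depend on the runbook.

```python
def build_association_request(instance_id, alarm_name="LowDiskSpace",
                              rate_minutes=30, runbook_parameters=None):
    """Return create_association arguments for a scheduled, alarm-guarded runbook."""
    request = {
        "Name": "AWS-PatchInstanceWithRollback",
        "AssociationName": "TestCloudWatchAlarm",
        # Assumption: the runbook receives each target through "InstanceId".
        "AutomationTargetParameterName": "InstanceId",
        "Targets": [{"Key": "ParameterValues", "Values": [instance_id]}],
        "ScheduleExpression": f"rate({rate_minutes} minutes)",
        "AlarmConfiguration": {
            "IgnorePollAlarmFailure": False,
            "Alarms": [{"Name": alarm_name}],
        },
    }
    if runbook_parameters:
        # Optional runbook inputs, for example a role ARN, keyed by parameter name.
        request["Parameters"] = runbook_parameters
    return request


def create_association(request):
    """Create the association; requires boto3 and AWS credentials."""
    import boto3

    response = boto3.client("ssm").create_association(**request)
    return response["AssociationDescription"]["AssociationId"]


if __name__ == "__main__":
    # Print the request for review; nothing is sent to AWS here.
    print(build_association_request("i-0123456789abcdef0"))
```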
Cleanup
Remove the EC2 instance, CloudWatch alarm, and State Manager association to prevent unwanted costs. To remove the EC2 instance, see How do I delete or terminate my Amazon EC2 resources. To remove the CloudWatch alarm, see Editing or deleting a CloudWatch alarm. To remove the State Manager association, see Deleting an association.
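If you prefer to script the cleanup, the three removals can be sketched as one Python function. This is an illustrative sketch, not part of the original walkthrough. The function name and the instance ID are hypothetical, the boto3 clients are passed in by the caller, and the sketch assumes the association lookup returns its matches in a single page.

```python
def cleanup(ec2, cloudwatch, ssm, instance_id,
            alarm_name="LowDiskSpace", association_name="TestCloudWatchAlarm"):
    """Delete the association(s), the alarm, and the instance created in this post.

    ec2, cloudwatch, and ssm are boto3 clients supplied by the caller.
    Returns the IDs of the associations that were deleted.
    """
    # Look up the association by name; assumes a single page of results.
    found = ssm.list_associations(
        AssociationFilterList=[{"key": "AssociationName", "value": association_name}]
    )
    deleted = []
    for association in found.get("Associations", []):
        ssm.delete_association(AssociationId=association["AssociationId"])
        deleted.append(association["AssociationId"])

    cloudwatch.delete_alarms(AlarmNames=[alarm_name])
    ec2.terminate_instances(InstanceIds=[instance_id])
    return deleted


if __name__ == "__main__":
    import boto3  # requires AWS credentials; this permanently removes resources

    print(cleanup(boto3.client("ec2"), boto3.client("cloudwatch"),
                  boto3.client("ssm"), "i-0123456789abcdef0"))
```

The association is removed first so that it cannot trigger another run while the alarm and instance are being deleted.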
Conclusion
You learned how to create a CloudWatch alarm for low disk space on EC2 instances and then use it as a safety control during the execution of AWS Systems Manager Automation. With this safety control, an automation runs only when the selected CloudWatch alarms are not activated. Multi-Region and multi-account configurations can use this feature too. We are excited to see how you will use this feature to solve business problems. To learn more, see AWS Systems Manager Automation. To learn how to write Automation runbooks, see Writing your own AWS Systems Manager documents.