AWS Cloud Operations Blog

Avoid patching failures due to low disk space with AWS Systems Manager Automation and CloudWatch alarms.

Every organization needs to keep its fleet patched while ensuring that patching does not disrupt business workloads. One of the challenges for operations teams is patching at scale without affecting production software. The most common reasons that workload patching fails are insufficient disk space, a spike in CPU usage, and high memory utilization. To avoid failures, you need to monitor workloads while patching. You can achieve this by combining AWS Systems Manager Automation with Amazon CloudWatch alarms. The What's New post AWS Systems Manager adds CloudWatch Alarms to control tasks describes this feature, which stops an automation when a CloudWatch alarm is activated. Multi-Region and multi-account configurations can use this feature too.

In this post, you'll learn how to use a CloudWatch alarm to check for available disk space before AWS Systems Manager Automation runs on managed instances in your AWS account. If the CloudWatch alarm is activated, the patching automation does not run, which avoids a possible full-disk error state. I'll demonstrate how to configure a CloudWatch alarm to monitor available disk space on the root volume. Then, you'll reference the alarm in AWS Systems Manager Automation as a safety measure that prevents the automation from running while the CloudWatch alarm is active.

AWS Systems Manager Automation is a service that helps customers build automated solutions to deploy, configure, and manage AWS resources at scale. There are more than 300 predefined AWS Automation runbooks, and customers can also create custom Automation runbooks. You can access these runbooks in the Systems Manager Automation console. The Systems Manager Automation runbook reference lists the predefined runbooks.

Amazon CloudWatch is a monitoring and observability service that allows customers to monitor their applications and cloud resources on AWS. A CloudWatch alarm watches a metric over a specified period and performs one or more actions based on metric thresholds.

Solution overview

Here are the steps that I follow in this post.

  1. Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance with the Amazon CloudWatch agent
  2. Configure a CloudWatch alarm to monitor the free disk space percentage
  3. Add the CloudWatch alarm as a safety control and patch from the Automation console
  4. Test and validate

Step 1: Launching an EC2 instance

Launch an EC2 instance in your AWS account. This post uses Amazon Linux 2 on a small instance type (for example, t3.small). The instance incurs costs until you terminate it; see How do I delete or terminate my Amazon EC2 resources? (https://aws.amazon.com/premiumsupport/knowledge-center/delete-terminate-ec2/).

Step 2: Configure the CloudWatch alarm to monitor disk usage

You first configure a CloudWatch alarm to monitor disk space.

  1. Install the CloudWatch agent on the EC2 instance. The CloudWatch agent is available as a package in Amazon Linux 2. You can install the package by entering the following command.
sudo yum install amazon-cloudwatch-agent

For Windows

Invoke-WebRequest -Uri https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi -OutFile amazon-cloudwatch-agent.msi
msiexec /i amazon-cloudwatch-agent.msi

You can also install the CloudWatch agent with AWS Systems Manager. For details, see Installing the CloudWatch agent using AWS Systems Manager.

  2. Ensure that the IAM role attached to the instance includes the CloudWatchAgentServerPolicy managed policy. For more information, see Create IAM roles to use with the CloudWatch agent on Amazon EC2 instances.
  3. Once the agent is installed, create a configuration file manually or by using the wizard.
  4. Start the CloudWatch agent from the command line.

For Linux

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path

For Windows

& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:configuration-file-path

  5. Create a CloudWatch alarm named LowDiskSpace that activates when free disk space is less than or equal to 10% (or a threshold that suits your use case).
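The configuration file and the LowDiskSpace alarm from this step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' exact setup. It assumes a Linux instance, where the agent's disk plugin reports a used percentage (disk_used_percent) rather than a free percentage, so "free space at or below 10%" is written as "used space at or above 90%". The function names, the period, and the xfs file system type are illustrative placeholders. The alarm dimensions must match exactly what the agent publishes, so check the metric in the CloudWatch console before relying on them.

```python
import json


def build_agent_config(interval_seconds=60):
    """Return a CloudWatch agent configuration that reports root-volume usage."""
    return {
        "metrics": {
            # Tag every metric with the instance ID so an alarm can target one instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "disk": {
                    "resources": ["/"],
                    "measurement": ["used_percent"],
                    "metrics_collection_interval": interval_seconds,
                    # Omit the device dimension, which varies between instance types.
                    "drop_device": True,
                }
            },
        }
    }


def build_low_disk_alarm(instance_id, fstype="xfs", free_percent_threshold=10.0):
    """Return keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "LowDiskSpace",
        "Namespace": "CWAgent",  # default namespace used by the agent
        "MetricName": "disk_used_percent",
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "path", "Value": "/"},
            {"Name": "fstype", "Value": fstype},  # assumption: verify for your AMI
        ],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        # Free space <= 10% is the same condition as used space >= 90%.
        "Threshold": 100.0 - free_percent_threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


def create_alarm(instance_id):
    """Create the alarm. Requires boto3 and AWS credentials; not called here."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**build_low_disk_alarm(instance_id))


# JSON text that could be saved as the agent configuration file.
config_text = json.dumps(build_agent_config(), indent=2)
```

Saving config_text to a file gives a candidate for the configuration-file-path passed to the agent start command, and create_alarm is one way to create the alarm without the console.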

Step 3: Adding CloudWatch alarm as an Automation safety control

  1. In the AWS Systems Manager console, select Automation under Change Management.
  2. Choose Execute automation, and then select Patching from Document categories in the left navigation pane.
  3. Select the AWS-PatchInstanceWithRollback Automation runbook and select Next.
  4. Choose Simple execution. You can also use the CloudWatch alarm safety control with other execution types.

Figure 1: Select Simple Execution on Execute automation runbook

  5. In the input parameters, select your instance(s).
  6. In the CloudWatch alarm section, select the LowDiskSpace alarm you created in Step 2.

Figure 2. Select LowDiskSpace alarm from the Alarm name dropdown list

  7. A CloudWatch alarm that is already in the alarm state cannot be selected. Choose Execute and confirm that the automation succeeds.
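The console steps above have a rough programmatic equivalent in the Systems Manager StartAutomationExecution API, which accepts an AlarmConfiguration. The following is a minimal sketch, assuming the boto3 SDK. The helper names are hypothetical, and the runbook may require inputs beyond InstanceId, so check the runbook reference before running it.

```python
def build_automation_request(instance_id, alarm_name="LowDiskSpace", extra_parameters=None):
    """Return keyword arguments for the SSM start_automation_execution call."""
    # Runbook inputs are passed as lists of strings.
    parameters = {"InstanceId": [instance_id]}
    # Assumption: any further inputs the runbook needs are supplied by the caller.
    parameters.update(extra_parameters or {})
    return {
        "DocumentName": "AWS-PatchInstanceWithRollback",
        "Parameters": parameters,
        # The safety control: the alarm that guards this automation.
        "AlarmConfiguration": {"Alarms": [{"Name": alarm_name}]},
    }


def start_patching(ssm_client, instance_id):
    """Start the automation and return its execution ID.

    ssm_client is expected to be boto3.client("ssm") or a compatible object.
    """
    response = ssm_client.start_automation_execution(
        **build_automation_request(instance_id)
    )
    return response["AutomationExecutionId"]
```

Passing the client in as an argument keeps the request-building logic easy to inspect without AWS credentials.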

Step 4: Test and validate

Use AWS Systems Manager State Manager to test the CloudWatch alarm safety control.

  1. Select State Manager under Node Management in the AWS Systems Manager console.
  2. Name the Association TestCloudWatchAlarm and select the AWS-PatchInstanceWithRollback runbook.
  3. Select the instance and a Role that allows Automation to perform actions on your behalf.
  4. Under Specify schedule, select Rate schedule builder and choose every 30 minutes.
  5. Select the LowDiskSpace CloudWatch alarm, and then choose Create Association.
  6. The association runs immediately. Monitor its status until it succeeds.
  7. Now go to the CloudWatch alarm and set the threshold to 100% to force the alarm state. The TestCloudWatchAlarm association will not run while the LowDiskSpace alarm is in the alarm state.

Figure 3. Automation skips execution due to an active alarm, as illustrated in Execution history.
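While testing, it can help to confirm the alarm's current state from a script rather than the console. The following is a minimal sketch, assuming the boto3 SDK and the response shape of the CloudWatch DescribeAlarms API. The function names are hypothetical, and treating only the ALARM state as blocking is an assumption based on the behavior described in this post.

```python
def alarm_blocks_automation(describe_alarms_response, alarm_name="LowDiskSpace"):
    """Return True if the named alarm is currently in the ALARM state."""
    for alarm in describe_alarms_response.get("MetricAlarms", []):
        if alarm.get("AlarmName") == alarm_name:
            # Other possible states are OK and INSUFFICIENT_DATA.
            return alarm.get("StateValue") == "ALARM"
    raise LookupError(f"alarm {alarm_name!r} not found")


def check_alarm(cloudwatch_client, alarm_name="LowDiskSpace"):
    """Fetch the alarm and report whether it would block the automation.

    cloudwatch_client is expected to be boto3.client("cloudwatch").
    """
    response = cloudwatch_client.describe_alarms(AlarmNames=[alarm_name])
    return alarm_blocks_automation(response, alarm_name)
```

Calling check_alarm before and after changing the threshold shows whether the association's next scheduled run is likely to be skipped.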

Cleanup

Remove the EC2 instance, CloudWatch alarm, and State Manager association to prevent unwanted costs. To remove the EC2 instance, see How do I delete or terminate my Amazon EC2 resources? To remove the CloudWatch alarm, see Editing or deleting a CloudWatch alarm. To remove the State Manager association, see Deleting an association.
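The same cleanup can be scripted. The following is a minimal sketch, assuming the boto3 SDK. The cleanup function is hypothetical, and the instance ID and association ID are values you look up in your own account. Terminating an instance is irreversible, so double-check the ID first.

```python
def cleanup(ec2_client, cloudwatch_client, ssm_client,
            instance_id, association_id, alarm_name="LowDiskSpace"):
    """Delete the test association and alarm, then terminate the instance.

    The clients are expected to be boto3.client("ec2"),
    boto3.client("cloudwatch"), and boto3.client("ssm").
    """
    # Remove the association first so that no further runs are scheduled.
    ssm_client.delete_association(AssociationId=association_id)
    cloudwatch_client.delete_alarms(AlarmNames=[alarm_name])
    ec2_client.terminate_instances(InstanceIds=[instance_id])
```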

Conclusion

You learned how to create a CloudWatch alarm for low disk space on EC2 instances and then use it as a safety control when running AWS Systems Manager Automation. With this safety control, an automation runs only if the chosen CloudWatch alarms are not activated. Multi-Region and multi-account configurations can use this feature too. We are excited to see how you will use this feature to solve business problems. To learn more, see AWS Systems Manager Automation. To learn how to write Automation runbooks, see Writing your own AWS Systems Manager documents.

About the authors:

Eric Logeson

Eric Logeson is a Senior Solutions Architect with the World-Wide Public Sector Federal Civilian Team. Eric specializes in enterprise management, cloud operations, and governance.

Yagya Vir Singh

Yagya Vir Singh is a Senior Technical Account Manager based in Nashville, Tennessee. He is passionate about AWS technologies and loves to help customers achieve their goals. Outside of the office, he loves to be with his friends and family and spend time outdoors.