AWS Cloud Operations Blog
Use Amazon EventBridge rules to run AWS Systems Manager automation in response to CloudWatch alarms
Since its launch in 2009, Amazon CloudWatch has become the cloud-native choice for a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health that includes common metrics. Amazon EventBridge complements CloudWatch and provides real-time access to changes in data in AWS services, your own applications, and software as a service (SaaS) applications without writing code. In addition to providing native integration with other AWS services like AWS Systems Manager, EventBridge also integrates with many third-party SaaS platforms, making it a powerful instrument in the observability tool chest for customers.
Customers want to know how to get deeper insights into operating system (OS)-level process and service-level metrics and combine them with EventBridge to trigger auto-remediation at scale through Systems Manager.
In this post, we’ll show how you can use EventBridge rules to run Systems Manager automation at scale across your Amazon Elastic Compute Cloud (Amazon EC2) fleet in response to CloudWatch alarms that monitor OS-level process state change.
Workflow
Complete the steps in this post to create the following workflow:
- An EC2 instance runs the Apache HTTP Server and the CloudWatch agent with the procstat plugin, which monitors the httpd process.
- When the httpd process is stopped, CloudWatch raises an alarm and sends an event to EventBridge.
- EventBridge receives the CloudWatch event that matches the event pattern in the predefined rule. EventBridge sends the event to the specified target (Systems Manager) and triggers the action defined in the rule.
- The aws:executeAwsApi automation action calls the SendCommand API operation, passing the EC2 instance ID and the name of the SSM Command document. Systems Manager delivers the command to the SSM Agent running on the EC2 instance.
- SSM Agent runs the Command document on the EC2 instance to restart the httpd process.
Figure 1 shows the architecture and flow of the proposed solution.
Figure 1: Solution architecture
Deployment steps
- Set up an EC2 instance running Amazon Linux 2 with Apache HTTP Server and install and configure the procstat plugin.
- Create an IAM role to execute the EventBridge rule.
- Create a Command document that starts the httpd process, and a runbook whose aws:executeAwsApi step calls SendCommand with the EC2 instance ID and the Command document name.
- Create a CloudWatch alarm to monitor the httpd process state change (for example, from running to stopped).
- Integrate EventBridge with Systems Manager. Create an EventBridge rule to receive the event, trigger the action defined in the rule, and send the event to the target (Systems Manager).
Install and set up the SSM Agent, procstat plugin, and Apache HTTP Server on EC2
The SSM Agent, which is required to use Systems Manager Automation, is installed by default on Amazon Linux 1 and 2 AMIs. On EC2 instances created from other Linux AMIs, you must install SSM Agent manually. For instructions, see Installing and configuring SSM Agent on EC2 instances for Linux in the Systems Manager User Guide.
The recommended way to install and configure the CloudWatch agent and procstat plugin is to use Systems Manager. For instructions, see the Detecting and remediating process issues on EC2 instances using Amazon CloudWatch and AWS Systems Manager blog post and Installing the CloudWatch agent on EC2 instances using your agent configuration in the CloudWatch User Guide. The process runs the AWS-ConfigureAWSPackage document through SSM Run Command, so the SSM Agent must already be installed.
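As a rough illustration of what the agent configuration for procstat might contain, the following Python sketch builds a minimal configuration dictionary and prints it as JSON. The function name build_procstat_config is hypothetical, and the choice of measurements and the InstanceId entry under append_dimensions are assumptions for this walkthrough, not settings taken from the referenced posts.

```python
import json


def build_procstat_config(exe="httpd", measurements=("cpu_time", "pid_count")):
    """Return a CloudWatch agent config dict that enables procstat for one executable.

    Hypothetical helper: the measurement list is an assumption. The agent
    publishes each measurement with a "procstat_" prefix, for example
    procstat_cpu_time.
    """
    return {
        "metrics": {
            # Assumption: add the instance ID as a dimension so that alarms
            # (and the events they emit) can identify the affected instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "procstat": [
                    {"exe": exe, "measurement": list(measurements)}
                ]
            },
        }
    }


if __name__ == "__main__":
    # Print the config so it can be saved to a file or a Parameter Store entry.
    print(json.dumps(build_procstat_config(), indent=2))
```

The alarm later in this post reads the procstat_cpu_time metric, so whichever configuration you use needs to collect cpu_time for the httpd process.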
To install Apache HTTP Server on EC2, see Create an EC2 instance and install a web server in the Amazon RDS User Guide.
Create an IAM role to execute the EventBridge rule
To execute the automation, you must attach an AWS Identity and Access Management (IAM) role with the AmazonSSMFullAccess policy to the EC2 instance. The role will be used to configure the EventBridge rule to run the SSM Automation document. The managed policy grants full access to the Systems Manager API and documents. As a best practice, always grant least privilege (that is, grant only the permissions required to perform a task).
We recommend that you create the role using an AWS CloudFormation template. Edit the trust relationships for the AutomationServiceRole to include events.amazonaws.com. See Figure 2.
Figure 2: AutomationServiceRole to execute the EventBridge rule
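For readers who prefer to script the trust relationship rather than edit it in the console, here is a minimal Python sketch. The helper names are hypothetical. Including ssm.amazonaws.com alongside events.amazonaws.com is an assumption based on how Automation service roles are usually set up; the text above only states that events.amazonaws.com must be included.

```python
import json


def build_trust_policy(services=("ssm.amazonaws.com", "events.amazonaws.com")):
    """Build an IAM trust policy that lets the listed services assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": sorted(services)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def apply_trust_policy(role_name="AutomationServiceRole"):
    """Replace the role's trust policy. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    iam = boto3.client("iam")
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(build_trust_policy()),
    )


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(), indent=2))
```

Note that update_assume_role_policy replaces the whole trust policy, so review the printed document before applying it to an existing role.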
Create an SSM runbook
An Automation document (now referred to as a runbook) defines the actions that Systems Manager performs on managed instances and other AWS resources when an automation runs. A runbook contains one or more steps that run in sequential order. Each step is built around a single action. Output from one step can be used as input in a later step.
To create a runbook, see Creating a runbook using the Editor in the Systems Manager User Guide. Set the DocumentName parameter to Monitor_Process_SSM_Document.
Systems Manager supports several Automation actions. In this runbook, aws:executeAwsApi calls the SendCommand API operation, which runs the Command document that restarts the httpd process on the EC2 instance.
To create the Run Command document that will restart the httpd process, in the Systems Manager console, choose Shared Resources, and then choose Documents.
From Create document, choose Command or Session, as shown in Figure 3.
Figure 3: Create the Run Command document
Complete the required fields as shown:
Document Name: Monitor_Process_SSM_Document
Target type: /AWS::EC2::Instance
Document type: Command Document
Content: JSON
In the Content field, enter the following JSON and then choose Create document.
{
  "schemaVersion": "1.2",
  "description": "restart httpd",
  "parameters": {},
  "runtimeConfig": {
    "aws:runShellScript": {
      "properties": [
        {
          "id": "0.aws:runShellScript",
          "runCommand": [
            "sudo systemctl start httpd",
            "echo Process restarted with status $?"
          ]
        }
      ]
    }
  }
}
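If you would rather register the Command document from a script than from the console, a sketch along the following lines could work. The helper names are hypothetical; the document content, name, and target type are copied from the steps above.

```python
import json

# Command document content, as entered in the console.
COMMAND_DOCUMENT = {
    "schemaVersion": "1.2",
    "description": "restart httpd",
    "parameters": {},
    "runtimeConfig": {
        "aws:runShellScript": {
            "properties": [
                {
                    "id": "0.aws:runShellScript",
                    "runCommand": [
                        "sudo systemctl start httpd",
                        "echo Process restarted with status $?",
                    ],
                }
            ]
        }
    },
}


def build_create_document_kwargs(name="Monitor_Process_SSM_Document"):
    """Build the keyword arguments for the SSM CreateDocument call."""
    return {
        "Name": name,
        "Content": json.dumps(COMMAND_DOCUMENT),
        "DocumentType": "Command",
        "DocumentFormat": "JSON",
        "TargetType": "/AWS::EC2::Instance",
    }


def create_command_document():
    """Create the document. Requires boto3 and AWS credentials."""
    import boto3  # lazy import keeps the builder usable without boto3

    return boto3.client("ssm").create_document(**build_create_document_kwargs())
```

Keeping the argument builder separate from the API call makes it easy to inspect or unit test the request before anything is created in your account.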
The example runbook looks as follows:
{
  "description": "Custom Automation to send SSM command to an instance",
  "schemaVersion": "0.3",
  "assumeRole": "{{ AutomationAssumeRole }}",
  "parameters": {
    "AutomationAssumeRole": {
      "type": "String",
      "description": "(Required) The ARN of the role that allows Automation to perform\nthe actions on your behalf. If no role is specified, Systems Manager Automation\nuses your IAM permissions to run this runbook.",
      "default": ""
    },
    "InstanceId": {
      "type": "String",
      "description": "(Required) The ID of the EC2 instance.",
      "default": ""
    }
  },
  "mainSteps": [
    {
      "name": "createImage",
      "action": "aws:executeAwsApi",
      "onFailure": "Abort",
      "inputs": {
        "Service": "ssm",
        "Api": "send_command",
        "InstanceIds": [ "{{ InstanceId }}" ],
        "DocumentName": "Monitor_Process_SSM_Document"
      },
      "outputs": [
        {
          "Name": "Command",
          "Selector": "$.Command.CommandId",
          "Type": "String"
        }
      ]
    }
  ]
}
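Before wiring the runbook to EventBridge, it can be useful to start it once by hand and confirm that it reaches the instance. The following Python sketch shows one way to do that. The helper names are hypothetical, and the default runbook name Monitor_Process_Automation_Document is taken from the target selection step later in this post; adjust it if you saved the runbook under a different name.

```python
def build_start_automation_kwargs(
    instance_id,
    role_arn,
    document_name="Monitor_Process_Automation_Document",
):
    """Build the keyword arguments for the SSM StartAutomationExecution call.

    Automation parameters are passed as a map of name to list of strings,
    even for parameters declared as a single String in the runbook.
    """
    return {
        "DocumentName": document_name,
        "Parameters": {
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [role_arn],
        },
    }


def start_automation(instance_id, role_arn):
    """Start the runbook and return the execution ID.

    Requires boto3 and AWS credentials.
    """
    import boto3  # lazy import keeps the builder usable without boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(
        **build_start_automation_kwargs(instance_id, role_arn)
    )
    return response["AutomationExecutionId"]
```

The list-of-strings shape of Parameters is also why the input transformer used with the EventBridge rule wraps the instance ID in square brackets.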
Create a CloudWatch alarm
From the left navigation pane of the CloudWatch console, choose Alarms, choose Create Alarm, and then choose Select Metric.
Choose your EC2 instance. For Namespace, use CWAgent. For Metric name, use procstat_cpu_time.
Under Conditions, for Threshold type, choose Static. Complete the remaining fields as shown in Figure 4.
Figure 4: Create a static CloudWatch alarm
In Configure actions, under Alarm state trigger, choose In alarm. Under Select an SNS topic, choose to send the alarm notifications to an existing SNS topic. You can choose Create a new topic if you don’t already have one. Under Send a notification to, choose Notify_By_Email, as shown in Figure 5:
Figure 5: Configure CloudWatch alarm actions
For the alarm name, enter Monitor_Process_CW_Alert and then choose Create alarm.
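The same alarm could also be created with the CloudWatch PutMetricAlarm API. The sketch below is illustrative only: the statistic, period, threshold, comparison operator, and missing-data handling are assumptions, because the values from Figure 4 are not reproduced in the text. The helper names are hypothetical. The dimensions must match exactly what the CloudWatch agent publishes for the metric, so pass any additional dimensions your agent configuration adds.

```python
def build_alarm_kwargs(instance_id, sns_topic_arn, extra_dimensions=None):
    """Build keyword arguments for the CloudWatch PutMetricAlarm call.

    Threshold settings are placeholders (assumptions); replace them with the
    values you chose in the console.
    """
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    for name, value in (extra_dimensions or {}).items():
        dimensions.append({"Name": name, "Value": value})
    return {
        "AlarmName": "Monitor_Process_CW_Alert",
        "Namespace": "CWAgent",
        "MetricName": "procstat_cpu_time",
        "Dimensions": dimensions,
        "Statistic": "Average",                             # assumption
        "Period": 60,                                       # assumption, seconds
        "EvaluationPeriods": 1,                             # assumption
        "Threshold": 0.0,                                   # assumption
        "ComparisonOperator": "LessThanOrEqualToThreshold", # assumption
        # Assumption: a stopped process may stop reporting data points,
        # so treat missing data as breaching.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(instance_id, sns_topic_arn, extra_dimensions=None):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # lazy import keeps the builder usable without boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **build_alarm_kwargs(instance_id, sns_topic_arn, extra_dimensions)
    )
```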
Integrate EventBridge with Systems Manager
Now integrate EventBridge with Systems Manager to trigger the runbook to send the SSM document to the EC2 instance. For more information, including a sample event from CloudWatch, see Alarm events and EventBridge in the CloudWatch User Guide. For information about how to create a custom event pattern for a CloudWatch event rule, see this AWS Knowledge Center article. You get the event pattern shown here after you’ve created the CloudWatch alarm.
In the Amazon EventBridge console, choose Events, choose Rules, and then choose Create Rule. Create the rule with a custom pattern as shown in Figure 6.
In Event pattern, paste the following:
{
"detail-type": ["CloudWatch Alarm State Change"],
"source": ["aws.cloudwatch"],
"detail": {
"alarmName": ["Monitor_Process_CW_Alert"],
"state": {
"value": ["ALARM"]
},
"previousState": {
"value": ["OK"]
}
}
}
Figure 6: Creating CloudWatch alarm rule with a custom pattern
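To see which alarm transitions this pattern selects, the following Python sketch implements a simplified matcher and applies it to a trimmed, hypothetical sample event. It covers only the exact-value matching used in this pattern, not the full EventBridge pattern language, and the function name matches is an illustrative choice.

```python
EVENT_PATTERN = {
    "detail-type": ["CloudWatch Alarm State Change"],
    "source": ["aws.cloudwatch"],
    "detail": {
        "alarmName": ["Monitor_Process_CW_Alert"],
        "state": {"value": ["ALARM"]},
        "previousState": {"value": ["OK"]},
    },
}


def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    Simplified: a list means "the event value must equal one of these",
    and a dict means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def sample_event(previous_state):
    """Build a trimmed, hypothetical alarm state change event."""
    return {
        "detail-type": "CloudWatch Alarm State Change",
        "source": "aws.cloudwatch",
        "detail": {
            "alarmName": "Monitor_Process_CW_Alert",
            "state": {"value": "ALARM"},
            "previousState": {"value": previous_state},
        },
    }


if __name__ == "__main__":
    for previous in ("OK", "INSUFFICIENT_DATA"):
        print(previous, "->", matches(EVENT_PATTERN, sample_event(previous)))
```

Because the pattern pins previousState to OK, a transition from INSUFFICIENT_DATA to ALARM does not match. Remove the previousState block from the pattern if you want the rule to fire on that transition as well.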
In Select targets, for Target, choose SSM Automation. For Document, choose Monitor_Process_Automation_Document.
Expand Configure automation parameter(s) and choose Input Transformer. In the first field, enter: {"instance": "$.detail.configuration.metrics[0].metricStat.metric.dimensions.InstanceId"}
In the second field, enter: {"InstanceId":[<instance>]}
Choose Use existing role and then choose AutomationServiceRole.
Figure 7: Using input transformers with Automation
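The input transformer can be hard to picture from the console fields alone. The following Python sketch simulates it locally: it resolves the input path against a trimmed, hypothetical alarm event and substitutes the result into the template. This is only an approximation of how EventBridge behaves, and the helper names resolve_path and apply_transformer are illustrative.

```python
import json
import re

INPUT_PATHS = {
    "instance": "$.detail.configuration.metrics[0].metricStat.metric.dimensions.InstanceId"
}
INPUT_TEMPLATE = '{"InstanceId":[<instance>]}'

# Trimmed, hypothetical event containing only the fields the path touches.
SAMPLE_EVENT = {
    "detail": {
        "configuration": {
            "metrics": [
                {
                    "metricStat": {
                        "metric": {
                            "dimensions": {"InstanceId": "i-0123456789abcdef0"}
                        }
                    }
                }
            ]
        }
    }
}


def resolve_path(event, path):
    """Resolve a simple JSONPath made of dotted keys and [n] array indexes."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    value = event
    for token in re.findall(r"[^.\[\]]+|\[\d+\]", path[2:]):
        if token.startswith("["):
            value = value[int(token[1:-1])]  # array index such as [0]
        else:
            value = value[token]             # object key
    return value


def apply_transformer(event, input_paths, template):
    """Substitute each <name> placeholder with the JSON-encoded path value."""
    result = template
    for name, path in input_paths.items():
        result = result.replace("<" + name + ">", json.dumps(resolve_path(event, path)))
    return result


if __name__ == "__main__":
    print(apply_transformer(SAMPLE_EVENT, INPUT_PATHS, INPUT_TEMPLATE))
```

The resulting JSON maps the InstanceId parameter to a one-element list, which is the shape Automation expects for runbook parameters. If the alarm's metric does not carry an InstanceId dimension, the path resolves to nothing and the runbook receives no instance ID.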
Choose Create. Figure 8 shows the CW_Alarm_To_Trigger_SSM_Runbook rule.
Figure 8: EventBridge rule
Test the SSM automation
Use SSH to connect to the EC2 instance. Run this command to stop Apache HTTP Server:
$ sudo systemctl stop httpd
Run this command to verify that the server has stopped:
$ sudo systemctl status httpd
Wait a few seconds for the runbook to trigger and start the process again:
$ sudo systemctl status httpd
You’ll see that the httpd server process is running again because the automation was triggered by the EventBridge rule.
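Instead of rerunning the status command by hand, you could poll until the service is back. The following Python sketch is one way to do that on the instance. The helper names and the timeout and interval defaults are assumptions; recovery time depends on the alarm period and evaluation settings you chose.

```python
import subprocess
import time


def httpd_is_active():
    """Return True if systemd reports the httpd unit as active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", "httpd"])
    return result.returncode == 0


def wait_until(probe, timeout=300.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or the timeout (seconds) elapses.

    The sleep and clock arguments are injectable so the loop can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Run on the EC2 instance after stopping httpd.
    if wait_until(httpd_is_active):
        print("httpd is running again")
    else:
        print("httpd did not restart within the timeout")
```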
Cleanup
To avoid ongoing charges in your account, delete the EC2 instance, CloudWatch alarm, EventBridge rule, and SSM documents that you created in this post.
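If you prefer to script the cleanup, the following Python sketch lists the delete calls for the alarm, the two SSM documents, and the instance, using the resource names from this post. The helper names are hypothetical, and the plan is limited to resources whose names appear in the text; anything you named differently needs to be adjusted.

```python
def build_cleanup_plan(instance_id):
    """Return (service, operation, kwargs) steps for removing the resources."""
    return [
        ("cloudwatch", "delete_alarms", {"AlarmNames": ["Monitor_Process_CW_Alert"]}),
        ("ssm", "delete_document", {"Name": "Monitor_Process_Automation_Document"}),
        ("ssm", "delete_document", {"Name": "Monitor_Process_SSM_Document"}),
        ("ec2", "terminate_instances", {"InstanceIds": [instance_id]}),
    ]


def run_cleanup(plan):
    """Execute each step with boto3. Requires boto3 and AWS credentials.

    Terminating an instance and deleting documents cannot be undone, so
    review the plan before running it.
    """
    import boto3  # lazy import keeps the plan builder usable without boto3

    for service, operation, kwargs in plan:
        getattr(boto3.client(service), operation)(**kwargs)


if __name__ == "__main__":
    # Dry run: print the plan for a placeholder instance ID.
    for step in build_cleanup_plan("i-0123456789abcdef0"):
        print(step)
```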
Conclusion
In this post, we showed how you can use EventBridge rules to run Systems Manager automation at scale in response to CloudWatch alarms. We hope you use the information in this post to add process-level metrics and automation in your organization.