A New Integration for CloudWatch Alarms and OpsCenter
Over a year ago, I wrote about the launch of a feature in AWS Systems Manager called OpsCenter, which allows customers to aggregate issues, events, and alerts into one place and makes it easier for operations engineers and IT professionals to investigate and remediate problems. Today, I get to tell you about a new integration between this feature and Amazon CloudWatch Alarms.
When a CloudWatch Alarm enters an alarm state, you can now automatically create an operational work item (OpsItem) inside of Systems Manager OpsCenter.
For example, you can configure an alarm to automatically create an OpsItem if the CPU utilization of your EC2 instance is greater than 75%. The item includes the information engineers need to fix the problem, which helps your team be more productive and speeds up the investigation of issues.
You can also combine multiple metric alarms. For example, you can create a composite alarm that triggers only if CPU utilization is greater than 75% and your load balancer latency exceeds 100 ms. In this way, you can ignore cases where CPU utilization has increased but your load balancer is still responsive.
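As a rough illustration of the composite alarm idea, here is a minimal Python sketch that builds the rule expression and passes it to the boto3 CloudWatch client. The child alarm names `cpu-high` and `latency-high`, the composite alarm name, and the helper function names are hypothetical; the two child metric alarms are assumed to exist already.

```python
from typing import Sequence


def build_composite_rule(alarm_names: Sequence[str]) -> str:
    """Build a rule that is true only when ALL child alarms are in ALARM."""
    if not alarm_names:
        raise ValueError("at least one child alarm name is required")
    return " AND ".join('ALARM("{}")'.format(name) for name in alarm_names)


def create_composite_alarm(cloudwatch, name: str, child_alarms: Sequence[str]) -> str:
    """Create (or update) a composite alarm from existing child alarms.

    `cloudwatch` is expected to be a boto3 CloudWatch client, for example
    boto3.client("cloudwatch"). Returns the rule that was sent.
    """
    rule = build_composite_rule(child_alarms)
    cloudwatch.put_composite_alarm(AlarmName=name, AlarmRule=rule)
    return rule


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   create_composite_alarm(
#       boto3.client("cloudwatch"),
#       "cpu-and-latency-high",            # hypothetical composite alarm name
#       ["cpu-high", "latency-high"],      # hypothetical child alarm names
#   )
```

Keeping the rule builder separate from the API call makes it easy to check the expression before anything is sent to AWS.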
To show you how this new integration works, I will create an alarm that triggers the creation of an OpsItem when the alarm gets raised. To start, I head over to the CloudWatch Alarms console.
Raising the Alarm
I create a new Alarm by clicking on the Create alarm button in the console.
I click on the Select metric button, so I can select a metric for CloudWatch to monitor.
I select the instance that I want to monitor and the metric, which is CPUUtilization, and then click on the Select metric button.
In the Specify metric and conditions screen, I select a Threshold type of Static and configure things so that if the CPUUtilization goes above 75, the State will change to Alarm.
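If you prefer to script this step, the same static threshold can be expressed as parameters for the boto3 `put_metric_alarm` call. This is a sketch under assumptions: the alarm name, the `Average` statistic, the 300-second period, and the single evaluation period are my choices for illustration and are not stated in the walkthrough.

```python
def cpu_alarm_params(instance_id, threshold=75.0, alarm_name="demo-cpu-high"):
    """Return keyword arguments for CloudWatch put_metric_alarm.

    The statistic, period, and evaluation periods below are assumed
    values; adjust them to match what you configure in the console.
    """
    return {
        "AlarmName": alarm_name,                      # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",                       # assumption
        "Period": 300,                                # seconds, assumption
        "EvaluationPeriods": 1,                       # assumption
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold", # static "greater than" check
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **cpu_alarm_params("i-0123456789abcdef0")    # placeholder instance ID
#   )
```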
Creating an OpsItem
Now I will configure the actions for the alarm. I click the Remove button in the notification section; this deletes the default action. I then scroll down to the Systems Manager OpsCenter action section and press the button called Add Systems Manager OpsCenter action.
I select Medium as the severity for the OpsItem. Even though the category is optional, I choose Performance. You might notice that, unlike notifications, the integration only triggers when the alarm is in the Alarm state; you cannot create an OpsItem for the OK or Insufficient data states. I click Next to create the action.
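Outside the console, an OpsCenter action is attached to an alarm as an ARN in the alarm's action list. The sketch below builds such an ARN from a severity and an optional category. The ARN layout and the numeric severity mapping reflect my reading of the CloudWatch and OpsCenter documentation, so treat them as assumptions and verify against the current docs; the region and account ID in the usage comment are placeholders.

```python
# Assumed mapping of console severity labels to OpsItem severity numbers.
SEVERITY = {"Critical": 1, "High": 2, "Medium": 3, "Low": 4}

# Assumed set of categories accepted for the optional CATEGORY suffix.
CATEGORIES = {"Availability", "Cost", "Performance", "Recovery", "Security"}


def opsitem_action_arn(region, account_id, severity="Medium", category=None):
    """Build an alarm action ARN that asks OpsCenter to create an OpsItem."""
    if severity not in SEVERITY:
        raise ValueError("unknown severity: {!r}".format(severity))
    arn = "arn:aws:ssm:{}:{}:opsitem:{}".format(region, account_id, SEVERITY[severity])
    if category is not None:
        if category not in CATEGORIES:
            raise ValueError("unknown category: {!r}".format(category))
        arn += "#CATEGORY={}".format(category)
    return arn


# Hypothetical usage: pass the ARN in AlarmActions when creating the alarm.
#   action = opsitem_action_arn("us-east-1", "123456789012", "Medium", "Performance")
#   cloudwatch.put_metric_alarm(..., AlarmActions=[action])
```

Because the action lives in `AlarmActions`, it only fires on the transition into the Alarm state, which matches the console behavior described above.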
Lastly, I give this alarm a name and a description.
In the next screen, I review all the alarm settings. I am happy with what I have set up, and so I click the Create button.
The alarm is now active, and the system is monitoring the chosen metric.
For this demo, I now run a CPU stress test on my EC2 instance; I expect to max out the CPU and trigger my newly created alarm.
After a few minutes, I check the CloudWatch Alarm console and confirm that my Alarm is now in the Alarm state.
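The same check can be made from a script instead of the console. Here is a small sketch that reads the alarm state through the boto3 `describe_alarms` call; the helper name and the alarm name in the usage comment are hypothetical.

```python
def alarm_state(cloudwatch, alarm_name):
    """Return the current state of a metric alarm: OK, ALARM, or INSUFFICIENT_DATA.

    `cloudwatch` is expected to be a boto3 CloudWatch client.
    """
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    if not alarms:
        raise LookupError("no metric alarm named {!r}".format(alarm_name))
    return alarms[0]["StateValue"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   print(alarm_state(boto3.client("cloudwatch"), "demo-cpu-high"))
```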
Viewing the OpsItem
The new integration will trigger the creation of an OpsItem, so when I go to my Systems Manager OpsCenter console, I see a newly created OpsItem.
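To confirm the OpsItem was created without opening the console, you could list open OpsItems with the Systems Manager `describe_ops_items` call. This is a minimal sketch: the helper name is hypothetical, and it reads only the first page of results, so a busy account would need pagination.

```python
def open_ops_items(ssm):
    """Return (OpsItemId, Title) pairs for OpsItems whose status is Open.

    `ssm` is expected to be a boto3 Systems Manager client.
    Only the first page of results is read in this sketch.
    """
    response = ssm.describe_ops_items(
        OpsItemFilters=[{"Key": "Status", "Values": ["Open"], "Operator": "Equal"}]
    )
    return [
        (item["OpsItemId"], item.get("Title", ""))
        for item in response.get("OpsItemSummaries", [])
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   for ops_item_id, title in open_ops_items(boto3.client("ssm")):
#       print(ops_item_id, title)
```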
I drill into the OpsItem to see the details. I can view information about the CPU utilization when the alarm triggered, suggested runbooks to resolve the issue, and the related resources.
All of the important information required to resolve the issue is located in the OpsItem. For example, if I click on the Resource ARN for the alarm in the Related resources section, I get to see relevant alarm information, including a graph of the CPUUtilization metric, without leaving OpsCenter.
Similarly, if I click on the Resource ARN for the EC2 instance, relevant information about that resource is displayed to me without leaving OpsCenter.
In the runbooks section I am provided with a list of suggested runbooks that may resolve the issue automatically. In the real world, I might have some custom runbooks to resolve common issues in my system, but I’m going to perform that age-old IT trick of turning it off and on again by running the AWS-RestartEC2Instance runbook directly from the OpsItem.
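In the demo the runbook is started from the OpsItem itself, but a runbook can also be started through the Systems Manager automation API. The sketch below assumes the `AWS-RestartEC2Instance` document accepts an `InstanceId` parameter as a list of strings; the helper name and the instance ID in the usage comment are placeholders.

```python
def restart_instance_via_runbook(ssm, instance_id):
    """Start the AWS-RestartEC2Instance runbook and return the execution ID.

    `ssm` is expected to be a boto3 Systems Manager client. Automation
    parameters are passed as lists of strings.
    """
    response = ssm.start_automation_execution(
        DocumentName="AWS-RestartEC2Instance",
        Parameters={"InstanceId": [instance_id]},
    )
    return response["AutomationExecutionId"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   execution_id = restart_instance_via_runbook(
#       boto3.client("ssm"), "i-0123456789abcdef0"   # placeholder instance ID
#   )
```

Note that starting the runbook this way does not by itself link the execution to the OpsItem, as running it from the OpsItem page does.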
Hopefully, this demo has shown that this new integration can make engineers more productive by ensuring that issues get raised quickly and that the critical investigation data is available in one place.
Good to Know
The Systems Manager OpsCenter action works in parallel with existing notifications, so you do not have to choose one or the other. You can continue sending notifications via SNS, for example, which lets you keep using your existing support mechanisms.
OpsCenter deduplicates the alarm events. This avoids a “flapping issue” where an alarm going in and out of the Alarm state could potentially create multiple OpsItems.
This new integration between AWS Systems Manager OpsCenter and Amazon CloudWatch Alarms is available in all regions where Systems Manager is offered. To get started, head over to the CloudWatch Alarms section of the AWS Management Console and attach your first Systems Manager OpsCenter action. You can also check out the documentation for more specific details on how the integration works.
Happy Alarming
Martin