AWS Cloud Operations & Migrations Blog

Tag: Incident Manager

Why you should develop a correction of error (COE)

Application reliability is critical. Service interruptions degrade the customer experience, reducing customer trust and business value. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. This lets us analyze a system after an incident in order to avoid recurrences. These […]

Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager

Many of our customers need an effective incident management and response solution to achieve operational excellence and performance efficiency. Transparency between those who are affected by the incident and those who respond to the incident is key to any incident management process. Finding the right team to mitigate the impact of application or workload incidents […]

AWS Systems Manager Incident Manager integration with Amazon CloudWatch Part 2

This is the second post in a two-part series about AWS Systems Manager Incident Manager. In the first post, we covered onboarding steps like creating contacts, an escalation plan, and a response plan in Incident Manager. In this post, we discuss the integration between Incident Manager and Amazon CloudWatch and how Incident Manager components manage an […]
