AWS Cloud Operations & Migrations Blog
Tag: Incident Manager
Why you should develop a correction of error (COE)
Application reliability is critical. Service interruptions result in a negative customer experience, thereby reducing customer trust and business value. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. This lets us analyze a system after an incident in order to avoid recurrences. These […]
Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager
Many of our customers need an effective incident management and response solution to achieve operational excellence and performance efficiency. Transparency between those who are affected by the incident and those who respond to the incident is key to any incident management process. Finding the right team to mitigate the impact of application or workload incidents […]
AWS Systems Manager Incident Manager integration with Amazon CloudWatch
This is the second post in a two-part series about AWS Systems Manager Incident Manager. In the first post, we covered onboarding steps like creating contacts, an escalation plan, and a response plan in Incident Manager. In this post, we discuss the integration between Incident Manager and Amazon CloudWatch and how Incident Manager components manage an […]