AWS Cloud Operations & Migrations Blog

Tag: Incident Manager

How to Automate Incident Response with PagerDuty and AWS Systems Manager Incident Manager

Incident response is a core operations capability for organizations to develop, and a core element in the AWS Cloud Adoption Framework (AWS CAF). Responding to operations incidents quickly is important to minimize their impacts. Automating incident response helps you scale your capabilities, rapidly reduce the recovery time, and reduce repetitive work by your cloud operations teams. […]

Automate AIOps for your microservices in AWS using Amazon DevOps Guru and AWS Systems Manager Incident Manager

Artificial intelligence operations (AIOps) is the process of using machine learning techniques to solve operational problems. The goal of AIOps is to reduce human intervention in IT operations processes. By using advanced machine learning techniques, you can reduce operational incidents and increase service quality, and AIOps can help you predict incidents before they happen. Amazon […]

Why you should develop a correction of error (COE)

Application reliability is critical. Service interruptions result in a negative customer experience, thereby reducing customer trust and business value. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. This lets us analyze a system after an incident in order to avoid recurrences in the future. These […]

Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager

Many of our customers need an effective incident management and response solution to achieve operational excellence and performance efficiency. Transparency between those who are affected by the incident and those who respond to the incident is key to any incident management process. Finding the right team to mitigate the impact of application or workload incidents […]

AWS Systems Manager Incident Manager integration with Amazon CloudWatch Part 2

This is the second post in a two-part series about AWS Systems Manager Incident Manager. In the first post, we covered onboarding steps like creating contacts, an escalation plan, and a response plan in Incident Manager. In this post, we discuss the integration between Incident Manager and Amazon CloudWatch and how Incident Manager components manage an […]