AWS Cloud Operations Blog
Automate incident reports from AWS Systems Manager Incident Manager
Effective incident management is essential for maintaining system reliability and ensuring quick responses to unexpected incidents. Incident Manager, a capability of AWS Systems Manager, helps you mitigate and recover from these incidents by enabling automated responses. In a previous blog post on Incident Manager, we covered setting up escalation mechanisms, creating response plans, and automating remediation with AWS Systems Manager Automation runbooks. To learn more, see Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager.
As part of the incident lifecycle, organizations need to gather historical data about their incidents so they can look back and perform post-incident analysis. Currently, customers can gather and export information about these incidents programmatically through the Incident Manager APIs using AWS SDKs. Customers often ask us how to automate this report generation and export the reports regularly to a centralized location.
In this blog post, you will learn how to generate Incident Manager reports on a schedule and store them in Amazon Simple Storage Service (Amazon S3). These reports help you review and sort incidents by criteria such as which resources are involved most often, which problems recur, and how long incidents take to resolve. They provide valuable insights for operations engineers, system administrators, and IT managers. Digging into past data helps you find anomalies in your application or infrastructure.
Prerequisites
For this walkthrough, you need the following prerequisites:
- An AWS account
- Incident Manager set up for a single account, or for multiple accounts and multiple Regions
Workflow
This solution is enabled by the following services:
- AWS CloudFormation
- Amazon S3
- AWS Identity and Access Management (IAM)
- Automation, a capability of AWS Systems Manager
- State Manager, a capability of AWS Systems Manager
- Incident Manager, a capability of AWS Systems Manager
The CloudFormation template is hosted in a GitHub repository. The template deploys the components required by the solution, including an S3 bucket, an Automation runbook, a State Manager association, and the necessary IAM permissions. Automation calls the Incident Manager APIs to extract the required information and uploads it to the S3 bucket as a CSV file.
Every time you run the solution, a new report is generated. Optionally, if you choose to run the solution on a schedule, the CloudFormation template creates a State Manager association that runs on the schedule you specify as a parameter when creating the CloudFormation stack.
The Automation runbook includes the aws:executeScript action, which runs a Python script to generate and store the reports. The script invokes the ListIncidentRecords API to fetch all incidents in the current Region, then iterates through the incidents and calls the ListRelatedItems API to fetch the related items for each one. The collected incidents are saved in a CSV file named in the format IMReport-{Current Date}-{Time in UTC}, which is then uploaded to an S3 bucket for download. You can further customize this solution to send Amazon SNS notifications each time a report is uploaded to S3. To learn more, see Amazon S3 Event Notifications.
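The flow described above can be sketched in Python roughly as follows. This is a minimal illustration, not the script shipped in the runbook: the CSV column names, the exact timestamp format in the file name, and all function names are assumptions made for this example. Only the ListIncidentRecords and ListRelatedItems calls and the IMReport- prefix come from the description above.

```python
import csv
import io
from datetime import datetime, timezone

# Column names are illustrative; the real report may use different ones.
COLUMNS = ["arn", "title", "status", "impact", "creationTime", "relatedItems"]


def report_name(now):
    """Build a name like IMReport-<date>-<UTC time>.csv (exact format assumed)."""
    return now.astimezone(timezone.utc).strftime("IMReport-%Y-%m-%d-%H-%M-%S.csv")


def to_csv(rows):
    """Serialize a list of dicts to CSV text, ignoring keys outside COLUMNS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def collect_incidents(client):
    """Return one row per incident, with related item titles joined by ';'."""
    rows = []
    for page in client.get_paginator("list_incident_records").paginate():
        for summary in page["incidentRecordSummaries"]:
            titles = []
            related = client.get_paginator("list_related_items")
            for item_page in related.paginate(incidentRecordArn=summary["arn"]):
                titles += [item.get("title", "") for item in item_page["relatedItems"]]
            rows.append({**summary, "relatedItems": ";".join(titles)})
    return rows


def generate_report(bucket):
    """Fetch incidents in the current Region and upload the CSV to S3."""
    import boto3  # imported here so the helpers above work without the SDK

    rows = collect_incidents(boto3.client("ssm-incidents"))
    key = report_name(datetime.now(timezone.utc))
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=to_csv(rows).encode("utf-8")
    )
    return key
```

Calling generate_report with the name of your bucket would return the key of the uploaded object. Using paginators keeps the sketch correct when there are more incidents or related items than fit in one API response.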
Solution Walkthrough
Whether you use Incident Manager in a single account or across multiple accounts and Regions, incidents are replicated across all the accounts and replication set Regions. You can therefore deploy the CloudFormation template in any one account and Region of your choice to pull the incidents and their related items.
Deploy the Solution
- Download the CloudFormation template
- Navigate to the CloudFormation console in the AWS account where you would like to generate the reports.
- Choose Create stack, then choose With new resources (standard).
- For Template source, choose Upload a template file. Select Choose file and select the template you downloaded in the first step.
- Choose Next.
- For Stack name, enter a stack name (such as incident-manager-reporting).
- In the Parameters area, do the following:
- For ‘RunOnSchedule’, select true to generate reports on a schedule, or select false to run the reports manually. If you select false, you must run the Automation runbook manually, as described in the Generating OnDemand reports section, after deploying this CloudFormation template.
- For ‘AssociationSchedule’ (needed only when RunOnSchedule is set to true), provide a cron expression. By default, the automation runs every Sunday at 02:00 AM UTC. To learn more about supported cron expressions, see State Manager.
- On the Configure stack options page, choose Next.
- Select I acknowledge that AWS CloudFormation might create IAM resources with custom names, then choose Submit.
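If you prefer to script the deployment, the console steps above map onto a CloudFormation CreateStack call. The following is a minimal sketch under a few assumptions: the template path is whatever local file you saved the download as, the helper names are made up for this example, and the cron expression shown is our rendering of "every Sunday at 02:00 AM UTC", which you should check against the template's actual default.

```python
def stack_parameters(run_on_schedule, schedule=None):
    """Build the CloudFormation Parameters list for the template."""
    params = [
        {
            "ParameterKey": "RunOnSchedule",
            "ParameterValue": "true" if run_on_schedule else "false",
        }
    ]
    # AssociationSchedule is only needed when reports run on a schedule.
    if run_on_schedule and schedule:
        params.append(
            {"ParameterKey": "AssociationSchedule", "ParameterValue": schedule}
        )
    return params


def deploy(template_path, stack_name="incident-manager-reporting"):
    """Create the stack from a local template file and wait for completion."""
    import boto3  # imported here so stack_parameters works without the SDK

    cfn = boto3.client("cloudformation")
    with open(template_path) as template:
        body = template.read()
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=body,
        # Assumed equivalent of "every Sunday at 02:00 AM UTC".
        Parameters=stack_parameters(True, "cron(0 2 ? * SUN *)"),
        # Matches the console acknowledgement about IAM resources with custom names.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

Passing stack_parameters(False) instead would deploy the runbook without the State Manager association, matching the manual option described in the parameter list.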
After the stack is deployed, choose Outputs and note the following values, as shown in Figure 2:
- AWSSystemsManagerAutomationExecutionRole
- S3Bucket
- SystemsManagerAutomationDocument
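The same output values can also be read programmatically. The sketch below assumes the stack was named incident-manager-reporting, as suggested earlier; the function names are hypothetical.

```python
def outputs_to_dict(stack):
    """Turn a stack description's Outputs list into a key/value dict."""
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def get_stack_outputs(stack_name="incident-manager-reporting"):
    """Look up the stack and return its outputs as a dict."""
    import boto3  # imported here so outputs_to_dict works without the SDK

    response = boto3.client("cloudformation").describe_stacks(StackName=stack_name)
    return outputs_to_dict(response["Stacks"][0])
```

For example, get_stack_outputs()["S3Bucket"] would give the bucket name to use when downloading reports.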
Generating OnDemand reports
Note: If you set the `RunOnSchedule` parameter to true while deploying the CloudFormation template, you can skip this section and go to the Downloading reports from S3 Console section.
- Navigate to Systems Manager Automation.
- Select Execute Automation.
- Choose Owned by me, then select ‘incident-manager-reporting’. Choose Next.
- Select Simple execution.
- In the Input Parameters area, for AutomationAssumeRole, enter the value of the AWSSystemsManagerAutomationExecutionRole output from the CloudFormation stack, as shown in Figure 4. For Amazon S3 bucket, enter the value of the S3Bucket output from the CloudFormation stack (this is the default value).
- Choose Execute.
The Automation runbook will now collect information about all the incidents from your accounts. Select the Step ID under Executed Steps and note the Amazon S3 bucket and CSV file name in the OutputPayload data as shown in Figure 5.
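An on-demand run can also be started through the Systems Manager API. The sketch below is illustrative only: AutomationAssumeRole is the parameter named in the steps above, but "S3BucketName" is a hypothetical name for the bucket parameter, so check the runbook for the real one. The polling loop treats only Pending and InProgress as non-final states, which is a simplification.

```python
import time


def automation_parameters(role_arn, bucket, bucket_param="S3BucketName"):
    """Build the Parameters argument; Automation expects lists of strings.

    "S3BucketName" is a hypothetical parameter name; check the runbook.
    """
    return {"AutomationAssumeRole": [role_arn], bucket_param: [bucket]}


def step_payloads(execution):
    """Map each step name to its raw OutputPayload values, when present."""
    payloads = {}
    for step in execution.get("StepExecutions", []):
        payload = step.get("Outputs", {}).get("OutputPayload")
        if payload:
            payloads[step["StepName"]] = payload
    return payloads


def run_report(document_name, role_arn, bucket, poll_seconds=5):
    """Start the runbook, wait for it to finish, and return its outputs."""
    import boto3  # imported here so the helpers above work without the SDK

    ssm = boto3.client("ssm")
    execution_id = ssm.start_automation_execution(
        DocumentName=document_name,
        Parameters=automation_parameters(role_arn, bucket),
    )["AutomationExecutionId"]
    while True:
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        # Simplification: other intermediate states are not handled here.
        if status not in ("Pending", "InProgress"):
            return status, step_payloads(execution)
        time.sleep(poll_seconds)
```

The returned payloads correspond to the OutputPayload data shown in the console, where the bucket and CSV file name appear.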
Downloading reports from S3 Console
Navigate to the S3 console, find the bucket, and download the IM Report CSV. The name includes the date and time (in UTC) the report was generated as shown in Figure 6.
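To fetch the most recent report without the console, something like the following could work. It is a sketch that assumes report objects keep the IMReport- prefix described earlier and that the newest report is the one with the latest LastModified timestamp; the function names are hypothetical.

```python
def latest_report_key(objects, prefix="IMReport-"):
    """Return the key of the most recently modified report, or None."""
    reports = [o for o in objects if o["Key"].startswith(prefix)]
    if not reports:
        return None
    return max(reports, key=lambda o: o["LastModified"])["Key"]


def download_latest(bucket, destination):
    """Download the newest report from the bucket to a local path."""
    import boto3  # imported here so latest_report_key works without the SDK

    s3 = boto3.client("s3")
    objects = []
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix="IMReport-"
    )
    for page in pages:
        objects += page.get("Contents", [])  # Contents is absent when empty
    key = latest_report_key(objects)
    if key is not None:
        s3.download_file(bucket, key, destination)
    return key
```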
Clean up
To clean up the resources created by CloudFormation:
- Delete the objects generated in the S3 bucket manually
- Open the Amazon S3 console and find the S3 bucket created by CloudFormation.
- Select all the objects and choose Delete.
- To confirm deletion, type permanently delete in the text input field.
- Choose Delete objects.
- Delete the CloudFormation Stack
- Open the AWS CloudFormation console and in the navigation pane, choose Stacks.
- Choose the CloudFormation stack that you created earlier, choose Delete, and choose Delete stack.
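The clean-up can be scripted along the same lines. This is a sketch, assuming the bucket does not have versioning enabled (a versioned bucket would also need its object versions deleted) and that the stack uses the name suggested earlier; the function names are hypothetical.

```python
def clean_up(bucket, cfn, stack_name):
    """Empty the bucket, then delete the stack and wait for completion.

    `bucket` is a boto3 S3 Bucket resource and `cfn` a CloudFormation client.
    """
    # The bucket must be empty before CloudFormation can delete it.
    bucket.objects.all().delete()
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)


def main(bucket_name, stack_name="incident-manager-reporting"):
    """Wire real boto3 objects into clean_up."""
    import boto3  # imported here so clean_up can be exercised with stand-ins

    clean_up(
        boto3.resource("s3").Bucket(bucket_name),
        boto3.client("cloudformation"),
        stack_name,
    )
```

Passing the bucket and client in as arguments keeps the ordering logic separate from the AWS calls, so it can be checked with stand-in objects before running it against a real account.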
Conclusion
In this blog post, we described a solution that uses the Incident Manager APIs, AWS Systems Manager Automation, and a State Manager association to create a reporting mechanism that stores all incidents and their related items on a regular schedule. This solution shows you how to:
- Automate report generation: No more manual work – streamline the process and store reports securely in Amazon S3.
- Uncover valuable trends: Identify frequent issues, vulnerable resources, and resolution times.
- Boost efficiency and security: Proactively address bottlenecks, prevent recurring problems, and optimize your cloud environment for ultimate resilience.