AWS Cloud Operations & Migrations Blog

Automate incident reports from AWS Systems Manager Incident Manager

Effective incident management is essential for maintaining system reliability and responding quickly to unexpected incidents. Incident Manager, a capability of AWS Systems Manager, helps you mitigate and recover from these incidents by enabling automated responses. In a previous blog post about Incident Manager, we discussed setting up escalation mechanisms, creating response plans, and automating remediation with AWS Systems Manager Automation runbooks. To learn more, see Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager.

As part of the incident lifecycle, organizations need to collect historical incident data so they can perform post-incident analysis. Today, customers can export information about their incidents programmatically by calling the Incident Manager APIs through the AWS SDKs. Customers often ask us how to automate this report generation and regularly export the reports to a centralized location.

In this post, you learn how to generate Incident Manager reports on a schedule and store them in Amazon Simple Storage Service (Amazon S3). These reports help you review and sort incidents by criteria such as the resources most often involved, recurring problems, and time to resolution. They provide valuable insights for operations engineers, system administrators, and IT managers. Analyzing historical data also helps you find anomalies in your applications or infrastructure.

Prerequisites

For this walkthrough, you need the following prerequisites:

  1. An AWS account
  2. Incident Manager set up in a single account, or across multiple accounts and Regions

Workflow

Figure 1 shows the services that this solution uses and how they interact.

The user uses CloudFormation to deploy the components required by the solution, including Amazon S3, Automation, State Manager, and IAM. Automation calls the Incident Manager APIs to extract the required information and uploads it to Amazon S3 as a CSV file.

Figure 1: Incident Manager report generation solution

The CloudFormation template is hosted in a GitHub repository. The template deploys the components used by the solution, including an S3 bucket, an Automation runbook, an optional State Manager association, and the necessary IAM permissions. Automation calls the Incident Manager APIs to extract the required information and uploads it to the S3 bucket as a CSV file.

Each time the solution runs, it generates a new report. If you choose to run the solution on a schedule, the CloudFormation template creates a State Manager association that runs on the schedule you specify as a stack parameter.

The Automation runbook includes an aws:executeScript action that runs a Python script to generate and store the reports. The script calls the ListIncidentRecords API to fetch all incidents in the current Region, then iterates through the incidents and calls the ListRelatedItems API to fetch each incident's related items. The collected data is saved to a CSV file named in the format IMReport-{Current Date}-{Time in UTC}, which is then uploaded to the S3 bucket for download. You can further customize this solution to send an Amazon SNS notification each time a report is uploaded to S3. To learn more, see Amazon S3 Event Notifications.
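The report-generation flow described above can be sketched in Python with boto3. This is a minimal sketch, not the script shipped in the template: the `script_handler` entry point, the `S3BucketName` input key, the CSV columns, and the exact date/time format in the file name are assumptions. The helpers take the clients as arguments, so the paging and CSV logic can be exercised without AWS access; boto3 is only needed when `script_handler` actually runs.

```python
import csv
import io
from datetime import datetime, timezone


def list_all_incidents(incidents_client):
    """Page through ListIncidentRecords and return every incident summary."""
    incidents, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = incidents_client.list_incident_records(**kwargs)
        incidents.extend(page.get("incidentRecordSummaries", []))
        token = page.get("nextToken")
        if not token:
            return incidents


def list_all_related_items(incidents_client, incident_arn):
    """Page through ListRelatedItems for a single incident."""
    items, token = [], None
    while True:
        kwargs = {"incidentRecordArn": incident_arn}
        if token:
            kwargs["nextToken"] = token
        page = incidents_client.list_related_items(**kwargs)
        items.extend(page.get("relatedItems", []))
        token = page.get("nextToken")
        if not token:
            return items


def build_report_csv(incidents_client):
    """Return the report as CSV text, one row per incident (columns are illustrative)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Title", "Status", "Impact", "CreationTime", "ResolvedTime", "RelatedItems"])
    for incident in list_all_incidents(incidents_client):
        related = list_all_related_items(incidents_client, incident["arn"])
        # Summarize each related item as "TYPE:title" in a single cell.
        related_text = "; ".join(
            f'{item["identifier"]["type"]}:{item.get("title", "")}' for item in related
        )
        writer.writerow([
            incident.get("title", ""),
            incident.get("status", ""),
            incident.get("impact", ""),
            incident.get("creationTime", ""),
            incident.get("resolvedTime", ""),  # empty for unresolved incidents
            related_text,
        ])
    return buffer.getvalue()


def report_key(now=None):
    """Hypothetical object key following the IMReport-{date}-{time in UTC} pattern."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("IMReport-%Y-%m-%d-%H-%M-%S.csv")


def generate_and_upload(incidents_client, s3_client, bucket, now=None):
    """Build the CSV report, upload it to the bucket, and return the object key."""
    key = report_key(now)
    body = build_report_csv(incidents_client).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


def script_handler(events, context):
    """Hypothetical aws:executeScript entry point; requires boto3 at run time."""
    import boto3  # imported lazily so the helpers above can be tested without boto3

    bucket = events["S3BucketName"]
    key = generate_and_upload(boto3.client("ssm-incidents"), boto3.client("s3"), bucket)
    return {"S3BucketName": bucket, "S3ObjectName": key}
```

Passing explicit `nextToken` values keeps the paging visible; boto3 paginators for these operations would work equally well.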

Solution Walkthrough

Whether you use Incident Manager in a single account or across multiple accounts and Regions, incidents are replicated across all the accounts and the Regions in your replication set. Therefore, you can deploy the CloudFormation template in whichever account and Region you prefer to pull the incidents and their related items.
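Before choosing where to deploy, you might want to confirm which Regions your replication set covers. The following sketch is an illustration that is not part of the solution. It uses the `ssm-incidents` ListReplicationSets and GetReplicationSet operations, and the helper names are hypothetical. The client is passed in so the logic can be tried without AWS access.

```python
def replication_set_regions(incidents_client):
    """Map each replication set ARN visible to this account to its sorted Region names."""
    regions_by_set, token = {}, None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = incidents_client.list_replication_sets(**kwargs)
        for arn in page.get("replicationSetArns", []):
            replication_set = incidents_client.get_replication_set(arn=arn)["replicationSet"]
            # regionMap is keyed by Region name, for example "us-east-1".
            regions_by_set[arn] = sorted(replication_set["regionMap"])
        token = page.get("nextToken")
        if not token:
            return regions_by_set


def main():
    """Print the Regions of each replication set; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above can be tested without boto3

    for arn, regions in replication_set_regions(boto3.client("ssm-incidents")).items():
        print(arn, ", ".join(regions))
```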

Deploy the Solution

  1. Download the CloudFormation template.
  2. Navigate to the CloudFormation console in the AWS account where you would like to generate the reports.
  3. Choose Create stack, then choose With new resources (standard).
  4. For Template source, choose Upload a template file. Choose file and select the template you downloaded in step 1.
  5. Choose Next.
  6. For Stack name, enter a stack name (such as incident-manager-reporting).
  7. In the Parameters area, do the following:
    1. For ‘RunOnSchedule’, select true to generate reports on a schedule, or false to generate them manually. If you select false, run the Automation runbook manually as described in the Generating OnDemand reports section after you deploy the CloudFormation template.
    2. For ‘AssociationSchedule’ (needed only when RunOnSchedule is set to true), provide a cron expression. By default, the automation runs every Sunday at 02:00 UTC. To learn more about supported cron expressions, see State Manager.
  8. On the Configure stack options page, choose Next.
  9. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names, then choose Submit.
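If you prefer to deploy from code instead of the console, the steps above correspond to the CloudFormation CreateStack API. The following boto3 sketch is one way to do it and is not part of the original solution. The helper names are hypothetical. It assumes the template's RunOnSchedule parameter accepts the strings `true` and `false`. `CAPABILITY_NAMED_IAM` is the API counterpart of the acknowledgement in step 9.

```python
from pathlib import Path


def build_create_stack_request(template_body, stack_name="incident-manager-reporting",
                               run_on_schedule=True, schedule=None):
    """Assemble CreateStack arguments for the reporting template."""
    parameters = [{"ParameterKey": "RunOnSchedule",
                   "ParameterValue": "true" if run_on_schedule else "false"}]
    # AssociationSchedule is sent only for scheduled runs with an explicit expression;
    # otherwise the template's default schedule applies.
    if run_on_schedule and schedule:
        parameters.append({"ParameterKey": "AssociationSchedule", "ParameterValue": schedule})
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": parameters,
        # Programmatic equivalent of acknowledging IAM resources with custom names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(cfn_client, template_path, **options):
    """Create the stack, wait until creation finishes, and return the stack ID."""
    request = build_create_stack_request(Path(template_path).read_text(), **options)
    stack_id = cfn_client.create_stack(**request)["StackId"]
    cfn_client.get_waiter("stack_create_complete").wait(StackName=request["StackName"])
    return stack_id
```

For example, `deploy(boto3.client("cloudformation"), "template.yaml", schedule="cron(0 2 ? * SUN *)")` would explicitly request a Sunday 02:00 UTC schedule. Here `template.yaml` stands in for whatever file you downloaded. Check the expression against the State Manager cron reference before relying on it.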

After the stack is deployed, choose Outputs and note the values of the following outputs, as shown in Figure 2:

  • AWSSystemsManagerAutomationExecutionRole
  • S3Bucket
  • SystemsManagerAutomationDocument
The CloudFormation stack outputs show the values of the Automation assume role, the S3Bucket, and the Automation runbook name

Figure 2: CloudFormation output resources
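You can also read these values programmatically. The following sketch uses the CloudFormation DescribeStacks API. The output keys come from the list above, the default stack name is the example name from step 6, and the helper names are hypothetical.

```python
REQUIRED_OUTPUTS = (
    "AWSSystemsManagerAutomationExecutionRole",
    "S3Bucket",
    "SystemsManagerAutomationDocument",
)


def stack_outputs(cfn_client, stack_name):
    """Return a stack's outputs as a {OutputKey: OutputValue} dictionary."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {output["OutputKey"]: output["OutputValue"] for output in stack.get("Outputs", [])}


def reporting_outputs(cfn_client, stack_name="incident-manager-reporting"):
    """Return only the three outputs this walkthrough uses; raise if any is missing."""
    outputs = stack_outputs(cfn_client, stack_name)
    missing = [key for key in REQUIRED_OUTPUTS if key not in outputs]
    if missing:
        raise KeyError(f"Stack {stack_name} is missing outputs: {missing}")
    return {key: outputs[key] for key in REQUIRED_OUTPUTS}
```

With boto3, you would call `reporting_outputs(boto3.client("cloudformation"))` in the account and Region where you deployed the stack.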

Generating OnDemand reports

Note: If you set the `RunOnSchedule` parameter to true when deploying the CloudFormation template, you can skip this section and go to the Downloading reports from S3 Console section.

    1. Navigate to Systems Manager Automation.
    2. Select Execute Automation.
    3. Choose Owned by me, then select ‘incident-manager-reporting’. Choose Next.

     

This image shows navigating to the Automation console and selecting the incident-manager-reporting document for execution

    Figure 3: Automation Runbook

     

    4. Select Simple execution.
    5. In the Input parameters area, for AutomationAssumeRole, enter the value of the AWSSystemsManagerAutomationExecutionRole output from the CloudFormation stack, as shown in Figure 4. For S3BucketName, enter the value of the S3Bucket output from the CloudFormation stack (this is the default value).

     

    Input parameter section displays an IAM Role field and S3BucketName where values from CloudFormation output resources are entered

    Figure 4: Input Automation Assume Role and S3 Bucket

     

    6. Choose Execute.

The Automation runbook now collects information about all the incidents from your accounts. When the execution completes, select the Step ID under Executed steps and note the Amazon S3 bucket and CSV file name in the OutputPayload data, as shown in Figure 5.

This image displays the output payload parameters of the automation execution. The output payload includes the S3 bucket name, the S3 object name, and the status of the automation.

    Figure 5: Systems Manager Automation execution output
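The same on-demand run can be started and monitored from code with the Systems Manager StartAutomationExecution and GetAutomationExecution APIs. This is a sketch under stated assumptions. The input names `AutomationAssumeRole` and `S3BucketName` are taken from the fields shown in Figure 4 and are assumed to match the runbook's parameters. The set of terminal statuses and the `run_report` helper are choices made for this illustration.

```python
import time

# Statuses after which an Automation execution is treated as finished in this sketch.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut", "Rejected",
                     "CompletedWithSuccess", "CompletedWithFailure", "Exited"}


def run_report(ssm_client, document_name, role_arn, bucket, poll_seconds=10):
    """Start the runbook, wait for it to finish, and return (status, step outputs)."""
    execution_id = ssm_client.start_automation_execution(
        DocumentName=document_name,
        # Automation parameter values are passed as lists of strings.
        Parameters={"AutomationAssumeRole": [role_arn], "S3BucketName": [bucket]},
    )["AutomationExecutionId"]
    while True:
        execution = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id)["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            # Map each step name to its outputs; the console's OutputPayload is expected here.
            steps = {step["StepName"]: step.get("Outputs", {})
                     for step in execution.get("StepExecutions", [])}
            return status, steps
        time.sleep(poll_seconds)
```

The document name, role ARN, and bucket are the SystemsManagerAutomationDocument, AWSSystemsManagerAutomationExecutionRole, and S3Bucket stack outputs. Pass `boto3.client("ssm")` as the client.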

Downloading reports from S3 Console

Navigate to the S3 console, find the bucket, and download the IMReport CSV file. The file name includes the date and time (in UTC) when the report was generated, as shown in Figure 6.

This image shows the CSV report file in Amazon S3 and the download option

    Figure 6: Download the IMReport CSV report file
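To fetch the newest report without using the console, you can list the bucket with ListObjectsV2 and download the most recently modified object. This sketch assumes that report objects sit at the top level of the bucket and that their keys start with `IMReport-`, following the naming format described earlier. The helper names are hypothetical.

```python
import os


def latest_report_key(s3_client, bucket, prefix="IMReport-"):
    """Return the key of the most recently modified report object, or None."""
    newest, token = None, None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest["LastModified"]:
                newest = obj
        if not page.get("IsTruncated"):
            return newest["Key"] if newest else None
        token = page["NextContinuationToken"]


def download_latest_report(s3_client, bucket, destination_dir="."):
    """Download the newest report into destination_dir and return its local path."""
    key = latest_report_key(s3_client, bucket)
    if key is None:
        return None
    local_path = os.path.join(destination_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path
```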

     

This image shows a sample Incident Manager report in CSV format and the columns it contains

    Figure 7: Example of Incident Manager report CSV file
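After you download a report, a few lines of Python can turn it into the trend metrics mentioned at the start of this post, such as incident counts and time to resolution. This is a hypothetical sketch. The column names `Status`, `CreationTime`, and `ResolvedTime`, and the assumption that timestamps parse with `datetime.fromisoformat`, are guesses. Compare them with the header row of your own report (Figure 7) and adjust as needed.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def summarize_report(csv_text):
    """Return (incident count per status, mean hours to resolve) from report CSV text."""
    by_status = Counter()
    durations_hours = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_status[row["Status"]] += 1
        # Only resolved incidents have a ResolvedTime value.
        if row.get("ResolvedTime"):
            created = datetime.fromisoformat(row["CreationTime"])
            resolved = datetime.fromisoformat(row["ResolvedTime"])
            durations_hours.append((resolved - created).total_seconds() / 3600)
    mean_hours = sum(durations_hours) / len(durations_hours) if durations_hours else None
    return dict(by_status), mean_hours
```

For example, you would pass the file contents as `summarize_report(open(path, encoding="utf-8").read())`.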

Clean up

    To clean up the resources created by CloudFormation:

    1. Manually delete the objects in the S3 bucket:
      1. Open the Amazon S3 console and find the S3 bucket created by CloudFormation.
      2. Select all the objects and choose Delete.
      3. To confirm deletion, type permanently delete in the text input field.
      4. Choose Delete objects.
    2. Delete the CloudFormation Stack
      1. Open the AWS CloudFormation console and in the navigation pane, choose Stacks.
      2. Choose the CloudFormation stack that you created earlier, choose Delete, and choose Delete stack.
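You can also script the same cleanup. This sketch empties the bucket with ListObjectsV2 and DeleteObjects, which accepts up to 1,000 keys per call. It then deletes the stack and waits for the deletion to finish. It assumes the bucket is not versioned. A versioned bucket would also require deleting its object versions before CloudFormation can remove it. The helper names and the default stack name are illustrative.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in an unversioned bucket; return how many were deleted."""
    deleted = 0
    while True:
        # Re-list from the start each time; keys deleted in the last batch no longer appear.
        page = s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1000)
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not objects:
            return deleted
        response = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": objects, "Quiet": True})
        if response.get("Errors"):
            raise RuntimeError(f"Could not delete some objects: {response['Errors']}")
        deleted += len(objects)


def clean_up(s3_client, cfn_client, bucket, stack_name="incident-manager-reporting"):
    """Empty the report bucket, then delete the stack and wait for deletion to finish."""
    deleted = empty_bucket(s3_client, bucket)
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    return deleted
```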

Conclusion

In this blog post, we described a solution that uses the Incident Manager APIs, AWS Systems Manager Automation, and a State Manager association to store all incidents and their related items on a regular schedule. With this solution, you can:

    • Automate report generation: No more manual work – streamline the process and store reports securely in Amazon S3.
    • Uncover valuable trends: Identify frequent issues, vulnerable resources, and resolution times.
    • Boost efficiency and security: Proactively address bottlenecks, prevent recurring problems, and improve the resilience of your cloud environment.

About the authors:

    Ali Alzand

    Ali is a Microsoft Specialist Solutions Architect at Amazon Web Services. Ali works with global customers, helping them migrate, modernize, and optimize their Microsoft workloads on AWS. He specializes in AWS Systems Manager, Amazon EC2 Windows, and PowerShell. Outside of work, Ali enjoys barbecuing, outdoor activities, and trying all kinds of food.

    Raviteja Sunkavalli

    Raviteja Sunkavalli is a Senior Cloud Support Engineer at Amazon Web Services. Ravi focuses on supporting global customers in cloud migration and streamlining their centralized operations. Ravi specializes in AWS Systems Manager and EC2 Windows services. Besides technology, Ravi enjoys playing cricket and trying new recipes.