AWS Cloud Operations & Migrations Blog

Use AWS Systems Manager Automation runbooks to resolve operational tasks

OpsCenter provides a central location where operations engineers and IT professionals can view, investigate, and resolve operational work items (OpsItems) related to AWS resources.

AWS Systems Manager Automation simplifies common maintenance and deployment tasks for Amazon Elastic Compute Cloud (Amazon EC2) instances and other AWS resources. You can use this capability to build automations to configure and manage instances and AWS resources. You can also create custom runbooks or use predefined runbooks maintained by AWS.

AWS Systems Manager Explorer is a customizable operations dashboard that reports information about your AWS resources. Explorer displays an aggregated view of operations data (OpsData) across your AWS accounts and Regions. Explorer provides context into how operational issues are distributed, how they trend over time, and how they vary by category.

In the first post in this series, Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter, we showed you how to:

  • Set up Explorer and OpsCenter with Systems Manager Quick Setup.
  • Create OpsItems from an Amazon CloudWatch alarm.
  • Create OpsItems manually through OpsCenter.

In this blog post, we show you how to use Systems Manager Automation documents (runbooks) to resolve your operational tasks from OpsItems.

The following diagram shows the architecture of the solution.

 

Diagram shows the interaction between OpsCenter, OpsItems, Automation workflows, remediation, and the Systems Manager Explorer dashboard.

Figure 1: Automate operational work items with AWS Systems Manager Explorer

Solution overview

In this post, we’ll show you how to perform the following steps:

  • Set up a service role in AWS Identity and Access Management (IAM) that Automation runbooks can use to remediate your OpsItems.
  • Configure Automation runbooks to remediate and resolve OpsItems.

Prerequisites

Complete the steps in the first blog post, Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter.

After you complete those steps, you will have three OpsItems, as shown here. Two were created manually. One was created automatically through a CloudWatch alarm.

OpsCenter dashboard shows there are 3 open OpsItems. 0 of these OpsItems are in progress. The sources for the OpsItems are CloudWatch Alarm, RDS, and EC2.

Figure 2: Open OpsItems in OpsCenter dashboard

Set up Automation

AWS provides a library of Automation documents that you can choose for a variety of operational tasks. You can build, run, and share automations with others on your team or inside your organization.

Figure 3 shows the Automation document categories for the automation of your operational tasks.

Choose document page displays tabs for Owned by Amazon (in this example, selected), Owned by me, Shared with me, and All documents. The document categories include Remediation, Patching, Security, Instance management, Data backup, and more.

Figure 3: Automation documents

If your IAM user account, group, or role is assigned administrator permissions, then you have access to Systems Manager Automation. If you don’t have administrator permissions, then an administrator must give you permission by assigning the AmazonSSMFullAccess managed policy, or a policy that provides comparable permissions, to your IAM user account, group, or role. The AmazonSSMFullAccess policy grants permissions to Systems Manager actions, but some runbooks require permissions to other services. For example, the AWS-ReleaseElasticIP runbook requires IAM permissions for ec2:ReleaseAddress. Review the actions taken in a runbook to ensure that your IAM user account, group, or role is assigned the permissions required to perform those actions.
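To make the permissions point concrete, here is a minimal sketch of an IAM policy document that grants the extra action named above. The helper name build_runbook_policy is hypothetical, and the wildcard resource is an assumption made to keep the example short. A real runbook may need more actions than the one mentioned in this post, so treat this as a starting point rather than a complete policy.

```python
import json


def build_runbook_policy(actions):
    """Return an IAM policy document that allows the given actions.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Assumption: "*" keeps the sketch short. Scope this down
                # to specific resources wherever the action supports it.
                "Resource": "*",
            }
        ],
    }


# The action that the post says AWS-ReleaseElasticIP requires.
policy = build_runbook_policy(["ec2:ReleaseAddress"])
print(json.dumps(policy, indent=2))
```

You could attach the printed JSON as an inline policy on the user, group, or role that runs the runbook.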

Automations can be initiated under the context of a service role (or assume role). This allows the service to perform actions on your behalf. If you do not specify an assume role, Automation uses the context of the user who invoked the automation. For information about creating a service role, see Use AWS CloudFormation to configure a service role for Automation or Use IAM to configure roles for Automation in the AWS Systems Manager User Guide.
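For a service role to be usable this way, its trust policy must allow the Systems Manager service to assume it. The following is a minimal sketch of such a trust policy, built as a Python dictionary. The function name is hypothetical, and the template used in this post may define additional statements or conditions that are not shown here.

```python
import json


def build_automation_trust_policy():
    """Return a trust policy that lets Systems Manager assume the role.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The Systems Manager service principal.
                "Principal": {"Service": "ssm.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = build_automation_trust_policy()
print(json.dumps(trust_policy, indent=2))
```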

In this post, we use an AWS CloudFormation template to set up an Automation service role.

  1. Download AWS-SystemsManager-AutomationServiceRole.zip. The archive includes the AWS-SystemsManager-AutomationServiceRole.yaml CloudFormation template file.
  2. Sign in to the AWS CloudFormation console and choose Create Stack.
  3. In Specify template, choose Upload a template file.
  4. Choose AWS-SystemsManager-AutomationServiceRole.yaml and then choose Next.

Under Specify template, the Upload a template file option is selected. Under Upload a template file, the AWS-SystemsManager-AutomationServiceRole.yaml file is displayed.

Figure 4: Creating the Automation service role

  1. For the stack name, enter automation-role.
  2. In Configure stack options, leave the defaults, and then choose Next.
  3. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then create the CloudFormation stack.

The Resources tab of the automation-role page is selected. It displays a logical ID (in this example, AutomationServiceRole), physical ID (AutomationServiceRole), type (AWS::IAM::Role), and status (CREATE_COMPLETE).

Figure 5: Automation service role
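The same stack can also be created programmatically. The sketch below assumes the boto3 library and configured AWS credentials, and the function names are hypothetical. The CAPABILITY_NAMED_IAM capability is the API counterpart of the console acknowledgement about IAM resources with custom names.

```python
def build_create_stack_request(template_body, stack_name="automation-role"):
    """Build keyword arguments for the CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Required because the template creates an IAM role with a custom name.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def create_automation_role_stack(template_path):
    """Read the template file and create the stack. Returns the stack ID."""
    import boto3  # Imported here so the request builder works without boto3.

    with open(template_path, encoding="utf-8") as handle:
        request = build_create_stack_request(handle.read())
    return boto3.client("cloudformation").create_stack(**request)["StackId"]


# Usage (requires boto3 and AWS credentials):
#   create_automation_role_stack("AWS-SystemsManager-AutomationServiceRole.yaml")
```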

  1. To get the ARN for the Automation service role, choose AutomationServiceRole. Your ARN will be similar to arn:aws:iam::<AccountID>:role/AutomationServiceRole, where AccountID is your AWS account ID.
  2. Some runbooks require permissions to other services. In Figure 6, we add EC2, S3, and DynamoDB full access inline IAM policies to the Automation service role. Full access policies can be acceptable when an administrator performs operational tasks using Automation, but always review the actions in each Automation document and grant the service role only the permissions it requires.

The Summary page for AutomationServiceRole includes tabs for Permissions, Trust relationships, Tags, Access Advisor, and Revoke sessions. The Permissions tab is selected and there are five AWS managed policies in the list.

Figure 6: Automation service role inline IAM policies

You can now use the Automation service role ARN in your runbooks. For information about creating your own Automation runbooks, see the New Automation Features in AWS Systems Manager blog post.
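If you script your automations, you can derive the role ARN from your account ID instead of copying it from the console. This is a small sketch that assumes the role name created by the template in this post and the standard aws partition. The account ID shown is a placeholder, and the function name is hypothetical.

```python
import re


def automation_role_arn(account_id, role_name="AutomationServiceRole"):
    """Return the ARN of the Automation service role in the given account."""
    # AWS account IDs are 12-digit strings; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    # Assumption: the standard "aws" partition.
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# 123456789012 is a placeholder account ID.
print(automation_role_arn("123456789012"))
```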

Depending on your use case, you can run automations by using different security models or target and rate controls. You can run automation with approvers or you can run a manual automation.

In this post, we will show you how to remediate OpsItems through Automation runbooks.

Remediate OpsItems with Automation documents

  1. From the left navigation pane in the AWS Systems Manager console, choose OpsCenter.

OpsCenter page displays three open OpsItems. One has a source of RDS. One has a source of EC2. One has a source of CloudWatch Alarm.

Figure 7: Open OpsItems
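You can retrieve the same list of open OpsItems through the Systems Manager DescribeOpsItems API. The sketch below assumes a boto3 SSM client is passed in; the function name is hypothetical. It follows the NextToken value so that results spread across several pages are all returned.

```python
def list_open_ops_items(ssm_client):
    """Return summaries of all OpsItems whose status is Open."""
    request = {
        "OpsItemFilters": [
            {"Key": "Status", "Values": ["Open"], "Operator": "Equal"}
        ]
    }
    summaries = []
    while True:
        response = ssm_client.describe_ops_items(**request)
        summaries.extend(response.get("OpsItemSummaries", []))
        token = response.get("NextToken")
        if not token:
            return summaries
        # Ask for the next page on the following call.
        request["NextToken"] = token


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for item in list_open_ops_items(boto3.client("ssm")):
#       print(item["OpsItemId"], item.get("Source"), item.get("Title"))
```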

  1. Choose the OpsItem ID for the CloudWatch alarm to initiate the remediation through Automation runbooks.
  2. On the details page for the alarm OpsItem, choose the AWS-ResizeInstance runbook, and then choose Execute.

The alarm OpsItem details page displays sections for Related resources, Automation executions in the last 30 days, and Runbooks. The AWS-ResizeInstance runbook appears in the Runbooks list.

Figure 8: CloudWatch alarm OpsItem details page

  1. To resolve the high CPU issue, change the instance type from t2.micro to t2.large. You can choose another size, as appropriate for your workload.
  2. In Run automation: AWS-ResizeInstance, enter the following values, and then choose Execute.
    • For InstanceId, enter your web-1 EC2 instance ID.
    • For InstanceType, choose t2.large.
    • For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as a part of Automation setup.

Run automation: AWS-ResizeInstance displays fields for instance ID, instance type, and AutomationAssumeRole.

Figure 9: Run automation: AWS-ResizeInstance
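The console form maps onto the Systems Manager StartAutomationExecution API. The following is a minimal sketch, assuming a boto3 SSM client; the function names are hypothetical, and the instance ID and role ARN in the usage comment are placeholders. Each parameter value is passed as a list of strings, which is the shape the API expects.

```python
def build_resize_request(instance_id, instance_type, assume_role_arn):
    """Build keyword arguments to start the AWS-ResizeInstance runbook."""
    return {
        "DocumentName": "AWS-ResizeInstance",
        "Parameters": {
            # Automation parameters are lists of strings.
            "InstanceId": [instance_id],
            "InstanceType": [instance_type],
            "AutomationAssumeRole": [assume_role_arn],
        },
    }


def start_resize(ssm_client, instance_id, instance_type, assume_role_arn):
    """Start the runbook and return the automation execution ID."""
    request = build_resize_request(instance_id, instance_type, assume_role_arn)
    response = ssm_client.start_automation_execution(**request)
    return response["AutomationExecutionId"]


# Usage (requires boto3 and AWS credentials; the IDs are placeholders):
#   import boto3
#   start_resize(
#       boto3.client("ssm"),
#       "i-0123456789abcdef0",
#       "t2.large",
#       "arn:aws:iam::123456789012:role/AutomationServiceRole",
#   )
```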

  1. On the Automation executions page, you can confirm the runbook execution.

Automation executions displays the AWS-ResizeInstance document, its execution ID, status (Success), start time, end time, and executed by.

Figure 10: Automation executions page
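You can also check the execution status from a script by polling the GetAutomationExecution API. This sketch assumes a boto3 SSM client; the function name, polling interval, and retry limit are hypothetical choices. The set of terminal statuses is a common subset, not the full list that the service can return.

```python
import time

# Assumption: a subset of the statuses that end an execution.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_execution(ssm_client, execution_id, poll_seconds=10, max_polls=60):
    """Poll until the execution reaches a terminal status and return it."""
    for _ in range(max_polls):
        response = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )
        status = response["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Execution {execution_id} did not finish in time")


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(wait_for_execution(boto3.client("ssm"), "your-execution-id"))
```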

  1. To remediate the RDS OpsItem, in the OpsItems list (Figure 7), choose the OpsItem ID for Create RDS snapshot for mysql database. On the details page, choose AWS-CreateRdsSnapshot, and then choose Execute.

The RDS OpsItem details page displays sections for Related resources, Automation executions in the last 30 days, and Runbooks. The AWS-CreateRdsSnapshot runbook appears in the Runbooks list.

Figure 11: RDS OpsItem details page

  1. In Run automation: AWS-CreateRdsSnapshot, enter the following values, and then choose Execute.
    • For DBInstanceIdentifier, enter your RDS instance ID.
    • For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as part of Automation setup.

Run automation: AWS-CreateRdsSnapshot displays fields for DB instance ID, DB snapshot ID, instance tags, snapshot tags, and AutomationAssumeRole.

Figure 12: Run automation: AWS-CreateRdsSnapshot

  1. To remediate the EC2 OpsItem, in the OpsItems list (Figure 7), choose the OpsItem ID for Create an image for web-1 EC2 instance. On the details page, choose AWS-CreateImage, and then choose Execute.

The EC2 OpsItem details page displays sections for Related resources, Automation executions in the last 30 days, and Runbooks. The AWS-CreateImage runbook appears in the Runbooks list.

Figure 13: EC2 OpsItem details page

  1. In Run automation: AWS-CreateImage, enter the following values, and then choose Execute.
    • For InstanceId, enter your EC2 instance ID.
    • For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as part of Automation setup.

Run automation: AWS-CreateImage displays fields for instance ID, no reboot, and AutomationAssumeRole.

Figure 14: Run automation: AWS-CreateImage
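The RDS snapshot and image runbooks take the same kind of input, so a script can build both requests with one helper. This is a sketch under stated assumptions: the helper name is hypothetical, the role ARN and resource IDs are placeholders, and only the parameters named in this post are set. The resulting dictionaries are shaped for the boto3 SSM client's start_automation_execution call.

```python
def build_runbook_request(document_name, assume_role_arn, **parameters):
    """Build keyword arguments to start a runbook with the given parameters."""
    # Automation parameters are lists of strings.
    values = {name: [str(value)] for name, value in parameters.items()}
    values["AutomationAssumeRole"] = [assume_role_arn]
    return {"DocumentName": document_name, "Parameters": values}


# Placeholder values; replace them with your own role ARN and resource IDs.
ROLE_ARN = "arn:aws:iam::123456789012:role/AutomationServiceRole"

snapshot_request = build_runbook_request(
    "AWS-CreateRdsSnapshot", ROLE_ARN, DBInstanceIdentifier="my-rds-instance"
)
image_request = build_runbook_request(
    "AWS-CreateImage", ROLE_ARN, InstanceId="i-0123456789abcdef0"
)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("ssm").start_automation_execution(**snapshot_request)
```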

You have now successfully remediated three OpsItems without writing any code.

Conclusion

In this blog post, we’ve shown you how to use Systems Manager Automation runbooks to resolve and remediate your OpsItems via the console. With the information in this post, you can create your own OpsItems and remediate your operational tasks. For more information about AWS Systems Manager features, see the AWS Systems Manager User Guide.

For information about how to remediate noncompliant AWS Config rules, see the Remediate noncompliant AWS Config rules with AWS Systems Manager Automation runbooks blog post.

About the authors

Raghavarao Sodabathina

Raghavarao Sodabathina is an Enterprise Solutions Architect at AWS. His areas of focus are data analytics, AI/ML, and the serverless platform. He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Dustin Liukkonen

Dustin Liukkonen is an Enterprise Solutions Architect at AWS. He helps enterprise customers achieve their business goals by providing guidance and support as they build solutions using AWS. Outside of work, he enjoys spending time with his family, day trips to the coast for some seafood, and working outside.