AWS Cloud Operations & Migrations Blog

Onica demonstrates uses for new AWS Systems Manager Automation actions

AWS Partner Guest Post

By Eric Miller, VP of Solutions Development for Onica, a Premier Tier APN Consulting Partner

As an AWS DevOps Competency Partner, Onica helps our customers solve a wide variety of challenging automation problems. One very important suite of tools in our AWS toolbox is AWS Systems Manager. It simplifies resource and application management for our customer projects while making it easier to operate AWS infrastructure securely and reliably at scale. Systems Manager offers many benefits for AWS customers, including resource grouping, automated instance maintenance, and even management of on-premises servers and virtual machines. These capabilities solve many business problems for our customers, such as:

  • Reducing the time to detect problems
  • Automating responses to problems, which can reduce incident resolution time to seconds
  • Improving visibility and control of AWS infrastructure
  • Automating the maintenance of security and compliance standards

One of the most useful features of the AWS Systems Manager suite of tools is AWS Systems Manager Automation. This AWS-hosted service simplifies common instance maintenance, system maintenance, and deployment tasks by running them as automation jobs. Until very recently, Systems Manager offered 15 actions for use in Automation documents. Today, the AWS Systems Manager Automation team has announced three new officially supported Automation actions. These actions greatly expand what customers can do with Systems Manager Automation. They also let us rethink how far we can push Automation to solve customer business problems. The three new actions are:

executeAwsApi

Potentially the most powerful of the three new actions, executeAwsApi gives customers who use Automation first-class access to nearly every API call in the AWS ecosystem from inside their Automation documents. We will unpack just how much functionality this exposes in some examples later in the article.

waitForAwsResourceProperty

This action makes an Automation document step wait until a specified property of an AWS resource matches one of a set of desired values.

assertAwsResourceProperty

This action is a bit more subtle, because it is used together with other step properties that we use to build dynamic Automation workflows. The assertAwsResourceProperty action checks whether a resource property matches one of a set of desired values. Combined with a step's onFailure and/or isCritical values, it lets you build complex workflows based on pass/fail assertion logic.

Solving customer business problems with the new Automation actions

To illustrate the benefits of these new Automation actions, we'll examine a few practical, real-world examples that represent the kinds of customer problems we at Onica see in the field. They give us a chance to explore the power these actions expose by demonstrating where and how they might be used.

Scenario one: Automate the distribution of golden images across teams

In this first example, consider a customer scenario where:

  •  Dev teams leverage continuous integration/continuous delivery (CI/CD) pipelines to automate the deployment of infrastructure artifacts such as AWS CloudFormation templates and (most importantly, in this case) Systems Manager Automation documents.
  •  The security team maintains ‘golden image’ Amazon Machine Images (AMIs) with validated baseline security configurations, to be consumed by the organization’s downstream deployment pipelines.
  •  Ops teams are responsible for ensuring the development teams’ deployment pipelines build their application AMIs using the base golden images provided by the security team.
  •  Problem: When the security team releases a new golden image AMI, the AMI IDs are being updated manually by the ops team in the dev teams’ CI/CD pipeline configurations.

To address this problem, customers can use a combination of Systems Manager Parameter Store and Systems Manager Automation, with the new executeAwsApi action. The new Automation actions enable the following solution:

  • A step is added to the automation used by the security team to build the golden images. This step automatically publishes each new golden image AMI ID as a value in Parameter Store, as soon as the image is successfully built and available.
  • Automation is added to the CI/CD pipelines used by the dev teams. The automation document is kept in source control and is processed each time the build is executed. The document uses the new executeAwsApi action to automatically pull the latest golden image AMI ID from Parameter Store and uses that as a base for downstream deployment processes. The following simplified code snippet demonstrates this:
---
description: automate instance deployment using ami-id from param store
schemaVersion: '0.3'
mainSteps:
- name: getGoldenImageId
  action: aws:executeAwsApi
  inputs:
    Service: ssm
    Api: GetParameter
    Name: GoldenImageId
  outputs:
  - Name: Value
    Selector: "$.Parameter.Value"
    Type: String

- name: launch_ec2_instance
  action: aws:executeAwsApi
  inputs:
    Service: ec2
    Api: RunInstances
    ImageId: "{{getGoldenImageId.Value}}"
    MaxCount: 1
    MinCount: 1

Let’s quickly step through what is happening in this example:

  • The executeAwsApi action is used to call the AWS ssm:GetParameter API, with the single argument “Name: GoldenImageId.”
  • That step reads the value of the parameter named GoldenImageId. It stores the value in a step output string called Value, which other steps in the document can reference as “{{getGoldenImageId.Value}}”.
  • The executeAwsApi action is then used to call the AWS ec2:RunInstances API with the argument “ImageId: {{getGoldenImageId.Value}}”. At run time this resolves to the actual AMI ID (for example, ami-abc1234) of the golden image that the security team published. (Note: we are using the executeAwsApi action here to demonstrate how to reference outputs from previous steps. The aws:runInstances Automation action could also be used here.)
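The first bullet of the solution describes a step in the security team's build automation that publishes each new golden image AMI ID to Parameter Store. Below is a minimal sketch of that publishing side. It assumes the new AMI ID is passed in as a document parameter named ImageId, which is a hypothetical name; in a real build it would more likely come from an earlier step's output. The sketch first waits for the image to become available, then overwrites the GoldenImageId parameter that the dev teams' documents read.

```yaml
---
description: publish a new golden image AMI ID to Parameter Store (sketch)
schemaVersion: '0.3'
parameters:
  ImageId:
    type: String
    description: (hypothetical) ID of the golden image AMI produced by the build
mainSteps:
# Wait until the new AMI is available before publishing it for consumers
- name: waitForImageAvailable
  action: aws:waitForAwsResourceProperty
  timeoutSeconds: 3600
  inputs:
    Service: ec2
    Api: DescribeImages
    ImageIds:
    - "{{ImageId}}"
    PropertySelector: "$.Images[0].State"
    DesiredValues:
    - available
# Overwrite the GoldenImageId parameter that the dev teams' documents read
- name: publishGoldenImageId
  action: aws:executeAwsApi
  inputs:
    Service: ssm
    Api: PutParameter
    Name: GoldenImageId
    Type: String
    Value: "{{ImageId}}"
    Overwrite: true
```

Overwrite: true matters here. Without it, PutParameter rejects a new value for a parameter name that already exists.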

The customer outcome in this case is that the time-consuming, error-prone practice of manually updating the AMI IDs in the developers’ pipelines has been replaced. The customer now has an end-to-end, zero-touch automated process.

Scenario two: Automate volume snapshots by tag and instance state

Next, we’ll consider a customer scenario where an operations engineer has been tasked with automating the creation of snapshots of volumes attached to Amazon EC2 instances. Many of these volumes are highly transactional, holding data such as database tables and transaction logs, web server access logs, or other I/O-intensive data. In most cases, creating snapshots of volumes on running instances is not problematic. Here, however, the decision has been made to create snapshots only while the instances are stopped. Currently, our engineer is addressing this with the following process:

  • Maintaining a manual list of DB instances with volumes that need regular snapshots.
  • On a maintenance window schedule, manually shutting down the instances, and waiting until they are in the ‘stopped’ state.
  • Manually creating snapshots of the desired volumes.

For the sake of brevity and clarity, the following example assumes that a previous automation step, named PreviousStep, produced an output value InstanceIDs of type StringList. That list contains the IDs of every EC2 instance tagged db_snapshot: true.
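The assumed earlier step could be sketched with executeAwsApi calling ec2:DescribeInstances with a tag filter. This is illustrative only. The step is named PreviousStep so that the {{PreviousStep.InstanceIDs}} references in these snippets resolve. The selector assumes that Automation's JSONPath support handles the recursive-descent operator (..) to collect every matching instance ID.

```yaml
# Illustrative sketch of the assumed earlier step
- name: PreviousStep
  action: aws:executeAwsApi
  inputs:
    Service: ec2
    Api: DescribeInstances
    Filters:
    - Name: tag:db_snapshot
      Values:
      - 'true'   # quoted so YAML keeps it a string, not a boolean
  outputs:
  - Name: InstanceIDs
    # assumption: '..' collects the InstanceId of every instance in every reservation
    Selector: "$.Reservations..Instances..InstanceId"
    Type: StringList
```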

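---
description: wait for db instances to reach the 'stopped' state
schemaVersion: '0.3'
mainSteps:

- name: waitStep
  action: aws:waitForAwsResourceProperty
  timeoutSeconds: 300
  inputs:
    Service: ec2
    Api: DescribeInstanceStatus
    InstanceIds: "{{PreviousStep.InstanceIDs}}"
    # without this, DescribeInstanceStatus only reports running instances
    IncludeAllInstances: true
    # the response lists instances under InstanceStatuses; [0] checks the first one returned
    PropertySelector: "$.InstanceStatuses[0].InstanceState.Name"
    DesiredValues:
    - stopped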

Using the previous Automation snippet, which leverages the new waitForAwsResourceProperty action, our engineer can automate the process of creating snapshots of his volumes. His new workflow now looks like this:

  • Configure the same maintenance window schedule using Amazon CloudWatch Events scheduled rules, as described in the CloudWatch Events documentation.
  • Configure the scheduled event to trigger an automation task that uses logic similar to the example to:
    • Stop all the instances tagged db_snapshot: true.
    • Use waitForAwsResourceProperty to check until all the desired instances are in the ‘stopped’ state.
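    • Create the desired snapshots using aws:createImage (which creates an AMI, including snapshots of the instance’s EBS volumes) or aws:executeAwsApi (for example, calling ec2:CreateSnapshot).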
    • Use waitForAwsResourceProperty again to check until all the images/snapshots are complete.
    • Start the instances again using aws:executeAwsApi.
    • Use waitForAwsResourceProperty again to check until all the instances are running.
    • Alert and notify if any instances fail to come back in service within a timeout threshold.
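The workflow above can be sketched as a single Automation document. This is a minimal sketch rather than a definitive implementation:

  • It handles one instance at a time.
  • It assumes the instance ID arrives as a document parameter named InstanceId (a hypothetical name).
  • It uses aws:createImage for the snapshot step.
  • The timeout values are arbitrary placeholders.

```yaml
---
description: stop an instance, image it, and start it again (sketch)
schemaVersion: '0.3'
parameters:
  InstanceId:
    type: String
    description: (hypothetical) one instance tagged db_snapshot true
mainSteps:
- name: stopInstance
  action: aws:executeAwsApi
  inputs:
    Service: ec2
    Api: StopInstances
    InstanceIds:
    - "{{InstanceId}}"
# Block until the instance reports 'stopped' (fails the step after 600 seconds)
- name: waitForStopped
  action: aws:waitForAwsResourceProperty
  timeoutSeconds: 600
  inputs:
    Service: ec2
    Api: DescribeInstanceStatus
    InstanceIds:
    - "{{InstanceId}}"
    IncludeAllInstances: true
    PropertySelector: "$.InstanceStatuses[0].InstanceState.Name"
    DesiredValues:
    - stopped
# Create an AMI, which snapshots the instance's EBS volumes
- name: createImage
  action: aws:createImage
  inputs:
    InstanceId: "{{InstanceId}}"
    ImageName: "db-snapshot-{{InstanceId}}-{{global:DATE_TIME}}"
    NoReboot: true
# Wait for the image (and its snapshots) to finish
- name: waitForImage
  action: aws:waitForAwsResourceProperty
  timeoutSeconds: 3600
  inputs:
    Service: ec2
    Api: DescribeImages
    ImageIds:
    - "{{createImage.ImageId}}"
    PropertySelector: "$.Images[0].State"
    DesiredValues:
    - available
- name: startInstance
  action: aws:executeAwsApi
  inputs:
    Service: ec2
    Api: StartInstances
    InstanceIds:
    - "{{InstanceId}}"
# Confirm the instance is back in service
- name: waitForRunning
  action: aws:waitForAwsResourceProperty
  timeoutSeconds: 600
  inputs:
    Service: ec2
    Api: DescribeInstanceStatus
    InstanceIds:
    - "{{InstanceId}}"
    IncludeAllInstances: true
    PropertySelector: "$.InstanceStatuses[0].InstanceState.Name"
    DesiredValues:
    - running
```

If a wait step times out, the step fails and the execution stops. Automation's CloudWatch Events integration can turn that failure into the alert described in the last bullet. To cover every tagged instance, Automation's rate control can fan the document out across targets. For example, you could start it with aws ssm start-automation-execution --document-name DbSnapshotWorkflow --target-parameter-name InstanceId --targets Key=tag:db_snapshot,Values=true (DbSnapshotWorkflow is a hypothetical document name).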

Again, Systems Manager Automation has replaced a tedious, error-prone manual process for this customer with first-class, end-to-end automation. Our engineer can now redirect the time previously spent on the manual process into solving other problems for the business.

Scenario three: Customizing failure actions

Now we’ll build on the customer use case from the previous example. Our engineer wants to take some immediate custom action when any part of the snapshot job reaches (or fails to reach) a certain state. Systems Manager Automation already has a first-class integration with Amazon CloudWatch Events, which customers can use to notify and alert when Automation jobs fail. But we can also take immediate, customized actions right in our Automation document using the new assertAwsResourceProperty action.

The assertAwsResourceProperty action is potentially most useful when used in conjunction with a step’s onFailure, nextStep and/or isCritical values to control the flow of an Automation document. You can read more about those properties in the AWS Systems Manager Automation documentation here.

Let’s say our engineer wants the Automation document to execute a step called onNotRunning whenever the state of the EC2 instance from the previous example is anything other than running when the check runs. Otherwise, if the instance state is running, we want it to execute the step named onRunning. Here’s what that looks like in an Automation document (again, it assumes a previous step output called InstanceIDs):

---
description: execute steps based on resource values
schemaVersion: '0.3'
mainSteps:
- name: assertStep
  action: aws:assertAwsResourceProperty
  nextStep: onRunning
  # onFailure needs the 'step:' prefix to branch to a named step
  onFailure: step:onNotRunning
  inputs:
    Service: ec2
    Api: DescribeInstanceStatus
    InstanceIds: "{{PreviousStep.InstanceIDs}}"
    IncludeAllInstances: true
    PropertySelector: "$.InstanceStatuses[0].InstanceState.Name"
    DesiredValues:
    - running

- name: onRunning
  # this step runs only when assertStep finds the state 'running'
  # (action and inputs omitted for brevity)
  # isEnd stops the execution here, so onNotRunning does not run afterward
  isEnd: true

- name: onNotRunning
  # this step runs only when assertStep finds any state other than 'running'
  # (action and inputs omitted for brevity)

Our engineer now has a few options for custom actions, including:

  • Using the executeAwsApi action inside those steps to call nearly any AWS API action, with parameters.
  • Using the invokeLambdaFunction action to trigger the execution of actions requiring more complex custom code using AWS Lambda.
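As a sketch of the first option, the placeholder onNotRunning step could publish an alert to Amazon SNS. AlertTopicArn is a hypothetical document parameter, and the message text is illustrative. {{automation:EXECUTION_ID}} is the Automation system variable holding the current execution ID, which a responder can use to look up the failure details. The commented-out alternative shows the second option, using a hypothetical Lambda function named HandleNotRunning.

```yaml
# Option 1: call an AWS API directly (here, SNS Publish)
- name: onNotRunning
  action: aws:executeAwsApi
  inputs:
    Service: sns
    Api: Publish
    TopicArn: "{{AlertTopicArn}}"   # hypothetical parameter
    Subject: "Instance not running"
    Message: "assertStep found an instance that is not running (execution {{automation:EXECUTION_ID}})"

# Option 2: use this version of the step instead to hand off to custom code
# - name: onNotRunning
#   action: aws:invokeLambdaFunction
#   inputs:
#     FunctionName: HandleNotRunning   # hypothetical function
#     Payload: '{"executionId": "{{automation:EXECUTION_ID}}", "failedStep": "assertStep"}'
```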

In either case, useful information about the failure can be passed along with the call. The receiving end can then tailor its response and handle different error types with different actions. And, just like other action types, a failed assertion causes the entire Automation execution to be reported as failed when the step’s isCritical value is true (the default), as described in the Automation documentation.

Where to go from here: CI/CD automation!

As an AWS DevOps Competency Partner, Onica is passionate about our customers’ success with configuration management, automation, and CI/CD on AWS. One common theme we see with many AWS services is a declarative top-level artifact used to manage an underlying AWS service. Examples include Systems Manager Automation documents, CloudFormation templates (.yml), AWS CodeBuild buildspec.yml, AWS CodeDeploy appspec.yml, and others.

While in the early stages of their DevOps journey, many organizations ‘get stuck’ at automating the deployment of these top-level declarative artifacts. For example, they invest the effort to write Automation documents and CloudFormation templates, but continue to deploy manually using the AWS Management Console or by manually running CLI scripts locally.

If your organization is already practicing automated deployment for infrastructure artifacts like Automation documents, then keep on automating! However, if your team has ‘gotten stuck’ at automated deployment, then consider investing some resources in learning about and building a deployment pipeline for these new Systems Manager features. You’ll be glad you did. And, whether you are an experienced CI/CD organization, or just getting started, be sure to check out Onica Runway!  Runway is a completely free, open source framework that greatly reduces the amount of ‘glue’ scripting required to automate common infrastructure deployments on AWS.
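One possible starting point for such a pipeline is the following AWS CodeBuild buildspec.yml sketch, which publishes an Automation document from source control on every build. The document name (GoldenImageDeploy) and file name (golden-image-deploy.yaml) are hypothetical. On the first run, the command creates the document. On later runs, it uploads a new version and makes that version the default, so executions that don't pin a version pick up the change.

```yaml
version: 0.2
env:
  variables:
    DOC_NAME: "GoldenImageDeploy"          # hypothetical document name
    DOC_FILE: "golden-image-deploy.yaml"   # hypothetical file in the repo
phases:
  build:
    commands:
      - |
        if aws ssm describe-document --name "$DOC_NAME" > /dev/null 2>&1; then
          # Document exists: upload a new version and capture its number
          VERSION=$(aws ssm update-document --name "$DOC_NAME" \
            --content "file://$DOC_FILE" --document-format YAML \
            --document-version '$LATEST' \
            --query 'DocumentDescription.DocumentVersion' --output text)
          # Make the new version the one used by default
          aws ssm update-document-default-version --name "$DOC_NAME" --document-version "$VERSION"
        else
          # First run: create the Automation document
          aws ssm create-document --name "$DOC_NAME" --content "file://$DOC_FILE" \
            --document-type Automation --document-format YAML
        fi
```

Systems Manager rejects an update whose content is identical to the latest version (a DuplicateDocumentContent error), so a real pipeline would likely skip the update when the file hasn't changed. The CodeBuild service role also needs permission for ssm:DescribeDocument, ssm:CreateDocument, ssm:UpdateDocument, and ssm:UpdateDocumentDefaultVersion.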

Thanks for reading, and happy automating!

Onica has been a Premier Tier APN Partner since 2015 and holds 9 AWS Competencies, including DevOps.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.