Amazon EC2 Systems Manager as a General-Purpose DevOps Tool

This guest post was written by Andrew Rout, Engineer at Riverbed SteelCentral Office of the CTO

A long time ago, a manufacturer in Cincinnati invented Play-Doh to be used as a wallpaper cleaner. Twenty years later, an even better purpose was found for it, and kids everywhere rejoiced.

History repeats itself with Amazon EC2 Systems Manager as we discover new ways to use this service from AWS. The following walkthrough shows you how Run Command can be used as a DevOps tool for orchestration and for systems introspection.

The need to communicate with EC2 instances

To manage the EC2 instances that power Riverbed Technology’s SteelCentral SaaS offering, Riverbed’s DevOps team built an internal tool that allows them to perform tasks on the EC2 instances and gives them insight into the state of the environment. A UI sits on top of a backend that communicates with the EC2 instances and various other AWS services.

This internal DevOps tool allows our operations team to do the following:

See dashboards describing the overall health of all infrastructure components and software components of SteelCentral SaaS
Provision new resources as necessary
Troubleshoot services running on EC2 instances
Manage users and licensing

In addition to the DevOps tool, Riverbed’s SaaS environment includes an event-driven service that also needs to communicate with EC2 instances. The event-driven system is used to provision additional resources on an EC2 instance as the system scales.

Each of the tasks executed by the DevOps tool and the event-driven service requires one or more remote shell commands to be executed on an EC2 instance to either fetch information from an application or make a change to it.

The Riverbed DevOps tool initially issued these commands via SSH, but maintaining an SSH key in multiple places was a headache from both a logistics and a security point of view. We preferred not to manage the key bits or the access control for SSH keys on our own, nor did we want to develop and run a web service on all EC2 instances to handle our specific use cases.

EC2 Systems Manager makes software inventory management easier

Enter EC2 Systems Manager…

EC2 Systems Manager was initially designed to be a tool for managing the software packages installed on EC2 instances, but it can be used for much more than just that.

The original thinking was that if you needed to install or upgrade software on multiple EC2 instances, you could execute the yum, apt-get, Windows PowerShell, or other package manager command via EC2 Systems Manager, and it would do it for you without the user needing to SSH into each EC2 instance individually.

No, EC2 Systems Manager makes issuing remote commands easier

Really what Run Command did was provide a way to execute ANY command on a remote EC2 instance. Anywhere an SSH command is needed, a Run Command call can take its place.

The benefits of using Run Command instead of SSH have been stated in other blog posts, so I’ll briefly list them here.

No need to store SSH keys anywhere
Ability to execute remote shell commands is controlled by IAM policies
Commands issued via Run Command are auditable
Command output can be stored in Amazon S3 for historical reference

Essentially, EC2 Systems Manager is used as a communication service between a client and your EC2 instances. In Riverbed’s case, the DevOps tool is the client that wants to communicate with our EC2 instances.

Riverbed’s SteelCentral SaaS uses EC2 Systems Manager to issue commands to stop, start, and modify services running on EC2 instances. For example, when a new user signs up, an event is triggered and Run Command sends commands to ensure additional services are provisioned and configured as the system scales.

In the Riverbed internal DevOps tool, a human operator can visit a page that displays the overall health of all of the services running on the EC2 instance serving the new user. To get that information, a Run Command is sent to the EC2 instance to query process status, and the output is populated into the DevOps tool’s UI.

Using Run Command in place of SSH saves Riverbed Technology tens of hours of engineering time each year, because we no longer need to manage and maintain SSH keys or troubleshoot keys that aren’t working.

More importantly, EC2 Systems Manager makes our security policies simpler to enforce because access controls are moved out of the code and into AWS IAM. This saves time during compliance reviews and makes it easier for management to get a picture of what access paths are defined.

Setting up your AWS account to use the SSM Agent

Before you can execute commands on your EC2 instances using the SSM Agent, you need to do the following:

Install the SSM Agent on your EC2 instances.
Update your EC2 instance’s IAM role to include the AWS managed policy named “AmazonEC2RoleforSSM”. Or, you can create a custom policy for SSM as an alternative to using a managed policy.
Grant permission to your user’s IAM role to allow it to execute SSM commands. This simple policy grants access to all of SSM:

{
    "Effect": "Allow",
    "Action": [
        "ssm:*"
    ],
    "Resource": "*"
}
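The statement above allows every SSM action. If your client only makes the two API calls used later in this walkthrough (send_command and list_command_invocations), a narrower policy may be enough. The following is a minimal sketch, not an official recommendation: the helper name build_run_command_policy is hypothetical, and leaving "Resource" as "*" is an assumption you may want to tighten for your own account.

```python
import json


def build_run_command_policy(actions=("ssm:SendCommand", "ssm:ListCommandInvocations")):
    """Return an IAM policy document (as a dict) allowing only the given SSM actions.

    Hypothetical helper for illustration. The default actions correspond to
    the two API calls used in this walkthrough.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: all resources, the same scope as the broad policy.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_run_command_policy(), indent=4))
```

Generating the document from code keeps the list of allowed actions next to the code that depends on them, which can make reviews easier.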

Here’s an example of how to execute a command using Run Command

As a simple test, let’s execute “whoami” on an EC2 instance via SSM.

Note: This example is written in Python using the AWS SDK for Python (boto3) to communicate with AWS services, but you can use any AWS SDK of your choice, or the AWS CLI.

First, send your command to SSM, and note the returned CommandId:

import boto3
import time
 
instance_id = 'i-abcdef123456'
cmd = 'whoami'
 
ssm = boto3.client('ssm', region_name='us-east-1')
 
response = ssm.send_command(
    InstanceIds=[instance_id],
    DocumentName='AWS-RunShellScript',
    Parameters={"commands":[cmd]}
    )
 
command_id = response.get('Command', {}).get("CommandId", None)

Second, wait for the command to finish (use the CommandId from the previous step):

while True:
 
    response = ssm.list_command_invocations(CommandId=command_id, Details=True)
 
    # If the command hasn't started to run yet, keep waiting
    #
    if len(response['CommandInvocations']) == 0:
        time.sleep(1)
        continue
 
    # There could be >1 CommandInvocation if the command was sent to multiple
    # EC2 instances, but in this example, we just sent the command to one.
    #
    invocation = response['CommandInvocations'][0]
 
    # Once we detect the command is done, exit the while loop.
    # 'Delayed' is also a non-terminal status, so keep waiting on it too.
    if invocation['Status'] not in ('Pending', 'InProgress', 'Delayed', 'Cancelling'):
        break
 
    time.sleep(1)
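The loop above waits indefinitely if the command never reaches a final state. Inside a long-running application you may prefer to give up after a deadline. Here is one way that could look, as a sketch: the function name wait_for_invocation and its timeout and poll_interval parameters are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
import time

# Statuses that mean the invocation has not finished yet.
PENDING_STATUSES = ("Pending", "InProgress", "Delayed", "Cancelling")


def wait_for_invocation(ssm, command_id, timeout=60.0, poll_interval=1.0):
    """Poll until the first invocation of command_id finishes, then return it.

    `ssm` is expected to be a boto3 SSM client, e.g. boto3.client('ssm').
    Raises RuntimeError if the command has not finished after `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = ssm.list_command_invocations(CommandId=command_id, Details=True)
        invocations = response["CommandInvocations"]
        # An empty list means the command has not started to run yet.
        if invocations and invocations[0]["Status"] not in PENDING_STATUSES:
            return invocations[0]
        time.sleep(poll_interval)
    raise RuntimeError(
        "Command %s did not finish within %s seconds" % (command_id, timeout)
    )
```

With this helper, the loop can be replaced by invocation = wait_for_invocation(ssm, command_id), and the caller decides how to handle the RuntimeError.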

Last, grab the command output:

command_plugin = invocation['CommandPlugins'][-1]
 
output = command_plugin['Output']
 
status = command_plugin['ResponseCode']
 
print("Output =", output)
print("Status =", status)

If the SSM command succeeded, you should see output that looks like the following:

Output = root

Status = 0
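As noted in the polling code, a command sent to several EC2 instances returns one CommandInvocation per instance. The sketch below collects the result for each instance from the same response structure used above. The function name summarize_invocations is hypothetical, and the code assumes the response was requested with Details=True so that CommandPlugins is populated.

```python
def summarize_invocations(invocations):
    """Map each instance ID to (status, response_code, output).

    `invocations` is the 'CommandInvocations' list from
    list_command_invocations(..., Details=True). The response code and
    output are taken from the last plugin, as in the single-instance example.
    """
    results = {}
    for invocation in invocations:
        plugins = invocation.get("CommandPlugins", [])
        # An invocation that has not run yet may have no plugin details.
        last = plugins[-1] if plugins else {}
        results[invocation["InstanceId"]] = (
            invocation["Status"],
            last.get("ResponseCode"),
            last.get("Output", ""),
        )
    return results
```

For example, summarize_invocations(response['CommandInvocations']) gives you a dictionary you can loop over to report which instances succeeded and which failed.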

It’s important to note that the command output returned in the SSM response is truncated at 2,500 characters. If you expect your command output to be more than 2,500 characters, you can store the full command output in Amazon S3 and fetch it from there.

To store the command output in Amazon S3, add the parameter “OutputS3BucketName” when running “send_command”:

response = ssm.send_command(
    InstanceIds=[instance_id],
    DocumentName='AWS-RunShellScript',
    Parameters={"commands":[cmd]},
    OutputS3BucketName='<bucket-name>'
    )
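Once the output is in Amazon S3, you can read it back with the S3 client. The following sketch rests on an assumption you should verify in your own bucket: that Run Command writes its output under a key that starts with the command ID followed by the instance ID, and that the standard output object has a key ending in "stdout". The function name fetch_s3_output is hypothetical, and the client is passed in rather than created inside the function.

```python
def fetch_s3_output(s3, bucket, command_id, instance_id, suffix="stdout"):
    """Return the text of all objects for this command whose key ends in `suffix`.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    Assumption: output keys start with '<command-id>/<instance-id>/'.
    """
    prefix = "%s/%s/" % (command_id, instance_id)
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    chunks = []
    # 'Contents' is absent from the response when no keys match the prefix.
    for obj in listing.get("Contents", []):
        if obj["Key"].endswith(suffix):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            chunks.append(body.read().decode("utf-8"))
    return "".join(chunks)
```

If the function returns an empty string, list the bucket contents first to confirm the actual key layout before relying on this helper.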

Summary

Run Command is an ideal service to use inside an application that needs to communicate with EC2 instances. It provides a way to execute any remote command on any of your EC2 instances. With a little bit of IAM configuration, you can throw away your SSH keys forever and let Run Command handle executing your remote shell commands.

The use cases for using Run Command are as expansive as the use cases for using SSH. Whether it be controlling the applications running on your EC2 instances or fetching system and application level information, AWS has given users a reliable and easy way to manage EC2 instances and the software running on them.

About Riverbed Technology

Riverbed’s SteelCentral SaaS provides full-stack Application Performance Monitoring with real-time visibility into the end-user experience, network, infrastructure, and applications, whether they are hosted on or off the cloud. Users can diagnose application performance problems down to the offending code, SQL, web service, network, or system resource.

About the Author

Andrew Rout joined Riverbed Technology in 2013. As an engineer in Riverbed’s SteelCentral Office of the CTO, he evaluates new technologies for product integration and has recently spent time leveraging AWS and Docker. He currently builds and manages the tools and AWS infrastructure that power SteelCentral SaaS. He has a strong interest in using Python to drive DevOps activities.

AWS Cloud Operations & Migrations Blog
