AWS Management Tools Blog

Administering a Group of Instances using Run Command

Emily Freebairn, Software Development Engineer with Amazon Web Services.

Frequently, engineers want to perform operational tasks across a group of instances. However, many of these tasks must run at a controlled speed and return feedback when there is a problem. Furthermore, administrators often want to ensure that engineers can perform only specific actions.

Run Command, which is part of Amazon EC2 Systems Manager (SSM), is designed to let you remotely and securely manage instances. Run Command provides a simple way of automating common administrative tasks like running shell scripts, installing software or patches, and more. Run Command allows you to execute these commands across multiple instances and provides visibility into the results. Through integration with AWS Identity and Access Management (IAM), you can apply granular permissions to control the actions users can perform on instances. All actions taken with Run Command are recorded by AWS CloudTrail, allowing you to audit changes in your fleet.

In this post, I demonstrate how to send a command to collect diagnostic information on my instances. Because capacity is added to the fleet on demand, the fleet composition changes over time. To reduce the likelihood of unintentional issues on instances, commands can be run at a controlled rate across instances. You are notified of any failures so that you can analyze them later. To make sure you can’t accidentally run other commands, you use a custom action with locked-down permissions that performs only specific tasks.

Walkthrough

In this walkthrough, I show you how to set up instances using Auto Scaling, create a custom SSM document, and then run a command on all instances in the Auto Scaling Group. I also show how to set up Amazon CloudWatch Events so that you get notified in case there are any failures.

Step 1: Launch instances using Auto Scaling Groups

To use Run Command, instances need the SSM agent and an IAM role. The SSM agent talks to the Run Command service to receive commands and send output, and relies on the IAM role for permission to call the service.

For this post, use an Auto Scaling Group to create a group of instances that are configured appropriately. For step-by-step instructions, see Getting Started with Auto Scaling.

Here’s an example of an Auto Scaling Group created with five instances.

Step 2: Create a custom document

Run Command uses documents to specify what actions to perform on the instance. Documents are AWS resources defined as JavaScript Object Notation (JSON), and they include steps and parameters that you specify.

AWS provides a set of documents that perform common tasks such as running a shell script, configuring CloudWatch, installing an application, and others. In addition, you can write your own documents for specific tasks. Because IAM policies allow you to control which documents a user is authorized to use, you can lock down the actions a specific user can take by restricting them to a subset of documents.
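
As a rough illustration of restricting a user to a single document, the following Python sketch builds an IAM policy that allows ssm:SendCommand only with the InstanceDiagnostics document. The account ID, the Region, and the choice to allow every instance in the account are placeholder assumptions for illustration, not values from a real setup.

```python
import json


def send_command_policy(region: str, account_id: str, document_name: str) -> dict:
    """Build an IAM policy that permits sending only one specific document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [
                    # The only document this user may run.
                    f"arn:aws:ssm:{region}:{account_id}:document/{document_name}",
                    # Assumption: any instance in the account may be targeted.
                    f"arn:aws:ec2:{region}:{account_id}:instance/*",
                ],
            }
        ],
    }


# Placeholder Region and account ID (hypothetical values).
policy = send_command_policy("us-west-1", "111122223333", "InstanceDiagnostics")
print(json.dumps(policy, indent=4))
```

A user with only this policy attached could send InstanceDiagnostics but not, for example, a document that runs arbitrary shell scripts.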

Here’s an example of a document that finds out the top processes consuming memory.

{
    "schemaVersion": "2.0",
    "description": "Instance Diagnostics",
    "parameters": { },
    "mainSteps": [
        {
            "action": "aws:runShellScript",
            "name": "collectInformation",
            "inputs": {
                "runCommand": [ "ps aux --sort=-%mem | head -5" ]
            }
        }
    ]
}
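
Before uploading a document, it can help to sanity-check it locally. The Python sketch below is one possible check, not an official validator: the function name check_document and the specific rules (schema version 2.0, a non-empty mainSteps list, unique step names, and the action, name, and inputs keys on each step) are my own assumptions about what is worth verifying.

```python
import json

REQUIRED_STEP_KEYS = ("action", "name", "inputs")


def check_document(text: str) -> list:
    """Return a list of problems found in a schema 2.0 command document."""
    doc = json.loads(text)  # raises ValueError if the JSON is malformed
    problems = []
    if doc.get("schemaVersion") != "2.0":
        problems.append("schemaVersion should be '2.0'")
    steps = doc.get("mainSteps") or []
    if not steps:
        problems.append("mainSteps is missing or empty")
    names = [step.get("name") for step in steps]
    if len(names) != len(set(names)):
        problems.append("step names must be unique")
    for step in steps:
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                problems.append(f"step {step.get('name')!r} lacks {key!r}")
    return problems


sample = """
{
    "schemaVersion": "2.0",
    "description": "Instance Diagnostics",
    "parameters": { },
    "mainSteps": [
        {
            "action": "aws:runShellScript",
            "name": "collectInformation",
            "inputs": { "runCommand": [ "uptime" ] }
        }
    ]
}
"""
print(check_document(sample) or "no problems found")
```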

To create a custom document, use the CreateDocument API.

aws ssm create-document --name InstanceDiagnostics --content file://~/workspace/document.json --document-type Command
{
    "DocumentDescription": {
        "Status": "Creating", 
        "Hash": "92182f1392807f23556ecc3f9e1d950a575dce8e2a4b96d284b1b2fb93369db2", 
        "Name": "InstanceDiagnostics", 
        "Parameters": [], 
        "DocumentType": "Command", 
        "PlatformTypes": [
            "Linux"
        ], 
        "DocumentVersion": "1", 
        "HashType": "Sha256", 
        "CreatedDate": 1492636792.396, 
        "Owner": "040557870006", 
        "SchemaVersion": "2.0", 
        "DefaultVersion": "1", 
        "LatestVersion": "1", 
        "Description": "Instance diagnostics example."
    }
}

Step 3: Set up CloudWatch Events

Last year, we added support in Run Command for notifications on command state changes. When setting up CloudWatch Events notifications, you can decide to trigger events on a per-instance or per-command basis, and specify the statuses for notification. With this feature, you can choose to be notified when a command finishes, so that you can take the necessary follow-up actions.

Set up a CloudWatch Events notification to get notified via Amazon SNS and receive an email when a command finishes. Start by creating an SNS topic configured to send email when triggered.

Next, create the CloudWatch Events rule to trigger the SNS topic when a command completes execution.
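
For readers who prefer to script this step, here is a minimal Python sketch of the rule. The event pattern uses the source and detail-type values that appear in the notification shown later in this post; the list of statuses, the function name create_notification_rule, and the target ID are assumptions. The client is passed in as a parameter so that the sketch does not depend on any particular SDK setup.

```python
import json

# Matches command-level status changes emitted by Run Command.
# Assumption: these four statuses cover "the command finished".
EVENT_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["EC2 Command Status-change Notification"],
    "detail": {"status": ["Success", "Failed", "TimedOut", "Cancelled"]},
}


def create_notification_rule(events_client, rule_name: str, topic_arn: str) -> None:
    """Create a CloudWatch Events rule and point it at an SNS topic."""
    events_client.put_rule(Name=rule_name, EventPattern=json.dumps(EVENT_PATTERN))
    events_client.put_targets(
        Rule=rule_name,
        # "notify-by-email" is a hypothetical target ID.
        Targets=[{"Id": "notify-by-email", "Arn": topic_arn}],
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   create_notification_rule(boto3.client("events"), "RunCommandFinished", topic_arn)
```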

Step 4: Test the command on an instance

Before you send the command to the entire group of instances, make sure that it works as expected.

First, check that a test instance is set up properly and connected to the service by using the DescribeInstanceInformation API. This returns information about the ping status of the agent, the platform the agent is running on, and other instance information.

aws ssm describe-instance-information --filters "Key=InstanceIds,Values=i-01222ecf7db201ca2"
{
    "InstanceInformationList": [
        {
            "IsLatestVersion": false, 
            "ComputerName": "ip-172-31-24-177.us-west-1.compute.internal", 
            "PingStatus": "Online", 
            "InstanceId": "i-01222ecf7db201ca2", 
            "IPAddress": "172.31.24.177", 
            "ResourceType": "EC2Instance", 
            "AgentVersion": "2.0.755.0", 
            "PlatformVersion": "2017.03", 
            "PlatformName": "Amazon Linux AMI", 
            "PlatformType": "Linux", 
            "LastPingDateTime": 1492637593.888
        }
    ]
}
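
If you check more than one instance, a small helper can pick out the ones that are ready to receive commands. The following Python sketch works on the response structure shown above; the function name online_instance_ids is hypothetical, and the sample response is trimmed to the fields that the helper reads.

```python
def online_instance_ids(response: dict) -> list:
    """Return the IDs of instances whose agent reports PingStatus 'Online'."""
    return [
        info["InstanceId"]
        for info in response.get("InstanceInformationList", [])
        if info.get("PingStatus") == "Online"
    ]


# Trimmed copy of the DescribeInstanceInformation output shown above.
response = {
    "InstanceInformationList": [
        {
            "PingStatus": "Online",
            "InstanceId": "i-01222ecf7db201ca2",
            "PlatformType": "Linux",
        }
    ]
}
print(online_instance_ids(response))
```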

Next, send a command to the instance above using the document created earlier.

aws ssm send-command --document-name "InstanceDiagnostics" --instance-ids "i-01222ecf7db201ca2" 
{
    "Command": {
        "Comment": "", 
        "Status": "Pending", 
        "MaxErrors": "0", 
        "Parameters": {}, 
        "ExpiresAfter": 1492645288.475, 
        "ServiceRole": "", 
        "DocumentName": "InstanceDiagnostics", 
        "TargetCount": 1, 
        "OutputS3BucketName": "", 
        "NotificationConfig": {
            "NotificationArn": "", 
            "NotificationEvents": [], 
            "NotificationType": ""
        }, 
        "CompletedCount": 0, 
        "Targets": [], 
        "StatusDetails": "Pending", 
        "ErrorCount": 0, 
        "OutputS3KeyPrefix": "", 
        "RequestedDateTime": 1492638088.475, 
        "CommandId": "11cf0866-fdec-43a4-987b-b7a5f8ad60e9", 
        "InstanceIds": [
            "i-01222ecf7db201ca2"
        ], 
        "MaxConcurrency": "50"
    }
}

Finally, check to make sure the command completed successfully.

aws ssm list-commands --command-id 11cf0866-fdec-43a4-987b-b7a5f8ad60e9
{
    "Commands": [
        {
            "Comment": "", 
            "Status": "Success", 
            "MaxErrors": "0", 
            "Parameters": {}, 
            "ExpiresAfter": 1492645288.475, 
            "ServiceRole": "", 
            "DocumentName": "InstanceDiagnostics", 
            "TargetCount": 1, 
            "OutputS3BucketName": "", 
            "NotificationConfig": {
                "NotificationArn": "", 
                "NotificationEvents": [], 
                "NotificationType": ""
            }, 
            "CompletedCount": 1, 
            "Targets": [], 
            "StatusDetails": "Success", 
            "ErrorCount": 0, 
            "OutputS3KeyPrefix": "", 
            "RequestedDateTime": 1492638088.475, 
            "CommandId": "11cf0866-fdec-43a4-987b-b7a5f8ad60e9", 
            "InstanceIds": [
                "i-01222ecf7db201ca2"
            ], 
            "MaxConcurrency": "50"
        }
    ]
}
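
Rather than rerunning list-commands by hand, you could poll until the command reaches a final status. This is a sketch under assumptions: the function name wait_for_command, the attempt count, and the delay are arbitrary, and the set of final statuses is my reading of the command status values. The SSM client is passed in, so any object with a list_commands method works.

```python
import time

# Assumption: statuses after which the command no longer changes.
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_command(ssm_client, command_id: str, attempts: int = 30, delay: float = 2.0) -> str:
    """Poll ListCommands until the command finishes or attempts run out."""
    status = "Pending"
    for _ in range(attempts):
        commands = ssm_client.list_commands(CommandId=command_id)["Commands"]
        if commands:
            status = commands[0]["Status"]
            if status in TERMINAL_STATUSES:
                break
        time.sleep(delay)
    return status  # the last status seen, which may still be non-terminal


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   wait_for_command(boto3.client("ssm"), "11cf0866-fdec-43a4-987b-b7a5f8ad60e9")
```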

Step 5: Run the command with velocity control

Now you’re ready to send a command to your group of instances.

Run Command exposes two new concepts to help you control the rate at which commands are sent. You can control how many instances execute the command at the same time by using the max-concurrency parameter. You can specify either an absolute number of instances, for example 10, or a percentage, like 50%. The system will gradually create more invocations (the pairing of command and instance ID) until it reaches the max concurrency limit, at which time it will wait for each current invocation to finish before creating the next.

The second parameter, max-errors, allows you to specify how many errors are allowed before Run Command stops sending commands to additional instances. Like max-concurrency, max-errors can be specified as either an absolute number or percentage.
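
To make the interaction between the two parameters concrete, here is a simplified Python model. It is only a sketch: it dispatches in fixed batches rather than the rolling window described above, it assumes that percentages round up, and it assumes that sending stops once the error count exceeds max-errors. The names resolve_limit and simulate_rollout are hypothetical.

```python
def resolve_limit(value: str, target_count: int, minimum: int = 0) -> int:
    """Turn '10' or '40%' into an absolute number of instances."""
    if value.endswith("%"):
        percent = int(value[:-1])
        # Ceiling division. Assumption: percentages round up.
        count = -(-target_count * percent // 100)
    else:
        count = int(value)
    return max(minimum, count)


def simulate_rollout(outcomes: list, max_concurrency: str, max_errors: str) -> tuple:
    """Simulate a rollout; outcomes[i] is True if instance i succeeds.

    Returns (instances that received the command, errors seen).
    """
    batch_size = resolve_limit(max_concurrency, len(outcomes), minimum=1)
    allowed_errors = resolve_limit(max_errors, len(outcomes))
    sent = errors = 0
    # Keep sending until every instance is covered or too many errors occur.
    while sent < len(outcomes) and errors <= allowed_errors:
        batch = outcomes[sent:sent + batch_size]
        sent += len(batch)
        errors += batch.count(False)
    return sent, errors


fleet = [True, False, True, True, True]  # the second instance fails
print(simulate_rollout(fleet, "40%", "0"))     # (2, 1): stops after the first batch
print(simulate_rollout(fleet, "40%", "100%"))  # (5, 1): reaches every instance
```

With five instances, 40% resolves to two instances at a time. A max-errors of 0 halts the rollout after the first failure, while 100% lets it reach the whole fleet.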

Send a command to all instances in the Auto Scaling Group that you created. As an example, specify a max-concurrency of 40% and a max-errors of 100%. To target the group, you can use the autogenerated Auto Scaling Group tag without any additional work. Setting max-concurrency to 40% ensures that the command is not sent to all instances at the same time. Setting max-errors to 100% makes sure that the command runs on all instances, even if some of the command invocations are not successful.

Because you set up a CloudWatch Events notification, you are notified when the command finishes. The Run Command API has a limit of 1200 characters for the output, so specifying an S3 location ensures that the full output is captured.

aws ssm send-command --document-name "InstanceDiagnostics" --targets "Key=tag:aws:autoscaling:groupName,Values=RunCommandASG" --max-concurrency 40% --max-errors 100% --output-s3-bucket-name "run-command-blog" --output-s3-key-prefix "diagnostics"
{
    "Command": {
        "Comment": "", 
        "Status": "Pending", 
        "MaxErrors": "100%", 
        "Parameters": {}, 
        "ExpiresAfter": 1492647224.49, 
        "ServiceRole": "", 
        "DocumentName": "InstanceDiagnostics", 
        "TargetCount": 0, 
        "OutputS3BucketName": "run-command-blog", 
        "NotificationConfig": {
            "NotificationArn": "", 
            "NotificationEvents": [], 
            "NotificationType": ""
        }, 
        "CompletedCount": 0, 
        "Targets": [
            {
                "Values": [
                    "RunCommandASG"
                ], 
                "Key": "tag:aws:autoscaling:groupName"
            }
        ], 
        "StatusDetails": "Pending", 
        "ErrorCount": 0, 
        "OutputS3KeyPrefix": "diagnostics", 
        "RequestedDateTime": 1492640024.49, 
        "CommandId": "666b0ea2-0004-4352-bddc-ac212e0e4090", 
        "InstanceIds": [], 
        "MaxConcurrency": "40%"
    }
}
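
The same request can be assembled in code. The Python sketch below builds the SendCommand parameters as a plain dictionary, mirroring the CLI call above; the helper name build_send_command_request is hypothetical, and actually sending the request assumes that boto3 is installed and configured.

```python
def build_send_command_request(
    document_name: str,
    asg_name: str,
    max_concurrency: str,
    max_errors: str,
    bucket: str,
    prefix: str,
) -> dict:
    """Assemble SendCommand parameters that target an Auto Scaling Group by tag."""
    return {
        "DocumentName": document_name,
        # Auto Scaling adds this tag to every instance that it launches.
        "Targets": [{"Key": "tag:aws:autoscaling:groupName", "Values": [asg_name]}],
        "MaxConcurrency": max_concurrency,
        "MaxErrors": max_errors,
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": prefix,
    }


request = build_send_command_request(
    "InstanceDiagnostics", "RunCommandASG", "40%", "100%", "run-command-blog", "diagnostics"
)
print(request)

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("ssm").send_command(**request)
```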

Step 6: Verify the command results

When the command finishes, you receive an SNS notification in email.

{
    "version": "0",
    "id": "8bb24048-9af4-4f88-a70d-47feba9da26c",
    "detail-type": "EC2 Command Status-change Notification",
    "source": "aws.ssm",
    "account": "040557870006",
    "time": "2017-04-19T22:38:19Z",
    "region": "us-west-1",
    "resources": [
        
    ],
    "detail": {
        "command-id": "666b0ea2-0004-4352-bddc-ac212e0e4090",
        "document-name": "InstanceDiagnostics",
        "requested-date-time": "2017-04-19T22:38:16.105Z",
        "expire-after": "2017-04-20T00:38:16.105Z",
        "output-s3bucket-name": "run-command-blog",
        "output-s3key-prefix": "diagnostics",
        "parameters": "",
        "status": "Success"
    }
}
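
If you route the notification to code instead of email, the event is straightforward to parse. This Python sketch reads the fields shown above and flags any status other than Success for follow-up; the function name summarize_notification and the follow-up rule are assumptions, and the sample message is a trimmed copy of the event above.

```python
import json


def summarize_notification(message: str) -> dict:
    """Extract the command ID and status from a Run Command status-change event."""
    detail = json.loads(message)["detail"]
    return {
        "command_id": detail["command-id"],
        "document": detail["document-name"],
        "status": detail["status"],
        # Assumption: anything other than Success deserves a closer look.
        "needs_follow_up": detail["status"] != "Success",
    }


message = json.dumps({
    "detail-type": "EC2 Command Status-change Notification",
    "source": "aws.ssm",
    "detail": {
        "command-id": "666b0ea2-0004-4352-bddc-ac212e0e4090",
        "document-name": "InstanceDiagnostics",
        "status": "Success",
    },
})
print(summarize_notification(message))
```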

Use the ListCommands API to check on the overall command status. This returns information such as the status of the command, how many instances were targeted (TargetCount), and how many invocations finished (CompletedCount).

aws ssm list-commands --command-id 666b0ea2-0004-4352-bddc-ac212e0e4090 
{
    "Commands": [
        {
            "Comment": "", 
            "Status": "Success", 
            "MaxErrors": "100%", 
            "Parameters": {}, 
            "ExpiresAfter": 1492647224.49, 
            "ServiceRole": "", 
            "DocumentName": "InstanceDiagnostics", 
            "TargetCount": 5, 
            "OutputS3BucketName": "run-command-blog", 
            "NotificationConfig": {
                "NotificationArn": "", 
                "NotificationEvents": [], 
                "NotificationType": ""
            }, 
            "CompletedCount": 5, 
            "Targets": [
                {
                    "Values": [
                        "RunCommandASG"
                    ], 
                    "Key": "tag:aws:autoscaling:groupName"
                }
            ], 
            "StatusDetails": "Success", 
            "ErrorCount": 0, 
            "OutputS3KeyPrefix": "diagnostics", 
            "RequestedDateTime": 1492640024.49, 
            "CommandId": "666b0ea2-0004-4352-bddc-ac212e0e4090", 
            "InstanceIds": [], 
            "MaxConcurrency": "40%"
        }
    ]
}

To get the diagnostic information collected from a specific instance, use the GetCommandInvocation API. This returns the output of the command, along with details about the execution such as when it started and how long it took.

aws ssm get-command-invocation --command-id 666b0ea2-0004-4352-bddc-ac212e0e4090 --instance-id i-01222ecf7db201ca2
{
    "Comment": "", 
    "ExecutionElapsedTime": "PT0.004S", 
    "ExecutionEndDateTime": "2017-04-19T22:13:47.122Z", 
    "StandardErrorContent": "", 
    "InstanceId": "i-01222ecf7db201ca2", 
    "StandardErrorUrl": "https://s3-us-west-1.amazonaws.com/run-command-blog/diagnostics/666b0ea2-0004-4352-bddc-ac212e0e4090/i-01222ecf7db201ca2/awsrunShellScript/0.diagnose/stderr", 
    "DocumentName": "InstanceDiagnostics", 
    "StandardOutputContent": "eth0      Link encap:Ethernet  HWaddr 02:9B:C5:26:4B:00  \n          inet addr:172.31.24.177  Bcast:172.31.31.255  Mask:255.255.240.0\n          inet6 addr: fe80::9b:c5ff:fe26:4b00/64 Scope:Link\n          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1\n          RX packets:9905 errors:0 dropped:0 overruns:0 frame:0\n          TX packets:3260 errors:0 dropped:0 overruns:0 carrier:0\n          collisions:0 txqueuelen:1000 \n          RX bytes:11135049 (10.6 MiB)  TX bytes:514905 (502.8 KiB)\n\nlo        Link encap:Local Loopback  \n          inet addr:127.0.0.1  Mask:255.0.0.0\n          inet6 addr: ::1/128 Scope:Host\n          UP LOOPBACK RUNNING  MTU:65536  Metric:1\n          RX packets:2 errors:0 dropped:0 overruns:0 frame:0\n          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0\n          collisions:0 txqueuelen:1 \n          RX bytes:140 (140.0 b)  TX bytes:140 (140.0 b)\n\n", 
    "Status": "Success", 
    "StatusDetails": "Success", 
    "PluginName": "diagnose", 
    "ResponseCode": 0, 
    "ExecutionStartDateTime": "2017-04-19T22:13:47.122Z", 
    "CommandId": "666b0ea2-0004-4352-bddc-ac212e0e4090", 
    "StandardOutputUrl": "https://s3-us-west-1.amazonaws.com/run-command-blog/diagnostics/666b0ea2-0004-4352-bddc-ac212e0e4090/i-01222ecf7db201ca2/awsrunShellScript/0.diagnose/stdout"
}

Finally, because you set up an S3 bucket, the output from all the command invocations ends up in the common S3 location you specified.
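
To fetch that output programmatically, you first need the bucket and key. The Python sketch below splits the StandardOutputUrl returned by GetCommandInvocation into those two parts. It assumes the path-style URL format seen in the output above, and the function name split_s3_url is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple:
    """Split a path-style S3 URL into (bucket, key)."""
    path = urlparse(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return bucket, key


url = (
    "https://s3-us-west-1.amazonaws.com/run-command-blog/diagnostics/"
    "666b0ea2-0004-4352-bddc-ac212e0e4090/i-01222ecf7db201ca2/"
    "awsrunShellScript/0.diagnose/stdout"
)
bucket, key = split_s3_url(url)
print(bucket)
print(key)

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
```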

Conclusion

In this post, I showed how you can use Run Command, along with other AWS services, to take administrative actions on a group of instances. Run Command provides a simple and scalable way for you to administer your instances. You can control the rate at which you send commands, use fine-grained permissions, and use notifications to simplify your workflow.

About the Author

Emily Freebairn is a Software Development Engineer in the Amazon EC2 Systems Manager team. She has been with Amazon for four years, working on Run Command and other features in Amazon EC2 and Systems Manager. When she’s not at work designing and building scalable products, she enjoys sailing and dancing.