Infrastructure & Automation

Using AWS Systems Manager Automation and AWS CloudFormation together

The Microsoft Active Directory Domain Services Quick Start was first published in March 2014. It has been revised since then, but the Quick Start team wanted to update it with tools and services that are available now. This post is the first of a two-part series in which I discuss the updates to the Active Directory Domain Services Quick Start.

I wanted to make improvements that would make the project easier to maintain and also give customers better visibility when troubleshooting the Quick Start themselves. Previously, if the AD DS Quick Start failed to create the stack, you had to relaunch the template with rollback set to No. Then you had to use Remote Desktop to connect to the instance that failed and examine the logs in a local directory.

In the update, I wanted to push logs centrally to Amazon CloudWatch so that customers can search through logs without having to connect to each instance. However, I still wanted to keep the ability to spin up a Quick Start and tear down all the associated resources. To achieve these improvements, I used a combination of AWS CloudFormation and AWS Systems Manager.

In this first blog post, I’ll discuss the following:

  • Using Systems Manager and AWS CloudFormation together
  • Triggering Systems Manager automation from AWS CloudFormation
  • Making AWS CloudFormation wait while the Automation document completes
  • Signaling AWS CloudFormation from Systems Manager
  • Searching logs from automation output

You can find the artifacts for this Quick Start on GitHub or by launching the Quick Start.

Why use Systems Manager and AWS CloudFormation together

If AWS CloudFormation already works, why use Systems Manager? This is an understandable question. To answer it, I'll draw a comparison to building a house. When building a house, you would not use only nails. You would use a combination of screws and nails, depending on the task. Each fastener has its strengths in different situations.

For a Quick Start, Systems Manager is strong at automation within a guest operating system (OS), whereas AWS CloudFormation is strong at defining AWS Cloud resources. When updating the Active Directory Domain Services Quick Start, I used AWS CloudFormation at the AWS Cloud layer to define AWS resources, and I used Systems Manager to configure the OS of the instance, as shown in the following diagram.

diagram of the AWS Cloud layer and the OS layer

By maintaining the Quick Start within AWS CloudFormation, you keep the ability to deploy a stack and then delete a stack from a single template. Systems Manager gives you an easy mechanism to centrally aggregate logs to CloudWatch.

When you add Windows PowerShell Desired State Configuration (DSC), Systems Manager also provides a mechanism to orchestrate reboots between steps (I’ll cover the details in part two of this blog post series). Systems Manager also allows you to document that configuration workflow in a single Automation document, making it easier to follow the flow of configuration steps within the Quick Start.

Triggering Systems Manager automation from AWS CloudFormation

With the pattern of combining Systems Manager and AWS CloudFormation defined, the next question is how to start a Systems Manager Automation document execution from AWS CloudFormation.

When working on the Active Directory Domain Services Quick Start, my initial thought was to create an AWS Lambda-backed custom resource to trigger the Automation document. However, I wanted a simpler approach that did not require creating and managing Lambda functions and Lambda execution roles.

Instead, this Quick Start uses Amazon Machine Images (AMIs) that include pre-installed tools such as AWS Tools for PowerShell and the Systems Manager Agent, so the needed components are available on each instance. In the following diagram, you can see that when the second domain controller (DC) is launched, the Systems Manager Automation document is started.

workflow diagram

This is accomplished through the Amazon Elastic Compute Cloud (Amazon EC2) User Data feature using the Start-SSMAutomationExecution cmdlet, as in the following example.

Start-SSMAutomationExecution -DocumentName "AutomationDocumentName" -Parameter @{"ParameterName"="ParameterValue"}

You can also see how this is accomplished within the ad-1.template in the GitHub repo.
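
For readers who script against the API rather than with AWS Tools for PowerShell, the same call can be sketched in Python with boto3. This is a minimal sketch and not what the Quick Start itself uses. The function names `to_ssm_parameters` and `start_automation` are hypothetical, and the document and parameter names are the same placeholders as in the PowerShell example. One detail worth showing is that the StartAutomationExecution API expects every parameter value to be a list of strings.

```python
from typing import Dict, List


def to_ssm_parameters(values: Dict[str, str]) -> Dict[str, List[str]]:
    """Wrap each value in a list, the shape StartAutomationExecution expects."""
    return {name: [value] for name, value in values.items()}


def start_automation(document_name: str, values: Dict[str, str]) -> str:
    """Start the Automation document and return the execution ID."""
    # Imported lazily so the helper above can be used without the AWS SDK.
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(
        DocumentName=document_name,
        Parameters=to_ssm_parameters(values),
    )
    return response["AutomationExecutionId"]
```

Calling `start_automation("AutomationDocumentName", {"ParameterName": "ParameterValue"})` from an instance would require an instance profile that allows the call, just as the PowerShell version does.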

After the Systems Manager Automation document execution starts, you need to make sure that AWS CloudFormation waits for it to complete. Otherwise, AWS CloudFormation cannot report whether the automation within the guest operating system completed successfully.

Making AWS CloudFormation wait while the Systems Manager Automation document completes

To have AWS CloudFormation wait, add a CreationPolicy attribute to the second domain controller.

DomainController2:
  Type: AWS::EC2::Instance
  DependsOn: DomainController1
  CreationPolicy:
    ResourceSignal:
      Timeout: PT60M
      Count: 1

Adding a creation policy to the second domain controller causes AWS CloudFormation to wait until it receives a success or failure signal or until the time specified in the policy expires. After DC2 starts the Systems Manager Automation document through user data, AWS CloudFormation keeps the DC2 resource in progress for up to one hour (the PT60M timeout), or until it receives one success signal or one failure signal.
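
To make the waiting rule concrete, here is a simplified model of how a creation policy resolves, written in Python. It is only an illustration of the documented behavior (a failure signal, or too few success signals before the timeout, fails the resource) and is not how AWS CloudFormation is implemented. The function name `creation_outcome` and the tuple format for signals are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def creation_outcome(
    signals: Iterable[Tuple[float, str]],
    count: int,
    timeout_seconds: float,
) -> str:
    """Model the result of a CreationPolicy with a ResourceSignal.

    Each signal is (seconds since creation started, "SUCCESS" or "FAILURE").
    """
    successes = 0
    for elapsed, status in sorted(signals):
        if elapsed > timeout_seconds:
            break  # signals that arrive after the timeout are too late
        if status == "FAILURE":
            return "CREATE_FAILED"
        if status == "SUCCESS":
            successes += 1
            if successes >= count:
                return "CREATE_COMPLETE"
    # Timeout expired before enough success signals arrived.
    return "CREATE_FAILED"
```

With the policy shown above (Count of 1 and PT60M, which is 3,600 seconds), a single success signal after 15 minutes completes the resource, while no signal at all fails it once the hour is up.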

Signaling AWS CloudFormation from Systems Manager

The next hurdle, after starting the automation and waiting for it to complete, is signaling the success or failure of the automation to AWS CloudFormation. To accomplish this, you need to signal the DC2 resource in the CloudFormation stack, which is where you attached the creation policy. In the automation workflow diagram, you can see that the signaling happens in step 13, which is a Systems Manager Automation step. Systems Manager enables you to use the aws:executeAwsApi action to call AWS service API operations natively. The Automation document includes one step that signals success to AWS CloudFormation and one step that signals failure.

If any step fails, its onFailure setting routes the execution to the step that sends the failure signal to the DC2 resource. However, if all steps complete successfully, the automation signals success to the DC2 resource. These steps from the Automation document are shown in the following example.

- name: "DnsConfig"
  action: "aws:runCommand"
  onFailure: "step:signalfailure"
  inputs:
    DocumentName: "AWS-RunRemoteScript"
    InstanceIds:
      - "{{dc2InstanceId.InstanceId}}"
    CloudWatchOutputConfig:
      CloudWatchOutputEnabled: "true"
      CloudWatchLogGroupName: "/aws/Quick_Start/ActiveDirectoryDS"
    Parameters: …
- name: "signalsuccess"
  action: "aws:executeAwsApi"
  isEnd: True
  inputs:
    Service: cloudformation
    Api: SignalResource
    LogicalResourceId: "DomainController2"
    StackName: "{{StackName}}"
    Status: SUCCESS
    UniqueId: "{{dc2InstanceId.InstanceId}}"
- name: "signalfailure"
  action: "aws:executeAwsApi"
  inputs:
    Service: cloudformation
    Api: SignalResource
    LogicalResourceId: "DomainController2"
    StackName: "{{StackName}}"
    Status: FAILURE
    UniqueId: "{{dc2InstanceId.InstanceId}}"

Note that each signal action needs the StackName, the LogicalResourceId, and a UniqueId, which here is the instance ID of the resource.
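
The same SignalResource call can be made outside an Automation document, which can be handy when testing a creation policy by hand. The following Python sketch builds the request from the same four values. The function names `build_signal_request` and `send_signal` are hypothetical, and the use of boto3 is an assumption of this sketch rather than something the Quick Start does.

```python
from typing import Dict


def build_signal_request(
    stack_name: str,
    logical_resource_id: str,
    instance_id: str,
    succeeded: bool,
) -> Dict[str, str]:
    """Build the arguments for the CloudFormation SignalResource operation."""
    return {
        "StackName": stack_name,
        "LogicalResourceId": logical_resource_id,
        "UniqueId": instance_id,
        "Status": "SUCCESS" if succeeded else "FAILURE",
    }


def send_signal(request: Dict[str, str]) -> None:
    """Send the signal to the resource that carries the creation policy."""
    # Imported lazily so the builder above can be used without the AWS SDK.
    import boto3

    boto3.client("cloudformation").signal_resource(**request)
```

Keeping the request builder separate from the call makes it easy to check that the logical resource ID matches the resource with the creation policy (DomainController2 in the example above) before anything is sent.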

Searching logs from automation output

Note also that each step that runs a script sends its output to CloudWatch, which lets you aggregate logs centrally in the Quick Start. If the automation fails, you can search the /aws/Quick_Start/${StackName} log group in CloudWatch to determine which step failed and why. Let's use the AD DS Quick Start to show how you can search through CloudWatch to find a specific event.

The following screenshot shows the log group created by the AD DS Quick Start. You can choose the log group to examine all the log streams.

log group

When you are within the log group, choose Search Log Group to search for specific events.

search log group button

Note the extensive logs that each step produces. In this example, we want to find the specific event "Renamed computer to DC1". Enter the string you are looking for in the box and press Enter.

error search

This will reduce the list to the log entries that contain this string. You can use this method to search for errors as well.
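
The same search can be scripted. The following Python sketch uses the CloudWatch Logs FilterLogEvents operation through boto3. The function names `phrase_pattern` and `search_log_group` are hypothetical. The sketch assumes that you are searching for an exact phrase, which in CloudWatch Logs filter pattern syntax is wrapped in double quotation marks. To stay on safe ground, it rejects phrases that already contain a double quotation mark instead of trying to escape them.

```python
from typing import Iterator


def phrase_pattern(phrase: str) -> str:
    """Return a filter pattern that matches the exact phrase."""
    if '"' in phrase:
        raise ValueError("phrases containing double quotes are not handled here")
    return f'"{phrase}"'


def search_log_group(log_group_name: str, phrase: str) -> Iterator[str]:
    """Yield the message of every log event that contains the phrase."""
    # Imported lazily so the helper above can be used without the AWS SDK.
    import boto3

    logs = boto3.client("logs")
    paginator = logs.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=log_group_name,
        filterPattern=phrase_pattern(phrase),
    )
    for page in pages:
        for event in page["events"]:
            yield event["message"]
```

For example, iterating over `search_log_group("/aws/Quick_Start/ActiveDirectoryDS", "Renamed computer to DC1")` would return the matching entries from every log stream in that group, assuming the log group name matches the one configured in your Automation document.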

If you go to the execution of the Automation document in Systems Manager, you can also examine which steps were successful and which steps failed.

list of executed steps

If a step failed, you can then choose the Step ID and examine the input parameters for the step in question.

show input parameters

The step details also show some of the log output from the step that ran. You can choose the CommandId, which links to the logs that pertain to that step.

outputs
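
If you prefer to check an execution without the console, the step list can also be read through the GetAutomationExecution operation. The following Python sketch filters that response down to the failed steps. The function names `failed_steps` and `fetch_execution` are hypothetical, and the sample step names in the usage note are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def failed_steps(execution: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (step name, failure message) for each failed step."""
    return [
        (step["StepName"], step.get("FailureMessage", ""))
        for step in execution.get("StepExecutions", [])
        if step.get("StepStatus") == "Failed"
    ]


def fetch_execution(execution_id: str) -> Dict[str, Any]:
    """Fetch the details of one Automation execution."""
    # Imported lazily so the helper above can be used without the AWS SDK.
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.get_automation_execution(AutomationExecutionId=execution_id)
    return response["AutomationExecution"]
```

Passing the result of `fetch_execution` to `failed_steps` gives a short list such as `[("DnsConfig", "...")]`, which tells you which step to look up in the log group.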

Conclusion

In this blog post, I discussed why I chose to use Systems Manager and AWS CloudFormation together for the Active Directory Domain Services Quick Start. I covered one possible pattern for triggering Systems Manager from AWS CloudFormation to configure EC2 instances, making AWS CloudFormation wait, and signaling AWS CloudFormation when the automation finishes. By using this pattern, you get the benefit of centralized logging, and you can begin to see how multiple AWS services can be used together to meet requirements. In my next blog post, I will cover using PowerShell DSC with Systems Manager, and the workflow and orchestration between each node in the Quick Start.