AWS Storage Blog

Deploying AWS Elastic Disaster Recovery at scale with AWS Systems Manager

In the digital era, ensuring business continuity through effective disaster recovery measures is crucial for organizations of all sizes. Setting up disaster recovery solutions manually, such as installing recovery agents on multiple servers, can be a significant and time-consuming task. Therefore, many customers are increasingly seeking automation not only to streamline common administrative tasks but also to establish consistency across all operational layers.

This is where AWS Elastic Disaster Recovery (DRS) comes into play. AWS DRS simplifies and automates the process, minimizing both downtime and data loss by enabling the seamless recovery of physical, virtual, and cloud-based servers into AWS. In tandem, AWS Systems Manager, a comprehensive suite of capabilities, aids in managing your applications and infrastructure within the AWS Cloud. Notably, its Run Command feature is capable of automating tasks such as large-scale agent installations, thereby reducing manual effort and ensuring an efficient and consistent setup of your disaster recovery processes.

In this post, we will show you how to use the Systems Manager Run Command to deploy the AWS DRS replication agent to your Amazon Elastic Compute Cloud (EC2) instances at scale, providing you with a means to expedite and automate the disaster recovery setup process. This automation not only reduces manual admin tasks and errors but also ensures a consistent disaster recovery setup across your infrastructure, thus enhancing business resilience and continuity. We will explore two different options to install the AWS DRS replication agent.

  • Option 1 involves adding the AWSElasticDisasterRecoveryEc2InstancePolicy to the existing instance profile role, followed by running an AWS managed Systems Manager command document to install the AWS DRS replication agent. This method is particularly beneficial for customers who are comfortable extending the permissions of their existing instance profile role with an additional policy, thereby avoiding the creation of new roles.
  • Option 2 presents an alternative approach, ideal for those seeking segregation of duties or specific role-based access controls. It involves creating a new role with AWSElasticDisasterRecoveryEc2InstancePolicy, then running a custom Systems Manager command document that assumes this role to install the AWS DRS replication agent.

Understanding and applying these options enables you to cater to different organizational needs and security requirements while keeping your disaster recovery setup efficient and consistent.

Prerequisites

When you use AWS Systems Manager, you must first work through some prerequisites. Above all, the Systems Manager Agent must be installed on your instances. In most cases, Amazon Machine Images (AMIs) are configured to launch instances with the Systems Manager Agent preinstalled.

Your EC2 instances should have an AWS Identity and Access Management (IAM) instance profile role with the AmazonSSMManagedInstanceCore policy attached. This policy grants EC2 instances the permissions needed for core Systems Manager functionality.

Next, confirm that the AWS DRS service is initialized in the target Region. For further information, see the getting started guide.
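The Systems Manager prerequisite can also be checked from a script. The following is a minimal sketch, assuming Python with boto3 and a client created with boto3.client("ssm"); the helper name find_unmanaged_instances is hypothetical and not part of any AWS tooling. It returns the instances from a given set that Systems Manager does not currently report as Online, which usually points at a missing agent or a missing instance profile.

```python
from typing import Any, Dict, Iterable, List


def find_unmanaged_instances(ssm_client: Any, instance_ids: Iterable[str]) -> List[str]:
    """Return the instance IDs that Systems Manager does not report as Online."""
    online = set()
    kwargs: Dict[str, Any] = {}
    while True:
        # describe_instance_information is paginated through NextToken.
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page.get("InstanceInformationList", []):
            if info.get("PingStatus") == "Online":
                online.add(info["InstanceId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return sorted(set(instance_ids) - online)


# Usage sketch (requires boto3 and AWS credentials; instance IDs are placeholders):
#   import boto3
#   missing = find_unmanaged_instances(boto3.client("ssm"), ["i-0123...", "i-0456..."])
```

An empty result suggests that every instance in the set is ready to receive Run Command documents.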

Option 1: Managed AWS DRS Agent installation via Instance Profile Role and Systems Manager

Option 1 consists of the following steps:

  1. Adding AWSElasticDisasterRecoveryEc2InstancePolicy to existing instance profile role.
  2. Deploying the AWS DRS Replication Agent.

1. Adding AWSElasticDisasterRecoveryEc2InstancePolicy to existing instance profile role

  1. Navigate to the IAM console. Select Roles in the navigation pane.

IAM navigation pane highlighting Roles

  2. In the list, choose the name of the role that is attached to your instances.
  3. Choose the Permissions tab, then choose Add permissions, and then select Attach policies.

Attach Policies to Role

  4. Search for and select AWSElasticDisasterRecoveryEc2InstancePolicy, then choose “Add permissions”. This policy enables the installation and use of the AWS DRS Replication Agent.

Attach AWSElasticDisasterRecoveryEc2InstancePolicy Policy to Role

  5. Confirm that AWSElasticDisasterRecoveryEc2InstancePolicy is added to the role.

Confirm Policies added to Role
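The same attach-and-confirm sequence can be scripted. This is a minimal sketch, assuming Python with boto3 and an IAM client from boto3.client("iam"). The helper names are hypothetical, and the policy ARN below assumes the managed policy lives under the service-role path, so verify it on the policy's page in the IAM console before relying on it.

```python
from typing import Any, Dict, Set

# Assumption: the AWS managed policy sits under the "service-role" path.
DRS_EC2_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/"
    "AWSElasticDisasterRecoveryEc2InstancePolicy"
)


def attached_policy_arns(iam_client: Any, role_name: str) -> Set[str]:
    """Collect the ARNs of all managed policies attached to the role."""
    arns: Set[str] = set()
    kwargs: Dict[str, Any] = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        arns.update(p["PolicyArn"] for p in page.get("AttachedPolicies", []))
        if not page.get("IsTruncated"):
            return arns
        kwargs["Marker"] = page["Marker"]


def ensure_drs_policy(iam_client: Any, role_name: str,
                      policy_arn: str = DRS_EC2_POLICY_ARN) -> bool:
    """Attach the policy if it is missing, then confirm it is on the role."""
    if policy_arn not in attached_policy_arns(iam_client, role_name):
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn in attached_policy_arns(iam_client, role_name)
```

Checking before attaching keeps the helper safe to rerun against a role that already carries the policy.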

2. Deploying the AWS DRS Replication Agent

  1. I have five EC2 instances, which become targets for the AWS DRS Agent install. First, make sure these instances have a common tag, which is used as the targeting method.

a. Open the Tag Editor Console in the AWS Resource Groups console.

b. For Regions, select the Region where the resources to tag are located.

c. For Resource types, select the type as AWS::EC2::Instance and choose Search resources.

Tag Editor - Find all EC2 resources

d. Under Resource search results, select the check box next to each resource to tag and choose Manage tags of selected resources.

Tag Editor - Manage Tags for selected resources

e. Under Edit tags of all selected resources, choose Add tag, and then enter the new tag key and value. In this example, I use Key: DRS and Value: True

f. Choose Review and apply tag changes and then choose Apply changes to all selected.

Tag Editor - Add DRS-True Tag to selected EC2 Instances
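If you prefer to tag instances from a script rather than the Tag Editor, the steps above can be sketched as follows. This assumes Python with boto3 and an EC2 client from boto3.client("ec2"); both helper names are hypothetical. The second helper shows how the same key-value pair is later expressed as a Run Command target.

```python
from typing import Any, Dict, Iterable, List


def tag_instances(ec2_client: Any, instance_ids: Iterable[str],
                  key: str = "DRS", value: str = "True") -> List[str]:
    """Apply one tag to every given instance and return the IDs that were tagged."""
    ids = sorted(set(instance_ids))
    if ids:
        # create_tags overwrites the value if the key already exists on a resource.
        ec2_client.create_tags(Resources=ids, Tags=[{"Key": key, "Value": value}])
    return ids


def run_command_target(key: str = "DRS", value: str = "True") -> Dict[str, Any]:
    """Build the Run Command target that selects instances carrying the tag."""
    return {"Key": f"tag:{key}", "Values": [value]}
```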

2. Navigate to the Systems Manager console. Select Run Command under the heading Node Management.

Systems Manager navigation pane highlighting Run Command

3. On the next page, select Run command:

Systems Manager Run Command console

4. Search for and select the command document AWSDisasterRecovery-InstallDRAgentOnInstance. Then, select the latest document version.

Systems Manager AWSDisasterRecovery-InstallDRAgentOnInstance AWS Managed document

5. Under Command parameters:

a. Select the Region to which you want to replicate the instances.

Systems Manager AWSDisasterRecovery-InstallDRAgentOnInstance Run command document attributes

b. Multiple options are available for selecting targets. In this case, I select targets based on tag key-value pairs. Specify the tag Key and Value, and then select Add.

Systems Manager Run Command select targets

c. As an optional step, configure an output option to Amazon Simple Storage Service (S3) or Amazon CloudWatch Logs.

Systems Manager Run Command Output Options

d. If you choose Amazon S3 or CloudWatch Logs as an output, the IAM role attached to the instances must also have the relevant permissions to write to that destination.

e. Leave all other options as default, and then select Run.

f. Once you run the command, the Console should refresh and display the command status.

g. In the following screenshot, you can see that the command reached all five targets and completed:

Systems Manager Run Command Success confirmation

h. More detailed information is available in the Targets and outputs section:

Systems Manager Run Command Targets and Status

6. Navigate to the AWS DRS service within the disaster recovery (DR) Region. Confirm that the instances are present and that replication has begun.

Elastic Disaster Recovery Source servers dashboard
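The Run Command steps in this option map onto the Systems Manager API. The following is a minimal sketch, assuming Python with boto3 and a client from boto3.client("ssm"). The helper names are hypothetical, and the parameter name "Region" is an assumption about the managed document, so check the document's Parameters tab before using it. The optional S3 bucket and log group correspond to the output options described above.

```python
from collections import Counter
from typing import Any, Dict, Optional

DOCUMENT_NAME = "AWSDisasterRecovery-InstallDRAgentOnInstance"


def build_install_request(region: str, tag_key: str = "DRS", tag_value: str = "True",
                          s3_bucket: Optional[str] = None,
                          log_group: Optional[str] = None) -> Dict[str, Any]:
    """Build send_command arguments that target instances by tag."""
    request: Dict[str, Any] = {
        "DocumentName": DOCUMENT_NAME,
        "DocumentVersion": "$LATEST",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # Assumption: the document exposes the replication Region as "Region".
        "Parameters": {"Region": [region]},
    }
    if s3_bucket:
        request["OutputS3BucketName"] = s3_bucket
    if log_group:
        request["CloudWatchOutputConfig"] = {
            "CloudWatchLogGroupName": log_group,
            "CloudWatchOutputEnabled": True,
        }
    return request


def start_install(ssm_client: Any, request: Dict[str, Any]) -> str:
    """Send the command and return its ID for later status checks."""
    return ssm_client.send_command(**request)["Command"]["CommandId"]


def summarize_invocations(ssm_client: Any, command_id: str) -> Dict[str, int]:
    """Count per-instance invocation statuses, for example {"Success": 5}."""
    counts: Counter = Counter()
    kwargs: Dict[str, Any] = {"CommandId": command_id}
    while True:
        page = ssm_client.list_command_invocations(**kwargs)
        counts.update(inv["Status"] for inv in page.get("CommandInvocations", []))
        token = page.get("NextToken")
        if not token:
            return dict(counts)
        kwargs["NextToken"] = token
```

Building the request as a plain dictionary keeps it easy to review or log before anything is sent to the instances.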

Option 2: Custom AWS DRS Agent installation with new IAM role creation and Systems Manager

Option 1 is the recommended approach in most scenarios. Because it adds AWSElasticDisasterRecoveryEc2InstancePolicy to the existing instance profile role, there is no new IAM role to create, which simplifies both IAM management and installation of the AWS DRS Agent.

However, in some situations, Option 2 may be necessary. A typical case is when the same instance profile role is also assigned to EC2 instances that won’t be protected by AWS DRS, so adding further IAM policies to that role is not possible. In this scenario, you can create a new IAM role tailored specifically for AWS DRS.

Attach AWSElasticDisasterRecoveryEc2InstancePolicy to this IAM role to grant the permissions required to install the AWS DRS Replication Agent. Configure the role so that your instance profile role can temporarily assume these permissions using AWS Security Token Service (AWS STS).
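To make the trust relationship concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the contents of the template used in this post: the helper names are hypothetical, the account ID and role names are placeholders, and the ARN format assumes the standard aws partition. The first helper builds a trust policy that lets the instance profile role assume the new role. The second shows how temporary credentials returned by AssumeRole (through a boto3 STS client) could be mapped to the environment variables that the AWS CLI reads.

```python
import json
from typing import Any, Dict


def build_trust_policy(account_id: str, instance_profile_role: str) -> Dict[str, Any]:
    """Trust policy allowing the instance profile role to assume the DRS role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: standard "aws" partition.
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/{instance_profile_role}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assumed_role_env(sts_client: Any, role_arn: str,
                     session_name: str = "drs-agent-install") -> Dict[str, str]:
    """Assume the role and return its temporary credentials as environment variables."""
    creds = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Usage sketch: print the policy document for review.
#   print(json.dumps(build_trust_policy("123456789012", "my-instance-role"), indent=2))
```

The instance profile role also needs permission to call sts:AssumeRole on the new role, in addition to being named in the trust policy.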

To achieve this, I use the source code that consists of a single file: drs-template.json.

  • The drs-template.json is an AWS CloudFormation template that is designed to create both the AWS DRS IAM role and a custom Systems Manager Command Document named DRS-SSM-Installer-Document aimed at installing the AWS DRS Agent on your EC2 instances.

Option 2 consists of the following steps:

  1. Creating the IAM role and Systems Manager Command Document
  2. Deploying the AWS DRS Replication Agent

1. Creating the IAM role and Systems Manager Command Document

  1. Navigate to CloudFormation service:

a. Choose Create Stack, and then select With new resources (standard).

CloudFormation Console - Create Stack with new resources

b. In the Template section, select Upload a template file and upload the drs-template.json file, then choose Next.

CloudFormation Create Stack by Uploading Template

c. Enter the stack Name, for example, DRS-Template, select the parameters, and then choose Next. When you run the template, you are prompted for the following parameters:

i. Are the Source Servers AWS Systems Manager managed?

a. True

ii. Enter Instance Profile Role name that is attached to an EC2 instance.

a. Enter your instance profile role name.

iii. Enter name for the DRS Agent Deployment IAM Role.

a. The default choice is DRS-Agent-Deployment-Role → Change this as necessary.

CloudFormation Create Stack Attributes

d. Next, add Tags, which are key-value pairs that can help you identify your stacks if necessary. For more information, see Adding tags to your CloudFormation stack. Then choose Next.

e. Review the stack information. When you’re satisfied with the settings, choose Create.

f. When the stack has a status of CREATE_COMPLETE, check the Outputs menu to copy the IAM role ARN. This is used later while running the Systems Manager Document.

CloudFormation Stack Outputs section

  2. To use AWS STS AssumeRole, the AWS Command Line Interface (AWS CLI) must be installed on the source servers. The AWS CLI needs credentials from one of two sources: an AWS access key and secret access key configured under the default profile, or an EC2 instance profile attached to the source servers. In this scenario, we use the EC2 instance profile role already attached to our EC2 instances, which has permissions to assume the DRS-Agent-Deployment-Role.
  3. Installing the AWS CLI on Linux requires the unzip package to extract the installation files. The custom Systems Manager document created by the CloudFormation template deployed earlier (DRS-SSM-Installer-Document) checks whether the AWS CLI and, on Linux-based instances only, unzip are installed, and installs them if needed.
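The stack creation steps above can be sketched with the CloudFormation API as well. This is a minimal sketch, assuming Python with boto3 and a client from boto3.client("cloudformation"); the helper names are hypothetical. Because this post does not list the template's parameter keys or output keys, the parameter names are left to the caller and the role ARN is located by its shape rather than by an output key. CAPABILITY_NAMED_IAM is included on the assumption that the template creates an IAM role with a custom name.

```python
import re
from typing import Any, Dict

# Matches an IAM role ARN in any partition, e.g. arn:aws:iam::123456789012:role/Name
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def build_stack_request(template_body: str, parameters: Dict[str, str],
                        stack_name: str = "DRS-Template") -> Dict[str, Any]:
    """Build create_stack arguments from a template and its parameter values."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
        # Assumption: required because the template creates a named IAM role.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def find_role_arn_output(cfn_client: Any, stack_name: str) -> str:
    """Return the first stack output that looks like an IAM role ARN."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "CREATE_COMPLETE":
        raise RuntimeError(f"stack is in state {stack['StackStatus']}")
    for output in stack.get("Outputs", []):
        if ROLE_ARN_PATTERN.match(output["OutputValue"]):
            return output["OutputValue"]
    raise LookupError("no IAM role ARN found in stack outputs")


# Usage sketch (parameter keys must match those defined in drs-template.json):
#   cfn.create_stack(**build_stack_request(open("drs-template.json").read(), params))
#   cfn.get_waiter("stack_create_complete").wait(StackName="DRS-Template")
#   role_arn = find_role_arn_output(cfn, "DRS-Template")
```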

2. Deploying the AWS DRS Replication Agent

  1. I have five EC2 instances that become targets for the AWS DRS Agent install. First, make sure these instances have a common tag, which is used as the targeting method.

a. Open the Tag Editor Console in the AWS Resource Groups console.

b. For Regions, select the Region where the resources to tag are located.

c. For Resource types, select the type as AWS::EC2::Instance and choose Search resources.

Tag Editor - Find all EC2 resources

d. Under Resource search results, select the check box next to each resource to tag and choose Manage tags of selected resources.

Tag Editor - Manage Tags for selected resources

e. Under Edit tags of all selected resources, choose Add tag, and then enter the new tag key and value. In this example, I use Key: DRS and Value: True

f. Choose Review and apply tag changes and then choose Apply changes to all selected.

Tag Editor - Add DRS-True Tag to selected EC2 Instances

  2. Navigate to the Systems Manager console.

a. Select Documents under the heading Shared Resources.

b. Under Owned by me on the Documents page, you should see a new document named DRS-SSM-Installer-Document.

Systems Manager Custom document created by running the CloudFormation template

  3. Open the document and select Run command.

Systems Manager Custom document

  4. Select the latest document version.

Systems Manager Custom document Run Command

  5. Under Command parameters, select the Region to which you want to replicate the instances, and paste the IAM role ARN from the Outputs section of the CloudFormation stack DRS-Template.

Systems Manager Custom document input attributes

a. Multiple options are available for selecting targets. In this case, I select targets based on tag key-value pairs. Specify the tag Key and Value, and then select Add.

Systems Manager Run Command select targets

b. As an optional step, configure an output option to Amazon S3 or CloudWatch Logs.

Systems Manager Run Command Output Options

c. If you choose Amazon S3 or CloudWatch Logs as an output, the IAM role attached to the instances must also have the relevant permissions to write to that destination.

d. Leave all other options as default, and then select Run.

e. Once you run the command, the Console should refresh and display the command status.

f. In the following screenshot, you can see that the command reached all five targets and completed:

Systems Manager Run Command Success confirmation

g. More detailed information is available in the Targets and outputs section:

Systems Manager Run Command Targets and Status

  6. Navigate to the AWS DRS service within the DR Region. Confirm that the instances are present and that replication has begun.

Elastic Disaster Recovery Source servers dashboard
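For Option 2, the Run Command request differs mainly in the document name and the extra role ARN parameter, and the final check can be made against the AWS DRS API. The following is a minimal sketch, assuming Python with boto3, an SSM client in the source Region, and a DRS client from boto3.client("drs") created in the DR Region. The helper names are hypothetical, and the parameter names "Region" and "RoleArn" are placeholders, since this post does not list the custom document's parameter names; read them from the document's Parameters tab.

```python
from typing import Any, Dict


def build_custom_install_request(region: str, role_arn: str,
                                 tag_key: str = "DRS", tag_value: str = "True",
                                 region_param: str = "Region",
                                 role_param: str = "RoleArn") -> Dict[str, Any]:
    """Build send_command arguments for the custom installer document."""
    return {
        "DocumentName": "DRS-SSM-Installer-Document",
        "DocumentVersion": "$LATEST",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # Assumption: parameter names are placeholders, see the lead-in.
        "Parameters": {region_param: [region], role_param: [role_arn]},
    }


def replication_states(drs_client: Any) -> Dict[str, str]:
    """Map each source server ID to its current data replication state."""
    states: Dict[str, str] = {}
    kwargs: Dict[str, Any] = {"filters": {}}
    while True:
        page = drs_client.describe_source_servers(**kwargs)
        for server in page.get("items", []):
            info = server.get("dataReplicationInfo", {})
            states[server["sourceServerID"]] = info.get("dataReplicationState", "UNKNOWN")
        token = page.get("nextToken")
        if not token:
            return states
        kwargs["nextToken"] = token


# Usage sketch:
#   ssm.send_command(**build_custom_install_request("us-west-2", role_arn))
#   print(replication_states(boto3.client("drs", region_name="us-west-2")))
```

A server that has just been added typically moves through initial sync states before it reports continuous replication, so an early check may not show the final state.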

Cleaning up

Clean up the resources created during the previous steps to make sure you do not incur further charges.

Navigate to AWS DRS within the target Region. Select the source servers that you created as part of this activity, and then select Disconnect from AWS from the Actions menu. This action uninstalls the replication agent from the source servers and deletes the corresponding staging resources within the target Region.

Elastic Disaster Recovery disconnect source servers

Once the source servers show as disconnected, select Delete server from the Actions menu to remove them from the AWS DRS console.

Elastic Disaster Recovery delete source servers
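The two cleanup actions can also be scripted. This is a minimal sketch, assuming Python with boto3 and a DRS client from boto3.client("drs") created in the target Region; the helper names are hypothetical. Deletion is limited to servers that already report a DISCONNECTED replication state, so the second helper may need to be rerun until every server has finished disconnecting.

```python
from typing import Any, Dict, Iterable, List


def disconnect_servers(drs_client: Any, source_server_ids: Iterable[str]) -> None:
    """Request disconnection for each source server."""
    for server_id in source_server_ids:
        drs_client.disconnect_source_server(sourceServerID=server_id)


def delete_disconnected_servers(drs_client: Any,
                                source_server_ids: Iterable[str]) -> List[str]:
    """Delete the given servers that are already disconnected; return their IDs."""
    ids = list(source_server_ids)
    if not ids:
        return []
    deleted: List[str] = []
    kwargs: Dict[str, Any] = {"filters": {"sourceServerIDs": ids}}
    while True:
        page = drs_client.describe_source_servers(**kwargs)
        for server in page.get("items", []):
            state = server.get("dataReplicationInfo", {}).get("dataReplicationState")
            if state == "DISCONNECTED":
                drs_client.delete_source_server(sourceServerID=server["sourceServerID"])
                deleted.append(server["sourceServerID"])
        token = page.get("nextToken")
        if not token:
            return deleted
        kwargs["nextToken"] = token
```

Comparing the returned list with the input shows which servers still need another pass.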

Conclusion

In this blog post, we’ve explored two effective ways to install the AWS DRS replication agent on EC2 instances using AWS Systems Manager’s Run Command. These options can help you swiftly and efficiently install the AWS DRS replication agent, safeguarding your EC2 instances. Not only do these methods help alleviate administrative burdens, but they also provide a streamlined process for deploying the AWS DRS replication agent at scale.

As next steps, I encourage you to apply these techniques in your own environment to see their benefits firsthand. I also invite you to share your experiences and questions in the comments section.

Brian MacDonald

Brian MacDonald is Senior Solution Architect and VP of JetSweep. He has been an integral part of our journey into the AWS partnership, providing direction for our cloud practice, as well as driving the implementation of AWS projects under JetSweep’s professional services. Brian is 8x AWS Certified and has participated in AWS CloudEndure Beta programs.

Stuart Lupton

Stuart Lupton is a Specialist Solutions Architect for Disaster Recovery at AWS.

Sravan Rachiraju

Sravan Rachiraju is a Solutions Architect with AWS. He specializes in AWS Migrations and Disaster Recovery services. He enjoys helping customers build reliable and cost-effective solutions to safeguard workloads in the event of a disaster. Sravan loves watching anime and spending time in nature catching Pokémon when he is not working.