Using Puppet to automate AWS Elastic Disaster Recovery for Amazon EC2 instances at scale

Customers improve their disaster recovery posture with automation. Automation reduces the operational overhead of managing source servers and of implementing your disaster recovery strategy. AWS Elastic Disaster Recovery replicates your servers, such as Amazon EC2 instances, so that in the case of a disaster you can use it to recover your application servers. It enables Recovery Point Objectives (RPO) of seconds and Recovery Time Objectives (RTO) of minutes. RPO is the maximum acceptable amount of data loss, measured as the time between the last recoverable copy of your data and the disaster; it depends on how frequently your data is backed up or replicated. RTO is the maximum acceptable time between a disaster and the moment the application is available again. This disaster recovery service helps you minimize downtime and data loss.


Figure 1: Importance of RPO and RTO for business continuity

To operationalize your disaster recovery strategy, you need to ensure that all of your critical servers replicate to AWS Elastic Disaster Recovery. There are various ways of enabling AWS Elastic Disaster Recovery on Amazon EC2 instances. In this blog post, we focus on Puppet and how you can leverage it to install the AWS Replication Agent on your servers and enable replication to AWS Elastic Disaster Recovery.

If you do not have an existing Puppet setup, we recommend that you compare Puppet with other automation tools, such as AWS Systems Manager, to find the tool that best suits you, your team's knowledge, and the features and complexity you need.

Overview of solution

The architecture diagram shows how to automate AWS Elastic Disaster Recovery with Puppet. It uses multiple Amazon Virtual Private Clouds (VPCs) for different applications, which is closer to a real-world setup; the walkthrough uses a simplified architecture with a single VPC. Throughout this article we use terms that relate to AWS Elastic Disaster Recovery and terms that are specific to Puppet, so let's define the terminology, whatever your prior experience with AWS Elastic Disaster Recovery, Puppet, or both.

You have an application running on a server. Let's call this server the application server. The application server might be running in AWS, in your own data center, or in a different cloud. Wherever it runs, you want to be able to fail over the application as part of your disaster recovery strategy. To prepare for failover, you need to replicate the application server. You run the AWS Replication Agent on your application servers to replicate their data to AWS Elastic Disaster Recovery. The AWS Replication Agent copies the blocks of your application servers' storage devices to the VPC subnet that you designated as the staging area in AWS Elastic Disaster Recovery. After the application server is replicated, AWS Elastic Disaster Recovery can launch EC2 instances in a recovery subnet. These recovery instances run the application from the replicated state. In the architecture diagram, purple dashed arrows show AWS Elastic Disaster Recovery replicating data from the application server into the staging area, and recovering the application onto Amazon EC2 recovery instances in your recovery subnet. To learn more about the replication network requirements, see the AWS Elastic Disaster Recovery network diagrams guide.

Instead of installing the AWS Replication Agent manually on each application server, you want to automate the installation at scale using Puppet. An application server that runs the Puppet agent is called a Puppet agent node. The Puppet agent is software that keeps the application server aligned with the desired state of your infrastructure. You declare that desired state, for many application servers at once, in Puppet manifest files. The manifest files are hosted on a Puppet server, so each application server needs to be able to pull the latest desired state from the Puppet server. In the diagram, yellow arrows show the application servers running the Puppet agent pulling the latest desired state from your Puppet server. The AWS multi-Region architectures for Puppet Enterprise documentation has more reference architectures for using Puppet on AWS.

Figure 2: Puppet and AWS Elastic Disaster Recovery architecture diagram

Prerequisites

For this walkthrough, you should have the following prerequisites:
Your business requirements should define your RPO and RTO. If your RPO is measured in seconds and your RTO in minutes, AWS Elastic Disaster Recovery is a suitable solution. If your business only requires an RPO or RTO in tens of minutes or hours, consider the more cost-effective options described in the AWS whitepaper Disaster Recovery of Workloads on AWS.
Access to an AWS account with permissions to create AWS Identity and Access Management (IAM) roles with AWS Elastic Disaster Recovery permissions, VPC permissions, and permissions to launch Amazon EC2 instances.
Make sure that your Amazon EC2 instances run an operating system on which you can install the AWS Replication Agent.
Check the installation requirements for your operating system.

Walkthrough

Optional: Puppet setup

If you already use Puppet to automate your infrastructure, you can skip to the section Installation of AWS Replication Agent using Puppet.

This section is a step-by-step guide to deploy a basic Puppet architecture on AWS. You can follow this guide for a proof-of-concept Puppet setup that you can use to test and understand how you can use Puppet to automate the AWS Replication Agent installation.

Creating an Amazon Virtual Private Cloud (VPC) for Puppet

In the AWS Management Console, search for VPC to get to the VPC dashboard.
Choose the Create VPC button.
In the VPC settings, configure the VPC that you will use for this walkthrough:
1. For Resources to create, keep VPC and more.
2. For Name tag auto-generation, enter puppet. The console names the VPC puppet-vpc.
3. For Number of Availability Zones (AZs), select 2.
4. For Number of public subnets, keep 2.
5. For Number of private subnets, select 4.
6. For NAT gateways, select In 1 AZ to create one NAT gateway for outbound traffic.
7. Keep the defaults for all other settings and choose Create VPC.

Rename subnets

After the VPC is created, navigate to the Subnets page.
In the list of subnets, find the two private subnets in the first AZ. Rename one of them to puppet-server-subnet and the other to app-subnet. To rename a subnet, choose the pencil icon that appears when you hover over the subnet's name in the table.
Rename one of the private subnets in the second AZ to replication-subnet and the other to staging-subnet.
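Renaming a subnet in the console sets its Name tag, so the same step can be done with the AWS CLI. The following is a minimal sketch, assuming AWS CLI v2 is configured for the Region that contains puppet-vpc; the helper names list_subnets and tag_subnet are hypothetical, and you supply the VPC and subnet IDs yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a VPC's subnets with their AZ and current Name tag,
# so you can identify the two private subnets in each AZ.
# Usage: list_subnets VPC_ID
list_subnets() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].[SubnetId,AvailabilityZone,Tags[?Key==`Name`].Value|[0]]' \
    --output table
}

# Hypothetical helper: "rename" a subnet by setting its Name tag.
# Usage: tag_subnet SUBNET_ID NAME
tag_subnet() {
  local subnet_id="$1" name="$2"
  aws ec2 create-tags \
    --resources "$subnet_id" \
    --tags "Key=Name,Value=${name}"
}
```

For example, tag_subnet subnet-0123456789abcdef0 puppet-server-subnet renames one subnet; repeat for app-subnet, replication-subnet, and staging-subnet with their own IDs.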

AWS Systems Manager Quick Setup Host Management

For this example, you can use Session Manager, a capability of AWS Systems Manager, to connect to the Amazon EC2 instances without exposing them publicly and without a bastion host. To use Session Manager, complete the AWS Systems Manager Quick Setup Host Management.

IAM instance profile for AWS Elastic Disaster Recovery

In the AWS Management Console, search for IAM to open the AWS Identity and Access Management (IAM) console.
Select Roles and choose Create role. This role serves as the IAM instance profile for your EC2 instances: it allows the application server to authenticate with AWS Elastic Disaster Recovery and lets you connect to the instances with Session Manager.
1. Select EC2 as the trusted entity.
2. Add the AWS managed AWSElasticDisasterRecoveryEc2InstancePolicy permissions policy.
3. Add the AWS managed AmazonSSMManagedInstanceCore permissions policy, which provides the permissions that Session Manager requires.
4. Name the role puppet-drs-role.
5. Use Allows EC2 instances to install AWS Replication Agent and to use Session Manager as the description for the role.
6. Choose Create Role.
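If you prefer the AWS CLI, the role could be created along the following lines. This is a minimal sketch, assuming AWS CLI v2 configured with IAM permissions; the helper names write_trust_policy and create_drs_role are hypothetical, and the ARN path of the DRS managed policy (under service-role/) is our assumption, so confirm it on the policy's page in the IAM console. Unlike the console, the CLI does not create an instance profile for you, so the sketch creates one with the same name as the role.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a trust policy that lets EC2 assume the role
# (the CLI equivalent of choosing "EC2 as the trusted entity").
# Usage: write_trust_policy FILE
write_trust_policy() {
  cat > "$1" <<'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
JSON
}

# Hypothetical helper: create the role, attach both managed policies, and
# wrap the role in an instance profile of the same name.
# Usage: create_drs_role [ROLE_NAME]
create_drs_role() {
  local role="${1:-puppet-drs-role}" policy_file
  policy_file=$(mktemp)
  write_trust_policy "$policy_file"
  aws iam create-role --role-name "$role" \
    --assume-role-policy-document "file://${policy_file}" \
    --description "Allows EC2 instances to install AWS Replication Agent and to use Session Manager"
  # ASSUMPTION: ARN path of the DRS managed policy; verify in the IAM console.
  aws iam attach-role-policy --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryEc2InstancePolicy
  aws iam attach-role-policy --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  aws iam create-instance-profile --instance-profile-name "$role"
  aws iam add-role-to-instance-profile --instance-profile-name "$role" --role-name "$role"
  rm -f "$policy_file"
}
```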

Launching an Amazon Elastic Compute Cloud (Amazon EC2) instance for the Puppet Server

In the AWS Management Console search for EC2 to get to the EC2 Dashboard. Click Launch instance to configure an EC2 instance for the Puppet Server.

Name the instance puppet-server.
Select Amazon Linux 2 as the Amazon Machine Image (AMI).
For the Instance type select t3.medium.
Because you will use Session Manager, a capability of AWS Systems Manager, to connect to the EC2 instance, select Proceed without a key pair for Key pair.
Edit the network settings:
1. Select puppet-vpc as the VPC to launch the EC2 instance in.
2. Select puppet-server-subnet for the subnet.
3. Create a new security group named puppet-sg with the description Puppet communication between instances. Remove the default inbound security group rule for now; you will configure the security group after you launch both instances.
Expand the Advanced details and for IAM instance profile select puppet-drs-role.
Use the Launch instance button to launch the Puppet Server.

Launch the Application Server

Choose Launch instance to configure an EC2 instance for the Application Server.

Name the instance application-server.
Select Amazon Linux 2 as the Amazon Machine Image (AMI). You need to use an operating system that AWS Elastic Disaster Recovery supports.
For the Instance type select t3.micro.
Because you will use Session Manager, a capability of AWS Systems Manager, to connect to the EC2 instance, select Proceed without a key pair for Key pair.
Edit the network settings:
1. Select puppet-vpc as the VPC to launch the EC2 instance in.
2. Select app-subnet for the subnet.
3. For the security groups use the Select existing security group option and in the Select security groups dropdown select puppet-sg.
Expand the Advanced details and for IAM instance profile select puppet-drs-role.
Use the Launch instance button to launch the Application Server.
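Both launches can also be scripted with the AWS CLI. This is a minimal sketch, assuming AWS CLI v2 and that you already know the IDs of the target subnet and the puppet-sg security group (from the CLI you would create the group first with aws ec2 create-security-group); launch_instance is a hypothetical helper name. The --image-id value uses the resolve:ssm: syntax with the public Amazon Linux 2 AMI parameter, and omitting --key-name launches the instance without a key pair.

```shell
#!/usr/bin/env bash
# Hypothetical helper: launch one Amazon Linux 2 instance with the
# puppet-drs-role instance profile and a Name tag, without a key pair.
# Usage: launch_instance NAME INSTANCE_TYPE SUBNET_ID SECURITY_GROUP_ID
launch_instance() {
  local name="$1" type="$2" subnet="$3" sg="$4"
  aws ec2 run-instances \
    --image-id resolve:ssm:/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 \
    --instance-type "$type" \
    --subnet-id "$subnet" \
    --security-group-ids "$sg" \
    --iam-instance-profile Name=puppet-drs-role \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=${name}}]" \
    --count 1
}
```

With this helper, the Puppet Server is launch_instance puppet-server t3.medium with the puppet-server-subnet and puppet-sg IDs, and the Application Server is launch_instance application-server t3.micro with the app-subnet and puppet-sg IDs.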

Configuring Security Group

On the Amazon EC2 console, in the navigation pane under Network & Security, choose Security Groups.
In the list of security groups, select the puppet-sg security group, and from the Actions menu choose Edit inbound rules. Configure the rules as follows:
1. Add rule:
  1. Type: Custom TCP
  2. Port range: 8140
  3. Source: Custom
  4. puppet-sg security group as the source
  5. Description: Allow Puppet communication
2. Save rules.
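From the AWS CLI, the same self-referencing rule (TCP 8140 from members of puppet-sg) is a single call. This is a minimal sketch, assuming AWS CLI v2 and that you substitute the puppet-sg group ID; allow_puppet_port is a hypothetical helper name. This short form of the command does not set the rule description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow inbound TCP 8140 (Puppet) from instances that
# belong to the same security group.
# Usage: allow_puppet_port SECURITY_GROUP_ID
allow_puppet_port() {
  local sg="$1"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg" \
    --protocol tcp \
    --port 8140 \
    --source-group "$sg"
}
```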

Connect to Amazon EC2 instances

Go back to the EC2 Instances page and select puppet-server.
Choose Connect and use Session Manager to start a session. You are now connected to the EC2 instance.
Go back to the EC2 Instances page and select application-server.
Choose Connect and use Session Manager to start a session.

Installing Puppet Server

Open your Session Manager connection to your puppet-server Amazon EC2 instance.
Use sudo -s to start a root user session.
Run yum update -y && rpm -Uvh https://yum.puppet.com/puppet7-release-el-7.noarch.rpm to add the Puppet release repository.
Run yum install puppetserver -y && source ~/.bashrc to install the Puppet server software.
Run hostnamectl set-hostname puppet.example.com to give the Puppet server a readable hostname.
Run exit to leave the root session so that your next shell picks up the new hostname.
Run sudo -s again to get root privileges.
Modify the Puppet configuration file with the following command to configure the Puppet server DNS name:
cat >> /etc/puppetlabs/puppet/puppet.conf << EOF
dns_alt_names = puppet,puppet.example.com
[main]
certname = puppet.example.com
server = puppet.example.com
EOF
Run puppetserver ca setup to set up the certificate authority (CA) on the Puppet server.
Run puppet resource service puppetserver ensure=running to start the Puppet server.
Run echo `hostname -I` `hostname` and copy the output. It should look similar to 10.0.XX.XX puppet.example.com; you need it to configure the Puppet agent in the next section.
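Puppet agents reach the Puppet server on TCP port 8140, the port you opened in the puppet-sg security group. To confirm that the server accepts connections, you can try to open a TCP connection to it. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command; port_open is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within TIMEOUT seconds (default 3), using bash's /dev/tcp redirection.
# Usage: port_open HOST PORT [TIMEOUT]
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

On the Puppet server, port_open localhost 8140 && echo reachable should print reachable once puppetserver is running; after you configure the agent, the same check from the application server against puppet.example.com also exercises the security group rule.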

Installing Puppet agent on Application Server

Open your Session Manager connection to your application-server Amazon EC2 instance.
Use sudo -s to start a root user session.
Run yum update -y && rpm -Uvh https://yum.puppet.com/puppet7-release-el-7.noarch.rpm to add the Puppet release repository.
Run yum install puppet-agent -y && source ~/.bashrc to install the Puppet agent software.
Run echo 10.0.XX.XX puppet.example.com >> /etc/hosts, replacing 10.0.XX.XX puppet.example.com with the output you copied from the Puppet server, so that the application server can resolve the Puppet server's name.
Use hostnamectl set-hostname puppet-agent1.example.com to change the DNS name of the application server for easier readability.
Run exit to apply the change.
Run sudo -s again to get root privileges.
Modify the Puppet configuration file with the following command to give Puppet agent the DNS name of the Puppet server:
cat >> /etc/puppetlabs/puppet/puppet.conf << EOF
[main]
server = puppet.example.com
EOF
Run puppet resource service puppet ensure=running to start the Puppet agent service.
Run puppet agent --test --ca_server=puppet.example.com to establish a first connection with the Puppet server during which the Puppet agent sends a certificate signing request (CSR).
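The echo ... >> /etc/hosts step appends a line every time it runs, so repeating the setup adds duplicate entries. This is a minimal sketch of an idempotent variant, assuming the standard hosts file format (IP address followed by names, # for comments); add_host_entry is a hypothetical helper name, and the hosts file path is a parameter so that you can try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to a hosts file only if HOSTNAME
# is not already mapped on a non-comment line.
# Usage: add_host_entry IP HOSTNAME [HOSTS_FILE]
add_host_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # Skip comment lines; compare every name field (2..NF) exactly.
  if awk -v h="$host" '/^[ \t]*#/ {next} {for (i = 2; i <= NF; i++) if ($i == h) found = 1} END {exit !found}' "$file" 2>/dev/null; then
    echo "$host already present in $file"
  else
    printf '%s %s\n' "$ip" "$host" >> "$file"
  fi
}
```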

Signing the certificate request on the Puppet Server

Open the connection to your puppet-server Amazon EC2 instance.
Execute sudo -s to start a root user session.
Run puppetserver ca list to list pending certificate signing requests (CSRs). It shows the request from your Puppet agent, puppet-agent1.example.com.
Execute puppetserver ca sign --certname puppet-agent1.example.com to sign the CSR.
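Signing each request by hand works for one server but not for a fleet. Puppet Server also supports basic autosigning: it signs a CSR automatically when the certname matches a glob listed in autosign.conf, which by default is /etc/puppetlabs/puppet/autosign.conf. This is a minimal sketch, assuming the default autosign setting and that you trust every node presenting a certname that matches the glob; allow_autosign is a hypothetical helper. Basic autosigning accepts any node that can reach the server and claims a matching name, so for production consider Puppet's policy-based autosigning instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a certname glob (for example '*.example.com') to
# autosign.conf if it is not already listed.
# Usage: allow_autosign GLOB [AUTOSIGN_FILE]
allow_autosign() {
  local glob="$1" file="${2:-/etc/puppetlabs/puppet/autosign.conf}"
  touch "$file"
  # Keep the file readable and NOT executable: an executable autosign file
  # is treated as a policy script rather than a glob list.
  chmod 0644 "$file"
  # -x: whole-line match, -F: treat the glob as a fixed string.
  grep -qxF -- "$glob" "$file" || printf '%s\n' "$glob" >> "$file"
}
```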

Now you have a basic Puppet setup. Next, continue with Installation of AWS Replication Agent using Puppet.

Installation of AWS Replication Agent using Puppet

Before you can use AWS Elastic Disaster Recovery, you must initialize it in your recovery Region as well as in your source Region. Follow the AWS Elastic Disaster Recovery Quick start guide before you continue. Change the following settings and keep the defaults for all others:

If you followed the Optional: Puppet setup, select staging-subnet for the staging area subnet in the replication server configuration section.
On the configure additional replication settings page, select Use private IP for data replication (VPN, Direct Connect, VPC peering) in the data routing and throttling section.
If you followed the Optional: Puppet setup, select replication-subnet for the subnet in the basic settings section on the Set default EC2 launch template page.

The Installation Instructions guide explains how to install the AWS Replication Agent manually on Linux or Windows systems. Let’s take a look at how these instructions translate to a Puppet manifest file. In Puppet the manifest file defines the desired state of your infrastructure.

If you did not follow our optional Puppet setup and instead are using your existing application server, make sure that the application server has an IAM instance profile with AWSElasticDisasterRecoveryEc2InstancePolicy permissions.

Puppet uses manifest files, such as main.pp, to define the desired state of all the application servers that a Puppet server manages. For example, a manifest can require the application server to download a certain file or to run a command.

Prepare the Puppet Server

Open the connection to your puppet-server Amazon EC2 instance.
Use sudo -s to start a root user session.
Create a manifest file main.pp that includes the recoveryagent class. You create the recoveryagent module, which installs the AWS Replication Agent, in the following steps:
cat > /etc/puppetlabs/code/environments/production/manifests/main.pp << EOF
node default {
include recoveryagent
}
EOF
Run yum install -y pdk and then export PATH=/opt/puppetlabs/pdk/bin:$PATH to add the pdk command to your PATH. You will use the Puppet Development Kit (PDK) to create the recoveryagent module. To learn more about Puppet modules, see the Puppet Modules Fundamentals documentation.
Change your working directory with cd /etc/puppetlabs/code/environments/production/modules
Run pdk new module recoveryagent --skip-interview to create a new module called recoveryagent.
The AWS Replication Agent installer uses Python. Add the Python module from Puppet Forge as a dependency of the recoveryagent module. To do so, edit recoveryagent/metadata.json (for example, with vim) so that it contains the following dependencies array:
// recoveryagent/metadata.json
// […]
"dependencies": [
  {
    "name": "puppet/python",
    "version_requirement": ">= 6.4.0"
  }
],
// […]
Create an environment variable with the AWS Region that you designated as your recovery Region: export AWS_DRS_REGION=us-east-1. If you use a different Region, change us-east-1 to the Region in which you initialized AWS Elastic Disaster Recovery.
Run the following command to create a new manifest file that installs the AWS Replication Agent:
cat > recoveryagent/manifests/init.pp << EOF
# @summary Installs the AWS Replication Agent so that your server replicates to AWS Elastic Disaster Recovery.
#
# This class downloads the AWS Replication Agent installer, validates its
# SHA-512 checksum against the published hash, and runs it.
#
# @example
#   include recoveryagent
class recoveryagent {
  # Published SHA-512 hash of the installer script.
  file { 'aws-installer-hashes':
    ensure => 'file',
    path   => '/usr/tmp/aws-installer.sha512',
    mode   => '0644',
    source => "https://aws-elastic-disaster-recovery-hashes-${AWS_DRS_REGION}.s3.${AWS_DRS_REGION}.amazonaws.com/latest/linux/aws-replication-installer-init.py.sha512",
  }
  # Installer script. validate_cmd rejects the download unless its hash
  # matches the published hash (% is replaced by the downloaded file's path).
  file { 'aws-replication-installer-init':
    ensure       => 'file',
    path         => '/usr/tmp/aws-replication-installer-init.py',
    mode         => '0700',
    source       => "https://aws-elastic-disaster-recovery-${AWS_DRS_REGION}.s3.${AWS_DRS_REGION}.amazonaws.com/latest/linux/aws-replication-installer-init.py",
    checksum     => 'sha512',
    validate_cmd => 'asset_hash=\$(cat /usr/tmp/aws-installer.sha512); computed_hash=\$(sha512sum % | cut -d" " -f1); if [ "\$computed_hash" != "\$asset_hash" ]; then echo "Checksum validation failed"; exit 1; else echo "Checksum validation correct"; exit 0; fi',
    require      => File['aws-installer-hashes'],
  }
  # Run the installer (as root, like the Puppet agent) only when the
  # installer script is created or changes.
  exec { 'python-aws-replication-installer-init':
    command     => "python3 /usr/tmp/aws-replication-installer-init.py --no-prompt --region ${AWS_DRS_REGION}",
    path        => ['/usr/bin', '/usr/sbin', '/bin', '/sbin'],
    cwd         => '/usr/tmp',
    require     => File['aws-replication-installer-init'],
    subscribe   => File['aws-replication-installer-init'],
    refreshonly => true,
  }
}
EOF
Run puppet parser validate recoveryagent/manifests/init.pp to validate the manifest.
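Because the heredoc is unquoted, the shell substitutes AWS_DRS_REGION into init.pp when the file is written; if the variable is unset at that moment, the S3 URLs silently become https://aws-elastic-disaster-recovery-.s3..amazonaws.com/.... This is a minimal sketch of a check for that, assuming the URL layout used in init.pp; drs_installer_url and check_rendered_manifest are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installer URL that init.pp uses for a Region.
drs_installer_url() {
  printf 'https://aws-elastic-disaster-recovery-%s.s3.%s.amazonaws.com/latest/linux/aws-replication-installer-init.py\n' "$1" "$1"
}

# Hypothetical helper: fail if the manifest was rendered without a Region,
# or if it does not contain the installer URL for the expected Region.
# Usage: check_rendered_manifest MANIFEST REGION
check_rendered_manifest() {
  local manifest="$1" region="$2"
  if grep -q 's3\.\.amazonaws\.com' "$manifest"; then
    echo "Region missing from $manifest; export AWS_DRS_REGION and re-create it" >&2
    return 1
  fi
  grep -qF "$(drs_installer_url "$region")" "$manifest"
}
```

For example, check_rendered_manifest recoveryagent/manifests/init.pp "$AWS_DRS_REGION" && echo ok. To fail early instead, you can run : "${AWS_DRS_REGION:?export AWS_DRS_REGION first}" before the cat > ... << EOF command.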

Install the AWS Replication Agent on the application server

Before installing the AWS Replication Agent, make sure that you completed the steps in Prepare the Puppet Server.
Open the connection to your application-server Amazon EC2 instance.
Use sudo -s to start a root user session.
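Run puppet agent --test on the application-server Amazon EC2 instance. The Puppet agent retrieves the desired state compiled from your manifests on the Puppet server and applies it. If the Puppet agent service is running, as it is in the optional setup, the agent also repeats this automatically at its run interval, which is 30 minutes by default, so in a production setting your application servers converge on the latest desired state without manual runs.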
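Because --test turns on Puppet's --detailed-exitcodes, the exit status of puppet agent --test tells you what happened: 0 means no changes, 2 means changes were applied, 4 means some resources failed, 6 means both changes and failures, and 1 means the run failed or another run was in progress. The following small sketch translates the code; describe_puppet_exit is a hypothetical helper. On the first run you would expect 2, because Puppet downloads the installer and then runs it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a `puppet agent --test` exit code
# (--test implies --detailed-exitcodes).
# Usage: puppet agent --test; describe_puppet_exit $?
describe_puppet_exit() {
  case "$1" in
    0) echo "no changes: already in the desired state" ;;
    1) echo "run failed or another run was in progress" ;;
    2) echo "changes applied successfully" ;;
    4) echo "some resources failed" ;;
    6) echo "changes applied, but some resources failed" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}
```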

Let's look at what the init.pp manifest does. The file resource aws-installer-hashes tells the Puppet agent to download a file with the published hash of the AWS Replication Agent installer. The Puppet agent uses this hash to validate the integrity of the aws-replication-installer-init file. The file resource aws-replication-installer-init is the Python script that AWS provides to install the AWS Replication Agent. Lastly, the exec resource python-aws-replication-installer-init instructs the Puppet agent to run the aws-replication-installer-init.py script, which installs and initializes the AWS Replication Agent on the application server.
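The hash comparison in validate_cmd can be reproduced as a standalone function, which is handy for testing the check on its own or for verifying a manually downloaded installer. This is a minimal sketch; verify_sha512 is a hypothetical name, and it assumes the .sha512 file contains the hex digest as its first field.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the validate_cmd logic in init.pp:
# compare the SHA-512 of FILE with the digest stored in HASH_FILE.
# Usage: verify_sha512 FILE HASH_FILE
verify_sha512() {
  local file="$1" hash_file="$2" expected computed
  expected=$(awk '{print $1; exit}' "$hash_file")
  computed=$(sha512sum "$file" | cut -d' ' -f1)
  if [ -n "$expected" ] && [ "$computed" = "$expected" ]; then
    echo "Checksum validation correct"
  else
    echo "Checksum validation failed" >&2
    return 1
  fi
}
```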

The Puppet agent passes the --no-prompt flag to the AWS Replication Agent installer so that it runs without human input; without the flag, the installer prompts the user for input. The AWS Replication Agent Installer parameters guide documents the other command line parameters of the installer. For example, you could append --devices /dev/xvda to the command of the python-aws-replication-installer-init exec resource if you only want to replicate specific disks. By default, the AWS Replication Agent automatically discovers and replicates all physical disks.
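To see how extra installer parameters fit in, the following sketch assembles an installer command line of the same shape as the one the exec resource runs, optionally appending --devices. installer_command is a hypothetical helper, and we assume that multiple devices are passed as a comma-separated list; check the installer parameters guide before relying on that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installer command line for a Region,
# optionally restricted to specific devices (assumed comma-separated).
# Usage: installer_command REGION [DEVICES]
installer_command() {
  local region="$1" devices="${2:-}"
  local cmd="python3 /usr/tmp/aws-replication-installer-init.py --no-prompt --region ${region}"
  if [ -n "$devices" ]; then
    cmd="${cmd} --devices ${devices}"
  fi
  printf '%s\n' "$cmd"
}
```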

The Puppet agent on the application server pulls the latest desired state from the Puppet server and updates the server's state in accordance with your main.pp manifest. This installs and initializes the AWS Replication Agent on the application server. During initialization, the AWS Replication Agent connects to AWS Elastic Disaster Recovery. In the AWS Elastic Disaster Recovery console, you can see which replication step your application server is in.
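Besides the console, you can check replication progress with the AWS CLI. This is a minimal sketch, assuming AWS CLI v2 and permission to call describe-source-servers in your recovery Region; drs_replication_status is a hypothetical helper, and the field names in the --query expression reflect our understanding of the DescribeSourceServers response, so check the API reference if a column comes back empty. Once the initial sync completes, the replication state should read CONTINUOUS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list source servers with their hostname and
# data replication state.
# Usage: drs_replication_status REGION
drs_replication_status() {
  aws drs describe-source-servers \
    --region "$1" \
    --query 'items[].[sourceServerID, sourceProperties.identificationHints.hostname, dataReplicationInfo.dataReplicationState]' \
    --output table
}
```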


Figure 3: AWS Elastic Disaster Recovery console application server replication status

Now your application server is replicating to AWS Elastic Disaster Recovery. To get ready for a failover, you can follow the AWS Elastic Disaster Recovery Workshop Configure Launch Settings section to configure the launch settings for your application server. Read the AWS Elastic Disaster Recovery Failover and fallback overview guide to learn how to prepare and perform a failover.

Cleaning up

To avoid incurring future charges, delete the resources.
Navigate to the AWS Elastic Disaster Recovery console in the target Region and select the source servers that you created. From the Actions menu, choose Disconnect from AWS. This uninstalls the AWS Replication Agent from these servers and deletes the staging resources. You can then choose Delete server to remove the server as a source server from AWS Elastic Disaster Recovery.
On your puppet-server, delete the manifest file and the recoveryagent module: rm /etc/puppetlabs/code/environments/production/manifests/main.pp and rm -r /etc/puppetlabs/code/environments/production/modules/recoveryagent
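The two console actions for a source server map to two AWS CLI calls. This is a minimal sketch, assuming AWS CLI v2 in the recovery Region; remove_source_server is a hypothetical helper, and it only attempts the delete when the disconnect call succeeds, because a source server has to be disconnected before it can be deleted. The source server ID (s-...) is shown in the AWS Elastic Disaster Recovery console.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disconnect a source server from AWS Elastic Disaster
# Recovery, then delete it.
# Usage: remove_source_server REGION SOURCE_SERVER_ID
remove_source_server() {
  local region="$1" id="$2"
  aws drs disconnect-source-server --region "$region" --source-server-id "$id" || return 1
  aws drs delete-source-server --region "$region" --source-server-id "$id"
}
```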

If you created Amazon EC2 instances in the Optional: Puppet setup section, go to the Amazon EC2 console and terminate the instances. Delete the NAT gateway in your puppet-vpc and release the Elastic IP address that was allocated for it. Delete your VPC puppet-vpc in the VPC console. In the IAM console, delete the puppet-drs-role IAM role.

Conclusion

In this blog post, we demonstrated how to use Puppet to automate the management of AWS Elastic Disaster Recovery. This solution reduces operational overhead to manage source servers at scale and simplifies the implementation of a disaster recovery strategy.

Try the AWS Elastic Disaster Recovery workshop to explore the functionality of AWS Elastic Disaster Recovery step by step. If you want to use a different automation tool, you can use AWS Systems Manager.

AWS Cloud Operations & Migrations Blog