AWS Storage Blog
Managing AWS Elastic Disaster Recovery launch templates at scale
It’s important to have a disaster recovery (DR) plan in place that helps operations continue in the event of a natural, physical, or technology-based disaster. To increase the chances of success in the case of an outage event, recovery should be repeatable, scalable, and tested often. Without the proper tools in place, setting up such a recovery can become burdensome: infrastructure continually changes with business demands, which makes repeatability and scalability difficult to maintain.
AWS Elastic Disaster Recovery (DRS) is the recommended service for DR to AWS. Operated from the AWS Management Console, Elastic Disaster Recovery helps you recover all of your applications and databases that run on supported Windows and Linux operating systems. During a DR event or drill, they can then run natively within Amazon Elastic Compute Cloud (Amazon EC2).
Elastic Disaster Recovery uses unique EC2 launch templates to manage the configuration of the failed-over infrastructure. To manage these launch templates at scale, you can use automation to create a repeatable and auditable process.
In this post, I demonstrate a solution for automating the setup of DRS launch templates using AWS Lambda functions and Amazon Simple Storage Service (Amazon S3). The proposed architecture allows replicating machines that share the same tag to use the same launch template.
Solution overview
The following diagram illustrates the solution workflow.
This solution consists of the following components:
- An Amazon S3 bucket for storing launch templates in the form of JSON files.
- An AWS Lambda function, called set-drs-templates, that pulls down a JSON launch template from the bucket. It then updates the DRS servers whose tag key matches the file name of that template (the part before .json).
- An AWS Lambda function, called schedule-drs-templates, that runs on a schedule and scans for any new replicating servers with a tag that matches one of the existing templates in the bucket. This allows new servers that are added to AWS Elastic Disaster Recovery to inherit the tagged launch template.
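The naming convention described above can be sketched in shell. The helper below is a hypothetical illustration, not code from the repository: it assumes the tag key is the object's base name with the .json suffix removed, and that any leading path or S3 key prefix is ignored.

```shell
# Hypothetical helper: derive the tag key that a template file applies to.
# Assumption: the tag key is the file's base name without the .json suffix.
template_tag_key() {
  key=${1##*/}                  # drop any leading path or S3 key prefix
  printf '%s\n' "${key%.json}"  # drop the .json suffix
}
```

Under these assumptions, `template_tag_key DB.json` prints `DB`, so a template uploaded as DB.json would apply to servers carrying the tag key DB.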
Full code for the solution can be found in the aws-samples GitHub repository.
Prerequisites
This solution requires active servers in DRS. For more information on getting started with DRS, see the quick start guide.
The deployment instructions below use the AWS Command Line Interface (AWS CLI).
This solution includes creating Lambda functions that make application programming interface (API) calls to DRS, EC2, and S3. You must have an AWS Identity and Access Management (IAM) role with the proper permissions to access all three services. You can create a role with the provided policy.json to give the solution the proper API access.
The policy grants only the minimum permissions that the solution needs to function.
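One way to create such a role from the AWS CLI is sketched below. This is an assumption-laden example rather than the post's prescribed method: the function name, the role name you pass in, and the inline policy name are hypothetical, and the second argument is assumed to be the path to the policy.json that ships with the repository. The trust policy is the standard one that lets the Lambda service assume a role.

```shell
# Sketch: create an execution role for the two Lambda functions.
# Assumptions: role and inline policy names are hypothetical;
# $2 points at the policy.json provided with the solution.
create_solution_role() {
  role_name=$1
  policy_file=$2
  # Trust policy that lets the Lambda service assume the role.
  trust='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document "$trust" &&
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name "${role_name}-policy" \
    --policy-document "file://$policy_file"
}
```

For example, `create_solution_role drs-template-manager-role policy.json` would create the role and attach the policy inline; the role ARN printed by the first command is the value to use for $INSERTROLEARN later.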
Deployment
You can deploy the solution in three main steps: create the Lambda functions, create the S3 bucket trigger, and create a template.
Create the AWS Lambda functions
- Clone the repository.
git clone https://github.com/aws-samples/drs-template-manager.git
- Create the zip deployment package for the set-drs-templates function.
cd drs-template-manager
cd cmd-template
zip template.zip drs-template-manager
- Create the zip deployment package for the schedule-drs-templates function.
cd ../cmd-cron
zip cron.zip template-cron-automation
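- Create the schedule-drs-templates function. Replace $INSERTROLEARN with the Amazon Resource Name (ARN) of the role you created for the solution. Note that AWS Lambda has since deprecated the go1.x runtime used in these commands; Go functions now run on the provided.al2023 or provided.al2 runtime with the executable named bootstrap, so the commands may need adjusting on current accounts.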
aws lambda create-function \
--function-name schedule-drs-templates \
--role $INSERTROLEARN \
--runtime go1.x \
--handler template-cron-automation \
--package-type Zip \
--zip-file fileb://cron.zip
- Create the set-drs-templates function. Replace $INSERTROLEARN with the ARN of the role you created for the solution.
cd ../cmd-template
aws lambda create-function \
--function-name set-drs-templates \
--role $INSERTROLEARN \
--runtime go1.x \
--handler drs-template-manager \
--package-type Zip \
--zip-file fileb://template.zip
- Once the scheduler function is created, decide how often you would like it to run. Then create an Amazon CloudWatch Events (now Amazon EventBridge) rule with a cron schedule to trigger it. For this example, I create an event rule that triggers once per day at 12:00 PM UTC; the six cron fields are minutes, hours, day of month, month, day of week, and year. Once I make the rule, it needs to be added to the Lambda function as a trigger.
aws events put-rule \
--schedule-expression "cron(0 12 * * ? *)" \
--name template-cron-rule
- Add the schedule-drs-templates function as a target for the rule. Replace $FunctionARN with the ARN of the schedule-drs-templates Lambda function.
aws events put-targets \
--rule template-cron-rule \
--targets "Id"="1","Arn"=$FunctionARN
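When the target is attached from the CLI rather than the console, the rule also needs permission to invoke the function, which the console would otherwise add for you. A minimal sketch of that step follows; the helper name and the statement ID are hypothetical labels, and the argument is assumed to be the rule ARN returned by `aws events put-rule`.

```shell
# Sketch: allow the scheduled rule to invoke the scheduler function.
# Assumptions: the statement ID is a hypothetical label;
# $1 is the rule ARN returned by `aws events put-rule`.
allow_rule_to_invoke_scheduler() {
  aws lambda add-permission \
    --function-name schedule-drs-templates \
    --statement-id template-cron-rule-invoke \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn "$1"
}
```

Call it as `allow_rule_to_invoke_scheduler "$RuleARN"`, where $RuleARN stands for the ARN of template-cron-rule.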
Create the S3 bucket trigger
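- Create an S3 bucket in the same AWS Region as the Lambda functions. For any Region other than us-east-1, the command also needs --create-bucket-configuration LocationConstraint= followed by your Region.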
aws s3api create-bucket \
--bucket $SOMEUNIQUEBUCKETNAME
- Create an event notification in the bucket you just created, using the following steps.
- Navigate to the bucket and select the Properties tab.
- Select Create event notification.
- Event name: DRS Template Automation
- Suffix: .json
- Check the box for All object create events.
- Set the destination to the set-drs-templates Lambda function created earlier.
- Update the schedule-drs-templates function so that it receives the name of the bucket created earlier in the BUCKET environment variable.
aws lambda update-function-configuration \
--function-name schedule-drs-templates \
--environment "Variables={BUCKET=$SOMEUNIQUEBUCKETNAME}"
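The console steps for the event notification can also be approximated from the AWS CLI. The sketch below is one possible equivalent, not the post's own procedure: the helper name, the statement ID, and the notification Id are hypothetical labels, and the function ARN is assumed to be that of set-drs-templates. Note that put-bucket-notification-configuration replaces any notification configuration already on the bucket.

```shell
# Sketch: configure the bucket notification from the CLI instead of the console.
# Assumptions: statement ID and notification Id are hypothetical labels.
# $1 = bucket name, $2 = ARN of the set-drs-templates function.
# Note: the second call replaces any existing notification configuration.
configure_template_trigger() {
  bucket=$1
  function_arn=$2
  # S3 needs permission to invoke the function before it accepts the notification.
  aws lambda add-permission \
    --function-name set-drs-templates \
    --statement-id s3-template-upload-invoke \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$bucket" &&
  aws s3api put-bucket-notification-configuration \
    --bucket "$bucket" \
    --notification-configuration "{
      \"LambdaFunctionConfigurations\": [{
        \"Id\": \"DRSTemplateAutomation\",
        \"LambdaFunctionArn\": \"$function_arn\",
        \"Events\": [\"s3:ObjectCreated:*\"],
        \"Filter\": {\"Key\": {\"FilterRules\": [{\"Name\": \"suffix\", \"Value\": \".json\"}]}}
      }]
    }"
}
```

The filter mirrors the console settings: all object create events, limited to keys ending in .json.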
Create a template
The repo comes with an example launch template called Name.json in the cmd-template directory. The file name without the .json extension is the tag key: servers that carry that tag key have their launch template updated.
For example:
- All servers with the tag key Name are updated when Name.json is uploaded to the S3 bucket. Because DRS tags all servers with a Name tag by default, all servers have their template updated.
- Add the tag key DB to all replicating databases. Rename Name.json to DB.json. Change the fields in the template to the values you would like for databases. Then upload DB.json to the bucket you created.
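The second example can be sketched with the AWS CLI as follows. The helper name is hypothetical, the tag value "true" is a placeholder (the solution as described matches on the tag key), and DB.json is assumed to be your edited copy of Name.json in the current directory.

```shell
# Sketch: tag a replicating database server and publish its template.
# Assumptions: the tag value "true" is a placeholder; DB.json is an
# edited copy of Name.json in the current directory.
# $1 = ARN of the DRS source server, $2 = bucket name.
publish_db_template() {
  server_arn=$1
  bucket=$2
  aws drs tag-resource --resource-arn "$server_arn" --tags DB=true &&
  aws s3 cp DB.json "s3://$bucket/DB.json"
}
```

Uploading the file is what fires the bucket's event notification, so the tag should be in place before the upload.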
Cleanup
To clean up the resources created in this post:
- Delete the two Lambda functions.
- Delete the bucket that was used for template deployments.
- Delete the CloudWatch Events (EventBridge) rule used to schedule the schedule-drs-templates function.
- Delete the new Elastic Disaster Recovery launch template version that was created during this exercise.
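Most of these cleanup steps can be scripted. The sketch below assumes the function, rule, and target names used earlier in this post; the helper name is hypothetical, and it does not cover the launch template versions, which still need to be removed separately.

```shell
# Sketch: remove the resources created in this post.
# $1 = bucket name. Launch template versions are not covered here.
cleanup_template_manager() {
  bucket=$1
  aws lambda delete-function --function-name set-drs-templates
  aws lambda delete-function --function-name schedule-drs-templates
  # A rule's targets must be removed before the rule can be deleted.
  aws events remove-targets --rule template-cron-rule --ids 1
  aws events delete-rule --name template-cron-rule
  # --force empties the bucket before removing it.
  aws s3 rb "s3://$bucket" --force
}
```

Because `--force` deletes every object in the bucket, run this only against the bucket created for template deployments.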
Conclusion
A solid DR plan helps operations continue in the event of a natural, physical, or technology-based disaster. Having a repeatable and scalable recovery process in place increases the chances of success in the case of an actual outage. Deploying this solution allows you to group together servers by tag and have them inherit the same launch template. It also allows new servers that are added to Elastic Disaster Recovery and tagged to inherit the associated launch template. This ensures that, in the event of a disaster, the template is appropriately set.
Grouping replicating servers by tag allows you to create unique application or infrastructure groups that share the same template. For example, servers tagged as “WebServer” may use a t3.large instance type, and servers tagged as “DB” may utilize an m5.large instance type. This solves the problem of creating a scalable automated process to manage DRS launch templates.
This solution is a great first step in adding automation to your DRS deployment. I recommend automating the infrastructure failover recovery plan next. This blog post provides one example of how to do so.
More information on DRS automation can be found in the DRS API Specification.