AWS Storage Blog
Automate post-recovery actions using AWS Elastic Disaster Recovery
Disaster recovery (DR) and business continuity planning (BCP) are critical activities for any organization. During DR operations, after workloads are restored in the DR site, a series of steps and activities, such as application configuration and validation, must be orchestrated and coordinated among many teams and functions to ensure that the recovered workload is functional and ready to serve production traffic. This effort is often complex and error-prone. The ability to automate and validate post-recovery procedures and activities greatly simplifies DR planning and creates predictability in the recovery process. Examples of post-recovery actions include installing agents or verifying the configuration state of the instance.
AWS Elastic Disaster Recovery is the recommended service for DR on AWS that helps you recover all of your applications and databases that run on supported Windows and Linux operating systems. Elastic Disaster Recovery automates the recovery of your applications utilizing Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS) during a DR event. Elastic Disaster Recovery allows you to define actions that run automatically after you launch a recovery instance utilizing the post-launch action framework. The post-launch action framework allows administrators and operators to automate any action required to run after an instance is launched to help validate, test, or configure it as part of the recovery plan.
In this post, we cover how to enable the post-launch action framework and create a custom post-launch action that runs automatically in response to a DR event.
Feature overview
The post-launch action framework allows you to speed up the recovery process by automating manual tasks, such as verification, configuration, and modernization actions. With the post-launch action framework, you can create custom actions for any AWS Systems Manager command or automation document, including documents that you created, that were shared with you, or that were published by Amazon. In addition, Amazon provides a set of predefined actions, such as installing the Amazon CloudWatch agent or creating an Amazon Machine Image (AMI) from the instance.
Prerequisites
If you intend to follow along with this step-by-step guide, the following prerequisites must be met:
- A source machine that has the Elastic Disaster Recovery agent installed
- Elastic Disaster Recovery service initialized in the target AWS Region
Note that to run the post-launch actions, including the execution of Systems Manager documents on a launched recovery instance, certain AWS Identity and Access Management (IAM) roles are required. These roles are usually installed into an AWS account as part of the process to initialize Elastic Disaster Recovery within an AWS Region for the first time.
If you initialized Elastic Disaster Recovery in your account before September 13, 2023, then you do not have the required IAM roles to allow using post-launch actions.
1. To install the required IAM roles, navigate to the Elastic Disaster Recovery console, select Default post-launch actions from the Settings menu, then select Install post-launch IAM roles.
Figure 1: Installing post-launch IAM roles
2. Select Confirm on the pop-up window.
Figure 2: Confirming the creation of post-launch IAM roles
Walkthrough
Once the prerequisites have been met, you must activate the post-launch action feature. You can make it active by default for all newly added source servers, which is the approach we cover in the following steps. Alternatively, you can activate this feature for a specific source server. The walkthrough consists of the following steps:
1. Activate default post-launch actions
2. Add a source server
3. Create an AWS Systems Manager command document
4. Create a post-launch action corresponding to the command document and set it as active for the source server in step 2
5. Perform a recovery drill and monitor the completion status of the active post-launch actions
1. Activate default post-launch actions
1.1. Navigate to the Default post-launch actions menu, and then select Edit from the Post-launch action settings pane.
Figure 3: Editing the default post-launch settings
1.2. Select the checkbox next to Post-launch actions active, then select Save.
Figure 4: Saving changes made to default post-launch settings
Once you have activated default post-launch actions, any newly added source servers inherit the settings configured within.
Note that the Enable SSM action is activated automatically and cannot be deactivated. This action installs the Systems Manager Agent that is used to run post-launch actions on your recovery instances after they are launched.
Figure 5: Reviewing the automatically activated Enable SSM action
2. Add a source server
2.1. Next, select the Source servers menu. Here you can see we have added a new source server. View this AWS service guide for more information on how to add a source server.
Figure 6: Selecting the source servers menu
2.2. Select the newly added source server and select the post-launch settings tab. Note that the settings we configured within the Default post-launch actions menu have been inherited by the source server.
Figure 7: Reviewing the post-launch action setting for a newly added source server
From here we can activate one of the AWS published actions or create a custom one. Let’s create a custom action to perform file permissions, network connectivity, and application checks.
3. Create an AWS Systems Manager command document
3.1. Navigate to the AWS Systems Manager console in the recovery Region, then select Documents under the Shared resources menu.
3.2. For the purposes of this demo, we create a Systems Manager command document. Select Create document, then Command or Session. For the document name, we use DR-Failover-Validation-Checks.
Add the following to the Content pane using the YAML format, then select Create document.
---
schemaVersion: "2.2"
description: "This document performs sanity tests for our failover into AWS."
mainSteps:
  - action: aws:runShellScript
    name: ValidateLinuxConnectivity
    precondition:
      StringEquals:
        - platformType
        - Linux
    inputs:
      timeoutSeconds: '3600'
      runCommand:
        - |
          #!/usr/bin/env bash
          # Check 1: outbound network connectivity
          echo 'Verifying connectivity to amazonaws.com'
          if ping -c 1 amazonaws.com; then
            echo 'Connectivity to amazonaws passed'
          else
            echo 'Connectivity to amazonaws failed'
            exit 1
          fi
          # Check 2: permissions on the ec2-user home folder
          echo "Verifying ec2-user home folder has 'drwx------' permissions"
          if [[ "$(ls -ld /home/ec2-user/)" == *'drwx------'* ]]; then
            echo 'ec2-user home folder has the expected permissions'
          else
            echo 'ec2-user home folder has unexpected permissions:'
            ls -ld /home/ec2-user/
            exit 1
          fi
          # Check 3: the AWS CLI is available on the PATH
          echo 'Verifying aws cli is installed'
          if command -v aws; then
            echo 'aws cli is installed'
          else
            echo 'aws cli is missing'
            exit 1
          fi
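Before relying on these checks in a drill, it can help to exercise the same logic directly on a Linux host. The following is a minimal sketch rather than part of the command document itself: the function names check_dir_perms and check_command are hypothetical, and the permission check assumes that the first ten characters of the ls -ld output hold the file type and mode bits, as they do on common Linux distributions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring two of the checks in the command document.

# Succeeds when the type and permission string of a directory (the first
# ten characters of `ls -ld`) equals the expected value.
check_dir_perms() {
  local dir="$1"
  local expected="$2"
  local actual
  actual="$(ls -ld "$dir" | cut -c1-10)"
  if [ "$actual" = "$expected" ]; then
    echo "${dir} has the expected permissions"
  else
    echo "${dir} has unexpected permissions: ${actual}"
    return 1
  fi
}

# Succeeds when the named command can be found on the PATH.
check_command() {
  local name="$1"
  if command -v "$name" >/dev/null 2>&1; then
    echo "${name} is installed"
  else
    echo "${name} is missing"
    return 1
  fi
}

# Example usage, matching the checks in the document:
#   check_dir_perms /home/ec2-user 'drwx------' || exit 1
#   check_command aws || exit 1
```

Keeping each check in its own function makes it easier to run one check at a time and to add further checks later without touching the existing ones.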
4. Create a post-launch action corresponding to the command document and set as active for the source server in step 2
Next, we create a post-launch action that corresponds to the Systems Manager command document that we just created.
4.1. Navigate to the Elastic Disaster Recovery console and select the previous source server from the Source servers menu. Select Add Action from the Actions menu.
Figure 8: Adding a new post-launch action
4.2. Give the Action a name. In this example, we use DR-Failover-Validation-Checks.
4.3. Leave the Activate this action checkbox selected, and select the Systems Manager document that we created in the preceding step. Select Add action.
Figure 9: Add Post-launch action window
With Filter by set to an Activation status of Active, we can see two actions that are active for this source server: the automatically activated Enable SSM action and the custom action we just added.
Figure 10: Filter post-launch actions by Activation status = Active
5. Perform a recovery drill and monitor completion status of the active post-launch actions
Let’s perform a recovery drill to see the post-launch actions in use.
5.1. Select Initiate recovery job and then Initiate recovery drill.
Figure 11: Initiating a recovery drill
5.2. For the point in time, select Use most recent data.
Figure 12: Selecting the most recent point in time
5.3. To review the status of the recovery job, navigate to the Recovery job history menu.
Figure 13: Selecting the Recovery job history menu
5.4. Select the latest recovery job.
Figure 14: Selecting the latest recovery job
5.5. Review the job card and wait for the recovery job to reach the Status of Completed.
Figure 15: Reviewing the Recovery job history log
5.6. Once the recovery job has completed, let’s check the status of the post-launch actions on the recovery instance that has just launched. Navigate to the Recovery instances menu and select the newly created recovery instance.
Figure 16: Selecting the newly created recovery instance
5.7. On the recovery instance's information page, we can view the overall Post-launch actions status in the Overview pane (point 1). We can also see the run results of each individual action (point 2) along with links to any associated CloudWatch logs for diagnostics (point 3).
Figure 17: Recovery instance post-launch actions menu
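The drill in this step can also be started and monitored from the AWS CLI instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the recovery Region: the function names are hypothetical, and the source server ID in the usage comment is a placeholder. It relies on the drs start-recovery and drs describe-jobs commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for starting a recovery drill and waiting for the job.

# Start a drill for one source server and print the resulting job ID.
start_drill() {
  local source_server_id="$1"
  aws drs start-recovery \
    --is-drill \
    --source-servers "sourceServerID=${source_server_id}" \
    --query 'job.jobID' --output text
}

# Print the current status of a job (PENDING, STARTED, or COMPLETED).
job_status() {
  local job_id="$1"
  aws drs describe-jobs \
    --filters "jobIDs=${job_id}" \
    --query 'items[0].status' --output text
}

# Poll the job until it reports COMPLETED or the attempts run out.
# Arguments: job ID, number of attempts (default 60), delay in seconds (default 30).
wait_for_job() {
  local job_id="$1"
  local attempts="${2:-60}"
  local delay="${3:-30}"
  local i=1
  local status
  while [ "$i" -le "$attempts" ]; do
    status="$(job_status "$job_id")"
    echo "Job ${job_id}: ${status}"
    if [ "$status" = "COMPLETED" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for job ${job_id}" >&2
  return 1
}

# Example usage (the source server ID is a placeholder):
#   job_id="$(start_drill s-1234567890abcdef0)"
#   wait_for_job "$job_id"
```

A COMPLETED status only indicates that the recovery job has finished. The results of the individual post-launch actions are still reviewed on the recovery instance, as described in step 5.7.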
Cleaning up
To avoid incurring unwanted AWS costs after performing these steps, delete the AWS resources created. These include any source servers or recovery instances that were created for this exercise.
Conclusion
In this post, we introduced you to the AWS Elastic Disaster Recovery post-launch action framework. Through a step-by-step guide, we covered how to first enable post-launch actions, and then how to use existing actions or create your own custom actions. Finally, we demonstrated the recovery process and how you can monitor and review the completion status of active post-launch actions.
The ability to automate and validate post-recovery procedures and activities simplifies DR planning and creates predictability in the recovery process. The post-launch action framework lets you define actions that run automatically after recovery instances are launched, so operators and administrators can automate any step needed to validate, test, or configure an instance as part of the recovery process.
Thanks for reading this post. If you have any comments or questions, don’t hesitate to leave them in the comments section.