AWS Storage Blog

How to perform non-disruptive CloudEndure Disaster Recovery tests on AWS

Frequent disaster recovery (DR) tests and drills prepare your organization for unexpected IT outages caused by ransomware, human error, and other disruptions. Some organizations avoid DR testing because their procedures are time-consuming or costly, or because they cannot test without disrupting business. As a result, they may be unprepared to implement their DR plan when an IT outage occurs. CloudEndure Disaster Recovery on AWS gives you a simple way to test recovery of your source environment at scale, without performance impact. You can use the same process to test and recover physical, virtual, and cloud source servers on AWS. Testing your DR plan regularly helps you verify that you can meet your recovery objectives if you must recover applications following an IT disruption. In this post, I review the steps and best practices for performing non-disruptive DR tests using CloudEndure Disaster Recovery on AWS.

How CloudEndure Disaster Recovery testing works

During normal business operation, CloudEndure Disaster Recovery continuously replicates data on your source servers to a low-cost staging area in your target AWS Region. When you launch source servers for tests or recovery, CloudEndure Disaster Recovery triggers a highly automated machine conversion process and a scalable orchestration engine. These enable you to quickly spin up your source servers in your target AWS Region. Since data replication to AWS occurs at the block level, you can use the same, simple process to test and recover all of your applications and databases running on a wide range of supported OS versions. If a security incident, hardware failure, or other disruption causes an IT outage, you can quickly recover your environment by following the same set of simple steps that you use for DR tests. The servers you launch during CloudEndure Disaster Recovery tests will operate in the same way on AWS as the servers you launch for recovery on AWS.

Implementing a Well-Architected Framework approach to DR involves regularly testing and validating your DR plan. When you launch source servers for DR testing, you launch a copy of your servers in your target AWS Region from the point in time you select. You can run these tests securely in an isolated environment that you define in your CloudEndure Disaster Recovery Blueprint, which holds the settings and configuration of your launched test instances. In the Blueprint, you can isolate test instances by placing them in a separate subnet or assigning different security groups, which avoids network conflicts with production. You can also launch test instances in a separate AWS account to further isolate your testing and production environments. CloudEndure Disaster Recovery automatically provisions the resources needed to launch your servers on AWS.

You can use CloudEndure Disaster Recovery to run a virtually unlimited number of tests, as often as you choose. There are no additional fees for testing, beyond payment for the provisioned resources generated during tests. You can minimize costs by performing some of your DR tests using smaller Amazon EC2 instances, rather than fully provisioning resources at scale.

You should perform DR tests as your final implementation step when you deploy CloudEndure Disaster Recovery in your environment. Maintain disaster readiness during ongoing operations by performing DR tests anytime you make changes to your replicated source environment, such as when you add new servers or modify resource properties.

CloudEndure Disaster Recovery testing steps

You can test server recovery once your source servers have completed initial sync to your target AWS Region.

Navigate to the Machines page on the CloudEndure User Console. Here you will see the following indications that your source servers can be tested:

  • The DATA REPLICATION PROGRESS column indicates that replication has reached Continuous Data Protection mode.
  • The DISASTER RECOVERY LIFECYCLE column indicates Ready for Testing.
  • Machines that have not yet been tested display an orange flag icon in the STATUS column.
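The readiness check above can be expressed as a small filter. This is a minimal sketch, assuming you have copied the three console columns into records of your own; the `MachineRow` class, its field names, and the sample machine names are hypothetical and do not come from any CloudEndure API.

```python
from dataclasses import dataclass


@dataclass
class MachineRow:
    """Hypothetical record for one row of the Machines page."""
    name: str
    replication_progress: str  # DATA REPLICATION PROGRESS column
    lifecycle: str             # DISASTER RECOVERY LIFECYCLE column
    orange_flag: bool          # True while the machine has never been tested


def awaiting_first_test(rows):
    """Names of machines that show all three indications listed above."""
    return [
        r.name
        for r in rows
        if r.replication_progress == "Continuous Data Protection"
        and r.lifecycle == "Ready for Testing"
        and r.orange_flag
    ]


rows = [
    MachineRow("web-01", "Continuous Data Protection", "Ready for Testing", True),
    # "Initial Sync" and "Not Ready" are placeholder values, not console text.
    MachineRow("db-01", "Initial Sync", "Not Ready", True),
    MachineRow("app-01", "Continuous Data Protection", "Tested Recently", False),
]
print(awaiting_first_test(rows))
```

A list like this can serve as a checklist of machines that still need their first DR test.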

You can then perform the following steps to launch source servers on AWS for DR testing:

1. Select machines to launch in Test Mode.

Select machines for DR testing by checking the box to the left of each machine. You can select your entire environment, a group of machines comprising one or more applications, or a single machine to launch for testing in your target AWS Region.

Then open the Launch Target Machines menu and choose Test Mode.

The servers you select will be launched according to the networking and security groups you previously defined in your CloudEndure Disaster Recovery Blueprint.

2. Confirm Target machine launch.

Click NEXT on the confirmation message to confirm launch of the Target machine you selected for testing. Note that this action deletes any Target machines you previously launched to test this source machine.

3. Select a Recovery Point.

You can select the latest recovery point, which is the current machine state. You can also select a previous point in time from which to launch the Target machine.

Testing recovery from a previous point in time prepares you for an IT disruption caused by data corruption, such as ransomware, accidental system changes, or database corruption. In these cases, you recover your source machines on AWS to a point in time before the corruption occurred.

After you select a Recovery Point, click CONTINUE WITH LAUNCH.
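The choice between the latest recovery point and an earlier one can be illustrated with a short sketch. It assumes you have a list of recovery point timestamps and, for a corruption scenario, an estimate of when the corruption began. The function name and the sample timestamps are hypothetical; in the console you make this choice from the list of available recovery points.

```python
from datetime import datetime, timezone


def pick_recovery_point(recovery_points, corruption_detected_at=None):
    """Pick the timestamp to launch from.

    With no corruption time, return the latest point. Otherwise return the
    most recent point strictly before the corruption time. Return None when
    no point qualifies.
    """
    candidates = sorted(recovery_points)
    if corruption_detected_at is not None:
        candidates = [p for p in candidates if p < corruption_detected_at]
    return candidates[-1] if candidates else None


# Illustrative timestamps only.
points = [
    datetime(2021, 3, 1, 8, 0, tzinfo=timezone.utc),
    datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2021, 3, 1, 16, 0, tzinfo=timezone.utc),
]
corrupted_at = datetime(2021, 3, 1, 13, 30, tzinfo=timezone.utc)

print(pick_recovery_point(points))                # latest point
print(pick_recovery_point(points, corrupted_at))  # last point before corruption
```

Returning `None` when every point is later than the corruption time makes the "no clean copy available" case explicit instead of silently falling back to a corrupted point.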

4. Verify Target machine launch.

You will see the following indications on the Machines page that the Target machine launch is complete:

  • The left edge of each tested Source machine is green.
  • A purple icon appears in the STATUS column, indicating that a Target machine has been launched for the Source machine you selected. The orange flag icon also disappears from this column.
  • The DISASTER RECOVERY LIFECYCLE column shows Tested Recently.

5. Validate operation of launched instances.

Navigate to the TARGET tab on the Machine Details pane. Under the Machine Dashboard column, you will see an indication that the Test machine has launched.

Log in to the AWS Management Console and open the EC2 dashboard to see the running instances and perform any necessary validation of your launched applications.

If necessary, make any configuration updates and then revalidate.
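If you prefer to script the first part of this validation, the sketch below summarizes instance states from an EC2 `DescribeInstances` response. It assumes you apply a tag of your own choosing to test instances. The tag key and value, the Region, and the helper names are hypothetical. `fetch_test_instances` needs boto3 and AWS credentials, so the example runs against a hand-written sample response instead.

```python
def summarize_instance_states(describe_instances_response):
    """Map instance ID to state name from an EC2 DescribeInstances response."""
    states = {}
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            states[instance["InstanceId"]] = instance["State"]["Name"]
    return states


def not_running(states):
    """Return the IDs of instances that are not in the 'running' state."""
    return sorted(i for i, s in states.items() if s != "running")


def fetch_test_instances(region, tag_key, tag_value):
    """Query EC2 for instances carrying a tag (first page of results only)."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_instances(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )


# Hand-written sample in the DescribeInstances response shape.
sample_response = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
            {"InstanceId": "i-0bbb", "State": {"Name": "pending"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-0ccc", "State": {"Name": "running"}},
        ]},
    ]
}
states = summarize_instance_states(sample_response)
print(not_running(states))
```

A running instance is only the starting point. You still need to validate the applications themselves, for example by checking service endpoints or database connectivity.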

Cleaning up

After your DR testing is complete, you can delete your launched test machines through the CloudEndure User Console to avoid incurring further costs. CloudEndure Disaster Recovery automatically removes resources created during tests when you request it or when you launch a new test. You can prevent this cleanup by enabling termination protection.

Conclusion

You can follow the steps I described in this post to perform frequent, non-disruptive DR tests using CloudEndure Disaster Recovery on AWS. The purpose of DR testing is to validate that your organization can maintain business continuity with minimal downtime in the event of an IT disruption. Any operational issue that surfaces during a DR test is an opportunity to resolve a problem that could otherwise arise during an actual disruption.

Familiarity with testing and recovery processes enables your organization to verify that you can respond quickly if you must recover workloads on AWS. While this post described DR testing for a single server, you can facilitate DR testing at scale by automating testing and recovery processes.
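As a starting point for automation, the sketch below builds the body of a test launch request for several machines at once. The field names (`launchType`, `items`, `machineId`, `pointInTimeId`) reflect my understanding of the CloudEndure REST API's launch request and should be treated as assumptions to verify against the API documentation. The machine and point-in-time IDs are placeholders. The code only constructs the payload and does not send it.

```python
import json


def build_test_launch_request(machine_ids, point_in_time_ids=None):
    """Build a Test Mode launch payload (field names are assumptions).

    point_in_time_ids optionally maps a machine ID to the recovery point to
    launch from; machines without an entry get no explicit recovery point.
    """
    point_in_time_ids = point_in_time_ids or {}
    items = []
    for machine_id in machine_ids:
        item = {"machineId": machine_id}
        if machine_id in point_in_time_ids:
            item["pointInTimeId"] = point_in_time_ids[machine_id]
        items.append(item)
    return {"launchType": "TEST", "items": items}


# Placeholder IDs for illustration only.
request_body = build_test_launch_request(
    ["machine-a", "machine-b"],
    point_in_time_ids={"machine-b": "pit-123"},
)
print(json.dumps(request_body, indent=2))
```

Keeping payload construction separate from the HTTP call lets you review or unit test what a scheduled DR drill would launch before anything is provisioned.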

Visit the CloudEndure Disaster Recovery product page to learn more.

Thanks for reading this blog post. If you have any comments or questions, please leave them in the comments section.

Alex Berkov

Alex is the manager of the CloudEndure Solutions Architecture team. He joined AWS in early 2019 as part of the CloudEndure acquisition. Alex is focused on helping customers shift and operate their disaster recovery strategy in AWS. A native New Englander, Alex spends his time off with his family on the slopes during the winter and at the beach during the summers.