AWS Architecture Blog
Field Notes: Protecting Domain-Joined Workloads with CloudEndure Disaster Recovery
Co-authored by Daniel Covey, Solutions Architect at CloudEndure, an AWS Company, and Luis Molina, Senior Cloud Architect at AWS.
When designing a disaster recovery plan, one of the main questions we are asked is how Microsoft Active Directory will be handled during a test or failover scenario. In this blog post, we go through some of the options for IT professionals who are using the CloudEndure Disaster Recovery (DR) tool, and how best to architect the recovery in certain scenarios.
Overview of architecture
In the following architecture, we show how you can protect domain-joined workloads in the case of a disaster. You can instruct CloudEndure Disaster Recovery to automatically launch thousands of your machines in their fully provisioned state in minutes.
Scenario 1: Full Replication Failover
Walkthrough
In this scenario, we are performing a full stack Region to Region recovery including Microsoft Active Directory services.
Using CloudEndure Disaster Recovery to protect Active Directory in Amazon EC2.
This is a lift-and-shift style implementation. You take the existing Active Directory environment as is and fail over to another Region. Although not shown in this blog post, the same approach works whether the source is on premises, in another Region, or in another cloud, during either DR or testing.
Prerequisites
For this walkthrough, you should have the following:
- An AWS account
- A CloudEndure Account
- A CloudEndure project configured, with agents installed and replicating in ‘Continuous Data Replication’ Mode
- A CloudEndure Recovery Plan configured to boot the Active Directory Domain controller first, followed by remaining servers
- An understanding of Active Directory
- Two separate VPCs (one for testing and one for failover), with matching CIDR ranges and no connection to the source infrastructure
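The last prerequisite can be checked before any machine is launched. The following is a minimal sketch, not part of CloudEndure. It assumes that "matching" means the test and failover VPCs mirror a single source CIDR range so that recovered machines keep their addresses. The function name and all CIDR values are hypothetical.

```python
import ipaddress


def cidrs_match(source_cidr, target_cidrs):
    """Return True if every target VPC CIDR equals the source CIDR.

    Parsing with ipaddress also rejects malformed ranges early.
    """
    source = ipaddress.ip_network(source_cidr)
    return all(ipaddress.ip_network(cidr) == source for cidr in target_cidrs)


# Hypothetical values: replace with the ranges of your own networks.
SOURCE_CIDR = "10.0.0.0/16"
TEST_VPC_CIDR = "10.0.0.0/16"
FAILOVER_VPC_CIDR = "10.0.0.0/16"

print(cidrs_match(SOURCE_CIDR, [TEST_VPC_CIDR, FAILOVER_VPC_CIDR]))
```

Comparing parsed networks rather than raw strings avoids false mismatches caused by formatting differences.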
Configuration and Launch of Recovery Plan
1. Log in to the CloudEndure Console
2. Ensure the blueprint settings for each machine are configured to boot in either the Test VPC or the Failover VPC, depending on the reason for booting.
a. These changes can be done either through the console, or by using the CloudEndure API operations.
b. To change blueprints on a mass scale, use the mass blueprint setter scripts (Zip file with instructions).
3. Open the “Recovery Plans” section for the project
a. Create a new Recovery Plan following these steps
b. Tip: Add in a delay between the launch of the Active Directory server, and the following servers, to allow Active Directory services to come up before the rest of the infrastructure.
4. Once you have created the Recovery Plan, you can either launch it from the CloudEndure console, or use the CloudEndure API Operations.
Note: For more detail, see the full CloudEndure failover and failback documentation.
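To reason about the delay suggested in the tip above, it can help to model the launch order before building the Recovery Plan. The following is a minimal sketch of that ordering logic only. It does not call CloudEndure, and the class, function, machine names, and the 300-second delay are all hypothetical. It also ignores how long each machine takes to boot.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchStep:
    """One step of a recovery plan: machines launched together, then a pause."""
    name: str
    machines: list = field(default_factory=list)
    delay_after_seconds: int = 0


def launch_schedule(steps):
    """Return (start_offset_seconds, machine) pairs in launch order."""
    schedule = []
    offset = 0
    for step in steps:
        for machine in step.machines:
            schedule.append((offset, machine))
        # The delay pushes back every later step, giving AD time to start.
        offset += step.delay_after_seconds
    return schedule


# Hypothetical plan: the domain controller first, then the domain members.
plan = [
    LaunchStep("Active Directory", ["dc1"], delay_after_seconds=300),
    LaunchStep("Domain members", ["app1", "db1"]),
]

for offset, machine in launch_schedule(plan):
    print(f"t+{offset}s launch {machine}")
```

The right delay depends on how long Active Directory services take to become available in your environment, so treat the value as something to tune during test launches.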
There are different ways to clean up resources, depending on whether this was a test launch, or true failover.
- Test Launch – You can choose “Delete x target machines” under the “Machines” tab.
- This will delete all machines created by CloudEndure in the VPC they were launched into.
- True failover – At this time, you can choose to fail back as needed.
- Once failback is completed, you can use the same steps as for a test launch to delete the infrastructure spun up by CloudEndure.
Scenario 2: Warm Site Recovery
Walkthrough
In this scenario, we perform a failover/recovery into a Region with a fully writeable and online Active Directory domain controller. This domain controller runs as an EC2 instance and is an extension of the on-premises, cross-cloud, or cross-Region Active Directory infrastructure.
Prerequisites
For this walkthrough, you should have the following:
- An AWS account
- A CloudEndure Account
- A CloudEndure project configured, with agents installed and replicating in Continuous Data Replication Mode
- An understanding of Active Directory
- A deployment of Active Directory with online writeable domain controller(s)
Preparing AWS and Active Directory:
For our example, us-west-1 (N. California) is the source environment that CloudEndure is protecting. We have specified us-east-1 (N. Virginia) as the target recovery Region, also known as the “warm site”.
- The source Region will consist of a VPC configured with public and private (AD domain) subnets and security groups
- AD Domain Controllers are deployed in the source environment (DC1 and DC2)
Procedure:
1. Set up a target recovery site/VPC in a Region of your choice. We refer to this as the warm site.
2. Configure connectivity between the source environment you are protecting, and the warm site.
a. This can be accomplished in multiple ways depending on whether your source environment is on-premises (VPN or AWS Direct Connect), an alternate cloud provider (VPN tunnel), or a different AWS Region (VPC peering). For our example, the source environment we are protecting is in us-west-1 and the warm recovery site is in us-east-1. The VPCs in the two Regions are connected via VPC peering.
3. Confirm that the appropriate routes, subnets, and ACLs are configured so that AD authentication and replication traffic can flow between the source environment and the warm recovery site.
4. Extend your Active Directory into the warm recovery site by deploying a domain controller (DC3) into the warm site. This domain controller will handle Active Directory authentication and DNS for machines that get recovered into the warm site.
5. Next, use the Active Directory Sites and Services MMC to create a new Active Directory site for the warm recovery site prepared in us-east-1, with DC3 as its associated domain controller.
a. Once the site is created, associate the warm recovery site VPC networks to it. This enforces local Active Directory client affinity to DC3, so that any machines recovered into the warm site use DC3 rather than the source environment domain controllers. Without this affinity, recovered machines could try to reach the source environment domain controllers, which introduces recovery delays if those domain controllers are unreachable.
6. Now, set the DHCP options for the warm site recovery VPC. This sets the warm site domain controller (DC3) as the primary DNS server for any machines that get recovered into the warm site, allowing for a seamless recovery/failover.
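The DHCP options in the last step can also be set programmatically. The following is a minimal sketch that assumes you use the AWS SDK for Python (boto3), which this post does not prescribe. `create_dhcp_options` and `associate_dhcp_options` are EC2 client operations; the helper names, the IP address standing in for DC3, the domain name, and the VPC ID are hypothetical.

```python
def build_dhcp_configurations(dns_servers, domain_name):
    """Build the DhcpConfigurations list for the EC2 CreateDhcpOptions call."""
    return [
        {"Key": "domain-name-servers", "Values": list(dns_servers)},
        {"Key": "domain-name", "Values": [domain_name]},
    ]


def apply_dhcp_options(ec2_client, vpc_id, dns_servers, domain_name):
    """Create a DHCP options set and associate it with the warm site VPC.

    ec2_client is expected to be a boto3 EC2 client for the warm site
    Region, for example boto3.client("ec2", region_name="us-east-1").
    """
    response = ec2_client.create_dhcp_options(
        DhcpConfigurations=build_dhcp_configurations(dns_servers, domain_name)
    )
    options_id = response["DhcpOptions"]["DhcpOptionsId"]
    ec2_client.associate_dhcp_options(DhcpOptionsId=options_id, VpcId=vpc_id)
    return options_id


# Hypothetical values: DC3's private IP address and the AD domain name.
DC3_IP = "10.1.0.10"
AD_DOMAIN = "corp.example.com"

print(build_dhcp_configurations([DC3_IP], AD_DOMAIN))
```

Passing the client in as a parameter keeps the sketch independent of credentials and Region configuration. To apply it, you would call `apply_dhcp_options` with a real client and the ID of the warm site VPC.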
Test or Failover procedure:
Follow the steps in “Configuration and Launch of Recovery Plan” provided earlier in this blog post.
Cleaning up
To avoid incurring future charges, delete all resources used in both scenarios.
Conclusion
In this blog post, we have shown a few ways to configure and test domain-joined servers together with their Active Directory counterpart. Going forward, you can test and fine-tune the CloudEndure Recovery Plans to limit the downtime needed for failover. Further blog posts will go into other ways to fail over domain-joined servers.