Protecting domain-joined workloads with AWS Elastic Disaster Recovery

Disaster recovery (DR) solutions for workloads that are domain-joined to Microsoft Active Directory (AD) must take into account the AD requirements of those workloads. A domain-joined workload expects to find an AD domain controller that provides key services such as DNS and security-related services, including user and machine-based authentication. If the AD requirements are not considered, the launched workload can be unusable, leading to extended service outages. Depending on the application criticality, each minute of service downtime can have serious financial impact.

AWS Elastic Disaster Recovery (AWS DRS) provides scalable, cost-effective application recovery to AWS. AWS DRS is used to protect workloads, whether they are running on-premises, in AWS, or in another hosting or cloud provider. AWS DRS provides robust, non-disruptive continuous replication that can provide recovery point objectives (RPOs) of seconds and recovery time objectives (RTOs) of minutes.

In this post, we discuss two specific scenarios: full environment replication and warm site recovery. Both allow you to protect workloads and make sure that they have a target for DNS and authentication requests. We walk through these scenarios using AWS DRS to launch an AD server or to extend AD into an AWS environment over private network connections. By following one of these scenarios, you can meet the AD requirements of the workload during a disaster recovery event.

Architecture overview

The following diagram shows the basic architecture of AWS DRS. Replication agents are installed on the source hosts. The source host volumes are block-level replicated to lightweight replication server(s) running inside of the customer’s VPC. For a list of supported operating systems for AWS DRS, see the Supported operating systems guide in the AWS DRS documentation.

Figure 1: AWS DRS Architecture


As part of a disaster recovery plan, in particular with domain-joined Windows workloads, a common question is: “What about Active Directory during testing or an actual failover?” In this post, we explore several options related to the deployment of Microsoft AD as it applies to AWS DRS.

Scenario 1: Full environment replication

In this scenario, we are performing a full lift-and-shift style on-premises to AWS recovery including the application and Microsoft AD services. To avoid issues with DNS or authentication, we utilize resource tags as the organizational mechanism to make sure that the domain controllers are launched and online before any application servers.

Prerequisites

For this walkthrough, you should have the following:

An AWS account
AWS DRS agents installed on workloads and an AD server
Launch templates configured
An understanding of AD
Two separate subnets, one utilized for replication by AWS DRS, and one for DR failover (with a CIDR range that matches your on-premises range)

Procedure

1. From the AWS Management Console, select the AWS DRS service.
2. Go to the Source servers tab.
3. Select your AD server from the list of available source servers.

a. Select Actions.
b. Then select View server details.

4. Navigate to the Tags section.

a. Select Manage Tags.
b. Choose Add new tag.
c. Enter a custom key:value pair (for our scenario, we use wave1:True).
d. Select Save.

5. Repeat these steps for the application servers.

a. For these servers (and any further waves), change the tag to wave2:True, and so forth.

6. Return to the Source servers tab.
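The tagging steps above can also be scripted. The following is a minimal sketch, not a definitive implementation: it assumes a boto3-style AWS DRS client that exposes `tag_resource(resourceArn=..., tags=...)`, and the `waves` input shape (wave number mapped to source server ARNs) and both function names are hypothetical helpers introduced for illustration.

```python
from typing import Any, Dict, Iterable, List


def wave_tag(wave: int) -> Dict[str, str]:
    """Return the tag used in this post for a wave, e.g. {"wave1": "True"}."""
    if wave < 1:
        raise ValueError("wave numbers start at 1")
    return {f"wave{wave}": "True"}


def tag_servers_by_wave(drs_client: Any, waves: Dict[int, Iterable[str]]) -> List[str]:
    """Apply the matching wave tag to every source server ARN in each wave.

    `waves` maps a wave number to source server ARNs (assumed input shape).
    Returns the ARNs that were tagged, in the order they were processed.
    """
    tagged: List[str] = []
    for wave in sorted(waves):  # lowest wave first, so wave 1 holds the domain controllers
        for arn in waves[wave]:
            drs_client.tag_resource(resourceArn=arn, tags=wave_tag(wave))
            tagged.append(arn)
    return tagged
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("drs")` and passed in as `drs_client`. Passing the client as a parameter keeps the helper easy to exercise against a stub before pointing it at a real account.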

Now we can filter our servers by the tags that we just applied.

1. Use the filter section under the Source servers page heading.

a. For the filter, enter your tag that you set for your AD server.
b. For this post, we use wave1:True.

2. Select that server, and choose Initiate recovery job, then Initiate recovery.
3. Choose the Use most recent data option.
4. Choose Initiate recovery.

After you launch your AD server, wait until you have validated that the instance has launched, using either the AWS DRS console or the Amazon Elastic Compute Cloud (Amazon EC2) console. Once that instance is running, repeat the previous instructions for your application workload. These steps can also be automated using other AWS services, as shown in this post.
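As one illustration of that kind of automation, the wave ordering can be sketched in Python. This is a sketch under stated assumptions rather than the method from the linked post: it assumes a boto3-style AWS DRS client with `describe_source_servers`, `start_recovery`, and `describe_jobs`, and the helper names are hypothetical. A completed job only shows that the instance launched; confirming that AD services are actually responding before starting the next wave is left out here.

```python
import re
import time
from typing import Any, Dict, List

WAVE_TAG = re.compile(r"^wave(\d+)$")  # matches the waveN tag keys used in this post


def list_source_servers(drs_client: Any) -> List[Dict[str, Any]]:
    """Collect all source servers, following nextToken pagination."""
    servers: List[Dict[str, Any]] = []
    token = None
    while True:
        kwargs: Dict[str, Any] = {"filters": {}}
        if token:
            kwargs["nextToken"] = token
        page = drs_client.describe_source_servers(**kwargs)
        servers.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return servers


def group_by_wave(servers: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map wave number -> source server IDs, based on waveN:True tags."""
    waves: Dict[int, List[str]] = {}
    for server in servers:
        for key, value in server.get("tags", {}).items():
            match = WAVE_TAG.match(key)
            if match and value == "True":
                waves.setdefault(int(match.group(1)), []).append(server["sourceServerID"])
    return waves


def wait_for_job(drs_client: Any, job_id: str, poll_seconds: float) -> None:
    """Poll the job until it completes; raise if any server did not launch."""
    while True:
        job = drs_client.describe_jobs(filters={"jobIDs": [job_id]})["items"][0]
        if job["status"] == "COMPLETED":
            not_launched = [
                s["sourceServerID"]
                for s in job.get("participatingServers", [])
                if s.get("launchStatus") != "LAUNCHED"
            ]
            if not_launched:
                raise RuntimeError(f"Job {job_id}: servers not launched: {not_launched}")
            return
        time.sleep(poll_seconds)


def recover_in_waves(drs_client: Any, is_drill: bool = True, poll_seconds: float = 30.0) -> List[int]:
    """Launch each wave in ascending order, waiting for one wave before the next."""
    waves = group_by_wave(list_source_servers(drs_client))
    completed: List[int] = []
    for wave in sorted(waves):
        response = drs_client.start_recovery(
            # No recoverySnapshotID is passed, which requests the most recent data.
            sourceServers=[{"sourceServerID": sid} for sid in waves[wave]],
            isDrill=is_drill,
        )
        wait_for_job(drs_client, response["job"]["jobID"], poll_seconds)
        completed.append(wave)
    return completed
```

The sketch defaults to `is_drill=True` so that an accidental run starts a drill rather than a recovery. Servers without a `waveN:True` tag are skipped.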

Scenario 2: Warm site recovery

In this scenario, we perform a failover/recovery into an AWS Region that hosts either a fully writeable or a read-only domain controller (RODC). An EC2 instance running self-managed AD is deployed in that Region. This instance handles AD authentication and DNS for machines that AWS DRS launches into the recovery VPC.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
AWS DRS settings configured
An understanding of AD
A deployment of AD with online-writeable or read-only domain controller(s)

Preparing AWS and AD

In this example, the corporate data center is the source environment that AWS DRS is protecting. For AWS, the us-east-1 (N. Virginia) Region is the target recovery site or “warm site”. Network connectivity from the on-premises data center is provided through AWS Direct Connect or AWS Site-to-Site VPN. Network traffic from both AD replication and the hosts protected by AWS DRS flows through this connection.

After the domain controller is in place and functional, it is available to provide both user and machine authentication. To avoid affecting your production environment during recovery drills, see the post: Avoid affecting your production environment during migration with AWS Application Migration Service. Although this post was written for the AWS Application Migration Service, the same principles also apply to AWS DRS.

Figure 2: Warm site recovery


Procedure

Create one or more Amazon Virtual Private Clouds (Amazon VPCs) in the Region of your choice, with at least one subnet for AWS DRS staging and one or more subnets for launching recovery instances.
Establish network connectivity between the corporate data center and AWS.
Configure route tables and network ACLs to allow replication network traffic. Create a security group that allows AD replication traffic from on-premises, and apply it to the EC2 instance that is running AD. Extend on-premises AD by deploying a domain controller into the VPC created in step 1, referred to here as the recovery VPC.
Prepare AD:
- From PowerShell or the AD Sites and Services MMC, create a new AD site for the recovery VPC.
```
New-ADReplicationSite -Name "<site name>"
```
- Create a new subnet that matches the VPC CIDR range of the recovery VPC.
```
New-ADReplicationSubnet -Name "<Network CIDR Range>" -Site "<site name>"
```
- Move the recovery domain controller into the new AD site.
```
Move-ADDirectoryServer -Identity "<DC name>" -Site "<site name>"
```
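Because AD maps clients to a site by subnet, it can help to confirm that every subnet in the recovery VPC falls inside a CIDR range registered for the new AD site. The following is a small sketch of such a check using only the Python standard library. The function name and the example CIDR ranges are hypothetical, and the inputs are assumed to be collected by hand or by other tooling.

```python
import ipaddress
from typing import Dict, List


def uncovered_subnets(ad_site_subnets: List[str], vpc_subnets: Dict[str, str]) -> List[str]:
    """Return the names of VPC subnets not contained in any AD replication subnet.

    `ad_site_subnets` holds the CIDR ranges registered for the recovery AD site.
    `vpc_subnets` maps a label (hypothetical, e.g. "recovery-a") to its CIDR range.
    """
    site_networks = [ipaddress.ip_network(cidr) for cidr in ad_site_subnets]
    missing: List[str] = []
    for name, cidr in vpc_subnets.items():
        network = ipaddress.ip_network(cidr)
        covered = any(
            # subnet_of() requires both networks to be the same IP version.
            network.version == site.version and network.subnet_of(site)
            for site in site_networks
        )
        if not covered:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example values only; substitute your own ranges.
    print(uncovered_subnets(
        ["10.20.0.0/16"],
        {"staging": "10.20.1.0/24", "recovery-a": "10.20.2.0/24", "recovery-b": "10.30.0.0/24"},
    ))
```

An empty result means every listed subnet is covered by the AD site definition. Any name returned points to a subnet whose instances might locate a domain controller in a different site.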

Set the DHCP options for the recovery VPC using either the Console or the AWS Command Line Interface (AWS CLI).

```
aws ec2 create-dhcp-options \
    --dhcp-configuration \
      "Key=domain-name-servers,Values=<DC IP>" \
      "Key=domain-name,Values=<domain name>" \
      "Key=netbios-node-type,Values=2"
```

Associate the DHCP options to the recovery VPC using either the Console or the AWS CLI.

```
aws ec2 associate-dhcp-options \
    --dhcp-options-id <dhcp id> \
    --vpc-id <vpc id>
```
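The two DHCP commands above can also be combined in a script so that the new options ID flows directly into the association call. This is a minimal sketch, assuming a boto3-style Amazon EC2 client with `create_dhcp_options` and `associate_dhcp_options`. The function names are hypothetical, and the values mirror the placeholders in the CLI commands.

```python
from typing import Any, Dict, List


def build_dhcp_configuration(dns_servers: List[str], domain_name: str) -> List[Dict[str, Any]]:
    """Build the DhcpConfigurations list matching the CLI example in this post."""
    if not dns_servers:
        raise ValueError("at least one domain controller IP is required")
    return [
        {"Key": "domain-name-servers", "Values": list(dns_servers)},
        {"Key": "domain-name", "Values": [domain_name]},
        {"Key": "netbios-node-type", "Values": ["2"]},
    ]


def apply_dhcp_options(ec2_client: Any, vpc_id: str, dns_servers: List[str], domain_name: str) -> str:
    """Create a DHCP options set, associate it with the VPC, and return its ID."""
    response = ec2_client.create_dhcp_options(
        DhcpConfigurations=build_dhcp_configuration(dns_servers, domain_name)
    )
    dhcp_options_id = response["DhcpOptions"]["DhcpOptionsId"]
    ec2_client.associate_dhcp_options(DhcpOptionsId=dhcp_options_id, VpcId=vpc_id)
    return dhcp_options_id
```

With boto3, `ec2_client` would typically be `boto3.client("ec2")`. Instances pick up the new DNS settings when they renew their DHCP lease, so instances launched after the association receive them at boot.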

Initiate recovery of your application servers to the recovery subnet as defined by your launch settings.
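For the security group created earlier in this procedure, the inbound rules for AD replication can be generated rather than entered by hand. The sketch below builds an `IpPermissions` list in the shape accepted by a boto3-style `authorize_security_group_ingress` call. The port list reflects the ports commonly documented for domain controller communication and is an assumption to verify against your own AD configuration, particularly the dynamic RPC range. The function name is hypothetical.

```python
import ipaddress
from typing import Any, Dict, List

# (protocol, from_port, to_port, description): commonly documented AD ports.
# Treat this list as an assumption and adjust it to your environment.
AD_PORTS = [
    ("tcp", 53, 53, "DNS"),
    ("udp", 53, 53, "DNS"),
    ("tcp", 88, 88, "Kerberos"),
    ("udp", 88, 88, "Kerberos"),
    ("udp", 123, 123, "NTP (W32Time)"),
    ("tcp", 135, 135, "RPC endpoint mapper"),
    ("tcp", 389, 389, "LDAP"),
    ("udp", 389, 389, "LDAP"),
    ("tcp", 445, 445, "SMB"),
    ("tcp", 464, 464, "Kerberos password change"),
    ("udp", 464, 464, "Kerberos password change"),
    ("tcp", 636, 636, "LDAPS"),
    ("tcp", 3268, 3269, "Global catalog"),
    ("tcp", 49152, 65535, "Dynamic RPC"),
]


def build_ad_ingress(on_prem_cidr: str) -> List[Dict[str, Any]]:
    """Return ingress rules that allow AD traffic from the on-premises CIDR range."""
    ipaddress.ip_network(on_prem_cidr)  # raises ValueError if the CIDR is malformed
    return [
        {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": on_prem_cidr, "Description": description}],
        }
        for protocol, from_port, to_port, description in AD_PORTS
    ]
```

The result would be passed as `IpPermissions` together with the `GroupId` of the security group attached to the EC2 instance running AD. Restricting the source to the on-premises CIDR range keeps the rules limited to traffic arriving over the Direct Connect or VPN connection.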

Cleaning up

If you deployed the infrastructure while following along with this post and want to avoid incurring future charges, delete the resources used in both scenarios. These resources could include EC2 instances, EBS volumes, and EBS snapshots. See the Amazon EC2 user guide to terminate an instance, delete an EBS volume, or delete an EBS snapshot.

Conclusion

In this post, we provided two different scenarios for disaster recovery of domain-joined workloads, along with their AD counterpart. We explored scenarios using either full environment replication or warm site recovery using DRS and provided the procedures required for implementation.

Providing authentication for domain-joined workloads is a critical part of any disaster recovery runbook. Authentication must be provided for individual servers, applications, and users. To avoid potential extended service outages, understanding and preparing for how AD authentication and DNS are going to be provided during a disaster recovery event should be a top consideration of your runbook. By following the scenarios outlined in this post, you can make sure that workloads launched during a disaster event have the required AD services available at launch time. With those services in place, you can confidently launch workloads and further limit downtime.

Ready to get started? Read more about AWS Elastic Disaster Recovery (AWS DRS), AWS Application Migration Service and other posts related to migrations on the AWS Cloud Enterprise Strategy Blog and the AWS Architecture Blog.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, AWS Well-Architected best practices, patterns, icons, and more!

AWS Storage Blog
