Microsoft Workloads on AWS

Hybrid Active Directory: Disaster recovery, cyber resiliency, and high availability solutions on AWS

In this blog post, we present strategies and techniques to strengthen disaster recovery and cyber resilience in hybrid Microsoft Active Directory (AD) environments running in the AWS Cloud, using AWS Elastic Disaster Recovery (AWS DRS), AWS Backup, and solutions for extending AD onto Amazon Elastic Compute Cloud (Amazon EC2) instances.

Introduction

In the realm of identity management, where many corporate organizations depend on AD, customers often seek guidance on implementing a disaster recovery strategy for their self-managed AD infrastructure in AWS. Ensuring the resilience of critical systems is imperative, with AD serving as a fundamental component for user and computer authentication.

This post presents three strategies and techniques for creating a resilient disaster recovery plan for hybrid AD running in the AWS Cloud. For each one, we cover the benefits, considerations, and configuration so that you can choose a disaster recovery strategy that fits your requirements, risk profile, and resources.

Solution overview

Solution 1: High availability for self-managed AD on AWS

For organizations that want custom configurations and greater control over their AD infrastructure, a customer-managed hybrid AD setup is advantageous. This approach is particularly effective when used with AD sites, which provide high availability and seamless transitions between on-premises AD domain controllers and those hosted on AWS (Figure 1).

Figure 1: Hybrid AD in AWS

Refer to the post Securely extend and access on-premises Active Directory domain controllers in AWS for step-by-step instructions on extending your self-managed AD to AWS.

Prerequisites

The following prerequisites and components are necessary for setting up a hybrid AD configuration:

  1. Primary AD infrastructure in an on-premises datacenter.
  2. Customer-managed AD domain controllers running on Amazon EC2.
  3. Secure and low-latency connectivity between on-premises and AWS.
  4. AD replication topology configured using AD Sites and Services.
  5. Security best practices applied to harden the Amazon EC2 instances hosting AD domain controllers.

Walkthrough

Step 1: User authentication request initiation

  • The user initiates an authentication request, such as logging into a system or accessing a resource.
  • This request triggers the authentication process within the AD environment.

Step 2: AD Sites and Services determines the suitable AD domain controller

  • AD Sites and Services evaluates network topology and site configurations to determine the closest and most suitable AD domain controller.
  • This ensures efficient authentication by routing requests to the appropriate domain controller based on network location.

Step 3: Seamless user experience during failover to self-managed AD running in AWS

  • If on-premises AD services become unavailable, authentication requests are redirected to a self-managed AD domain controller running on AWS, without user intervention.
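The site-based selection and failover described in Steps 2 and 3 can be illustrated with a small model. This is a minimal sketch, not the actual domain controller locator: it assumes a hypothetical subnet-to-site mapping and a hypothetical fallback order (in a real deployment, both come from the subnets, sites, and site links defined in AD Sites and Services). All site names, subnets, and server names are made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class DomainController:
    name: str
    site: str
    healthy: bool = True


# Hypothetical subnet-to-site mapping, as it might be defined in AD Sites and Services.
SUBNET_TO_SITE = {
    "10.10.0.0/16": "OnPrem-HQ",
    "172.31.0.0/16": "AWS-Primary",
}

# Hypothetical fallback order when the client's own site has no healthy domain controller.
FALLBACK_ORDER = {
    "OnPrem-HQ": ["AWS-Primary"],
    "AWS-Primary": ["OnPrem-HQ"],
}


def site_for_client(client_ip: str) -> Optional[str]:
    """Return the site whose subnet most specifically contains the client IP."""
    addr = ip_address(client_ip)
    matches = [
        (ip_network(cidr), site)
        for cidr, site in SUBNET_TO_SITE.items()
        if addr in ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix is the most specific subnet.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def pick_domain_controller(
    client_ip: str, controllers: List[DomainController]
) -> Optional[DomainController]:
    """Prefer a healthy controller in the client's site, then fall back to other sites."""
    site = site_for_client(client_ip)
    if site is None:
        return None
    for candidate_site in [site, *FALLBACK_ORDER.get(site, [])]:
        for dc in controllers:
            if dc.site == candidate_site and dc.healthy:
                return dc
    return None


controllers = [
    DomainController("DC-ONPREM-01", "OnPrem-HQ", healthy=False),  # simulated outage
    DomainController("DC-AWS-01", "AWS-Primary"),
]
print(pick_domain_controller("10.10.4.25", controllers).name)  # DC-AWS-01
```

In this example, the client sits in an on-premises subnet, but because the on-premises domain controller is marked unhealthy, the request lands on the domain controller in the AWS site.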

Benefits

This solution uses AD sites for automatic and seamless redirection to self-managed AD running on AWS, ensuring continuous availability even during on-premises AD disruptions. Services can resume with minimal downtime, which strengthens the resilience of corporate applications that depend on AD for authentication.

Native AD replication keeps on-premises AD and self-managed AD on AWS synchronized, which preserves data availability and integrity and maintains operational continuity even in adverse conditions.

Solution 2: Using AWS Backup for hybrid AD disaster recovery

To safeguard AD data, we can implement a two-tiered daily backup strategy.

First, perform a system state backup to an Amazon Elastic Block Store (Amazon EBS) volume, which serves as an offsite copy of your backup. For more details on system state backup, see the post Active Directory: Automate System State Backup.

Second, automate the creation of Amazon Machine Image (AMI) backups of the domain controller with AWS Backup. These AMIs facilitate the restoration of the identical operating system instance during forest recovery, in line with Microsoft’s recommendations.

Figure 2 shows the reference architecture for this solution.

Figure 2: Hybrid AD disaster recovery with AWS Backup

Refer to the post Automate disaster recovery for your self-managed Active Directory on AWS for detailed steps on executing recovery drills using AWS Backup.

Walkthrough

Step 1: Configure daily system state backup of AD to Amazon EBS

  • Create an encrypted Amazon EBS volume and attach it to one of the domain controllers running on Amazon EC2, ensuring that the volume is initialized and available to Windows.
  • Set up an automated system state backup to the Amazon EBS volume.
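As an illustration of Step 1, the following sketch builds and runs the built-in Windows `wbadmin start systemstatebackup` command against the attached volume. It assumes the Amazon EBS volume is mounted as drive `E:` (a hypothetical drive letter) and that the script runs with administrative rights on a domain controller where `wbadmin` is available. Scheduling it daily, for example with Task Scheduler, is not shown.

```python
import subprocess
from typing import List


def system_state_backup_command(target_volume: str) -> List[str]:
    """Build the wbadmin command line for a system state backup to a drive such as 'E:'."""
    is_drive = (
        len(target_volume) == 2
        and target_volume[0].isalpha()
        and target_volume[1] == ":"
    )
    if not is_drive:
        raise ValueError(f"expected a drive letter such as 'E:', got {target_volume!r}")
    return [
        "wbadmin",
        "start",
        "systemstatebackup",
        f"-backupTarget:{target_volume.upper()}",
        "-quiet",  # do not prompt for confirmation, so the job can run unattended
    ]


def run_system_state_backup(target_volume: str) -> None:
    """Run the backup. Only meaningful on a Windows domain controller."""
    subprocess.run(system_state_backup_command(target_volume), check=True)


print(" ".join(system_state_backup_command("e:")))
```

Keeping the command construction separate from its execution makes it easy to review or log the exact command before it runs on a domain controller.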

Step 2: Protect Amazon EBS volume and domain controller instance with AWS Backup

  • Utilize AWS Backup to create a new backup plan to take daily snapshots of the Amazon EBS volume containing AD data.
  • Establish a second backup plan for monthly AMI backups of the domain controller, aligned with your Windows patch schedule, ensuring all patches are included in the AMI.
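The two backup plans in Step 2 could be described as follows. This is a sketch under stated assumptions: the plan names, vault name, retention periods, and schedule times are hypothetical, and `backup_client` is expected to behave like the boto3 `backup` client, whose `create_backup_plan` call takes a `BackupPlan` document of this shape. Assigning the Amazon EBS volume and the domain controller instance to each plan (a backup selection) is a separate step that is not shown.

```python
from typing import Any, Dict, List

# AWS cron fields: minute, hour, day-of-month, month, day-of-week, year.
DAILY_AT_0500_UTC = "cron(0 5 ? * * *)"
MONTHLY_ON_DAY_1_AT_0500_UTC = "cron(0 5 1 * ? *)"  # adjust to follow your patch window


def backup_plan(name: str, schedule: str, retention_days: int,
                vault: str = "Default") -> Dict[str, Any]:
    """Return a backup plan document with a single scheduled rule."""
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                "ScheduleExpression": schedule,
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


# Hypothetical plan names and retention periods.
PLANS = [
    backup_plan("ad-system-state-ebs-daily", DAILY_AT_0500_UTC, retention_days=35),
    backup_plan("ad-domain-controller-ami-monthly", MONTHLY_ON_DAY_1_AT_0500_UTC,
                retention_days=365),
]


def create_plans(backup_client: Any, plans: List[Dict[str, Any]]) -> List[str]:
    """Create each plan and return the resulting backup plan IDs."""
    return [
        backup_client.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
        for plan in plans
    ]
```

With boto3 installed and credentials configured, `create_plans(boto3.client("backup"), PLANS)` would submit both plans. Passing the client in as a parameter also allows the function to be exercised with a stub.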

Step 3: Restore the Amazon EBS volume from AWS Backup that contains the system state backup

  • Restore the domain controller from AWS Backup and attach the restored Amazon EBS volume that contains the system state backup to your Amazon EC2 instance.
  • Restore the AD forest to an isolated environment.

Benefits

AWS Backup presents distinct advantages for securing AD against ransomware attacks and bolstering resilience in disaster recovery situations. Scheduled backups can be configured at desired intervals, allowing organizations to tailor backup frequencies to business needs. This empowers them to define a precise Recovery Point Objective (RPO) and establish an acceptable data loss window.
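To make the link between backup frequency and RPO concrete, here is a simplified calculation. It assumes that a backup captures the state at the moment it starts and only becomes usable once it completes. Under that assumption, the worst case is a disaster just before the next backup finishes, which forces a restore from the previous one. The interval and duration values are illustrative only.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Upper bound on data loss when restoring from the last completed backup."""
    return backup_interval + backup_duration


# Daily backups that take about 30 minutes to complete.
print(worst_case_data_loss(timedelta(hours=24), timedelta(minutes=30)))  # 1 day, 0:30:00
```

If this bound exceeds the data loss window the business can accept, shorten the backup interval until it fits.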

AWS Backup offers a variety of restore options tailored for different recovery scenarios. This flexibility enables businesses to achieve a targeted Recovery Time Objective (RTO) and ensures a robust and resilient approach to AD protection in the event of a disaster.

Solution 3: Using AWS DRS for hybrid AD disaster recovery

AWS DRS minimizes downtime and data loss by offering fast, reliable recovery for both on-premises and AWS applications. This is achieved using affordable storage, minimal compute resources, and point-in-time recovery. AWS DRS enhances IT resilience for on-premises or cloud-based applications running on supported operating systems.

Figure 3 shows the reference architecture for this solution.

Figure 3: Hybrid AD disaster recovery with AWS DRS

Key Features of AWS DRS

  • Conduct non-disruptive recovery and failback drills at regular intervals.
  • Seamlessly transform servers to boot and operate natively on Amazon EC2 during instance launches for drills or recovery.
  • Swiftly launch recovery instances on AWS within minutes, utilizing the most current server state or a designated previous point in time for application recovery.

Once applications are running on AWS, you have the flexibility to either maintain them there or initiate data replication back to your primary site once the issue is resolved. The option to fail back to your primary site is available whenever you deem it appropriate.

In the following walkthrough, we explain how to protect AD domain controllers using AWS DRS as the disaster recovery strategy in AWS.

Refer to the post Protecting domain-joined workloads with AWS DRS for detailed steps on setting up AWS DRS for Microsoft workloads.

Walkthrough

Step 1: Deploying the AWS DRS agent

  • Install the AWS DRS agent on AD servers to enable continuous block-level replication of attached volumes, aiming for a near-zero Recovery Point Objective (RPO).

Step 2: Performing disaster recovery drills

  • Utilize AWS DRS to conduct non-disruptive disaster recovery drills in an isolated environment, ensuring no impact on the production setup.

Step 3: Initiating recovery during catastrophic events

  • In case of a disaster, use the AWS DRS console to initiate the recovery of AD servers using point-in-time snapshots.
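The same drill and recovery launches can also be scripted. The sketch below picks the most recent point-in-time snapshot taken at or before a chosen cutoff (for example, shortly before a ransomware incident was detected) and builds the request for a launch. The `RecoverySnapshot` class is a simplified, hypothetical stand-in for what the service returns, and the server and snapshot IDs are made up. The request keys (`sourceServers`, `sourceServerID`, `recoverySnapshotID`, `isDrill`) follow the boto3 `drs` client's `start_recovery` call as we understand it, so verify them against the current API reference before use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class RecoverySnapshot:
    """Simplified, hypothetical view of a point-in-time snapshot."""
    snapshot_id: str
    taken_at: datetime


def latest_snapshot_before(snapshots: List[RecoverySnapshot],
                           cutoff: datetime) -> Optional[RecoverySnapshot]:
    """Return the newest snapshot taken at or before the cutoff, if any."""
    eligible = [s for s in snapshots if s.taken_at <= cutoff]
    return max(eligible, key=lambda s: s.taken_at) if eligible else None


def start_recovery_request(source_server_id: str,
                           snapshot: Optional[RecoverySnapshot],
                           is_drill: bool) -> Dict[str, Any]:
    """Build keyword arguments for a drill (is_drill=True) or an actual recovery."""
    server: Dict[str, str] = {"sourceServerID": source_server_id}
    if snapshot is not None:
        # When no snapshot is given, the launch uses the latest replicated data.
        server["recoverySnapshotID"] = snapshot.snapshot_id
    return {"sourceServers": [server], "isDrill": is_drill}


# Hypothetical snapshots for one domain controller.
snapshots = [
    RecoverySnapshot("pit-0001", datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)),
    RecoverySnapshot("pit-0002", datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)),
    RecoverySnapshot("pit-0003", datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)),
]
incident_detected = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)
chosen = latest_snapshot_before(snapshots, incident_detected)
request = start_recovery_request("s-1234567890abcdef0", chosen, is_drill=False)
print(request)
# With boto3: boto3.client("drs").start_recovery(**request)
```

Setting `is_drill=True` produces the request for a non-disruptive drill, as in Step 2, using the same snapshot selection logic.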

Step 4: Failing back after the disaster

  • Post-disaster, implement failback to the original source server or an alternate server by installing and using the AWS DRS failback client.

Benefits

AWS DRS ensures that your recovery systems are ready in case of a disaster. The actual failover is a networking operation performed outside of AWS DRS. Launch your recovery instances with AWS DRS, up to the latest or a specific point-in-time (PIT) snapshot. When you are ready to resume operations on your primary system, perform failback replication.

Conducting drills is crucial for disaster preparedness. In the event of an actual disaster, you can promptly perform a failover by launching recovery instances in AWS based on a selected PIT snapshot.

After the disaster, execute a failback to your original source server or any other server meeting prerequisites by installing the AWS DRS failback client. To use the failback client, generate AWS DRS specific credentials.

AWS DRS enhances your organization’s resiliency by efficiently recovering critical AD applications with minimal data loss, achieving a rapid Recovery Time Objective (RTO) in minutes. Additionally, AWS DRS provides flexibility in selecting precise PIT snapshots, offering recovery options down to seconds.

AWS DRS enables failback to your original source server or any other server within the source infrastructure, providing flexibility and adaptability in post-disaster recovery scenarios.

Important:

If you have many applications and more than one domain controller in your environment, or if you plan to fail over a few applications at a time, we recommend that you set up an additional domain controller on the target AWS site in addition to replicating the domain controller with AWS DRS. During test failovers, you can use the domain controller replicated by AWS DRS. For actual failovers, however, we recommend using the additional domain controller on the target site.

Conclusion

In the realm of disaster recovery and cyber resilience for AD, selecting the appropriate solution is a critical decision that significantly impacts an organization’s ability to maintain business continuity, ensure data integrity, and enhance security. The three options, an active-active configuration, AWS Backup, and AWS DRS, each offer distinct advantages and considerations, so businesses can tailor their approach to their requirements and priorities.

Organizations must thoughtfully evaluate their business needs, infrastructure complexity, budget constraints, and the criticality of AD services when choosing a solution. A pragmatic approach often involves combining these solutions. The goal is to safeguard AD, mitigate data loss, ensure business continuity, and fortify cyber resilience. Achieving this requires implementing robust security measures, conducting regular testing, and staying vigilant to evolving threats in the dynamic IT landscape.


Thiruvengadam Viswanathan

Thiruvengadam Viswanathan is a Sr. Cloud Infrastructure Architect at Amazon Web Services with over 19 years of experience in IT consulting. He is passionate about helping customers architect, migrate, and modernize complex enterprise systems onto the AWS Cloud.

Balakrishnan Ramasamy

Balakrishnan Ramasamy is a Cloud Infrastructure Architect with over 20 years of experience in project management and IT consulting. His expertise lies in AWS cloud services and Azure, supplemented by a wealth of knowledge in Microsoft technologies such as Active Directory and Office 365.