Protect Hyper-V workloads with AWS Elastic Disaster Recovery

Disaster recovery (DR) is a critical process for any organization that wants to maintain business continuity in the event of a disaster such as a flood, power failure, or ransomware attack. The DR strategy an organization adopts is commonly driven by a trade-off between cost and business impact. That impact is measured by the Recovery Time Objective (RTO), the time it takes for workloads to become available again, and the Recovery Point Objective (RPO), the amount of data loss that can be tolerated. Customers often use virtualization to reduce costs and improve the operational efficiency of their infrastructure investments. Hyper-V, provided by Microsoft, is one option for implementing hardware virtualization. When using virtualization, you must still plan for disaster recovery and business continuity in the event of an unforeseen failure.

AWS Elastic Disaster Recovery provides DR for applications on Amazon Web Services (AWS) from physical and virtual infrastructure, as well as cloud infrastructure from other cloud providers. Elastic Disaster Recovery can protect Hyper-V workloads that run on-premises and allows you to achieve RTOs of minutes and RPOs of seconds. Elastic Disaster Recovery uses a lightweight staging environment with minimal resources to keep costs down and maintain an up-to-date copy of your source servers on AWS. This contrasts with traditional on-premises DR strategies, which duplicate resources at a recovery site where they often sit idle at significant cost. With Elastic Disaster Recovery, the resources in your recovery site are only scaled to production capacity in the event of a disaster.

In this post, we show how Elastic Disaster Recovery can protect workloads running on Hyper-V. We demonstrate how to set up Elastic Disaster Recovery, perform a recovery into AWS, and then fail back to the on-premises environment. With this approach you can save costs by removing idle recovery site resources, recover your applications within minutes, and use a unified process to test, recover, and fail back a wide range of applications, all without specialized skill sets.

Solution overview

The solution covers two DR flows:

A failover of a Hyper-V Virtual Machine (VM) from an on-premises environment to a recovery Region in AWS.
A failback of the recovered Hyper-V VM on an Amazon Elastic Compute Cloud (Amazon EC2) instance from the recovery Region back to the on-premises environment.

Note: For simplicity, we use the public internet for connectivity in this solution, but you can also configure private connectivity for your replication path as shown in this blog.

Failover solution flow

Figure 1: On-premises Hyper-V disaster recovery solution architecture using Elastic Disaster Recovery

In the failover flow, the solution is composed of the following components:

An on-premises Hyper-V environment running a supported operating system. This post uses a Windows VM, but the process is also applicable for Linux. Refer to the Elastic Disaster Recovery documentation for a list of supported operating systems.
An Elastic Disaster Recovery replication agent installed on the source server, which performs block-level replication and sends data directly from the source server to the replication server in the ‘blog-staging-vpc’.
A staging area that uses a lightweight EC2 instance for the replication server and low-cost Amazon Elastic Block Store (Amazon EBS) volumes, sized identically to the source disks, for the staging volumes.
When a recovery is initiated, Elastic Disaster Recovery automatically launches a Windows recovery instance in the ‘blog-recovery-vpc’ based on the configured launch settings.
The Elastic Disaster Recovery service provides the interface to configure the replication settings and perform actions such as initiating a recovery and launching failback.

Failback solution flow

Figure 2: Failback solution architecture from AWS to the on-premises Hyper-V environment

The failback flow is composed of the following components:

The recovery instance running in the ‘blog-recovery-vpc’.
A failback VM on Hyper-V, created from the Failback Client ISO, that acts as the target for the recovery instance in the ‘blog-recovery-vpc’. You have the option of failing back to your original source server or to a different server. In this post, we show the failback process to a new VM in the on-premises environment. For more details, refer to ‘Failback Client detailed walkthrough’.

Walkthrough

The following list summarizes the steps to deploy Elastic Disaster Recovery for Hyper-V on premises and validate the failover/failback process:

Set up the AWS Elastic Disaster Recovery service
Install the AWS Replication Agent on the Windows VM
Initiate a recovery for the VM
Validate the recovered instance
Perform an on-premises failback of the VM

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
An on-premises Hyper-V environment running a VM
A staging VPC, staging public subnet, and Internet Gateway (IGW)
A recovery VPC, recovery public subnet, and IGW
An AWS Identity and Access Management (IAM) user with the ‘AWSElasticDisasterRecoveryAgentInstallationPolicy’ and ‘AWSElasticDisasterRecoveryFailbackInstallationPolicy’ managed policies. Note the access key ID and secret access key for the IAM user, as both are needed for the replication agent and failback client installation processes. Refer to creating an IAM user in your AWS account for details. We use an IAM user here for simplicity. As a security best practice, use temporary credentials instead, as described in ‘Generating the required AWS credentials’.
A recovery security group in the ‘blog-recovery-vpc’ that allows TCP inbound access on port 3389 (for RDP) and 1500 (for replication)
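
As an illustration of the security group prerequisite, the following minimal Python sketch builds the two inbound TCP rules (3389 for RDP and 1500 for replication) in the shape that the Amazon EC2 API uses for `IpPermissions`. The function name `recovery_ingress_rules` and the example CIDR range are assumptions made for this sketch and are not part of the original walkthrough. Restrict the source range to your own network rather than opening these ports to the internet.

```python
import ipaddress
from typing import Dict, List


def recovery_ingress_rules(source_cidr: str) -> List[Dict]:
    """Build inbound rules for RDP (3389) and replication (1500).

    source_cidr is the network allowed to reach the recovery security
    group (a hypothetical value in this sketch).
    """
    # Fail early if the CIDR is malformed.
    ipaddress.ip_network(source_cidr)

    ports = {
        3389: "RDP to the recovered Windows instance",
        1500: "Elastic Disaster Recovery replication",
    }
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": source_cidr, "Description": description}],
        }
        for port, description in ports.items()
    ]


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation-only range used as a placeholder.
    for rule in recovery_ingress_rules("203.0.113.0/24"):
        print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

If you script the setup with the AWS SDK for Python, a list like this could be passed as the `IpPermissions` argument of the EC2 client's `authorize_security_group_ingress` call together with the ID of your recovery security group.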

Note that if you do not have an available Hyper-V environment, then refer to this GitHub repository for a sample AWS CloudFormation template (used as-is and not supported by AWS) to deploy an infrastructure on AWS that simulates an on-premises Hyper-V environment, along with the supporting Elastic Disaster Recovery network architecture components outlined in this solution.

1. Set up AWS Elastic Disaster Recovery

1. If initializing AWS Elastic Disaster Recovery for the first time, follow the steps in the Set up AWS Elastic Disaster Recovery wizard as shown in the following. This step needs the admin user of your AWS account or an IAM user with roles listed in Initializing AWS Elastic Disaster Recovery. If you have previously initialized AWS Elastic Disaster Recovery, then update the default replication and launch settings with the input parameters shown in the following. In Step 1, set the Staging area subnet to your staging public subnet (such as ‘blog-staging-public-subnet’). Keep the Replication server instance type as ‘t3.small’.

Figure 3: Step 1 of setting up AWS Elastic Disaster Recovery – replication servers

2. Steps 2 (Specify volumes and security groups), 3 (Configure additional replication settings), and 4 (Set default DRS launch settings) configure the replication volumes, security groups, data throttling, retention policy, and default launch settings. Keep the default values for these steps.

3. In Step 5, set the Subnet to your recovery public subnet (such as ‘blog-recovery-public-subnet’) and Security groups to the security group you created.

Figure 4: Step 5 of setting up AWS Elastic Disaster Recovery – default Amazon EC2 launch template

4. In Step 6, Review and initialize, review the summary screen and select Configure and initialize.

5. Edit the Default EC2 Launch Template. In Advanced settings, make sure that the Auto assign public IP is set to Yes.

2. Install the AWS Replication Agent on the Hyper-V VM to add it as a source server

1. Download the Windows installer from ‘https://aws-elastic-disaster-recovery-<REGION>.s3.<REGION>.amazonaws.com/latest/windows/AwsReplicationWindowsInstaller.exe’, replacing <REGION> with the AWS Region into which you are replicating. If your source server is a Linux VM, then refer to the agent installation instructions for Linux. Using PowerShell, download the replication agent by running:
```
Invoke-WebRequest "https://aws-elastic-disaster-recovery-<REGION>.s3.<REGION>.amazonaws.com/latest/windows/AwsReplicationWindowsInstaller.exe" -OutFile "AwsReplicationWindowsInstaller.exe"
```
2. In PowerShell, run the agent installer file AwsReplicationWindowsInstaller.exe as an Administrator, using the access key ID and secret access key for your IAM user. This example replicates into the ap-southeast-2 Region, so substitute your own Region as needed:

.\AwsReplicationWindowsInstaller.exe --region ap-southeast-2 --aws-access-key-id <DRSIAMAccessKeyId> --aws-secret-access-key <DRSIAMSecretAccessKey>

When prompted, press ‘Enter’ to replicate all disks. The following image shows the output for a successful installation of the AWS Replication Agent.

Figure 5: Command line output for the replication agent installation on the Windows Server 2019 VM

3. After the replication agent has been successfully installed, you should see the Windows VM added as a source server in the AWS Elastic Disaster Recovery console.

Figure 6: Windows VM under source servers of the Elastic Disaster Recovery console

4. After the initial sync is complete, the source server is ready for recovery and we can initiate a recovery.

Figure 7: Status of the Windows VM source server
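
The download URL and installer arguments in this section follow a fixed pattern, so they are easy to generate per Region when you protect more than one server. The following Python sketch shows one way to do that. The helper names `installer_url` and `installer_command` and the Region format check are assumptions made for illustration, while the URL pattern and the three command-line flags are the ones used in the steps above.

```python
import re
from typing import List

# URL pattern taken from the walkthrough; {region} replaces <REGION>.
INSTALLER_URL = (
    "https://aws-elastic-disaster-recovery-{region}.s3.{region}"
    ".amazonaws.com/latest/windows/AwsReplicationWindowsInstaller.exe"
)

# Loose check for names such as ap-southeast-2 (an assumption of this
# sketch, mainly to catch a leftover "<REGION>" placeholder).
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def installer_url(region: str) -> str:
    """Return the Windows installer URL for the given AWS Region."""
    if not REGION_PATTERN.match(region):
        raise ValueError(f"unexpected Region name: {region!r}")
    return INSTALLER_URL.format(region=region)


def installer_command(region: str, access_key_id: str,
                      secret_access_key: str) -> List[str]:
    """Return the installer invocation as an argument list."""
    if not REGION_PATTERN.match(region):
        raise ValueError(f"unexpected Region name: {region!r}")
    return [
        r".\AwsReplicationWindowsInstaller.exe",
        "--region", region,
        "--aws-access-key-id", access_key_id,
        "--aws-secret-access-key", secret_access_key,
    ]


if __name__ == "__main__":
    print(installer_url("ap-southeast-2"))
```

Keeping the command as an argument list rather than a single string avoids quoting problems, and it makes it easier to keep the secret access key out of logs.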

3. Initiate a recovery

1. Now that the source server is ready for recovery, we can select the VM hostname and initiate a recovery job by selecting the Initiate recovery option.

Figure 8: Initiation of a recovery in the Elastic Disaster Recovery console

2. Choose a point in time from which to launch the recovery instances for the source server.

Figure 9: Select a point in time to initiate the recovery job

3. You can also view the history for every recovery job.

Figure 10: Recovery job history

4. Once the recovery is complete, a new instance is created for the VM. Select the instance ID for more details on the recovered instance.

Figure 11: Windows VM recovered instance
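
The console steps above can also be scripted. The following Python sketch is a hedged illustration of the logic only: it picks out source servers whose replication state indicates that the initial sync has finished, and builds the parameters for a recovery request. It assumes the response shape of the Elastic Disaster Recovery `DescribeSourceServers` API (an `items` list with `sourceServerID` and `dataReplicationInfo.dataReplicationState`) and assumes that `CONTINUOUS` is the state reported for a server that is ready. Verify both against the API reference before relying on them. The server IDs shown are placeholders.

```python
from typing import Dict, List

# Assumed state name for a server whose initial sync has completed.
READY_STATE = "CONTINUOUS"


def ready_source_server_ids(response: Dict) -> List[str]:
    """Return IDs of source servers that look ready for recovery."""
    return [
        item["sourceServerID"]
        for item in response.get("items", [])
        if item.get("dataReplicationInfo", {}).get("dataReplicationState")
        == READY_STATE
    ]


def start_recovery_request(source_server_ids: List[str],
                           is_drill: bool = False) -> Dict:
    """Build keyword arguments for a recovery (or drill) launch.

    No snapshot ID is set, so this sketch does not select a specific
    point in time as the console flow does.
    """
    if not source_server_ids:
        raise ValueError("no source servers are ready for recovery")
    return {
        "isDrill": is_drill,
        "sourceServers": [{"sourceServerID": sid} for sid in source_server_ids],
    }


if __name__ == "__main__":
    # Placeholder response; real IDs come from your own account.
    sample = {
        "items": [
            {"sourceServerID": "s-1111example",
             "dataReplicationInfo": {"dataReplicationState": "CONTINUOUS"}},
            {"sourceServerID": "s-2222example",
             "dataReplicationInfo": {"dataReplicationState": "INITIAL_SYNC"}},
        ]
    }
    print(start_recovery_request(ready_source_server_ids(sample)))
```

With the AWS SDK for Python, the resulting dictionary could be unpacked into the `drs` client's `start_recovery` call. Setting `is_drill=True` would launch a drill instead of a recovery.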

4. Validate the recovered instance

1. Now that the source server has been recovered, validate that it is functional by using a Remote Desktop Protocol (RDP) client to connect to the Windows Server 2019 VM. You should be presented with the Server Manager on the first successful login.

Figure 12: Server Manager after successful login

2. Create an empty text file (for example, ‘test.txt’), which we use to validate that the failback process contains updates made on the recovered instance.
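
Before opening an RDP session, it can help to confirm that the recovered instance is reachable on port 3389. If it is not, the usual causes are the recovery security group rule or a missing public IP address. The following Python sketch performs a plain TCP connection test using only the standard library. The function name `port_is_open` is an assumption of this sketch. The test confirms only that something is listening on the port, not that the Windows login works.

```python
import socket


def port_is_open(host: str, port: int = 3389, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP
        # handshake, and the socket is closed when the block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `port_is_open("<public IP of the recovered instance>")` should return `True` once the instance has finished booting and the security group allows inbound RDP from your address.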

5. Perform a failback of the Windows VM to the on-premises environment

1. Under Recovery instances, the Windows VM has a pending action of Use failback client, which is needed to fail back the Windows VM to the on-premises environment.
2. From the Hyper-V host, download the Failback Client ISO from ‘https://aws-elastic-disaster-recovery-<REGION>.s3.<REGION>.amazonaws.com/latest/failback_livecd/aws-failback-livecd-64bit.iso’, replacing <REGION> with the AWS Region into which you are replicating. Using PowerShell, download the Failback Client ISO by running:
```
Invoke-WebRequest "https://aws-elastic-disaster-recovery-<REGION>.s3.<REGION>.amazonaws.com/latest/failback_livecd/aws-failback-livecd-64bit.iso" -OutFile "aws-failback-livecd-64bit.iso"
```
3. Open the on-premises Hyper-V Manager and create a new VM to act as the failback target for the recovery instance. The parameters used in this post for the New Virtual Machine Wizard are:

a. Name: Elastic Disaster Recovery Failback VM

b. Choose the generation of this virtual machine: Generation 1

c. Startup memory: 10240 MB

d. Connection: Hyper-VSwitch

e. Installation Options

i. Install an operating system from a bootable CD/DVD-ROM. Image file (.iso): C:\<path to file>\aws-failback-livecd-64bit.iso

4. Start the VM. Once the Linux-based Failback Client boots, you are prompted for the following:

a. Enter AWS Region to fail back from: ap-southeast-2

b. Enter a custom Amazon Simple Storage Service (Amazon S3) endpoint (leave empty if not relevant):

c. AWS Access Key: Use the access key ID for your IAM user

d. AWS Secret Access Key: Use the secret access key for your IAM user

e. Instance ID (such as i-xxxx). Enter input: <instance-id of the Windows VM>

5. The failback process maps the recovery volumes, installs the replication software, pairs the replication agent with the failback client, and performs the replication (as shown in the following).

Figure 13: Command line outputs for the failback process on the VM

6. After the data replication has completed, select the instance and perform the Complete failback action through the Elastic Disaster Recovery console.

Figure 14: Completed failback action in the Elastic Disaster Recovery console

7. After the failback has completed successfully, the failback VM becomes the new source server, from which a future recovery can be initiated. Use an RDP client to log in to the failback VM and verify that the empty text file ‘test.txt’ (created in an earlier step) exists.

Figure 15: Successful failback of the Windows Server 2019 VM

Cleaning up

To avoid incurring future charges, remove the resources that were created in setting up the environment. This includes terminating any launched recovery instances and replication servers created as part of the Elastic Disaster Recovery service. See the Amazon EC2 user guide for details on terminating an instance, deleting an EBS volume, or deleting an EBS snapshot.

Conclusion

It’s critical for organizations to maintain business continuity in the event of a disaster, such as a flood or ransomware attack. In this post, we demonstrated how you can protect and recover Microsoft Hyper-V workloads running in an on-premises data center using Elastic Disaster Recovery. We started by initializing the Elastic Disaster Recovery service, followed by installation of the AWS Replication Agent. Once this was complete, we were able to achieve continuous data protection between our source server and the replication server. We then initiated a recovery to the target AWS Region in response to a DR event, and validated our recovery instance. Once we validated our recovery, we proceeded to fail back to the on-premises environment, completing the DR lifecycle.

AWS Elastic Disaster Recovery takes a pilot light approach to DR that minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications using affordable storage, minimal compute, and point-in-time recovery. With Elastic Disaster Recovery, customers can achieve RTOs of minutes and RPOs of seconds.

Refer to the getting started guide for more details on how you can minimize downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications using Elastic Disaster Recovery. Thanks for reading this post. If you have any comments or questions, then don’t hesitate to leave them below.