AWS Architecture Blog
Field Notes: Building a Disaster Recovery site on AWS for your Azure Workload
Customers running workloads on other clouds, like Azure or GCP, can increase resilience and meet compliance requirements by using AWS as their disaster recovery site. CloudEndure Disaster Recovery provides an easy cross-cloud solution for replicating and recovering workloads from other cloud providers to AWS. It automatically converts your source machines so that they boot and run natively on AWS.
In this blog post, I show you how to use CloudEndure Disaster Recovery to build a DR site on AWS if your primary workload is on Azure. I will build connectivity between Azure and AWS according to CloudEndure requirements, install the CloudEndure Agent, fail over a server from Azure to AWS, and finally return to normal operations by failing back from AWS to Azure.
When using CloudEndure Disaster Recovery, your source location can be AWS (for Region to Region replication within AWS), vCenter, or ‘Other Infrastructure’. Other Infrastructure includes physical servers, virtual machines on private cloud, or virtual machines on public cloud providers like Azure or GCP. Using CloudEndure Disaster Recovery, you can achieve a Recovery Point Objective (RPO) of 1 second, and a Recovery Time Objective (RTO) of 5-20 minutes (depending on the operating system used).
The failover process to AWS is the same regardless of source infrastructure. It starts by setting up your CloudEndure Disaster Recovery project, installing the CloudEndure Agent, and completing the initial replication from source (Azure) to target (AWS), and then performing a failover to target (AWS). However, the failback process (discussed later) varies depending on your source infrastructure. During failback, CloudEndure DR reverses the replication direction, in our case from AWS to Azure, so that the data written on your recovered servers during the disaster can be restored to the original source environment (Azure).
The following architectural diagram depicts the solution I build in this post.
For the purpose of this post, I’m using a sample VM that runs Windows Server 2019 Datacenter on Azure in the East US 2 Region. I choose us-east-1 as the recovery target Region in AWS. The process is identical for any number of servers and any other supported operating systems.
I perform the following steps:
- Establish connectivity between Azure and AWS
- Complete the networking requirements for CloudEndure Disaster Recovery
- Register for a CloudEndure DR account, create a DR project, and get an agent installation token
- Install CloudEndure Agent on Azure VM
- Configure Blueprint for target server in the CloudEndure DR User Console
- Perform failover from Azure to AWS
- Perform failback from AWS to Azure
- Return to normal operations
Let’s review the details of each step.
1- Establish connectivity between Azure and AWS
CloudEndure supports connectivity using the public internet, VPN, or AWS Direct Connect. For this demo, I will create a VPN connection between the two sites. Creating VPN connectivity between Azure and AWS is no different from a normal Site-to-Site VPN connection setup between AWS and on-premises. I have to do the configuration in parallel on both sides so that both ends of the connection are in sync.
Assuming you have created a Virtual Network and a Virtual Private Cloud (VPC) on Azure and AWS sides respectively, following are the high-level steps to create a VPN connection between the two. For more details review “Create a Site-to-Site connection in the Azure portal.”
On Azure
1-1- Create Virtual Network Gateway with static routing
1-2- Take a note of the public IP of the gateway created. We will use it in the next step.
On AWS
1-3- Create a Virtual Private Gateway (VGW). Select Amazon default ASN for ASN.
1-4- Attach the VGW you created to the VPC you plan to use to host your DR.
1-5- Create a Customer Gateway (CGW). Select Static for Routing and provide the public IP address for the Virtual Network Gateway on Azure side noted in step 1-2. For Static IP Address enter the network address of your primary site on Azure.
1-6- Create a Site-to-Site VPN Connection. Select the VGW and the customer gateway created in the preceding steps. Select Static routing and provide the IP address range of the VNet on Azure.
1-7- Download the Configuration Template
1-8- We must locate the Pre-Shared Key and the outside IP address of the Virtual Private Gateway in the configuration file. Review the file carefully to understand its content. See my example in the following image.
Let’s go back to the Azure Portal
1-9- Create a Local Network Gateway. Provide the Outside IP address noted in step 1-8. In the Address space field, provide the DR VPC network address on the AWS side.
1-10- Create a connection. You must provide the pre-shared key noted from the configuration file in step 1-8.
1-11- Finally, remember to modify the route table of the subnet on the AWS side so that it routes VPN traffic to the VGW.
The VPN connection we created on the AWS side has two tunnels. I recommend that you set up both tunnels for high availability. Follow the same steps to configure the second tunnel.
At this point, you have a VPN connection between Azure and AWS. In the next step, we must open the ports needed for CloudEndure management and replication between Azure and AWS.
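A static-routed Site-to-Site VPN like the one above only works if the Azure VNet and the AWS VPC use address ranges that do not overlap, because each side routes the other side's range into the tunnel. The following is a minimal sketch of that pre-check using Python's standard `ipaddress` module. The CIDR values in the usage lines are hypothetical examples, not values from my environment.

```python
import ipaddress


def cidrs_overlap(vnet_cidr: str, vpc_cidr: str) -> bool:
    """Return True if the two address ranges share at least one address."""
    vnet = ipaddress.ip_network(vnet_cidr)
    vpc = ipaddress.ip_network(vpc_cidr)
    return vnet.overlaps(vpc)


# Hypothetical example ranges: replace them with your own VNet and VPC CIDRs.
azure_vnet = "10.1.0.0/16"
aws_vpc = "172.31.0.0/16"

if cidrs_overlap(azure_vnet, aws_vpc):
    print("Address ranges overlap: pick a different VPC or VNet range.")
else:
    print("Address ranges are distinct: static routes can point at the tunnel.")
```

If the ranges are distinct, the Azure VNet range is what you enter as the static route on the AWS side, and the VPC range is what you enter in the Local Network Gateway address space on the Azure side.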
2- Complete the networking requirements for CloudEndure
Adjust the security settings on both sides to allow the ports necessary for CloudEndure DR replication and authentication.
Communication over TCP Port 443:
- Between the Source Machines and the CloudEndure Service Manager.
- Between the Staging Area and the CloudEndure Service Manager.
Communication over TCP Port 1500:
- Between the Source Machines and the Staging Area
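To verify these rules from a source machine, a simple TCP connection test is often enough. The sketch below lists the flows above as plain data and adds a small reachability helper built on Python's standard `socket` module. The labels in `REQUIRED_FLOWS` and the function name are my own for illustration, not CloudEndure identifiers.

```python
import socket

# The flows listed above, expressed as (from, to, TCP port).
# The labels are descriptive only and are not CloudEndure identifiers.
REQUIRED_FLOWS = [
    ("source machines", "CloudEndure Service Manager", 443),
    ("staging area", "CloudEndure Service Manager", 443),
    ("source machines", "staging area", 1500),
]


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


for src, dst, port in REQUIRED_FLOWS:
    print(f"{src} -> {dst}: TCP {port}")
```

On the Azure VM, you could call `tcp_port_open` with the address of the endpoint you want to test and the matching port. A port 1500 check can only succeed after the replication server has been launched in the staging area.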
3- Register for a CloudEndure DR account, create a DR project, and get an agent installation token
Before using CloudEndure DR, you need to subscribe to the product on AWS Marketplace. The subscription process is as follows:
- Create a CloudEndure DR account by providing an email address.
- Confirm the registration using the message sent to your email.
- For details, see Registering to CloudEndure Disaster Recovery.
The first step in using CloudEndure is creating a project. You have two types of projects:
- Migration
- Disaster Recovery
Select Disaster Recovery.
After you create a project, you must set up the source and destination environments and the replication settings, and retrieve an authentication token. To complete the setup, you provide credentials (Access key and Secret access key) for an Identity and Access Management (IAM) user in your AWS account that has the permissions needed to make CloudEndure-related API calls. The required permissions are listed in the CloudEndure IAM policy. You use the token to install the CloudEndure Agent on source environment servers, in this case the Azure VM.
The next step is to set your disaster recovery source and target. The source can be an AWS Region if you want to perform Region to Region disaster recovery within AWS. The source is Other Infrastructure if your source environment is a physical datacenter, a virtual environment, or any other public cloud.
For Azure, I choose Other Infrastructure. You also must provide the subnet in which the replication server on AWS will be launched in the staging area and the security group it will use. I keep the instance type for the replication server as default.
After you complete the setup in the Replication Settings screen, you are presented with a How to Add Machines page that gives you the installation instructions for the CloudEndure Agent and the installation token. Take note of these details and get ready to switch to the Azure portal.
4- Install CloudEndure Agent
My source environment is a VM that runs Windows Server 2019 Datacenter and has two disks, of 8 GB and 15 GB. As shown in the preceding screenshot, I download the agent installer for Windows and then RDP to the VM. I run the command to install the agent. This is the output from my screen:
After the installation completes successfully, CloudEndure starts creating the replication server in the staging area using the configurations you set up in the Blueprint (more about the Blueprint in the next step). It will then:
- Authenticate the replication server with the CloudEndure Service Manager
- Download the replication software
- Create staging disks
- Pair the CloudEndure Agent on the source VM with the replication server
- Establish communication between the CloudEndure Agent and the replication server
The replication traffic is encrypted in transit using AES-256 and is carried on TCP port 1500. Make sure you have this port open on the Azure side in the egress direction to AWS. CloudEndure then starts the initial replication. When it finishes, the source VM shows in the dashboard with the Continuous Data Protection (CDP) state in the Data Replication Progress column. At this point, the VM is ready to be tested for disaster recovery.
5- Set up the Blueprint
While I’m waiting for the initial data sync to complete and enter Continuous Data Protection state, I choose Machines in the CloudEndure dashboard to go to the Blueprint page.
The Blueprint is a set of instructions on how to launch a target machine. You can define Blueprint properties such as instance type, subnet, security group, IP address and others.
I select my target VM to run on a machine type that’s a copy of the source. This means CloudEndure will match an instance type on AWS that provides at least the compute and memory of the Azure VM. I also select a subnet that I had already created in my disaster recovery VPC and choose a new IP to be assigned to the target server from that subnet.
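To make the "at least the compute and memory" idea concrete, here is a minimal sketch of a smallest-fit selection. It is not CloudEndure's actual matching logic, which this post does not describe. The function name and the small four-entry catalog are my own assumptions for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float


# Small illustrative catalog; a real one would cover many more instance types.
CATALOG = [
    InstanceType("t3.medium", 2, 4),
    InstanceType("t3.large", 2, 8),
    InstanceType("m5.xlarge", 4, 16),
    InstanceType("m5.2xlarge", 8, 32),
]


def smallest_fit(
    vcpus: int, memory_gib: float, catalog: Sequence[InstanceType] = CATALOG
) -> Optional[InstanceType]:
    """Return the smallest catalog entry with at least the requested resources."""
    candidates = [
        t for t in catalog if t.vcpus >= vcpus and t.memory_gib >= memory_gib
    ]
    # Prefer fewer vCPUs first, then less memory; None if nothing is big enough.
    return min(candidates, key=lambda t: (t.vcpus, t.memory_gib), default=None)


# Hypothetical source VM with 2 vCPUs and 7 GiB of memory.
print(smallest_fit(2, 7))
```

If the automatic match does not suit your workload, you can set an explicit instance type in the Blueprint instead.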
6- Perform Failover from Azure to AWS
The overall process for failing over starts with the replication. As we saw in the previous step, the replication starts with the initial sync and then enters the Continuous Data Protection state.
Multiple Point in Time (PIT) snapshots will then be created. Before you perform an actual failover, I recommend that you launch your target server in Test mode first.
You do that by selecting the machine and then selecting Launch Target Machine > Test Mode. You then must select the snapshot you want to recover from. By default, CloudEndure DR allows you to select from 60 snapshots per disk per month. I select Latest and then Recover.
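Choosing Latest is the usual option, but a point-in-time list also lets you go back to a moment before a problem occurred, such as data corruption. The sketch below shows that selection logic in plain Python. The timestamps and the function name are hypothetical, and in practice you make this choice in the CloudEndure Console rather than in code.

```python
from datetime import datetime
from typing import Optional, Sequence


def pick_recovery_point(
    snapshots: Sequence[datetime], not_after: Optional[datetime] = None
) -> Optional[datetime]:
    """Return the newest snapshot, optionally limited to those at or before not_after."""
    eligible = [s for s in snapshots if not_after is None or s <= not_after]
    return max(eligible, default=None)


# Hypothetical snapshot times for one machine.
snapshots = [
    datetime(2020, 6, 1, 10, 0),
    datetime(2020, 6, 1, 10, 10),
    datetime(2020, 6, 1, 10, 20),
]

latest = pick_recovery_point(snapshots)
before_incident = pick_recovery_point(snapshots, not_after=datetime(2020, 6, 1, 10, 15))
print(latest, before_incident)
```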
After the Test machine has been created successfully, the Disaster Recovery Lifecycle column will show “Tested Recently” with a green bar to the left of the machine name. This means the machine is now ready to be failed over.
Steps for failover
- Launch the machine in Recovery mode. This is similar to launching machines in Test Mode. Select the latest snapshot.
- Perform the actual Failover. In this step you direct your users’ traffic from Azure to AWS. The steps here vary depending on the tool you use for directing traffic.
- CloudEndure will create the final failed-over machine using the configurations you provided in Blueprint in step 5. At this point, the Disaster Recovery Lifecycle column will show “Failed Over”.
7- Perform Failback from AWS to Azure
Whether you perform the disaster recovery failover as part of an annual DR testing exercise or as part of an actual disaster, you may want to fail back to the source environment after the outage/disaster is over. This functionality is called failback.
CloudEndure DR performs failback by resetting your project and reversing the replication direction from AWS to Azure. All data that was written to your recovery machine (on AWS) during the disaster will be copied back to machines in your original source infrastructure (Azure).
The actual steps for failback depend on the source infrastructure. If your source environment is AWS or VMware, the steps are fully orchestrated. If your source infrastructure is physical or another public cloud provider, a Failback Client is necessary.
Steps for Failback
- Prepare the project for failback by selecting the Prepare for Failback option in the Project Actions menu in the User Console.
The project will show “Preparing for failback to original Source” next to the project type. After a few minutes, the Data Replication Progress column will show “Pair the CloudEndure Agent with the Replication Server”. This state will change only after you complete the installation of the failback client on Azure side in the next step.
- Download the Failback Client from Replication Settings. The Failback Client is an Ubuntu ISO image that acts as a standalone replication server. You must install it on the source VM and boot from it so that it initiates the connection needed to start the replication in the reversed direction, from AWS to Azure.
Prepare Failback client on Azure
Azure currently supports creating virtual machines from VHD. To use the CloudEndure Failback Client, you first need to convert it from ISO to VHD format. The detailed steps to do this are outside the scope of this post, but, at a high level, this is what you do:
- Deploy the failback ISO image to VMware Workstation.
- Once the installation is complete, power off the VM and copy the resulting VMDK file.
- Use the following PowerShell script to convert the VMDK file into VHD. (You must change the file locations to reflect your actual environment.)
# Load the Microsoft Virtual Machine Converter (MVMC) cmdlets
Import-Module 'C:\Program Files\Microsoft Virtual Machine Converter\MvmcCmdlet.psd1'
# Convert the VMDK file to a fixed-size VHD
ConvertTo-MvmcVirtualHardDisk -SourceLiteralPath 'C:\Users\Administrator\Documents\Virtual Machines\failbackdvd\failbackdvd.vmdk' -VhdType FixedHardDisk -VhdFormat Vhd -DestinationLiteralPath 'C:\DR'
- Upload the resulting VHD file to your Storage account on Azure using the steps in Create an Azure Storage Account.
- Create an image using the VHD file you uploaded.
- Create a new VM using the image you created. This will act as the replication server. Later I will use it to restore the machine data from AWS to Azure after the replication completes. Make sure you attach the same disks to it as on the source machine on AWS, as described in step 4.
- When I start the newly created VM, I see the following screen indicating that the CloudEndure Failback Client has loaded and is waiting for the CloudEndure authentication token to authenticate with the CloudEndure Console. Details on where to retrieve your authentication token from the CloudEndure Console are available in Installing the Agents.
I enter the authentication token and then the ID of the machine I want to replicate. Details on how to find a machine ID in the CloudEndure Console are available in the Machines Page. The replication server will authenticate with the CloudEndure Console and will connect to the source machine on AWS on TCP port 1500 to initiate the replication. Make sure you have a security group that allows this connectivity on both the Azure and AWS sides.
I wait for the replication to complete and enter Continuous Data Protection state on the CloudEndure Console. This indicates the replication direction has been reversed successfully.
I go back to CloudEndure and follow the same steps as in step 6 to launch a Test and then a Recovery instance. Note that when you launch a Test or Recovery instance, the Failback Client reboots as part of the launch process. Make sure to manually detach the volume that contains the Failback Client.
The CloudEndure Console will now show “Pair with CloudEndure Agent”. This status will change only when you return your project to normal operation in the next step.
8- Return to normal operation
At this point, we failed over from Azure to AWS as part of a disaster recovery exercise or an actual disaster. We then performed a failback from AWS to Azure by reversing the replication direction so that all data written during the outage/disaster on AWS side will be replicated back to Azure.
The last step in the process is to return to normal operations. This again reverses the replication so that it runs from the original primary location in Azure to the original recovery target in AWS. You do so by selecting Project Actions > Return to Normal Operation.
The failback has now completed and your replication has been reversed back to its original direction from Azure to AWS.
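The full cycle from steps 6 through 8 can be summarized as a small state machine. The sketch below is only a mental model. The phase and action names are my own labels, not CloudEndure API or console identifiers.

```python
from enum import Enum


class Phase(Enum):
    PROTECTING = "replicating Azure -> AWS"
    FAILED_OVER = "workload running on AWS"
    PREPARING_FAILBACK = "project prepared for failback"
    FAILING_BACK = "replicating AWS -> Azure"
    FAILED_BACK = "workload running on Azure again"


# (current phase, action) -> next phase. Action names are illustrative labels.
TRANSITIONS = {
    (Phase.PROTECTING, "launch_recovery"): Phase.FAILED_OVER,
    (Phase.FAILED_OVER, "prepare_for_failback"): Phase.PREPARING_FAILBACK,
    (Phase.PREPARING_FAILBACK, "pair_failback_client"): Phase.FAILING_BACK,
    (Phase.FAILING_BACK, "launch_recovery"): Phase.FAILED_BACK,
    (Phase.FAILED_BACK, "return_to_normal_operation"): Phase.PROTECTING,
}


def advance(phase: Phase, action: str) -> Phase:
    """Return the next phase, or raise ValueError if the action does not apply."""
    try:
        return TRANSITIONS[(phase, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid while {phase.name}") from None


phase = Phase.PROTECTING
for action in (
    "launch_recovery",
    "prepare_for_failback",
    "pair_failback_client",
    "launch_recovery",
    "return_to_normal_operation",
):
    phase = advance(phase, action)
    print(f"{action}: {phase.value}")
```

The loop ends where it started, with replication running from Azure to AWS, which is the state the project is in after Return to Normal Operation.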
Clean Up
At the end of this exercise, make sure you delete the resources you created on both the AWS and Azure sides, so you don’t incur charges in the future.
Conclusion
In this blog post, we built a demo environment to demonstrate CloudEndure DR functionality with Azure as the DR source site and AWS as the DR target. I showed you how to create a VPN tunnel between the sites, start replication, fail over from Azure to AWS, and then fail back from AWS to Azure (using the CloudEndure Failback Client). Finally, I showed you how to return your project to normal operation by reverting the replication to its original direction, from Azure to AWS. More details can be found in CloudEndure Disaster Recovery.
If you have questions, feel free to ask in the comments. I look forward to hearing from you.