AWS Storage Blog

Private cross-Region disaster recovery with AWS Elastic Disaster Recovery

Editor’s note: Before reading this post about cross-Region disaster recovery using private connectivity, learn how to install an AWS DRS agent in a secured network in this related blog post.

Update 5/7/2024: Steps for creating a security group that allows port 443 from the source VPC added to section “2. Establish private connectivity” and steps for creating an S3 interface endpoint with private DNS enabled in the staging VPC added to section “3. Initialize Elastic Disaster Recovery and install replication agents.”


Many companies are moving their workloads to AWS to benefit from its highly resilient, cost-effective nature. Although the cloud is highly resilient, it’s still worth preparing for rare but high-impact disaster scenarios. Some companies want to put a disaster recovery (DR) solution in place but must maintain secure and private connectivity for security or compliance reasons.

In this post, I show how you can use AWS Elastic Disaster Recovery to set up a cross-Region disaster recovery solution that uses private connectivity — connectivity that doesn’t traverse the public internet. With this solution, you can make sure your critical data and workloads are safe with a DR solution that meets security and compliance needs.

Solution overview

The following architecture diagram shows what I build in this post. All communication stays on the AWS backbone network and never traverses the public internet. This architecture uses VPC peering, but you can replace it with AWS Transit Gateway if desired. For more information, see the blog post “Building a global network using AWS Transit Gateway Inter-Region peering.”

Figure: Architecture diagram of private cross-Region disaster recovery with AWS Elastic Disaster Recovery

Prerequisites

For this walkthrough, you should have the ability to:

Walkthrough

At a high level, you’ll perform the following steps:

  1. Create an AWS account, Amazon Virtual Private Clouds (VPCs), and subnets for DR purposes.
  2. Establish private connectivity between the source and target Regions, as well as with Elastic Disaster Recovery.
  3. Initialize Elastic Disaster Recovery and install replication agents.

Terminology

When going through the steps in this post, I switch back and forth between different AWS accounts, Regions, and VPCs. It’s important to be aware of which account, Region, and VPC you’re changing. The following table helps explain some of the terminology included in this post.

Term | Description
Source account | AWS account that you want to set up for DR
Target account | AWS account that will house your DR resources (different from your source account)
Source Region | AWS Region that contains the resources that you want to include in DR. Example: us-east-1
Target Region | AWS Region that will house your DR resources (different from the source Region). Example: us-west-1
Source VPC | VPC that contains the resources you’re setting up for DR
Staging VPC | VPC that Elastic Disaster Recovery uses as its replication staging area (not the VPC into which recovered resources are launched)
Target VPC | VPC that will contain the recovered resources when a drill or recovery event is performed

1. Create AWS DR account and VPCs

First, create a new AWS account (target account) to house the resources that will be created for DR purposes.

Next, create two new VPCs within the target account.

  1. Create a staging VPC (target Region)
    • You only need to create private subnets in this VPC (NAT gateway not required).
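    • This VPC’s Classless Inter-Domain Routing (CIDR) block can’t be the same as, or overlap with, the source VPC’s CIDR block, because VPC peering isn’t possible between VPCs with overlapping CIDR blocks.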
    • Both DNS hostnames and DNS resolution must be enabled.
  2. Create a target VPC (target Region)
    • If your recovered resources in the target VPC must keep the same private IP addresses they had in the source VPC, give this target VPC the same CIDR block as the source VPC.
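Before creating the VPCs, the CIDR rules above can be checked in a few lines. The following is a minimal Python sketch, assuming you have already chosen candidate CIDR blocks; the function `check_cidr_plan` and the example CIDRs are hypothetical and not part of any AWS tooling. It encodes the rule that peered VPCs can’t have overlapping CIDR blocks, plus the same-CIDR requirement for keeping private IPs.

```python
import ipaddress
from typing import List


def check_cidr_plan(source_cidr: str, staging_cidr: str, target_cidr: str,
                    preserve_ips: bool) -> List[str]:
    """Return problems with a (hypothetical) DR CIDR plan; an empty list means OK."""
    source = ipaddress.ip_network(source_cidr)
    staging = ipaddress.ip_network(staging_cidr)
    target = ipaddress.ip_network(target_cidr)
    problems = []
    # The staging VPC is peered with the source VPC, so their CIDRs must not overlap.
    if source.overlaps(staging):
        problems.append(f"staging {staging} overlaps source {source}")
    # Keeping private IPs after recovery requires the target VPC to mirror the source CIDR.
    if preserve_ips and target != source:
        problems.append(f"target {target} must equal source {source} to preserve IPs")
    return problems


# Example plan using hypothetical CIDRs.
print(check_cidr_plan("10.0.0.0/16", "10.1.0.0/16", "10.0.0.0/16", preserve_ips=True))  # []
```

The target VPC is allowed to overlap the source VPC here on purpose: it is never peered with anything, which is what makes IP preservation possible.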

2. Establish private connectivity

Private connectivity (that is, no communication goes over the public internet) must be established in two places:

  1. Communications between the source VPC and the staging VPC
  2. Communications between the staging VPC and Elastic Disaster Recovery

The following steps set up this private communication:

  1. Establish VPC peering between the source VPC (source Region) and the staging VPC (target Region).
    • This allows the replication from the source VPC (source Region) to the staging VPC (target Region) to take place privately.
    • At this point, choose the staging VPC (target Region) subnet into which you want Elastic Disaster Recovery to deploy its replication servers. Note this subnet ID, because you use it later.
    • While establishing your VPC peering connection, update the VPC route tables so that traffic flows between the source VPC and the staging VPC over the peering connection. The following table shows the routes that each VPC’s route table needs. Both route tables point to the same peering connection ID.
Route table | Destination | Target
Source VPC | <Source VPC CIDR> | local
Source VPC | <Staging VPC CIDR> | pcx-<peering connection ID>
Staging VPC | <Staging VPC CIDR> | local
Staging VPC | <Source VPC CIDR> | pcx-<peering connection ID>
  2. Create a security group (target Region) for the VPC endpoints that you set up in the next step.
    • Ensure that the security group allows HTTPS traffic from the source VPC CIDR as well as from the staging area subnet.
    • Configure the inbound and outbound rules with the following:
Type | Protocol | Port range | Source/Destination
HTTPS | TCP | 443 | <Staging VPC CIDR>
HTTPS | TCP | 443 | <Source VPC CIDR>
DNS | UDP | 53 | <Staging VPC CIDR>
Custom TCP | TCP | 1500 | <Staging VPC CIDR>
  3. Create VPC endpoints (target Region).
    • These let your staging VPC subnet communicate with Elastic Disaster Recovery privately.
    • Create the following VPC endpoints, and attach the security group you created previously to the interface endpoints (gateway endpoints don’t use security groups).
      1. The VPC interface endpoint for Elastic Disaster Recovery that is associated with the staging VPC subnet you noted previously.
        • Service name: com.amazonaws.<target-region>.drs
        • Make sure to check the box for Additional settings -> Enable DNS name.
      2. VPC interface endpoint for Amazon Elastic Compute Cloud (Amazon EC2) that is associated with the staging VPC subnet you noted previously.
        • Service name: com.amazonaws.<target-region>.ec2
        • Make sure to check the box for Additional settings -> Enable DNS name.
      3. VPC interface endpoint for Amazon Simple Storage Service (Amazon S3) that is associated with the staging VPC subnet you noted previously.
        • Service name: com.amazonaws.<target-region>.s3
        • Make sure to check the box for Additional settings -> Enable DNS name.
      4. VPC gateway endpoint for Amazon S3 that is associated with the route table assigned to the staging VPC subnet you noted previously.
        • Service name: com.amazonaws.<target-region>.s3
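To reason about which connections the endpoint security group admits, the inbound rules from the security group table in this section can be modeled as data. This is a minimal Python sketch under stated assumptions: the CIDRs are hypothetical (10.0.0.0/16 for the source VPC, 10.1.0.0/16 for the staging VPC), `Rule` and `is_allowed` are illustrative names rather than an AWS API, and the model ignores outbound rules, port ranges, and connection tracking. It relies on security groups containing only allow rules, so a connection is admitted if any single rule matches.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp" or "udp"
    port: int      # single port; real rules can also specify ranges
    cidr: str      # allowed source CIDR


def is_allowed(rules: List[Rule], protocol: str, port: int, src_ip: str) -> bool:
    """Security groups only contain allow rules: traffic passes if any rule matches."""
    addr = ipaddress.ip_address(src_ip)
    return any(
        r.protocol == protocol and r.port == port and addr in ipaddress.ip_network(r.cidr)
        for r in rules
    )


SOURCE_CIDR = "10.0.0.0/16"   # hypothetical source VPC CIDR
STAGING_CIDR = "10.1.0.0/16"  # hypothetical staging VPC CIDR

# Inbound rules taken from the security group table.
ENDPOINT_SG_RULES = [
    Rule("tcp", 443, STAGING_CIDR),
    Rule("tcp", 443, SOURCE_CIDR),
    Rule("udp", 53, STAGING_CIDR),
    Rule("tcp", 1500, STAGING_CIDR),
]

print(is_allowed(ENDPOINT_SG_RULES, "tcp", 443, "10.0.12.34"))   # source server: True
print(is_allowed(ENDPOINT_SG_RULES, "tcp", 443, "192.168.1.5"))  # outside both VPCs: False
```

Dropping the source VPC rule from the list and rerunning the first check is a quick way to see why HTTPS from the source VPC CIDR matters for agents reaching the endpoint over the peering connection.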

If you’re familiar with VPC peering and want to use the same IPs in your target VPC, you may wonder why the identical source VPC and target VPC CIDR blocks don’t conflict. This is one of the benefits of Elastic Disaster Recovery: it replicates servers into a staging area, so you only need to peer the source VPC with the staging VPC (not the target VPC). Elastic Disaster Recovery (with the help of standard Amazon EC2 launch templates) then deploys the staged servers into the target VPC during a drill or recovery event.
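The route tables from step 2 make this concrete. The following minimal Python sketch models them and performs a longest-prefix-match lookup, which is how a VPC route table picks the most specific matching route. The CIDRs, the peering connection ID `pcx-0example`, and the `lookup` helper are hypothetical. The target VPC’s CIDR (identical to the source VPC’s) never appears as a peering destination, so there is nothing for it to conflict with.

```python
import ipaddress
from typing import Optional

SOURCE_CIDR = "10.0.0.0/16"   # hypothetical; the target VPC reuses this CIDR
STAGING_CIDR = "10.1.0.0/16"  # hypothetical; must not overlap the source VPC
PCX_ID = "pcx-0example"       # hypothetical peering connection ID (shared by both sides)

# Route tables from step 2: (destination CIDR, target).
ROUTE_TABLES = {
    "source": [(SOURCE_CIDR, "local"), (STAGING_CIDR, PCX_ID)],
    "staging": [(STAGING_CIDR, "local"), (SOURCE_CIDR, PCX_ID)],
}


def lookup(vpc: str, dest_ip: str) -> Optional[str]:
    """Return the target of the most specific route matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[vpc]
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the traffic is dropped
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup("source", "10.1.4.20"))   # replication server in staging: pcx-0example
print(lookup("staging", "10.0.12.34"))  # source server: pcx-0example
print(lookup("source", "10.0.12.34"))   # stays inside the source VPC: local
```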

3. Initialize Elastic Disaster Recovery and install replication agents

Now that the accounts, VPCs, and private communication paths are established, set up Elastic Disaster Recovery and get the replication agents installed.

  1. Initialize Elastic Disaster Recovery (target Region) by setting up the default replication settings
    • In the Staging area subnet setting, choose the staging VPC (target Region) subnet that you noted earlier, which is the subnet your VPC endpoints are associated with.
    • In the Data routing and throttling settings, choose “Use private IP…”

Figure: Data routing and throttling settings with “Use private IP” selected

2. Install the replication agent on your source VPC resources (source Region) by following the agent installation steps in the Elastic Disaster Recovery documentation.

3. Optional: If recovered resources in the target VPC must keep the same private IP addresses they had in the source VPC, do the following after the source server appears in Elastic Disaster Recovery:

    • Edit the launch settings and change Copy private IP to Yes.

Figure: Launch settings with Copy private IP set to Yes

    • Edit the Amazon EC2 launch template and change the subnet to a subnet in the target VPC that you created earlier. Because the target VPC uses the same CIDR as the source VPC, choose the subnet whose CIDR range contains the source server’s private IP address.

Figure: Subnet setting in the Amazon EC2 launch template
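With Copy private IP enabled, the launch template subnet must contain the source server’s private IP address; otherwise the recovered instance can’t be given that address. The following minimal Python sketch picks that subnet from a list of target VPC subnets. The subnet IDs, CIDRs, and the `subnet_for_ip` helper are hypothetical; in practice you would read the subnets from your own target VPC. It also accounts for the fact that AWS reserves the first four IP addresses and the last IP address in every subnet.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical subnets in the target VPC (same CIDR as the source VPC).
TARGET_SUBNETS = {
    "subnet-aaaa": "10.0.0.0/24",
    "subnet-bbbb": "10.0.1.0/24",
    "subnet-cccc": "10.0.2.0/24",
}


def subnet_for_ip(private_ip: str, subnets: Dict[str, str]) -> Optional[str]:
    """Return the ID of the subnet that can hold private_ip, or None if none can."""
    addr = ipaddress.ip_address(private_ip)
    for subnet_id, cidr in subnets.items():
        net = ipaddress.ip_network(cidr)
        if addr not in net:
            continue
        # AWS reserves the first four addresses and the last address of each subnet.
        offset = int(addr) - int(net.network_address)
        if offset < 4 or addr == net.broadcast_address:
            return None
        return subnet_id
    return None


print(subnet_for_ip("10.0.1.57", TARGET_SUBNETS))  # subnet-bbbb
```

If this returns None for a source server, the target VPC’s subnet layout doesn’t mirror the source VPC closely enough for that server to keep its IP.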

Troubleshooting

You may see the following issue.

Sync fails on: authenticate with service

This error means that the Elastic Disaster Recovery replication server deployed in your staging VPC can’t connect to the Elastic Disaster Recovery service endpoint on port 443, which usually points to a problem with your VPC endpoint setup. Check that your VPC endpoints are associated with the same subnet into which you configured the replication server to deploy. Also make sure that the security group attached to your VPC endpoints allows this traffic inbound and outbound.
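The subnet check above, together with the endpoint settings from step 2, can be scripted against an inventory of your endpoints. This is a minimal Python sketch: the `Endpoint` record, its fields, the `diagnose` helper, and all IDs are hypothetical stand-ins for data you would gather yourself (for example, from the VPC console), not the shape of any AWS API response. It checks only the interface endpoints, and it doesn’t evaluate security group rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Endpoint:
    service_name: str                 # e.g. "com.amazonaws.us-west-1.drs"
    endpoint_type: str                # "Interface" or "Gateway"
    subnet_ids: List[str] = field(default_factory=list)
    private_dns: bool = False


def diagnose(endpoints: List[Endpoint], region: str, staging_subnet_id: str) -> List[str]:
    """List likely causes of 'authenticate with service' failures (hypothetical checks)."""
    problems = []
    interface = {e.service_name: e for e in endpoints if e.endpoint_type == "Interface"}
    for svc in ("drs", "ec2", "s3"):
        name = f"com.amazonaws.{region}.{svc}"
        ep = interface.get(name)
        if ep is None:
            problems.append(f"missing interface endpoint {name}")
            continue
        if staging_subnet_id not in ep.subnet_ids:
            problems.append(f"{name} is not associated with {staging_subnet_id}")
        if not ep.private_dns:
            problems.append(f"{name} does not have private DNS enabled")
    return problems


# Hypothetical inventory: the EC2 endpoint is in the wrong subnet, and S3 is missing.
endpoints = [
    Endpoint("com.amazonaws.us-west-1.drs", "Interface", ["subnet-staging"], True),
    Endpoint("com.amazonaws.us-west-1.ec2", "Interface", ["subnet-other"], True),
]
for problem in diagnose(endpoints, "us-west-1", "subnet-staging"):
    print(problem)
```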

Cleaning up

To avoid incurring unwanted future charges, delete the VPC endpoints you created and stop replication in Elastic Disaster Recovery for any source servers you set up.

Conclusion

In this post, I showed you how to use Elastic Disaster Recovery to set up a cross-Region disaster recovery solution that uses private connectivity. This is important for companies that want a DR solution in place but can’t use the public internet for security reasons. Elastic Disaster Recovery allows these companies to plan and prepare for unexpected outages by creating a resilient architecture.

As a bonus, the DR account and resources that we set up can also be used to replicate on-premises servers to AWS, with DR infrastructure that is spun up only for a drill or recovery event. Elastic Disaster Recovery provides a cost-efficient way to maintain reliable DR infrastructure for your on-premises footprint as well as your cloud footprint.

Thank you for reading this post. If you have any comments or questions, leave them in the comments section.