Networking & Content Delivery

Building highly resilient applications using Amazon Route 53 Application Recovery Controller, Part 2: Multi-Region stack

This is the second in a two-part blog post series about using the recently launched Amazon Route 53 Application Recovery Controller (Route 53 ARC) service. In Part 1, we introduced a single-Region stack, and set up Route 53 ARC features like routing controls, readiness checks, and safety rules to simplify recovery.

In this post, we add a Disaster Recovery (DR) Region to the stack, in the AWS US West (Oregon) Region or us-west-2. By configuring a multi-Region stack with Route 53 ARC, you can build resilient failover recovery mechanisms.

Sample multi-Region infrastructure stack

The multi-Region stack design, as shown in Figure 1, supports an active-standby setup. In this design, the primary (active) Region is US East (N. Virginia) or us-east-1, and the recovery (standby) Region is us-west-2.

Figure 1: Diagram illustrating a multi-Region active-standby AWS deployment

The following describes the architecture of the multi-Region stack in AWS:

  1. We configured our domain in Amazon Route 53 with the same combination of weighted and failover-based routing policies that we used for our single Region deployment, described in Part 1 of this blog post series. Similarly, we leveraged Route 53 ARC routing controls and Route 53 health checks to allow application failover.
  2. The primary and recovery Regions each have cell boundaries defined around Availability Zones (AZs) in the respective Regions. Each AZ has its own Network Load Balancer (NLB), and an associated Auto Scaling group (ASG) that launches web application servers.
  3. We expanded the single Region Aurora database cluster that we used for the single-Region stack to an Amazon Aurora global database, with three additional readers in the standby Region.

We’ve provided a CloudFormation template (infra-stackset) that you can use to deploy this multi-Region stack in your AWS account. You can use the template to create the stack, and then follow along with this post as we set up Route 53 ARC structures with the sample stack. For setup instructions, see the readme file that comes with the template.

Now let’s see how the Route 53 ARC features apply to the multi-Region stack. The Route 53 ARC features and components that we discuss throughout this post are also available to deploy in your account by using a second CloudFormation template (arc-stack). If you experiment and explore for yourself while you follow along with the steps in this blog post, it can help you understand the features and behaviors of Route 53 ARC at a deeper level.

Failover readiness

Let’s begin by configuring and using the readiness check feature in Route 53 ARC with our multi-Region stack.

Set up readiness checks and cells

It’s straightforward to expand readiness checks from a single-Region stack to a multi-Region stack. We add a Regional cell to the existing recovery group, to represent the entire recovery Region’s stack. The new regional cell has three nested cells, each representing one of the AZs in the recovery Region. The readiness check setup in Route 53 ARC for our multi-Region application, with the recovery group and Regional and zonal cells, is shown in Figure 2.

Figure 2: Recovery group and readiness check setup for the multi-Region stack

Next, we add the Amazon Resource Names (ARNs) of the new resources in this Region to the existing resource sets that we created for the single-Region stack: the NLB, Auto Scaling group, and Aurora database (DB) cluster. The Route 53 ARC readiness checks now continuously evaluate the corresponding readiness rules for the resources deployed across the cells in both Regions. The readiness checks return a status of Ready, Not ready, Unknown, or Not authorized for each cell, for each resource set, and for the recovery group.
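If you prefer to script these changes rather than use the console, here’s a minimal boto3 sketch of updating a resource set with the new Region’s ARNs and then summarizing readiness for the recovery group. The resource set name, recovery group name, and ARNs are placeholders; the actual names come from the arc-stack template. Note that the Route 53 ARC control plane APIs, including readiness, are served from the us-west-2 Region.

```python
# Minimal sketch: extend a resource set to cover both Regions, then check
# readiness. Names and ARNs below are placeholders for the arc-stack resources.
import boto3

# The Route 53 ARC readiness (control plane) API is hosted in us-west-2.
readiness = boto3.client("route53-recovery-readiness", region_name="us-west-2")

NLB_ARNS = [
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/primary-nlb-a/abc",   # placeholder
    "arn:aws:elasticloadbalancing:us-west-2:111122223333:loadbalancer/net/recovery-nlb-a/def",  # placeholder
    # ... remaining NLB ARNs for both Regions
]

# UpdateResourceSet replaces the resource list, so include ARNs from both Regions.
readiness.update_resource_set(
    ResourceSetName="nlb-resource-set",                        # placeholder name
    ResourceSetType="AWS::ElasticLoadBalancingV2::LoadBalancer",
    Resources=[{"ResourceArn": arn} for arn in NLB_ARNS],
)

# Summarize readiness for the whole recovery group.
summary = readiness.get_recovery_group_readiness_summary(
    RecoveryGroupName="multi-region-app"                       # placeholder name
)
print("Recovery group readiness:", summary["Readiness"])
```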

The readiness check status for each Regional and zonal cell, as well as the status for each resource set, is shown in Figure 3. The readiness check for each cell has evaluated to Ready across all the different readiness rules applied for the NLB, Auto Scaling group, and Aurora DB cluster in both Regions. Because the statuses have evaluated to Ready, this helps confirm that the configurations match and that the runtime checks for these resources are passing.

Figure 3: Readiness checks in the AWS console showing all statuses as Ready

Readiness scenario

With our readiness checks set up for the example multi-Region stack, let’s look at how they can help us evaluate failover readiness. We’ll need to keep in mind that, because Route 53 ARC readiness checks are not highly available, we should not depend on the checks being accessible during an outage. In addition, the resources that are checked might also not be available during a disaster event. Readiness checks are most useful for verifying, on an ongoing basis, that application replica configurations and runtime states are aligned. A readiness check shouldn’t be relied on to be a primary trigger for failover during a disaster event.

Now let’s consider, for example, a scenario where the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for our application give us the flexibility to optimize for cost and use fewer resources in our recovery Region.

With these flexibilities in mind, we operate the recovery Region of our multi-Region stack in a pilot light mode, as shown in Figure 4. We configure the Auto Scaling groups in the Region to have zero instances and set up the Aurora global database cluster to have only one active reader instance.

Figure 4: Recovery readiness with a pilot light readiness scenario

In this pilot light mode, the Route 53 ARC console quickly shows us that the resources and configurations in the two Regions don’t match, as shown in Figure 5. The console status shows Not ready for all resources.

Figure 5: Readiness checks in the AWS console in pilot light mode with all statuses Not ready

Let’s look at how the readiness rules influence the cell and readiness statuses. As expected, the ELB readiness check status is Not ready because the NLBs in the recovery Region have no registered target instances. The Aurora database readiness check is Not ready because there is a difference in the deployed runtime capacity between the two Regions. And finally, the Auto Scaling group readiness check is Not ready because of the difference in the minimum and maximum instance counts in the Auto Scaling group configuration between the primary and recovery Regions.

Note that the readiness check status is Not ready for resources across all cells, even though we only scaled down the Auto Scaling group in the recovery Region. This is because either the primary or the recovery Region configuration might be the baseline that you want for your application, and therefore Route 53 ARC marks all the cell resources for both Regions as Not ready.

With a pilot light setup, by design, the configuration across both Regions does not match at steady state, so the Route 53 ARC readiness check statuses are Not ready, as we’ve seen. However, you can still use Route 53 ARC to check whether the resources in your recovery Region are ready before you consider a failover. When you need to fail over, scale up the pilot light environment in your recovery Region to match the capacity and configuration in your primary Region, as shown in the sketch that follows. After the scale up is complete, the Route 53 ARC readiness checks all show as Ready.
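For example, the scale-up step could be scripted. The following is a minimal boto3 sketch, assuming hypothetical Auto Scaling group names and capacity values, that brings the recovery Region’s Auto Scaling groups back in line with the primary Region before a failover.

```python
# Minimal sketch: scale the pilot light recovery Region up to match the primary
# Region's Auto Scaling configuration. Group names and sizes are placeholders.
import boto3

asg = boto3.client("autoscaling", region_name="us-west-2")

# Placeholder names for the per-AZ Auto Scaling groups in the recovery Region.
RECOVERY_ASGS = ["app-asg-us-west-2a", "app-asg-us-west-2b", "app-asg-us-west-2c"]

for name in RECOVERY_ASGS:
    asg.update_auto_scaling_group(
        AutoScalingGroupName=name,
        MinSize=1,             # match the primary Region's configuration
        MaxSize=3,
        DesiredCapacity=1,
    )
```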

This pilot light example intentionally includes a large difference in the resources between the two Regions, to help us clearly illustrate the effect on Route 53 ARC readiness checks for this blog post. But you would see similar status differences in other scenarios. For example, the readiness check statuses in your application are also affected when you roll out a new version of your application to a Region, or when there’s an application infrastructure event in one Region.

You should proactively monitor the status of readiness checks by using the Route 53 ARC ReadinessChecks metric in CloudWatch, so that you’re aware of drift between cells. You can configure CloudWatch alarms at an account level to track the number of readiness checks in each state and issue alerts through Amazon Simple Notification Service (Amazon SNS) when thresholds are breached. You should still make decisions about whether to fail away from or to a cell based on your monitoring and health check systems, and consider readiness checks a complementary service to those systems.
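As a complement to the built-in CloudWatch metric and alarms, you could also poll the readiness check statuses directly and alert on drift. The following is a minimal boto3 sketch, with a placeholder SNS topic ARN, that reports any readiness check that isn’t READY.

```python
# Minimal sketch: list all readiness checks, flag any that aren't READY, and
# publish an alert to an SNS topic (ARN is a placeholder). Run on a schedule.
import boto3

readiness = boto3.client("route53-recovery-readiness", region_name="us-west-2")
sns = boto3.client("sns", region_name="us-west-2")

TOPIC_ARN = "arn:aws:sns:us-west-2:111122223333:readiness-alerts"  # placeholder

# Collect all readiness checks, following pagination tokens.
checks, token = [], None
while True:
    page = readiness.list_readiness_checks(**({"NextToken": token} if token else {}))
    checks.extend(page["ReadinessChecks"])
    token = page.get("NextToken")
    if not token:
        break

# Report every check that isn't READY.
not_ready = []
for check in checks:
    status = readiness.get_readiness_check_status(
        ReadinessCheckName=check["ReadinessCheckName"]
    )
    if status["Readiness"] != "READY":
        not_ready.append(f'{check["ReadinessCheckName"]}: {status["Readiness"]}')

if not_ready:
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Route 53 ARC readiness drift detected",
        Message="\n".join(not_ready),
    )
```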

Multi-Region failover

Route 53 ARC enables us to manage failover for multiple layers of the stack from one central location, in addition to acting as a single pane of glass to view our stack’s failover readiness, as we described earlier in the readiness scenario. Let’s see how Route 53 ARC can facilitate multi-Region failover.

Failure scenario

Consider a scenario where your monitoring systems have detected that users of the multi-Region application stack are observing errors. The current primary Region that receives all user traffic is US East (N. Virginia) or us-east-1. When you troubleshoot further, you learn the following details:

  • The errors are occurring across all cells in the Region.
  • Instances are failing load balancer health checks and are being terminated by the Auto Scaling group.
  • Instances are having trouble coming back online.
  • Preliminary checks don’t find anything wrong with your application code.

With no cause identified and without more information immediately available, you decide to limit further customer impact by triggering a failover to your recovery Region, us-west-2.

To perform the failover with minimal impact and downtime for users, you plan to take the following steps, in order:

  1. Stop accepting traffic in the primary Region (us-east-1), to prevent writes to the database while you’re failing it over.
  2. Fail over your database to the recovery Region (us-west-2) and make sure it’s ready to accept writes.
  3. Route user traffic to us-west-2, making it the new active Region.

First, let’s discuss how Route 53 ARC supports the failover steps. The setup for the multi-Region stack in Route 53 ARC builds on the configuration that we set up for the single-Region stack in Part 1 of this blog post series. For multi-Region, however, we create two additional sets of routing controls: one set to fail over the database and one set to fail over regional user traffic.

Failing over your database

Typically, before you fail over your application, your database must already be failed over and ready to serve users in the recovery Region. So, our first step demonstrates how to fail over your database by using Route 53 ARC routing controls and a Lambda function.

As part of our multi-Region stack, we’ve set up an Aurora global database cluster with the writer in the primary Region (us-east-1) and cross-Region read replicas in the recovery Region (us-west-2). Before we fail over user traffic to the recovery Region, we must first promote the reader instance there to become the writer.

To facilitate database failover using Route 53 ARC, we create routing controls for the database, as shown in Figure 6, for each Region: DB Routing Control 1 and DB Routing Control 2. However, we don’t create corresponding Route 53 health checks for these routing controls, because we don’t manage global database failover with DNS. You only add health checks to routing controls in Route 53 ARC when you want to use DNS to reroute traffic for failover. For database failover, we instead use a Lambda function with Aurora API operations.

First, the Lambda function polls the state of the Route 53 ARC routing controls for the database in each Region, to see whether the state is ON or OFF, by calling the highly available Route 53 ARC data plane endpoints. Then, it checks the Aurora global database cluster status to see which Region currently hosts the Primary cluster. Finally, based on the states of the routing controls and database cluster for the primary and recovery Regions, the Lambda function initiates a failover by using an Aurora API operation, if required.

Figure 6: Database routing controls and failover Lambda functions per Region

It’s critical to build resilience for failover operations. Route 53 ARC reads and updates routing control states as a data plane operation, and it hosts routing controls redundantly across cluster endpoints in five Regions. For maximum resilience, the Lambda function queries each Regional cluster endpoint until it receives a valid response with a routing control state. In addition, we deploy two identical functions, referred to here as the Failover Lambda, one in each Region. The redundant functions improve resilience and availability during failure events.
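The following is a minimal boto3 sketch of that endpoint-iteration pattern. The cluster endpoints and routing control ARN are placeholders; in practice, the endpoint list would come from your own cluster (for example, retrieved once with the DescribeCluster control plane API and stored with the function), because the control plane shouldn’t be a dependency during a failover.

```python
# Minimal sketch: read a routing control state by trying each of the cluster's
# Regional data plane endpoints in random order until one responds.
import random
import boto3

# Placeholder endpoints; use the ClusterEndpoints returned for your own cluster.
CLUSTER_ENDPOINTS = [
    {"Region": "us-east-1", "Endpoint": "https://host-aaaaaa.us-east-1.example.com"},
    {"Region": "us-west-2", "Endpoint": "https://host-bbbbbb.us-west-2.example.com"},
    # ... three more Regional endpoints
]

def get_routing_control_state(routing_control_arn: str) -> str:
    """Return 'On' or 'Off', trying each Regional cluster endpoint until one answers."""
    for endpoint in random.sample(CLUSTER_ENDPOINTS, len(CLUSTER_ENDPOINTS)):
        try:
            client = boto3.client(
                "route53-recovery-cluster",
                region_name=endpoint["Region"],
                endpoint_url=endpoint["Endpoint"],
            )
            response = client.get_routing_control_state(
                RoutingControlArn=routing_control_arn
            )
            return response["RoutingControlState"]
        except Exception:
            continue  # try the next Regional endpoint
    raise RuntimeError("No Route 53 ARC cluster endpoint responded")
```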

To prepare for failover, the Lambda function runs every minute using a CloudWatch Events rule. On each trigger, the function does the following:

  1. It queries the Aurora global database cluster status API to learn which cluster is currently Primary.
  2. Then, it queries the Regional endpoints of the Route 53 ARC cluster for the state of the database routing controls. For example, the database routing control state in the recovery Region, DB Routing Control 2, can be either OFF or ON.

After the Lambda function retrieves the information for both the Aurora database clusters and the database routing controls, it compares them to determine whether to fail over. For example, the following is the condition to fail over to the recovery Region:

  • If the Aurora database cluster status in the primary Region is Primary but DB Routing Control 2 is set to ON, then fail over the global database to the recovery Region (us-west-2).

When this condition is met, the Lambda function calls an Aurora API operation to fail over. Because the global database failover request is an idempotent API call, the two functions can run simultaneously in both Regions without conflicts. To avoid a database failover loop, we use Route 53 ARC safety rules, as described later in Scenario 2 of this blog post, to ensure that at most one database routing control can be ON at any time.
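Putting the pieces together, here’s a minimal sketch of what such a handler might look like, assuming placeholder identifiers and a client pointed at one of the cluster’s data plane endpoints. It is not the exact code from the lambda-stackset template; in particular, the managed FailoverGlobalCluster API shown here performs a planned failover, and a real disaster recovery implementation might instead detach and promote the secondary cluster.

```python
# Minimal sketch of the Failover Lambda decision logic. All identifiers, ARNs,
# and the cluster endpoint URL are placeholders.
import boto3

GLOBAL_CLUSTER_ID = "arc-blog-global-db"                                           # placeholder
RECOVERY_CLUSTER_ARN = "arn:aws:rds:us-west-2:111122223333:cluster:arc-blog-db-2"  # placeholder
DB_ROUTING_CONTROL_2_ARN = "arn-of-db-routing-control-2"                           # placeholder

rds = boto3.client("rds", region_name="us-west-2")
# In practice, iterate over all five cluster endpoints as in the earlier sketch.
arc = boto3.client(
    "route53-recovery-cluster",
    region_name="us-west-2",
    endpoint_url="https://host-bbbbbb.us-west-2.example.com",  # placeholder cluster endpoint
)

def lambda_handler(event, context):
    # 1. Which cluster currently holds the global database writer?
    members = rds.describe_global_clusters(
        GlobalClusterIdentifier=GLOBAL_CLUSTER_ID
    )["GlobalClusters"][0]["GlobalClusterMembers"]
    writer_arn = next(m["DBClusterArn"] for m in members if m["IsWriter"])
    primary_region_has_writer = ":us-east-1:" in writer_arn

    # 2. What does the recovery Region's database routing control say?
    db_rc2_state = arc.get_routing_control_state(
        RoutingControlArn=DB_ROUTING_CONTROL_2_ARN
    )["RoutingControlState"]

    # 3. Fail over only when the states disagree: the primary Region still has
    #    the writer, but DB Routing Control 2 has been turned ON.
    if primary_region_has_writer and db_rc2_state == "On":
        rds.failover_global_cluster(
            GlobalClusterIdentifier=GLOBAL_CLUSTER_ID,
            TargetDbClusterIdentifier=RECOVERY_CLUSTER_ARN,
        )
        return "Failover to us-west-2 initiated"
    return "No action required"
```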

You can deploy the Failover Lambda function by using this CloudFormation template (lambda-stackset). The template also deploys the Status Dashboard Lambda function and a public-facing Application Load Balancer (ALB) that we introduced in Part 1 of this blog post series. When you use the template provided here, the dashboard and ALB include the same basic functionality as before in Part 1, but here they are deployed in both Regions and offer specific views for the multi-Region stack.

Failing over user traffic

Next, we set up the Route 53 ARC user traffic routing controls, together with the Route 53 hosted zone configuration, required to fail over user traffic. Three components work together to fail over user traffic by redirecting traffic flow: Route 53 ARC routing controls, routing control health checks in Route 53, and Route 53 record sets. For failing over user traffic with the multi-Region stack, we also create aggregate Route 53 health checks.

Routing controls: For the multi-Region stack, we expand the Route 53 ARC routing control configuration to add four new routing controls, which we use to control user traffic to the Regional and zonal cells in the recovery Region, as shown in Figure 7. One routing control manages overall user traffic to the Region, Routing Control 2, and additional routing controls manage traffic to each availability zone in the Region: Routing Control 2a, Routing Control 2b, and Routing Control 2c. These routing controls are in addition to the new database routing controls outlined earlier.

Figure 7: User traffic routing controls for the multi-Region stack

Routing control health checks in Route 53: As with the setup for our single-Region stack, we have corresponding routing control health checks for each user traffic routing control. The health check for the Regional routing control is HC2, and there’s also a health check for each zonal routing control: HC2a, HC2b, and HC2c.

Aggregate Route 53 health checks: We also create several aggregate Route 53 health checks, for both the primary and recovery Regions.

  • First, we create an aggregate health check, HC12, for Region-level control. This health check is configured to be healthy when either of the two Regional health checks is healthy.
  • Second, we create an aggregate health check for each Region, HC1abc and HC2abc, that maps to all the zonal health checks in that Region. So HC1abc aggregates health for HC1a, HC1b, and HC1c, and HC2abc aggregates health for HC2a, HC2b, and HC2c. Each of these aggregate health checks is healthy when any of the zonal health checks rolled up under it, in that Region, is healthy, as illustrated in the sketch after this list.
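Here’s a minimal boto3 sketch of creating one of these aggregate (calculated) health checks, HC2abc, assuming placeholder child health check IDs for HC2a, HC2b, and HC2c.

```python
# Minimal sketch: create a calculated Route 53 health check that is healthy
# when at least one child (zonal) health check is healthy. IDs are placeholders.
import uuid
import boto3

route53 = boto3.client("route53")  # Route 53 is a global service

ZONAL_HEALTH_CHECK_IDS = ["hc2a-id", "hc2b-id", "hc2c-id"]  # placeholders

route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "CALCULATED",
        "ChildHealthChecks": ZONAL_HEALTH_CHECK_IDS,
        "HealthThreshold": 1,   # healthy when any one child is healthy
        "Inverted": False,
    },
)
```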

Route 53 record sets: Finally, we add the DNS record set configuration. For the DNS configuration in our Route 53 hosted zone, we start by adding a DNS record for the recovery Region, r2w-app.arcblog.aws, similar to the record that we created for the primary Region, r1w-app.arcblog.aws.

The weighted DNS record configuration for the zonal cells for both Regions is shown in Figure 8. For more information about how to configure this record, see the detailed explanation in Part 1 of this blog post series.

Figure 8: Route 53 weighted record setup with routing control health checks

For the next step in our record set configuration, we create two new DNS records, one for each Region, called r1-app.arcblog.aws and r2-app.arcblog.aws, as shown in Figure 9. These are configured as failover records, with the value of the corresponding weighted Regional records (r1w-app.arcblog.aws and r2w-app.arcblog.aws) configured as the primary and with the maintenance record, maintenance-app.arcblog.aws, configured as the secondary. The primary record is tied to a composite health check, which is configured so that if any one of the zonal health checks is healthy, then the primary record continues to resolve. However, if all the zonal health checks are unhealthy, then user traffic is directed to the maintenance record.

Figure 9: Route 53 failover routing policies for top-level DNS records

The last DNS record that we need to create is a Region selector record, rs-app.arcblog.aws, as shown in Figure 10. This record has a primary record of r1-app.arcblog.aws associated with the HC1 health check, and a secondary record of r2-app.arcblog.aws.

Figure 10: Route 53 failover routing policy for the Region selector record

Finally, we update our parent record, app.arcblog.aws, as shown in Figure 11, to change the primary record to route to the Region selector record, rs-app.arcblog.aws, based on the status of the HC12 health check. The HC12 health check indicates whether at least one Region is available. We set the secondary record to be maintenance-app.arcblog.aws, which is where traffic is directed when neither Region can receive traffic.

Figure 11: Route 53 failover routing policy for the parent record for the multi-Region stack
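If you were creating these records with the API rather than with the provided template, the parent failover record pair might look like the following minimal boto3 sketch, with a placeholder hosted zone ID and HC12 health check ID. The actual template may use different record types or TTL values.

```python
# Minimal sketch: the parent failover record pair for app.arcblog.aws. The
# primary points at the Region selector record and is evaluated against HC12;
# the secondary points at the maintenance record. IDs are placeholders.
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"   # placeholder
HC12_HEALTH_CHECK_ID = "hc12-id"        # placeholder

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.arcblog.aws",
                    "Type": "CNAME",
                    "SetIdentifier": "app-primary",
                    "Failover": "PRIMARY",
                    "TTL": 10,
                    "HealthCheckId": HC12_HEALTH_CHECK_ID,
                    "ResourceRecords": [{"Value": "rs-app.arcblog.aws"}],
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.arcblog.aws",
                    "Type": "CNAME",
                    "SetIdentifier": "app-secondary",
                    "Failover": "SECONDARY",
                    "TTL": 10,
                    "ResourceRecords": [{"Value": "maintenance-app.arcblog.aws"}],
                },
            },
        ]
    },
)
```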

With this final configuration in place, let’s see how the DNS logic works to resolve DNS records in different failure scenarios.  The numbers in Figure 12 align with the steps of the description that follows the diagram.

Figure 12: Example DNS routing behavior in failover scenarios for the multi-Region stack application
  1. If either Regional health check (HC1 or HC2) is healthy, meaning that one of the Regions is available, then route traffic to the Region selector record logic (rs-app.arcblog.aws) to send traffic to one of the Regions (primary or recovery). If not, direct traffic to the maintenance record. This handles application-wide maintenance events, such as when you are failing over traffic from one Region to another.
  2. For the Region selector record (rs-app.arcblog.aws), if Region 1 is enabled, then route traffic to Region 1. Otherwise, route traffic to Region 2. This ensures that you always send traffic to the primary Region, if it’s available.
  3. If any of the zonal cells within the selected Region is available, distribute traffic across the active cells. If all are unavailable, direct traffic to the maintenance record. This is not a likely scenario, given the health check and routing control configuration, but if it occurs in certain error cases, it’s handled gracefully.

Let’s look at this configuration at steady state with the Status Dashboard app, with the primary Region as us-east-1. As shown in Figure 13, app.arcblog.aws resolves to the three NLBs in the AZs in the primary Region (us-east-1a, us-east-1b, us-east-1c) in a roughly even split, as we would expect.

Figure 13: Status Dashboard shows even distribution across three AZs of primary Region

How can I route user traffic to a healthy Region?

We’ve set up our multi-Region stack with Route 53 ARC and Route 53 so you can successfully fail over from the primary Region to the recovery Region in our failure scenario. As we mentioned earlier, there are three steps to take so that you can fail over with minimal impact to users. Now let’s go over how you perform the failover steps by changing your Route 53 ARC routing controls.

  1. Stop accepting traffic in the primary Region

You want to prevent new writes to the database writer in the primary Region as a precursor to failing over the database. To prevent writes, you must stop accepting user traffic to the primary Region. We stop user traffic to the primary Region, us-east-1, by setting the state for its Regional routing control, Routing Control 1, to OFF, which makes health check HC1 unhealthy.

Now both of the Regional routing control states are set to OFF. With our Route 53 DNS routing setup, this means that DNS routes user requests to the maintenance page record, maintenance-app.arcblog.aws.

You can confirm this by checking the Status Dashboard, as shown in Figure 14. After you make the routing control state change, the records for new timestamps transition from showing responses from the individual AZs to consistently showing Maintenance, as we would expect.

Figure 14: NLBs in primary Region AZs stop resolving and user traffic now resolves to Maintenance page
  2. Fail over the database to the recovery Region

Next, you fail over your writer database to the recovery Region, us-west-2. To do this, set the state of the primary Aurora routing control, DB Routing Control 1, to OFF, and then set the state of the recovery Aurora routing control, DB Routing Control 2, to ON.

Now, the database failover condition is met, and the Failover Lambda function makes one of the database readers in the recovery Region the writer for the database.

  3. Route user traffic to the recovery Region, making it active

After you’ve tested the recovery Region to make sure your application is up and running there, you’re ready to redirect your user traffic. To redirect traffic, do the following:

    • First, set the state of each zonal routing control to ON. That is, set the state for each of the following routing controls to ON: Routing Control 2a, Routing Control 2b, and Routing Control 2c.
    • Second, set the state of the Regional routing control, Routing Control 2, to ON.
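The following is a minimal boto3 sketch of this step, assuming placeholder routing control ARNs and a client created against one of your cluster’s Regional data plane endpoints (using the same endpoint-iteration pattern shown earlier for resilience).

```python
# Minimal sketch: turn on the zonal routing controls first, then the Regional
# routing control, which also satisfies the Scenario 5 safety rule described
# later. ARNs and the endpoint URL are placeholders.
import boto3

cluster = boto3.client(
    "route53-recovery-cluster",
    region_name="us-west-2",
    endpoint_url="https://host-bbbbbb.us-west-2.example.com",  # placeholder cluster endpoint
)

ZONAL_CONTROL_ARNS = [
    "arn-of-routing-control-2a",   # placeholders
    "arn-of-routing-control-2b",
    "arn-of-routing-control-2c",
]
REGIONAL_CONTROL_ARN = "arn-of-routing-control-2"  # placeholder

# First, enable all three zonal routing controls in one batch call.
cluster.update_routing_control_states(
    UpdateRoutingControlStateEntries=[
        {"RoutingControlArn": arn, "RoutingControlState": "On"}
        for arn in ZONAL_CONTROL_ARNS
    ]
)

# Second, enable the Regional routing control to start sending user traffic.
cluster.update_routing_control_state(
    RoutingControlArn=REGIONAL_CONTROL_ARN,
    RoutingControlState="On",
)
```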

Now user traffic is directed to the recovery Region, us-west-2. You can confirm this by checking the Status Dashboard after you make the routing control state changes. As shown in Figure 15, the records transition from the Maintenance page to showing responses from the individual AZs in the recovery Region, us-west-2.

Figure 15: Recovery Region is now active and user traffic is resolving to AZs in us-west-2

Failing over safely

In Part 1 of this blog post series, we explained how Route 53 ARC safety rules can help you avoid unintended consequences when you work with routing controls. Now let’s look at some safety rules that can help us to fail over safely in the multi-Region stack scenario.

Scenario 1: Only one Regional routing control state can be set to ON

Because our multi-Region stack operates in an active-standby mode, the Regional routing controls for both Regions must not be ON at the same time. If we don’t prevent this, user traffic is always directed to Region 1 as the primary, even when Region 2 is intended to be active.

A relevant safety rule to prevent this is the following:

Rule Type | Asserted Control | Config Type | Threshold | Inverted?
Assertion | Routing Control 1, 2 | At least | 2 | True

This rule asserts that the two Regional routing controls cannot both be set to ON simultaneously. Let’s look at the components of the rule in more detail, from left to right. The rule asserts that, for the two routing controls specified (Routing Control 1 and Routing Control 2), at least two of them are ON (Threshold is set to 2). We then invert the result (Inverted is set to True). This means that if we attempt to set both routing controls to ON, the safety rule prevents it. However, you can still set both controls to OFF.
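If you create this rule programmatically instead of with the arc-stack template, it could look like the following minimal boto3 sketch, with placeholder control panel and routing control ARNs. The Route 53 ARC control plane API is served from us-west-2.

```python
# Minimal sketch: create the Scenario 1 assertion rule. Threshold 2 with
# Inverted=True blocks any change that would leave both Regional routing
# controls ON. ARNs are placeholders.
import uuid
import boto3

config = boto3.client("route53-recovery-control-config", region_name="us-west-2")

config.create_safety_rule(
    ClientToken=str(uuid.uuid4()),
    AssertionRule={
        "Name": "only-one-region-active",
        "ControlPanelArn": "arn-of-control-panel",          # placeholder
        "AssertedControls": [
            "arn-of-routing-control-1",                     # placeholder
            "arn-of-routing-control-2",                     # placeholder
        ],
        "RuleConfig": {"Type": "ATLEAST", "Threshold": 2, "Inverted": True},
        "WaitPeriodMs": 5000,
    },
)
```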

Scenario 2: Only one database routing control state can be set to ON

With an Aurora global database in a multi-Region stack, only the cluster with Primary status has the writer instance of the database. In our design, this means only one of the two database routing controls (DB Routing Control 1 and DB Routing Control 2) can be set to ON at once. If both database routing controls are ON, the Failover Lambda function can’t determine which Region should have the Primary database cluster and therefore can’t take action to fail over.

A relevant safety rule here is the following:

Rule Type | Asserted Control | Config Type | Threshold | Inverted?
Assertion | DB Routing Control 1, 2 | At least | 2 | True

As in the first scenario, this rule asserts that, for the two routing controls specified, at least two of them must be set to ON, and then the result is inverted because Inverted is set to True. This means that if you try to set both database routing controls to ON, the safety rule prevents it.

Scenario 3: Database cluster must be Primary before user traffic is allowed

As we discussed in the section on routing user traffic to a healthy Region, typically when you fail over to a recovery Region, you fail over your database cluster first. During failover, however, an operator might accidentally set the Regional routing control to ON and allow user traffic to the recovery Region before the database cluster failover is complete.

To be safe, we should add a rule that only allows the Regional routing control that controls user traffic to all the NLBs in the Region to be set to ON if the database routing control for the Region is set to ON. Without this rule, user traffic might be directed to a Region that doesn’t have the Primary database cluster, which can potentially cause unexpected behavior.

The relevant safety rules here are the following two rules, one for each Region:

Rule Type | Gating Control | Target Control | Config Type | Threshold | Inverted?
Gated | DB Routing Control 1 | Routing Control 1 | At least | 1 | False
Gated | DB Routing Control 2 | Routing Control 2 | At least | 1 | False

With gated rules, your ability to change the routing controls specified under Target Control is gated by the rule that you set for the routing controls in Gating Control. The way that we have defined these rules, they work as follows:

  • With the first rule, Routing Control 1, which controls user traffic to the primary Region, can only be set to ON if the DB Routing Control 1, for the database cluster in the primary Region, is set to ON.
  • In the second rule, the same restriction applies for Routing Control 2. So, user traffic to the recovery Region is only allowed (by setting Routing Control 2 to ON) if DB Routing Control 2, for the database cluster in the recovery Region, is also set to ON.
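As an illustration, the first of these gating rules could be created with the control plane API as in the following minimal boto3 sketch, again with placeholder ARNs.

```python
# Minimal sketch: create the first Scenario 3 gating rule. Changes to Routing
# Control 1 (the target) are only allowed while DB Routing Control 1 (the
# gating control) is ON. ARNs are placeholders.
import uuid
import boto3

config = boto3.client("route53-recovery-control-config", region_name="us-west-2")

config.create_safety_rule(
    ClientToken=str(uuid.uuid4()),
    GatingRule={
        "Name": "db-primary-before-traffic-region-1",
        "ControlPanelArn": "arn-of-control-panel",              # placeholder
        "GatingControls": ["arn-of-db-routing-control-1"],      # placeholder
        "TargetControls": ["arn-of-routing-control-1"],         # placeholder
        "RuleConfig": {"Type": "ATLEAST", "Threshold": 1, "Inverted": False},
        "WaitPeriodMs": 5000,
    },
)
```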

Scenario 4: User traffic must be stopped before failing over database cluster

Reversing the previous scenario, we also don’t want an operator to start a database cluster failover when the Region where the cluster is still Primary continues to receive user traffic. So, an operator should not be able to set a database routing control to OFF when the Regional routing control that controls user traffic to the cells in the Region is ON.

The relevant safety rules here are the following two rules, one for each Region:

Rule Type | Gating Control | Target Control | Config Type | Threshold | Inverted?
Gated | Routing Control 1 | DB Routing Control 1 | At least | 1 | True
Gated | Routing Control 2 | DB Routing Control 2 | At least | 1 | True

With gated rules, your ability to change the routing controls specified under Target Control is gated by the rule that you set for the routing controls in Gating Control. The Inverted rule flips the result. So, the way we’ve defined the rules, they work as follows:

  • With the first rule, DB Routing Control 1, for the database cluster in the primary Region, can only be changed when Routing Control 1, which controls user traffic to the primary Region, is set to OFF. Because we set Inverted to True, the gating condition is satisfied only when Routing Control 1 is OFF, so an operator can’t start a database failover away from the primary Region while it’s still receiving user traffic.
  • Similarly, with the second rule, DB Routing Control 2, for the database cluster in the recovery Region, can only be changed when Routing Control 2, which controls user traffic to the recovery Region, is set to OFF.

Scenario 5: A Region must have at least one zonal cell receiving traffic

Finally, you could have a situation where an operator sets a Regional routing control to ON when none of the routing controls for the zonal cells in that Region are set to ON, which means the Region might not be ready to receive user traffic.

To prevent this situation, you can add the following two safety rules, one for each Region:

Rule Type | Gating Control | Target Control | Config Type | Threshold | Inverted?
Gated | Routing Control 1a, 1b, 1c | Routing Control 1 | At least | 2 | False
Gated | Routing Control 2a, 2b, 2c | Routing Control 2 | At least | 2 | False

Once again, we set up gating rules to establish safety in this scenario. The first gating rule is aimed at the Regional routing control in the primary Region, Routing Control 1, and is gated by the zonal routing controls for that Region: Routing Control 1a, Routing Control 1b, and Routing Control 1c. The rule states that Routing Control 1 (the target) can only be turned ON if at least two (the threshold) of the zonal routing controls (the gating controls) are turned ON.

The second rule works similarly for the recovery Region, with Routing Control 2 as the target, gated by the zonal routing controls for the recovery Region: Routing Control 2a, Routing Control 2b, and Routing Control 2c.

Cleanup

If you used the CloudFormation templates that we provided to create AWS resources to follow along with this blog post, we recommend that you delete those stacks now to avoid recurring charges.

Conclusion

In this second blog post in our two-part series on Route 53 ARC, we broadened our scope to apply Route 53 ARC’s features to a multi-Region stack. We stepped through an example of configuring and observing readiness and failover recovery in a failure scenario. We then outlined several safety rules to show how you can help ensure that routing control works smoothly for failover in key scenarios.

The CloudFormation templates that we link to throughout this post are all hosted on GitHub for you to deploy in your own AWS account.

By walking through examples of using Route 53 ARC to fail over a single-Region and multi-Region infrastructure stack, we hope that these two posts provided you with guidance that you can use when planning and implementing a resiliency strategy for your own environment.


About the Authors

Gerrard Cowburn

Gerrard Cowburn is a Solutions Architect with AWS based in London. Gerrard supports Global Financial Services customers in greenfield and migration based architectural deep dives and prototyping activities. In his free time, Gerrard enjoys exploring the world through food and drink, road trips, and track days.

Harsha W Sharma

Harsha W Sharma is a Principal Solutions Architect with AWS in New York. Harsha joined AWS in 2016 and works with Global Financial Services customers to design and develop architectures on AWS, and support their journey on the cloud.

Shiv Bhatt

Shiv Bhatt is a Global Account Solutions Architect at AWS helping customers in their journey to the cloud. He is passionate about helping customers build well-architected solutions and innovate in their space.