Launch a failover sequence across multiple AWS Regions to protect workloads
This Guidance demonstrates how to build highly resilient web applications that can withstand disruptions, minimizing application downtime and the resulting impact on revenue. By combining a multi-Region architecture, automated failover orchestration, and comprehensive monitoring, this Guidance helps ensure critical web applications remain available and consistent, even in the face of significant impairments. You can limit the number of affected users, maintain data integrity, and make informed decisions about when to fail over between the primary and standby Regions to maximize uptime and protect business continuity.
Please note: See the Disclaimer at the end of this page.
Architecture Diagram

-
Active/Active State
This architecture diagram shows the active/active state across two AWS Regions. For the failover sequence, open the other tab.
Step 1
Amazon Route 53 failover records use Amazon Application Recovery Controller managed health checks to route requests to the active Regions.
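For illustration, the following Python (boto3) sketch shows one way such a failover record could be associated with a health check backed by an Application Recovery Controller routing control. The hosted zone ID, domain name, ALB DNS name, and routing control ARN are placeholders, not values from this Guidance's deployment.

```python
# Sketch: attach a Route 53 failover record to a health check that mirrors an
# Application Recovery Controller routing control ("On" = healthy).
import boto3

route53 = boto3.client("route53")

health_check = route53.create_health_check(
    CallerReference="us-east-1-routing-control",  # any unique string
    HealthCheckConfig={
        "Type": "RECOVERY_CONTROL",
        "RoutingControlArn": "arn:aws:route53-recovery-control::111122223333:controlpanel/abc/routingcontrol/us-east-1",  # placeholder
    },
)

# Failover record that Route 53 serves only while the health check passes.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",  # placeholder
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "CNAME",
                "SetIdentifier": "us-east-1",
                "Failover": "PRIMARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": "my-alb-1234567890.us-east-1.elb.amazonaws.com"}],
                "HealthCheckId": health_check["HealthCheck"]["Id"],
            },
        }]
    },
)
```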
Step 2
Application Load Balancers (ALBs) send requests to the user interface (UI) tasks on Amazon Elastic Container Service (Amazon ECS). Depending on the page being accessed, the UI will make a service call to the appropriate service through Amazon ECS Service Connect.
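As a minimal sketch, the following boto3 call shows how a microservice could be registered with ECS Service Connect so the UI can reach it through a stable alias. The cluster, service, task definition, namespace, subnet, and security group identifiers are placeholders.

```python
# Sketch: create an ECS service that participates in a Service Connect namespace.
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="retail-store",            # placeholder cluster name
    serviceName="catalog",
    taskDefinition="catalog:1",
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroups": ["sg-0123456789abcdef0"],
        }
    },
    serviceConnectConfiguration={
        "enabled": True,
        "namespace": "retail-store.local",
        "services": [{
            "portName": "http",  # must match a port name in the task definition
            "clientAliases": [{"port": 80, "dnsName": "catalog"}],
        }],
    },
)
```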
Step 3
As records are written to the writer instances of the Catalog and Orders Amazon Aurora global databases, they are replicated to the standby clusters.
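The sketch below illustrates how an existing Aurora cluster could be wrapped in a global database and given a secondary cluster in the standby Region. All identifiers, the engine, and the engine version are placeholders.

```python
# Sketch: turn an existing cluster into an Aurora global database and add a
# secondary cluster that Aurora replicates to asynchronously.
import boto3

rds_primary = boto3.client("rds", region_name="us-east-1")
rds_standby = boto3.client("rds", region_name="us-west-2")

rds_primary.create_global_cluster(
    GlobalClusterIdentifier="orders-global",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:111122223333:cluster:orders",  # placeholder
)

rds_standby.create_db_cluster(
    DBClusterIdentifier="orders-us-west-2",
    Engine="aurora-postgresql",
    EngineVersion="15.4",                 # must match the primary cluster's version
    GlobalClusterIdentifier="orders-global",
)
```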
Step 4
As records are written to the Carts Amazon DynamoDB global table in one Region, they are replicated to the table in the other Region.
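For example, a replica Region can be added to an existing table with a single update_table call (global tables version 2019.11.21). The table and Region names are placeholders.

```python
# Sketch: add a us-west-2 replica to the Carts table. DynamoDB Streams
# (NEW_AND_OLD_IMAGES) is expected to be enabled on the table already.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="carts",  # placeholder table name
    ReplicaUpdates=[{"Create": {"RegionName": "us-west-2"}}],
)
```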
Step 5
The Checkout service uses Amazon ElastiCache for Redis to temporarily cache the contents of the cart until the order is placed.
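A minimal sketch of that pattern, using the open-source redis-py client, is shown below. The cache endpoint, key naming, and TTL are placeholder choices.

```python
# Sketch: keep the cart in ElastiCache for Redis with a TTL while checkout runs.
import json
import redis

cache = redis.Redis(
    host="checkout-cache.xxxxxx.use1.cache.amazonaws.com",  # placeholder endpoint
    port=6379,
    ssl=True,
)

def cache_cart(customer_id: str, cart: dict, ttl_seconds: int = 3600) -> None:
    """Store the cart for up to an hour while checkout is in progress."""
    cache.setex(f"cart:{customer_id}", ttl_seconds, json.dumps(cart))

def read_cart(customer_id: str) -> dict | None:
    raw = cache.get(f"cart:{customer_id}")
    return json.loads(raw) if raw else None
```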
Step 6
The Orders service uses an Amazon MQ for RabbitMQ broker to publish order creation events for downstream consumers.
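The following sketch, using the open-source pika client, shows how such an order-creation event could be published to the broker. The broker endpoint, credentials, exchange, and routing key are placeholders.

```python
# Sketch: publish an order-created event to the Amazon MQ for RabbitMQ broker.
import json
import pika

params = pika.URLParameters(
    "amqps://app_user:app_password@b-1234abcd.mq.us-east-1.amazonaws.com:5671/"  # placeholder
)
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.exchange_declare(exchange="orders", exchange_type="topic", durable=True)

event = {"orderId": "o-123", "status": "CREATED"}
channel.basic_publish(
    exchange="orders",
    routing_key="orders.created",
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)
connection.close()
```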
Step 7
Amazon CloudWatch Synthetics canaries in each Region send requests to the application in each Region (using the ALB's address) and to the DNS name resolved through Route 53, and push the metrics, logs, and traces to CloudWatch.
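As an illustration, a canary could be created with a call like the one below. The script location, artifact bucket, execution role, and runtime version are placeholders and should be replaced with values supported in your account.

```python
# Sketch: create a Synthetics canary that exercises an application endpoint every minute.
import boto3

synthetics = boto3.client("synthetics", region_name="us-east-1")

synthetics.create_canary(
    Name="storefront-heartbeat",
    Code={
        "S3Bucket": "my-canary-scripts",       # placeholder bucket holding the script zip
        "S3Key": "heartbeat/nodejs.zip",
        "Handler": "heartbeat.handler",
    },
    ArtifactS3Location="s3://my-canary-artifacts/storefront",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/CanaryExecutionRole",  # placeholder
    Schedule={"Expression": "rate(1 minute)"},
    RuntimeVersion="syn-nodejs-puppeteer-9.0",  # use a currently supported runtime
)
```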
Step 8
AWS Systems Manager automation runbooks automate the enabling and disabling of the Amazon Application Recovery Controller routing controls and the failing over of the Aurora global databases.
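For illustration, an operator (or a scheduled event) could start such an automation with a call like the following. The runbook name and parameters are placeholders for whatever the deployed automation documents actually define.

```python
# Sketch: start a failover automation runbook from the standby Region.
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

execution = ssm.start_automation_execution(
    DocumentName="FailoverToStandbyRegion",  # placeholder runbook name
    Parameters={
        "GlobalClusterIdentifier": ["orders-global"],
        "RoutingControlArn": ["arn:aws:route53-recovery-control::111122223333:controlpanel/abc/routingcontrol/us-east-1"],
    },
)
print("Automation execution:", execution["AutomationExecutionId"])
```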
-
Failover Sequence
This architecture diagram shows the failover sequence when the workload fails over from the us-east-1 Region to the us-west-2 Region. For the active/active state, open the other tab.
Step 1
A Systems Manager runbook (invoked manually by an operator) toggles the Amazon Application Recovery Controller routing control "off," which causes the managed health check for the Region to enter a "failed" state.
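A sketch of the underlying data-plane call the runbook makes is shown below. Application Recovery Controller routing control updates are made against one of the cluster's Regional endpoints; the endpoint URL and routing control ARN here are placeholders.

```python
# Sketch: turn the routing control off so its health check reports unhealthy.
import boto3

arc = boto3.client(
    "route53-recovery-cluster",
    region_name="us-west-2",
    endpoint_url="https://example-cluster-endpoint.us-west-2.amazonaws.com/v1",  # placeholder cluster endpoint
)

arc.update_routing_control_state(
    RoutingControlArn="arn:aws:route53-recovery-control::111122223333:controlpanel/abc/routingcontrol/us-east-1",
    RoutingControlState="Off",  # the associated Route 53 health check now fails
)
```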
Step 2
Route 53 returns only the remaining healthy Region when clients resolve the application's fully qualified domain name.
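One way to confirm this from the Route 53 control plane is test_dns_answer, sketched below with placeholder hosted zone and record values.

```python
# Sketch: check which answer Route 53 will return for the application's domain name.
import boto3

route53 = boto3.client("route53")

answer = route53.test_dns_answer(
    HostedZoneId="Z0000000EXAMPLE",  # placeholder
    RecordName="app.example.com",
    RecordType="CNAME",
)
# After failover, only the ALB in the healthy Region should appear here.
print(answer["RecordData"])
```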
Step 3
A Systems Manager runbook initiates the Aurora global database managed failover, which promotes the standby Region to the primary for writes.
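The managed failover itself corresponds to a single RDS API call, sketched here with placeholder identifiers and Regions.

```python
# Sketch: promote the us-west-2 cluster to primary in the Aurora global database.
import boto3

rds = boto3.client("rds", region_name="us-west-2")

rds.failover_global_cluster(
    GlobalClusterIdentifier="orders-global",
    TargetDbClusterIdentifier="arn:aws:rds:us-west-2:111122223333:cluster:orders-us-west-2",
    # An unplanned failover that tolerates data loss may additionally require
    # AllowDataLoss=True; without it the operation defaults to a switchover.
)
```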
Step 4
Aurora rebuilds the cluster in the former primary Region as a secondary cluster.
Step 5
A Systems Manager runbook restores a copy of the old primary database from a snapshot, compares the data in the new primary database to the old, and then creates a missing transaction report.
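A simplified sketch of the restore step is shown below. The snapshot selection logic and the comparison itself depend on the application's schema, and all identifiers are placeholders.

```python
# Sketch: restore the old primary from its latest automated snapshot so its data
# can be compared with the new primary.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

snapshots = rds.describe_db_cluster_snapshots(
    DBClusterIdentifier="orders",       # placeholder old primary cluster
    SnapshotType="automated",
)["DBClusterSnapshots"]
latest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])

rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="orders-comparison",  # temporary cluster for the report
    SnapshotIdentifier=latest["DBClusterSnapshotIdentifier"],
    Engine=latest["Engine"],
)
# A DB instance must also be added to the restored cluster before it can be queried.
# A comparison job can then query both clusters (for example, by order ID and
# timestamp) and write any rows missing from the new primary to a report.
```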
Get Started

Deploy this Guidance
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
AWS X-Ray traces application calls from Amazon ECS tasks, visualizing communication flows of microservices and analyzing user requests as they travel through the UI to underlying microservices. CloudWatch Synthetics generates traffic to the application, creating metrics for setting thresholds and alerting if issues arise. Systems Manager runbooks automate failover and failback processes, minimizing human error and ensuring the application meets recovery time objective (RTO) and recovery point objective (RPO) requirements.
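As an illustration of the tracing portion, a Python microservice running as an ECS task could emit X-Ray traces as sketched below. The Flask app, service name, and route are placeholders, and an X-Ray daemon or sidecar is assumed to be reachable from the task.

```python
# Sketch: instrument a Flask-based microservice with the AWS X-Ray SDK for Python.
from flask import Flask
from aws_xray_sdk.core import xray_recorder, patch_all
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

app = Flask(__name__)
xray_recorder.configure(service="orders")  # placeholder service name
XRayMiddleware(app, xray_recorder)         # trace incoming HTTP requests
patch_all()                                # trace outbound calls made with boto3, requests, etc.

@app.route("/orders/<order_id>")
def get_order(order_id: str):
    # Downstream calls made here appear as subsegments in the trace.
    return {"orderId": order_id}
```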
-
Security
AWS Identity and Access Management (IAM) roles and policies secure microservices' interactions with AWS services, enforcing robust security through meticulously defined permissions. AWS Key Management Service (AWS KMS) encrypts data at rest across services, including Aurora and DynamoDB.
-
Reliability
Elastic Load Balancing (ELB) routes requests for the application's web interface to healthy Amazon ECS tasks, while Amazon ECS replaces unhealthy tasks and adds more tasks to handle increased load. Amazon Application Recovery Controller reliably enables and disables AWS Regions for application traffic. DynamoDB global tables and Aurora global databases keep application data consistent within the RPO requirements across multiple AWS Regions. Systems Manager runbooks orchestrate the components that need to change when shifting traffic from one AWS Region to another. Together, these services help ensure the application experiences minimal service interruptions.
-
Performance Efficiency
ELB distributes incoming traffic across multiple targets, preventing any single instance from becoming overwhelmed and maintaining high performance. Aurora read replicas offload read traffic from the primary database instance, distributing the workload and improving overall performance. Aurora global databases extend the benefits of read replicas across multiple Regions, enabling read scaling and improved performance for geographically distributed applications. DynamoDB global tables replicate DynamoDB tables across multiple AWS Regions, enabling low-latency data access for users worldwide.
-
Cost Optimization
Auto scaling automatically adjusts the number of Amazon ECS tasks based on demand, so that you only pay for the resources needed. AWS Fargate for Amazon ECS eliminates the need to provision and manage servers, allowing you to run containers without the overhead of managing Amazon Elastic Compute Cloud (Amazon EC2) instances, leading to improved efficiency and reduced costs.
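For illustration, target-tracking auto scaling for an ECS service can be configured as sketched below. The cluster and service names, capacity bounds, and CPU target are placeholders.

```python
# Sketch: scale an ECS service's task count with a CPU target-tracking policy.
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/retail-store/ui"  # placeholder cluster/service

autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="ui-cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # placeholder CPU utilization target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```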
-
Sustainability
Auto scaling and DynamoDB On-Demand add capacity when needed and scale down when not required. On-demand services minimize the environmental impact of the workload by efficiently using only the necessary resources to meet the application's demands.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.