AWS Cloud Resilience

Build and run resilient, highly available applications in the AWS cloud

Cloud resilience refers to the ability of an application to resist or recover from disruptions, including those related to infrastructure, dependent services, misconfigurations, transient network issues, and load spikes. Cloud resilience also plays a critical role in an organization’s broader business resilience strategy, including the ability to meet digital sovereignty requirements.

Resilient applications are those built for high availability, meaning the percentage of time the application is available for use, and those with a disaster recovery or continuity of operations plan in place.
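To make the availability definition above concrete, here is a minimal sketch of the arithmetic behind an availability percentage and the downtime budget it implies. The function names and the sample targets are illustrative assumptions, not AWS figures or an AWS API.

```python
# Illustrative only: names and targets below are assumptions, not AWS values.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(uptime_minutes: float, total_minutes: float) -> float:
    """Availability as the percentage of time the application was usable."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * uptime_minutes / total_minutes


def allowed_downtime_minutes(
    target_percent: float, period_minutes: float = MINUTES_PER_YEAR
) -> float:
    """Downtime budget implied by an availability target over a period."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    return period_minutes * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how the budget shrinks.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9% target leaves roughly 526 minutes (a little under 9 hours) of downtime per year, which is the kind of figure a recovery plan has to fit within.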

Millions of customers trust that AWS is the right place to build and run their business and mission-critical applications with high availability.

AWS has made significant investments in building and running the world’s most resilient cloud. We have designed a unique and highly available global infrastructure, built safeguards into our service design and deployment mechanisms, and instilled resilience into our operational culture. AWS also makes it easier for you to build and run resilient applications in the cloud, with a comprehensive set of purpose-built resilience services, solutions, architectural best practices, and guidance.

Capital One improves cloud resilience with help from AWS


Highest network availability

AWS delivers the highest network availability of any cloud provider and is the only cloud provider to offer three or more Availability Zones (AZs) in all Regions, providing more redundancy and better isolation to contain issues.
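As a rough illustration of why spreading a workload across several Availability Zones adds redundancy, the sketch below computes the chance that at least one of several replicas is up. It assumes each replica fails independently, which real deployments only approximate, and the 99% per-replica figure is a made-up example rather than an AWS measurement.

```python
# Illustrative only: assumes independent failures, a simplification.
def combined_availability(single: float, copies: int) -> float:
    """Probability that at least one of `copies` independent replicas is up.

    `single` is the availability of one replica as a fraction in [0, 1].
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All replicas must be down at once for the workload to be unavailable.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical per-replica availability of 99%.
    for zones in (1, 2, 3):
        print(f"{zones} zone(s): {combined_availability(0.99, zones):.6f}")
```

Correlated failures (for example, a shared dependency or a bad deployment rolled out everywhere at once) reduce this benefit, which is why isolation between zones matters as much as the number of zones.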

Comprehensive resilience services and guidance

AWS makes it easier for customers to design, build, and run highly available applications through its comprehensive portfolio of purpose-built resilience services, integrated resilience features, and expert guidance.

Unparalleled operational expertise

AWS has over 17 years of proven operational expertise and unmatched scale helping millions of customers in regulated and non-regulated industries meet their resilience requirements.

Use Cases

Designing and Building

Leverage the best practices in the Reliability and Operational Excellence Pillars from the AWS Well-Architected Framework to build resilient applications.

Evaluating and Testing

Continuously measure and test your workload performance against your resilience goals with AWS Resilience Hub and AWS Fault Injection Service.

Monitoring and Observability

Implement monitoring and observability services like Amazon CloudWatch to quickly detect, investigate, and remediate issues impacting your applications.

Failover and Failback

Use Amazon Route 53 Application Recovery Controller, AWS Elastic Disaster Recovery, and AWS Backup to ensure your applications recover quickly.

AWS Resilience Hub

Define, test, and track the resilience of your applications to ensure you are able to meet your recovery objectives.

AWS Fault Injection Service

Improve application performance, observability, and resilience through controlled fault injection experiments.

AWS Elastic Disaster Recovery

Minimize downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications.

AWS Backup

Protect data at scale using this cost-effective, fully managed, policy-based service.

Amazon Route 53 Application Recovery Controller

Automate management and coordination of recovery for your applications across AWS AZs or Regions.

Amazon CloudWatch

Collect and visualize real-time logs, metrics, and event data in automated dashboards to streamline infrastructure and application maintenance.

AWS Well-Architected

Build and run resilient applications with architectural and operational best practices and measure improvement over time.

AWS Trusted Advisor

Improve resilience of your AWS resources with automated resilience best practice checks.

AWS Health

Monitor the health of your AWS resources and take the necessary actions.

AWS Solutions

Leverage pre-built AWS Solutions, Partner Solutions, and resilience guidance in the AWS Solutions Library.


“At Broadridge, we have critical systems that can’t afford to be down. We developed an ‘always on’ program using AWS services to ensure we were having near-zero recovery time objectives and recovery point objectives.”

-Todd Peterson, Vice President of Broadridge

Broadridge taps AWS to help improve resilience of their critical systems

Resilience Lifecycle Framework

A continuous approach to resilience improvement

Improving the resilience posture of an application is not a one-time effort; it is a continuous process that should be incorporated into how you build and operate your applications. This whitepaper shares strategies, services, and mechanisms you can use to drive continuous resilience into your organization.
