AWS Architecture Blog

Let’s Architect! Creating resilient architecture

The AWS Well-Architected Framework defines resilience as “the capability to recover when stressed by load (more requests for service), attacks (either accidental through a bug, or deliberate through intention), and failure of any component in the workload’s components.”

The need for resilient workloads transcends all customer industries, but resilience is often misunderstood. That can lead to workloads that either do not incorporate resilient architecture at all or are over-engineered.

Resilience is a technical problem, but it’s also about people and culture. It’s a continuous process that requires us to learn by iterating. Customers need to understand their SLA requirements from a business perspective and, from a technical perspective, how their architecture will meet them. In this post, we share resources to help you build resilience into your AWS architecture.
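One way to turn an SLA requirement into a technical target is to convert the availability percentage into the downtime it allows over a period. For example, 99.9% availability over 30 days leaves about 43 minutes of downtime. The Python sketch below does that arithmetic. The function name `downtime_budget` is hypothetical, and the sketch ignores how any specific SLA defines or measures downtime.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` for an availability target (in percent)."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in the range (0, 100]")
    # The unavailable fraction of the period is whatever the target leaves over.
    return period * (1 - availability_pct / 100)


thirty_days = timedelta(days=30)
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget(target, thirty_days)
    print(f"{target}% over 30 days allows about {budget.total_seconds() / 60:.1f} minutes of downtime")
```

Each additional "nine" shrinks the budget tenfold. That is why a stricter SLA usually calls for a different architecture, not only more careful operations.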

Amazon’s approach to building resilient services

Building a resilient architecture is not only about the technical implementation of the system, but also about the solutions for observability, operations, and people.

This video shows Amazon’s approach to designing resilient systems, in which individual teams build and own a service, so every team has operational responsibility for what it runs. You’ll learn how to deploy often, move fast, and design for automatic rollback, which lets teams revert their workload to a previous version if needed.
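The automatic rollback idea can be illustrated with a toy, in-memory model: deploy a new version, run a health check, and revert to the previous version if the check fails. The `Deployment` class and the `health_check` callback below are hypothetical and only show the control flow. Real pipelines typically drive rollback from alarms and deployment tooling rather than from a single function call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Deployment:
    """Hypothetical service that remembers every version it has deployed."""
    history: List[str] = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Deploy `version`; roll back automatically if its health check fails."""
        self.history.append(version)
        if health_check(version):
            return True
        # Health check failed: drop the bad version so the previous one is live again.
        self.history.pop()
        return False


service = Deployment(history=["v1"])
ok = service.deploy("v2", health_check=lambda v: v != "v2")  # simulate a bad release
print(ok, service.current)  # False v1
```

Because rollback needs no human decision, teams can deploy small changes frequently: a bad release is reverted as soon as the health signal turns red.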

The pillars adopted by the engineering teams building services at Amazon

Five design patterns to build more resilient applications

Resilience is an important consideration for developers. For instance, if a downstream service is not available, how can the software handle the situation? Which mechanisms should you use to implement retries? How can you prevent overloading the downstream service?

This video focuses on five strategies and design patterns that developers can use to build resilient applications. You’ll learn how to add timeouts, retries, exponential backoff with jitter (randomness), and circuit breakers to your code. These patterns are powerful because they can be abstracted and reused in many different scenarios.
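Below is a minimal Python sketch that combines three of these patterns: a per-attempt timeout, a bounded number of retries, and capped exponential backoff with full jitter. Full jitter means a random delay between zero and the current backoff ceiling. The function name and parameters are illustrative and do not come from the video. The sketch also assumes the wrapped operation enforces the timeout it is given, for example as a client socket timeout. Retries should generally be limited to idempotent operations.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[float], T],
    *,
    timeout_s: float = 2.0,
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation(timeout_s)`, retrying retryable errors with capped
    exponential backoff and full jitter. Other exceptions propagate immediately."""
    attempt = 0
    while True:
        try:
            # The per-attempt timeout is handed to the operation, which must enforce it.
            return operation(timeout_s)
        except retryable:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: the ceiling doubles after each failure, up to max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Full jitter: a random delay in [0, ceiling] keeps many clients
            # from retrying in lockstep and hammering the downstream service.
            sleep(random.uniform(0, ceiling))


# Usage: a hypothetical downstream call that fails twice, then succeeds.
calls = {"n": 0}


def flaky(timeout_s: float) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("downstream unavailable")
    return "ok"


delays = []  # record the backoff delays instead of actually sleeping
print(call_with_retries(flaky, sleep=delays.append))
```

Capping both the number of attempts and the delay bounds the extra load that retries add to a dependency that is already struggling.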

Software developers can implement different strategies in their application code to design for resiliency
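The circuit breaker from the list above can be sketched in the same hedged way. This breaker opens after a configurable number of consecutive failures. While open, it rejects calls immediately so the downstream service is not overloaded further. After a cool-down, it lets the next call through as a trial (the half-open state). The class names and thresholds are hypothetical, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of sending more load to a struggling dependency.
            raise CircuitOpenError("circuit open; downstream call skipped")
        trial = self.opened_at is not None  # True when half-open: this call is a trial
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit.
            if trial or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the cool-down can be exercised without waiting.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])


def failing() -> str:
    raise ConnectionError("downstream unavailable")


for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # open

try:
    breaker.call(lambda: "ok")
except CircuitOpenError as err:
    print("rejected:", err)
```

Injecting the `clock` keeps the state transitions deterministic in tests. In production, the default `time.monotonic` avoids surprises from wall-clock adjustments.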

Building Resilient Well-Architected Workloads Using AWS Resilience Hub

This blog post shows how AWS Resilience Hub can help you evaluate the resilience of your architecture. It gives you a central place to monitor, track, and evaluate your application’s resiliency against your business goals. For example, after you define your recovery point objective (RPO) and recovery time objective (RTO) targets, Resilience Hub evaluates your current architecture against them and shows you whether you’ve met your goals. If you haven’t, it recommends changes to help you meet them.
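Resilience Hub performs this assessment for you. The sketch below only illustrates the comparison step the post describes: checking estimated RTO and RPO values against targets for each kind of disruption. The `Objective` type, the `gaps` function, the disruption names, and all numbers are hypothetical. They do not reflect Resilience Hub’s API or its assessment method.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Objective:
    rto: timedelta  # recovery time objective: longest acceptable time to restore service
    rpo: timedelta  # recovery point objective: largest acceptable window of data loss


def gaps(targets: Dict[str, Objective], estimates: Dict[str, Objective]) -> List[str]:
    """List every disruption type whose estimated RTO or RPO exceeds its target."""
    problems = []
    for disruption, target in targets.items():
        est = estimates.get(disruption)
        if est is None:
            problems.append(f"{disruption}: no estimate available")
            continue
        if est.rto > target.rto:
            problems.append(f"{disruption}: RTO {est.rto} exceeds target {target.rto}")
        if est.rpo > target.rpo:
            problems.append(f"{disruption}: RPO {est.rpo} exceeds target {target.rpo}")
    return problems


targets = {
    "az": Objective(rto=timedelta(minutes=15), rpo=timedelta(minutes=5)),
    "region": Objective(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
}
# Hypothetical estimates, e.g. a Multi-AZ app that relies on daily backups for Region recovery.
estimates = {
    "az": Objective(rto=timedelta(minutes=10), rpo=timedelta(minutes=1)),
    "region": Objective(rto=timedelta(hours=8), rpo=timedelta(hours=24)),
}
for line in gaps(targets, estimates):
    print(line)
```

In this made-up example, the Availability Zone targets are met, but a daily backup cannot satisfy a one-hour Region RPO. That is the kind of gap for which the post says Resilience Hub recommends changes.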

Multi-AZ architecture incorporating data backup features

Incorporating continuous resilience in your development ecosystem

Resilience encompasses a broad range of considerations, including infrastructure, application patterns, data management, and application building and monitoring. And after you incorporate resilience, it is essential to continuously maintain it.

This video provides useful principles for building continuous resilience into your applications. It also explores considerations for implementing processes that drive continuous improvement through a DevOps methodology, and it shows you services you can use to build resilience into your development process on an ongoing basis.

Software architects can implement several patterns to prevent failures or to make systems fault-tolerant

See you next time!

Thanks for joining our discussion on resilient architecture! See you in a couple of weeks with our content about governance in the cloud!

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Luca Mezzalira

Luca is a Principal Solutions Architect based in London. He has authored several books and is an international speaker. His expertise lies predominantly in solution architecture. Luca has gained accolades for revolutionizing the scalability of front-end architectures with micro-frontends, from increasing the efficiency of workflows to delivering quality in products.

Laura Hyatt

Laura Hyatt is a Solutions Architect for AWS Public Sector and helps Education customers in the UK. Laura helps customers not only architect and develop scalable solutions but also think big about innovative solutions to the challenges facing the education sector today. Laura's specialty is IoT, and she is also the Alexa SME for Education across EMEA.

Vittorio Denti

Vittorio Denti is a Machine Learning Engineer at Amazon based in London. After completing his M.Sc. in Computer Science and Engineering at Politecnico di Milano (Milan) and the KTH Royal Institute of Technology (Stockholm), he joined AWS. Vittorio has a background in distributed systems and machine learning. He's especially passionate about software engineering and the latest innovations in machine learning science.

Zamira Jaupaj

Zamira is an Enterprise Solutions Architect based in the Netherlands. She is a highly passionate IT professional with over 10 years of multinational experience designing and implementing critical and complex solutions with containers, serverless, and data analytics for small and enterprise companies.