
Boosting resilience using a multicloud strategy with Monzo

Learn how digital bank Monzo achieved multicloud resilience by building Monzo Stand-in, aligned with the AWS Resilience Analysis framework.

Benefits

1% of the cost of the primary stack
1% or less of total engineering changes are for Monzo Stand-in

Overview

As a digital bank, Monzo Bank (Monzo) must provide highly reliable and resilient service to its customers, who rely on Monzo’s services to buy groceries, make bank transfers, and pay bills. To reduce the risk of impairments impacting its customers, Monzo—which is built on Amazon Web Services (AWS)—created a solution called Monzo Stand-in, an independent set of systems that can take over for the primary stack in case of an incident. With this multicloud solution, Monzo cost-effectively complies with regulations and keeps customers connected to their money 24/7.

 


About Monzo Bank

Monzo aims to make money work for everyone. Monzo offers retail and business banking across the United Kingdom and Europe; over 50 percent of UK adults aged 25–34 bank with Monzo.

Opportunity | Using multicloud to build a resilient recovery solution for Monzo

Monzo is a digital-only bank serving 15 million personal and business customers. The bank’s main stack runs on AWS for reliable, scalable cloud infrastructure. “Monzo has used AWS since its early days when the vast majority of financial institutions were still running in on-premises data centers,” says Miles Bryant, senior staff engineer at Monzo. “We’ve used AWS to scale elastically over that time, expanding the range of services we use and the complexity of the products we offer.”

In creating a recovery solution, however, Monzo wanted to use a separate cloud provider and different software so that no single issue with its code or infrastructure could affect both the primary and the backup system. This aligns with the AWS Resilience Analysis framework, which, among other principles, aims to avoid single points of failure by creating redundancies.

Monzo needed to implement robust disaster recovery, both to protect customers and to comply with the European Union’s Digital Operational Resilience Act. “If we have an incident, that means our customers can’t spend their money,” says Andrew Lawson, senior staff engineer at Monzo. “They’re stuck in the shop or they can’t get their train home. So we knew we needed a backup system that could process payments while we fix the issue.”

Solution | Building a simplified recovery system to protect customers

Monzo created the Monzo Stand-in solution, hosted on a separate cloud provider, to take over critical functions in case of an outage. While Monzo’s primary stack has over 3,000 services hosted on AWS, Monzo Stand-in has only 18. These services let customers perform the most time-sensitive and crucial tasks—such as making payments, sending and receiving bank transfers, and viewing their balance—when the main app encounters issues. The functions in Monzo Stand-in are built using different code than the primary stack to increase redundancy.

The application is intentionally simple. Initially, the company thought the Stand-in solution would be a multi-year project. By strategically selecting only the most critical customer functions to replicate in Stand-in, the team reduced the timeline to only 9 months. Keeping the app simple also reduces risk, minimizes overhead, and simplifies management. “A lot of active disaster recovery solutions are inherently baked in technical complexity and are difficult to pull off well. Building something that looks like the Stand-in platform helps keep complexity low and solve problems quicker,” says Lawson.

Monzo Stand-in is tested continuously and rigorously. Each day, a small share of customers (currently about 500 users) is routed to Monzo Stand-in when they open the application. This lets the team see how the solution responds to regular use, while customers can still opt out if they need functions that aren’t available in Monzo Stand-in. The team also compares Monzo Stand-in’s responses with those of the primary AWS stack on a wider set of transaction data. The solution was designed with several levers that Monzo can use to switch traffic over to Monzo Stand-in automatically when needed or intentionally for testing. Monzo can also control how quickly people are switched back to the main solution after an incident is resolved.
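The routing levers described above can be pictured with a short sketch. The case study does not describe how Monzo implements them, so everything here is an assumption made for illustration: the `RoutingLevers` fields, the hash-based cohort selection, and the stack names `"primary"` and `"stand-in"` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RoutingLevers:
    """Hypothetical controls; the names are illustrative, not Monzo's."""
    # Share of users sent to the backup during an incident. 1.0 sends
    # everyone; lowering it step by step moves users back gradually.
    failover_fraction: float = 0.0
    # Share of users placed in the daily test cohort.
    daily_test_fraction: float = 0.0
    # Users who opted out of the daily test.
    opted_out: set = field(default_factory=set)


def unit_hash(key: str) -> float:
    """Map a string to a stable value in [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    # Keep 53 bits so the division below is exact and never reaches 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def choose_stack(user_id: str, day: date, levers: RoutingLevers) -> str:
    """Decide which stack serves this user when they open the app."""
    # Incident lever: applies to everyone, including opted-out users,
    # because the primary stack is assumed to be impaired.
    if unit_hash(f"failover:{user_id}") < levers.failover_fraction:
        return "stand-in"
    # Daily test: opted-out users always stay on the primary stack.
    if user_id in levers.opted_out:
        return "primary"
    # Including the date in the key rotates the test cohort each day.
    if unit_hash(f"test:{day.isoformat()}:{user_id}") < levers.daily_test_fraction:
        return "stand-in"
    return "primary"
```

Under these assumptions, a daily test of roughly 500 users out of 15 million would correspond to a `daily_test_fraction` of about 500 / 15,000,000, and recovery after an incident would amount to lowering `failover_fraction` from 1.0 toward 0.0 in steps.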

Outcome | Providing robust reliability at only 1 percent additional running costs

The Monzo Stand-in solution has already served customers during both partial and major impairments. During one impairment that lasted an hour, Monzo activated the Stand-in solution shortly after detecting the problem so customers could still access their money during that time. “If you’re only building a backup solution to tick the box of resiliency, you need to think again,” says Lawson. “It’s a customer experience investment as well, because a customer might not be able to get home if they couldn’t pay a fare during an outage.”

The solution is simple to maintain and accounts for less than 1 percent of the company’s major deployments since launch, meaning that developers aren’t overburdened with maintaining two different solutions. Running costs for Stand-in equal only 1 percent of the total costs of the primary Monzo stack. The solution is also compliant with the European Union’s Digital Operational Resilience Act.

“We are focused on making the experience as good for customers as possible,” says Bryant. “We built a solution that meets customer experience, regulatory compliance, and platform resiliency requirements without trading off one of those aims for another.”

