Intuit’s implementation of Amazon Aurora mixed-configuration cluster: Achieving high availability, disaster recovery, and up to 55% cost savings

This post was co-written with Rajesh Saluja, Principal Engineer at Intuit.

Intuit is the global financial technology platform that powers prosperity for 100 million consumer and small business customers with TurboTax, Credit Karma, QuickBooks, and Mailchimp.

Intuit has built on AWS since 2013 and has taken an “all-in” approach in its move to the cloud. The company has been at the forefront of innovation in the cloud, adopting AWS technologies for infrastructure, machine learning (ML), data analytics, and more, and integrating them with its own capabilities to create best-in-class customer experiences.

Amazon Aurora Serverless v2 is an on-demand, auto-scaling configuration for Amazon Aurora. Aurora Serverless v2 scales capacity up or down based on the application’s needs. With the Aurora Serverless v2 setup, a customer can effortlessly operate their database in the cloud without having to manually manage database capacity.

In this post, we talk about Aurora Serverless v2 and the general architecture to implement Aurora mixed-configuration clusters to reduce operational complexity and cost. Additionally, we will talk about Intuit’s journey to implement Aurora mixed-configuration clusters and the cost savings that Intuit observed.

Intuit’s technical requirements

Intuit uses Amazon Aurora Global Database clusters for both production and non-production environments. It uses provisioned instances for writers and readers to achieve the required high availability and disaster recovery capabilities. Intuit was looking for a solution that supports auto scaling, handles variable workloads, and reduces overall cost.

The following are Intuit’s requirements for Aurora clusters:

High availability – Able to recover within the specified service level agreements (SLAs) from primary node failure in an AWS Region.
Read scaling – The reader instances should be able to handle variable workloads.
Disaster recovery – Able to recover within the specified SLAs in case of a Region failure.
Cost effective – Reduce the overall cost of operation.
Increase resource utilization – Avoid overprovisioning of the resources.

Let’s understand the architecture of an Aurora mixed-configuration cluster and some of the factors to consider when using it.

Aurora Serverless v2 overview

Aurora Serverless v2 is an auto scaling configuration of Aurora. It dynamically adjusts database capacity in precise increments to closely align with your workload’s requirements. Moreover, you only pay for the capacity you consume, resulting in potential cost savings compared to provisioning for peak load. Serverless databases are designed to deliver the full range of cloud computing benefits. By moving to a serverless model, you can accomplish the following:

Reduce operational overhead – Provisioning, configuring, securing, and managing the lifecycle of database clusters usually requires a lot of work and management. Moving to Aurora Serverless can greatly reduce that overhead and free up resources to focus on creative and innovative endeavors that add direct value to the business.
Increase resource utilization and reduce costs – Moving to Aurora Serverless can help decrease operational costs by consolidating workloads into fewer clusters and moving away from custom tooling and automation.
Increase scaling capabilities – Fixed resource databases (on premises or in the cloud) can leave you constrained by the capacity available to your database instances. Workloads with highly dynamic capacity requirements, like shared development and test environments, are underutilized most of the time, but must scale during test cycles and then scale back when no longer needed to avoid unnecessary charges.

If you already have an existing provisioned Aurora cluster, you can create an Aurora Serverless v2 instance within the same cluster. This enables an Aurora mixed-configuration cluster where both provisioned and Aurora Serverless v2 instances can coexist.

Aurora Serverless v2 fully supports the extensive range of features provided by Aurora. For instance, you can create up to 15 Aurora read replicas deployed across 3 Availability Zones. Among these read replicas, any number can be Aurora Serverless v2 instances, serving as failover targets for high availability or for dynamically scaling the read operations.

Additionally, with Aurora Global Database, you have the flexibility to designate an instance as Aurora Serverless v2, incurring minimal costs when idling. Instances in secondary Regions can independently scale to support diverse workloads across different Regions. For a comprehensive list of features, see the Amazon Aurora User Guide.

The provisioned instance’s static capacity is determined by its class. For example, db.r6g.4xlarge has 128 GB of memory, 16 vCPUs, and up to 10 Gbps of network bandwidth. The capacity allocated to a serverless instance is measured in terms of Aurora capacity units (ACUs). One ACU is equal to roughly 2 GiB of memory, and an equivalent CPU and network bandwidth. The allowed minimum ACUs is 0.5, which is roughly 1 GiB of memory, and maximum ACUs can be up to 128 ACUs, which is equivalent to roughly 256 GB of memory.
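
The ACU arithmetic above can be sketched in a few lines of Python. This is an illustrative helper, not an AWS API: the function names are hypothetical, the 2 GiB-per-ACU ratio is the approximate figure quoted above, and rounding to half-ACU steps is an assumption about how capacity values are specified.

```python
import math

# Approximate figures taken from the text above; real capacity is "roughly" this.
GIB_PER_ACU = 2.0
MIN_ACU = 0.5
MAX_ACU = 128.0


def acu_to_gib(acu: float) -> float:
    """Approximate memory (GiB) available at a given ACU value."""
    if not MIN_ACU <= acu <= MAX_ACU:
        raise ValueError(f"ACU must be between {MIN_ACU} and {MAX_ACU}, got {acu}")
    return acu * GIB_PER_ACU


def gib_to_acu(memory_gib: float) -> float:
    """Approximate ACUs needed to match a memory size, rounded up to a half ACU."""
    acu = math.ceil(memory_gib / GIB_PER_ACU * 2) / 2
    if acu > MAX_ACU:
        raise ValueError(f"{memory_gib} GiB exceeds the {MAX_ACU} ACU maximum")
    return max(acu, MIN_ACU)


if __name__ == "__main__":
    # A db.r6g.4xlarge has 128 GB of memory, so its rough serverless
    # equivalent is the value printed here.
    print(gib_to_acu(128))
    print(acu_to_gib(MIN_ACU), acu_to_gib(MAX_ACU))
```

A helper like this is only useful for first estimates; the text's "roughly" applies, and CPU and network scale along with memory rather than independently.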

You specify the serverless instance capacity range (minimum ACU–maximum ACU) at the cluster level—the Aurora Serverless v2 instances in the cluster are assigned the same capacity range. A cluster can contain a mix of serverless and provisioned instances.
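
As a minimal sketch of how the capacity range and a serverless instance might be configured programmatically, the following Python builds the request parameters without calling AWS. The function names and identifiers such as "demo-cluster" are hypothetical. The parameter names (ServerlessV2ScalingConfiguration, DBInstanceClass set to db.serverless, PromotionTier) are the ones the Amazon RDS API uses for modify_db_cluster and create_db_instance; verify them against the current API reference before relying on this.

```python
def scaling_range_request(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Build parameters that set the cluster-level capacity range.

    The same range applies to every Aurora Serverless v2 instance in the cluster.
    """
    if min_acu > max_acu:
        raise ValueError("minimum ACU cannot exceed maximum ACU")
    return {
        "DBClusterIdentifier": cluster_id,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
    }


def serverless_reader_request(
    cluster_id: str, instance_id: str, engine: str, promotion_tier: int
) -> dict:
    """Build parameters that add a Serverless v2 instance to an existing cluster."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBClusterIdentifier": cluster_id,
        "Engine": engine,  # for example "aurora-postgresql" or "aurora-mysql"
        "DBInstanceClass": "db.serverless",  # marks the instance as Serverless v2
        "PromotionTier": promotion_tier,
    }


if __name__ == "__main__":
    print(scaling_range_request("demo-cluster", 0.5, 16))
    print(serverless_reader_request("demo-cluster", "demo-reader-1", "aurora-postgresql", 2))
```

With boto3, these dictionaries would typically be passed as keyword arguments, for example rds.modify_db_cluster(**request) followed by rds.create_db_instance(**request), where rds is a boto3 RDS client.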

Aurora provisioned compared to Aurora Serverless v2: Factors to consider

When you’re considering adding Serverless v2 instances to your architecture, there are several factors to analyze. Since this guidance is provided purely from a technical perspective, you should test it with a production-like load to determine if you get any scaling or cost benefits with this change. We also suggest that you review Requirements for Aurora Serverless v2 for more details on feature support.

Some important factors are:

Engine version – Aurora Serverless v2 supports Aurora MySQL-Compatible Edition (MySQL 8.0 and later) and Aurora PostgreSQL-Compatible Edition (PostgreSQL 13 and later).
Instance size – Any database instance in an Aurora cluster with memory up to 256 GB can be replaced with a serverless instance. Keep in mind that if an Aurora global database is in use or if Performance Insights is enabled, a minimum of 2 ACUs is recommended.
Scaling rate – The scaling rate of the Aurora Serverless v2 database instance depends on its current capacity. The higher the current capacity, the faster it scales.
Usage characteristics – Database clusters might have different usage characteristics for the reader and writer instances. By looking at the usage pattern, you can gauge if migrating to Aurora Serverless v2 instances would provide any cost reduction benefits.
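
The instance size guidance in the list above can be expressed as a small pre-check. This is a sketch under the assumptions stated in this post (a 256 GB memory ceiling, a 0.5 ACU floor, and a recommended 2 ACU floor when Aurora Global Database or Performance Insights is in use); the function names are hypothetical and the thresholds should be confirmed against the current Aurora documentation.

```python
MAX_REPLACEABLE_MEMORY_GB = 256   # largest instance memory a serverless instance can cover
DEFAULT_MIN_ACU = 0.5             # lowest allowed minimum capacity
RECOMMENDED_MIN_ACU = 2.0         # recommended floor with Global Database or Performance Insights


def can_replace_with_serverless(instance_memory_gb: float) -> bool:
    """Return True if a provisioned instance of this size fits in the ACU range."""
    return instance_memory_gb <= MAX_REPLACEABLE_MEMORY_GB


def recommended_min_acu(
    requested_min_acu: float,
    uses_global_database: bool,
    performance_insights_enabled: bool,
) -> float:
    """Raise the requested minimum ACU to the recommended floor when needed."""
    floor = DEFAULT_MIN_ACU
    if uses_global_database or performance_insights_enabled:
        floor = RECOMMENDED_MIN_ACU
    return max(requested_min_acu, floor)


if __name__ == "__main__":
    print(can_replace_with_serverless(128))
    print(recommended_min_acu(0.5, uses_global_database=True,
                              performance_insights_enabled=False))
```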

Aurora offers an auto scaling mechanism for dynamically adding or removing readers in the cluster. If you’re using the auto scaling feature, you have the option to replace it with the vertical scaling offered by Aurora Serverless v2. Depending on your use case, you can replace your provisioned readers with a smaller number of serverless readers with the appropriate cluster capacity range. The benefit of this setup is that vertical scaling at the instance level is faster than launching a new instance, and you pay only for the capacity that you use. For more details, see the Aurora Serverless v2 performance and scaling documentation.

Solution overview

The following diagram shows an architecture using an Aurora mixed-configuration cluster. There are several ways you can modify this architecture depending on your specific use case.

High availability and disaster recovery architecture using mixed-configuration

In the Aurora mixed-configuration cluster, the primary instance (writer) and the readers can each be configured as an Aurora provisioned instance or an Aurora Serverless v2 instance, depending on your use case.

The following table describes the configuration we’re using.

Cluster	Region	Instances
Aurora primary cluster	Region 1	Provisioned writer, Serverless v2 reader, Serverless v2 reader, Provisioned reader
Aurora secondary cluster	Region 2	Serverless v2 reader
Aurora secondary cluster	Region 3	Serverless v2 reader

The advantage of this configuration is that the serverless readers run at the minimum ACU when there is no load, and they can automatically scale up to the maximum ACU when required. The disaster recovery instance runs close to the minimum ACU when running as a secondary cluster, and in case of failover, it can scale up to the maximum ACU based on the workload.

With Aurora Serverless v2, you pay for what you use, and by using this Aurora mixed-configuration cluster, you can save costs while still meeting your high availability and disaster recovery requirements.

Aurora offers you flexibility to choose the mix of provisioned and serverless instances to satisfy the unique characteristics of your workloads.

Provisioned database instances use the choice of tier 0–15 to determine the order in which Aurora chooses a reader database instance to promote to writer during a failover operation. In the case of Aurora Serverless v2 reader database instances, the tier number also determines whether the database instance scales up to match the capacity of the writer database instance or scales independently based on its own workload.

Aurora Serverless v2 readers in tiers 0 and 1 are kept at a minimum capacity equal to the writer database instance. This helps ensure readers can take over from the writer database instance in a failover. If the writer database instance is provisioned, then Aurora estimates the equivalent serverless capacity for the readers.

For tiers 2–15, Aurora Serverless v2 reader database instances aren’t held to the writer’s capacity. When these readers are idle, they can scale down to the minimum ACU value specified in the cluster’s capacity range.

For more information, see Choosing the promotion tier for an Aurora Serverless v2 reader.
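
The promotion tier behavior described above can be summarized as a simplified model. The function below is a hypothetical illustration, not Aurora's actual scaling logic: it only captures the rule that readers in tiers 0 and 1 stay at least as large as the writer, while readers in tiers 2–15 can drop to the cluster minimum. For a provisioned writer, writer_acu stands for the equivalent serverless capacity that Aurora estimates.

```python
def reader_capacity_floor(
    promotion_tier: int, writer_acu: float, cluster_min_acu: float
) -> float:
    """Lowest capacity (in ACUs) a Serverless v2 reader can scale down to.

    Simplified model of the tier rules described in the text.
    """
    if not 0 <= promotion_tier <= 15:
        raise ValueError("promotion tier must be between 0 and 15")
    if promotion_tier in (0, 1):
        # Failover-ready readers track the writer's capacity.
        return max(writer_acu, cluster_min_acu)
    # Tiers 2-15 scale independently, down to the cluster minimum.
    return cluster_min_acu


if __name__ == "__main__":
    # Writer running at (or estimated at) 64 ACUs, cluster range starting at 1 ACU.
    for tier in (0, 1, 2, 15):
        print(tier, reader_capacity_floor(tier, writer_acu=64, cluster_min_acu=1))
```

This is why the tier choice matters for cost: a tier 0 or 1 reader behind a large writer is billed for capacity close to the writer's even when it serves little read traffic.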

In the following section, we show the way Intuit implemented Aurora global mixed-configuration clusters.

Intuit’s implementation and architecture

Intuit uses Aurora global clusters to achieve disaster recovery and high availability for production and non-production databases. In the older architecture, the disaster recovery instance was created as the same size as the primary instance. This led to increased costs for an instance that was only serving disaster recovery scenarios.

For non-critical capabilities, Intuit kept minimal resources on the secondary site. When there was a need to fail over to a secondary Region, the secondary instance first had to be scaled up so that it could handle the same workload as the primary instance. This delayed recovery and extended downtime, because instance scaling could take several minutes.

The older architecture of Aurora clusters at Intuit is shown in the following figure.

The primary Region (Region 1) consists of an Aurora provisioned writer as the primary database instance, and one provisioned reader that’s the same size as the primary for high availability. The writer can be 8xlarge, 4xlarge, or 2xlarge (depending on the application type), and the reader matches the writer’s size.

In the disaster recovery Region (Region 2), we have one Aurora provisioned instance that’s the same size as the primary instance (8xlarge, 4xlarge, or 2xlarge) to make sure that in case of a failover, the disaster recovery instance is able to handle the workload.

Intuit conducted a proof of concept (POC) using Aurora Serverless v2, which showed that Serverless v2 reduced Intuit’s costs without compromising the company’s high availability and disaster recovery requirements. Furthermore, this approach minimizes the need to determine the appropriate instance size, saving Intuit additional effort.

The POC showed that using Aurora Serverless v2 readers instead of provisioned readers can help in situations involving multi-tenant databases, distributed databases, development and test systems, and other places where workloads are unpredictable and change often. Based on the POC, Intuit changed its provisioned disaster recovery instances to Serverless v2. Intuit then evaluated the outcome and changed the existing provisioned readers to Serverless v2 in subsequent phases.

Intuit’s architecture using Aurora Serverless v2 is shown in the following diagram.

In the new architecture, the primary Region (Region 1) consists of an Aurora provisioned writer and one provisioned reader the same size as the primary writer for high availability. The reader is using Aurora Serverless v2 (priority tier 0/1) for high availability in production and (priority tier 2–15) for scaling in non-production. The optional additional reader is Aurora Serverless v2 (priority tier 2–15 for scaling).

Intuit replaced the Aurora provisioned reader in the secondary Region (Region 2) with a serverless reader for disaster recovery purposes. The reader’s capacity ranges from a minimum of 1.0 ACU to the specified maximum (depending on the writer configuration in Region 1).

By switching the disaster recovery instances from Aurora provisioned to Aurora Serverless v2, Intuit was able to reduce the cost by 55% on the disaster recovery instance by not overprovisioning and still meeting the desired SLAs.

Intuit’s disaster recovery strategy is to keep the minimum ACUs low and, right before failover, modify the cluster’s capacity range so that the ACU values match or exceed the capacity of the primary Aurora cluster instance. This step adds a little time to the failover but helps Intuit scale faster, because the scaling rate of an Aurora Serverless v2 database instance depends on its current capacity: the higher the current capacity, the faster it scales.
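
The pre-failover step can be sketched as follows. This is not Intuit's actual tooling; the function name and the "dr-cluster" identifier are hypothetical, and the conversion from writer memory to ACUs uses the approximate 2 GiB-per-ACU ratio quoted earlier in this post. The sketch only builds the parameters for a capacity range change and doesn't call AWS.

```python
import math

GIB_PER_ACU = 2.0   # approximate ratio quoted earlier in this post
MAX_ACU = 128.0     # maximum capacity quoted earlier in this post


def pre_failover_scaling_request(
    secondary_cluster_id: str, writer_memory_gib: float, current_max_acu: float
) -> dict:
    """Build a capacity range that raises the minimum ACU to match the writer.

    Raising the floor before failover means the instance starts from a higher
    capacity, and higher current capacity scales faster.
    """
    # Round the writer-equivalent capacity up to the next half ACU.
    target_min = math.ceil(writer_memory_gib / GIB_PER_ACU * 2) / 2
    if target_min > MAX_ACU:
        raise ValueError("writer is larger than the maximum serverless capacity")
    return {
        "DBClusterIdentifier": secondary_cluster_id,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": target_min,
            # The maximum can never be lower than the new minimum.
            "MaxCapacity": max(current_max_acu, target_min),
        },
    }


if __name__ == "__main__":
    # Primary writer with 128 GB of memory (for example, a 4xlarge instance).
    print(pre_failover_scaling_request("dr-cluster", 128, current_max_acu=32))
```

With boto3, the resulting dictionary would typically be passed to the RDS client's modify_db_cluster call before initiating the failover, and the minimum would be lowered again once the Region returns to its secondary role.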

Benefits observed by Intuit when using Aurora mixed-configuration clusters

The following are some of the benefits that Intuit observed since implementing the Aurora Serverless v2 mixed-configuration clusters:

Reducing guesswork – Provisioning for capacity becomes hassle-free because Aurora mixed-configuration clusters remove the need for predicting database capacity needs.
Scalability for high transaction volumes – Aurora Serverless v2 scalability enables Intuit to scale instantly to help meet application demands or spiky workloads.
Fine-grained scaling – Aurora Serverless v2 scales capacity in small increments required to provide the best performance for the resources consumed.
Efficient resource utilization – When not in use, instances scale down automatically, which optimizes resource utilization and reduces costs.
Cost savings – Paying only for the capacity consumed allowed Intuit to achieve a cost benefit of up to 55% compared to the cost of equivalent provisioned Aurora clusters.
High availability and disaster recovery – By taking advantage of Aurora Serverless v2 features, Intuit was able to meet all the SLAs for high availability and disaster recovery while keeping costs low.

Cost benefits observed by Intuit

Intuit successfully implemented the first phase of this solution across 15 Aurora clusters. In this phase, the Aurora provisioned reader instances and disaster recovery instances were replaced with Aurora Serverless v2 in both production and non-production environments for larger instance types such as 8xlarge, 4xlarge, and 2xlarge.

For smaller instance types, Intuit didn’t observe cost savings for its particular use case. However, the ability of Aurora Serverless v2 to automatically scale for sporadic or infrequent loads still provides significant benefits in terms of manageability.
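
One way to reason about when a serverless instance costs less than a provisioned one is a break-even calculation on average ACU consumption. The sketch below is illustrative only: the function names are hypothetical, and the prices in the example are placeholders chosen for easy arithmetic, not AWS prices or Intuit's figures. Substitute the current prices for your Region and instance class.

```python
def break_even_avg_acu(provisioned_hourly_price: float, acu_hourly_price: float) -> float:
    """Average ACUs per hour at which serverless cost equals provisioned cost."""
    return provisioned_hourly_price / acu_hourly_price


def serverless_savings_fraction(
    avg_acu: float, provisioned_hourly_price: float, acu_hourly_price: float
) -> float:
    """Fraction saved by serverless (negative means serverless costs more)."""
    serverless_hourly_cost = avg_acu * acu_hourly_price
    return 1 - serverless_hourly_cost / provisioned_hourly_price


if __name__ == "__main__":
    # Placeholder prices, NOT real AWS pricing.
    provisioned_price = 2.0   # hypothetical hourly price of a provisioned instance
    acu_price = 0.125         # hypothetical hourly price of one ACU

    print(break_even_avg_acu(provisioned_price, acu_price))
    # A mostly idle disaster recovery instance averaging 4 ACUs:
    print(serverless_savings_fraction(4, provisioned_price, acu_price))
```

Under this model, an instance that sits near its minimum ACU most of the time (such as a disaster recovery replica) sees the largest savings, while an instance whose average consumption approaches the break-even point sees little or none.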

Conclusion

Intuit successfully implemented Aurora mixed-configuration clusters, which resulted in cost benefits, instant scalability, and efficient resource utilization.

In the next phase, Intuit plans to extend this solution by converting the provisioned readers in the primary Region to serverless to provide cost-effective high availability in the primary Region. You too can architect your environment based on your requirements to experience the benefits of an Aurora mixed-configuration cluster.

If you have comments, leave them in the comments section.

About the Authors

Rajesh Saluja is a Principal Engineer at Intuit.

Vineet Agarwal is a Senior Database Specialist SA with AWS. Prior to AWS, Vineet worked for large enterprises in financial, retail, and healthcare verticals, helping them with database and solutions architecture.