Use Amazon Aurora Global Database to set up disaster recovery within India
Disaster recovery (DR) ensures that businesses can continue to operate in the event of an unexpected disruption, such as a natural disaster, power outage, or cyberattack. Maintaining resilience is key to operational success for your business, and disaster recovery planning plays a vital role in that. Many regulators across multiple industries are mandating disaster recovery with strong authentication, advanced encryption methodologies, and data durability with low Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
One such example is financial services institutions in India, which must adhere to strict RTO and RPO requirements and periodically demonstrate their readiness by running operations from the disaster recovery site. These institutions must also comply with data sovereignty requirements that necessitate both the primary site and the DR site be located within India.
You can use Amazon Aurora Global Database, which replicates your data across Regions typically in under a second, to meet stringent DR requirements. With the launch of the Asia Pacific (Hyderabad) Region, you can now set up an Aurora global database between it and the Asia Pacific (Mumbai) Region, all within India, to comply with the data sovereignty requirements.
In this post, we discuss how you can use Aurora Global Database to achieve your RPO and RTO requirements, and meet data sovereignty while reducing the operational overheads associated with conducting periodic DR drills.
Benefits of Aurora Global Database
Aurora Global Database allows you to create a single database that spans multiple Regions, replicating your data with no impact on database performance and enabling fast local reads with low latency. In the event of a disruption in the primary Region, you can fail over to the DR site (the secondary Region of the global database cluster) in under a minute.
Aurora Global Database typically moves data across Regions in under a second through storage-based replication, minimizing the impact to database performance. Dedicated infrastructure in the purpose-built storage layer of Aurora is responsible for the replication, which keeps the database fully available for application workloads. You can also fail over and fail back between clusters in the primary and secondary Region. Aurora Global Database supports the following options:
- Managed planned failover – With this option, Aurora blocks writes on the primary until replication catches up, and then promotes the secondary Region to start accepting writes. This can help with performing DR drills without data loss and with switching the primary Region to another Region.
- Manual unplanned failover – Aurora Global Database supports manual unplanned failover. This “detach and promote” feature is intended to help recover from any unplanned outage. Failover to the secondary Region typically occurs in under a minute.
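Both options map to AWS CLI operations. The following sketch shows how each failover could be initiated; the global cluster identifier, secondary cluster ARN, account ID, and Regions are hypothetical placeholders, and the calls are wrapped in functions so nothing runs against your account until you invoke them:

```shell
#!/usr/bin/env bash
# Sketch of both failover styles. All identifiers, the account ID, and the
# Regions are illustrative placeholders -- substitute your own values.
set -euo pipefail

GLOBAL_ID="aurora-global-demo"
SECONDARY_ARN="arn:aws:rds:ap-south-2:111122223333:cluster:aurora-secondary"

# Managed planned failover: Aurora synchronizes replication first, so no
# data is lost -- useful for DR drills.
planned_failover() {
  aws rds failover-global-cluster \
    --global-cluster-identifier "$GLOBAL_ID" \
    --target-db-cluster-identifier "$SECONDARY_ARN" \
    --region ap-south-1   # Region where the global cluster is homed
}

# Manual unplanned failover ("detach and promote"): detach the secondary
# so it becomes a standalone cluster that accepts writes.
unplanned_failover() {
  aws rds remove-from-global-cluster \
    --global-cluster-identifier "$GLOBAL_ID" \
    --db-cluster-identifier "$SECONDARY_ARN" \
    --region ap-south-1
}

# Invoke planned_failover for drills, or unplanned_failover during an
# actual primary-Region outage.
```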
Unlike a traditional disaster recovery setup, with Aurora Global Database you can utilize the secondary Region to scale read workloads and access your data globally with low latency. Because the secondary cluster is read-only, it can support up to 16 read-only replica instances, one more than the usual limit of 15 for a single Aurora cluster.
Aurora Global Database also supports write forwarding from secondary Regions. Writes issued by the application against a secondary cluster are forwarded to the primary Region. You can build applications that are agnostic of the primary Region as long as they can tolerate the higher latency of cross-Region writes.
Architecture design choices with Aurora Global Database
Aurora Global Database supports various compute configurations for disaster recovery to meet varying degrees of RTOs:
- Provisioned instances – You can pre-provision instances in the secondary Region, sized for the anticipated peak load. A provisioned global database cluster can also serve predictable, constant read workloads from the secondary Region.
- Headless – A cluster without any allocated compute instance is called a headless cluster. A headless Aurora cluster in the secondary Region is an option to optimize cost, with the tradeoff that a failover (RTO) can take several minutes.
- Serverless – You can also create a secondary Region using Amazon Aurora Serverless v2 instances. After a failover, Aurora Serverless scales up quickly to support the application workload. When not in use, the replicas scale down to save costs. Aurora Serverless is also suitable for serving unpredictable read workloads on the secondary Region. Overall, Aurora Serverless offers a balance of RTO and cost tradeoffs when compared to provisioned and headless compute configurations.
Let’s go deeper and understand each configuration for disaster recovery.
Configuration 1: Disaster recovery with a provisioned secondary cluster
A provisioned cluster on a secondary Region is one with the allocated compute for the replica instance. This configuration can help serve constant, predictable read workloads from the secondary Region. The cost of this cluster depends on the number of reader instances and the size of each instance selected for the secondary Region.
This provides your application with an effective RPO of typically less than 1 second and an RTO of less than 1 minute, a strong foundation for a business continuity plan. We recommend using a similarly sized configuration as in the primary Region, so that when a failover occurs, the secondary Region has enough instances of appropriate size to handle the load.
The following diagram illustrates this architecture.
For more information about using provisioned Aurora global database clusters in a DR solution, refer to Cross-Region disaster recovery using Amazon Aurora Global Database for Amazon Aurora PostgreSQL.
The following is the summary of the provisioned compute configuration, which offers the best RTO among the three configurations:
- Use case – Constant distribution of read workload from the secondary Region
- Advantages – Predefined compute capacity for the secondary Region
- Cost – Fixed predictable cost for disaster recovery
- RTO – Typically less than a minute
Configuration 2: Disaster recovery with a headless secondary cluster
A headless database cluster is one without the compute allocated to the cluster. This configuration can help optimize the cost of the secondary Region because you only pay for the storage, I/O, and replication. You won’t be charged for compute, and you can add compute to a headless cluster any time before serving traffic from the secondary Region. However, the tradeoff is that adding compute can take up to several minutes and increase the RTO.
The secondary Region contains the cluster volume that holds the data for the secondary database cluster. Aurora replicates data to the secondary Region using dedicated infrastructure over the AWS backbone network, with latency typically under 1 second. Even with a headless cluster for disaster recovery, your data is highly durable because Aurora maintains six copies of the data across three Availability Zones; durability is maintained regardless of whether compute is attached to the cluster.
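As a sketch of the failover-time step for this configuration, attaching compute to a headless secondary is a single create-db-instance call; the cluster identifier, instance class, engine, and Region below are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Attach a reader instance to a headless secondary cluster. The identifiers,
# instance class, engine, and Region are hypothetical placeholders.
set -euo pipefail

attach_compute_to_headless() {
  aws rds create-db-instance \
    --db-instance-identifier aurora-dr-reader-1 \
    --db-cluster-identifier aurora-secondary-headless \
    --db-instance-class db.r6g.large \
    --engine aurora-mysql \
    --region ap-south-2

  # Waiting for the instance to become available is the step that adds
  # the extra minutes to the RTO of a headless configuration.
  aws rds wait db-instance-available \
    --db-instance-identifier aurora-dr-reader-1 \
    --region ap-south-2
}

# Call attach_compute_to_headless before serving traffic from the
# secondary Region.
```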
The following diagram illustrates this architecture.
For more information about using Aurora Global Database headless clusters as a part of a DR solution, refer to Achieve cost-effective multi-Region resiliency with Amazon Aurora Global Database headless clusters.
To summarize, this architecture has the following details:
- Use case – Good option when the secondary Region doesn’t have any requirement to serve read traffic and the application can tolerate several minutes of RTO
- Advantages – Significant cost optimization over the other alternatives
- Cost – No compute cost in the secondary Region when not in use; you pay only for storage, I/O, and replication
- RTO – Typically several minutes, because new instances must be created before the failover can complete
Configuration 3: Disaster recovery with a serverless secondary cluster
The third and latest addition is disaster recovery using Aurora Serverless v2, which adds one more architecture pattern while using Aurora Global Database.
Aurora Serverless v2 scales in a fraction of a second. As it scales, it adjusts capacity in fine-grained increments to provide the right amount of database resources the application needs, which allows you to use the secondary Region to serve unpredictable read workloads. Because there is no database capacity for you to manage, you pay only for the capacity your application consumes. Aurora Serverless v2 can save you up to 90% of your database cost compared to provisioning capacity for peak load.
The following diagram illustrates this architecture.
To summarize, this option has the following details:
- Use case – When you have uncertain read workloads to be served from the disaster recovery site or you require less than 1 minute of failover time (RTO) without needing to provision for peak workload
- Advantages – Serverless compute that automatically scales to match demand
- Cost – Lower cost than a provisioned cluster; Aurora Serverless optimizes the cost by dynamically scaling the cluster up and down based on your traffic
- RTO – Typically less than a minute, with fast scalability to match the application load
Configure Aurora Global Database with Aurora Serverless v2
For this post, we assume that you have an existing primary DB cluster running a database engine version that supports Aurora Global Database and Aurora Serverless v2, so you can add another Region to it. To set up your DR solution with Aurora Serverless v2, complete the following steps:
- On the Amazon RDS console, choose Databases in the navigation pane.
This page lists the information about the primary Region cluster.
- Select your database and on the Actions menu, choose Add AWS Region.
- For Global database identifier, enter a name for your global database cluster that is unique across all global databases in your account.
- For Secondary Region, choose your secondary Region.
Note that this drop-down menu will only list the Regions that meet the version compatibility with the production or source Aurora cluster.
- For DB instance class, select Serverless.
- Choose Serverless v2.
- For Capacity range, define the range within which the Aurora Serverless DR cluster can scale, in Aurora capacity units (ACUs), where 1 ACU = 2 GiB of memory.
- Additionally, you can configure Multi-AZ deployment, which will create another reader in a different Availability Zone.
Multi-AZ can be useful if you need high availability for read workloads in case of Availability Zone failure.
- For Connectivity, choose the VPC that defines the virtual networking environment for this DB instance. You can choose the defaults to simplify this task.
- Configure Public Access parameters if required.
- Select the security group.
- Choose the AWS Key Management Service (AWS KMS) key for database encryption.
- To minimize endpoints for the Aurora global database, select Turn on global write forwarding.
This feature of Amazon Aurora MySQL-Compatible Edition lets secondary clusters in an Aurora global database forward SQL statements that perform write operations to the primary cluster.
- Complete the remaining setup such as backup, monitoring, and auditing.
- Choose Create cluster to finish.
Use Aurora write forwarding with disaster recovery
Additionally, with write forwarding in an Aurora global database cluster, you can make your application deployment agnostic to endpoint changes. The application deployed in your DR Region can be configured with the secondary Region’s cluster endpoint. This allows you to reduce the number of endpoint changes when a failover occurs.
When the secondary Region receives write traffic, secondary clusters forward the SQL statements that perform write operations to the primary cluster. The primary cluster updates the source and then propagates the resulting changes back to all secondary Regions. Your application deployment in the secondary Region can point to the secondary cluster with write forwarding enabled, which also lets you send parallel traffic to the secondary Region while Aurora forwards the write requests to the primary Region. In the event of a failover to the secondary Region, the application deployment doesn't require a database cluster endpoint change, because it is already configured to use the secondary cluster's read/write cluster endpoint and its read-only endpoint for replicas.
The write forwarding configuration saves you from implementing your own mechanism to send write operations from a secondary Region to the primary Region. Aurora handles the cross-Region networking setup. Aurora also transmits all necessary session and transactional context to provide read-after-write consistency in secondary Regions.
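As a sketch with hypothetical identifiers and endpoint, write forwarding can be enabled on an existing Aurora MySQL secondary cluster from the CLI, and each session then opts in by choosing a read consistency level before writing:

```shell
#!/usr/bin/env bash
# Sketch: enabling and using write forwarding. The cluster identifier,
# endpoint, credentials, table, and Region are illustrative placeholders.
set -euo pipefail

# Enable write forwarding on an existing secondary cluster.
enable_write_forwarding() {
  aws rds modify-db-cluster \
    --db-cluster-identifier aurora-secondary \
    --enable-global-write-forwarding \
    --region ap-south-2
}

# Sessions opt in by setting a read consistency level before writing;
# SESSION gives read-after-write consistency within this connection.
forwarded_write_example() {
  mysql -h aurora-secondary.cluster-example.ap-south-2.rds.amazonaws.com \
    -u admin -p"$DB_PASSWORD" -e \
    "SET aurora_replica_read_consistency = 'SESSION';
     INSERT INTO orders (id, status) VALUES (101, 'NEW');"
}
```

The INSERT above is issued against the secondary endpoint, and Aurora transparently forwards it to the primary cluster.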
The following diagram illustrates this setup.
Code samples for Aurora Global Database deployment
To provision a global database seamlessly, you can run the following deployment script. The deployment script completes the following steps:
- Create a global database cluster.
- Create a primary database cluster in the primary Region.
- Create a writer instance in the primary cluster.
- Add a primary cluster in the global database cluster.
- Create a secondary database cluster in the secondary Region.
- Add a secondary database cluster in the global database cluster.
To run the script, first create an Amazon Elastic Compute Cloud (Amazon EC2) environment using AWS Cloud9. For instructions, refer to Creating an EC2 Environment. Save the following bash script as deployment.sh in the AWS Cloud9 environment, then open a terminal and run it with bash deployment.sh.
The script prompts you to enter cluster identifiers for the primary, secondary, and global database cluster. Provide a unique cluster identifier as per the naming constraints.
To create a global database cluster with a provisioned secondary, use the following code:
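The original script isn't reproduced here; the following is a minimal sketch of what it might look like, assuming Mumbai (ap-south-1) as the primary Region, Hyderabad (ap-south-2) as the secondary, Aurora MySQL, and illustrative instance classes and identifier naming. Execution is gated behind RUN_DEPLOY=1 so reviewing or sourcing the file makes no AWS calls.

```shell
#!/usr/bin/env bash
# Sketch of deployment.sh for a PROVISIONED secondary. Regions, engine,
# and instance class are illustrative assumptions -- adjust for your setup.
set -euo pipefail

PRIMARY_REGION="ap-south-1"    # Asia Pacific (Mumbai)
SECONDARY_REGION="ap-south-2"  # Asia Pacific (Hyderabad)
ENGINE="aurora-mysql"
INSTANCE_CLASS="db.r6g.large"

create_provisioned_global_db() {
  local global_id="$1" primary_id="$2" secondary_id="$3"

  # Step 1: create the global database cluster.
  aws rds create-global-cluster \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" --region "$PRIMARY_REGION"

  # Step 2: create the primary cluster, joined to the global cluster.
  aws rds create-db-cluster \
    --db-cluster-identifier "$primary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" \
    --master-username admin --manage-master-user-password \
    --region "$PRIMARY_REGION"

  # Step 3: add a writer instance to the primary cluster.
  aws rds create-db-instance \
    --db-instance-identifier "${primary_id}-writer" \
    --db-cluster-identifier "$primary_id" \
    --db-instance-class "$INSTANCE_CLASS" \
    --engine "$ENGINE" --region "$PRIMARY_REGION"
  aws rds wait db-instance-available \
    --db-instance-identifier "${primary_id}-writer" --region "$PRIMARY_REGION"

  # Step 4: create the secondary cluster in the other Region, joined to
  # the global cluster (no master credentials -- they replicate over).
  aws rds create-db-cluster \
    --db-cluster-identifier "$secondary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" --region "$SECONDARY_REGION"

  # Step 5: add a provisioned reader so the DR Region can serve reads.
  aws rds create-db-instance \
    --db-instance-identifier "${secondary_id}-reader" \
    --db-cluster-identifier "$secondary_id" \
    --db-instance-class "$INSTANCE_CLASS" \
    --engine "$ENGINE" --region "$SECONDARY_REGION"
}

# Prompt for identifiers and run only when explicitly enabled.
if [[ "${RUN_DEPLOY:-0}" == "1" ]]; then
  read -r -p "Global cluster identifier: " GLOBAL_ID
  read -r -p "Primary cluster identifier: " PRIMARY_ID
  read -r -p "Secondary cluster identifier: " SECONDARY_ID
  create_provisioned_global_db "$GLOBAL_ID" "$PRIMARY_ID" "$SECONDARY_ID"
fi
```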
To create a global database cluster with a headless secondary, use the following code:
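A headless variant could look like the following sketch: identical to a provisioned deployment except that no compute instance is ever created in the secondary Region. Regions, engine, instance class, and identifiers are illustrative assumptions, and execution is gated behind RUN_DEPLOY=1.

```shell
#!/usr/bin/env bash
# Sketch of deployment.sh for a HEADLESS secondary: the secondary cluster
# gets storage and replication but no compute instance.
set -euo pipefail

PRIMARY_REGION="ap-south-1"    # Asia Pacific (Mumbai)
SECONDARY_REGION="ap-south-2"  # Asia Pacific (Hyderabad)
ENGINE="aurora-mysql"

create_headless_global_db() {
  local global_id="$1" primary_id="$2" secondary_id="$3"

  aws rds create-global-cluster \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" --region "$PRIMARY_REGION"

  aws rds create-db-cluster \
    --db-cluster-identifier "$primary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" \
    --master-username admin --manage-master-user-password \
    --region "$PRIMARY_REGION"

  aws rds create-db-instance \
    --db-instance-identifier "${primary_id}-writer" \
    --db-cluster-identifier "$primary_id" \
    --db-instance-class db.r6g.large \
    --engine "$ENGINE" --region "$PRIMARY_REGION"
  aws rds wait db-instance-available \
    --db-instance-identifier "${primary_id}-writer" --region "$PRIMARY_REGION"

  # Headless: create the secondary cluster only -- deliberately no
  # create-db-instance call in the secondary Region.
  aws rds create-db-cluster \
    --db-cluster-identifier "$secondary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" --region "$SECONDARY_REGION"
}

if [[ "${RUN_DEPLOY:-0}" == "1" ]]; then
  read -r -p "Global cluster identifier: " GLOBAL_ID
  read -r -p "Primary cluster identifier: " PRIMARY_ID
  read -r -p "Secondary cluster identifier: " SECONDARY_ID
  create_headless_global_db "$GLOBAL_ID" "$PRIMARY_ID" "$SECONDARY_ID"
fi
```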
To create a global database cluster with a serverless secondary, use the following code:
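For a serverless secondary, the sketch below differs from the provisioned case in two places: the secondary cluster declares an ACU scaling range, and its reader uses the db.serverless instance class. The Regions, engine, capacity range, and identifiers are illustrative assumptions, and execution is gated behind RUN_DEPLOY=1.

```shell
#!/usr/bin/env bash
# Sketch of deployment.sh for an Aurora Serverless v2 secondary.
set -euo pipefail

PRIMARY_REGION="ap-south-1"    # Asia Pacific (Mumbai)
SECONDARY_REGION="ap-south-2"  # Asia Pacific (Hyderabad)
ENGINE="aurora-mysql"

create_serverless_global_db() {
  local global_id="$1" primary_id="$2" secondary_id="$3"

  aws rds create-global-cluster \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" --region "$PRIMARY_REGION"

  aws rds create-db-cluster \
    --db-cluster-identifier "$primary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" \
    --master-username admin --manage-master-user-password \
    --region "$PRIMARY_REGION"

  aws rds create-db-instance \
    --db-instance-identifier "${primary_id}-writer" \
    --db-cluster-identifier "$primary_id" \
    --db-instance-class db.r6g.large \
    --engine "$ENGINE" --region "$PRIMARY_REGION"
  aws rds wait db-instance-available \
    --db-instance-identifier "${primary_id}-writer" --region "$PRIMARY_REGION"

  # Serverless secondary: define the ACU range the cluster can scale
  # within (1 ACU = 2 GiB), then add a db.serverless reader.
  aws rds create-db-cluster \
    --db-cluster-identifier "$secondary_id" \
    --global-cluster-identifier "$global_id" \
    --engine "$ENGINE" \
    --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=16 \
    --region "$SECONDARY_REGION"

  aws rds create-db-instance \
    --db-instance-identifier "${secondary_id}-reader" \
    --db-cluster-identifier "$secondary_id" \
    --db-instance-class db.serverless \
    --engine "$ENGINE" --region "$SECONDARY_REGION"
}

if [[ "${RUN_DEPLOY:-0}" == "1" ]]; then
  read -r -p "Global cluster identifier: " GLOBAL_ID
  read -r -p "Primary cluster identifier: " PRIMARY_ID
  read -r -p "Secondary cluster identifier: " SECONDARY_ID
  create_serverless_global_db "$GLOBAL_ID" "$PRIMARY_ID" "$SECONDARY_ID"
fi
```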
Clean up
To avoid incurring unwanted charges, delete the resources you created as part of this post:
- Delete your EC2 environment in AWS Cloud9.
- Remove the secondary DB cluster from the global database.
- Remove the primary DB cluster from the global database.
- Delete the Aurora cluster instances.
- Delete the Aurora cluster.
Aurora doesn’t provide a single-step method to delete a DB cluster. This design choice is intended to prevent you from accidentally losing data or taking your application offline. To automate the cluster delete, you can use the following AWS Command Line Interface (AWS CLI) script, if you created the cluster using the previous script.
To run the following script, save it in a new file called deletedeployment.sh in the AWS Cloud9 environment, and run it as described earlier in this post.
The script to delete the DR setup for an Aurora provisioned or serverless global database cluster is as follows:
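The original delete script isn't reproduced here; the sketch below shows what it might look like, assuming hypothetical naming where the writer is called <primary>-writer and the secondary reader <secondary>-reader, with Mumbai and Hyderabad as the Regions. Execution is gated behind RUN_DELETE=1.

```shell
#!/usr/bin/env bash
# Sketch of deletedeployment.sh for a provisioned or serverless secondary.
# Regions and instance-naming convention are illustrative assumptions.
set -euo pipefail

PRIMARY_REGION="ap-south-1"
SECONDARY_REGION="ap-south-2"

delete_global_db() {
  local global_id="$1" primary_id="$2" secondary_id="$3"
  local account primary_arn secondary_arn
  account=$(aws sts get-caller-identity --query Account --output text)
  primary_arn="arn:aws:rds:${PRIMARY_REGION}:${account}:cluster:${primary_id}"
  secondary_arn="arn:aws:rds:${SECONDARY_REGION}:${account}:cluster:${secondary_id}"

  # Step 1: detach the secondary cluster from the global database
  # (the call targets the Region where the global cluster is homed).
  aws rds remove-from-global-cluster \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$secondary_arn" \
    --region "$PRIMARY_REGION"

  # Step 2: delete the secondary reader instance, then its cluster.
  aws rds delete-db-instance \
    --db-instance-identifier "${secondary_id}-reader" \
    --skip-final-snapshot --region "$SECONDARY_REGION"
  aws rds wait db-instance-deleted \
    --db-instance-identifier "${secondary_id}-reader" \
    --region "$SECONDARY_REGION"
  aws rds delete-db-cluster \
    --db-cluster-identifier "$secondary_id" \
    --skip-final-snapshot --region "$SECONDARY_REGION"

  # Step 3: detach the primary, delete its writer, then its cluster.
  aws rds remove-from-global-cluster \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$primary_arn" \
    --region "$PRIMARY_REGION"
  aws rds delete-db-instance \
    --db-instance-identifier "${primary_id}-writer" \
    --skip-final-snapshot --region "$PRIMARY_REGION"
  aws rds wait db-instance-deleted \
    --db-instance-identifier "${primary_id}-writer" --region "$PRIMARY_REGION"
  aws rds delete-db-cluster \
    --db-cluster-identifier "$primary_id" \
    --skip-final-snapshot --region "$PRIMARY_REGION"

  # Step 4: delete the now-empty global cluster.
  aws rds delete-global-cluster \
    --global-cluster-identifier "$global_id" --region "$PRIMARY_REGION"
}

if [[ "${RUN_DELETE:-0}" == "1" ]]; then
  delete_global_db "$@"
fi
```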
The script to delete the DR setup for an Aurora headless global database cluster is as follows:
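For a headless global database, the delete sketch is shorter, because the secondary cluster has no instance to remove. The same illustrative Regions and naming assumptions apply, and execution is gated behind RUN_DELETE=1.

```shell
#!/usr/bin/env bash
# Sketch of deletedeployment.sh for a headless global database: the
# secondary cluster has no compute instance to delete.
set -euo pipefail

PRIMARY_REGION="ap-south-1"
SECONDARY_REGION="ap-south-2"

delete_headless_global_db() {
  local global_id="$1" primary_id="$2" secondary_id="$3"
  local account primary_arn secondary_arn
  account=$(aws sts get-caller-identity --query Account --output text)
  primary_arn="arn:aws:rds:${PRIMARY_REGION}:${account}:cluster:${primary_id}"
  secondary_arn="arn:aws:rds:${SECONDARY_REGION}:${account}:cluster:${secondary_id}"

  # Detach and delete the headless secondary cluster directly.
  aws rds remove-from-global-cluster \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$secondary_arn" \
    --region "$PRIMARY_REGION"
  aws rds delete-db-cluster \
    --db-cluster-identifier "$secondary_id" \
    --skip-final-snapshot --region "$SECONDARY_REGION"

  # Detach the primary, delete its writer instance, then the cluster.
  aws rds remove-from-global-cluster \
    --global-cluster-identifier "$global_id" \
    --db-cluster-identifier "$primary_arn" \
    --region "$PRIMARY_REGION"
  aws rds delete-db-instance \
    --db-instance-identifier "${primary_id}-writer" \
    --skip-final-snapshot --region "$PRIMARY_REGION"
  aws rds wait db-instance-deleted \
    --db-instance-identifier "${primary_id}-writer" --region "$PRIMARY_REGION"
  aws rds delete-db-cluster \
    --db-cluster-identifier "$primary_id" \
    --skip-final-snapshot --region "$PRIMARY_REGION"

  # Finally remove the empty global cluster.
  aws rds delete-global-cluster \
    --global-cluster-identifier "$global_id" --region "$PRIMARY_REGION"
}

if [[ "${RUN_DELETE:-0}" == "1" ]]; then
  delete_headless_global_db "$@"
fi
```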
Conclusion
In this post, we demonstrated various architecture patterns for a disaster recovery environment using Aurora Global Database (provisioned, headless, and serverless). We also demonstrated how you can deploy each of these disaster recovery patterns using the AWS CLI in a bash script, and covered related configuration options such as write forwarding and scaling. Remember that support for Aurora Global Database depends on the cluster type, instance type, engine version, and Region. For more information, refer to Region and version availability.
Try out the solution and if you have any comments or questions, leave them in the comments section.
About the Authors
Satinder is Head of the Solutions Architects team (Digital Companies) for AWS in India. He has over 25 years of experience working with customers across industries, including fintech, logistics, and ecommerce, helping them use technology for business transformation. Outside of work, he is a cycling enthusiast and loves to go for long rides.
Arun Pandey is a Senior Database Specialist Solutions Architect at AWS. With over 17 years of experience in application engineering and infra-architecture, Arun helps digitally native companies in India build resilient and scalable database platforms, which aids in solving complex business problems and innovating faster on AWS.