Automate high availability setup in Amazon RDS Custom for Oracle

Amazon Relational Database (Amazon RDS) Custom automates database administration tasks and operations. RDS Custom makes it possible for you as a database administrator to access and customize your database environment and operating system. With RDS Custom, you can customize to meet the requirements of legacy, custom, and packaged applications.

Amazon RDS Custom for Oracle supports read replicas that are built on Data Guard technology, and you can use these replicas to offload reads from the primary and for disaster recovery. With RDS Custom for Oracle, you can create up to five read replicas, which are managed physical standby databases using Data Guard configuration. These read replicas created in mounted mode can be opened in read-only mode with Active Data Guard licenses and can be used for reporting purposes. Read replicas in RDS Custom for Oracle use asynchronous log shipping mode with Max Performance Data Guard configuration.

A high availability (HA) solution for the database stack is an important aspect to consider for enterprise workloads when migrating or deploying Oracle databases in the AWS Cloud to help ensure that the architecture can meet the service level agreement (SLA) of the application. More importantly, customers are looking for managed features for database administrative tasks, including setting up and managing HA. As of this writing, RDS Custom for Oracle doesn’t support a managed HA solution like Multi-AZ. Because you have access to the underlying OS and database, you can set up HA configurations yourself and administer them. Refer to Enabling High Availability with Data Guard on Amazon RDS Custom for Oracle and Build high availability for Amazon RDS Custom for Oracle using read replicas for details on how to set up HA manually using replicas and Oracle Data Guard.

In this post, we provide a solution to automate this HA setup using AWS CloudFormation and AWS Systems Manager documents to offload the heavy lifting tasks in setting up HA.

Solution overview

The solution uses the architecture shown in the following figure to set up HA for RDS Custom for Oracle. The primary RDS Custom for Oracle DB instance is in one Availability Zone (AZ1), and a Data Guard standby instance (RDS Custom for Oracle read replica) is in another Availability Zone (AZ2). Data Guard is configured for max availability with fast start failover (FSFO) enabled. As a best practice, we recommend setting up a Data Guard observer on a standalone Amazon Elastic Compute Cloud (Amazon EC2) instance in a third Availability Zone (AZ3) different from the primary and standby. The Data Guard FSFO will detect and manage the failover in case of any failures on the primary.

The architecture uses the following services and resources:

RDS Custom and read replicas – RDS Custom brings the benefits of Amazon Relational Database Service (Amazon RDS) to a market that can’t easily move to a fully managed service because of customizations that are required with third-party applications. In this solution, we use the RDS Custom for Oracle replica as the Data Guard standby instance.
Amazon EC2 – Amazon EC2 provides scalable computing capacity in the AWS Cloud. You can use Amazon EC2 to launch as many or as few virtual servers as you need, configure security and networking, and manage storage. In this solution, we configure a Data Guard observer on an EC2 Linux instance.
Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. In this solution, we use Amazon S3 to store and retrieve Oracle client media to set up the Oracle Data Guard observer.
AWS CloudFormation – AWS CloudFormation is a service that helps you model and set up your AWS resources. You create a template that describes all the AWS resources that you want (like EC2 instances or RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. In this solution, we deploy all the resources like the EC2 instances and RDS Custom for Oracle read replica, and run Systems Manager documents using a CloudFormation script.
AWS Lambda – AWS Lambda is a compute service that lets you run code without provisioning or managing servers. With Lambda, you can run code for virtually any type of application or backend service. In this solution, we use Lambda for error handling.
Systems Manager document – A Systems Manager document (SSM document) defines the actions that Systems Manager performs on your managed instances. In this solution, we have created custom SSM documents to configure and set up Oracle Data Guard for max availability and fast start failover.

CloudFormation template

In this solution, we provide a CloudFormation template that takes minimum input parameters from the user and performs the following high-level tasks:

Provision an EC2 instance that acts as a Data Guard observer instance.
Copy the Oracle client media provided by you in an S3 bucket into the observer instance.
Install an Oracle client on the observer instance.
Provision an RDS Custom for Oracle read replica, which acts as the standby DB instance.
Create tnsnames.ora on the EC2 observer instance to include the primary and standby databases.
Verify connection to the primary database from the observer instance.
Verify connection to the standby database from the observer instance.
Create a wallet to store the Data Guard user password.
Make the following Data Guard configuration changes:
- Set Data Guard protection mode to maximum availability to provide the highest level of data protection.
- Set log transport (LogXptMode) to FASTSYNC, which configures the redo transport using SYNC and NOAFFIRM attributes of the LOG_ARCHIVE_DEST_n initialization parameter.
- Set FastStartFailoverTarget to the standby instance and FastStartFailoverThreshold to 45 seconds.
- Enable fast start failover.
Start the Data Guard observer process.
Verify that the SSM documents are successful using Lambda. If there is a failure in any SSM document, the script errors out and needs manual intervention to fix and proceed. Refer to the GitHub readme for more troubleshooting tips.

Prerequisites

You should have a primary RDS Custom for Oracle DB instance up and available prior to using this solution. This script doesn’t provide capabilities to create and configure a customer VPC, AWS Identity and Access Management (IAM) roles, instance profiles, S3 buckets, and AWS Key Management Service (AWS KMS) keys that are needed for RDS Custom for Oracle because these are generally configured and controlled by different teams in most organizations. Refer to Prerequisites for creating an RDS Custom for Oracle instance and ensure all the RDS Custom for Oracle pre-installation steps are performed. Note the following:

You need an RDS Custom for Oracle primary DB instance up and in an available state.
You need an S3 bucket with the Oracle Client media. For example, LINUX.X64_193000_client.zip for the 19c Oracle Home.
The script is designed to create an RDS Custom for Oracle read replica. Ensure the general requirements and network requirements for the read replica creation are met.
It’s recommended to have the primary DB instance, replica, and observer in three separate Availability Zones, but it’s not a requirement for this solution to work.
The user running this solution should have privileges to run the CloudFormation template either via the AWS CloudFormation console or AWS Command Line Interface (AWS CLI).

Deploy the solution

You can use this solution through the AWS Management Console or run it via the AWS CLI. To deploy this solution in your account, complete the following steps:

Clone the GitHub repository to your local machine or download the script from AWS Samples.
Follow the GitHub readme to verify the prerequisites and deploy the CloudFormation stack.
Verify the completion of the stack deployment.

You should see the following resources deployed in your account:

IAM role – An IAM role called RDS-Custom-Automation-Role-xxxx is created to allow user interaction with AWS APIs from the EC2 observer instance.
RDS Custom for Oracle read replica – An RDS Custom for Oracle read replica is created to act as the standby database instance.
EC2 instance – An EC2 instance (t3.medium) called RDS-Custom_Observer-HA-xxxx is created and set up as the Data Guard observer. Note the following configuration:
- The Oracle client is installed and configured as ec2-user (OS-level user).
- The Oracle client install location is /home/ec2-user/oracle/client.
- All the logs are in /home/ec2-user/logs.
- The bash_profile is set with the Oracle client environment parameters, so you can connect to the primary or standby databases. Tnsnames.ora for both the primary and standby database is also set. Therefore, you can use the normal sqlplus and dgmgrl on the primary and standby databases.
Lambda function – A Lambda function called RDS-Custom-HA-Automation-xxxx is created to provide support operations for the automation.
SSM documents – Three SSM documents are created: RDS-Custom-HA-Automation-Tnsnames-xxxx, RDS-Custom-HA-Automation-Wallet-xxxx, and RDS-Custom-HA-Automation-dgmgrl-xxxx.

Current limitations

This version of the solution has the following limitations:

Oracle database flashback is not enabled, so after a Data Guard failover from primary to standby, the old primary has to be rebuilt from backup or you must recreate the replica. To automatically reinstate the old primary, the flashback database feature should be enabled on both the primary and standby prior to a failover. Use the following steps to enable flashback database; these are sample commands and sample values for flashback-related parameters used in the below example. You should set these values as per your workload requirements.
- Pause RDS Custom for Oracle database automation for the primary and standby database and disable FSFO if it has been enabled previously.
- Log in to the underlying RDS Custom for Oracle EC2 instances on both primary and standby and create a directory using user /rdsdbdata for storing the flashback logs:
```
$ mkdir /rdsdbdata/fra
$ ls -ld /rdsdbdata/fra
drwxrwxr-x 2 rdsdb rdsdb 4096 Feb 2 16:40 /rdsdbdata/fra
```
- Enable flashback database on the primary database:
```
SQL> alter system set db_recovery_file_dest='/rdsdbdata/fra';
SQL> alter system set db_flashback_retention_target=60;
SQL> alter system set db_recovery_file_dest_size=10G;
SQL> alter database flashback on;
```
- Enable flashback database on the standby instance:
```
DGMGRL> disable fast_start failover;
DGMGRL> edit database "ORCL_B" set state=apply-off;
SQL> alter system set db_recovery_file_dest='/rdsdbdata/fra';
SQL> alter system set db_flashback_retention_target=60;
SQL> alter system set db_recovery_file_dest_size=10G;
DGMGRL> edit database "ORCL_B" set state=apply-on;
DGMGRL> enable fast_start failover;
```
The EC2 observer instance is not configured for high availability. If there is a failure of the observer instance, it can be restored from a snapshot backup.
This solution doesn’t flip the database connection endpoint automatically to the standby DB endpoint after a failover. Therefore, manual intervention is needed at the application level to handle this change. You can follow one of the following options to handle the DB endpoint flip more seamlessly for the applications:
- Implement Oracle’s recommendation to use a database startup trigger to direct TNS-based clients to the correct DB instance post-switchover or failover.
- Create an Amazon Route 53 target with both primary and standby database endpoints. Use a shell script or a Lambda function along with a database trigger to flip the CNAME at Route 53 when a failover happens.

Considerations

Note the following additional considerations:

This solution is intended to help you automate the manual steps needed to set up a high availability configuration in RDS Custom for Oracle using read replicas and a Data Guard observer. This tool is not a replacement for managed Multi-AZ for RDS Custom for Oracle when it becomes available.
Although this solution is non-invasive, make sure to test it in your dev and test environments before implementing this in production.
Take extra precaution while deleting the CloudFormation stack. The deletion policy for the EC2 observer instance and the read replica is set to retain. This is to ensure there is no accidental deletion of the new primary after a failover.
Due to a known issue in the RDS Custom for Oracle framework, instances that have undergone a flashback operation can’t be deleted. Therefore, if you performed a flashback database operation and reinstated the old primary, you may not be able to delete the stack.
You are responsible for the cost of the services that are deployed as part of this solution, including Oracle licensing costs.

Clean up

To avoid ongoing costs, clean up the resources you no longer need that you created as part of this solution.

Summary

In this post, we showed how you can use a solution to help you create and manage a high availability setup for your RDS Custom for Oracle databases using CFN and SSM for automation.

If you have any questions or comments, leave them in the comments section.

About the Authors

Yamuna Palasamudram is a Senior Database Specialist Solutions Architect with Amazon Web Services. She works with AWS RDS team, focusing on commercial database engines like Oracle. She enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

Sharath Chandra Kampili is a Database Specialist Solutions Architect with Amazon Web Services. He works with AWS RDS team, focusing on commercial database engines like Oracle. Sharath works directly with AWS customers to provide guidance and technical assistance on the database projects , helping them improve the value of their solutions when using AWS.

Arnab Saha is a Senior Database Specialist Solutions Architect at AWS. Arnab specializes in Amazon RDS, Amazon Aurora and Amazon Elastic Block Store. He provides guidance and technical assistance to customers thus enabling them to build scalable, highly available and secure solutions in AWS Cloud.