AWS Architecture Blog
Enable transparent connectivity to Oracle Data Guard environments using Amazon Route 53 CNAME records
Customers choose AWS for running their Oracle database workload to help increase resiliency, performance, and scalability of the database layer. A high availability (HA) solution for the database stack is an important aspect to consider when migrating or deploying Oracle databases in AWS to help ensure that the architecture can meet the service level agreement (SLA) of the application. Customers who run their Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) commonly choose Oracle Data Guard physical standby databases to help meet the HA and disaster recovery (DR) for their Oracle database workloads.
As discussed in this Oracle documentation, role-based services with multiple listener endpoints in the connection URL or tnsnames.ora
entry is the preferred way to transparently connect to the database layer that is part of a Data Guard configuration. However, some application components and driver configurations don’t support multiple hostnames in the connection URL. Those applications require a single hostname or IP for the clients to connect to the Data Guard environment.
This post talks about the concept of using an Amazon Route 53 CNAME record in a Data Guard environment on EC2 and lists the artifacts to automatically route the connection between primary and standby environments in a Data Guard configuration based on the database role.
Solution overview
To help avoid the manual efforts to update DNS entries or tnsnames.ora
file after a failover or switchover operation in a Data Guard environment, the solution uses an AFTER DB_ROLE_CHANGE
trigger to automate the DNS failover process. This trigger runs a shell script on the database host, which in turn updates the CNAME record in Route 53 to point the CNAME records to reflect the role transition. The following diagram illustrates the solution architecture (Figure 1).
The solution discussed in this post covers routing new database connection requests to the right database post a Data Guard switchover activity. However, other factors such as application/client TTL settings and behavior of the connection pool to invalidate the connection handles created prior to the switchover activity can cause the application to connect to the database with a different role (like read-write workloads are connected to standby after switchover) and can generate errors, such as ORA-16000: database or pluggable database open for read-only access. It is a best practice to verify the database role before using the connection handles for transactions to verify that the application is connected to the database with the expected role.
The following workflow depicts the sequence of events that happens during a failover or switchover activity in a Data Guard environment to enable seamless connectivity for the application:
- A role transition event occurs in the Data Guard environment.
- The event triggers the
AFTER DB_ROLE_CHANGE
trigger. - The trigger runs the shell script on the EC2 instance using a scheduler job.
- The shell script updates Route 53 to point the CNAME records to reflect the role transition.
Prerequisites
This post assumes the following prerequisites:
- You should have an existing Data Guard configuration with one primary and one standby DB instance within a single VPC. Refer to the Oracle quick start template to deploy a Data Guard environment on Amazon EC2.
- The steps discussed here are for self-managed Data Guard configuration on Amazon EC2 with Red Hat Linux AMI.
- The scenario discussed in the post involves one primary and one standby database in the Data Guard configuration. For any other configurations, the scripts shown in this example require additional changes.
- A private or public Route 53 hosted zone should be configured in the VPC where the DB environment exists.
- The shell script uses the instance profile of the EC2 instance to run the AWS Command Line Interface (AWS CLI) commands. Make sure that the instance profile of the EC2 instances hosting the primary and standby databases has a policy attached that allows changing the record set in the hosted zone such as the following:
Nslookup
,jq
, andcurl
utilities must be installed on all of the DB hosts. If not installed, you can install the utility on RHEL Linux using the following command:
Environment details
This post assumes a Data Guard configuration with two instances within a single VPC, one primary and one standby, with the following details and naming conventions:
- Oracle database version – 19.10 configured in maximum performance mode with Active Data Guard
- Route 53 domain name –
mydbdomain
- Database name –
orcl
- DB_UNIQUE_NAME –
orcl_a
andorcl_b
- Instance names –
orcl
- Route 53 A record for the host in AZ1 –
orcl-a-db.mydbdomain
- Route 53 A record for the host in AZ2 –
orcl-b-db.mydbdomain
Route 53 configuration
Two A records are created in Route 53 to point to the IPs of the primary and standby hosts. Two CNAME records are also created in Route 53, which are automatically updated during the Data Guard switchover and failover scenarios. The CNAME record orcl-rw.mydbdomain
points to the instance in the primary role that can accept read/write transactions, and orcl-ro.mydbdomain
points to the instance in the standby role that accepts read-only queries.
The A records configuration is as follows:
- DB host IP in AZ1 (10.0.0.5 in this example) –
orcl-a-db.mydbdomain
- DB host IP in AZ2 (10.0.32.5 in this example) –
orcl-b-db.mydbdomain
The CNAME records configuration is as follows:
- orcl-a-db.mydbdomain –
orcl-rw.mydbdomain
- orcl-b-db.mydbdomain –
orcl-ro.mydbdomain
The following screenshot shows the Route 53 console view of the domain mydbdomain
.
TNS configuration
The following tnsnames.ora
file entries show how connections can be made to primary and standby databases using the CNAME records without a dependency on the actual IP address of the EC2 instances that host primary and standby databases. The entry orcl_a
always points to the instance on orcl-a-db.mydbdomain
, and orcl_b
always points to the instance on orcl-b-db.mydbdomain
, regardless of their roles. The entries orclrw
and orclro
direct the connection to the databases playing primary and standby roles, respectively.
To enable connectivity using orclrw
and orclro
TNS entries, you can use either a role-based service or a static listener registration entry in both the primary and standby listener, as shown in the following code:
Implement the solution
To implement an automated DNS update during an Oracle switchover or failover, we use an Oracle database trigger and a shell script. The following are the high-level steps for the entire workflow:
- Create a
DB_ROLE_CHANGE ON DATABASE
trigger on the primary database - The trigger in turn creates a DBMS job that calls a shell script with the
cname_switch.sh
. - The shell script updates the Route 53 CNAME entries.
Database trigger
Use the following code for the database trigger:
Shell script
This script determines the current CNAME, identifies the dependent A records, and maps the CNAME to the correct A records accordingly. This shell script is provided for reference assuming the naming conventions for db_name
and db_unique_name
as used in the sample configuration. You should review and modify the script to meet your specific requirements and organization standards.
As per the example shown earlier, the shell script is placed in the location /home/oracle/admin/bin/cname_switch.sh
.
Note: it’s common to see production databases that are restored or cloned to lower environments.
If the script is run in those environments, it can potentially change the CNAME entries unexpectedly. To mitigate this, the shell script has the function restore_safeguard
. This function checks that the IP assigned to the EC2 instance is actually matching with the A records configured for this database in Route 53. If no match is found, this will not perform CNAME failover.
Test the solution
The following screenshot shows the Route 53 console view of the domain mydbdomain
before the switchover. The primary database is running on orcl-a-db.mydomain
because orcl-rw.mydomain
is pointing to that.
The following SQL displays the current role of both primary and standby databases and host_name they are currently running on.
Let’s initiate the switchover:
Now that the switchover is complete, let’s connect to the database using the orclrw
and orclro
TNS entries using the following code:
The following screenshot shows the Route 53 console view of the domain mydbdomain after the switchover. The primary database is now running on orcl-b-db.mydomain
because orcl-rw.mydomain
is pointing to that.
Conclusion
Application connectivity to a Data Guard environment can be challenging, especially when the application configuration doesn’t support multiple hostnames or listener endpoints. In this post, we discussed step-by-step details to enable seamless connectivity to Data Guard environments using Route 53 CNAME records, a database trigger, and a shell script. You can use these artifacts to direct the DB connections to the database with the right role seamlessly without application changes. If you are using Data Guard Observer for automated failover, another blog, Setup a high availability design for Oracle Data Guard (Fast-Start Failover) using Amazon Route 53 discusses an alternate mechanism to achieve the same result.