AWS Database Blog

Automatic reconnection to Amazon RDS for Oracle read replicas during disaster recovery

One of the key pillars of the AWS Well-Architected Framework is reliability. According to the framework’s design principles, there must be provisions made to detect failures and be able to automatically recover. The recovery procedures deployed must be tested and validated frequently.

Amazon Relational Database Service (Amazon RDS) for Oracle offers ease of operability, maintenance, scalability, and reliability while migrating Oracle workloads to AWS. The read replica in Amazon RDS for Oracle is a physical copy of the primary database that can be set up either in mounted or read-only mode. In mounted mode, the replica instance isn’t open for read operations, however in read-only mode, it allows read operations on the replica instance. The applications connect to the primary database for any read/write operations. The read replica can be used for read-only operations if it’s set up in read-only mode. Applications connect to a read replica the same as they do to any database instance. These connections are set up using a tnsnames.ora connection alias.

During managed Oracle Data Guard Switchover with Oracle RDS for Oracle, the database role reversal takes place between the RDS for Oracle primary database and its replica. You can promote the read replica where the database role of the read replica is changed to PRIMARY, allowing application connections while the database role of the primary database is changed to PHYSICAL STANDBY.

Similarly, in case of a disaster, you can promote the read replica to a standalone instance. The database role of the read replica is changed to PRIMARY, and its instance status is OPEN, allowing new application connections.

In both scenarios, the application needs to point to the endpoint of the new primary database to establish connectivity. In this post, we provide guidance to configure the automatic reconnection of the application to a standby database during a disaster recovery incident while using replication with Oracle Data Guard using Oracle services and a TNS names connection alias.

The reason for using the automatic switchover mentioned in this post is that it can reduce the overall RTO by automatically redirecting the client connection to the newly promoted or switched-over database and saves the time and effort required to update the application tnsnames.ora file manually. If you use this disaster recovery setup for both the application and database across AWS Regions, then you need to modify the tnsnames.ora file in both Regions. For the solution in this post, use a client that connects to databases on both the primary and secondary Regions during the switchover and failover operation.

As a quick refresher, RTO stands for recovery time objective and is a measure of how quickly after an outage an application must be available again. RPO, or recovery point objective, refers to how much data loss your application can tolerate.

Solution overview

The following diagram depicts replication in Amazon RDS for Oracle using read replicas.

The solution assumes that a read replica is already configured in read-only mode for Amazon RDS for Oracle. In cases where there are both mounted and read replicas, the promoted replica has the same data as that of the primary database until the point the promotion request was made.

The following are some of the key concepts related to the Oracle Data Guard replication status:

  • The database role for the primary database is PRIMARY
  • The database role for the standby database or the RDS for Oracle read replica is PHYSICAL STANDBY
  • The status of the primary database instance is OPEN
  • The status of the standby database or the RDS for Oracle read replica is either MOUNTED or e
  • When the managed switchover takes place, the database role of the old primary transitions to PHYSICAL STANDBY, and the database role of the old standby transitions to PRIMARY
  • When the managed switchover takes place,
  • In case of a disaster, when the read replica is promoted to primary, its database role transitions to PRIMARY

The following are some of the high-level steps used in this solution:

  1. Create and start the database service that’s used by the application as part of its connection settings for the database.
  2. Connect as master user
  3. Create a database trigger to start the service based on the database role.
  4. Update the ora file to connect the application to the database using database service name.
  5. Perform the switchover or failover.

Prerequisites

You need the following prerequisites:

Implement the solution

To implement this solution, complete the following steps.

  1. Connect to the primary database using any Oracle client. For this post we have used SQL Developer.
  2. Run the following command to check the open_mode and database_role of the primary database instance:
    select name, status, open_mode, database_role, instance_name from v$database, v$instance;

  3. Create a database service in the primary RDS for Oracle database instance. In this example, we use a service named PRIM that runs only in the primary database instance. The output should indicate that the procedure was successful.
    BEGIN
    DBMS_SERVICE.create_service(
    service_name => 'PRIM',
    network_name => 'PRIM'
    );
    END;
    /

  1. Start the PRIM The output should indicate that the procedure was successful.
    BEGIN
    DBMS_SERVICE.start_service(
    service_name => 'PRIM'
    );
    END;
    /

  2. Check if the service has been started successfully. The output should include PRIM:
    SELECT name,
    network_name
    FROM   v$active_services
    ORDER BY 1;

  3. Grant the required privileges to the user creating the database trigger. This user is the primary user of the RDS for Oracle database instance. In this example, the user is ADMIN:
    BEGIN
    rdsadmin.rdsadmin_util.grant_sys_object(
    p_obj_name  => 'DBMS_SERVICE',
    p_grantee   => 'ADMIN',
    p_privilege => 'EXECUTE');
    END;
    /

  4. Create a trigger to start the database service whenever the database role changes to PRIMARY after a switchover:
    CREATE OR REPLACE TRIGGER start_primaryservice after startup on database
    DECLARE
    role VARCHAR(30);
    BEGIN
    SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
    IF role = 'PRIMARY' THEN
    BEGIN
    DBMS_SERVICE.START_SERVICE('PRIM');
    END;
    END IF;
    END;
    /

  5. Create a TNS entry on the application to include the endpoints of both the primary and the read replica instances. Use the name of the service you created previously for the connection:
    appconn =
    (
    DESCRIPTION =
    (SDU=32768)
    (enable=broken)
    (LOAD_BALANCE=yes)
    (
    ADDRESS_LIST =
    (
    ADDRESS =
    (PROTOCOL = TCP)
    (Host = <endpoint of standby instance>)
    (Port =1521)
    )
    (ADDRESS =
    (PROTOCOL = TCP)
    (Host = <endpoint of primary instance>)
    (Port =1521)
    )
    )
    (
    CONNECT_DATA =
    (SERVICE_NAME =PRIM)
    )
    )
  6. Configure the application connection using the preceding TNS entry (appconn) of the database. Once you add the above entry to ora then the connect string for appconn will be automatically populated.
  7. Check the database role. It should be primary.

Test the solution with Managed Oracle Data Guard Switchover

Switchover is a role reversal between the primary database and one of its standby databases. During a switchover, the primary database transitions to a standby role, and the standby database transitions to the primary role. A switchover guarantees no data loss. The replication between the new primary and standby databases is maintained after the switchover.

When the switchover is performed, the existing read replica instance becomes the new primary. As soon as the replica becomes the new primary, the database trigger starts the PRIM service you created previously. The application will keep trying to connect to the new primary database until the PRIM service has started.

  1. Connect to the primary database.
  2. Run the following command to check the open_mode and database_role of the primary database instance:
    select name, status, open_mode, database_role, instance_name from v$database, v$instance;

  3. Check that the PRIM service isn’t running on this instance. The query result should not include the entry for PRIM service created earlier in step3 of section Implement the solution.
    SELECT name,
    network_name
    FROM   v$active_services
    ORDER BY 1;

  4. Go to Amazon RDS console
    • Choose the Databases in the navigation pane
    • Click on radio button next to your database
    • Choose Actions and select Switch over replica

  5. Choose Switch over replica to confirm.
  6. You will see a status message that the switchover is in progress.

    You can also initiate a switchover using the AWS Command Line Interface (AWS CLI). Run the command rds switchover-read-replica in the Region of the original replica as shown in the following example code.

    aws rds switchover-read-replica --db-instance-identifier mydb01 –region ap-south-1

    mydb01 is the instance ID of the original replica. After the switchover, this database becomes the new primary database. Issue the preceding command in the Region that hosts the current standby database.

  7. Connect to the database using the TNS entry: appconn

The connection is made to the instance that is currently primary because the PRIM service is started there. There is no manual change required in either the tnsnames.ora file or DNS for the application to point to the primary instance.

During the switchover, Amazon RDS interrupts replication for all databases in your Oracle Data Guard configuration. During this time, the original primary database can’t process write requests. For more information, see Stages of the Oracle Data Guard switchover.

Manual promotion of a read replica to primary

There are situations when a read replica needs to be promoted manually to primary. Using manual failover, you can convert a standby read only database to a primary database when the original primary database fails and there’s no possibility of recovering the primary database in a timely manner. There might be data loss depending upon whether your primary and target standby databases were synchronized at the time of the primary database failure.

To monitor and troubleshoot the replication lag:

  1. Run the following query to check the Data Guard lag metrics:
    SELECT name,value,datum_time,time_computed FROM v$dataguard_stats;
  2. Run the following query to check whether there is an archive log gap between the primary and read replica instances:
    SELECT archived.thread# THREAD,max(archived.sequence#) PRIMARY,max(applied.sequence#) STANDBY,(max(archived.sequence#) - max(applied.sequence#)) GAP 
    FROM v$archived_log archived,v$archived_log applied 
    WHERE archived.thread#=applied.thread# 
    AND applied.applied='YES' 
    GROUP BY archived.thread#;
  3. Run the following command to check the open_mode and database_role of the read replica
    select name, status, open_mode, database_role, instance_name from v$database, v$instance;

  4. Check that the PRIM service isn’t running on this instance. The query result should not include PRIM.
    SELECT name,
    network_name
    FROM   v$active_services
    ORDER BY 1;

  5. Promote the read replica to primary:
    You can promote a read replica using either the Amazon RDS console or the AWS CLI.

    Using Amazon RDS console:

    In the Amazon RDS console, choose Databases in the navigation pane. Choose Actions and select Promote.

    Using AWS CLI:

    aws rds promote-read-replica \
    --db-instance-identifier mydb02 –region ap-south-1\

    Mydb02 is the instance ID of the original replica. After the failover, this database becomes the new primary database.

  6. Connect to the database using the tns entry appconn.

The connection is made to the instance that is currently primary because the PRIM service is started there. There is no manual change required in either the tnsnames.ora file or DNS for the application to point to the primary instance.

For more information about managed disaster recovery, see Promoting a read replica to be a standalone DB instance.

Handling errors

If the trigger execution fails, then you can disable the trigger and start the service manually in the primary database. 

  1. Disable the trigger and start the service.
    ALTER TRIGGER start_primaryservice DISABLE;
    BEGIN
    DBMS_SERVICE.START_SERVICE('PRIM');
    END;
    /
  2. Verify that the PRIM service (created in the step3 of section Implement the solution) entry for PRIM.
    SELECT name,
    network_name
    FROM   v$active_services
    ORDER BY 1; 

Clean up

Complete the following steps to clean up your resources:

  1. Manually remove the entry for the standby database in the ora file.
  2. Drop the trigger you created:

    DROP TRIGGER start_primaryservice;
  3. Delete the service:
    begin
    DBMS_SERVICE.STOP_SERVICE(service_name=>' PRIM ');  -- if running currently
    DBMS_SERVICE.DELETE_SERVICE(service_name =>' PRIM ');
    end;
  4. Delete the AWS resources (Amazon RDS for Oracle, read replica, and bastion host if applicable) if they’re no longer needed.

Advantages of using Amazon RDS for Oracle for automatic switchover

Below are the advantages of using the automatic switchover for the Amazon RDS for Oracle replication.

  • Straightforward to implement because it’s based on built-in database functionality.
  • Minimizes the RTO and RPO because the switchover time is reduced by eliminating the need to reconfigure the application connection.
  • Cost-effective because no additional third-party utilities are needed.

Conclusion

In this post, we demonstrated the steps to configure automatic reconnection of an application to a standby database during a disaster recovery incident using Oracle Data Guard. After this configuration, the application will automatically redirect the connection to the standby database during switchover.

If your client doesn’t support the multiple entry in tnsnames.ora file and requires a single hostname or IP address, then you can automate the DNS failover using the steps in Enable transparent connectivity to Oracle Data Guard environments using Amazon Route 53 CNAME records.

Review the best practices for using Oracle Data Guard switchover.

If you have feedback or questions, leave them in the comments section.


About the Authors

Abhishek Kumar Verma is a Senior Database Consultant at Amazon Web Services ProServe. He has deep expertise in database migration and administration on Amazon RDS for Oracle, Amazon Aurora PostgreSQL and Amazon RDS SQL Server databases. He is also a subject matter expert in AWS DMS, Oracle GoldenGate, and Oracle Exadata. He works closely with customers to help migrate and modernize their databases and applications to AWS on a daily basis. When not working, Abhishek loves meditation and spending time with nature.

Puja Audhya is a Senior Lead Consultant with the Database Migration and Modernization team at Amazon Web Services. She helps AWS customers migrate the databases to AWS Cloud database services.