High availability design and solution for SAP NetWeaver installations with Oracle Data Guard (Fast-Start Failover)


Many SAP customers are still running their mission-critical SAP workloads on Oracle database with the different combinations of Operating systems (IBM AIX, HP-UX, Red Hat Enterprise Linux / SUSE Linux Enterprise Server) in their on-premises environment. The challenge in their cloud adoption journey is to migrate Oracle based workloads as-is using a “lift and shift” approach to get the immediate benefit of reduced TCO (Total Cost of Ownership) from running in the Cloud. Customers commonly select this approach as an interim step while deciding on their long term SAP Strategy for beyond 2027.

While preparing for the migration, one of the most asked question is, how do we make our Oracle database highly available in the AWS cloud? The answer is, there are multiple options available with Amazon Web Services (AWS) which are divided into Oracle Native and Third-Party Solutions.

  1. Oracle Data Guard
  2. SIOS Life Keeper
  3. Veritas Infoscale

On premises, the typical approach for making Oracle workloads highly available depends on a “shared storage solution” together with virtual IP for network failover, all under the orchestration of proprietary hardware vendor tools.

In this blog, we will dive into Oracle’s native database high availability solution Oracle Data Guard with fully automated primary to standby failover using Oracle Data Guard Fast Start Failover (FSFO) on AWS.

Note: This particular scenario has been tested on Oracle database release and Oracle Enterprise Linux OL7.9-x86_64. You can select the required combination of Operating system and database by referring to SAP Product Availability Matrix. SAP NetWeaver (7.0x to 7.5) products requiring Oracle Database 18c (min 18.5.0) or 19c (min 19.5.0) must run on Oracle Enterprise Linux (OEL) 6.4 or higher. This applies to both the Database and the Application Servers that require the Oracle Client as per SAP Note 105047 (a valid S-user able to connect to SAP ONE support Launchpad is required to read the mentioned SAP note).

Architecture Diagram

Architecture Diagram of Oracle Data Guard with FSFO

Brief description of Architecture components

  • Primary node – A Data Guard configuration contains one production database, also referred to as the primary database, that functions in the primary node role. This is the database that is accessed by all of your applications servers.
  • Standby node – A standby database is a transactionally consistent copy of the primary database. If the primary database becomes impaired then standby database will be promoted to become primary database.
  • Observer node – is a separate OCI (Oracle client Interface) client-side component that runs on a different server from the primary and standby databases and monitors the availability of the primary database.

In this blog we will focus on how to build the Oracle Data Guard (DG) with Fast Start Failover and Observer node, without covering the SAP Application layer part. The same underlying Oracle DG solution will work for all SAP Application scenarios, regardless if it’s a distributed or highly-available installation. We’ll also cover some of the cost benefits of using the Oracle Data Guard with the Linux environments.

For the application tier, we recommend you to follow the best practice of deploying the application servers across two or more Availability zones. For the Database (DB) tier, to ensure the high availability in the event of AZ failure, you should deploy the primary and standby nodes in two different Availability zones (AZ’s) and the Observer node in the 3rd AZ. Each Availability Zone is fully isolated, and connected through low latency network. If one DB instance fails, an instance in another Availability Zone after failover handle the requests of failed instance.

Oracle Data Guard solution enables customers to deploy a HA cluster across AWS Availability Zones (AZs) in a region for SAP NetWeaver based applications. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Oracle Data Guard can switch the role of standby database to the primary role, minimizing the downtime associated with the outage.

Fast-Start Failover enables an automated failover to standby database, in case the primary database goes down by incident or network loss. An Observer process is used to monitor the network connectivity and availability of the databases. The observer is a separate OCI client-side component that runs on a different server from the primary and standby databases and monitors the availability of the primary database. It is recommended to place the observer node in another availability zone (not with the Primary and Standby Oracle node) in case of there is an outage for both availability zones.



Three nodes are required to configure Oracle Database HA solution with Fast Start Failover (FSFO)

  1. Two nodes will work as primary and standby nodes for the Oracle database
  2.  Third node within auto scaling group (minimum 1, maximum 1, desired capacity 1, across all AZ in region), will work as an Observer node (build a launch template AMI using the steps from “Observer node setup” section in this document) to maintain and centralize the creation, maintenance, and monitoring of Oracle Data Guard configurations for high availability cluster.

Recommended Sizing Recommended Sizing for Compute

* This is just an example. Instance size is to be adjusted to your Oracle DB size


Following are the recommendations on the Amazon Elastic Block Store (Amazon EBS) Oracle allocation for the on-premises to AWS and needs to be adjusted accordingly if you are looking for the Homogenous or Heterogeneous migrations of the existing SAP workload from on-premises to AWS.

To get the best performance from your database, you must configure the storage tier to provide the IOPS and throughput that the database needs. This is a requirement for Oracle Database running on Amazon Elastic Compute Cloud (Amazon EC2). If the storage system does not provide enough IOPS to support the database workload, you will have sluggish database performance and transaction backlog. However, if you provision much higher IOPS than your database actually needs, you will have unused capacity

Since, you are migrating the Oracle workload from on-premises to AWS and the best way to estimate the actual IOPS that is necessary for your database is to query the system tables over a period of time and find the peak IOPS usage of your existing database. To do this, you measure IOPS over a period of time and select the highest value. You can get this information is from the GV$SYSSTAT dynamic performance view, which is a special view in Oracle Database that provides database performance information.

General Purpose (SSD) volumes (GP3) are sufficient for most Oracle Database workloads. If you need more IOPS and throughput than GP3 can provide, Provisioned IOPS (PIOPS) is the best choice. PIOPS can provide up to 64,000 IOPS per volume for Nitro based instances and 32,000 IOPS per volume for other instance families. Using the logical volumes in the LVM with striping to achieve better EBS throughput and avoid the risk of disk corruption, and the redo log files to distribution across multiple disks to manage.

Follow the disk layout based on the Oracle recommendation, e.g. using logical volumes manager (LVM) with striping to achieve better EBS throughput and best performance at optimal cost. Oracle Database uses disk storage heavily for read/write operations, so we highly recommend that you use only instances optimized for Amazon Elastic Block Store (Amazon EBS). Amazon EBS-optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS. Bandwidth and throughput to the storage subsystem is crucial for good database performance. Redo log files distribution across multiple disk will also help to manage the risk due to disk corruptions.

Recommended storage layout and sizingRecommended storage layout and sizing

* This is just an example. Instance size is to be adjusted to your Oracle DB size

Elastic File systems (EFS)

Mount additional two Amazon Elastic File Systems (Amazon EFS) on both the Primary and Standby database nodes – /sapmnt/<SID> and /usr/sap/trans/ .Size of these file system varies from Development system to Production system and totally depends on the system usage.

Oracle Data Guard Filesystem details

Solutions Implementation

Infrastructure Setup

We will use multiple AWS Availability Zones to configure high availability for the Oracle database nodes. AWS Availability Zones which are physically separated by a meaningful distance, many kilometers, from any other Availability Zones, although all are within 100 km (60 miles) of each other. Availability Zones have a low latency network connectivity and meets customers High availability requirement to support synchronous database replication.  A key requirement to setup the Oracle Data Guard is, both Primary and Standby database host must be identical to operate the Oracle Data Guard.  This means:

  1. Same Oracle Enterprise Linux patch level and kernel on both hosts
  2. Same parameter settings on database and operating system (e.g. nfiles)
  3. Firewall on Oracle Enterprise Linux 6.4 and above version is enabled by default and needs to be disabled on database and observer nodes for communication.
  4. Identical Oracle version, with the same database patch set levels are recommended on both the nodes.
  5. Identical file system structure, especially for SAP data and Oracle home.
  6. The databases must be operated in ARCHIVELOG mode
  7. Use of server parameter file (SPFILE)
  8. “SAP ONE Support Launchpad” access is required

Refer to SAP note 105047 on Oracle Data Guard for an SAP environment.

Database Installation on Primary and Standby nodes

Download the following software from the SAP Marketplace with the authorized credentials –

  1. Software Provisioning Manager 1.0 (SWPM) with latest patch level
  2. SAP Netweaver 7.5 DVDs (required version)
  3. ORACLE 12.1 64-BIT RDBMS
  4. ORACLE Client
  5. Latest SAP Kernel
  6. SAP Host Agent

The first step in setting up the SAP system, in a distributed, or High-availability (HA) scenario is to install ABAP central services instance (ASCS). Once that is done, proceed with the database installation. Once that is done, continue with the Primary Application Server instance (PAS) installation.

Complete the Oracle database installation Oracle 12.1 64-bit RDBMS on Linux on x86_64 64bit using SWPM on primary database node. On the standby node, you just need to install the Oracle binaries as rest of the database and redo logs files will be created during the Oracle Standby setup process.

For more details: Refer to SAP Note 1915301 – Database Software 12c Installation on UNIX.

Standby node setup and sync with Primary node

There are many ways of setting up the Standby node. The easiest and recommended way is to use Oracle Recovery Manager (RMAN). The standby database can be created from an OFFLINE or ONLINE backup of the production database. Amazon Simple Storage Service (Amazon S3) can be leveraged to take the database backup for the restoration purpose.

Setup the Oracle standby by setting the environment variables, adding standby logfiles and enable the flashback by following the instructions on Oracle Help Centre. Make sure that both the database nodes can communicate and validate the connectivity by tnsping command.

Oracle Data Guard Fast-Start Failover configuration (FSFO) –

At this point we have a primary database and a standby database, so now we need to configure Data Guard Broker to manage them. Data Guard command-line interface (DGMGRL) enables you to manage a Data Guard broker configuration. There are several options when it comes to setting up the protection mode with Oracle Data Guard. See below:

Different Mode types with Data Guard

To setup Fast-Start Failover, we will be using “Max Availability Mode” feature. The steps are:

  1. Configuring Data Guard Broker – Open primary database and mount standby database first. The DMON process (Data Guard Monitor) on both database nodes is set to active by setting the dg_broker_start to TRUE. Make sure that the listener on the standby database is started and that the database can be accessed with SQL*Net.
  2. Start the DG_BROKER PROCESS (on both databases nodes).

    SQL> ALTER SYSTEM SET dg_broker_start=true;
    System altered.

  3. On the primary DB server, issue the following command to login to dgmgrl
  4. dbprnode3:oraab3 44> dgmgrl sys/"Password"@AB3
    DGMGRL for Linux: Version - 64bit Production
    Copyright (c) 2000, 2013, Oracle. All rights reserved.
    Welcome to DGMGRL, type "help" for information.
    Connected as SYSDBA.

  5. On the primary server, issue the following command to register the primary server with the broker.
  6. DGMGRL> create configuration my_dg_config_1 AS PRIMARY DATABASE IS AB3 CONNECT IDENTIFIER IS AB3;
    Configuration "my_dg_config_1" created with primary database "ab3"

  7. Add the standby database (you can issue these commands from any of the node)
    Database "ab3_stby" added

  9. Now, we enable the new configuration upon registration of both nodes

  11. The following command show how to check the configuration –
    Configuration - my_dg_config_1
    Protection Mode: MaxPerformance
    ab3         - Primary database
    ab3_stby    - Physical standby database
    Fast-Start Failover: DISABLED
    Configuration Status:
    SUCCESS (status updated 23 seconds ago)

  13. By default, MaxPerformance protection mode will be enabled and needs to be altered later on to Max Availability. After the observer node setup, you can change this configuration to Max Availability and include as part of the observer node setup. Check the status of the Standby database from the broker (These commands can be issued from any node once the configuration is enabled)
  14. DGMGRL> show database AB3_STBY
    Database - ab3_stby
    Intended State: APPLY-ON
    Transport Lag: 0 seconds (computed 1 second ago)
    Apply Lag: 0 seconds (computed 1 second ago)
    Average Apply Rate: 1.00 KByte/s
    Real Time Query: OFF
    Instance(s): AB3
    Database Status:

  15. Check the status of the Primary database from the broker
  16. DGMGRL> show database AB3
    Database - ab3
    Role: PRIMARY
    Intended State: TRANSPORT-ON
    Instance(s): AB3
    Database Status:

Note: Database should only be mounted and not opened up anytime on the standby node. The listener must be up and running on both the nodes.

Ensure you can switch over the database from the Primary to Standby Node and vice versa manually to validate the configuration before enabling the FSFO and other parameter settings.

Reconnect SAP instance to database

There are different approaches possible to connect the SAP instance to the Oracle database after the failover/disaster. We recommend to adapt the database level role change by creating a database trigger with the database services startup. This will help to avoid any changes at the SAP profiles and SQL*Net profiles during the failover/disaster situation.

Create the database trigger and execute as per below commands –

SQL> !more /home/oraab3/cre_trig1.sql
create trigger my_sap_trig1 after startup on database
v_role varchar(30);
select database_role into v_role from v$database;
if v_role = 'PRIMARY' then
end if;

SQL> connect /as sysdba;

SQL> @/home/oraab3/cre_trig1.sql
Trigger created.

By default, SAP is configured to connect only to one host/port. With the updates below we have configured the tnsnames.ora file used by the SAP instances to look at both the Primary and Standby database nodes. The SAP instances will connect to database node where the service AB3-HA1 has been started – this service is only started on the database node where the database role is primary.
Update the tnsnames.ora on SAP PAS and any other AAS (Addtional Application Servers) nodes as per below configuration and validate the SAP connectivity –

ascsnode1:ab3adm 85> more /sapmnt/AB3/profile/oracle/tnsnames.ora
(ADDRESS = (PROTOCOL = TCP)(HOST = dbprnode3)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbprnode4)(PORT = 1521))

Note: Make sure the StaticConnectIdentifier parameter is correctly reflected in the listener.ora file on both the Primary and Standby database nodes as you will be using the different static descriptor property and can be setup and validated using the DGMGRL command.

DGMGRL> edit database 'AB3' set property

DGMGRL> edit database 'AB3_STBY' set property

DGMGRL> show database AB3_STBY StaticConnectIdentifier

The following SID_DESC entries are recommended for the database and observer in the listener.ora file on both the nodes and make sure you validate these entries after the restart of listener :

Data Guard FSFO Configuration table

Observer Node Setup

In this section, we will setup the observer node and validate the configuration to make sure that both the Primary and Standby nodes can be managed via the observer node as well.

  1. Setup the Observer node using the same Oracle Enterprise Linux image (OL7.9-x86_64-HVM-2020-12-07) and install the Oracle client (Oracle Client)
  2. Add the username “observer” on the observer node
  3. Adapt the sqlnet.ora , tnsnames.ora and environment variables for the observer node from the primary database node
  4. setenv PATH /oracle/AB3/121/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/sbin:/usr/local/sbin
    setenv DB_SID AB3
    setenv dbms_type ORA
    setenv dbs_ora_tnsname AB3
    setenv ORACLE_SID AB3
    setenv ORACLE_HOME /oracle/AB3/121
    setenv ORACLE_BASE /oracle/AB3
    setenv TNS_ADMIN $ORACLE_HOME/network/admin

  5. You should be able to ping both the primary and standby node via tnsping command
  6. [observer@dbprnobs1 ~]$ tnsping AB3
    TNS Ping Utility for Linux: Version - Production on 11-APR-2021 11:11:34
    Copyright (c) 1997, 2014, Oracle.  All rights reserved.
    Used parameter files:
    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = dbprnode3) (PORT = 1521))) (CONNECT_DATA = (SID = AB3) (GLOBAL_NAME = AB3.WORLD)))
    OK (0 msec)

    [observer@dbprnobs1 ~]$ tnsping AB3_STBY
    TNS Ping Utility for Linux: Version - Production on 11-APR-2021 11:11:29
    Copyright (c) 1997, 2014, Oracle.  All rights reserved.
    Used parameter files:
    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = dbprnode4) (PORT = 1521))) (CONNECT_DATA = (SID = AB3) (GLOBAL_NAME = AB3.WORLD)))
    OK (0 msec)

  7. Change the protection mode – Check the protection mode first:
    -------------------- --------------------
    Property "logxptmode" updated
    Property "logxptmode" updated

    Then change the protection mode of configuration to Max Availability Protection.

    Then check if the protection level has been changed.
    -------------------- --------------------

  9. Data Guard properties: Set the Delay for applying the redo to 2 mins (you can adjust accordingly as per your database requirements) to apply the logs to the standby node:

  11. Configure Fast-Start Failover: Specify the target standby database with the FastStartFailoverTarget property and standby with primary as FSFO target.
  12. DGMGRL> EDIT DATABASE ab3 SET PROPERTY FastStartFailoverTarget = 'ab3_stby';
    Property "faststartfailovertarget" updated
    DGMGRL> EDIT DATABASE ab3_stby SET PROPERTY FastStartFailoverTarget = 'ab3';
    Property "faststartfailovertarget" updated

  13. Set the FastStartFailoverThreshold property: This property manages the time for failover. The default is set to 30 seconds, setting it to a different value overwrites the default and will give DBA a longer period to possible stop the countdown.
  14. DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 600;
    Property "faststartfailoverthreshold" updated

  15. Start DGMGRL and Observer. After all the property and setup , you can issue these commands either from Primary node or Standby node.
  16. [observer@dbprnobs1 ~]$ dgmgrl sys/PASSWORD@AB3
    DGMGRL for Linux: Version - 64bit ProductionCopyright (c) 2000, 2013, Oracle. All rights reserved.Welcome to DGMGRL, type "help" for information.
    Connected as SYSDBA.
    DGMGRL>DGMGRL> connect sys@AB3
    Connected as SYSDBA.
    Observer started

  17. The following commands are recommended to start (in the background) and stop the observer processes in the background:
  18. [observer@dbprnobs1 ~]$ dgmgrl -logfile observer.log sys/Password@AB3 "start observer" & 5666[observer@dbprnobs1 ~]$ dgmgrl -logfile observer.log sys/Password@AB3 "stop observer"

  19. Enable Fast-Start Failover: Make sure flashback is ON on both the primary and standby node before enabling the Fast-Start failover.
    Property "faststartfailoverthreshold" updated
    Enabled.DGMGRL> show fast_start failover
    Fast-Start Failover: ENABLED
    Threshold: 3600 seconds
    Target: ab3_stby
    Observer: dbprnobs1
    Lag Limit: 30 seconds (not in use)
    Shutdown Primary: TRUE
    Auto-reinstate: TRUE
    Observer Reconnect: (none)
    Observer Override: FALSEConfigurable Failover Conditions
    Health Conditions:
    Corrupted Controlfile YES
    Corrupted Dictionary YES
    Inaccessible Logfile NO
    Stuck Archiver NO
    Datafile Offline YESOracle Error Conditions:

Solution Testing

Operational readiness is the key aspect before the Go-live of your mission critical SAP application. As a solution architect, you need to ensure the application meets the RTO and RPO requirements before the Go-live.

Our recommended approach is to validate the following scenario by failing over the primary node to standby database node along with SAP application level testing.

The Data Guard monitor process (DMON) is an Oracle background process that runs for every database instance that is managed by the broker. When you start the Data Guard broker, a DMON process is created and responsible for monitoring the health of the broker configuration and for ensuring that database has a consistent description of the configuration and

In case that the Primary database node goes down due to disk-failure or other network outage, primary database will not be available for querying and the SAP Work process stay in ‘reconnect’ mode waiting for the database to come up again on the standby node. Once the database is moved on the standby node, the SAP work processes are able to re-connect to the standby database and resume normal operation. This proves the Observer is able to move the database between the nodes without any manual intervention.

It is also recommended to manually test the failover first. You can use the following commands to switchover back and forth to validate the failover scenario.

Data Guard Failover Scenario

DGMGRL> switchover to ab3_stby;
Performing switchover NOW, please wait...
Operation requires a connection to instance "AB3" on database "ab3_stby"
Connecting to instance "AB3"...
Connected as SYSDBA.
New primary database "ab3_stby" is opening...
Operation requires start up of instance "AB3" on database "ab3"
Starting instance "AB3"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "ab3_stby"

Pros & Cons on the different options available for migration to Oracle Enterprise Linux on AWS

Key Considerations Oracle Data Guard 3rd Party Solutions (e.g. SIOS, Veritas)
Cost Part of Oracle Enterprise License Additional Software License cost involved
Implementation Efforts Medium to High Medium
Technical constraints Source must be Oracle Database 10g or higher. Destination has to be on the 11.4 or a higher

See SIOS Protection for Linux Support Matrix

See Veritas Infoscale Release Notes – Linux

Special features Oracle Supported SAP Certified solution and supported by 3rd parties (e.g. SIOS and Veritas)
Skills required Infrastructure specialist, Oracle DBA Infrastructure specialist, SAP Basis


In this blog, we have shown how to enable the Fast-Start Failover option for the Oracle Database native HA solution on AWS by using Oracle Data Guard with observer node for SAP application. Please note, if you have an Oracle runtime license acquired from SAP you can setup and use Oracle Data Guard with Fast-Start Failover option, however SAP will not provide support for the Fast-Start Failover component. If Oracle support is also required for the Fast-Start Failover component then an Oracle support contract is required directly from Oracle.

To learn more about why 5000+ active customers run SAP on AWS, visit