AWS for SAP
High availability design and solution for SAP NetWeaver installations with Oracle Data Guard (Fast-Start Failover)
Introduction
Many SAP customers are still running their mission-critical SAP workloads on Oracle database with the different combinations of Operating systems (IBM AIX, HP-UX, Red Hat Enterprise Linux / SUSE Linux Enterprise Server) in their on-premises environment. The challenge in their cloud adoption journey is to migrate Oracle based workloads as-is using a “lift and shift” approach to get the immediate benefit of reduced TCO (Total Cost of Ownership) from running in the Cloud. Customers commonly select this approach as an interim step while deciding on their long term SAP Strategy for beyond 2027.
While preparing for the migration, one of the most asked question is, how do we make our Oracle database highly available in the AWS cloud? The answer is, there are multiple options available with Amazon Web Services (AWS) which are divided into Oracle Native and Third-Party Solutions.
On premises, the typical approach for making Oracle workloads highly available depends on a “shared storage solution” together with virtual IP for network failover, all under the orchestration of proprietary hardware vendor tools.
In this blog, we will dive into Oracle’s native database high availability solution Oracle Data Guard with fully automated primary to standby failover using Oracle Data Guard Fast Start Failover (FSFO) on AWS.
Note: This particular scenario has been tested on Oracle database release 12.1.0.2.0 and Oracle Enterprise Linux OL7.9-x86_64. You can select the required combination of Operating system and database by referring to SAP Product Availability Matrix. SAP NetWeaver (7.0x to 7.5) products requiring Oracle Database 18c (min 18.5.0) or 19c (min 19.5.0) must run on Oracle Enterprise Linux (OEL) 6.4 or higher. This applies to both the Database and the Application Servers that require the Oracle Client as per SAP Note 105047 (a valid S-user able to connect to SAP ONE support Launchpad is required to read the mentioned SAP note).
Architecture Diagram
Brief description of Architecture components
- Primary node – A Data Guard configuration contains one production database, also referred to as the primary database, that functions in the primary node role. This is the database that is accessed by all of your applications servers.
- Standby node – A standby database is a transactionally consistent copy of the primary database. If the primary database becomes impaired then standby database will be promoted to become primary database.
- Observer node – is a separate OCI (Oracle client Interface) client-side component that runs on a different server from the primary and standby databases and monitors the availability of the primary database.
In this blog we will focus on how to build the Oracle Data Guard (DG) with Fast Start Failover and Observer node, without covering the SAP Application layer part. The same underlying Oracle DG solution will work for all SAP Application scenarios, regardless if it’s a distributed or highly-available installation. We’ll also cover some of the cost benefits of using the Oracle Data Guard with the Linux environments.
For the application tier, we recommend you to follow the best practice of deploying the application servers across two or more Availability zones. For the Database (DB) tier, to ensure the high availability in the event of AZ failure, you should deploy the primary and standby nodes in two different Availability zones (AZ’s) and the Observer node in the 3rd AZ. Each Availability Zone is fully isolated, and connected through low latency network. If one DB instance fails, an instance in another Availability Zone after failover handle the requests of failed instance.
Oracle Data Guard solution enables customers to deploy a HA cluster across AWS Availability Zones (AZs) in a region for SAP NetWeaver based applications. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Oracle Data Guard can switch the role of standby database to the primary role, minimizing the downtime associated with the outage.
Fast-Start Failover enables an automated failover to standby database, in case the primary database goes down by incident or network loss. An Observer process is used to monitor the network connectivity and availability of the databases. The observer is a separate OCI client-side component that runs on a different server from the primary and standby databases and monitors the availability of the primary database. It is recommended to place the observer node in another availability zone (not with the Primary and Standby Oracle node) in case of there is an outage for both availability zones.
Prerequisites
Compute
Three nodes are required to configure Oracle Database HA solution with Fast Start Failover (FSFO)
- Two nodes will work as primary and standby nodes for the Oracle database
- Third node within auto scaling group (minimum 1, maximum 1, desired capacity 1, across all AZ in region), will work as an Observer node (build a launch template AMI using the steps from “Observer node setup” section in this document) to maintain and centralize the creation, maintenance, and monitoring of Oracle Data Guard configurations for high availability cluster.
Recommended Sizing
* This is just an example. Instance size is to be adjusted to your Oracle DB size
Storage
Following are the recommendations on the Amazon Elastic Block Store (Amazon EBS) Oracle allocation for the on-premises to AWS and needs to be adjusted accordingly if you are looking for the Homogenous or Heterogeneous migrations of the existing SAP workload from on-premises to AWS.
To get the best performance from your database, you must configure the storage tier to provide the IOPS and throughput that the database needs. This is a requirement for Oracle Database running on Amazon Elastic Compute Cloud (Amazon EC2). If the storage system does not provide enough IOPS to support the database workload, you will have sluggish database performance and transaction backlog. However, if you provision much higher IOPS than your database actually needs, you will have unused capacity
Since, you are migrating the Oracle workload from on-premises to AWS and the best way to estimate the actual IOPS that is necessary for your database is to query the system tables over a period of time and find the peak IOPS usage of your existing database. To do this, you measure IOPS over a period of time and select the highest value. You can get this information is from the GV$SYSSTAT dynamic performance view, which is a special view in Oracle Database that provides database performance information.
General Purpose (SSD) volumes (GP3) are sufficient for most Oracle Database workloads. If you need more IOPS and throughput than GP3 can provide, Provisioned IOPS (PIOPS) is the best choice. PIOPS can provide up to 64,000 IOPS per volume for Nitro based instances and 32,000 IOPS per volume for other instance families. Using the logical volumes in the LVM with striping to achieve better EBS throughput and avoid the risk of disk corruption, and the redo log files to distribution across multiple disks to manage.
Follow the disk layout based on the Oracle recommendation, e.g. using logical volumes manager (LVM) with striping to achieve better EBS throughput and best performance at optimal cost. Oracle Database uses disk storage heavily for read/write operations, so we highly recommend that you use only instances optimized for Amazon Elastic Block Store (Amazon EBS). Amazon EBS-optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS. Bandwidth and throughput to the storage subsystem is crucial for good database performance. Redo log files distribution across multiple disk will also help to manage the risk due to disk corruptions.
Recommended storage layout and sizing
* This is just an example. Instance size is to be adjusted to your Oracle DB size
Elastic File systems (EFS)
Mount additional two Amazon Elastic File Systems (Amazon EFS) on both the Primary and Standby database nodes – /sapmnt/<SID> and /usr/sap/trans/ .Size of these file system varies from Development system to Production system and totally depends on the system usage.
Solutions Implementation
Infrastructure Setup
We will use multiple AWS Availability Zones to configure high availability for the Oracle database nodes. AWS Availability Zones which are physically separated by a meaningful distance, many kilometers, from any other Availability Zones, although all are within 100 km (60 miles) of each other. Availability Zones have a low latency network connectivity and meets customers High availability requirement to support synchronous database replication. A key requirement to setup the Oracle Data Guard is, both Primary and Standby database host must be identical to operate the Oracle Data Guard. This means:
- Same Oracle Enterprise Linux patch level and kernel on both hosts
- Same parameter settings on database and operating system (e.g. nfiles)
- Firewall on Oracle Enterprise Linux 6.4 and above version is enabled by default and needs to be disabled on database and observer nodes for communication.
- Identical Oracle version, with the same database patch set levels are recommended on both the nodes.
- Identical file system structure, especially for SAP data and Oracle home.
- The databases must be operated in ARCHIVELOG mode
- Use of server parameter file (SPFILE)
- “SAP ONE Support Launchpad” access is required
Refer to SAP note 105047 on Oracle Data Guard for an SAP environment.
Database Installation on Primary and Standby nodes
Download the following software from the SAP Marketplace with the authorized credentials –
- Software Provisioning Manager 1.0 (SWPM) with latest patch level
- SAP Netweaver 7.5 DVDs (required version)
- ORACLE 12.1 64-BIT RDBMS
- ORACLE 12.1.0.2 Client
- Latest SAP Kernel
- SAP Host Agent
- SAPCAR
The first step in setting up the SAP system, in a distributed, or High-availability (HA) scenario is to install ABAP central services instance (ASCS). Once that is done, proceed with the database installation. Once that is done, continue with the Primary Application Server instance (PAS) installation.
Complete the Oracle database installation Oracle 12.1 64-bit RDBMS on Linux on x86_64 64bit using SWPM on primary database node. On the standby node, you just need to install the Oracle binaries as rest of the database and redo logs files will be created during the Oracle Standby setup process.
For more details: Refer to SAP Note 1915301 – Database Software 12c Installation on UNIX.
Standby node setup and sync with Primary node
There are many ways of setting up the Standby node. The easiest and recommended way is to use Oracle Recovery Manager (RMAN). The standby database can be created from an OFFLINE or ONLINE backup of the production database. Amazon Simple Storage Service (Amazon S3) can be leveraged to take the database backup for the restoration purpose.
Setup the Oracle standby by setting the environment variables, adding standby logfiles and enable the flashback by following the instructions on Oracle Help Centre. Make sure that both the database nodes can communicate and validate the connectivity by tnsping command.
Oracle Data Guard Fast-Start Failover configuration (FSFO) –
At this point we have a primary database and a standby database, so now we need to configure Data Guard Broker to manage them. Data Guard command-line interface (DGMGRL) enables you to manage a Data Guard broker configuration. There are several options when it comes to setting up the protection mode with Oracle Data Guard. See below:
To setup Fast-Start Failover, we will be using “Max Availability Mode” feature. The steps are:
- Configuring Data Guard Broker – Open primary database and mount standby database first. The DMON process (Data Guard Monitor) on both database nodes is set to active by setting the dg_broker_start to TRUE. Make sure that the listener on the standby database is started and that the database can be accessed with SQL*Net.
- On the primary DB server, issue the following command to login to dgmgrl
- On the primary server, issue the following command to register the primary server with the broker.
- Add the standby database (you can issue these commands from any of the node)
- Now, we enable the new configuration upon registration of both nodes
- The following command show how to check the configuration –
- By default, MaxPerformance protection mode will be enabled and needs to be altered later on to Max Availability. After the observer node setup, you can change this configuration to Max Availability and include as part of the observer node setup. Check the status of the Standby database from the broker (These commands can be issued from any node once the configuration is enabled)
- Check the status of the Primary database from the broker
Start the DG_BROKER PROCESS (on both databases nodes).
SQL> ALTER SYSTEM SET dg_broker_start=true;
System altered.
dbprnode3:oraab3 44> dgmgrl sys/"Password"@AB3
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> create configuration my_dg_config_1 AS PRIMARY DATABASE IS AB3 CONNECT IDENTIFIER IS AB3;
Configuration "my_dg_config_1" created with primary database "ab3"
DGMGRL>
DGMGRL> ADD DATABASE AB3_STBY AS CONNECT IDENTIFIER IS AB3_STBY MAINTAINED AS PHYSICAL;
Database "ab3_stby" added
DGMGRL>
DGMGRL> ENABLE CONFIGURATION;
Enabled.
DGMGRL>
DGMGRL> SHOW CONFIGURATION;
Configuration - my_dg_config_1
Protection Mode: MaxPerformance
Members:
ab3 - Primary database
ab3_stby - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 23 seconds ago)
DGMGRL> show database AB3_STBY
Database - ab3_stby
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 1.00 KByte/s
Real Time Query: OFF
Instance(s): AB3
Database Status:
SUCCESS
DGMGRL> show database AB3
Database - ab3
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s): AB3
Database Status:
SUCCESS
Note: Database should only be mounted and not opened up anytime on the standby node. The listener must be up and running on both the nodes.
Ensure you can switch over the database from the Primary to Standby Node and vice versa manually to validate the configuration before enabling the FSFO and other parameter settings.
Reconnect SAP instance to database
There are different approaches possible to connect the SAP instance to the Oracle database after the failover/disaster. We recommend to adapt the database level role change by creating a database trigger with the database services startup. This will help to avoid any changes at the SAP profiles and SQL*Net profiles during the failover/disaster situation.
Create the database trigger and execute as per below commands –
SQL> !more /home/oraab3/cre_trig1.sql
create trigger my_sap_trig1 after startup on database
declare
v_role varchar(30);
begin
select database_role into v_role from v$database;
if v_role = 'PRIMARY' then
DBMS_SERVICE.START_SERVICE('AB3-HA1');
else
DBMS_SERVICE.STOP_SERVICE('AB3-HA1');
end if;
end;
/
SQL> connect /as sysdba;
Connected.
SQL> @/home/oraab3/cre_trig1.sql
Trigger created.
By default, SAP is configured to connect only to one host/port. With the updates below we have configured the tnsnames.ora file used by the SAP instances to look at both the Primary and Standby database nodes. The SAP instances will connect to database node where the service AB3-HA1 has been started – this service is only started on the database node where the database role is primary.
Update the tnsnames.ora on SAP PAS and any other AAS (Addtional Application Servers) nodes as per below configuration and validate the SAP connectivity –
ascsnode1:ab3adm 85> more /sapmnt/AB3/profile/oracle/tnsnames.ora
AB3.WORLD =
(DESCRIPTION_LIST =
(LOAD_BALANCE=OFF)
(FAILOVER=ON)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbprnode3)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = AB3-HA1)
(SERVER = DEDICATED)
)
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbprnode4)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = AB3-HA1)
(SERVER = DEDICATED)
)
)
)
Note: Make sure the StaticConnectIdentifier parameter is correctly reflected in the listener.ora file on both the Primary and Standby database nodes as you will be using the different static descriptor property and can be setup and validated using the DGMGRL command.
DGMGRL> edit database 'AB3' set property
staticconnectidentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbprnode3)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=AB3_DGMGRL)(INSTANCE_NAME=AB3)(SERVER=DEDICATED)))';
DGMGRL> edit database 'AB3_STBY' set property
staticconnectidentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbprnode4)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=AB3_STBY_DGMGRL)(INSTANCE_NAME=AB3)(SERVER=DEDICATED)))';
DGMGRL> show database AB3_STBY StaticConnectIdentifier
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbprnode4)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=AB3_STBY_DGMGRL)(INSTANCE_NAME=AB3)(SERVER=DEDICATED)))'
The following SID_DESC entries are recommended for the database and observer in the listener.ora file on both the nodes and make sure you validate these entries after the restart of listener :
Observer Node Setup
In this section, we will setup the observer node and validate the configuration to make sure that both the Primary and Standby nodes can be managed via the observer node as well.
- Setup the Observer node using the same Oracle Enterprise Linux image (OL7.9-x86_64-HVM-2020-12-07) and install the Oracle client (Oracle 12.1.0.2 Client)
- Add the username “observer” on the observer node
- Adapt the sqlnet.ora , tnsnames.ora and environment variables for the observer node from the primary database node
- You should be able to ping both the primary and standby node via tnsping command
- Change the protection mode – Check the protection mode first:
- Data Guard properties: Set the Delay for applying the redo to 2 mins (you can adjust accordingly as per your database requirements) to apply the logs to the standby node:
- Configure Fast-Start Failover: Specify the target standby database with the FastStartFailoverTarget property and standby with primary as FSFO target.
- Set the FastStartFailoverThreshold property: This property manages the time for failover. The default is set to 30 seconds, setting it to a different value overwrites the default and will give DBA a longer period to possible stop the countdown.
- Start DGMGRL and Observer. After all the property and setup , you can issue these commands either from Primary node or Standby node.
- The following commands are recommended to start (in the background) and stop the observer processes in the background:
- Enable Fast-Start Failover: Make sure flashback is ON on both the primary and standby node before enabling the Fast-Start failover.
setenv PATH /oracle/AB3/121/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/sbin:/usr/local/sbin
setenv DB_SID AB3
setenv dbms_type ORA
setenv dbs_ora_tnsname AB3
setenv ORACLE_SID AB3
setenv ORACLE_HOME /oracle/AB3/121
setenv ORACLE_BASE /oracle/AB3
setenv NLS_LANG AMERICAN_AMERICA.UTF8
setenv LD_LIBRARY_PATH $ORACLE_HOME/lib
setenv TNS_ADMIN $ORACLE_HOME/network/admin
[observer@dbprnobs1 ~]$ tnsping AB3
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 11-APR-2021 11:11:34
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/oracle/AB3/121/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = dbprnode3) (PORT = 1521))) (CONNECT_DATA = (SID = AB3) (GLOBAL_NAME = AB3.WORLD)))
OK (0 msec)
[observer@dbprnobs1 ~]$ tnsping AB3_STBY
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 11-APR-2021 11:11:29
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/oracle/AB3/121/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = dbprnode4) (PORT = 1521))) (CONNECT_DATA = (SID = AB3) (GLOBAL_NAME = AB3.WORLD)))
OK (0 msec)
SQL> SELECT PROTECTION_MODE, PROTECTION_LEVEL FROM V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
DGMGRL> EDIT DATABASE ab3 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE ab3_stby SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
Then change the protection mode of configuration to Max Availability Protection.
Then check if the protection level has been changed.
SQL> SELECT PROTECTION_MODE, PROTECTION_LEVEL FROM V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
DGMGRL> EDIT DATABASE AB3 SET PROPERTY DelayMins='2';
DGMGRL> EDIT DATABASE AB3_STBY SET PROPERTY DelayMins='2';
DGMGRL> EDIT DATABASE ab3 SET PROPERTY FastStartFailoverTarget = 'ab3_stby';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE ab3_stby SET PROPERTY FastStartFailoverTarget = 'ab3';
Property "faststartfailovertarget" updated
DGMGRL>
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 600;
Property "faststartfailoverthreshold" updated
[observer@dbprnobs1 ~]$ dgmgrl sys/PASSWORD@AB3
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL>
DGMGRL> connect sys@AB3
Password:
Connected as SYSDBA.
DGMGRL> START OBSERVER;
Observer started
[observer@dbprnobs1 ~]$ dgmgrl -logfile observer.log sys/Password@AB3 "start observer" & 5666
[observer@dbprnobs1 ~]$ dgmgrl -logfile observer.log sys/Password@AB3 "stop observer"
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
Succeeded.
DGMGRL>
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 3600;
Property "faststartfailoverthreshold" updated
DGMGRL>
DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
Threshold: 3600 seconds
Target: ab3_stby
Observer: dbprnobs1
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
(none)
Solution Testing
Operational readiness is the key aspect before the Go-live of your mission critical SAP application. As a solution architect, you need to ensure the application meets the RTO and RPO requirements before the Go-live.
Our recommended approach is to validate the following scenario by failing over the primary node to standby database node along with SAP application level testing.
The Data Guard monitor process (DMON) is an Oracle background process that runs for every database instance that is managed by the broker. When you start the Data Guard broker, a DMON process is created and responsible for monitoring the health of the broker configuration and for ensuring that database has a consistent description of the configuration and
In case that the Primary database node goes down due to disk-failure or other network outage, primary database will not be available for querying and the SAP Work process stay in ‘reconnect’ mode waiting for the database to come up again on the standby node. Once the database is moved on the standby node, the SAP work processes are able to re-connect to the standby database and resume normal operation. This proves the Observer is able to move the database between the nodes without any manual intervention.
It is also recommended to manually test the failover first. You can use the following commands to switchover back and forth to validate the failover scenario.
DGMGRL> switchover to ab3_stby;
Performing switchover NOW, please wait...
Operation requires a connection to instance "AB3" on database "ab3_stby"
Connecting to instance "AB3"...
Connected as SYSDBA.
New primary database "ab3_stby" is opening...
Operation requires start up of instance "AB3" on database "ab3"
Starting instance "AB3"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "ab3_stby"
DGMGRL>
Pros & Cons on the different options available for migration to Oracle Enterprise Linux on AWS
Key Considerations | Oracle Data Guard | 3rd Party Solutions (e.g. SIOS, Veritas) |
Cost | Part of Oracle Enterprise License | Additional Software License cost involved |
Implementation Efforts | Medium to High | Medium |
Technical constraints | Source must be Oracle Database 10g or higher. Destination has to be on the 11.4 or a higher | |
Special features | Oracle Supported | SAP Certified solution and supported by 3rd parties (e.g. SIOS and Veritas) |
Skills required | Infrastructure specialist, Oracle DBA | Infrastructure specialist, SAP Basis |
Conclusion
In this blog, we have shown how to enable the Fast-Start Failover option for the Oracle Database native HA solution on AWS by using Oracle Data Guard with observer node for SAP application. Please note, if you have an Oracle runtime license acquired from SAP you can setup and use Oracle Data Guard with Fast-Start Failover option, however SAP will not provide support for the Fast-Start Failover component. If Oracle support is also required for the Fast-Start Failover component then an Oracle support contract is required directly from Oracle.
To learn more about why 5000+ active customers run SAP on AWS, visit aws.amazon.com/sap