Deploying highly available SAP systems using SIOS Protection Suite on AWS

This post is by Santosh Choudhary, Senior Solution Architect at Amazon Web Services (AWS).

AWS provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud. Because SAP systems are business critical, high availability is essential.

High availability for SAP applications can be achieved in many ways on AWS, depending on the operating system and database that you use. Options include SUSE High Availability Extension (SUSE HAE), Red Hat Enterprise Linux for SAP with High Availability and Update Services (RHEL for SAP with HA and US), Veritas InfoScale Enterprise for AWS, and SIOS Protection Suite.

In this post, we will see how to deploy SAP on AWS in a highly available manner in Windows and Linux environments using SIOS Protection Suite. We’ll also cover some of the differences in SIOS setup in Windows and Linux environments.

SIOS Protection Suite software is a clustering solution that provides a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies to protect business-critical applications and data from downtime and disasters.

To start with, AWS recommends deploying the workload in more than one Availability Zone. Each Availability Zone is isolated, but the Availability Zones in an AWS Region are connected through low-latency links. If one instance fails, an instance in another Availability Zone can handle requests.

diagram with three availability zones in a region

Now, let’s explore the architectural layers within an SAP NetWeaver system, single points of failure (SPOFs) within that architecture, and the ways to make these components highly available using SIOS Protection Suite.

Understanding SAP NetWeaver architecture

The SAP NetWeaver stack primarily consists of a set of ABAP SAP Central Services (ASCS) servers, a primary application server (PAS), one or more additional application servers (AAS), and the databases.

ASCS consists of the Message Server and the Enqueue Server. The Message Server acts as a communication channel between the application servers and balances load across them. The Enqueue Server stores the database table locks and is the critical component of ASCS for ensuring database consistency.
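To illustrate why the lock table matters, here is a minimal sketch of an in-memory lock table. It is a toy model, not SAP's implementation: the class name, method names, and the table and work process identifiers are all hypothetical.

```python
class EnqueueLockTable:
    """Toy in-memory lock table (hypothetical; not an SAP API)."""

    def __init__(self):
        # Maps (table, key) to the owner currently holding the lock.
        self._locks = {}

    def acquire(self, table, key, owner):
        """Grant the lock if it is free or already held by the same owner."""
        holder = self._locks.setdefault((table, key), owner)
        return holder == owner

    def release(self, table, key, owner):
        """Release the lock only if this owner holds it."""
        if self._locks.get((table, key)) == owner:
            del self._locks[(table, key)]
            return True
        return False


locks = EnqueueLockTable()
locks.acquire("VBAK", "0001", "work-process-1")   # granted
locks.acquire("VBAK", "0001", "work-process-2")   # refused while the first owner holds it
```

Because the table in this sketch lives only in the memory of one process, losing that process loses every lock, which is why the component holding it needs protection.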

In an SAP architecture, ASCS and the database are the SPOFs. In a highly available deployment, both need to be made highly available and fault tolerant.

To achieve high availability, ASCS instances are deployed in a clustered environment like Windows Server Failover Clustering (WSFC) or Linux clusters. One of the requirements of a clustered environment is a shared file system. On the AWS Cloud, SIOS DataKeeper can be used to replicate the common file share across the Availability Zones.

Setup for a Windows environment

SIOS DataKeeper, part of SIOS Protection Suite, is an SAP-certified, host-based replication solution. It performs block-level replication across Availability Zones so that the replicated volume can serve as a highly available Server Message Block (SMB) file share.

It is used to make /<sapmnt> a highly available file system by replicating its content in synchronous mode. It can also be used to make /usr/sap/trans a shared file system.
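The idea behind synchronous block-level replication can be sketched as follows. This is a toy model under stated assumptions, not how DataKeeper is implemented: the `Volume` class and the `replicated_write` function are hypothetical, and the sketch simply fails the write when the target is unreachable, whereas a real product has its own handling for that case.

```python
class Volume:
    """Toy block device that maps a block number to bytes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write_block(self, block_no, data):
        if not self.online:
            raise OSError(f"{self.name} is unreachable")
        self.blocks[block_no] = data


def replicated_write(source, target, block_no, data):
    """Acknowledge a write only after both copies hold the block."""
    # Write to the copy in the other Availability Zone first, so a failure
    # there is detected before the local copy changes.
    target.write_block(block_no, data)
    source.write_block(block_no, data)
    return True


primary = Volume("sapmnt-az-a")
standby = Volume("sapmnt-az-b")
replicated_write(primary, standby, 7, b"profile data")
```

The point of the synchronous mode is that the caller sees success only when both volumes are identical for that block, so a failover to the standby does not lose acknowledged writes.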

Using SIOS DataKeeper Cluster Edition, you can achieve high availability protection for critical SAP components, including the ASCS instance, back-end databases (Oracle, DB2, MaxDB, MySQL, and PostgreSQL), and the SAP Central Services instance (SCS) by synchronously replicating data at the block level. In a Windows environment, DataKeeper Cluster Edition integrates with Windows Server Failover Clustering (WSFC). WSFC features, such as cross-subnet failover and tunable heartbeat parameters, make it possible for administrators to deploy geographically dispersed clusters.

The setup consists of Windows Failover Cluster Manager with both ASCS nodes (e.g., ASCS-A and ASCS-B as shown in the following screenshot) and a file server that acts as the witness in the cluster. We recommend deploying the file server in a separate, third Availability Zone.

failover cluster manager nodes

At any point in time, the cluster points to a single active node.

failover cluster manager
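The role of the witness can be sketched with a simple majority check. This is a simplified model that assumes one vote each for the two ASCS nodes and the file server. WSFC's actual quorum handling has more options than this, and the function name and the `file-share-witness` label are hypothetical.

```python
def has_quorum(reachable_voters, all_voters):
    """Return True when strictly more than half of all voters are reachable."""
    all_voters = set(all_voters)
    reachable = set(reachable_voters) & all_voters
    return len(reachable) * 2 > len(all_voters)


# One vote per ASCS node plus one for the witness in the third Availability Zone.
VOTERS = {"ASCS-A", "ASCS-B", "file-share-witness"}

# ASCS-A's Availability Zone is lost: ASCS-B and the witness still form a majority.
print(has_quorum({"ASCS-B", "file-share-witness"}, VOTERS))

# ASCS-B is isolated from both the other node and the witness: no majority.
print(has_quorum({"ASCS-B"}, VOTERS))
```

With only two voters, the loss of either one would leave the survivor without a majority. The third vote in a separate Availability Zone is what lets one node keep running after a single-zone failure.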

The following diagram shows the architecture of a highly available SAP system on AWS.

high availability SAP architecture diagram

Customers can choose either database replication using database-specific methods (like SQL Server Always On availability groups) or block-level replication using SIOS for both the database and the ASCS instance.

The following diagram shows the high-level architecture of SIOS DataKeeper used to create the file share for ASCS in a cluster environment, with native SQL Server replication (using an Always On availability group) for the database.

HA SAP with MS SQL Server

This next diagram shows the generic architecture of highly available SAP (running on AnyDB) using SIOS.

diagram for generic HA architecture

Setup for a Linux environment

In a Linux environment, both the DataKeeper and LifeKeeper components of SIOS Protection Suite are used. DataKeeper provides the data replication mechanism, and LifeKeeper is responsible for automatic orchestration of failover of SAP ASCS and databases (e.g., SAP HANA, DB2, Oracle, etc.) across Availability Zones.

The SAP Recovery Kit, which is part of the SIOS Protection Suite, provides monitoring and switchover for different SAP instances. It works in conjunction with other SIOS Protection Suite Recovery Kits (e.g., the IP Recovery Kit, NFS Server Recovery Kit, NAS Recovery Kit, and database recovery kits) to provide comprehensive failover protection. For example, the SAP HANA Recovery Kit within LifeKeeper starts the SAP HANA system on all nodes and performs the takeover process of system replication.

The actual IP addresses of the SAP ASCS Amazon Elastic Compute Cloud (Amazon EC2) instance and the underlying database are abstracted using an overlay IP address (also called a floating IP address). An overlay IP address is an AWS-specific routing entry that sends network traffic to an instance within a particular Availability Zone. As part of the failover orchestration, LifeKeeper is also responsible for changing the entries within the route table during failover to redirect the traffic to the active node (primary node).

architecture diagram with route tables
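The route table change during failover can be sketched as follows. This is an illustration of the general overlay IP technique, not LifeKeeper's implementation. The function name, the route table IDs, the CIDR, and the instance ID are hypothetical. The client is passed in so the sketch has no hard dependency; with boto3 installed, `boto3.client("ec2")` provides a `replace_route` method that accepts these parameters.

```python
def move_overlay_ip(ec2_client, route_table_ids, overlay_cidr, active_instance_id):
    """Point the overlay IP route in each route table at the active instance.

    Returns the number of route tables that were updated.
    """
    updated = 0
    for route_table_id in route_table_ids:
        # Assumes a route for overlay_cidr already exists in each table;
        # replace_route changes the target of an existing route.
        ec2_client.replace_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=overlay_cidr,
            InstanceId=active_instance_id,
        )
        updated += 1
    return updated
```

A call might look like `move_overlay_ip(boto3.client("ec2"), ["rtb-0123456789abcdef0"], "192.168.10.10/32", "i-0123456789abcdef0")`, with all values being placeholders. Clients keep connecting to the same overlay IP address before and after failover, and only the route target changes.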

The detailed SIOS guide steps through the deployment of SAP NetWeaver with high availability on AWS using SIOS Protection Suite. That guide uses NFS as part of the setup. However, you can simplify the setup by using Amazon Elastic File System (Amazon EFS) instead.

Amazon EFS provides a simple, scalable file system for Linux-based workloads that are running on AWS Cloud services and on-premises resources. It is designed to provide massively parallel shared access to thousands of Amazon EC2 instances, enabling your applications to achieve high levels of aggregate throughput and IOPS with consistent low latencies.

In case of any questions, please feel free to reach out to us.