AWS for SAP

SAP Disaster Recovery Solution Using CloudEndure: Part 1 Failover

Disasters caused by natural calamities, application failures, or service failures not only cause downtime for business applications but also lead to data loss and lost revenue. To mitigate the impact of such scenarios, Disaster Recovery (DR) planning is critical for organizations running mission-critical and business-critical applications such as SAP.

In this blog, we walk through how organizations can use CloudEndure as a Disaster Recovery solution for SAP applications and review the aspects that are specific to SAP.

CloudEndure Disaster Recovery solution enables organizations to quickly and easily shift their disaster recovery strategy to AWS from existing physical or virtual data centers, private clouds, or other public clouds, in addition to supporting cross-region / cross-AZ disaster recovery in AWS. CloudEndure Disaster Recovery minimizes downtime and data loss by providing fast, reliable recovery of physical, virtual, and cloud-based servers into AWS Cloud, including public regions, AWS GovCloud (US), and AWS Outposts. You can use CloudEndure Disaster Recovery to protect your most critical databases, including Oracle, MySQL, and Microsoft SQL Server, as well as enterprise applications such as SAP.

CloudEndure Disaster Recovery continuously replicates your machines (including operating system, system state configuration, databases, applications, and files) into a low-cost staging area in your target AWS account and preferred Region. In the case of a disaster, you can instruct CloudEndure Disaster Recovery to automatically launch thousands of your machines in their fully provisioned state in minutes. Because the machines sit in a low-cost staging area until they are launched, CloudEndure Disaster Recovery can significantly reduce the cost of your disaster recovery infrastructure.

The two key concepts in DR planning are Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the maximum period you are willing to have your systems unavailable due to an outage. RPO is the maximum amount of data loss, measured in time, that you can tolerate; it determines the point in time to which your data must be recoverable after a disaster. The following diagram illustrates the correlation of RTO and RPO:

Chart showing the difference between RPO and RTO
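
To make the two objectives concrete, here is a minimal shell sketch that computes the data-loss window and the downtime for a made-up incident and compares them with targets. The function names, timestamps, and target values are illustrative assumptions and are not part of CloudEndure.

```shell
# Illustrative only: timestamps are in seconds; all names are hypothetical.

# Data-loss window = time between the last recovery point and the disaster.
achieved_rpo() {  # usage: achieved_rpo <last_recovery_point> <disaster_time>
  echo $(( $2 - $1 ))
}

# Downtime = time between the disaster and service being restored at the DR site.
achieved_rto() {  # usage: achieved_rto <disaster_time> <service_restored_time>
  echo $(( $2 - $1 ))
}

# Prints MET when the achieved value is within the target, otherwise MISSED.
check_objective() {  # usage: check_objective <achieved_seconds> <target_seconds>
  if [ "$1" -le "$2" ]; then echo "MET"; else echo "MISSED"; fi
}

# Example: last recovery point at t=1000, disaster at t=1030, service restored at t=2830.
rpo=$(achieved_rpo 1000 1030)   # 30 seconds of data at risk
rto=$(achieved_rto 1030 2830)   # 1800 seconds (30 minutes) of downtime
echo "RPO ${rpo}s: $(check_objective "$rpo" 60)"
echo "RTO ${rto}s: $(check_objective "$rto" 900)"
```

With a 60-second RPO target and a 15-minute RTO target, this example meets the RPO but misses the RTO, which is the kind of gap a DR test is meant to surface.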

Solution Overview

CloudEndure Disaster Recovery minimizes downtime and data loss by providing fast, reliable recovery of physical, virtual, and cloud-based servers into AWS in the event of IT disruptions.

The CloudEndure Agent is installed on the servers running SAP workloads. It connects to the CloudEndure console and replicates those servers (including operating system, system state configuration, databases, applications, and files) into a low-cost staging area in the target DR AWS account, in either the DR AWS Region or the DR Availability Zone. In the case of a disaster, you can launch the replicated servers from the CloudEndure console into a fully provisioned state in minutes and then register them in DNS to continue operations. The diagram below shows a reference production architecture for SAP application DR from the primary AWS Region to the DR AWS Region.

Architecture diagram showing Primary and Disaster Recovery Regions

Pre-Requisites

Customers have the flexibility to implement Disaster Recovery either in an AWS Region different from the one where the production workload runs or in another Availability Zone, depending on their data sovereignty and compliance requirements. In this blog, we cover using a different Region for Disaster Recovery, but the steps are similar when using another Availability Zone.

  1. Customer has implemented SAP systems in the primary Region and identified the DR AWS Region
  2. Customer has a CloudEndure license
  3. Customer has identified the RTO and RPO for the application servers and the database
  4. Customer has implemented replication for shared file systems (EFS) from the primary to the DR AWS Region using AWS DataSync, rsync, or another appropriate tool
  5. Customer has identified the process to update the DNS during a DR test
  6. Customer handles database-native replication to keep the DR database in sync
  7. Customer has at least 50% of the production-level Amazon EC2 instance capacity reserved in the DR AWS Region in case of DR failover
  8. Customer has created the required AWS IAM users and policies

Steps

  1. Register the CloudEndure account for the customer in AWS Marketplace
  2. Create a Disaster Recovery project in the CloudEndure console
  3. On the Setup & Info page, under AWS Credentials, enter the AWS access key ID and secret access key of the IAM user that was created in the target account, and choose SAVE.
  4. Set up the IAM role that enables CloudEndure to create target EC2 instances and copy EBS volumes
  5. Set up the Blueprint and Replication settings
    • Blueprint
      1. Choose the Subnet
      2. Choose to use new private IP for DR instance
      3. Choose the IAM role
      4. Choose the source systems disks to replicate
    • Replication Settings
      1. Choose the replication server instance type
      2. Choose the DR Amazon Virtual Private Cloud (VPC)
      3. Choose the DR security group
      4. Choose the DR staging area disks
  6. Save the settings
  7. CloudEndure Agent installation instructions are in the “How to Add Machines” section of the CloudEndure Console. Download the CloudEndure Agent installer on one Amazon EC2 instance running SAP and share it with the others for installation, using the command below:

wget -O ./installer_linux.py https://console.cloudendure.com/installer_linux.py

  8. Install the CloudEndure Agent on each source SAP instance using the project installation token, as shown below. The installation token is unique to the project and is available in the CloudEndure console.

sudo python ./installer_linux.py -t <Installation Token> --no-prompt

Screenshot of how to add machines within the CloudEndure console
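
When several SAP hosts need the Agent, the copy-and-install step can be scripted. The following is only a sketch: it assumes key-based SSH access to each host and an installer already downloaded to ./installer_linux.py, and the function names and the DRY_RUN switch are hypothetical helpers rather than anything CloudEndure provides. Depending on the sudo configuration on the hosts, the ssh call may also need a terminal (ssh -t).

```shell
# Assumptions (not from the original post): hosts are reachable over SSH with
# key-based login, and ./installer_linux.py exists on the current host.

# Builds the install command for a given installation token.
install_cmd() {  # usage: install_cmd <installation_token>
  echo "sudo python ./installer_linux.py -t $1 --no-prompt"
}

# Copies the installer to each host and runs it there.
# With DRY_RUN=1 (the default) the commands are only printed, not executed.
install_agent_on_hosts() {  # usage: install_agent_on_hosts <token> <host>...
  token=$1
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp ./installer_linux.py $host:./installer_linux.py"
      echo "ssh $host $(install_cmd "$token")"
    else
      scp ./installer_linux.py "$host:./installer_linux.py" || return 1
      ssh "$host" "$(install_cmd "$token")" || return 1
    fi
  done
}

# Typical use (host names are placeholders):
#   install_agent_on_hosts "<Installation Token>" sapapp01 sapapp02       # preview
#   DRY_RUN=0 install_agent_on_hosts "<Installation Token>" sapapp01      # execute
```

Defaulting to a dry run lets you review the exact commands before anything runs on a production SAP host.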

  9. To check the status of the CloudEndure Agent on source machines, execute the following command:

ps -ef | grep cloudendure | grep -v grep | grep -v bash | wc -l

An output of ‘5’ means that five CloudEndure processes are running and the Agent is fully operational; a value below ‘5’ indicates that the Agent is not fully operational. The Agent log can be found in the file: /var/lib/cloudendure/agent.log.0
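
The check above can be wrapped in a small script for monitoring or for a pre-failover checklist. This is a minimal sketch; the function names are hypothetical, and the threshold of five processes is taken from the text above.

```shell
# Expected number of CloudEndure processes, per the description above.
EXPECTED_PROCS=5

# Counts running CloudEndure processes using the same pipeline as the command above.
count_agent_procs() {
  ps -ef | grep cloudendure | grep -v grep | grep -v bash | wc -l | tr -d ' '
}

# Maps a process count to a status line; returns non-zero when the Agent is degraded.
agent_status() {  # usage: agent_status <process_count>
  if [ "$1" -ge "$EXPECTED_PROCS" ]; then
    echo "OK: $1 CloudEndure processes running"
  else
    echo "DEGRADED: only $1 of $EXPECTED_PROCS CloudEndure processes running"
    return 1
  fi
}

# Typical use on a source machine: show the end of the Agent log when degraded.
#   agent_status "$(count_agent_procs)" || tail -n 50 /var/lib/cloudendure/agent.log.0
```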

  10. Open ports 443 and 1500 to allow communication between the CloudEndure Agent, the CloudEndure Service Manager, and the Replication Server. The table below lists the ports and their purpose for reference.

| Port Number | Protocol | Source | Destination | Description |
|---|---|---|---|---|
| 443 | HTTPS | Source Machines (CE Agent) | CloudEndure Service Manager | Agent download and upgrade; display replication status; capture source machine packages and metrics |
| 443 | HTTPS | Replication Server | CloudEndure Service Manager | Display replication status and capture replication server metrics |
| 1500 | Custom TCP | Source Machines | Replication Server | Encrypted data transfer |

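
A quick way to confirm that these ports are open from a source machine is a TCP connection test. The sketch below assumes bash and the coreutils timeout command are available on the host; the function names are hypothetical, and the Replication Server address is a placeholder, because that server only exists once replication has been set up.

```shell
# Returns 0 when a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash and coreutils' timeout are assumed.
check_port() {  # usage: check_port <host> <port>
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Prints a one-line result for a host and port.
report_port() {  # usage: report_port <host> <port>
  if check_port "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 NOT reachable"
  fi
}

# Typical use from a source machine (10.0.1.25 is a placeholder for the
# Replication Server's private IP):
#   report_port console.cloudendure.com 443
#   report_port 10.0.1.25 1500
```

A failed check usually points to a security group, network ACL, or firewall rule rather than to the Agent itself.
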
  11. Once the Agent is installed and the source machine is registered, the instance appears in the CloudEndure Console and the initial data replication starts.
  12. Once the CloudEndure replication is complete, the instance is ready to be launched at the DR site.

Launch the SAP instance in the DR site

For illustration purposes, the following steps launch one Amazon EC2 server running an SAP workload at the DR site, replicated via CloudEndure from the primary site. The same procedure applies to launching additional servers.

  1. Select the instance to be launched and click “Launch Target Machine”.
  2. Select “Recovery Mode” to fail over from the primary to the DR site. Testing the DR solution is standard practice to make sure it works and should be part of periodic Disaster Recovery tests.
    • To perform a DR test, select “Test Mode”

Screenshot of the Step where we have the Launch Machines options

Screenshot of the Confirmation on the Launch Machine

  3. Click Next
  4. Choose the Recovery point.
    Screenshot showing the option to choose recovery point
  5. Click on “Continue with Launch”
  6. The SAP DR instance is launched in the DR AWS Region
    Screenshot of the Console showing the Instance launched in the DR Site
  7. Connect to the DR instance by using Session Manager, SSH through a bastion host, or direct access via the corporate network.
    Screenshot showing the SSM Option to login to the EC2 instance
  8. CloudEndure performs block-level replication of the source EC2 instance's EBS volumes, which host the operating system and file systems. The shared file systems such as /sapmnt and /usr/sap/trans, which were created using EFS, are not part of CloudEndure replication. They are replicated to the DR AWS Region using AWS DataSync or rsync, as stated in pre-requisite 4. The replicated DR EFS file systems for /sapmnt and /usr/sap/trans are then mounted on the DR system.
  9. Start the SAP application on each server in the DR environment using the following commands as the <sid>adm user. Confirm that the system returns “OK” in response to each command.

sapcontrol -nr <Inst No> -function StartService <SID>

sapcontrol -nr <Inst No> -function Start
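
The Start call returns as soon as the request is accepted, so it can help to wait until the instance is actually up before moving on. The sketch below polls sapcontrol's GetProcessList function, whose exit code is 3 when all processes are running and 4 when all are stopped. The wait_for_sap function and its default timeout and polling interval are assumptions for illustration.

```shell
# Polls an SAP instance until all processes are running or the timeout expires.
# Relies on the GetProcessList exit code: 3 = all processes running,
# 4 = all processes stopped. Run as the <sid>adm user.
wait_for_sap() {  # usage: wait_for_sap <instance_no> [timeout_seconds] [poll_seconds]
  inst=$1
  timeout_s=${2:-600}
  poll_s=${3:-10}
  waited=0
  while [ "$waited" -le "$timeout_s" ]; do
    rc=0
    sapcontrol -nr "$inst" -function GetProcessList >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 3 ]; then
      echo "Instance $inst: all processes running"
      return 0
    fi
    sleep "$poll_s"
    waited=$(( waited + poll_s ))
  done
  echo "Instance $inst: not fully started after ${timeout_s}s"
  return 1
}

# Typical use, with 00 as a placeholder instance number:
#   wait_for_sap 00 600 10
```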

Register the new DR IP in the DNS. Log in to the SAP instance using SAP GUI to validate that SAP is up and running in the DR AWS Region.
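
How the DNS record is updated depends on the DNS service in use, which this post does not prescribe. As one possibility, if the SAP host name lives in an Amazon Route 53 hosted zone, the update could look like the sketch below. The dns_change_batch helper, the record name, the IP address, and the hosted zone ID are placeholders.

```shell
# Prints a Route 53 change batch that points an A record at the DR instance's IP.
# UPSERT creates the record if it is missing and overwrites it otherwise.
dns_change_batch() {  # usage: dns_change_batch <record_name> <dr_ip> [ttl_seconds]
  cat <<EOF
{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"$1","Type":"A","TTL":${3:-60},"ResourceRecords":[{"Value":"$2"}]}}]}
EOF
}

# Typical use (zone ID, record name, and IP are placeholders):
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id Z0000000EXAMPLE \
#     --change-batch "$(dns_change_batch sapprd.example.internal 10.1.2.30)"
```

A short TTL on the record keeps clients from caching the primary site's address for long after a failover.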

Cost Estimation

For cost estimation, CloudEndure Disaster Recovery pricing is $0.028 per hour per server, or an estimated $20 per month per server. As a reference point, the estimated cost for 50 instances in the DR AWS Region would be:

| SAP Instances | Estimated Cost (Month) |
|---|---|
| 50 | $1,000.00 |
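
The arithmetic behind this estimate can be reproduced with a short calculation. The sketch below assumes 730 hours per month (8,760 hours per year divided by 12) and covers only the per-server CloudEndure rate quoted above; the monthly_cost function name is hypothetical.

```shell
# Estimates the monthly cost from the hourly per-server rate.
# Assumes 730 hours per month; only the rate passed in is considered.
monthly_cost() {  # usage: monthly_cost <hourly_rate_usd> <server_count>
  awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.2f\n", rate * 730 * n }'
}

monthly_cost 0.028 1    # prints 20.44
monthly_cost 0.028 50   # prints 1022.00
```

The unrounded figure for 50 servers is about $1,022 per month; the $1,000 estimate uses the rounded $20 per server.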

Conclusion

We saw how CloudEndure can be used as a DR solution for SAP systems to fail over from the primary to the DR AWS Region. CloudEndure is an effective and cost-optimized solution for critical and non-critical applications. In the next blog, we will see how to fail back from the DR Region to the primary Region.

To learn why more than 5,000 customers run SAP on AWS, visit the SAP on AWS page.

CloudEndure Reference Documentation:

https://docs.cloudendure.com