SAP Disaster Recovery Solution Using CloudEndure: Part 2 Failback

In the previous blog post, we covered failover from the Amazon Web Services (AWS) primary Region to the AWS disaster recovery (DR) Region with CloudEndure Disaster Recovery. In this post, we walk you through failing back your production workload from the DR Region to the primary Region.

There are several reasons to fail back to your primary Region as soon as possible: making effective use of previously purchased usage commitments (Reserved Instances or Savings Plans), reducing network latency to end users and other linked workloads, and reducing data transfer costs.

CloudEndure lets you prepare for failback by reversing the direction of data replication, from the target machine back to the source machine. This post covers the failback procedure from the AWS DR Region to the AWS primary Region using CloudEndure Disaster Recovery.

Solution Overview

In this scenario, the failover from the primary Region to the AWS DR Region has already been performed using CloudEndure, as documented in SAP DR Solution Part 1.

Now that the SAP instances are running in the AWS DR Region with connections to all interfaces, the SAP and non-SAP systems and services are back up and running as normal.

CloudEndure provides the option to fail back the SAP instances by replicating them at block level to the primary Region. With CloudEndure, customers don’t have to back up and restore applications from the DR Region to the primary Region in order to fail back. When the CloudEndure replication to the primary Region is complete, the DNS records for the instances are updated to the primary Region IP addresses so that services can resume.
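The DNS update mentioned above can be scripted in many ways. The following is a minimal sketch that assumes the records live in Amazon Route 53, which this post does not state. The record name, IP address, TTL, and hosted zone ID are hypothetical placeholders. The function only prints a change batch, and the AWS CLI call is shown as a comment.

```shell
#!/bin/sh
# Sketch: build a Route 53 change batch that repoints one A record at the
# primary Region IP address. Assumes DNS is hosted in Amazon Route 53.

# Print an UPSERT change batch for a single A record.
# $1 = record name, $2 = new IP address, $3 = TTL in seconds (default 60)
make_change_batch() {
  record_name=$1
  new_ip=$2
  ttl=${3:-60}
  cat <<EOF
{
  "Comment": "SAP failback: point ${record_name} at the primary Region",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${record_name}",
        "Type": "A",
        "TTL": ${ttl},
        "ResourceRecords": [{ "Value": "${new_ip}" }]
      }
    }
  ]
}
EOF
}

# Example (not executed here; all values are placeholders):
#   make_change_batch sapapp.example.internal 10.0.1.25 > /tmp/failback-dns.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id ZXXXXXXXXXXXXX \
#     --change-batch file:///tmp/failback-dns.json
```

A short TTL is used in the sketch so that clients pick up the new address quickly after the cutover. Choose a value that matches your own DNS policy.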

Note: When implementing DR, do not share resources between the primary Region and the AWS DR Region.

This picture depicts the high-level architecture of the CloudEndure setup and the replication flow from the DR Region to the primary Region during failback. The CloudEndure agents on the source machines continuously replicate the data to the primary Region.


CloudEndure provides a fully orchestrated failback method within the CloudEndure DR console. Before failing back to the AWS primary Region, customers need to confirm that the following points are met, which helps ensure an efficient failback.

1.     Customer has failed over the SAP instances from Primary to AWS DR region

2.     Customer has identified and scheduled a time to perform the failback that will not disrupt the production workload SLAs

3.     Customer has implemented replication for the database and shared file systems (Amazon Elastic File System) from the DR Region to the primary Region, using AWS DataSync, rsync, or another supported replication solution

4.     Customer has identified the process to update DNS during the DR test

5.     Customer has the instances reserved in the Primary region
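For point 3, if rsync is the chosen tool, the copy of the shared directories could look like the sketch below. It assumes that rsync runs on a DR host that can reach a primary Region host over SSH, and that /sapmnt and /usr/sap/trans are the shared directories (the two named later in this post). The host name is a hypothetical placeholder. The function only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the rsync commands that copy the shared SAP directories
# from the DR host to a primary Region host. Nothing is executed here.

# Shared directories to replicate (assumption based on this post).
SHARED_DIRS="/sapmnt /usr/sap/trans"

# $1 = destination host in the primary Region (hypothetical name)
print_sync_commands() {
  dest_host=$1
  for dir in $SHARED_DIRS; do
    # -a archive mode, -H keep hard links, --numeric-ids keep UID/GID numbers,
    # --delete remove files on the destination that no longer exist at the source
    printf 'rsync -aH --numeric-ids --delete %s/ %s:%s/\n' "$dir" "$dest_host" "$dir"
  done
}

# Example: print_sync_commands sap-primary.example.internal
```

Because `--delete` removes files on the destination, it is worth adding `--dry-run` to each printed command for a first pass before running it for real.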

Prerequisite Steps

First, we prepare CloudEndure to fail back the failed-over Amazon EC2 instances by configuring the failback replication.

1.     Use the Disaster Recovery project in the CloudEndure console that was used to fail over from the primary Region to the AWS DR Region

2.     On the Setup & Info page, check the AWS credentials: confirm that the AWS access key ID and secret access key belong to the IAM user created in the target account, and then choose SAVE.

3.     Check the Blueprint and Replication settings

a.     Blueprint

i.     Choose the subnet in the primary Region

ii.    Choose to use a new private IP for the primary instance

iii.   Choose the AWS Identity and Access Management (IAM) role in the primary Region to assign to the Amazon EC2 instances

iv.    Choose the source system disks to replicate

b.     Replication Settings

i.     Choose the replication server instance type

ii.     Choose the primary Region Amazon VPC

iii.     Choose the primary instance security group

iv.     Choose the staging area disks

4.     Save the settings

5.     Enable TCP ports 443 and 1500 in the source and target security groups.

You can establish communication between the source machines and the CloudEndure Service Manager over TCP port 443 either directly or indirectly through a proxy. For more details, refer to the CloudEndure documentation.

6.     Run a telnet check from the DR instance to confirm that it can connect to the primary Region on TCP port 1500

7.     Use a private connection, such as an IPsec VPN or AWS Direct Connect link, by selecting the “Use Private IP” option. For more details, refer to the CloudEndure documentation.
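The connectivity checks in steps 5 and 6 can also be scripted. The sketch below swaps the manual telnet check for `nc` (netcat), which is an assumption about the tooling available on the DR instance. The host names and addresses in the comments are placeholders.

```shell
#!/bin/sh
# Sketch: verify TCP reachability of the ports CloudEndure needs.
# Uses nc instead of telnet so the check can run unattended.

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
check_port() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Check each port on one host and report the result.
# $1 = host, remaining arguments = ports. Returns non-zero if any check fails.
check_required_ports() {
  host=$1
  shift
  failed=0
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "OK      ${host}:${port}"
    else
      echo "FAILED  ${host}:${port}"
      failed=1
    fi
  done
  return "$failed"
}

# Example (placeholders; run from the DR instance):
#   check_required_ports <Service Manager or proxy host> 443
#   check_required_ports <replication server IP in the primary Region> 1500
```

A failed check usually points at a security group, network ACL, or routing rule, so it is worth running this before starting replication rather than after it stalls.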

Failback Steps

Now that the failback replication has been configured, we can start the replication and then fail back the failed-over Amazon EC2 instances.

1.     Select the instance to be replicated from AWS DR Region to Primary Region in the CloudEndure console

2.     Start the replication

Screenshot shows the replication progress on the CloudEndure console

3.     Once the replication is complete, stop any user and batch activity on the source machine, and then launch the target machine

Tip: Use transactions SM04, AL08, or SM66 to confirm that no user activity remains. Check for active batch jobs in SM37.

This picture shows the console prompting to launch the target system at the primary site during failback

4.     Choose the recovery point and click on “Continue with Launch”.

Tip: The recovery point selected in this step determines the effective recovery point objective (RPO) of the SAP disaster recovery.

Screenshot shows choosing the recovery point to launch the instance

5.     You can check the machine launch status under the Job link

The Screenshot shows the Launch Progress

6.     Once the machine is launched, the target information will show the Amazon EC2 Instance ID

Screenshot shows the launched instance information on the CloudEndure console

7.     The AWS Management Console shows the launched target instance under Amazon EC2. You can connect to the failback instances by using AWS Systems Manager Session Manager, by using SSH through a bastion host, or directly over the corporate network.

The Screenshot shows the Instance status of EC2 on the AWS Management Console

8.     Mount /sapmnt and /usr/sap/trans on the Amazon EC2 instances in the primary Region.

Tip: CloudEndure performs block-level replication of the Amazon Elastic Block Store (Amazon EBS) volumes of the source Amazon EC2 instance, which host the operating system and file systems. Shared file systems on Amazon Elastic File System (Amazon EFS), such as /sapmnt and /usr/sap/trans, are not part of CloudEndure replication. The Amazon EFS file systems in the DR Region are replicated back to the primary Region using AWS DataSync or rsync, as stated in prerequisite point 3.
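As an illustration of step 8, the sketch below prints an NFS mount command for an Amazon EFS file system, using the NFS client options that AWS recommends for EFS. The file system ID, Region, and the idea of one file system per mount point are assumptions for this example. Your layout may instead use a single file system with subdirectories.

```shell
#!/bin/sh
# Sketch: print the NFS mount command for one Amazon EFS file system.
# Nothing is mounted here; review the output before running it.

# NFS client options recommended by AWS for mounting Amazon EFS.
EFS_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"

# $1 = EFS file system ID, $2 = AWS Region, $3 = local mount point
efs_mount_cmd() {
  fs_id=$1
  region=$2
  mount_point=$3
  printf 'sudo mount -t nfs4 -o %s %s.efs.%s.amazonaws.com:/ %s\n' \
    "$EFS_OPTS" "$fs_id" "$region" "$mount_point"
}

# Example (placeholder IDs and Region):
#   efs_mount_cmd fs-0123456789abcdef0 us-east-1 /sapmnt
#   efs_mount_cmd fs-0fedcba9876543210 us-east-1 /usr/sap/trans
```

To make the mounts persistent across reboots, the same options can go into /etc/fstab, typically with `_netdev` added so that mounting waits for the network.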

9.     Start the SAP application on each replicated Amazon EC2 instance in the failback environment using the following commands as the <sid>adm user. Confirm that the system returns “OK” in response to each command.

sapcontrol -nr <Instance number> -function StartService <SID>

sapcontrol -nr <Instance number> -function Start

10. Check the SAP application services and validate that the application is up and running (message server, enqueue server, dispatcher). You can use the following command to check the status:

sapcontrol -nr <Instance number> -function GetProcessList


Using CloudEndure as a disaster recovery solution for SAP works in both directions. It fails over SAP systems from the primary Region to the AWS DR Region and, once the primary site has been recovered, it fails them back in a fully orchestrated way by reversing the direction of data replication from the target machine back to the source machine.

For additional details, consult the CloudEndure Reference Documentation. 

To learn why 5,000+ SAP customers trust AWS to get more value out of their SAP investments, visit the SAP on AWS page.