AWS for SAP

Increase Availability of SAP Convergent Mediation using AWS Auto Scaling

Introduction

SAP Convergent Mediation (SAP CM) by DigitalRoute is a product in the SAP Billing and Revenue Innovation Management (SAP BRIM) solution. Customers deploy SAP CM on AWS to track and orchestrate their billing process. Further details and links can be found in the SAP Help portal – SAP Convergent Mediation by DigitalRoute.

There are two deployment types for an SAP BRIM solution, depending on usage. The first is for offline/batch applications (e.g., batch billing mediation), where short periods of downtime are not disruptive to the business. The second is for real-time applications (e.g., online billing mediation), which require zero service interruptions. Customers who use SAP CM for a batch scenario often use third-party clustering solutions at the application tier to maintain higher availability in an automated fashion. While this is technically possible, it often increases overall operational complexity and infrastructure support costs.

In this blog, we describe a cost-effective and less complex approach to increase the availability of an SAP CM platform server in an automated way using Amazon EC2 Auto Scaling. With this solution, customers can minimize the unavailability of the platform server without cluster management software on the application tier. The solution uses an Auto Scaling group (ASG) with a launch template to launch a new platform server from a custom Amazon Machine Image (AMI). This design minimizes the installation footprint and avoids manual intervention if a platform server experiences an outage. Optionally, customers can use an Amazon EventBridge rule and an AWS Systems Manager automation document to create an image before instance termination.

This blog provides guidance on how to set up a pilot to test resiliency for the platform container. Before using it in a production environment, additional development and tuning for your environment’s requirements are necessary. Also, for stateful, real-time scenarios in SAP CM which require session information to be persisted, a high availability deployment using external cluster management software will still be required, but will not be covered in this blog.

Overview

In SAP CM, the platform and execution containers are installed on separate hosts. Each container contains at least one pico process of type Platform, Execution context (EC), or Service context (SC). These pico instances are typically configured after the installation of the container. Platform and database host configurations provide storage and services that are essential to the mediation zone system. Execution servers provide scalable processing capacity in the system and redundancy is achieved by having multiple execution servers in different AWS Availability Zones.

The following diagram shows the high-level architecture for SAP Convergent Mediation. An AMI is taken from an existing platform server and referenced by a launch template along with user data. The user data script configures an Overlay IP address that is allocated to the platform server. Execution servers in SAP CM communicate with the platform server using this Overlay IP. If the platform container fails, the Application Load Balancer finds the web interface port (default 9000) of the platform container unreachable and reports the instance as unhealthy to the ASG. Based on its health check settings, the ASG then terminates the faulty instance and launches a new platform server from the configured AMI. The new instance is registered as a new target and the Application Load Balancer forwards the next request to it. To troubleshoot the root cause of the failure, you can take a backup of the instance before termination using a lifecycle hook, Amazon EventBridge rules and an AWS Systems Manager automation document.

Figure 1: Increasing Availability of platform server in SAP Convergent Mediation

The SAP CM web interface health check detects any anomalies with the platform pico process only. This doesn’t cover any additional service contexts (SCs) if manually configured to run on the platform instance.

Architecture Description

  • Route 53 is a highly available and scalable Domain Name System (DNS) web service.
  • Application Load Balancer (ALB) serves as the single point of contact for client connections, and routes the requests to the platform container.
  • Auto Scaling Group helps to maintain Amazon EC2 instance availability.
  • Amazon EFS is used for SAP CM storage shared across platform and execution containers.
  • Multiple execution containers run in multiple AZs to increase redundancy. If an execution container fails, batches running in that container need to be restarted manually.
  • A Pacemaker cluster provides SAP HANA database high availability. This blog does not address the resilience requirements of the database layer, although in most cases a Pacemaker cluster is used for this purpose. Further details can be found in the SAP HANA on AWS High Availability Configuration Guide for SLES and RHEL.
  • An AWS Systems Manager automation document creates an AMI of the EC2 instance before termination.

Prerequisites

  • Installation of SAP CM is done as per the installation guide in the SAP Help Portal – SAP Convergent Mediation by DigitalRoute. In the example below, cmplat is the SAP CM platform container and cmexec1, cmexec2 are the SAP CM execution containers. The container name is mz01 for cmplat, ec01 for cmexec1 and ec02 for cmexec2.
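  • Figure 2 shows the AWS Identity and Access Management (IAM) policy assigned to the platform container that provides permission to update the route table. Replace the AWS Region, account number and route table details accordingly. The user data script in Figure 9 also calls describe-instances and modify-instance-attribute, so the instance role needs the corresponding ec2:DescribeInstances and ec2:ModifyInstanceAttribute permissions as well.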

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": "arn:aws:ec2:<AWS Region>:<VPC-Account-Number>:route-table/rtb-xxxxxxxxxxxxxxxxx"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "arn:aws:ec2:<AWS Region>:<VPC-Account-Number>:route-table/rtb-xxxxxxxxxxxxxxxxx"
    }
  ]
}

Figure 2: Enable access to update route table entries

Solution

The flow is as follows:

  1. Disable source/destination checks in the SAP CM platform server.
  2. Overlay IP is added to the IP configuration on the active SAP CM platform server.
  3. The Overlay IP in the route table has a destination defined as the ENI of the active SAP CM server.
  4. Modify the platform and execution container properties to point to the OIP.
  5. Take an AMI of the platform EC2 instance.
  6. Create Launch Template.
  7. Create Auto Scaling Group.
  8. Attach the existing platform EC2 instance to the ASG.
  9. Create a Target Group with health check HTTPS and health check path as /mz/main.
  10. Update the ASG with the Load Balancer target group.
  11. Create an Application Load Balancer with the target set as the target group created in Step 9.
  12. Create a lifecycle hook in the ASG and create an SSM document to take an AMI of the instance. Create an Amazon EventBridge rule and add an SSM document as an EventBridge rule target.

Disable source/destination checks in the SAP CM platform server

Disable source/destination checks in the SAP CM platform server. In the Amazon EC2 console, select the EC2 instance for the SAP CM platform, choose Actions, Networking, Change source/destination check.

Figure 3: Change source / destination check
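The same change can be scripted with the AWS CLI. The following is a minimal sketch, not a definitive procedure: it wraps the modify-instance-attribute call that the user data script later in this post also uses. The function names and the instance ID in the usage lines are placeholders of ours, and the AWS Region is assumed to come from your CLI configuration.

```shell
#!/bin/sh
# Disable the source/destination check on one instance.
# $1: EC2 instance ID of the platform server
disable_source_dest_check() {
  aws ec2 modify-instance-attribute \
    --instance-id "$1" \
    --no-source-dest-check
}

# Print the current setting so the change can be verified.
# $1: EC2 instance ID of the platform server
show_source_dest_check() {
  aws ec2 describe-instance-attribute \
    --instance-id "$1" \
    --attribute sourceDestCheck \
    --query 'SourceDestCheck.Value' --output text
}

# Usage (hypothetical instance ID):
#   disable_source_dest_check i-0123456789abcdef0
#   show_source_dest_check i-0123456789abcdef0
```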

Overlay IP & Route table update

One Overlay IP, which is an IP address that exists outside of the CIDR range of the VPC, is assigned to the cmplat server. In this case, 192.168.0.1 is assigned as the Overlay IP.

cmplat:~ # ip address add 192.168.0.1/32 dev eth0
cmplat:~ # ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:01:60:61:b7:17 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.224/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1:60ff:fe61:b717/64 scope link
       valid_lft forever preferred_lft forever
cmplat:~ #

Figure 4: Adding the overlay IP address to the cmplat server
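Because the Overlay IP must lie outside the VPC CIDR range, it can help to check a candidate address before using it. The sketch below is plain shell arithmetic with no AWS calls. The VPC CIDR 10.0.0.0/16 is an assumption for illustration (the post only shows the instance address 10.0.2.224/24), and the function names are ours.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "inside" if address $1 falls within CIDR block $2, else "outside".
cidr_position() {
  net=${2%/*}; bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  if [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]; then
    echo inside
  else
    echo outside
  fi
}

# Candidate Overlay IP against the assumed VPC CIDR; prints "outside".
cidr_position 192.168.0.1 10.0.0.0/16
```

An address that reports "inside" would collide with the VPC's own range and should not be used as the Overlay IP.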

Update the route table with the Overlay IP as the destination and the ENI of the platform server as the target.

Figure 5: Overlay IP and route table update
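As a rough CLI equivalent of the console step above, the route can be created once and later repointed. This is a sketch under assumptions: the function name and the IDs in the usage line are placeholders, and the policy in Figure 2 only grants ec2:ReplaceRoute, so the initial create-route call would have to be run by an identity that also holds ec2:CreateRoute.

```shell
#!/bin/sh
# Point the Overlay IP route at the platform server's network interface.
# $1: route table ID
# $2: Overlay IP (without prefix length)
# $3: ENI ID of the active platform server
# $4: "create" for the first-time entry, "replace" to repoint an existing one
set_overlay_route() {
  aws ec2 "$4-route" \
    --route-table-id "$1" \
    --destination-cidr-block "$2/32" \
    --network-interface-id "$3"
}

# Usage (hypothetical IDs):
#   set_overlay_route rtb-xxxxxxxxxxxxxxxxx 192.168.0.1 eni-0123456789abcdef0 create
```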

Modify the properties file

Modify the property pico.rcp.platform.host in {MZ_HOME}/common/config/cell/default/master/cell.conf as below.
MZ_HOME specifies the SAP CM software installation location and is shared across the platform and execution container servers.

pico.rcp.platform.host="{chosen overlay IP}"

e.g. pico.rcp.platform.host="192.168.0.1"

Modify the property "address" in {MZ_HOME}/common/config/cell/default/master/containers/mz01/container.conf.

"address" : "{chosen overlay IP}"

e.g. "address" : "192.168.0.1"

The property pico.rcp.server.host (for execution containers) in the configuration files below points to the local IP address of the respective execution container and doesn't need any modification.

{MZ_HOME}/common/config/cell/default/master/containers/ec01/container.conf
{MZ_HOME}/common/config/cell/default/master/containers/ec02/container.conf

pico.rcp.server.host="{local IP address}"

Take AMI

Take an AMI of the platform container and name it cmplatimage.

Figure 6: AMI of platform container
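If you prefer the CLI over the console for this step, a sketch along the following lines could be used. The function name and instance ID are placeholders. Note that create-image reboots the instance by default unless --no-reboot is passed, so plan the timing accordingly.

```shell
#!/bin/sh
# Create an AMI from the platform server and print the new image ID.
# $1: EC2 instance ID of the platform server
# $2: AMI name (cmplatimage in this post)
create_platform_image() {
  aws ec2 create-image \
    --instance-id "$1" \
    --name "$2" \
    --query 'ImageId' --output text
}

# Usage (hypothetical instance ID):
#   ami_id=$(create_platform_image i-0123456789abcdef0 cmplatimage)
#   aws ec2 wait image-available --image-ids "$ami_id"
```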

Create Launch Template

Create a launch template using the AMI taken in the previous step (cmplatimage). This launch template will be used for the Auto Scaling group in the next step. The user data section in the launch template specifies a configuration script that runs during launch to add the Overlay IP, update the route table, and cleanly restart the required platform services.

In the steps below, we create a launch template cmlt.

Figure 7: Creating launch template

Figure 8: Creating launch template: selecting the AMI taken earlier

You can use the user data in Figure 9 in the launch template. It adds the Overlay IP to the IP configuration of the newly launched instance, updates the route table with the ENI of the new instance and restarts the platform instance.

#!/bin/bash -x
# Set the static hostname to cmplat.
hostnamectl set-hostname --static cmplat
echo cmplat > /etc/hostname
# Add the Overlay IP to the primary network interface.
ip address add 192.168.0.1/32 dev eth0
# Look up this instance's ID through the instance metadata service (IMDSv2).
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
instance_id=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/instance-id)
# Disable the source/destination check and point the Overlay IP route at this instance's ENI.
aws ec2 --region us-east-1 modify-instance-attribute --instance-id="$instance_id" --no-source-dest-check
eni_id=$(aws ec2 --region us-east-1 describe-instances --instance-ids "$instance_id" --query 'Reservations[*].Instances[*].NetworkInterfaces[*].{NetworkInterfaceId:NetworkInterfaceId}' --output text)
aws ec2 --region us-east-1 replace-route --route-table-id rtb-0cbf476881fca9021 --destination-cidr-block 192.168.0.1/32 --network-interface-id "$eni_id"
# Restart the platform; fall back to an explicit shutdown and startup if the restart fails.
if su -c 'mzsh restart platform' - mzadmin; then
    echo "Platform was started with rc 0"
else
    echo "Platform was not started correctly in first attempt. Retrying"
    su -c 'mzsh shutdown platform' - mzadmin
    su -c 'mzsh startup platform' - mzadmin
fi
su -c 'mzsh system start' - mzadmin

Figure 9: Script for user data of cmlt launch template
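For readers who script the launch template instead of using the console, here is a hedged sketch. The launch template API expects user data to be base64-encoded, so the helper encodes the script file first. The instance type, instance profile name and file path in the usage line are assumptions of ours, and the function names are hypothetical.

```shell
#!/bin/sh
# Build the JSON document for --launch-template-data.
# $1: AMI ID, $2: instance type, $3: instance profile name,
# $4: path to the user data script (for example, the script in Figure 9)
launch_template_data() {
  user_data=$(base64 < "$4" | tr -d '\n')   # strip line wraps added by base64
  printf '{"ImageId":"%s","InstanceType":"%s","IamInstanceProfile":{"Name":"%s"},"UserData":"%s"}' \
    "$1" "$2" "$3" "$user_data"
}

# Create the cmlt launch template from the same four arguments.
create_cmlt() {
  aws ec2 create-launch-template \
    --launch-template-name cmlt \
    --launch-template-data "$(launch_template_data "$@")"
}

# Usage (hypothetical values):
#   create_cmlt ami-0123456789abcdef0 m5.xlarge cmplat-profile ./cmplat-userdata.sh
```

The instance profile should carry the route table permissions from Figure 2, since the user data script runs with the instance's role.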

EC2 Auto Scaling Group

Create an EC2 Auto Scaling group (ASG) with the desired capacity settings. In the example below, we created an ASG named cmasg. The launch template cmlt is used to launch a new EC2 instance.

Figure 10: Creating Auto Scaling group

After the ASG is created, attach the existing platform instance (cmplat) to the ASG (cmasg).

Figure 11: Attaching instance to Auto Scaling group.

The health check grace period is 300 seconds by default when an ASG is created. We suggest setting it to at least 600 seconds so that initialization of the new EC2 instance can complete, which prevents unnecessary termination of the platform instance. This can be updated in the ASG health check settings.

Figure 12: ASG health check grace period
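The ASG steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact configuration from the screenshots: the minimum, maximum and desired sizes are our choice, picked so that attaching the existing instance (which raises the desired capacity by one) stays within the maximum of one. The subnet IDs and instance ID are placeholders.

```shell
#!/bin/sh
# Create the cmasg Auto Scaling group from the cmlt launch template.
# $1: comma-separated subnet IDs, ideally in different Availability Zones
create_cmasg() {
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name cmasg \
    --launch-template 'LaunchTemplateName=cmlt,Version=$Latest' \
    --min-size 0 --max-size 1 --desired-capacity 0 \
    --vpc-zone-identifier "$1" \
    --health-check-grace-period 600
}

# Attach the existing platform instance to cmasg.
# $1: EC2 instance ID of cmplat
attach_cmplat() {
  aws autoscaling attach-instances \
    --auto-scaling-group-name cmasg --instance-ids "$1"
}

# Usage (hypothetical IDs):
#   create_cmasg subnet-0aaaaaaaaaaaaaaaa,subnet-0bbbbbbbbbbbbbbbb
#   attach_cmplat i-0123456789abcdef0
```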

Create Target Group

Create a target group (cmplatform-tg-9000) with protocol and port HTTPS:9000 and health check path /mz/main (the SAP CM web interface path). Port 9000 is configured by the install.str.mz_platform parameter during the SAP CM installation.

Figure 13: Create target group – Basic configuration

Figure 14: Create target group – Health checks

Figure 15: Create target group: Advanced health check settings

Register the cmplat server as a target by instance ID, following the steps in Register and deregister targets by Instance ID.

Figure 16: Create target group: Register target (1)

Figure 17: Create target group: Register target (2)

Figure 18: Create target group
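The target group settings described above map onto the elbv2 commands roughly as shown in this sketch. The VPC ID and instance ID are placeholders and the function names are ours. Health check thresholds and intervals are left at their defaults here, which may differ from the advanced settings in the screenshots.

```shell
#!/bin/sh
# Create the target group and print its ARN.
# $1: VPC ID
create_platform_tg() {
  aws elbv2 create-target-group \
    --name cmplatform-tg-9000 \
    --protocol HTTPS --port 9000 \
    --vpc-id "$1" \
    --target-type instance \
    --health-check-protocol HTTPS \
    --health-check-path /mz/main \
    --query 'TargetGroups[0].TargetGroupArn' --output text
}

# Register the platform instance as a target.
# $1: target group ARN, $2: EC2 instance ID of cmplat
register_cmplat() {
  aws elbv2 register-targets --target-group-arn "$1" --targets "Id=$2"
}

# Usage (hypothetical IDs):
#   tg_arn=$(create_platform_tg vpc-0123456789abcdef0)
#   register_cmplat "$tg_arn" i-0123456789abcdef0
```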

Update ASG with Load Balancer target group

Attach the target group cmplatform-tg-9000 in the Load Balancer target groups for the cmasg ASG. Update the desired capacity to 1.

Figure 19: Attaching target group to ASG

Figure 20: Setting Desired capacity in ASG as 1
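A possible CLI version of this step is sketched below. One detail is an assumption on our part: an ASG acts on EC2 status checks only unless Elastic Load Balancing health checks are turned on, so the sketch sets the health check type to ELB so that an unhealthy target leads to instance replacement. Verify this against your own ASG settings.

```shell
#!/bin/sh
# Attach the target group to cmasg, then enable ELB health checks
# and set the desired capacity to 1.
# $1: ARN of the cmplatform-tg-9000 target group
attach_tg_to_cmasg() {
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name cmasg --target-group-arns "$1" &&
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name cmasg \
    --health-check-type ELB \
    --desired-capacity 1
}

# Usage: pass the ARN returned when the target group was created.
#   attach_tg_to_cmasg "$tg_arn"
```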

Create Load Balancer

Create the Application Load Balancer (cmplatform) with listeners that forward requests to the target group (cmplatform-tg-9000).

Figure 21: Creating Application Load Balancer cmplatform

Figure 22: Application Load Balancer Listener port configuration

For encrypted communication, an SSL certificate is required. You can use AWS Certificate Manager (ACM) to provision, manage, and deploy public and private SSL/TLS certificates.
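As an illustration, the load balancer and an HTTPS listener could be created from the CLI as sketched here. Several values are assumptions, since the post does not state them: the internal scheme, listener port 443, two subnets, and a single security group. The certificate ARN would come from ACM, and the function names are ours.

```shell
#!/bin/sh
# Create the cmplatform Application Load Balancer and print its ARN.
# $1, $2: subnet IDs in two Availability Zones, $3: security group ID
create_cmplatform_alb() {
  aws elbv2 create-load-balancer \
    --name cmplatform --type application --scheme internal \
    --subnets "$1" "$2" --security-groups "$3" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text
}

# Add an HTTPS listener that forwards to the platform target group.
# $1: load balancer ARN, $2: ACM certificate ARN, $3: target group ARN
create_https_listener() {
  aws elbv2 create-listener \
    --load-balancer-arn "$1" \
    --protocol HTTPS --port 443 \
    --certificates "CertificateArn=$2" \
    --default-actions "Type=forward,TargetGroupArn=$3"
}

# Once the listener exists, target health can be inspected with:
#   aws elbv2 describe-target-health --target-group-arn "$tg_arn"
```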

At this point, the cmplat instance has a healthy status in the target group.

Figure 23: Initial health status in Target group

End result testing

Based on the configuration performed in the previous steps, the ASG performs periodic health checks for the instances in the group and maintains the desired capacity in case of any outage of SAP CM platform.

If an EC2 instance becomes unavailable or the platform pico process has issues, the instance in the target group is marked unhealthy, as shown in the screenshot below. Once the instance is unhealthy in the ASG (cmasg), the ASG launches a new instance based on the launch template (cmlt).

In the test below, we manually killed the platform process, which turned the target health check status to "unhealthy" in the target group. The ASG then terminated the unhealthy instance and replaced it with a new instance.

Figure 24: Health status in Target group post platform process failure test

Figure 25: Instance status in Auto Scaling group post platform process failure test

The unhealthy status of the instance in the ASG triggers a new instance launch using the launch template, which makes the platform instance available again. The new cmplat instance first shows the health check status "initial". After instance initialization is complete, the status changes to "healthy".

Figure 26: Health status in Target group showing “Target deregistration in progress” for previous cmplat instance, and “Target registration in progress” for new cmplat instance.

After the new instance initialization is completed, the platform instance has a healthy status and the platform is available again without manual intervention.

Figure 27: Health status in Target group showing “healthy” status after instance initialization is complete

Using AWS SSM to take a backup of an instance before the ASG terminates it

If the SAP CM platform becomes unavailable, the ASG terminates the EC2 instance and launches a new instance based on the AMI. If you would like to keep a backup image of the instance before termination, you can follow the blog Run code before terminating an EC2 Auto Scaling instance.

Following these steps keeps the troubled instance in the Terminating:Wait status for one hour by default (this can be customized by changing the heartbeat timeout in the lifecycle hook settings). The EC2 termination event from the ASG triggers an SSM document, which takes a backup of the EC2 instance and then signals the ASG to proceed with the termination. This makes it possible to troubleshoot and find the cause of the error later.
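To make the moving parts more concrete, the lifecycle hook and the EventBridge rule could be set up along these lines. This is a sketch only: the hook and rule names are hypothetical, the default result CONTINUE is our choice, and attaching the SSM automation document as the rule target (with its IAM role) is left to the referenced blog.

```shell
#!/bin/sh
# Pause terminating instances in cmasg for up to one hour (3600 seconds).
# $1: lifecycle hook name
add_termination_hook() {
  aws autoscaling put-lifecycle-hook \
    --auto-scaling-group-name cmasg \
    --lifecycle-hook-name "$1" \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
    --heartbeat-timeout 3600 \
    --default-result CONTINUE
}

# Event pattern that matches terminate lifecycle actions from cmasg.
termination_event_pattern() {
  printf '%s' '{"source":["aws.autoscaling"],"detail-type":["EC2 Instance-terminate Lifecycle Action"],"detail":{"AutoScalingGroupName":["cmasg"]}}'
}

# Create the EventBridge rule that the SSM document will be attached to.
# $1: rule name
create_termination_rule() {
  aws events put-rule --name "$1" \
    --event-pattern "$(termination_event_pattern)"
}

# Usage (hypothetical names):
#   add_termination_hook cmplat-backup-hook
#   create_termination_rule cmplat-backup-rule
```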

Conclusion

In this blog post, you’ve learned how to increase the availability of an SAP CM platform server using Amazon EC2 Auto Scaling groups, without the complexity or licensing cost of running a third-party cluster. You may use this procedure to gain higher availability and resiliency of an SAP CM system for batch convergent mediation scenarios.

You can find more information about the DigitalRoute documentation in SAP Note 2924977 (SAP Support Portal login required).

Join the SAP on AWS Discussion

In addition to your customer account team and AWS Support channels, AWS provides public Question & Answer forums on our re:Post site. Our SAP on AWS Solution Architecture team regularly monitors the SAP on AWS topic for discussions and questions that could be answered to assist you. If your question is not support-related, consider joining the discussion over at re:Post and adding to the community knowledge base.