AWS for SAP

SAP application cluster, SAP HANA cluster and SAP application service monitoring

Background

AWS has more than 5000 active customers running SAP on AWS, with workloads that run on a diverse set of SAP releases and versions. Most customers work with SAP NetWeaver based environments composed of one or more application servers and a database. The best practice for achieving high availability of applications and databases is to use operating system clustering. The availability of the cluster resources, in turn, determines the resilience of the application and the database.

This blog describes a technique to monitor the SAP application cluster, the HANA database cluster, HANA replication, and SAP application core services based on Amazon CloudWatch metrics and dashboards. This approach enables customers to monitor SAP NetWeaver and database cluster environments at low cost without the need to deploy or manage any servers or agents. The solution can be deployed within minutes. It uses the custom metric capability of Amazon CloudWatch, which allows you to publish your own metrics, such as cluster resource status data, and to create thresholds and alarms on them in CloudWatch. An alarm then triggers notification emails through Amazon SNS.

Architecture

Architecture for monitoring SAP core services

The architecture depicts the AWS services involved in monitoring SAP applications, SAP clusters, and SAP HANA database core services using Amazon CloudWatch custom metrics.

The architecture is deployed with an AWS CloudFormation template. The generated Amazon CloudWatch rule runs an SSM document every minute. SSM runs the command on the target cluster nodes to push the cluster health check data to Amazon CloudWatch. Alarms can be set on the custom metrics to generate a notification when a threshold value is breached.
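The wiring between the scheduled rule and SSM can be sketched as the parameters one might pass to the CloudWatch Events put_rule and put_targets calls. This is a minimal sketch and not the contents of the template: the rule name, role ARN, instance IDs, and script path are hypothetical, and the use of the AWS-RunShellScript document is an assumption.

```python
import json


def build_schedule_rule(rule_name):
    """Parameters for a rule that fires once a minute."""
    return {
        "Name": rule_name,
        "ScheduleExpression": "rate(1 minute)",
        "State": "ENABLED",
    }


def build_run_command_target(rule_name, region, role_arn, instance_ids, command):
    """Parameters for a target that runs a shell command through SSM Run Command."""
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "cluster-health-check",
                # Assumption: the AWS-owned AWS-RunShellScript document is used.
                "Arn": f"arn:aws:ssm:{region}::document/AWS-RunShellScript",
                "RoleArn": role_arn,
                "Input": json.dumps({"commands": [command]}),
                "RunCommandParameters": {
                    "RunCommandTargets": [
                        {"Key": "InstanceIds", "Values": list(instance_ids)}
                    ]
                },
            }
        ],
    }


# All values below are hypothetical placeholders.
rule = build_schedule_rule("sap-cluster-monitor")
target = build_run_command_target(
    rule_name=rule["Name"],
    region="us-east-1",
    role_arn="arn:aws:iam::123456789012:role/example-events-to-ssm",
    instance_ids=["i-0123456789abcdef0", "i-0fedcba9876543210"],
    command="/usr/local/bin/cluster_health_check.sh",
)
print(json.dumps(target, indent=2))
```

With boto3 installed and credentials configured, the two dictionaries could be passed to the events client as put_rule(**rule) and put_targets(**target).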

The cluster resources are monitored for the Started status. Any deviation from that status triggers an SNS notification. For the SAP application cluster, the STONITH, file system, SAPInstance, and overlay IP resources are monitored. For the HANA cluster, the STONITH, HANA topology, and overlay IP resources are monitored.
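The Started check can be illustrated by counting, per resource type, how many resources report Started. The sketch below assumes the status text resembles the one-line-per-resource layout printed by Pacemaker's crm_mon; the blog does not state which command the SSM document runs, and the resource names and sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as: " rsc_fs_ASCS (ocf::heartbeat:Filesystem): Started node1"
LINE = re.compile(r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)")


def resource_type(agent):
    """Reduce a resource agent string to a short type name."""
    if agent.startswith("stonith:"):
        return "stonith"
    return agent.rsplit(":", 1)[-1]


def count_started(status_text):
    """Return {resource_type: number of resources in Started state}."""
    counts = Counter()
    for line in status_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        started = match.group("state") == "Started"
        # Adding 0 still registers the type, so a fully stopped type shows as 0.
        counts[resource_type(match.group("agent"))] += 1 if started else 0
    return dict(counts)


# Illustrative sample only; names and layout are assumptions.
SAMPLE = """
 res_AWS_STONITH (stonith:external/ec2): Started node1
 rsc_ip_ASCS (ocf::suse:aws-vpc-move-ip): Started node1
 rsc_fs_ASCS (ocf::heartbeat:Filesystem): Started node1
 rsc_sap_ASCS (ocf::heartbeat:SAPInstance): Started node1
 rsc_ip_ERS (ocf::suse:aws-vpc-move-ip): Started node2
 rsc_fs_ERS (ocf::heartbeat:Filesystem): Started node2
 rsc_sap_ERS (ocf::heartbeat:SAPInstance): Stopped
"""

counts = count_started(SAMPLE)
print(counts)
```

In this sample the ERS SAPInstance resource is stopped, so the SAPInstance count is 1 instead of the expected 2, which is the kind of deviation an alarm would pick up.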

The SAP dispatcher status is monitored on the PAS / AAS application hosts, where the dispatcher processes are checked for the running status. Additional custom metrics, such as memory utilization and disk usage on the SAP hosts and the HANA database, are also monitored.
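One way to turn the process status into a 1 or 0 metric is to parse a comma-separated process list. The sketch below assumes input in the style of the sapcontrol GetProcessList output; the blog does not say which tool the monitoring uses, and the sample text is illustrative only.

```python
def process_status(process_list_text):
    """Return {process_name: 1 if GREEN and Running, else 0}."""
    status = {}
    for line in process_list_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip blank lines, banners, and the column header row.
        if len(fields) < 4 or fields[0] == "name":
            continue
        name, _description, dispstatus, textstatus = fields[:4]
        running = dispstatus == "GREEN" and textstatus == "Running"
        status[name] = 1 if running else 0
    return status


# Illustrative sample only; timestamps and PIDs are made up.
SAMPLE = """\
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2024 01 01 00:00:00, 1:00:00, 1001
igswd_mt, IGS Watchdog, GREEN, Running, 2024 01 01 00:00:00, 1:00:00, 1002
gwrd, Gateway, GREEN, Running, 2024 01 01 00:00:00, 1:00:00, 1003
icman, ICM, GRAY, Stopped, , , 1004
"""

status = process_status(SAMPLE)
print(status)
```

Each entry maps directly onto one custom metric: 1 while the service is running and 0 when it is not.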

Amazon SNS notifications can be set up to receive alerts based on Amazon CloudWatch alarms.

Instance prerequisites

The instances should have the SSM Agent running, an IAM role with a policy that allows PutMetricData, and an AWS CLI config file (created with aws configure) that contains the correct region.
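A minimal IAM policy that grants the PutMetricData permission could look like the following sketch. It is not taken from the template; a real instance role would typically also need the permissions that let the SSM Agent register with Systems Manager.

```python
import json

# Minimal policy document allowing the instance to publish custom metrics.
PUT_METRIC_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            # PutMetricData does not support resource-level permissions,
            # so the resource has to be the wildcard.
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(PUT_METRIC_POLICY, indent=2)
print(policy_json)
```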

Setup monitoring

Setting up monitoring is a simple process of entering the instance IDs of the PAS/AAS hosts, the HANA cluster nodes, the SAP ASCS host, and the SAP ERS host, along with the SAP SID.

The CloudFormation template can be accessed through the link and copied to your own S3 bucket. The monitoring setup is initiated by creating a CloudFormation stack using the S3 object URL of monitor.yaml.
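Creating the stack from the S3 object URL can be sketched as the request one might pass to the CloudFormation create_stack call. The parameter keys, stack name, and bucket URL below are hypothetical, because the actual parameter names are defined by monitor.yaml; the instance ID check is an added convenience, not part of the original setup.

```python
import re

# EC2 instance IDs are "i-" followed by 8 or 17 lowercase hex characters.
INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def build_stack_request(stack_name, template_url, sid, instance_ids_by_role):
    """Validate the instance IDs and build create_stack parameters."""
    for role, instance_id in instance_ids_by_role.items():
        if not INSTANCE_ID.match(instance_id):
            raise ValueError(f"{role}: {instance_id!r} is not a valid instance ID")
    # Hypothetical parameter keys; use the names defined in the template.
    parameters = [{"ParameterKey": "SID", "ParameterValue": sid}]
    parameters += [
        {"ParameterKey": role, "ParameterValue": instance_id}
        for role, instance_id in instance_ids_by_role.items()
    ]
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": parameters,
    }


# All values below are hypothetical placeholders.
request = build_stack_request(
    stack_name="sap-monitoring-prd",
    template_url="https://example-bucket.s3.amazonaws.com/monitor.yaml",
    sid="PRD",
    instance_ids_by_role={
        "PASInstanceId": "i-0123456789abcdef0",
        "HANAPrimaryInstanceId": "i-0aaaaaaaaaaaaaaa1",
        "HANASecondaryInstanceId": "i-0aaaaaaaaaaaaaaa2",
        "ASCSInstanceId": "i-0bbbbbbbbbbbbbbb1",
        "ERSInstanceId": "i-0bbbbbbbbbbbbbbb2",
    },
)
print(request["Parameters"])
```

With boto3 available, the dictionary could be passed to the CloudFormation client as create_stack(**request).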

The CloudFormation template can set up monitoring of the cluster resources of both HANA and SAP, along with SAP PAS availability. The template creates a custom dashboard named after the SAP SID. The dashboard comprises widgets for the metrics collected by the monitoring setup. The widgets are populated with real-time metrics that show the status of cluster resources on both the SAP application and HANA database clusters, the HANA replication status, the SAP primary or secondary application resources (dispatcher, ICM, IGS, and gateway), disk usage, and memory usage.
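To make the dashboard structure concrete, the sketch below builds a dashboard body with one time-series widget per HANA node, in the JSON layout that the CloudWatch put_dashboard call accepts. The namespace, metric names, dimension, and host names are hypothetical; the real ones are defined by the template.

```python
import json


def metric_widget(title, namespace, metric_names, dim_name, dim_value, region, x, y):
    """Build one time-series widget that plots several metrics together."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "view": "timeSeries",
            "region": region,
            "period": 60,
            "stat": "Maximum",
            "metrics": [
                [namespace, name, dim_name, dim_value] for name in metric_names
            ],
        },
    }


def dashboard_request(sid, widgets):
    """Build put_dashboard parameters; the dashboard is named after the SID."""
    return {"DashboardName": sid, "DashboardBody": json.dumps({"widgets": widgets})}


# Hypothetical namespace, metric names, and hosts.
HANA_METRICS = ["Stonith", "OverlayIP", "HanaCloneSet"]
widgets = [
    metric_widget("HANA cluster node 1", "Custom/SAPMonitoring", HANA_METRICS,
                  "Host", "hana01", "us-east-1", 0, 0),
    metric_widget("HANA cluster node 2", "Custom/SAPMonitoring", HANA_METRICS,
                  "Host", "hana02", "us-east-1", 12, 0),
]
request = dashboard_request("PRD", widgets)
print(request["DashboardBody"])
```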

CloudFormation Parameters

The parameters needed by CloudFormation to deploy the monitoring.

A successful run of the CloudFormation template creates an Amazon CloudWatch event rule. The rule runs the SSM document on the respective hosts to monitor the cluster resource status. The status of each resource is pushed as a custom metric to the Amazon CloudWatch dashboard, based on the number of resources running. For example, one STONITH resource per cluster is shown as 1 on the custom metric dashboard.
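Publishing the resource counts as custom metrics can be sketched as follows. The builder produces the request shape accepted by the CloudWatch PutMetricData API; the namespace, metric names, and the Host dimension are assumptions made for illustration.

```python
def build_metric_data(namespace, host, counts):
    """Turn {metric_name: running_count} into a PutMetricData request."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [{"Name": "Host", "Value": host}],
                "Value": float(value),
                "Unit": "Count",
            }
            for name, value in sorted(counts.items())
        ],
    }


def publish(request):
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_data(**request)


# Hypothetical namespace, host, and metric names.
request = build_metric_data(
    "Custom/SAPMonitoring", "hana01", {"Stonith": 1, "OverlayIP": 1, "HanaCloneSet": 1}
)
print(request)
```

On an instance that meets the prerequisites, calling publish(request) once a minute would produce the data points that the dashboard widgets plot.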

Sample Dashboard

The dashboard depicts all the core services monitored using custom CloudWatch metrics.

Reading the HANA Cluster, SAP Cluster, and SAP Availability widgets

The cluster monitoring data is fed from both nodes of the HANA cluster and shown as a separate widget for each host. The HANA cluster has three key resources: one STONITH resource to address split-brain situations, one overlay IP address, and one clone set that addresses the HANA services. The time-series graph shows these metrics overlapping each other. You can hover over the graph to view the status of each resource. A value of 1 indicates that the resource is available and active. The value of any inactive resource is 0.

Widgets for HANA cluster nodes

The widgets represent the HANA cluster resources being monitored and depict the active overlay IP, AWS STONITH, and HANA clone set resources on both the primary and secondary HANA nodes.

The monitoring setup captures the status of HANA replication as well. A widget value of 1 indicates that replication is active.
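The blog does not state how the replication status is collected. As one possible approach, the sketch below assumes the status comes from the exit code of SAP's systemReplicationStatus.py script, where 15 means active, and maps it to the 1 or 0 value shown on the widget.

```python
# Exit codes of systemReplicationStatus.py; using this script as the
# data source is an assumption, not something the blog confirms.
REPLICATION_STATES = {
    10: "NoHSR",
    11: "Error",
    12: "Unknown",
    13: "Initializing",
    14: "Syncing",
    15: "Active",
}


def replication_metric(exit_code):
    """Return 1 when replication is active, otherwise 0."""
    return 1 if REPLICATION_STATES.get(exit_code) == "Active" else 0


for code in (15, 14, 11):
    print(code, REPLICATION_STATES[code], replication_metric(code))
```

Treating every state other than Active as 0 keeps the metric binary, so a single alarm threshold of 1 covers errors as well as a replication that is still syncing.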

Widget for HANA replication status

The widget depicts the status of data replication between the HANA database servers that are part of the cluster.

As with the HANA cluster, the SAP cluster resource data is fed to the dashboard from both cluster nodes. The SAP cluster has four key resource types: one STONITH resource to handle split-brain situations, two SAP instance resources, two SAP file mount points, and two overlay IP addresses (one each for ASCS and ERS). The values on the widgets indicate the number of active resources. A value of 2 shows that both overlay IPs, both mount points, and both SAP instances are active. A value of 1 for STONITH means that the STONITH resource is active in the cluster.
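The expected widget values for the HANA and SAP clusters can be written down as a small table and compared with what is observed. This is a reading aid only; the metric names are hypothetical labels for the resources described above.

```python
# Expected number of active resources per cluster, as described in the text.
EXPECTED = {
    "HANA": {"stonith": 1, "overlay_ip": 1, "hana_clone_set": 1},
    "SAP": {"stonith": 1, "sap_instance": 2, "filesystem": 2, "overlay_ip": 2},
}


def degraded_resources(cluster, observed):
    """Return {resource: (observed, expected)} for resources below target."""
    degraded = {}
    for name, expected in EXPECTED[cluster].items():
        value = observed.get(name, 0)  # a missing metric counts as inactive
        if value < expected:
            degraded[name] = (value, expected)
    return degraded


# Example: the ERS instance is down, so only one SAP instance is active.
observed = {"stonith": 1, "sap_instance": 1, "filesystem": 2, "overlay_ip": 2}
print(degraded_resources("SAP", observed))
```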

Widgets for SAP cluster nodes

The widgets depict the SAP cluster resources being monitored. The image shows the active resources: the SAP instances for both ASCS and ERS, the file systems for ASCS and ERS, two overlay IPs (one each for ASCS and ERS), and AWS STONITH.

The availability status of the SAP PAS / AAS shows a non-zero number when the work processes are active. If the application is down for any reason, the value drops to 0. The screenshot below shows active work processes, ICM, IGS, and gateway services.

Widget for dispatcher status on Primary application server

The image depicts the status of the SAP PAS dispatcher and the other core services that are active.

SNS Notifications

A CloudWatch alarm can be set to send a notification when any resource fails. Any resource whose status is other than Started triggers an SNS notification. This helps you react in time and fix the issue to maintain system availability.

Parameters for SNS Topic

Parameters needed for creating the SNS topic.

Subscription Screen of SNS

The screen for creating a subscription for a subscriber.

Subscription Screen Parameters

Create the subscription by choosing the protocol through which you want to receive the alarm about the deviation of the metric.
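The same subscription can be expressed as the parameters of the SNS subscribe call. The sketch below validates the protocol against the values that SNS accepts; the topic ARN and email address are hypothetical placeholders.

```python
# Protocols accepted by the SNS Subscribe API.
SNS_PROTOCOLS = {
    "http", "https", "email", "email-json", "sms",
    "sqs", "application", "lambda", "firehose",
}


def build_subscription(topic_arn, protocol, endpoint):
    """Build parameters for the SNS subscribe call."""
    if protocol not in SNS_PROTOCOLS:
        raise ValueError(f"unsupported SNS protocol: {protocol!r}")
    return {"TopicArn": topic_arn, "Protocol": protocol, "Endpoint": endpoint}


# Hypothetical topic ARN and address.
subscription = build_subscription(
    "arn:aws:sns:us-east-1:123456789012:sap-cluster-alerts",
    "email",
    "sap-basis-team@example.com",
)
print(subscription)
```

For the email protocol, the subscription stays pending until the recipient confirms it from the confirmation email.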

Subscription Confirmation

The image depicts the confirmation email for the subscription.

Confirmed Subscription on SNS

The SNS topic showing the confirmed subscription on Amazon SNS.

 

CloudWatch Alarm

You can create a CloudWatch alarm on any metric of interest from the alarm settings in CloudWatch. The example below shows an alarm set on the STONITH resource. Each cluster has one STONITH resource, so any value below 1 indicates that the resource is not active. This triggers an alarm that sends an email notification through the SNS topic.
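The STONITH alarm can also be sketched as the parameters of the CloudWatch put_metric_alarm call. The namespace, metric name, dimension, and topic ARN are hypothetical; treating missing data as breaching is a design choice of this sketch, on the reasoning that a node that stops reporting is itself worth an alert.

```python
def build_stonith_alarm(alarm_name, namespace, metric_name, host, topic_arn):
    """Build put_metric_alarm parameters that fire when the metric drops below 1."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "Host", "Value": host}],
        "Statistic": "Minimum",
        "Period": 60,  # matches the one-minute collection interval
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Assumption of this sketch: no data is as bad as a value of 0.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }


# Hypothetical names and ARN.
alarm = build_stonith_alarm(
    "PRD-hana01-stonith-inactive",
    "Custom/SAPMonitoring",
    "Stonith",
    "hana01",
    "arn:aws:sns:us-east-1:123456789012:sap-cluster-alerts",
)
print(alarm)
```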

CloudWatch Alarm parameters

The parameters needed to set up an alarm.

 

Threshold conditions for Alarm

Selecting the threshold type for the alarm.

SNS Notification Details for Alarm

The screen shows the step of the alarm setup where you choose an existing SNS topic or create a new one to send the notifications.

Description of Alarm

Provide a name and description for the alarm to complete the setup.

Alarm status

The status of the alarm as shown in the alarm widgets.

Any alarm in the ALARM state triggers a notification email through the SNS topic.

Cost impact

Unplanned downtime has a large revenue impact. According to a 2014 Gartner blog, downtime results in a revenue loss of $5,600 per minute. The monitoring capabilities described here enable faster reaction to unplanned downtime, which reduces the revenue loss.

Cost Considerations

The cost of the custom metrics and the custom dashboard varies by region. The template deploys a custom dashboard along with about 50 custom metrics that monitor resources and services on the HANA cluster, the SAP cluster, and the SAP primary application server. The cost estimations can be found here for the metrics and dashboard.
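A rough way to size the setup is to count the data points it produces and multiply the metric count by the regional prices. The sketch below deliberately takes the prices as inputs instead of hard-coding them, and it is a simplification that ignores pricing tiers as well as alarm and API request charges.

```python
def datapoints_per_month(metric_count, interval_seconds=60, days=30):
    """Number of data points published in a month at a fixed interval."""
    per_day = 24 * 3600 // interval_seconds
    return metric_count * per_day * days


def estimate_monthly_cost(metric_count, price_per_metric, dashboard_price):
    """Flat estimate: per-metric price times metric count plus one dashboard.

    Both prices must be looked up for your region; none are assumed here.
    """
    return metric_count * price_per_metric + dashboard_price


# About 50 metrics published once a minute, as described above.
print(datapoints_per_month(50))
```

With roughly 50 metrics collected every minute, the setup publishes 2,160,000 data points in a 30-day month.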

Conclusion

This blog provides an overview of one method to monitor the status of SAP cluster and HANA cluster resources using Amazon CloudWatch. The solution is easy to deploy with CloudFormation and monitors SAP cluster resources, HANA database cluster resources, HANA replication status, and SAP application core services. To learn more about Amazon CloudWatch, visit the Amazon CloudWatch documentation. If you would like to discuss the SAP cluster monitoring approach described in this blog, do connect with us here.