AWS for SAP
SAP application cluster, SAP HANA cluster and SAP application service monitoring
Background
AWS has more than 5000 active customers running SAP on AWS, with workloads that run on a diverse set of SAP releases and versions. Most customers work with SAP NetWeaver-based environments composed of one or more application servers and a database. The best practice for achieving high availability of applications and databases is to use operating system clustering, where the availability of the cluster resources underpins the resilience of the application and database.
This blog describes a technique to monitor the SAP application cluster, the HANA database cluster, HANA replication, and core SAP application services using Amazon CloudWatch metrics and dashboards. This approach enables customers to monitor SAP NetWeaver (NW) and database cluster environments at low cost without the need to deploy or manage any servers or agents. The solution can be deployed within minutes. It uses the custom metric capability of Amazon CloudWatch, which lets you publish your own metrics, such as cluster resource status data, and create thresholds and alarms on them. An alarm then triggers notification emails through Amazon SNS.
Architecture
The architecture is deployed with an AWS CloudFormation template. The generated Amazon CloudWatch rule calls an AWS Systems Manager (SSM) document every minute. SSM runs the command on the target cluster nodes to push the cluster health check data to Amazon CloudWatch. Alarms can be set on the custom metrics to generate a notification when threshold values are breached.
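The push step on each cluster node can be sketched as follows. This is a minimal illustration, not the code shipped in the template: the namespace, the dimension names (SID, Host), and the metric names are assumptions made for the example. Only the boto3 call put_metric_data is a real CloudWatch API.

```python
from datetime import datetime, timezone


def build_metric_data(sid, hostname, resource_counts):
    """Turn {resource_name: started_count} into CloudWatch MetricData entries.

    The dimension names "SID" and "Host" are hypothetical choices for this sketch.
    """
    now = datetime.now(timezone.utc)
    return [
        {
            "MetricName": name,
            "Dimensions": [
                {"Name": "SID", "Value": sid},
                {"Name": "Host", "Value": hostname},
            ],
            "Timestamp": now,
            "Value": float(count),
            "Unit": "Count",
        }
        for name, count in sorted(resource_counts.items())
    ]


def publish(namespace, metric_data):
    """Send the entries to CloudWatch (needs boto3 and cloudwatch:PutMetricData)."""
    import boto3  # imported lazily so the payload builder runs without AWS access

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=metric_data
    )


# Illustrative values only: one STONITH and one overlay IP in Started status.
payload = build_metric_data("HDB", "hana-node-1", {"stonith": 1, "overlay_ip": 1})
print([(m["MetricName"], m["Value"]) for m in payload])
```

Keeping the payload builder separate from the publish call makes the metric shape easy to inspect on a node before anything is sent to CloudWatch.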
The cluster resources are monitored for Started status. Any deviation from that status triggers an SNS notification. For the SAP application cluster, the STONITH, file system, SAPInstance, and overlay IP statuses are monitored. For the HANA cluster, the STONITH, HANA topology, and overlay IP statuses are monitored.
The SAP dispatcher status is monitored on PAS / AAS application hosts, where the dispatcher processes are checked for running status. Additional custom metrics such as memory utilization and disk usage on SAP hosts and the HANA database are also monitored.
Amazon SNS notifications can be set up to receive alerts based on Amazon CloudWatch alarms.
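Counting resources in Started status could look like the sketch below. It assumes a status listing with one resource per line in the form "name (agent): State node", which is how Pacemaker tools commonly print primitive resources. The sample text and resource names are invented for illustration, and real output (clone sets in particular) may need extra handling.

```python
import re

# Hypothetical status listing, one resource per line (assumed format).
SAMPLE_STATUS = """\
 res_AWS_STONITH (stonith:external/ec2): Started node1
 rsc_ip_HDB (ocf::heartbeat:aws-vpc-move-ip): Started node1
 rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started node1
 rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Stopped
"""

RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)"
)


def count_started(status_text):
    """Return {resource_kind: number of resources in Started status}."""
    counts = {}
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if not match:
            continue  # skip headers and anything that is not a resource line
        agent = match.group("agent")
        # Group by agent type, e.g. "aws-vpc-move-ip"; all fencing agents
        # are reported under the single kind "stonith".
        kind = "stonith" if agent.startswith("stonith:") else agent.rsplit(":", 1)[-1]
        counts.setdefault(kind, 0)
        if match.group("state") == "Started":
            counts[kind] += 1
    return counts


print(count_started(SAMPLE_STATUS))
```

A resource kind that is present but not started still appears in the result with a count of 0, so a stopped resource shows up as a drop in the metric rather than as missing data.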
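The disk and memory figures can be derived with the Python standard library alone. The sketch below is one possible way to compute them on a Linux host and is not taken from the template. The sample /proc/meminfo text is made up, and "used" memory is defined here as MemTotal minus MemAvailable, which is an assumption about what the dashboard should show.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.used / usage.total, 1)


def memory_used_percent(meminfo_text):
    """Percentage of memory in use, parsed from /proc/meminfo style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


# Illustrative sample; on a real host, read the text from /proc/meminfo.
SAMPLE_MEMINFO = (
    "MemTotal:       16000000 kB\n"
    "MemFree:         2000000 kB\n"
    "MemAvailable:    4000000 kB\n"
)

print(memory_used_percent(SAMPLE_MEMINFO))  # 75.0
print(disk_used_percent("/"))
```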
Instance prerequisites
The instances should have the SSM agent running, a role with a policy that allows PutMetricData, and an AWS CLI config file (created with aws configure) that contains the correct region information.
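A minimal IAM policy document for the PutMetricData permission could look like the sketch below. The statement ID is an arbitrary label chosen for the example. PutMetricData does not support resource-level permissions, so the resource is a wildcard. The policy your instances actually need may include further permissions, for example for the SSM agent.

```python
import json


def put_metric_data_policy():
    """IAM policy document that allows publishing custom CloudWatch metrics."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCustomMetricPublishing",  # arbitrary label
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            }
        ],
    }


print(json.dumps(put_metric_data_policy(), indent=2))
```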
Setup monitoring
Setting up monitoring is a simple process of keying in the instance IDs of the PAS/AAS hosts, HANA cluster nodes, SAP ASCS host, and SAP ERS host, along with the SAP SID.
The CloudFormation template can be accessed through the link and copied to an S3 bucket in your account. The monitoring setup can be initiated by creating a CloudFormation stack using the S3 object URL of monitor.yaml.
The CloudFormation template sets up monitoring of cluster resources for both HANA and SAP, along with SAP PAS availability. The template creates a custom dashboard with the SAP SID as its name. The dashboard comprises widgets for the metrics being monitored. The widgets are populated with real-time metrics that show the status of cluster resources on both the SAP application and HANA database clusters, HANA replication status, SAP primary or secondary application resources (dispatcher, ICM, IGS, and gateway), disk usage, and memory usage.
A successful run of the CloudFormation template creates an Amazon CloudWatch event rule. The rule runs the SSM document on the respective hosts to monitor the cluster resource status. The status of each resource is pushed as a custom metric to the Amazon CloudWatch dashboard based on the number of resources running. For example, one STONITH per cluster is depicted as 1 on the custom metric dashboard.
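Creating the stack from the S3 object URL can also be scripted. The sketch below assumes boto3 is available. The stack name, bucket URL, and parameter keys (SAPSID, ASCSInstanceId) are hypothetical, so check monitor.yaml for the parameter names it actually declares. The IAM capability is included on the assumption that the template creates IAM resources.

```python
def build_stack_request(stack_name, template_url, parameters):
    """Assemble keyword arguments for CloudFormation create_stack."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # Assumption: the template creates IAM roles and needs this capability.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def create_stack(request):
    """Submit the request and return the new stack ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder runs without AWS access

    return boto3.client("cloudformation").create_stack(**request)["StackId"]


# Hypothetical values for illustration only.
request = build_stack_request(
    "sap-monitoring-demo",
    "https://example-bucket.s3.amazonaws.com/monitor.yaml",
    {"SAPSID": "PRD", "ASCSInstanceId": "i-0123456789abcdef0"},
)
print(request["Parameters"])
```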
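The template creates the rule for you, but the shape of such a rule is easy to see in a short sketch. The example below builds a one-minute schedule and a Run Command target. It uses the AWS-owned AWS-RunShellScript document as a stand-in for the template's own SSM document, and the rule name, target ID, role ARN, and command are all hypothetical.

```python
import json


def build_rule(name):
    """Scheduled rule that fires every minute."""
    return {"Name": name, "ScheduleExpression": "rate(1 minute)", "State": "ENABLED"}


def build_target(region, instance_ids, role_arn, command):
    """Run Command target that executes `command` on the given instances."""
    return {
        "Id": "cluster-status-check",  # arbitrary target ID
        # Stand-in document; the template deploys its own SSM document.
        "Arn": f"arn:aws:ssm:{region}::document/AWS-RunShellScript",
        "RoleArn": role_arn,
        "RunCommandParameters": {
            "RunCommandTargets": [
                {"Key": "InstanceIds", "Values": list(instance_ids)}
            ]
        },
        "Input": json.dumps({"commands": [command]}),
    }


def deploy(rule, target):
    """Create the rule and attach the target (needs AWS credentials)."""
    import boto3  # imported lazily so the builders run without AWS access

    events = boto3.client("events")
    events.put_rule(**rule)
    events.put_targets(Rule=rule["Name"], Targets=[target])


rule = build_rule("sap-cluster-monitor-demo")
target = build_target(
    "us-east-1",
    ["i-0123456789abcdef0"],
    "arn:aws:iam::123456789012:role/example-events-role",
    "/usr/local/bin/push_cluster_metrics.sh",  # hypothetical script path
)
print(rule["ScheduleExpression"], target["Arn"])
```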
Reading the HANA Cluster, SAP Cluster, and SAP Availability widgets
The cluster monitoring data is fed from both nodes of the HANA cluster and shown as a separate widget for each host. The HANA cluster has three key resources: one STONITH to address split-brain situations, one overlay IP address, and one clone set that covers the HANA services. The time series graph shows these metrics overlapping one another. You can hover over the graph to view the status of the resources. A value of 1 indicates that the resource is available and active. The value of any inactive resource is 0.
The monitoring setup captures the status of HANA replication as well. A widget value of 1 indicates that replication is active.
For SAP cluster resource monitoring, the same data is fed to the dashboard from both cluster nodes. The SAP cluster has four key resource types: one STONITH to handle split-brain situations, two SAP instance resources, two SAP file system mount points, and two overlay IP addresses (one each for ASCS and ERS). The values on the widgets indicate the number of active resources. A value of 2 shows that both overlay IPs, mount points, and SAP instances are active. A value of 1 for STONITH implies that STONITH is active as part of the cluster.
The availability status of the SAP PAS / AAS shows a nonzero value when work processes are active. If the application is down for any reason, the value drops to 0. The screenshot below shows active work processes, ICM, IGS, and gateway services.
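One way to reduce replication state to a 1/0 metric is to map the return code of SAP HANA's systemReplicationStatus.py script, which exits with 15 when all services replicate actively. Whether the template uses this script is not stated here, so treat the sketch below as an assumption about how such a metric could be produced.

```python
# Return codes of systemReplicationStatus.py as documented by SAP.
REPLICATION_STATES = {
    10: "NoHSR",
    11: "Error",
    12: "Unknown",
    13: "Initializing",
    14: "Syncing",
    15: "Active",
}


def replication_metric(return_code):
    """Return 1 when replication is active, 0 for every other state."""
    return 1 if REPLICATION_STATES.get(return_code) == "Active" else 0


for code in (15, 14, 11):
    print(code, REPLICATION_STATES[code], replication_metric(code))
```

Treating Syncing and Initializing as 0 is a deliberately strict choice for this sketch. A more lenient mapping would need its own alarm threshold.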
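The expected widget values described above can be written down as a small lookup, which is handy when checking readings by script. The resource keys in this sketch are hypothetical labels and not the metric names that the template creates.

```python
# Expected number of active resources per cluster, as described in the text.
EXPECTED = {
    "hana": {"stonith": 1, "overlay_ip": 1, "hana_clone_set": 1},
    "sap": {"stonith": 1, "sap_instance": 2, "file_system": 2, "overlay_ip": 2},
}


def degraded_resources(cluster, observed):
    """Return {resource: (observed, expected)} for resources below target."""
    return {
        name: (observed.get(name, 0), want)
        for name, want in EXPECTED[cluster].items()
        if observed.get(name, 0) < want
    }


# Illustrative reading: one of the two SAP overlay IPs is not active.
reading = {"stonith": 1, "sap_instance": 2, "file_system": 2, "overlay_ip": 1}
print(degraded_resources("sap", reading))
```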
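Service status on an application host can be read from the output of sapcontrol's GetProcessList function, which prints one comma-separated line per process. The sketch below parses such output into 1/0 values. The sample text is invented for illustration, and mapping only GREEN to 1 is an assumption about how availability should be scored.

```python
# Invented sample in the comma-separated layout printed by GetProcessList.
SAMPLE_OUTPUT = """\
01.01.2020 10:00:00
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2020 01 01 09:00:00, 1:00:00, 1234
igswd_mt, IGS Watchdog, GREEN, Running, 2020 01 01 09:00:00, 1:00:00, 1235
gwrd, Gateway, GREEN, Running, 2020 01 01 09:00:05, 0:59:55, 1240
icman, ICM, GRAY, Stopped, , , 4321
"""


def service_status(output):
    """Return {process_name: 1 if dispstatus is GREEN else 0}."""
    status = {}
    header_seen = False
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if fields[:3] == ["name", "description", "dispstatus"]:
            header_seen = True  # process rows follow the header line
            continue
        if header_seen and len(fields) >= 4:
            status[fields[0]] = 1 if fields[2] == "GREEN" else 0
    return status


print(service_status(SAMPLE_OUTPUT))
```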
SNS Notifications
A CloudWatch alarm can be set to send a notification when any resource fails. Any resource in a status other than Started triggers an SNS notification. This helps you react in time to fix the issue and maintain system availability.
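If the SNS topic and email subscription do not exist yet, they can be created with a few boto3 calls. The sketch below is one possible way to do it. The topic name and email address are placeholders, and the recipient must confirm the subscription by email before notifications are delivered.

```python
def build_email_subscription(topic_arn, email):
    """Assemble keyword arguments for SNS subscribe with the email protocol."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    return {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}


def create_topic_and_subscribe(topic_name, email):
    """Create the topic, subscribe the address, and return the topic ARN."""
    import boto3  # imported lazily so the request builder runs without AWS access

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(**build_email_subscription(topic_arn, email))
    return topic_arn


# Placeholder ARN and address for illustration only.
subscription = build_email_subscription(
    "arn:aws:sns:us-east-1:123456789012:sap-cluster-alerts",
    "sap-basis-team@example.com",
)
print(subscription["Protocol"], subscription["Endpoint"])
```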
CloudWatch Alarm
You can create a CloudWatch alarm on metrics of interest by navigating to the Alarm setting. The example below shows an alarm set on the STONITH resource. Each cluster has one STONITH resource, and any value below 1 indicates that the resource is not active. This triggers an alarm that sends an email notification through the SNS topic.
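The same STONITH alarm can be defined programmatically with the CloudWatch put_metric_alarm API. In the sketch below, the namespace, metric name, dimension names, and topic ARN are assumptions made for the example and need to match the metrics that your deployment actually publishes.

```python
def build_stonith_alarm(sid, hostname, namespace, topic_arn):
    """Assemble keyword arguments for an alarm that fires when STONITH < 1."""
    return {
        "AlarmName": f"{sid}-{hostname}-stonith-inactive",
        "Namespace": namespace,
        "MetricName": "stonith",  # hypothetical metric name
        "Dimensions": [
            {"Name": "SID", "Value": sid},
            {"Name": "Host", "Value": hostname},
        ],
        "Statistic": "Minimum",
        "Period": 60,  # seconds, matching the one-minute publishing interval
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Alarm as well when the node stops reporting altogether.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }


def create_alarm(alarm):
    """Create or update the alarm (needs AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**alarm)


alarm = build_stonith_alarm(
    "HDB",
    "hana-node-1",
    "SAP/ClusterMonitoring",  # hypothetical namespace
    "arn:aws:sns:us-east-1:123456789012:sap-cluster-alerts",
)
print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

Treating missing data as breaching is a design choice for this sketch: a node that stops publishing metrics is often itself a symptom worth a notification.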
Any alarm that enters the ALARM state sends a notification email through the SNS topic.
Cost impact
Unplanned downtime has a large revenue impact: according to a 2014 Gartner blog, downtime results in a revenue loss of $5,600 per minute. The monitoring capabilities improve reaction times to unplanned downtime, reducing the revenue loss.
Cost Considerations
The cost of custom metrics and custom dashboards varies by region. The template deploys a custom dashboard along with about 50 custom metrics that monitor resources and services on the HANA cluster, SAP cluster, and SAP primary application server. The cost estimations can be found here for the metrics and dashboard.
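A rough monthly estimate is simple arithmetic once you have the unit prices for your region. The sketch below uses placeholder prices that are NOT actual AWS prices, so substitute the per-metric and per-dashboard figures from the pricing page. It also ignores pricing tiers and API request charges.

```python
def monthly_monitoring_cost(
    metric_count, price_per_metric, dashboard_count, price_per_dashboard
):
    """Estimate the monthly cost of custom metrics plus custom dashboards."""
    total = metric_count * price_per_metric + dashboard_count * price_per_dashboard
    return round(total, 2)


# PLACEHOLDER unit prices for illustration only, not AWS prices.
PLACEHOLDER_PRICE_PER_METRIC = 1.00
PLACEHOLDER_PRICE_PER_DASHBOARD = 10.00

estimate = monthly_monitoring_cost(
    50, PLACEHOLDER_PRICE_PER_METRIC, 1, PLACEHOLDER_PRICE_PER_DASHBOARD
)
print(estimate)
```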
Conclusion
This blog provides an overview of one method to monitor the status of SAP cluster and HANA cluster resources using Amazon CloudWatch. The solution is deployed with CloudFormation and monitors SAP cluster resources, HANA database cluster resources, HANA replication status, and core SAP application services. To learn more about Amazon CloudWatch, visit the Amazon CloudWatch documentation. If you would like to discuss the SAP cluster monitoring approach described in this blog, do connect with us here.