AWS for SAP

End-to-End Observability for SAP on AWS: Part 2 – SAP Network Latency Monitoring

Introduction

This is part 2 of a blog series on End-to-End SAP Observability on AWS. Part 1 can be found here.

Network performance is critical to enterprise applications like SAP, which usually run a company’s core business processes. The SAP blog “Network Performance Analysis for SAP Netweaver ABAP” explains how the tiers of SAP’s three-tier software architecture (presentation, application, and database) communicate with each other using a combination of protocols and APIs.

Slow network performance causes significant delays in data processing and leads to slow response times for users. Network congestion can cause dropped packets and even complete communication failures between tiers, which can result in data loss or corruption, with serious consequences for the SAP application and its users. A fast database statement typically executes in around 100 microseconds, while a typical network round trip takes around 300 microseconds, so network time alone can account for 75% of the total database response time (300 µs out of 400 µs).

Given the importance of network performance, we will discuss how to optimise and monitor SAP network performance within the AWS Cloud. Let’s start with the network performance between the SAP servers (the highlighted red box in Figure 1).

Network Performance between SAP servers

SAP is a client-server enterprise application consisting of multiple components, such as the SAP application server(s), where users log in to execute their day-to-day activities, and the database, which stores the SAP data. The diagram below highlights how SAP users on the customer network (on-premises) access the SAP application running on the AWS Cloud through a WAN (Wide Area Network) connection such as a Site-to-Site VPN or a dedicated link known as AWS Direct Connect.

Figure 1 : Example architecture diagram showing on-premises connectivity to SAP applications running on AWS Cloud

AWS Global Infrastructure for Mission-Critical Workloads

Before we dive deeper, let’s first understand the AWS Global Infrastructure. AWS has the concept of a Region, a physical location around the world where we cluster data centers. Each group of logical data centers is called an Availability Zone (AZ). Each AWS Region consists of a minimum of three isolated and physically separate AZs within a geographic area.

Unlike other cloud providers, who often define a region as a single data center, the multiple-AZ design of every AWS Region offers advantages for customers. Each AZ has independent power, cooling, and physical security, and the AZs are connected via redundant, ultra-low-latency networks. If a workload is partitioned across AZs, customers are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more. AZs are physically separated from each other by a meaningful distance, many kilometers, although all are within 100 km of each other. AWS customers who require high availability can design their applications to run in multiple AZs to achieve even greater fault tolerance. On top of that, AWS Regions meet the highest levels of security, compliance, and data protection.

The AWS Region and Availability Zone model has been recognised by Gartner as the recommended approach for running enterprise applications that require high availability.

SAP Network Latency requirements

For optimum performance, SAP has outlined the following network recommendations:

  • Network latency between the SAP application server and database server to be less than 0.7 milliseconds (ms), as per SAP Note 1100926
  • Network latency for HANA system replication with synchronous data replication (which is required to achieve zero data loss) to be less than 1 ms

SAP provides the NIPING tool to measure the health of the network infrastructure used by the local SAP system. NIPING runs as a client/server tool and can be used to measure the network latency and throughput.
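
As a minimal sketch of how such a test could be scripted (assuming the niping binary is available on the EC2 instances; the hostnames, buffer size, and loop count below are placeholders for illustration), you would start NIPING in server mode on the baseline instance with “niping -s”, and then drive the client from the other instances, for example with Python:

# Minimal sketch: run the NIPING client against a NIPING server in another AZ.
# Assumes "niping" is on the PATH and that the hostnames are placeholders.
import subprocess

def run_niping_client(server_host: str, loops: int = 100, buf_bytes: int = 10) -> str:
    """Run a NIPING latency test against server_host and return the raw output."""
    cmd = [
        "niping",
        "-c",                  # client mode
        "-H", server_host,     # host running "niping -s"
        "-B", str(buf_bytes),  # small buffer, so the test measures latency rather than throughput
        "-L", str(loops),      # number of round trips
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout       # output includes the measured round-trip times

if __name__ == "__main__":
    # Placeholder hostnames for the test instances in the other AZs.
    for host in ["sap-az2.example.internal", "sap-az3.example.internal"]:
        print(f"--- {host} ---")
        print(run_niping_client(host))

The exact NIPING options to use for a latency versus a throughput test are described in SAP’s NIPING documentation; treat the flags above as a starting point only.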

Based on the above, let’s look at network latency between AZs (Inter-AZ) and within the same AZ (Intra-AZ) in the AWS North Virginia Region (us-east-1), which has 6 AZs. We created 6 EC2 instances, one in each AZ, and then ran the NIPING test with AZ1 as the baseline.

EC2 instances to measure network latency using NIPING

Figure 2 : EC2 instances to measure network latency using NIPING

The NIPING results show the Inter-AZ network latency from AZ1 to AZ2-AZ6 relative to the 0.7 ms threshold. The Intra-AZ network latency within AZ1 is also below the 0.7 ms threshold.

Figure 3 : Network latency measured using NIPING in AWS North Virginia (us-east-1) region as of July 2023

As shown above, the NIPING tool can be used to measure and monitor Inter-AZ and Intra-AZ network latency. However, it requires several EC2 instances to run; this is where you can use AWS Network Manager as an alternative for this measurement.

AWS Network Manager – Infrastructure Performance

AWS Network Manager provides tools and features to help you manage and monitor your network on AWS. Network Manager makes it easier to perform connectivity management, network monitoring and troubleshooting, IP management, and network security and governance. In particular, we would like to focus on AWS Network Manager – Infrastructure Performance.

Infrastructure Performance allows you to monitor AWS global network latency, both in near real-time and historically, across AWS Regions and across or within AZs for a specified time period. You can monitor network latency at intervals as fine as 5 minutes, as well as view a 45-day historical trend. In addition, you can publish these latency metrics to Amazon CloudWatch for further monitoring, analysis, and alerting. This can help you easily evaluate whether network performance might affect SAP or other running applications. There is no cost for using Infrastructure Performance, and no EC2 instances to provision either!

Using the AWS North Virginia region (us-east-1), let’s look at network latency analysis between AZs (Inter-AZ), and within the same AZ (Intra-AZ).

From the AWS console, navigate to Network Manager, and select Infrastructure Performance:

AWS Network Manager - Infrastructure Performance service

Figure 4 : AWS Network Manager – Infrastructure Performance service

Setting up network latency monitoring is quick and easy. Select “Inter-Availability Zone” and, in the example below using AZ1 as the baseline, select the other 5 AZs:

Figure 5 : Selecting “Inter-Availability Zone” and the respective us-east-1 AZs to be monitored

Next, select the appropriate time period (1 week in this example) and monitoring frequency (5 minutes):

Figure 6 : Selecting the monitoring time period

The Inter-AZ latency from AZ1 (use1-az1) to the other 5 AZs is consistent with the NIPING results above:

Figure 7 : Inter-AZ network latency monitoring

Similarly, we repeated the Intra-AZ network latency test within all 6 Availability Zones, which showed an average network latency of under 0.3 ms, well below the 0.7 ms threshold:

Figure 8 : Intra-AZ network latency monitoring

In the example above, we used AZ1 as the baseline. Repeat the latency tests between the other AZ pairs, using AZ2 as the baseline, then AZ3, and so on, to determine which AZs meet the recommendation.
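
If you would rather script this comparison than repeat it in the console, the hedged sketch below uses the AWS SDK for Python (boto3) to read the latency metric that Infrastructure Performance publishes to CloudWatch (the “AggregateAWSNetworkPerformance” metric covered in the next section). The namespace, dimension names, and units shown here are assumptions to verify against the metrics actually published in your account:

# Hedged sketch: compare Inter-AZ latency for a list of AZ pairs from CloudWatch.
# The namespace and dimension names are assumptions - check them in the
# CloudWatch console once Infrastructure Performance publishes the metric.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

AZ_PAIRS = [("use1-az2", "use1-az4"), ("use1-az2", "use1-az6")]  # example AZ IDs

def average_latency(source_az: str, dest_az: str, hours: int = 24) -> float:
    """Return the average published latency between two AZ IDs over the last `hours`."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Network Manager",           # assumption - verify in your account
        MetricName="AggregateAWSNetworkPerformance",
        Dimensions=[                               # assumed dimension names
            {"Name": "Source", "Value": source_az},
            {"Name": "Destination", "Value": dest_az},
            {"Name": "Metric", "Value": "Latency"},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,                                # 5-minute datapoints
        Statistics=["Average"],
    )
    points = response["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else float("nan")

for src, dst in AZ_PAIRS:
    print(f"{src} -> {dst}: average latency {average_latency(src, dst):.3f} (compare against 0.7 ms)")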

What about Inter-Region network latency? Let’s compare the AWS North Virginia Region (us-east-1) with other US-based AWS Regions: Ohio (us-east-2), N. California (us-west-1), and Oregon (us-west-2).

Figure 9 : Inter-Region network latency monitoring

In this case, we observe that the network latency between us-east-1 (North Virginia) and us-east-2 (Ohio) is the lowest, given the shorter geographical distance.

Monitoring AZ latency via Amazon CloudWatch

Follow the steps in the AWS documentation Creating a CloudWatch dashboard to create a dashboard for Inter-AZ latency monitoring by subscribing to the CloudWatch metric “AggregateAWSNetworkPerformance”. You can also create and send alerts when the network latency exceeds certain threshold values. Note that even though AWS Network Manager – Infrastructure Performance is free to use, there is a cost associated with CloudWatch monitoring; please refer to Amazon CloudWatch pricing.

Figure 10 : CloudWatch dashboard for Inter-AZ monitoring
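
The same metric can also drive alerting. The sketch below is a minimal boto3 example that raises a CloudWatch alarm when the published Inter-AZ latency stays above the 0.7 ms recommendation; as before, the namespace, dimension names, and the SNS topic ARN are placeholders or assumptions to adapt to your account:

# Hedged sketch: alert when Inter-AZ latency exceeds the SAP 0.7 ms recommendation.
# Namespace, dimension names and the SNS topic ARN are placeholders/assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="sap-inter-az-latency-use1-az1-to-use1-az2",
    Namespace="AWS/Network Manager",               # assumption - verify in your account
    MetricName="AggregateAWSNetworkPerformance",
    Dimensions=[
        {"Name": "Source", "Value": "use1-az1"},
        {"Name": "Destination", "Value": "use1-az2"},
        {"Name": "Metric", "Value": "Latency"},
    ],
    Statistic="Average",
    Period=300,                                    # evaluate 5-minute datapoints
    EvaluationPeriods=3,                           # require three consecutive breaches
    Threshold=0.7,                                 # SAP recommendation, assuming the metric unit is ms
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:sap-network-alerts"],  # placeholder SNS topic
)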

Note that Infrastructure Performance does not incorporate performance metrics for paths through VPC networking resources, such as transit gateways, NAT gateways, VPC endpoints, Elastic Load Balancing, or Amazon EC2 network interfaces. The network latency observed from the SAP application is also not taken into account by Infrastructure Performance, as there would be additional SAP application, operating system, and database overhead on top. For further details, please refer to the Infrastructure Performance AWS documentation.

Architecting SAP for Reliability and Availability

If you have multiple servers (e.g. multiple SAP Application Servers or SAP Web Dispatchers), we recommend spreading these servers across AZs to increase reliability and availability, as this outweighs the impact of the additional network latency. If you are architecting SAP for high availability (e.g. with database replication), select the AZ pairs with the lowest network latency to ensure optimal performance. Note that network latencies can change over time and vary per Region and AZ pair.

If you have batch jobs with extreme performance requirements, we recommend scheduling these batch jobs on the SAP Application Server(s) located in the same AZ as the database server.

If you have cross-region disaster recovery (DR) requirements, consider the network latency between AWS Regions and the corresponding geographical distance to determine the secondary Region for DR.

For further details, refer to the SAP Lens of the AWS Well-Architected Framework documentation.

For RISE with SAP, AWS is the first cloud provider to support both Short-Range DR and Long-Range DR in all AWS Regions. Short-Range DR provides DR within a single Region across 2 AZs, with a Recovery Point Objective (RPO) of zero, i.e. no data loss; a testimony to SAP’s confidence in AWS’ multi-AZ infrastructure, coupled with high-speed, low-latency network links between the geographically separated AZs that support synchronous data replication.

Long-Range DR provides DR across 2 AWS Regions, with an RPO of 30 minutes. Customers have the option of selecting either DR option, or combining both Short-Range and Long-Range DR as part of the SAP RISE construct.

Network performance between users (on-premises) and AWS Cloud

WAN connectivity between on-premises and AWS

Figure 11 : WAN connectivity between on-premises and AWS

The WAN connection between on-premises and AWS Cloud is critical in ensuring customers are able to access the SAP systems. Monitoring is an important part of maintaining the reliability, availability, and performance of your WAN connection.

Amazon CloudWatch provides monitoring for both Site-to-Site VPN connections and AWS Direct Connect. For Site-to-Site VPN, we recommend monitoring at least the state of the VPN tunnels and the inbound and outbound data transferred. Refer to the AWS documentation Monitoring your Site-to-Site VPN connection and Monitoring Direct Connect with CloudWatch for details on the available monitoring metrics.
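
For example, one simple safeguard is a CloudWatch alarm on the Site-to-Site VPN TunnelState metric that notifies you when no tunnel of the connection is up. The boto3 sketch below illustrates the idea; the VPN connection ID and SNS topic ARN are placeholders:

# Sketch: alarm when both tunnels of a Site-to-Site VPN connection are down.
# The VPN connection ID and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="sap-wan-vpn-tunnels-down",
    Namespace="AWS/VPN",
    MetricName="TunnelState",                      # 1 = tunnel up, 0 = tunnel down
    Dimensions=[{"Name": "VpnId", "Value": "vpn-0123456789abcdef0"}],  # placeholder VPN connection ID
    Statistic="Maximum",                           # Maximum < 1 means no tunnel is up
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",                  # treat missing data as an outage
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:sap-network-alerts"],  # placeholder SNS topic
)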

RISE with SAP

If you are running SAP RISE on AWS, you will still require a WAN connection to connect your users to AWS Cloud.

In this case, you can use AWS Transit Gateway to connect your AWS account to the AWS account managed by SAP RISE, as per the diagram below. For further details and connectivity options, refer to AWS documentation: SAP RISE on AWS Connectivity.

Example WAN connectivity for SAP RISE on AWS customers

Figure 12 : Example WAN connectivity for SAP RISE on AWS customers

Reducing Network Latency via AWS Global Accelerator

What if you have geographically dispersed, remote users trying to access the SAP system? For example, users in Europe accessing an SAP system in the AWS Singapore Region via the Internet could experience unpredictable network performance, leading to a poor user experience.

One option is to use AWS Global Accelerator, which improves the availability and performance of applications by directing traffic through the AWS global network backbone. When users connect via AWS Global Accelerator, traffic is automatically routed to the optimal AWS endpoint based on the lowest network latency. You can additionally encrypt the data in transit via Accelerated Site-to-Site VPN, which also uses AWS Global Accelerator, to provide optimal security and network performance for your users.

Figure 13 : Remote users accessing SAP via the AWS Accelerated Site-to-Site VPN
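
To give a sense of what fronting an SAP entry point with Global Accelerator involves, here is a hedged boto3 sketch that creates an accelerator, a TCP listener, and an endpoint group pointing at a Network Load Balancer in front of the SAP system (for example, the SAP Web Dispatcher). The load balancer ARN, ports, and Regions are placeholders; a production setup, including the Accelerated Site-to-Site VPN option shown in Figure 13, will involve additional configuration:

# Hedged sketch: front an SAP entry point with AWS Global Accelerator.
# ARNs, ports and Regions are placeholders.
import boto3

# The Global Accelerator API is served from the us-west-2 Region.
ga = boto3.client("globalaccelerator", region_name="us-west-2")

accelerator = ga.create_accelerator(
    Name="sap-fiori-accelerator",
    IpAddressType="IPV4",
    Enabled=True,
)["Accelerator"]

listener = ga.create_listener(
    AcceleratorArn=accelerator["AcceleratorArn"],
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],  # HTTPS entry point, e.g. SAP Web Dispatcher
)["Listener"]

ga.create_endpoint_group(
    ListenerArn=listener["ListenerArn"],
    EndpointGroupRegion="ap-southeast-1",           # the SAP system's home Region (Singapore in this example)
    EndpointConfigurations=[{
        # Placeholder ARN of a Network Load Balancer in front of the SAP system.
        "EndpointId": "arn:aws:elasticloadbalancing:ap-southeast-1:111122223333:loadbalancer/net/sap-nlb/0123456789abcdef",
        "Weight": 100,
    }],
)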

To determine whether your remote SAP users could benefit from AWS Global Accelerator, you can use the AWS Global Accelerator Speed Comparison tool, which provides network latency measurements to various AWS Regions. Run this latency test from your SAP users’ locations, since AWS Global Accelerator routes network traffic to the closest AWS endpoint, and compare the results against your existing Internet connectivity. Note that results may differ when you run the test multiple times; download times can vary based on factors external to Global Accelerator, such as the quality, capacity, and distance of the connection in the last-mile network that you’re using.

Figure 14 : Network latency comparison with and without AWS Global Accelerator

Improving SAP Fiori performance using Amazon CloudFront

Amazon CloudFront is a Content Delivery Network (CDN) service that can deliver web content, such as SAP Fiori, with low latency and high transfer speeds by caching content across a global network of edge locations. When a user requests SAP Fiori content cached by CloudFront, CloudFront routes the request to the nearest edge location that can serve it. If the content is already cached at that edge location, CloudFront delivers it directly to the user, which reduces latency and improves the user experience.

If you run a global SAP system with SAP Fiori that is used by geographically dispersed users, then you may benefit from using Amazon CloudFront.

For a detailed comparison between AWS Global Accelerator and Amazon CloudFront, refer to this AWS blog.

Conclusion

Network performance tuning and monitoring can be time consuming. With AWS Network Manager – Infrastructure Performance, you can easily monitor the network latency of the AWS global infrastructure: across Regions, and within or across AZs. With this information, you can optimise the placement of your SAP applications to extract maximum network performance on the AWS Cloud, continuously monitor your mission-critical SAP workloads, and troubleshoot when issues arise.

Amazon CloudWatch gives you single-pane-of-glass monitoring for the networking components critical to SAP. Combining AWS Network Manager and Amazon CloudWatch allows you to monitor Inter-AZ and Intra-AZ network latency, as well as the WAN connectivity that links your corporate network to the AWS Cloud.

To improve end user connectivity into the AWS Cloud, you can look at AWS Global Accelerator or Amazon CloudFront, which improve the availability and performance of applications via routing and caching mechanisms respectively.

You can find out more about SAP on AWS, Network Manager – Infrastructure Performance, Amazon CloudWatch, AWS Global Accelerator, and Amazon CloudFront from the AWS product documentation.

Credits

We would like to thank Derek Ewell and Spencer Martenson for their contribution to this blog.