AWS Storage Blog

Replicate and clone SAP HANA databases with Amazon FSx for NetApp ONTAP

Customers have been running SAP HANA databases on AWS since 2012, when we announced SAP HANA One. These customers often need to perform routine maintenance tasks, such as creating backups and cloning production database volumes for upgrade and development testing.

Amazon FSx for NetApp ONTAP, with the optional NetApp SnapCenter Plug-In, offers integrated support for application-consistent snapshot and cloning capabilities for SAP HANA databases. By using the capabilities of FSx for ONTAP file systems, SAP HANA database administrators can automate, accelerate, and simplify backing up, replicating, and cloning their critical production environments. Enterprises use storage-based snapshots and clones today for development testing, disaster recovery, and system upgrades with virtually no impact on the availability or performance of their production SAP HANA databases.

On May 10th, 2022, AWS announced the certification of SAP HANA on AWS with Amazon FSx for NetApp ONTAP. SAP HANA is supported for production workloads running on certain EC2 instances with Amazon FSx for ONTAP Single-AZ file systems.

In this blog post, I describe using FSx for ONTAP Single-AZ file systems to create storage-based snapshots and clones of your critical SAP HANA databases to help enable quicker time-to-market and faster test and development cycles.

Overview

The FSx for ONTAP Single-AZ file system SSD storage tier is certified by SAP for SAP HANA-based workloads. FSx for ONTAP file systems deliver consistent sub-millisecond latency, can scale throughput, IOPS, and SSD tier storage capacity independently, and can take a snapshot of a volume of any size in seconds. Snapshots are efficient, read-only, point-in-time replicas of the storage volumes. Single-AZ FSx for ONTAP file systems are powered by a pair of file servers in a highly available configuration within a single Availability Zone (figure 1).

Figure 1: FSx for NetApp ONTAP Single Availability Zone

It’s important to create and maintain regular backups for mission-critical SAP workloads. While traditional streaming backup methods may be sufficient for many use cases, a benefit of using FSx for ONTAP is that it provides native, built-in, application-consistent snapshot and replication capabilities.

The following diagram (figure 2) shows at a high level how the optional NetApp SnapCenter software coordinates the database snapshot and backup process using a plug-in installed on the SAP HANA database server. The SnapCenter management software is installed on an EC2 instance running a Windows Server OS, and the storage systems and database server snapshots are managed using a simple web interface (figure 3).

Figure 2: FSx for NetApp ONTAP with SnapCenter Replication

Figure 3: SnapCenter Web Interface

Prerequisites

This blog post assumes that the reader understands AWS services and SAP HANA databases, and is familiar with the FSx for ONTAP file system. Deployment of this solution requires access to the SAP HANA database software from SAP as well as the SnapCenter software from NetApp. Customers should contact their AWS sales representative to learn more if previous agreements are not already in place.

  1. Access to FSx for ONTAP must use NFS v4 or later
  2. SAP HANA must be deployed on one of the supported EC2 instances
  3. FSx for ONTAP must be running in the same subnet as your SAP HANA database
  4. Although not required, the NetApp SnapCenter Plug-In for SAP HANA is recommended
  5. Some aspects will require access and use of the FSx for ONTAP Command Line Interface (CLI)

Benefits of Running SAP HANA using FSx for ONTAP

Running SAP HANA on FSx for ONTAP file systems brings several benefits: you can automate system backups using the NetApp SnapCenter SAP HANA Plug-In, scale storage IOPS, throughput, and capacity independently, and get consistent, low-latency performance. The FSx for ONTAP file system also offers the ability to rapidly take application-consistent, storage-based snapshots. There are multiple use cases for storage-based snapshots of large in-memory databases, but we can break them down into three key categories by functionality:

  1. Backup, Offsite Copy
  2. Development, test, system upgrade or migration, disaster recovery testing
  3. Checking for logical corruption

Backups differ from the other use cases in that they do not require write access on the secondary file system used to store the off-site copy. All other scenarios require read/write availability of the replicated or cloned data.

Backups

Backups are an important part of any enterprise data protection policy. Traditionally, with SAP HANA deployed on AWS, the backup plan may include a streaming copy of data from the primary database to an Amazon S3 bucket with AWS Backint Agent for SAP HANA. While this remains an effective strategy, as customer databases grow larger, this solution can be both time-consuming and resource-intensive for the SAP HANA database server.

Conversely, ONTAP snapshots are created directly on the storage service. This relieves pressure on the EC2 instance where the SAP HANA database is running, since it no longer needs to coordinate and stream the backup data. When a backup is executed through NetApp SnapCenter using the plug-in for SAP HANA, it initiates a database-consistent save point. Snapshot copies are then created on the underlying volumes of the primary FSx for ONTAP file system in seconds.

SnapCenter can also coordinate the replication of individual snapshots to a secondary file system, which can be either an FSx for ONTAP file system in AWS (figure 4) or a NetApp storage system running on-premises. Using NetApp SnapVault, the snapshots can be decoupled from the primary file system and stored for a longer retention period on the secondary file system. The secondary file system can be located in AWS in a different Availability Zone or a different Region, or in an on-premises customer data center. The SnapCenter software manages the scheduling of the backup and replication jobs, as well as the retention of the snapshot copies on the primary file system. Additionally, a supplemental weekly file-based backup is recommended in order to execute a block integrity check; this can also be managed by the SnapCenter software.
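
The idea of keeping a short snapshot history on the primary file system and a longer one on the secondary can be sketched in a few lines of Python. This is an illustration only: SnapCenter manages retention itself, and the function name, snapshot names, and retention counts below are hypothetical.

```python
# Illustrative sketch only; SnapCenter applies its own retention policies.
# All names and counts here are hypothetical.

def snapshots_to_delete(snapshot_names, keep):
    """Return the snapshots that fall outside the `keep` newest ones.

    snapshot_names must be ordered from oldest to newest.
    """
    if keep <= 0:
        return list(snapshot_names)
    return list(snapshot_names[:-keep])


if __name__ == "__main__":
    daily = [f"hana_daily_{day:02d}" for day in range(1, 11)]

    # Assumed policy: 3 copies on the primary, 7 on the secondary.
    print("prune on primary:  ", snapshots_to_delete(daily, keep=3))
    print("prune on secondary:", snapshots_to_delete(daily, keep=7))
```

With a policy like this, the primary file system holds only the most recent copies for fast restores, while older copies remain available on the secondary.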

Figure 4: FSx for NetApp ONTAP SnapVault Backup

Dev/test and quality assurance (QA) for migrations and DR

Similar to backup, other activities like development and testing also require a copy of the production database. The difference is that for dev and test, the copies need to be actively available for other systems to interact with. Without FSx for ONTAP storage, this can introduce the following challenges:

  1. Traditional backup/restore operations on large databases can take hours or more, and therefore tests can be delayed while waiting for this process to complete
  2. The backup and restore process used for creating database copies can be resource intensive for both source and target systems
  3. Creating and storing additional production sized copies adds extra storage costs, especially in environments where multiple development and test systems are required

With the earlier backup use case, individual point-in-time snapshots were replicated and stored at a secondary location, with a longer retention policy, for the purpose of being recalled for disaster recovery. Conversely, in the test and development use case, the production volumes are either replicated to a secondary file system with all snapshots intact, or cloned in place. Also, for development and testing, the SnapMirror replication target may be in the same Availability Zone, or the copy can even be created locally on the same FSx for ONTAP file system.

The snapshot process happens in seconds, and multiple snapshots can be scheduled throughout the day. Cloning a snapshot for use on a development system also happens in seconds, and can be refreshed at any time by manually taking a new snapshot and creating a new clone. This enables an SAP admin to create point-in-time copies of production data for development and testing faster than via a streaming backup.

SnapMirror replication copies only the changed blocks since the previous replication event (figure 5), but preserves the full snapshot history of the volume. The processing is handled by the storage file system, and removes the overhead of copying the data off of the database server.
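
As a conceptual illustration of changed-block replication, the following Python sketch models two snapshots as dictionaries of block IDs and sends only the difference to a secondary copy. It is a toy model under simplifying assumptions, not a description of how SnapMirror is implemented; the function names and block contents are hypothetical.

```python
# Conceptual sketch only: a toy model of changed-block replication.
# This is NOT how SnapMirror is implemented; snapshots are plain dicts.

def changed_blocks(previous, current):
    """Return the blocks that are new or modified since `previous`."""
    return {
        block_id: data
        for block_id, data in current.items()
        if previous.get(block_id) != data
    }


def replicate(destination, previous, current):
    """Apply only the delta between two source snapshots to `destination`.

    Returns the number of blocks that had to be transferred.
    """
    delta = changed_blocks(previous, current)
    destination.update(delta)
    # Blocks freed at the source are freed at the destination as well.
    for block_id in previous.keys() - current.keys():
        destination.pop(block_id, None)
    return len(delta)


if __name__ == "__main__":
    snap_1 = {0: "aaaa", 1: "bbbb", 2: "cccc"}
    snap_2 = {0: "aaaa", 1: "BBBB", 2: "cccc", 3: "dddd"}

    secondary = dict(snap_1)  # baseline transfer copies every block once
    sent = replicate(secondary, snap_1, snap_2)
    print(f"blocks transferred in the update: {sent}")
    print(f"secondary matches source: {secondary == snap_2}")
```

The point of the sketch is that after the baseline transfer, the cost of each update scales with the amount of change rather than with the size of the database.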

Multiple clones of the snapshots can be made on the primary or secondary system for access by several different test and development database servers. The clones are efficient, and share identical blocks on the same file system, reducing the required amount of total storage.
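
A rough back-of-the-envelope comparison shows why shared blocks matter when several test systems are needed. The figures below (a 2,048 GiB production volume, four test systems, 5% of blocks changed per clone) are hypothetical, and actual consumption depends on change rates and snapshot retention.

```python
# Rough estimate under stated assumptions; all figures are hypothetical.

def full_copy_storage_gib(production_gib, copies):
    """Extra storage needed when every test system gets a full copy."""
    return production_gib * copies


def clone_storage_gib(production_gib, copies, changed_fraction):
    """Extra storage needed when every test system uses a clone.

    A clone shares unchanged blocks with its parent, so only the
    fraction of blocks it overwrites consumes new space.
    """
    return production_gib * changed_fraction * copies


if __name__ == "__main__":
    production_gib = 2048      # assumed size of the production volume
    test_systems = 4           # assumed number of dev/test systems
    changed_fraction = 0.05    # assumed share of blocks each clone changes

    full = full_copy_storage_gib(production_gib, test_systems)
    cloned = clone_storage_gib(production_gib, test_systems, changed_fraction)
    print(f"full copies: {full:.1f} GiB of additional storage")
    print(f"clones:      {cloned:.1f} GiB of additional storage")
```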

The snapshot, cloning, and SnapMirror replication processes do not interrupt the ongoing data protection policy used for the backup.

Figure 5: FSx for NetApp ONTAP SnapMirror Replication

Checking for logical corruption

In most cases when an SAP HANA database has encountered corruption, the recommendation is to restore from a known-good backup and apply transaction logs, as needed, to recover completely. However, for situations where an environment has encountered corruption in the live production database, a clone of a snapshot from before the time the corruption occurred may be mounted to another database server to perform logical repairs instead (figure 6).

Logical corruption can be caused by several factors, but doesn’t always result in the complete unavailability of the database, and in certain situations it may be desirable to repair the database instead of restoring it. For example:

  1. If the corruption is understood and limited in scope
  2. If the corruption needs to be studied in detail before reverting
  3. If backups are suspect or unavailable

In this scenario, snapshots on the production file system are used in conjunction with a repair server to fix the logical corruption. Snapshots are preferable to recovering from a full backup in this scenario because of the speed with which they can be mounted and examined.

ONTAP snapshots constitute a full point-in-time copy of all the data needed to perform a recovery. However, in order to mount a writable copy of a snapshot, it must first be cloned. The complete process of both cloning the volume and mounting it to the repair system via NFS takes only seconds. This means that multiple tests against various points in time can happen in quick succession, which helps the business and database administrators locate and fix the source of corruption with minimal disruption to the production environment.
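
Choosing which snapshot to clone for the repair system comes down to finding the newest snapshot taken before the corruption occurred. A minimal Python sketch of that selection is shown below; the snapshot names and timestamps are hypothetical, and in practice the list would come from SnapCenter or the ONTAP CLI.

```python
# Minimal sketch; snapshot names and timestamps are hypothetical.
from bisect import bisect_left
from datetime import datetime


def latest_snapshot_before(snapshots, corruption_time):
    """Return the name of the newest snapshot taken before corruption_time.

    snapshots is a list of (created_at, name) tuples sorted by created_at
    in ascending order. Returns None if no snapshot is old enough.
    """
    times = [created_at for created_at, _ in snapshots]
    # Index of the first snapshot taken at or after the corruption time.
    index = bisect_left(times, corruption_time)
    if index == 0:
        return None
    return snapshots[index - 1][1]


if __name__ == "__main__":
    snapshots = [
        (datetime(2022, 5, 10, 0, 0), "hana_0000"),
        (datetime(2022, 5, 10, 6, 0), "hana_0600"),
        (datetime(2022, 5, 10, 12, 0), "hana_1200"),
    ]
    suspected = datetime(2022, 5, 10, 9, 30)
    print(latest_snapshot_before(snapshots, suspected))
```

If the first candidate turns out to contain the corruption already, the same function can be called again with that snapshot's timestamp to step back to the previous point in time.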

Figure 6: FSx for NetApp ONTAP SAP HANA Repair System

Considerations

For optimal performance and data efficiency, the ONTAP volumes should be laid out such that each key component is contained within its own Storage Virtual Machine (SVM). This is done to prevent throughput bottlenecks from the interfaces on the SAP HANA database server (figure 7).

Figure 7: FSx for NetApp ONTAP Storage Layout for SAP HANA

ONTAP snapshots are taken at the volume level and are stored alongside the production data. Each volume keeps its own set of snapshots based on the default policy used at creation. Because these snapshots are not application-aware, they should be disabled in favor of snapshots created by the SnapCenter plug-in for SAP HANA. To perform this task from the FSx for ONTAP file system command line interface:

  1. Disable the default snapshot schedule on all volumes
    FsxId01234567890abc::> vol modify -volume <vol_name> -vserver <svm_name> -snapshot-policy none

Additionally, neither storage efficiency nor storage tiering is supported by SAP HANA, so both need to be disabled. These options may be selected during volume creation, or these commands can be executed from the FSx for ONTAP file system command line interface:

  1. Disable storage efficiency
    FsxId01234567890abc::> volume efficiency off -volume <vol_name> -vserver <svm_name>
  2. Disable storage tiering
    FsxId01234567890abc::> vol modify -volume <vol_name> -vserver <svm_name> -tiering-policy none
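
When several volumes need the same settings, it can be convenient to generate the command lines rather than type them one by one. The following Python sketch only builds the three per-volume commands shown in this section as text, to be pasted into the ONTAP CLI; it does not connect to the file system. The SVM and volume names are hypothetical placeholders.

```python
# Sketch: generate the per-volume ONTAP CLI commands from this section.
# The SVM and volume names used below are hypothetical placeholders.

def hana_volume_commands(svm_name, vol_name):
    """Return the three per-volume commands shown in this section."""
    return [
        # Disable the default snapshot schedule
        f"vol modify -volume {vol_name} -vserver {svm_name} -snapshot-policy none",
        # Disable storage efficiency
        f"volume efficiency off -volume {vol_name} -vserver {svm_name}",
        # Disable storage tiering
        f"vol modify -volume {vol_name} -vserver {svm_name} -tiering-policy none",
    ]


def build_script(layout):
    """Build the command list for a mapping of SVM name to volume names."""
    lines = []
    for svm_name, volumes in layout.items():
        for vol_name in volumes:
            lines.extend(hana_volume_commands(svm_name, vol_name))
    return lines


if __name__ == "__main__":
    # Hypothetical layout; substitute your own SVM and volume names.
    example_layout = {
        "svm_hana_data": ["hana_data"],
        "svm_hana_log": ["hana_log"],
    }
    for command in build_script(example_layout):
        print(command)
```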

Finally, it is recommended to set the TCP max transfer size on the FSx for ONTAP file system to 262,144 bytes (256 KiB) for optimal performance. This is an advanced command only available through the ONTAP command line interface:

  1. Set the TCP max transfer size
    FsxId01234567890abc::> set advanced

    Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
    Do you want to continue? {y|n}: y

    FsxId01234567890abc::*> vserver nfs modify -vserver <svm_name> -tcp-max-xfer-size 262144

    Warning: Setting "-tcp-max-xfer-size" to a value greater than the configured TCP transfer size could affect the performance for existing
             connections. Contact technical support for guidance.
    Do you want to continue? {y|n}: y
    FsxId01234567890abc::*> set admin
    FsxId01234567890abc::>

For additional information regarding deploying SAP HANA on FSx for ONTAP file systems, please review the SAP HANA on AWS documentation, the NetApp TR-4667 – SAP HANA System Copy and Clone technical report, and the SAP HANA backup and recovery with SnapCenter user guide.

Conclusion

In this blog post, I covered the benefits of using FSx for NetApp ONTAP Single-AZ file systems with the SnapCenter Plug-In for SAP HANA. These include protecting and replicating your critical SAP HANA database application, as well as rapidly recovering in case of disaster or logical corruption. You can now deploy and manage both production and replicated copies of your data stored on fast and efficient ONTAP volumes.

For more information please refer to the following user guide pages:

  1. FSx for ONTAP
  2. SAP on AWS

Thank you for reading this post. For feedback or questions about SAP on AWS please contact the SAP on AWS Team or visit aws.com/sap to learn more. For further information about Amazon FSx for NetApp ONTAP, please visit the product page.

Jay Horne

Jay Horne is the global technical leader and service aligned solutions architect for the Amazon FSx for NetApp ONTAP service in the World-Wide Specialist Organization at AWS. Based in Nashville, Tennessee, Jay has over 15 years of enterprise consulting experience working on a variety of cloud, storage, server, and network infrastructures. You can frequently find Jay presenting at storage and cloud conferences all over the world.