How to use snapshots for SAP HANA database to create an automated recovery procedure
Many customers are looking into SAP migrations to the cloud. For each migration, the customer must define an appropriate cloud architecture: the defined service level agreements (SLAs) must be fulfilled, and the implemented procedures should fit the operational processes.
In this blog post, we describe a cloud-native approach that demonstrates the power and capabilities of AWS. There are still good reasons to use HANA System Replication (HSR) or third-party cluster software for productive systems in cloud environments. However, we focus on an alternative approach using cloud-native features such as Amazon EC2 Auto Scaling and Amazon Elastic Block Store (Amazon EBS) snapshots. With these features, we build an infrastructure with native backup/restore functionality, automated processes, and a focus on low cost for non-critical SAP applications.
A fast and automated restore process can provide new capabilities in cloud environments. With On-Demand Instances, instances can be provisioned when needed, which increases the availability of the SAP system. Relying on an automated restore process removes the cost of standby instances, regardless of whether they were implemented with the pilot light approach or as hot standby. Furthermore, no additional license costs for third-party software are required.
Without standby resources for the HANA database, the challenge is to create a robust and highly available architecture across multiple Availability Zones. We should make sure that the restore process can be triggered in any Availability Zone within the Region.
Amazon EBS snapshots are the foundation of the described architecture. Snapshots provide a fast backup process, independent of the database size. They are stored in Amazon Simple Storage Service (Amazon S3) and replicated across Availability Zones automatically, meaning we can create a new volume from a snapshot in another Availability Zone. In addition, Amazon EBS snapshots are incremental by default: only the changes since the last snapshot are stored. To create a resilient, highly available architecture, automation is key. All steps to recover the database must be automated in case something fails.
The following diagram shows the high-level architecture:
Set up Amazon EBS Snapshots and SAP HANA database integration
To get started, we use the SAP HANA on AWS Quick Start with the following modifications:
- The /backup file system is based on an EBS st1 volume by default. We replace this with /backup_efs, an NFS share provided by Amazon Elastic File System (Amazon EFS).
- The HANA configuration needs to be adjusted (parameter: basepath_catalogbackup). The backups are written directly to Amazon EFS, which is replicated across Availability Zones. Backups, especially log backups, are now securely stored and remain available even if an issue affects an Availability Zone.
- In addition, we activate the Amazon EFS Infrequent Access (EFS IA) storage class, which automatically moves files that have not been accessed for a configurable period (7, 14, 30, 60, or 90 days) to lower-cost storage. With the recent Amazon EFS price reduction for Infrequent Access storage, this saves up to 92% compared to EFS Standard. HANA log backup data is a perfect use case for EFS IA because the data is only accessed during a recovery.
Log backups are written automatically to /backup_efs. By default, SAP HANA triggers a log backup every 15 minutes. We can decrease this value to reduce the recovery point objective (RPO) even further.
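The backup destination and log backup interval changes described above could be applied with hdbsql along these lines. This is a sketch only: the hdbuserstore key BACKUPADMIN, the /backup_efs subdirectories, and the 300-second interval are assumptions of this example.

```shell
# Sketch: redirect log/catalog backups to EFS and shorten the log backup
# interval. Key name, paths, and interval are assumptions; adjust to your setup.
SQL_BACKUP_CONFIG="ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM')
  SET ('persistence','basepath_logbackup')     = '/backup_efs/log',
      ('persistence','basepath_catalogbackup') = '/backup_efs/catalog',
      ('persistence','log_backup_timeout_s')   = '300'
  WITH RECONFIGURE;"

apply_backup_config() {
  # credentials come from the HANA secure user store (hdbuserstore),
  # so no password appears on the command line
  hdbsql -U BACKUPADMIN -x "$SQL_BACKUP_CONFIG"
}
```

Lowering log_backup_timeout_s below the 900-second default trades slightly more backup traffic for a smaller RPO.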
Now that the log files are available and securely stored across multiple Availability Zones, we can configure the full database backup with snapshots.
HANA snapshot script
The script “aws-sap-hana-snapshot.sh” implements the backup workflow. The following code examples explain its most important commands:
- We must make the database aware of the storage snapshot, so an entry in the HANA backup catalog is required. The snapshot script automatically adds this entry before the EBS snapshot is executed with the following SQL command:
To log on to the HANA database, we store the password of the system user in the HANA hdbuserstore.
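A minimal sketch of this step, assuming an hdbuserstore key named SYSTEM_KEY and the HANA 2.0 full-system snapshot syntax (a tenant-level BACKUP DATA CREATE SNAPSHOT works similarly):

```shell
# Sketch: open a storage snapshot entry in the HANA backup catalog before
# the EBS snapshot runs. SYSTEM_KEY is an assumed hdbuserstore key.
SQL_PREPARE_SNAPSHOT="BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT COMMENT 'EBS snapshot';"

prepare_snapshot() {
  # executed against SYSTEMDB; the password is read from the secure user store
  hdbsql -U SYSTEM_KEY -d SYSTEMDB -x "$SQL_PREPARE_SNAPSHOT"
}
```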
- Now that the database is aware of the storage snapshot, a roll-forward of the database is possible after the restore. To snapshot the EBS volumes, we use the multi-volume snapshot feature, which creates point-in-time, crash-consistent snapshots across multiple EBS volumes. The advantage of this feature is that no manual input/output (I/O) freeze at the volume level is required to keep multiple volumes in sync. Before this feature was available, the I/O freeze for data and log volumes had to be implemented with dmsetup suspend <lvm-group-name>. The following code snippet executes the snapshot of all volumes except the root volume:
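A sketch of the multi-volume snapshot call with the AWS CLI. The tag key/value and the description format are assumptions of this example; ExcludeBootVolume=true skips the root volume:

```shell
# Helper used to build a recognizable snapshot description (format assumed)
snapshot_description() {
  echo "SAP HANA snapshot $(date -u +%Y-%m-%dT%H:%MZ)"
}

# Sketch: one crash-consistent snapshot set across all data volumes
# attached to the given instance, excluding the root volume.
create_multi_volume_snapshot() {
  local instance_id="$1"
  aws ec2 create-snapshots \
    --instance-specification "InstanceId=${instance_id},ExcludeBootVolume=true" \
    --description "$(snapshot_description)" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Purpose,Value=hana-backup}]'
}
```

The Purpose tag lets the restore script find these snapshots later.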
- After the snapshot execution, we confirm the backup in the SAP HANA backup catalog. If the snapshot was not successful according to the SAP HANA backup catalog, the snapshots on AWS are deleted.
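This confirm/abandon step could look like the following sketch. The backup ID comes from the catalog entry created earlier (for example, from M_BACKUP_CATALOG); SYSTEM_KEY and the failure message are assumptions:

```shell
# Sketch: mark the catalog entry as successful, recording the EBS snapshot
# IDs as the external backup ID.
close_snapshot() {
  local backup_id="$1" ebs_snapshot_ids="$2"
  hdbsql -U SYSTEM_KEY -d SYSTEMDB -x \
    "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID ${backup_id} SUCCESSFUL '${ebs_snapshot_ids}';"
}

# Sketch: mark the catalog entry as failed; the script then removes the
# EBS snapshots, e.g. with: aws ec2 delete-snapshot --snapshot-id <id>
abandon_snapshot() {
  local backup_id="$1"
  hdbsql -U SYSTEM_KEY -d SYSTEMDB -x \
    "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID ${backup_id} UNSUCCESSFUL 'EBS snapshot failed';"
}
```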
Now we have a backup process in place, with full backups via EBS snapshots and log backups written to Amazon EFS. Both storage locations are independent of the Availability Zone and can be accessed from another Availability Zone. Because of that, we can later re-create the entire database with the AWS Auto Scaling group in another Availability Zone.
Set up AWS Auto Scaling
We will now set up an AWS Auto Scaling group with a minimum and maximum capacity of one instance. If the Amazon Elastic Compute Cloud (Amazon EC2) instance has an issue, such as a hardware failure, the AWS Auto Scaling group automatically creates a new instance based on an Amazon Machine Image (AMI). By selecting multiple Availability Zones, the desired capacity is distributed across these Availability Zones.
The following section describes the automated restore process:
- Create an AMI, which is used by the AWS Auto Scaling group later on. We need to exclude all volumes that belong to /hana/data and /hana/log from the image; these volumes are re-created from the snapshots automatically and must not be included in the AMI.
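Excluding the data and log volumes from the image could be done at creation time, as in this sketch. The device names /dev/sdf and /dev/sdg are assumptions and must match your actual volume layout:

```shell
# Sketch: build the AMI without the /hana/data and /hana/log volumes by
# marking their device names as NoDevice in the block device mapping.
create_hana_ami() {
  local instance_id="$1"
  aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "hana-restore-$(date +%Y%m%d)" \
    --no-reboot \
    --block-device-mappings '[{"DeviceName":"/dev/sdf","NoDevice":""},
                              {"DeviceName":"/dev/sdg","NoDevice":""}]'
}
```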
- Create a launch configuration.
- Within the launch configuration, we select the recently created AMI as a basis and the required instance size.
- To start the restore procedure after a system crash, the userdata-hana-restore.sh script is stored in the Amazon EC2 user data and executed during the initial launch of the instance. New volumes are created from the most recent snapshot and attached to the instance, and the database logs are rolled forward to the most recent state. The script can be added under Advanced Details in the user data section.
- Once we create the launch configuration, we can set up the AWS Auto Scaling group.
- The group size is one instance. We select all available subnets where the new instance can be deployed.
- We keep the group size at its initial size.
HANA restore script
Let’s take a closer look at the userdata-hana-restore.sh script and each step of the restore process.
As a prerequisite, two parameters in the AWS Systems Manager Parameter Store are required. They hold the volume IDs of the SAP HANA data and log volumes.
- To create the parameters with the AWS Command Line Interface (AWS CLI), we use the following commands. This step is required only once during setup; the values are updated automatically by the script later on.
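The one-time parameter creation could look like this sketch; the parameter names and the example volume IDs are assumptions:

```shell
# Sketch: create the Parameter Store entries that track the current HANA
# data and log volume IDs. Names and values are placeholders.
create_volume_parameters() {
  aws ssm put-parameter --name "/sap/hana/data-volume-ids" \
    --type StringList --value "vol-0123456789abcdef0" --overwrite
  aws ssm put-parameter --name "/sap/hana/log-volume-ids" \
    --type StringList --value "vol-0123456789abcdef1" --overwrite
}
```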
- To create new volumes from the latest snapshot, the script looks up the latest snapshot created by the snapshot script and retrieves its snapshot ID. The restore process should use the latest snapshot to reduce the number of log files to recover. The AMI was created without the SAP HANA data and log volumes, so these volumes must be created from the EBS snapshots.
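The lookup could be sketched as follows, assuming the snapshots carry the Purpose=hana-backup tag used by the snapshot script example:

```shell
# Sketch: return the ID of the newest completed snapshot tagged by the
# snapshot script. The tag filter is an assumption of this example.
latest_snapshot_id() {
  aws ec2 describe-snapshots \
    --owner-ids self \
    --filters "Name=tag:Purpose,Values=hana-backup" "Name=status,Values=completed" \
    --query 'sort_by(Snapshots,&StartTime)[-1].SnapshotId' \
    --output text
}
```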
- Create a new volume from the snapshot. We recommend waiting until the volumes are available before attaching them to the instance.
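A sketch of the create-and-wait step; the gp3 volume type is an assumption, and the Availability Zone must be the one the new instance runs in:

```shell
# Sketch: create a volume from the snapshot in the given Availability Zone
# and block until it is usable, then print its ID.
restore_volume() {
  local snapshot_id="$1" az="$2"
  local volume_id
  volume_id="$(aws ec2 create-volume \
      --snapshot-id "$snapshot_id" --availability-zone "$az" \
      --volume-type gp3 \
      --query 'VolumeId' --output text)"
  # wait until the volume state is 'available' before attaching it
  aws ec2 wait volume-available --volume-ids "$volume_id"
  echo "$volume_id"
}
```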
- Attach the volumes to the instance:
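A sketch of the attach step; the device name is an assumption and must match the file system or LVM layout baked into the AMI:

```shell
# Sketch: attach a restored volume and wait until it is in use.
attach_hana_volume() {
  local volume_id="$1" instance_id="$2" device="$3"   # e.g. /dev/sdf
  aws ec2 attach-volume \
    --volume-id "$volume_id" --instance-id "$instance_id" --device "$device"
  aws ec2 wait volume-in-use --volume-ids "$volume_id"
}
```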
If we started the database now, it would be in a crash-consistent state based on the time the snapshot was taken. To recover the database to the most recent state, we use the log files stored on Amazon EFS. The AMI automatically mounts the EFS file system during startup.
- To recover SAP HANA to the most recent state, we must trigger a point-in-time recovery based on the snapshot, with a recovery timestamp in the future so that all available log backups are applied:
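The recovery statement could be sketched as follows. The tenant name HDB, the hdbuserstore key, and the far-future timestamp are assumptions; the future timestamp makes HANA roll forward every available log backup:

```shell
# Sketch: point-in-time recovery of the tenant from the storage snapshot,
# replaying all log backups found under the configured backup paths.
SQL_RECOVER="RECOVER DATABASE FOR HDB UNTIL TIMESTAMP '2099-12-31 23:59:59' USING SNAPSHOT;"

recover_tenant() {
  # recovery of a tenant is issued from SYSTEMDB
  hdbsql -U SYSTEM_KEY -d SYSTEMDB -x "$SQL_RECOVER"
}
```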
It is important to consider the prerequisites for the snapshot script, especially disabling the automatic start of the HANA tenant database after the instance starts. If the tenant is online, it is no longer possible to recover log files. With the no-restart option, we can prevent the automatic tenant start.
SQL command to set no-restart mode:
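A sketch of that command, again assuming a tenant named HDB and an hdbuserstore key SYSTEM_KEY:

```shell
# Sketch: prevent the HDB tenant from starting automatically after a
# restart, so that log recovery remains possible.
SQL_NO_RESTART="ALTER DATABASE HDB NO RESTART;"

set_no_restart() {
  hdbsql -U SYSTEM_KEY -d SYSTEMDB -x "$SQL_NO_RESTART"
}
```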
The restore time depends on two main aspects: 1) the time to create new volumes from the snapshot, and 2) the number of log files to recover. The volume creation time depends on the volume size and type; with Amazon EBS fast snapshot restore, it is possible to reduce the time to initialize newly created volumes. The database recovery time depends on the number of log files and the change rate of the database since the snapshot was created.
Additional things to consider
The IP address of the newly created instance changes when the AWS Auto Scaling group launches a new instance, and the SAP application server must be made aware of this. One option is to update the DNS entry in Amazon Route 53 with the new IP. In addition, the AWS Auto Scaling group can launch the new instance in another Availability Zone; in that case, the application server might remain in a different Availability Zone, creating cross-Availability-Zone traffic with slightly higher latency.
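The DNS update could be sketched as follows; the hosted zone ID and the record name are placeholders:

```shell
# Sketch: UPSERT the A record for the HANA host to the new instance IP.
# Zone ID and record name are assumptions of this example.
update_dns() {
  local new_ip="$1"
  aws route53 change-resource-record-sets \
    --hosted-zone-id Z0000000000000 \
    --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"hana.example.internal\",\"Type\":\"A\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"${new_ip}\"}]}}]}"
}
```

A short TTL (60 seconds here) keeps the application servers from caching the old IP for long.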
It is possible to build a highly available architecture for SAP HANA with automatic recovery to the most recent state across different Availability Zones, at low cost. Amazon EC2 Spot Instances can reduce the costs even further: if a Spot Instance is reclaimed, the AWS Auto Scaling group automatically recovers the database, which can be acceptable for non-critical systems. The automatic restore process does not require any active standby resources. For even higher availability or for productive workloads, there are tradeoffs that can be addressed with HANA System Replication.