Automating the update process of a clustered SAP HANA DB using nZDT and Ansible
SAP HANA is the de facto database for new SAP deployments and will become the only choice in the near future, as SAP ends general support for all non-HANA based systems by 2027. Patching databases in a consistent and automated manner is key to reducing TCO, especially for customers who operate a large number of HANA instances. HANA System Replication (HSR) is often enabled between HANA nodes; combined with a Pacemaker cluster, this gives customers a highly available architecture for their databases. When the database is clustered in this fashion, you can benefit from the nZDT (near-Zero Downtime) method of patching the HANA software. In this blog post, we explain how to carry out nZDT patching of clustered HANA nodes and demonstrate a sample Ansible playbook that automates the entire process on Red Hat Enterprise Linux (RHEL) based systems.
The database remains available on at least one node throughout almost the entire patching process. The patching procedure for non-clustered systems is fairly well documented on the SAP help site, but only limited information is available on how to perform nZDT patching when the HANA nodes are clustered.
- A working HANA HSR cluster pair (an easy and fully automated way to deploy a working SAP HANA cluster is AWS Launch Wizard, which can complete a HANA installation in a few hours. For a full explanation, see the Launch Wizard User Guide)
- Required OS patches (if any) have been applied prior to HANA patching. You can get more information from the following notes.
You need "Red Hat Enterprise Linux for SAP Solutions" (for BYOS) or "Red Hat Enterprise Linux for SAP with High Availability and Update Services" (from AWS Marketplace). For supported operating systems, refer to OSS Note 1631106. You may also consult the Red Hat Enterprise Linux for SAP Solutions subscription knowledgebase article.
- You will need the root password and it should be the same on both nodes – contact your organization’s Linux admins if you need help with this process.
- You will need the SYSTEM account password for the SYSTEMDB and TENANT – contact your organization’s DB admins if you need help with this process.
- You have a working Ansible infrastructure available and configured to run playbooks on HANA nodes.
- You have the desired HANA patch software package on an S3 bucket or staged on the file system (the automation can source it from both)
Download SAP HANA patch software from SAP Marketplace Software Download Center. (SAP Marketplace account required to access download area)
- Amazon Elastic Compute Cloud (EC2) has appropriate Identity and Access Management (IAM) roles to access Amazon Simple Storage Service (S3) bucket in case the patch file is sourced from a bucket.
The general process to patch clustered HANA nodes with the nZDT method is listed below. For this example, we assume node 1 is currently the primary node and node 2 is the secondary node.
Caution: the sequence of steps is important.
- Put cluster node 2 in standby mode
Enabling standby mode means the node can no longer host resources. The cluster attempts to move any resources currently active on the node to another node, constraints permitting. In this case, the HANA instance on node 2 holds the Secondary role and the only other cluster node, node 1, already hosts the Primary instance, so no resources move to node 1.
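On RHEL with the pcs command-line tool, this is a single command. As a sketch (the node name hananode2 is a placeholder for your actual secondary node):

```shell
# Put the secondary cluster node into standby (node name is a placeholder).
# On older pcs versions the subcommand is "pcs cluster standby" instead.
pcs node standby hananode2
```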
- Put the cluster into maintenance mode
Putting the entire cluster into maintenance mode ensures that the cluster does not manage any cluster resources. This is essential because the HANA service may be stopped intermittently during patching, and the cluster would otherwise interfere.
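With pcs, maintenance mode is a cluster-wide property. A minimal sketch:

```shell
# Put the whole cluster into maintenance mode so it stops managing resources
pcs property set maintenance-mode=true
# "pcs status" should now show the resources as unmanaged
pcs status
```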
- Update the HANA software on node 2
Patching HANA on the Secondary node is the core step that updates the database software version. Throughout the patching process, the secondary node is unavailable for certain periods of time, but the primary node operates as usual and continues to serve the SAP application and its users.
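A typical manual update on the secondary node looks roughly like the following. This is a sketch only: the SAR file name, paths, and SID are illustrative, and a real hdblcm run prompts for (or reads) the required passwords, which the playbook supplies automatically.

```shell
# Extract the downloaded patch archive with SAPCAR (file name is illustrative)
cd /tmp/hanapatch
./SAPCAR -xvf IMDB_SERVER20_064_0-80002031.SAR
# Run the update with hdblcm in batch (unattended) mode
cd SAP_HANA_DATABASE
./hdblcm --action=update --sid=HDB --batch
```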
- Take node 2 out of standby mode
In this step, cluster node 2 is made available to the cluster again and can accept resources.
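Sketched with pcs (again, hananode2 is a placeholder node name):

```shell
# Bring the secondary node back from standby.
# On older pcs versions the subcommand is "pcs cluster unstandby".
pcs node unstandby hananode2
```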
- Turn off maintenance mode for the cluster
When maintenance mode is disabled, the cluster automatically re-establishes HSR between the Primary and Secondary nodes. Note that SAP supports running the Secondary node on a higher patch level than the Primary node. The Secondary node then syncs up with the Primary node.
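Turning maintenance mode off mirrors the earlier step:

```shell
# Disable maintenance mode; the cluster resumes managing resources
# and re-establishes HANA System Replication
pcs property set maintenance-mode=false
```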
- Validate that replication has resumed and that it is healthy
At this phase, we have to wait until the Secondary node is fully in sync with the Primary node again and is ready to take over (status SOK).
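One way to wait for this, assuming the SAPHanaSR resource agent package is installed on the cluster nodes, is to poll the cluster attributes until the sync state reports SOK:

```shell
# Poll the SAPHanaSR attributes until the secondary reports sync state SOK
until SAPHanaSR-showAttr | grep -q 'SOK'; do
    echo "waiting for replication to reach SOK ..."
    sleep 30
done
```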
- Put node 1 in standby mode
In standby mode, the Primary cluster node can no longer host any resources, which triggers the takeover of the Primary HANA role from node 1 to node 2. The cluster demotes the HANA instance on node 1 and promotes the instance on node 2. It also moves the overlay IP to node 2 and modifies the route table. This ensures that after the takeover, the SAP system and users can continue to connect to the HANA database.
- Wait for takeover to complete
The takeover completes within a short time. Before patching node 1, we have to make sure the HANA instance on node 2 has fully assumed the Primary role.
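These two steps can be sketched as follows; hananode1 is a placeholder node name, and hdbadm is the &lt;sid&gt;adm user for the example SID HDB:

```shell
# Put the current primary into standby; this triggers the takeover to node 2
pcs node standby hananode1
# On node 2, confirm the takeover as the <sid>adm user:
# the output should report "mode: primary"
su - hdbadm -c "hdbnsutil -sr_state"
```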
- Put the cluster into maintenance mode
To prevent the cluster from interfering with the patching process, the entire cluster again needs to be put into maintenance mode.
- Patch the HANA software on node 1
At this stage, the HANA DB is running on node 2 as Primary and accepts connections from the SAP system as well as from users. The HANA instance on node 1 can now be patched.
- Take node 1 out of standby
Once patching is completed on node 1, the cluster node can be enabled again by taking it out of standby.
- Turn off maintenance mode for the cluster
When the cluster comes out of maintenance mode, it ensures the HANA instance on cluster node 1 is started. Since the HANA instance on cluster node 2 currently holds the Primary role, the instance on node 1 is started in the Secondary role, and replication from node 2 to node 1 begins.
- Clear the cluster resources
During maintenance activities, errors or alerts may accumulate in the cluster framework. These have to be cleaned up to give the cluster a fresh start.
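With pcs, the cleanup is a single command:

```shell
# Clear failed actions and alerts collected during the maintenance window
pcs resource cleanup
```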
To summarize, during the patching process the database remains accessible on at least one node at any given time, with the exception of the brief outage when the takeover occurs. Note that the primary and secondary roles will have switched places by the end. This is normal and does not affect the operation of the database. You can switch back to the original topology at a convenient time.
The sample Ansible playbook that automates the entire process is located in this public GitHub repo.
Preparing to run the Ansible playbook
- Download the target HANA patch SAR file from SAP Marketplace and place it in an S3 bucket or somewhere on the file system of the servers. Make sure the bucket or directory contains no files other than the single SAR patch file, for example only the SAR file for the HANA SPS05 rev. 64 patch.
- Clone the repo to the Ansible controller server
One way to clone a repo is to use the git clone command – see the reference section for git commands. git needs to be installed first – see the instructions on how to install it on Linux.
- Change directory to the cloned repo and create an inventory file containing a group named "SAP_<SID>_hana_ha", with the two HANA node IPs in that group. For example, if the HANA SID is "HDB", node 1's IP is 10.20.30.40, and node 2's IP is 10.20.30.50, the content of the inventory file should look something similar:
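For instance, the inventory for this example could be created as follows (INI-style inventory; the file name "myinventory" is just an example):

```shell
# Create an INI-style Ansible inventory with the required host group
cat > myinventory <<'EOF'
[SAP_HDB_hana_ha]
10.20.30.40
10.20.30.50
EOF
```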
- The automation needs various credentials to be able to run the HANA patching tool, hdblcm. As a security best practice, passwords should not be stored in plain text in variable files or anywhere else. Ansible provides the ansible-vault tool to encrypt sensitive information. The HANA patching Ansible playbook expects a vault file, called "passvault.yml", that contains the following credentials…
root password – variable name: ROOTPWD
<sid>adm password – variable name: SIDADMPWD
SYSTEM @ tenant password – variable name: SYSTEMTNTPWD
SYSTEM @ SYSTEMDB password – variable name: SYSTEMDBPWD
Adding these credentials to the "passvault.yml" file stores each variable name together with its encrypted value.
For example, to add the encrypted root password to the passvault.yml file run:
ansible-vault encrypt_string 'theactualpassword' --name 'ROOTPWD' | tee -a passvault.yml
As another example, to add the encrypted password of SYSTEM user in SYSTEM DB to the passvault.yml file run:
ansible-vault encrypt_string 'somepassword' --name 'SYSTEMDBPWD' | tee -a passvault.yml
Use the same vault password when encrypting all of the password variables. Once all the encrypted passwords are added, make sure each one starts on a new line. After adding all required passwords, the file should look something like this:
[root@ip-***-***-***-*** sap-hana-update-cluster-nzdt]# cat passvault.yml
SYSTEMDBPWD: !vault |
SYSTEMTNTPWD: !vault |
ROOTPWD: !vault |
SIDADMPWD: !vault |
Once the file is set up, carry on with the next steps.
Running the playbook
Switch directory to the cloned repo root and issue the following command:
ansible-playbook -i <inventoryfile> --ask-vault-pass -e "SID=<SID>" -e "MEDIASRC=<s3/fs>" -e "MEDIALOC=<locationofSARfile>" ./patch_sap_hana.yml
For example, if the inventory file is "myinventory", the HANA DB SID is "HDB", and the media source is the S3 bucket "s3://hanapatch/", use the following syntax:
ansible-playbook -i myinventory --ask-vault-pass -e "SID=HDB" -e "MEDIASRC=s3" -e "MEDIALOC=s3://hanapatch/" ./patch_sap_hana.yml
As another example, if the inventory file is "myinventory", the HANA DB SID is "HDB", and the media source is the file system with the SAR file in /tmp/hanapatch/, use the following syntax:
ansible-playbook -i myinventory --ask-vault-pass -e "SID=HDB" -e "MEDIASRC=fs" -e "MEDIALOC=/tmp/hanapatch/" ./patch_sap_hana.yml
The automation patches the nodes in the sequence discussed earlier. Please note that by the end of the patching process, the roles of the nodes will have swapped. The original roles can be restored at any later time. To verify that patching worked, you can find the new patched version of each HANA node at the tail end of the Ansible logs.
The password template file is cleaned up automatically after a successful run of the playbook.
The patch software remains available after the playbook has run, in case it is needed again. If it is no longer required, we recommend archiving or simply deleting it.
Besides the cost of the two HANA nodes, the automation may need a small Ansible control node, running on an Amazon EC2 instance.
The HANA patch software needs to be stored in an S3 bucket. A typical patch file is about 3–4 GB, which equates to only a few dollars per year ($0.023 per GB-month).
To learn more about SAP HANA HSR concepts follow the SAP official help documentation.
Find more answers to common questions to HANA HSR in SAP Note 1999880 – FAQ: SAP HANA System Replication.
To learn more about SAP pacemaker clusters for HANA on RHEL read the official SAP HANA on AWS guide.
Take advantage of the AWS Free Tier services to help with learning about using Ansible with SAP on AWS, at a minimal cost. You can set up an EC2 instance using AWS Free Tier, with Amazon Linux 2. This instance can be used to run an Ansible control node.
To learn more about Ansible modules and coding techniques, read the official Ansible documentation.
In this post, we discussed what the nZDT patching process looks like for clustered HANA nodes, and provided a sample Ansible playbook demonstrating how to automate the process.
Please note that AWS also supports automating HANA patching with AWS Systems Manager documents (SSM documents); however, clustered nodes are not yet supported there.
For SLES-based systems, the concept is the same. Replace the pcs commands with their respective crm equivalents in the code, or use the YaST module to aid the process.
Call to Action
Get started and try to deploy a new HANA cluster using AWS Launch Wizard for SAP. Make sure to familiarize yourself with the online documentation first.
Install Ansible on a Free Tier Amazon EC2 instance and clone the sample code from our public repo.
Verify the roles of the HANA cluster nodes, and check the HANA DB version before patching.
Prepare the inventory and passvault.yml files, and launch the patching playbook.
Verify again the roles of the HANA cluster nodes, and check the HANA DB version after patching. Which node is the primary now?
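The verification before and after patching can be sketched with the following commands (hdbadm is the &lt;sid&gt;adm user for the example SID HDB; adjust for your SID):

```shell
# Show cluster status, including which node holds the promoted (Primary) HANA resource
pcs status
# Show the installed HANA version as the <sid>adm user
su - hdbadm -c "HDB version"
```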
Make sure to remove the resources after testing to avoid unnecessary costs.
Updating SAP HANA Systems with SAP HANA System Replication official SAP pages.
Use SAP HANA System Replication for Near Zero Downtime Upgrades.
SLES nZDT patching of HANA cluster with YAST module.
Git Clone command reference.