AWS for SAP

SAP HANA sizing considerations for secondary instance with reduced memory footprint

Introduction

SAP systems that are critical to business continuity require a well-designed and tested High Availability and Disaster Recovery (HA/DR) framework to maximise application availability during planned and unplanned outages. SAP HANA is an in-memory database that supports mission-critical workloads and is a key component of an SAP system that needs to be safeguarded from failures. A primary, or active, HANA database can be replicated to a secondary instance using SAP HANA System Replication (HSR). SAP HSR continuously replicates data to ensure that, in the event of a failure on the primary instance, changes persist in an alternate instance. In Amazon Web Services (AWS), the secondary HANA database instance can exist either within the same region (in a different Availability Zone) or in a separate region.

To achieve a near zero Recovery Time Objective (RTO), it is necessary for the primary and secondary HANA database instances to have a similar memory capacity. However, if a higher RTO is acceptable, it is possible to operate the secondary HANA database instance with a reduced memory footprint. This can result in considerable cost savings for compute, either by choosing a smaller instance size or leveraging excess memory on the secondary to operate other SAP HANA database workloads.

The reduced memory requirement on the secondary HANA node is achieved by configuring the secondary to not load HANA column data into main memory. A restart is required before promoting it to primary, at which time it is assumed that the full memory requirement is available. The actual memory demand on the secondary host is highly dependent on the HANA System Replication (HSR) configuration and the production data change rate. Even with HANA column table loading disabled, the memory demand on the secondary host can exceed 60% of the actual production HANA memory usage. If the secondary instance is not sized correctly, this can lead to out-of-memory crashes and reduced resilience.

This blog provides detailed guidance on how to size a secondary HANA database instance operating with a reduced memory footprint, with examples for multiple deployment scenarios.

Architecture

There are two common architectures for deploying a secondary HANA database with a lean, or reduced, memory footprint in AWS. In this blog we differentiate between them with the terms ‘smaller secondary’ and ‘shared secondary’. Smaller secondary is where the infrastructure is sized smaller than the primary and then re-sized on takeover. This is sometimes referred to as Pilot Light DR, as in the blog Rapidly recover mission-critical systems in a disaster. Shared secondary is where the unused memory is utilised by a non-production or sacrificial instance. SLES documentation refers to this as the ‘Cost Optimized Scenario’.

In both these scenarios, preload of column tables is disabled on the secondary HANA database. The specific configuration to implement this change is to set the HANA database parameter preload_column_tables to false. This configuration needs to be changed back to ‘true’ before promoting this instance to primary. Given the manual intervention required, and the time taken following a takeover to load the column tables before the HANA database is open for SQL connections, these scenarios are more relevant for a Disaster Recovery (DR) solution than a High Availability (HA) solution.

Smaller secondary

The following diagram illustrates the deployment of a smaller secondary in a separate AWS Availability Zone within the same region. However, such a deployment is also possible across multiple AWS regions. When replicating between AWS regions, the recommended HSR replication mode is async due to the increased latency.

HSR Smaller secondary

Shared secondary

A common use case for the shared secondary scenario is to operate an active quality assurance (QAS) instance alongside the secondary HANA database instance on the same host, also called MCOS (Multiple Components One System). This setup requires additional storage to operate the additional instance(s). During a takeover, the instance with lower priority (QAS) can be shut down to make the underlying host resources available for production workloads.

HSR Shared secondary

For both scenarios, the secondary HANA database instance configuration is set to preload_column_tables = false. The default value of this parameter is ‘true’ and has to be explicitly changed on the secondary instance to operate with reduced memory. This change is made in the HANA database configuration file (global.ini) located at /hana/shared/<SID>/global/hdb/custom/config.
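For illustration, the parameter lives in the [system_replication] section of global.ini. Because a registered secondary does not accept SQL connections, the value is typically set there by editing the file directly; on an instance that is open for SQL (for example, when re-enabling preload on the promoted instance after a takeover), an equivalent statement would look like the following sketch. Adjust the layer (‘SYSTEM’ in this example) and the value to your landscape.

-- Set the column table preload behaviour ([system_replication] section of global.ini)
-- 'false' disables preload on the lean secondary; 'true' restores the default before or after promotion
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('system_replication', 'preload_column_tables') = 'true'
  WITH RECONFIGURE;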

HANA SR Operation modes

The two operation modes supported by HANA System Replication (HSR) to set up a secondary system are logreplay and delta_datashipping. A high-level comparison between the logreplay and delta_datashipping operation modes is presented below.

1. Operational behaviour
   • logreplay: Logs are shipped and continuously applied on the secondary system.
   • delta_datashipping: Delta data is shipped from the primary to the secondary periodically (default every 10 minutes). Logs are only shipped, not applied, until a takeover.
2. Recovery time
   • logreplay: Shorter takeover window, as logs are applied continuously until the failure occurs.
   • delta_datashipping: Relatively longer takeover time.
3. Data transfer needs
   • logreplay: Only logs are shipped, hence comparatively lower data transfer needs.
   • delta_datashipping: Transfer of both delta data and logs adds up to higher data transfers.
4. Network bandwidth
   • logreplay: Due to reduced data transfers, needs relatively lower bandwidth between sites.
   • delta_datashipping: Requires higher bandwidth between sites.
5. Memory footprint on DR instance
   • logreplay: Significantly higher memory required, as the column store needs to be loaded for delta merge speed.
   • delta_datashipping: Memory footprint is much smaller, as the column store is not loaded into main memory when column preload is disabled.
6. Multi-tier replication
   • logreplay: Supported.
   • delta_datashipping: Supported.
7. Multi-target replication
   • logreplay: Supported.
   • delta_datashipping: Not supported.
8. Secondary time travel
   • logreplay: Supported.
   • delta_datashipping: Not supported.

It is worth noting that in a 3-tier replication setup, a mix of operation modes is not supported. For example, if the primary and secondary HANA database instances are configured to use logreplay for HSR, then the delta_datashipping operation mode cannot be used between the secondary and tertiary systems, and vice versa.

SAP Guidance on disabling column preload

The actual memory usage on the secondary host depends on the HANA System Replication operation mode and the preload setting for column tables. Hence, it is important to understand the sizing requirements for the different operation modes in an HSR setup. The chart below is an excerpt from SAP Note 1999880 – FAQ: SAP HANA System Replication (SAP Support Portal login required).

SAP Note 1999880 - FAQ: SAP HANA System Replication

As per the above guidance, even with preload = off, the minimum memory requirement for the ‘logreplay’ operation mode also factors in the column store memory size of tables with modifications. As a rule of thumb, consider the size of column store tables with data modified in the 30 days prior to the date of evaluation.

SAP Note 1969700 – SQL Statement Collection for SAP HANA provides a collection of SQL scripts that can be used to analyse various administration and performance aspects of the SAP HANA database. With minor changes to one of the scripts from this collection, titled ‘HANA_Tables_ColumnStore_Columns_LastTouchTime’, it is possible to obtain a rough estimate of the minimum memory required to accommodate these tables. For the exact steps to execute this script, refer to the ‘Additional information’ section at the end of this blog.

In the following sections, the examples provide details on how to evaluate sizing when deployed in a Shared secondary scenario. Although not shown, a similar sizing exercise is applicable to the Smaller secondary scenario, without the additional considerations of running a second HANA database instance (for example QAS) on the same host.

Option 1 – HSR with Operation Mode: logreplay

In the following example, the primary and secondary production HANA database instances are deployed on two 9 TiB High Memory instances. The secondary host also runs a Quality Assurance (QAS) HANA database instance. To deploy these instances, the global_allocation_limit for all three HANA database instances (PRD primary, PRD secondary, QAS) needs to be evaluated. The following sections illustrate how to derive these values.

HSR logreplay

In the subsequent sections, Global Allocation Limit is referred to as ‘GAL’.

  • GAL for Primary instance

On the primary instance, global_allocation_limit is not set in this example (the default value is 0, and the unit is MB). The default value does not limit the amount of physical memory that can be used by the database. However, as per SAP Note 1681092 – Multiple SAP HANA systems (SIDs) on the same underlying server(s), the maximum memory allocation to the HANA database out of the available physical memory follows the rule of 90% of the first 64 GB of available physical memory on the host plus 97% of the remaining physical memory.
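As a quick cross-check, the allocation limit that each HANA service is effectively working with can be read from a standard monitoring view. The following query is a sketch; column availability may vary slightly by HANA revision.

-- Effective allocation limit per HANA service, converted from bytes to GB
SELECT HOST, SERVICE_NAME,
       ROUND(EFFECTIVE_ALLOCATION_LIMIT / 1024 / 1024 / 1024, 0) AS EFFECTIVE_LIMIT_GB
  FROM M_SERVICE_MEMORY;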

  • GAL for Secondary instance

SAP’s guidance for the minimum memory requirement for a secondary HANA database instance with preload=off and the logreplay operation mode is ‘Row store size + Column store memory size of tables with modifications + 50 GB’. As an example, the SQL script ‘HANA_Tables_ColumnStore_Columns_LastTouchTime’ was executed on the primary instance, and the size of column tables with modifications in the previous 30 days is identified as 3077 GB. For detailed steps to execute this script, refer to the ‘Additional information’ section of this blog.

SQL Query output

Similarly, using the SQL script ‘HANA_Memory_Overview’ from SAP Note 1969700 – SQL Statement Collection for SAP HANA, the memory requirement for the ‘Row store’ can be determined. An example value of 153 GB is considered for the following evaluation.
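Before applying the formula below, a quick cross-check of the row store size can also be read from the M_RS_TABLES monitoring view. This is a simplified sketch; the HANA_Memory_Overview script remains the more complete method.

-- Approximate row store size in GB (allocated fixed + variable parts of all row store tables)
SELECT ROUND(SUM(ALLOCATED_FIXED_PART_SIZE + ALLOCATED_VARIABLE_PART_SIZE) / 1024 / 1024 / 1024, 0) AS ROW_STORE_GB
  FROM M_RS_TABLES;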

Memory required (min) = Row store size 
                        + Column store memory size of tables with modifications 
                        + 50 GB
Ex: (153 + 3077 + 50) GB = 3280 GB
GAL for DR = 3280 GB^
  ^Convert to MB before changing GAL in the system 

Note: This value provides the minimum sizing guidance only. It is recommended to allocate additional headroom of 10-20% of this value to accommodate any immediate spike in column store table modifications.

Similarly, the global_allocation_limit (GAL) has to be set for the second HANA database instance (QAS) running on the DR site. According to SAP Note 1681092 – Multiple SAP HANA systems (SIDs) on the same underlying server(s), the sum total of all HANA database allocation limits on a particular host should not exceed 90% of the first 64 GB of available physical memory on the host plus 97% of the remaining physical memory.

Σ [(GAL_DR) + (GAL_QAS)] ≤ {(0.9 * 64) + [0.97 * (Phys Mem GB - 64)]}

Applying this guidance to our example of a 9 TiB (9896 GB) bare metal instance is as follows.

Σ [(GAL_DR) + (GAL_QAS)]  ≤  {[0.9 * 64] + [0.97 * (9896 - 64)]} GB
                          ≤  9595 GB

The total available memory for all HANA database instances is 9595 GB, while the total available physical memory on the bare metal instance is 9 TiB (9896 GB). As the HANA DR instance memory sizing is derived at 3280 GB, the remainder is the maximum allocatable memory for the QAS instance.

Σ [ (GAL_DR) + (GAL_QAS)]  ≤ 9595 GB
Σ [ 3280 GB  + (GAL_QAS)]  ≤ 9595 GB
GAL for QAS: The maximum allowed value for GAL_QAS = 6314 GB^
    ^Convert to MB before changing GAL in the system
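For illustration, once the GAL values are agreed, they could be applied with statements similar to the following sketch. The values are the GB figures above converted to MB (GB × 1024). Each statement is run on the respective instance where it is open for SQL, or applied by editing global.ini directly where SQL access is not available (such as a registered secondary).

-- Secondary (DR) instance: 3280 GB × 1024 = 3358720 MB
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('memorymanager', 'global_allocation_limit') = '3358720'
  WITH RECONFIGURE;

-- QAS instance: 6314 GB × 1024 = 6465536 MB
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('memorymanager', 'global_allocation_limit') = '6465536'
  WITH RECONFIGURE;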

Option 2 – HSR with Operation Mode: delta_datashipping

This section provides an overview of HANA System Replication deployed with ‘delta_datashipping’ operation mode and the procedure to evaluate the memory sizing for each individual instance.

HSR Delta_datashipping

  • GAL for Secondary instance

SAP’s guidance for the minimum memory requirement for a secondary HANA database instance with preload=off and the delta_datashipping operation mode is as follows.

Max [64 GB, (Row store size + 20 GB)]

Applying this guidance to our example scenario to derive the GAL for DR is as follows.

Max [64 GB, (153 + 20) GB] 
GAL for DR: The minimum required value for GAL_DR = 173 GB^
    ^Convert to MB before changing GAL in the system 

The total available memory for all HANA database instances is 9595 GB, while the total available physical memory on the bare metal instance is 9 TiB (9896 GB). In the previous step, the secondary HANA database instance memory sizing was derived at 173 GB; the remaining size on the host is the maximum allocatable memory for the QAS instance.

Σ [ (GAL_DR) + (GAL_QAS)]   ≤  9595 GB
Σ [ 173 GB   + (GAL_QAS)]   ≤  9595 GB
GAL for QAS: The maximum allowed value for GAL_QAS = 9422 GB^
   ^Convert to MB before changing GAL in the system

Conclusion

This blog highlights that the cost savings of deploying a secondary HANA database instance with a reduced memory footprint may not be realised with the logreplay operation mode. This is because of the higher memory demand of the column store being loaded into the main memory of the secondary HANA database, which occurs irrespective of the column table preload setting. This behaviour is by design for continuous log replay. In addition, there is a requirement to constantly monitor column store data growth on the primary and adapt memory allocations on the secondary to avoid out-of-memory (OOM) issues on the secondary. The delta_datashipping operation mode facilitates better cost savings in comparison to logreplay when the preload of column tables is disabled on the secondary HANA database instance. However, this choice comes with limitations, which may include a higher recovery time and an increased demand for network bandwidth between the replication sites.

Opting for delta_datashipping is usually a trade-off between cost considerations and the other limitations of this operation mode explained in this blog. With relaxed RTO requirements and higher network bandwidth between the replication sites, the case for ‘delta_datashipping’ becomes stronger. Also, with larger database instances, the potential for cost savings is higher. This is because the memory footprint on the secondary has a minimum requirement of row store memory plus a buffer, even for smaller database instances. The association between the various factors that influence the choice of a particular operation mode and the potential for cost benefits with the delta_datashipping option is represented in the following diagram. These factors are independent of each other and can individually influence the ability to use ‘delta_datashipping’ and thereby the cost benefits.

Cost benefits with delta_datashipping

It is also important to understand that the memory requirement calculation for the secondary, and the subsequent setting of global_allocation_limit, is an iterative process. As the production database grows, the column store demand for delta merges on the secondary instance also grows. Hence, the memory allocations of the secondary and, optionally, of other HANA database instances provisioned on the same hardware should be monitored periodically, and also after mass data loads, go-lives and SAP system-specific life-cycle events.
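One lightweight way to track this, in addition to periodically re-running the ‘HANA_Tables_ColumnStore_Columns_LastTouchTime’ script on the primary, is to compare used memory against the configured allocation limit. The query below is a sketch using a standard monitoring view; it can be run on any instance in the landscape that is open for SQL connections (the replicating secondary itself does not accept SQL).

-- Instance memory usage versus configured allocation limit, converted from bytes to GB
SELECT HOST,
       ROUND(INSTANCE_TOTAL_MEMORY_USED_SIZE / 1024 / 1024 / 1024, 0) AS USED_GB,
       ROUND(ALLOCATION_LIMIT / 1024 / 1024 / 1024, 0) AS LIMIT_GB
  FROM M_HOST_RESOURCE_UTILIZATION;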

Additional Information

Following are the steps to execute the SAP SQL script.

1. Download the file SQLStatements.zip, which is an attachment to SAP Note 1969700 – SQL Statement Collection for SAP HANA.

2. Unzip the file and identify the SQL script named HANA_Tables_ColumnStore_Columns_LastTouchTime_2.00.x+.txt (for HANA 1.0, choose the corresponding version of the same script).

3. Modify the script with the following two changes.

TOUCH_TYPE
  Identify the section with title ' /* Modification section */'
  Identify the parameter 'TOUCH_TYPE' and enter the value as 'MODIFY'
Ex: 'MODIFY' TOUCH_TYPE,          /* ACCESS, SELECT, MODIFY */
BEGIN_TIME
  Identify the section with title ' /* Modification section */'
  Identify the parameter 'BEGIN_TIME' and enter the value as 'C-D30'
Ex: 'C-D30' BEGIN_TIME, 

4. Execute the script on the HANA production tenant database using the SQL editor of SAP HANA Cockpit, SAP HANA Studio or DBACOCKPIT, or using the hdbsql command-line tool (an example hdbsql invocation is shown after these steps).

5. In the output, the value under the column ‘MEM_GB’ is the size of the column store tables modified in the previous 30 days from the date of script execution.
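As an example of step 4, the modified script file could be run with hdbsql along the following lines. The hostname, instance number, tenant database name, user and password are placeholders to be replaced with values from your landscape.

-- Execute the modified SQL script file against the production tenant database with hdbsql
hdbsql -n <hostname> -i <instance_nr> -d <TENANT_DB> -u <DB_USER> -p <password> -I HANA_Tables_ColumnStore_Columns_LastTouchTime_2.00.x+.txt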