AWS Storage Blog

Optimizing electronic health care records at scale with Amazon FSx for NetApp ONTAP

Electronic Healthcare Records (EHR) applications are approaching a 40 billion dollar market size with a high compound annual growth rate. While continuing to focus on enabling innovative healthcare, EHR consumers can benefit from adopting cloud-based approaches that reduce operational burden, management overhead, capital outlay, and total cost of ownership. EHR deployments are complex in nature, made up of many interconnected applications and sub-environments, each having its own set of storage and performance requirements. The performance of the EHR’s central production database can become a limiting factor either on premises or in the cloud. Most often, the storage environment for the production database is built not only for today’s requirements, but also for three to five years of growth.

Amazon FSx for NetApp ONTAP and Amazon Elastic Block Store (EBS) can address the full range of EHR storage requirements that healthcare organizations encounter on their journey to cloud adoption.

In this blog, you can learn how EHR environments can optimally scale storage performance elastically, elegantly, and non-disruptively, while only paying for the performance used, which allows health care organizations to control storage costs even in periods of unpredictable growth. First, I examine an FSx for NetApp ONTAP architecture that can scale up cloud-based read-only copies of an EHR environment in the event of ransomware or disaster. Then I look at a highly available, disaster recovery capable production EHR cloud-based FSx for NetApp ONTAP environment that scales on demand.

An opportunity of scale

As patient load grows and health care organizations merge operations, so does the demand for compute, network, and storage performance to service these workloads. In response, AWS recently announced increased Epic performance scalability, leveraging advanced new AWS instances and Amazon Elastic Block Store (EBS) io2 Block Express volumes, which is more than sufficient for the majority of health care organizations’ current EHR deployments. However, healthcare organizations often experience unplanned growth, whether through mergers and acquisitions or through unprecedented organic growth. To meet this challenge, this blog addresses an optimal method to scale the EHR storage environment while controlling storage cost.

Last year, in a storage blog, I introduced a parallel FSx for NetApp ONTAP storage architecture that scales traditional block-based database workloads. Today, I cover how you can use this same approach to reach unprecedented levels of EHR related storage performance. In fact, since the publication of that blog and the introduction of newer, faster Amazon EC2 instances, we are now reaching up to 2 million 8K random IOPs at sub-millisecond latency to a single server.

The reader is cautioned, however, not to mistake raw, repeatable, sustained storage benchmark performance for application layer storage performance metrics. As in any highly integrated application, an EHR deployment can drive only some fraction of the benchmarked storage performance before overall application performance is affected. In practical experience, this factor is in the range of 40-60%. Thus, a storage environment capable of 2 million 8K IOPs may see a practical application limit of approximately 1 million IOPs driven through the entire application stack to the production copy of the database. Other associated workflows can leverage the remaining storage performance headroom provided by the AWS storage layer.
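
The 40-60% rule of thumb above can be turned into a quick estimate. The following is a minimal Python sketch, not a sizing tool: the function name `practical_iops` and its default bounds are illustrative assumptions taken from the range quoted in this post, and the real fraction depends on the application stack.

```python
def practical_iops(benchmark_iops, low=0.40, high=0.60):
    """Estimate the range of IOPs the application stack might actually drive.

    low/high are the 40-60% rule-of-thumb factors quoted in the text;
    they are rough planning figures, not measured values.
    """
    return round(benchmark_iops * low), round(benchmark_iops * high)


benchmark = 2_000_000  # raw 8K benchmark IOPs from the text
low_est, high_est = practical_iops(benchmark)
midpoint = (low_est + high_est) // 2

print(f"Application-level estimate: {low_est:,} to {high_est:,} IOPs")
print(f"Midpoint: {midpoint:,} IOPs")
print(f"Headroom left at the midpoint: {benchmark - midpoint:,} IOPs")
```

The midpoint of the range corresponds to the approximately 1 million IOPs figure used throughout the rest of this post.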

All this means storage scalability is highly desirable. Unlike capitalized storage assets located on premises, cloud deployments can scale storage performance just in time by a factor of 10 or more, with no operational changes or disruption, while only paying for the current desired performance level; there is no up-front sunk capital cost. This ability to scale gives organizations confidence they can economically meet current and future EHR production storage requirements. Organizations can grow storage performance slowly over time, or scale up within minutes during a disaster from a pilot light level to a full production workload. Lastly, the ability to scale storage performance up and down in near real time allows surge capacity for mergers and acquisitions, or for force majeure events such as ransomware.

Toward a scalable EHR storage environment

Historically, most applications preferred vertical storage silos, which ensured storage IO consistency at specific points in time through the use of array-based snapshots. Consistency across storage arrays was not supported. If a workload or entire environment did not fit on a single array, then the application or middleware layer had to coordinate points of consistency across multiple arrays for snapshots and backups. This issue has long been a practical barrier to adopting a more modern parallel storage approach. As of ONTAP version 9.1.1, Amazon FSx for NetApp ONTAP supports a cross-cluster (two-phase) consistency group. With this issue resolved, you can now scale applications across multiple instances of ONTAP while keeping storage consistency. This allows a massive increase in scalability while retaining consistency, instant space-efficient clones, replication, and a host of other ONTAP features, and while reaching over 2.5 million 8K random IOPs and 64 GB/s, as shown in the following figure.
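
To illustrate why a two-phase approach gives a consistent point in time across several storage clusters, here is a conceptual Python sketch. It is a generic simulation of the idea only: the `Cluster` class and the `consistent_snapshot` function are hypothetical stand-ins, not ONTAP or FSx APIs, and this post does not describe how ONTAP implements consistency groups internally.

```python
class Cluster:
    """Hypothetical stand-in for one storage cluster; not an ONTAP API."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.fenced = False
        self.snapshots = []

    def fence_writes(self):
        # Phase 1 step: briefly hold new writes so the on-disk state stops changing.
        if not self.healthy:
            raise RuntimeError(f"{self.name} could not fence writes")
        self.fenced = True

    def take_snapshot(self, label):
        assert self.fenced, "snapshot must be taken while writes are fenced"
        self.snapshots.append(label)

    def unfence_writes(self):
        self.fenced = False


def consistent_snapshot(clusters, label):
    """Phase 1: fence every cluster. Phase 2: snapshot every cluster.

    Snapshots are taken only if all clusters fenced successfully, so either
    every cluster gets the snapshot or none does. Writes are always released.
    """
    try:
        for cluster in clusters:
            cluster.fence_writes()
        for cluster in clusters:
            cluster.take_snapshot(label)
        return True
    except RuntimeError:
        return False
    finally:
        for cluster in clusters:
            cluster.unfence_writes()


a, b = Cluster("fs-a"), Cluster("fs-b")
print(consistent_snapshot([a, b], "hourly"), a.snapshots, b.snapshots)
```

The design point is that no snapshot is taken anywhere until every participant has stopped accepting writes, which is what removes the need for the application or middleware layer to coordinate consistency itself.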

Figure 1: Scaled, IO consistent aggregate performance

When the production EHR database and other surrounding environments do not need 2.5 million IOPs, the throughput and IOPs of each of the 16 FSx for ONTAP file systems can be set to lower levels, which drastically reduces cost. By default, each FSx for ONTAP file system gets three IOPs per GB of SSD storage, or you can provision up to 160,000 IOPs independent of capacity. Higher levels of read IOPs can be observed for data that happens to reside in the DRAM cache of each file system. IOPs can be changed dynamically, either up or down. Likewise, the throughput of each FSx for ONTAP file system can be dynamically configured from 128 MB/s to 4 GB/s. You control your cost by the amount of performance you provision. For a given aggregate performance and capacity level, there is no additional charge for utilizing multiple FSx for ONTAP file systems, so you can enjoy extreme scalability without a cost penalty. This cost model is vastly different from an on-premises deployment, where no cost advantage is gained by going parallel.
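
The per-file-system figures above combine into the aggregate numbers quoted earlier. The following Python sketch works through that arithmetic. The constant and function names are illustrative, the 10,240 GB capacity is a made-up example value, and capping the default IOPs at the 160,000 ceiling is an assumption rather than something stated in this post.

```python
DEFAULT_IOPS_PER_GB = 3         # default SSD IOPs per GB, from the text
MAX_PROVISIONED_IOPS = 160_000  # per file system ceiling, from the text
MAX_THROUGHPUT_MBPS = 4096      # 4 GB/s per file system, assuming 1 GB/s = 1024 MB/s


def default_iops(ssd_gb):
    """Default IOPs for a given SSD capacity (cap at the ceiling is an assumption)."""
    return min(ssd_gb * DEFAULT_IOPS_PER_GB, MAX_PROVISIONED_IOPS)


def aggregate(file_systems, iops_each, throughput_mbps_each):
    """Aggregate IOPs and throughput (GB/s) across parallel file systems."""
    total_iops = file_systems * iops_each
    total_gbps = file_systems * throughput_mbps_each / 1024
    return total_iops, total_gbps


# A hypothetical 10,240 GB file system left at its default IOPs level.
print(default_iops(10_240))

# 16 file systems, each provisioned at the maximum levels quoted in the text.
print(aggregate(16, MAX_PROVISIONED_IOPS, MAX_THROUGHPUT_MBPS))
```

At the maximum settings, 16 file systems give 2.56 million IOPs and 64 GB/s, which matches the aggregate figures shown in Figure 1.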

Now that I have covered the dynamic nature of performance shaping, I’ll return to our two example types of EHR deployments to demonstrate optimizing cost while scaling performance.

Cloud-based EHR read only copy

When organizations that run their EHR on premises need additional protection against disaster, bad actors, or ransomware, a cloud-based read only copy of EHR assets can recover more quickly than an off-site backup, while also enabling your organization to leverage other advanced cloud services. There are multiple methods in EHR environments to create a read only copy in the cloud. The following figure shows how application and/or database layer replication can be used.

Figure 2: Read only copy through EHR replication

The following figure illustrates how to leverage on-premises NetApp filer(s) in conjunction with FSx for ONTAP, using SnapMirror to perform the replication. As always, when deploying any healthcare application in AWS, consider utilizing Landing Zone for Healthcare, which is reliably architected, aligns with AWS best practices, and conforms to multiple global compliance frameworks for healthcare workloads. Choose the read-only deployment architecture that works best for you depending on your on-premises storage environment, recovery goals, and plans.

Figure 3: Read only copy through NetApp SnapMirror

In either scenario, note that during normal operation, the parallel FSx for ONTAP file systems can be set for a much lower aggregate IOPs and throughput capability than is needed during a ransomware event or other surge use case. This controls cost while allowing a seamless surge to higher performance on demand.

Cloud-based production EHR environment

FSx for ONTAP can also be used as the foundational element in a complete EHR production environment running in AWS. When compared to on-premises EHR environments, it is possible to eliminate hardware refresh cycles and lower total cost of ownership (TCO) while providing non-disruptive pay-as-you-go scalability. Healthcare organizations that experience unplanned growth, mergers, acquisitions, or surge requirements can easily use the scalable nature of FSx for ONTAP on AWS to control costs, while simultaneously enabling interoperability for a myriad of advanced cloud-based healthcare applications.

A key component in any production EHR environment is a highly available architecture capable of disaster recovery. The AWS EHR FSx for ONTAP reference architecture achieves this by utilizing multiple Availability Zones (AZs) (for high availability) and multiple AWS Regions (for disaster recovery), as seen in the following figure.

Figure 4: Highly available EHR production with disaster recovery capability

In the production Region, the database, reporting, test and development, and other integrated surrounding applications run in one AZ. In the unlikely event that the entire AZ becomes unavailable, processing can fail over to the second AZ, as FSx for ONTAP is configured as a multi-AZ service and has two synchronous independent copies of the data (not shown in the diagram, one copy in each AZ). This configuration greatly simplifies operations while tolerating the complete failure of an AZ without declaring a disaster.

If the entire AWS Region becomes unavailable, then a disaster is declared, and processing is restarted in the secondary Region. The preceding figure shows two methods for replication, and either or both may be used to achieve the desired RPO/RTO. Storage layer replication (performed by FSx for ONTAP) is accomplished by SnapMirror, while database level replication is performed by the application stack. Note that FSx for ONTAP maintains two synchronous copies in the secondary Region as well, for a total of four copies of the replicated data. To reduce costs, the FSx for ONTAP file systems in the secondary Region can be configured for a much lower performance level, which is increased only in a disaster or on demand for testing. For example, during normal operations, the aggregate performance of the parallel FSx for ONTAP file systems might be configured for 300,000 IOPs and 8 GB/s.
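
As a rough illustration of moving between a low-cost standby level and a surge level, the following Python sketch splits an aggregate target across 16 file systems and builds keyword arguments shaped for the boto3 `update_file_system` call on the FSx client. Several things here are assumptions to verify against current AWS documentation: the list of throughput tiers, the 1 GB/s = 1024 MB/s conversion, the placeholder file system ID, and whether IOPs and throughput can be changed in a single request. The helper names are hypothetical, and nothing is sent to AWS.

```python
import math

# Assumed throughput tiers (MB/s) within the 128 MB/s to 4 GB/s range from the text.
THROUGHPUT_TIERS_MBPS = [128, 256, 512, 1024, 2048, 4096]


def per_file_system_settings(total_iops, total_gbps, file_systems=16):
    """Split an aggregate target evenly and round throughput up to a tier."""
    iops_each = math.ceil(total_iops / file_systems)
    mbps_needed = total_gbps * 1024 / file_systems
    throughput = next((t for t in THROUGHPUT_TIERS_MBPS if t >= mbps_needed), None)
    if throughput is None:
        raise ValueError("target exceeds the largest per file system tier")
    return iops_each, throughput


def build_update_request(file_system_id, iops, throughput_mbps):
    """Keyword arguments for boto3's fsx.update_file_system (not sent here)."""
    return {
        "FileSystemId": file_system_id,
        "OntapConfiguration": {
            "ThroughputCapacity": throughput_mbps,
            "DiskIopsConfiguration": {"Mode": "USER_PROVISIONED", "Iops": iops},
        },
    }


standby = per_file_system_settings(300_000, 8)   # normal operations, from the text
surge = per_file_system_settings(2_560_000, 64)  # full aggregate performance
print("standby per file system:", standby)
print("surge per file system:", surge)
print(build_update_request("fs-EXAMPLE", *standby))  # placeholder ID
```

In a real deployment, each request dictionary would be passed to `boto3.client("fsx").update_file_system(**request)` once per file system, after confirming the constraints noted above.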

Single database server storage performance

Since the introduction of AWS’s 200 Gbit capable instances, it has been possible to reach up to 20 GB/s and over 2 million 8K IOPs of aggregated FSx for ONTAP performance through the client network to a single instance. These instances are also capable of up to 8 GB/s and 350,000 IOPs through the EBS optimized network, and in some cases have locally attached NVMe storage known as instance stores. As a result, the combined storage performance of these instances can exceed 30 GB/s. However, storage deployments needing extreme scale should utilize FSx for ONTAP block devices for the database, and EBS volumes for the temporary databases. This means the practical limit is 20 GB/s for tablespace operations. Where available, using instance stores for temporary databases leverages storage directly attached to the instance, which has lower latency and whose performance is additive to both EBS optimized volumes and FSx for ONTAP block devices. Confining the production database to the FSx for ONTAP block devices enables the use of FSx for ONTAP Snapshots and FlexClones, which drastically lowers capacity costs and avoids any issues of consistency between Amazon EBS and FSx for ONTAP based snapshots.

Aggregate storage performance analysis

Consider a production EHR database reaching 1 million 8K IOPs at peak, which consumes roughly 10 GB/s from a single 200 Gbit capable client. In an environment that is 16 FSx for ONTAP file systems wide, the aggregate read capability is over 64 GB/s and 2.5 million IOPs at 8K. This aggregate performance headroom allows the use of FlexClones attached to other instances for reporting, query, backup, testing, development, or other activities without affecting production performance, and without data movement (copies) that can reduce overall storage performance. Combined, these AWS application instances can reach an aggregate storage performance of over 50 GB/s and 2 million IOPs.
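
The headroom argument above is simple subtraction, sketched below in Python. The variable and function names are illustrative. The 8 KiB block size and decimal GB/s conversion used for the payload estimate are assumptions; the roughly 10 GB/s figure for production is the one given in this post.

```python
FILE_SYSTEMS = 16
IOPS_PER_FILE_SYSTEM = 160_000  # per file system ceiling quoted earlier
GBPS_PER_FILE_SYSTEM = 4        # per file system throughput ceiling quoted earlier


def headroom(total_iops, total_gbps, prod_iops, prod_gbps):
    """IOPs and GB/s left for clones once production peak is served."""
    return total_iops - prod_iops, total_gbps - prod_gbps


total_iops = FILE_SYSTEMS * IOPS_PER_FILE_SYSTEM
total_gbps = FILE_SYSTEMS * GBPS_PER_FILE_SYSTEM

prod_iops, prod_gbps = 1_000_000, 10  # production peak figures from the text

# Raw data payload of the production IOPs, assuming 8 KiB blocks and decimal GB.
payload_gbps = prod_iops * 8 * 1024 / 1e9

spare_iops, spare_gbps = headroom(total_iops, total_gbps, prod_iops, prod_gbps)
print(f"Aggregate: {total_iops:,} IOPs, {total_gbps} GB/s")
print(f"Production payload at 8 KiB: {payload_gbps:.1f} GB/s")
print(f"Headroom for FlexClone workloads: {spare_iops:,} IOPs, {spare_gbps} GB/s")
```

Under these assumptions, the remaining capacity is what reporting, backup, test, and development instances can draw on through FlexClones without competing with production.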

Summary

In this blog, I covered a unique way to deploy the FSx for NetApp ONTAP block service in conjunction with EBS to meet the performance needs of Electronic Health Records applications while scaling far beyond the largest EHR deployments in the world today. I optimized the storage layer and controlled cost by leveraging the dynamic, pay-as-you-go nature of the cloud to create an environment that scales both up and down without disrupting the EHR application or requiring its awareness, and that leverages the advanced space saving FlexClone ability in FSx for NetApp ONTAP. Thus, whether your organization’s storage requirements grow or shrink, your organization never pays for idle capital assets.

This powerful combination of pay-as-you-go economics and the advanced storage efficiencies of FSx for NetApp ONTAP delivers a scalable, highly reliable, highly available, disaster tolerant, yet cost-effective storage solution for electronic healthcare records applications in AWS.

Contact AWS to see how you can get started on your journey to optimized storage for EHR environments today by visiting AWS for Healthcare and Life Sciences, or contacting your AWS HCLS representative.