AWS Public Sector Blog
How KHUH built a long-term storage solution for medical image data with AWS
King Hamad University Hospital (KHUH) and Bahrain Oncology Center is a 600-bed hospital in Bahrain. Over the years, KHUH faced constraints from the exponential growth of their on-premises storage needs, particularly for the medical images stored by their picture archiving and communication system (PACS). After assessing their long-term storage needs, KHUH turned to Amazon Web Services (AWS) to develop a cost- and time-effective long-term storage solution without making changes to their existing PACS. Read on to learn how KHUH developed and iterated on a long-term medical image data storage architecture that helped them reduce their storage costs by 40%.
KHUH turns to the cloud for long-term storage of medical image data
As of January 2022, KHUH had accumulated around one million medical image studies, resulting in around 476 million files with a total data volume of 44 TB – and KHUH is adding around 1 TB of new data every month. An assessment of access patterns showed that out of those one million studies, only 94,000 studies from older years (2011-2019) were retrieved in 2021. These 94,000 studies represented a data volume of 3 TB, so only around 7% of historic PACS data, measured by volume, is retrieved annually. The assessment demonstrated that the majority of medical images are not accessed in later years and can be stored in a long-term archive.
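The percentages above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in this section; the linear growth projection is an illustrative assumption, not a forecast published by KHUH.

```python
# Figures quoted in the paragraph above.
total_tb = 44                # total PACS data volume as of January 2022
retrieved_tb = 3             # volume of older studies retrieved in 2021
total_studies = 1_000_000    # approximate number of studies
retrieved_studies = 94_000   # older studies retrieved in 2021
monthly_growth_tb = 1        # approximate new data per month

share_by_volume = retrieved_tb / total_tb
share_by_count = retrieved_studies / total_studies


def projected_volume_tb(months: int) -> int:
    """Linear projection, assuming growth stays at about 1 TB per month."""
    return total_tb + monthly_growth_tb * months


print(f"Retrieved by volume: {share_by_volume:.1%}")
print(f"Retrieved by study count: {share_by_count:.1%}")
print(f"Projected volume after 5 years: {projected_volume_tb(60)} TB")
```

By volume the retrieved share is just under 7%, which matches the figure in the text; by study count it is somewhat higher, at a little over 9%.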
Storage upgrade activities tied up precious IT personnel and took efforts away from KHUH’s core healthcare business and patient-centric focus. As a result, KHUH wanted a long-term archival solution that could grow and adapt automatically. Furthermore, the solution needed to require minimal changes to the existing PACS system, as the PACS system was provided by an independent software vendor (ISV) and changes to the software code base would take time to implement.
KHUH decided to use AWS to help with this. KHUH implemented a solution leveraging Amazon S3 File Gateway and Amazon S3 Glacier to store medical images in the cloud for long-term archiving, without making changes to the existing PACS. In their new storage architecture, only medical images generated within the last four years are kept on premises; the remaining data is archived with AWS.
Inside KHUH’s long-term medical image storage solution on AWS
The following section provides an overview of the initial proof-of-concept architecture. Later, KHUH further simplified the architecture after the Amazon S3 Glacier Instant Retrieval storage class became available in November 2021.
Architecture overview
Amazon S3 File Gateway provides a file interface into Amazon Simple Storage Service (Amazon S3) and combines a service and a virtual software appliance. By using this combination, KHUH can store and retrieve objects in Amazon S3 using the Server Message Block (SMB) file protocol. The software appliance, or gateway, is deployed in the KHUH on-premises environment as a virtual machine (VM) running on VMware ESXi. The gateway provides access to objects in Amazon S3 as files or file share mount points.
Figure 1. Initial architecture using the Amazon S3 Glacier Flexible Retrieval storage class.
Using Amazon S3 File Gateway enabled KHUH to transfer medical image data to the cloud without modifications to their existing PACS system. The PACS system uses the SMB protocol to store data on a Windows file server. As the data volume grows, KHUH adds SAN volumes to the Windows file server and creates new file shares. New file shares are added to the PACS storage manager. The Amazon S3 File Gateway file share has been added as another file share so the PACS system can then store medical images on the Amazon S3 File Gateway.
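To make the file share step more concrete, here is a minimal sketch of the request parameters for creating an SMB file share on a gateway with the AWS SDK for Python (boto3). The post does not show KHUH's actual configuration, so the helper name `smb_share_request`, the ARNs, the Active Directory authentication mode, and the default storage class are all assumptions chosen for illustration.

```python
import uuid


def smb_share_request(gateway_arn: str, role_arn: str, bucket_arn: str,
                      read_only: bool = False) -> dict:
    """Build keyword arguments for the Storage Gateway CreateSMBFileShare API.

    All values passed in are placeholders supplied by the caller; nothing
    here reflects KHUH's real environment.
    """
    return {
        "ClientToken": str(uuid.uuid4()),      # idempotency token
        "GatewayARN": gateway_arn,             # the on-premises gateway VM
        "Role": role_arn,                      # IAM role the gateway assumes
        "LocationARN": bucket_arn,             # S3 bucket backing the share
        "ReadOnly": read_only,
        "DefaultStorageClass": "S3_STANDARD",  # assumed; objects land here first
        "Authentication": "ActiveDirectory",   # assumed authentication mode
    }


request = smb_share_request(
    gateway_arn="arn:aws:storagegateway:me-south-1:111122223333:gateway/sgw-EXAMPLE",
    role_arn="arn:aws:iam::111122223333:role/example-gateway-role",
    bucket_arn="arn:aws:s3:::example-pacs-archive",
)
# With credentials configured, the call would look like:
#   boto3.client("storagegateway").create_smb_file_share(**request)
```

Once such a share exists, it can be registered with the PACS storage manager like any other SMB share, which is why the PACS itself needs no changes.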
In the initial solution architecture KHUH developed on AWS, the team used the Amazon S3 Glacier Flexible Retrieval storage class. This storage class suits archive data that does not require immediate access but needs the flexibility to retrieve large data sets at no cost, such as backup or disaster recovery use cases. Amazon S3 Glacier Flexible Retrieval offers retrieval options that balance cost against access time: minutes for expedited retrievals, 3-5 hours for standard retrievals, and 5-12 hours for bulk retrievals.
A retrieval process needs to be triggered for files archived in the Amazon S3 Glacier Flexible Retrieval storage class before the files are available again to the PACS system. KHUH wanted to avoid manual interventions, so they designed their architecture to automate the retrieval process. In the initial architecture, KHUH implemented the approach outlined in the blog post “Automate restore of archived objects through AWS Storage Gateway”. In this approach, when the PACS system tries to access a file that has been moved to the Amazon S3 Glacier Flexible Retrieval storage class, an “InaccessibleStorageClass” error message is logged in Amazon CloudWatch Logs. This error triggers an AWS Lambda function that initiates the retrieval of the medical image files via an Amazon S3 batch operation.
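The trigger logic described above can be sketched as follows. This is not KHUH's Lambda function: it is a simplified illustration that turns a CloudWatch Logs subscription event into per-object restore requests, rather than the batch operation mentioned in the text. The outer event envelope (base64-encoded, gzip-compressed JSON under `awslogs.data`) is the standard CloudWatch Logs format; the message field names `type`, `bucket`, and `key` are assumptions about the gateway's log entry, and `restore_requests` is a hypothetical helper name.

```python
import base64
import gzip
import json


def decode_log_events(event: dict) -> list:
    """Unpack the log events from a CloudWatch Logs subscription payload."""
    payload = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(payload))["logEvents"]


def restore_requests(event: dict, days: int = 1, tier: str = "Standard") -> list:
    """Build S3 RestoreObject parameters for each archived-file error.

    Assumes each gateway log message is a JSON object with 'type',
    'bucket' and 'key' fields; other messages are skipped.
    """
    requests = []
    for log_event in decode_log_events(event):
        try:
            message = json.loads(log_event["message"])
        except json.JSONDecodeError:
            continue  # not a JSON message, ignore it
        if not isinstance(message, dict):
            continue
        if message.get("type") != "InaccessibleStorageClass":
            continue
        requests.append({
            "Bucket": message["bucket"],
            "Key": message["key"],
            "RestoreRequest": {
                "Days": days,  # how long the restored copy stays available
                "GlacierJobParameters": {"Tier": tier},
            },
        })
    return requests

# Inside a Lambda handler, each request could then be submitted with:
#   boto3.client("s3").restore_object(**request)
```

The `tier` argument corresponds to the expedited, standard, and bulk retrieval options, so the choice made here determines how long the PACS waits before the file becomes readable again.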
KHUH simplifies the architecture with Amazon S3 Glacier Instant Retrieval
In November 2021, AWS announced the new Amazon S3 Glacier Instant Retrieval storage class. This new storage class delivers the lowest cost storage for long-lived data that is rarely accessed and offers retrieval in milliseconds.
After the announcement, KHUH redesigned their long-term storage solution architecture around the new storage class, which serves archived files directly without a separate retrieval step. In addition to shortening retrieval times, this allowed KHUH to remove the AWS Lambda function, and the complexity that came with it, from the architecture. The simplified architecture looks like the following:
Figure 2. Simplified architecture utilizing the Amazon S3 Glacier Instant Retrieval storage class.
Files archived in Amazon S3 Glacier Instant Retrieval do not require a retrieval process. Files are available with the same throughput and millisecond access as in the Amazon S3 Standard and Amazon S3 Standard-IA storage classes. The simplified solution also makes the AWS Lambda function and the use of Amazon CloudWatch obsolete.
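The post does not say how objects reach the Amazon S3 Glacier Instant Retrieval storage class. One common way is an S3 Lifecycle rule on the bucket, sketched below under that assumption. The rule ID, the prefix, and the 30-day threshold are hypothetical values; `GLACIER_IR` is the storage class identifier for S3 Glacier Instant Retrieval.

```python
def archive_lifecycle(days_before_archive: int, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier Instant Retrieval.

    The rule ID and threshold are illustrative, not KHUH's settings.
    """
    return {
        "Rules": [
            {
                "ID": "archive-medical-images",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
                "Transitions": [
                    {"Days": days_before_archive, "StorageClass": "GLACIER_IR"}
                ],
            }
        ]
    }


config = archive_lifecycle(days_before_archive=30)
# With credentials configured, the rule could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-pacs-archive", LifecycleConfiguration=config)
```

Because objects in this storage class stay directly readable, a rule like this needs no companion restore logic on the gateway side.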
KHUH introduces automatic failover to reduce RTO
Using Amazon S3 File Gateway presented an operational risk for deploying the solution into production: if the Amazon S3 File Gateway ever became unavailable, KHUH would lose access to the medical image data in the Amazon S3 File Gateway cache and in the Amazon S3 bucket in the AWS Cloud for as long as recovery took.
To reduce the recovery time objective (RTO), KHUH deployed a second Amazon S3 File Gateway virtual machine in their data center. The Amazon S3 File Gateways are addressed via a single DNS name with failover alias records. In case the Amazon S3 File Gateway behind the primary record becomes unavailable, DNS returns the secondary failover record and the PACS server starts communication with the secondary Amazon S3 File Gateway. This failover is automatic.
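The failover behaviour can be illustrated with a small sketch. This is not how a DNS server is implemented or configured; it only models the decision the failover records encode: answer with the primary gateway while it is healthy, otherwise with the secondary. The function names and addresses are hypothetical, and the TCP probe on port 445 (the standard SMB port) is an assumed health check.

```python
import socket
from typing import Callable, Optional, Sequence


def smb_port_open(address: str, timeout: float = 2.0) -> bool:
    """Assumed health probe: can a TCP connection be opened to the SMB port?"""
    try:
        with socket.create_connection((address, 445), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_gateway(records: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy record, in priority order (primary first)."""
    for address in records:
        if is_healthy(address):
            return address
    return None  # neither gateway is reachable


# Hypothetical addresses for the primary and secondary gateway VMs.
gateways = ["10.0.0.11", "10.0.0.12"]

# Simulate a primary outage with a stub health check instead of a real probe.
primary_down = lambda address: address != "10.0.0.11"
print(resolve_gateway(gateways, primary_down))
```

In production the probe would be `smb_port_open` or whatever check the DNS service supports; passing the check in as a function keeps the selection logic testable without any network access.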
As recommended in the Troubleshooting file share issues documentation in the AWS Storage Gateway User Guide, KHUH deliberately avoided having multiple file shares write to one Amazon S3 bucket. To enforce this in the failover architecture, KHUH configured two AWS Identity and Access Management (IAM) roles. The read-and-write role is associated with the file share on the primary Amazon S3 File Gateway. The share on the secondary Amazon S3 File Gateway is associated with a read-only role.
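The two roles differ only in the S3 actions their policies allow. The sketch below builds both policy documents for a placeholder bucket. The action lists are a trimmed illustration and not the complete set from the Storage Gateway documentation or KHUH's actual policies; `gateway_role_policy` and the bucket name are hypothetical.

```python
import json

# Trimmed, illustrative action lists (assumptions, not the documented full set).
READ_ACTIONS = [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:ListBucket",
    "s3:GetBucketLocation",
]
WRITE_ACTIONS = [
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
]


def gateway_role_policy(bucket_name: str, read_only: bool) -> dict:
    """Build an IAM policy document for a file share role on one bucket."""
    actions = READ_ACTIONS if read_only else READ_ACTIONS + WRITE_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
                ],
            }
        ],
    }


primary_policy = gateway_role_policy("example-pacs-archive", read_only=False)
secondary_policy = gateway_role_policy("example-pacs-archive", read_only=True)
print(json.dumps(secondary_policy, indent=2))
```

Keeping write permissions on the primary role only means that even if both shares are mounted at once, a single gateway can modify objects in the bucket.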
In addition, the availability of the file shares on both Amazon S3 File Gateways is monitored with KHUH’s existing on-premises application monitoring system. If either Amazon S3 File Gateway becomes unavailable, an automated alert is sent to KHUH’s IT operations team. The team then verifies that the automated failover has taken place and restores the failed Amazon S3 File Gateway.
KHUH sees 40% savings in storage costs and other benefits with AWS
Using the simplified solution, KHUH reduced its storage cost by 40% compared to investment in upgrading the on-premises storage to accommodate their data growth over the next five years. KHUH realized additional benefits with this solution as well, like:
- Higher durability for medical images: Every file stored in Amazon S3 is automatically replicated across three Availability Zones in Bahrain. To achieve this level of durability on its own would have required significant investment on KHUH’s side in additional hardware, data center locations, and operations personnel.
- Reduction in administrative effort: The rapid growth in medical image data required KHUH to expand the on-premises storage once or twice a year. This required long planning cycles for the procurement process and installation. With the ability of Amazon S3 to dynamically and automatically expand as needed, these efforts are no longer required.
- Increased innovation: Archiving medical images in Amazon S3 opens possibilities for KHUH to innovate and experiment at a much faster pace. For example, medical image data is one of the enablers of multimodal machine learning (ML). Multimodal ML analyzes linked patient-level data from diverse data modalities, such as genomics and medical imaging, which can accelerate improvements in patient care. Read the blog post “Building Scalable Machine Learning Pipelines for Multimodal Health Data on AWS” for more.
Learn more about AWS for healthcare
This blog post describes how KHUH developed a solution for archiving medical image data in the cloud without impacting their existing on-premises PACS system. KHUH’s solution proved to be more cost effective than traditional on-premises storage, saves the KHUH team time through reduced administrative effort, and provides even higher durability for their medical images. Additionally, the solution is PACS vendor-agnostic and can be used by other PACS systems that store image data via Windows file servers (using the SMB protocol).
Hospitals with similar challenges can start exploring how to optimize their operations with the cloud. The journey to the cloud is rarely made in one big step; AWS offers many services, like Amazon S3 File Gateway, that allow hospitals to start with a hybrid architecture that integrates with their existing on-premises systems and then gradually adopt further services to provide better patient care.
Do you have any questions about how you can use the cloud to optimize your storage needs and manage other hospital operations? Reach out to the AWS Public Sector team for help and more information.
Read more about AWS for healthcare:
- Transforming radiology workflows with clinical decision support powered by AWS
- Getting started with healthcare data lakes: Using microservices
- How to create a task-generating voicemail solution with Amazon Connect
- AWS launching new Region in UAE in 2022
- Breaking down patient data silos in UK healthcare with serverless cloud technology
- Solving medical mysteries in the AWS Cloud: Medical data-sharing innovation through the Undiagnosed Diseases Network