Guidance for Ingesting PDF and Image Files to AWS HealthLake

Overview

This Guidance demonstrates how to ingest PDF and image files into AWS HealthLake so you can generate business insights. AWS CloudFormation deploys the required AWS resources, including an AWS Lambda function, an Amazon Simple Storage Service (Amazon S3) bucket, and a HealthLake data store. The Guidance helps healthcare workers turn medical or claims data from PDF and image files into a more usable format so they can share patient medical information more securely, use data to inform clinical decision-making, and improve overall hospital efficiency.
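As a rough illustration of the flow described above, the following sketch shows how a Lambda-style function might read the bucket and key from an S3 event notification and wrap already-extracted text in a minimal FHIR R4 DocumentReference resource. The function names are hypothetical and are not taken from the Guidance's sample code. The text extraction step and the write to HealthLake are deliberately omitted.

```python
import base64
import urllib.parse


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def build_document_reference(extracted_text, source_uri):
    """Wrap extracted text in a minimal FHIR R4 DocumentReference (hypothetical helper)."""
    # FHIR attachments carry inline content as base64-encoded data.
    encoded = base64.b64encode(extracted_text.encode("utf-8")).decode("ascii")
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "description": f"Text extracted from {source_uri}",
        "content": [
            {"attachment": {"contentType": "text/plain", "data": encoded}}
        ],
    }
```

A handler could call `s3_objects_from_event` on the incoming event, run each object through a text extraction service, and pass the result to `build_document_reference` before sending it to the data store.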

How it works

The architecture diagram illustrates how to use this solution effectively. It shows the key components and their interactions, giving a step-by-step overview of the architecture's structure and functionality.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

CloudFormation automates the creation of AWS resources. Lambda runs code automatically in response to events. Amazon CloudWatch monitors resources and events on AWS, and HealthLake uses CloudWatch and AWS CloudTrail to monitor performance. Together, these services support development and help you run workloads effectively while gaining insight into your operations.

Read the Operational Excellence whitepaper 

HealthLake, Amazon S3, Lambda, and CloudFormation are Health Insurance Portability and Accountability Act (HIPAA)-eligible services. These services meet rigorous security and access control standards to help ensure patients’ sensitive health data is protected and handled in line with regulatory requirements. S3 buckets have encryption configured by default, and objects are automatically encrypted by using server-side encryption with Amazon S3-managed keys (SSE-S3). Lambda encrypts uploaded files and environment variables at rest. CloudFormation stores data encrypted at rest and uses encrypted channels for service communications in compliance with the AWS shared responsibility model.

Additionally, AWS Identity and Access Management (IAM) policies have been scoped down to the minimum permissions required for the service to function properly. AWS Key Management Service (AWS KMS) encrypts customer data, both in transit and at rest. Per the Fast Healthcare Interoperability Resources (FHIR) specification, if a customer deletes a piece of data, it is only hidden from analysis and results; it is not deleted from the service, only versioned.
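To make the idea of scoped-down permissions concrete, here is a minimal sketch of what a least-privilege policy for an ingestion function might look like. The statements, the choice of actions, and the `build_ingest_policy` helper are assumptions for illustration only; the actual policies deployed by the Guidance's CloudFormation template may differ and should be checked there.

```python
import json


def build_ingest_policy(bucket_name, datastore_arn):
    """Build an illustrative least-privilege IAM policy document (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read only the uploaded documents, not the whole account's buckets.
                "Sid": "ReadUploadedDocuments",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Assumption: text is extracted with Amazon Textract's synchronous API.
                "Sid": "ExtractText",
                "Effect": "Allow",
                "Action": ["textract:DetectDocumentText"],
                "Resource": "*",
            },
            {
                # Write access limited to a single HealthLake data store.
                "Sid": "WriteFhirResources",
                "Effect": "Allow",
                "Action": ["healthlake:CreateResource"],
                "Resource": datastore_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and data store ARN, for demonstration only.
    policy = build_ingest_policy(
        "example-ingest-bucket",
        "arn:aws:healthlake:us-east-1:111122223333:datastore/fhir/EXAMPLE",
    )
    print(json.dumps(policy, indent=2))
```

Each statement names a single action and, where possible, a single resource, so the function cannot read other buckets or write to other data stores.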

Read the Security whitepaper 

Lambda maintains compute capacity across multiple Availability Zones (AZs) in each AWS Region to help protect your code against individual machine or data center facility failures. Amazon S3 stores data redundantly across a minimum of 3 AZs by default, providing built-in resilience against widespread disaster. Amazon S3 is also designed to sustain data in the event of AZ failure and provides a highly durable storage infrastructure designed for mission-critical and primary data storage.

Read the Reliability whitepaper 

HealthLake organizes and indexes patient information and stores it in the FHIR industry standard format to provide a complete view of each patient’s medical history. In addition, HealthLake transforms unstructured data using specialized machine learning (ML) models, such as natural language processing (NLP) models, to automatically extract meaningful medical information from the data. You can use FHIR REST API operations to manage and search resources in your HealthLake data store.
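As an illustration of a FHIR REST search, the sketch below builds a search URL from a data store endpoint, a resource type, and search parameters. The `fhir_search_url` helper is hypothetical, and the endpoint shown is a placeholder; use the endpoint reported for your own data store. Sending the request also requires AWS Signature Version 4 signing, which is not shown here.

```python
from urllib.parse import urlencode


def fhir_search_url(datastore_endpoint, resource_type, params):
    """Build a FHIR search URL of the form <endpoint>/<ResourceType>?<query>."""
    base = datastore_endpoint.rstrip("/")
    # Sort parameters so the same search always yields the same URL.
    query = urlencode(sorted(params.items()))
    url = f"{base}/{resource_type}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Placeholder endpoint, for demonstration only.
    endpoint = "https://example.invalid/datastore/EXAMPLE/r4/"
    print(fhir_search_url(endpoint, "Patient", {"family": "Smith", "_count": "10"}))
```

Reading a single resource follows the same pattern with the resource ID appended to the path instead of a query string.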

Read the Performance Efficiency whitepaper 

Lambda is a serverless, event-driven compute service. Serverless architectures remove the need for you to run and maintain physical servers for traditional compute activities, helping you lower costs that you might otherwise spend on maintaining infrastructure.

Moving to Amazon S3 reduces costs by eliminating over-provisioning, minimizing the chance of getting locked into hardware refresh cycles, and providing virtually unlimited scale.

Read the Cost Optimization whitepaper 

Amazon S3, Lambda, Amazon Textract, and HealthLake are all AWS managed services, shared across a broad customer base to help optimize resource usage. Managed services reduce the amount of infrastructure needed to support cloud workloads, helping you minimize your environmental impact.

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.