This Guidance shows how to use AWS Entity Resolution to perform patient entity resolution on healthcare data stored in AWS HealthLake. By combining these services, you can establish comprehensive patient profiles with confidence scores, supporting more accurate data management and maintaining data integrity across your environment. Furthermore, by using the machine learning-based matching in AWS Entity Resolution, this Guidance can help you identify and link disparate patient records across data sources, a key step in processes such as Master Data Management (MDM) or Enterprise Master Patient Index (EMPI).
Architecture Diagram
Step 1
An Amazon EventBridge Scheduler schedule automatically starts the AWS Step Functions state machine at a set time. Alternatively, you can start the state machine on demand to perform patient entity resolution for your AWS HealthLake data store.
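As a minimal sketch, the schedule can be described by the keyword arguments that boto3's `scheduler.create_schedule` accepts. The schedule name, the daily 02:00 UTC cron expression, the input payload, and both ARNs are hypothetical placeholders; the role must allow `states:StartExecution` on the state machine.

```python
import json


def build_schedule_request(name, state_machine_arn, role_arn,
                           expression="cron(0 2 * * ? *)"):
    """Build kwargs for boto3's scheduler.create_schedule (values are placeholders)."""
    return {
        "Name": name,
        # EventBridge Scheduler cron: minutes hours day-of-month month day-of-week year
        "ScheduleExpression": expression,
        # Fire at the exact scheduled time instead of within a flexible window
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": state_machine_arn,  # the Step Functions state machine to start
            "RoleArn": role_arn,       # role that EventBridge Scheduler assumes
            "Input": json.dumps({"source": "scheduler"}),  # hypothetical input
        },
    }
```

You would pass the result to `boto3.client("scheduler").create_schedule(**request)`. For an on-demand run, `boto3.client("stepfunctions").start_execution(stateMachineArn=...)` starts the same state machine.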
Step 2
Amazon Athena fetches patient identifier information from the HealthLake data store. The Athena SQL query runs against the AWS Lake Formation resource link database.
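The query below sketches how the Patient table might be flattened into one row per patient. It assumes a table named `patient` in the resource link database and HealthLake's lowercase column and field names (for example `birthdate` and `postalcode`); verify both against your own table schema before use. `element_at` is used instead of `[1]` subscripts so that an empty array yields NULL rather than a query error.

```python
def build_patient_query(database, table="patient"):
    """Return Athena (Trino) SQL that extracts identifier attributes per Patient.

    Assumes lowercase HealthLake column names; only the first name, address,
    and phone entry of each FHIR array are kept.
    """
    return f"""SELECT
  id AS patient_id,
  element_at(element_at(name, 1).given, 1) AS first_name,
  element_at(name, 1).family AS last_name,
  element_at(element_at(address, 1).line, 1) AS street,
  element_at(address, 1).city AS city,
  element_at(address, 1).state AS state,
  element_at(address, 1).postalcode AS postal_code,
  element_at(filter(telecom, t -> t.system = 'phone'), 1).value AS phone,
  birthdate AS birth_date,
  gender
FROM "{database}"."{table}"
"""


def build_query_request(sql, database, output_s3):
    """Build kwargs for boto3's athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the result set as a CSV file under this S3 prefix
        "ResultConfiguration": {"OutputLocation": output_s3},
    }
```

Keeping only the first entry of each array is a simplification; patients with several addresses or phone numbers could instead be expanded with `CROSS JOIN UNNEST`.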
Step 3
The query result dataset is saved as a CSV file in an Amazon Simple Storage Service (Amazon S3) bucket. The patient identifier attributes that the query selects can include name, address, phone number, date of birth, and gender.
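To make the shape of that CSV concrete, here is a local Python equivalent of the flattening the Athena query performs, applied to FHIR R4 Patient JSON. This is an illustration only (in the Guidance, Athena produces the file), and the column names are hypothetical aliases chosen for this example.

```python
import csv
import io

FIELDS = ["patient_id", "first_name", "last_name", "street", "city",
          "state", "postal_code", "phone", "birth_date", "gender"]


def first(items, default):
    """Return the first element of a FHIR array, or default if it is missing or empty."""
    return items[0] if items else default


def flatten_patient(patient):
    """Map one FHIR R4 Patient resource to a flat row of identifier attributes."""
    name = first(patient.get("name"), {})
    address = first(patient.get("address"), {})
    phones = [t for t in patient.get("telecom") or [] if t.get("system") == "phone"]
    return {
        "patient_id": patient.get("id", ""),
        "first_name": first(name.get("given"), ""),
        "last_name": name.get("family", ""),
        "street": first(address.get("line"), ""),
        "city": address.get("city", ""),
        "state": address.get("state", ""),
        "postal_code": address.get("postalCode", ""),
        "phone": first(phones, {}).get("value", ""),
        "birth_date": patient.get("birthDate", ""),
        "gender": patient.get("gender", ""),
    }


def to_csv(patients):
    """Serialize flattened patients as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for patient in patients:
        writer.writerow(flatten_patient(patient))
    return buf.getvalue()
```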
Step 4
Once the patient dataset has been created in the previous step, an AWS Glue crawler crawls the dataset and populates an AWS Glue Data Catalog table. This table will then be ready for ingestion into AWS Entity Resolution.
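The crawler can be sketched as the keyword arguments for boto3's `glue.create_crawler`. The crawler name, role, database, and S3 path are hypothetical. The exclusion pattern assumes you want to skip the `.csv.metadata` files that Athena writes next to each result CSV, so that only the data file is cataloged.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build kwargs for boto3's glue.create_crawler (values are placeholders)."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role with read access to the S3 path
        "DatabaseName": database,  # Data Catalog database that receives the table
        "Targets": {
            "S3Targets": [{
                "Path": s3_path,
                # Skip Athena's *.csv.metadata side files (assumed glob pattern)
                "Exclusions": ["**.metadata"],
            }]
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",  # pick up new columns
            "DeleteBehavior": "LOG",                 # never drop tables silently
        },
    }
```

After creating the crawler, `glue.start_crawler(Name=...)` runs it. If every column is a string, the built-in CSV classifier may not detect the header row; a custom CSV classifier with `ContainsHeader` set to `PRESENT` is one way to handle that.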
Step 5
This Guidance uses the pre-configured machine learning (ML)-based matching technique to find matches across the input patient dataset. An AWS Entity Resolution schema mapping and a matching workflow are created to define how to match the input patient data and where to write the match results.
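The schema mapping and matching workflow can be sketched as request payloads for boto3's `entityresolution.create_schema_mapping` and `entityresolution.create_matching_workflow`. The field names reuse hypothetical CSV column names, and the group names and match keys are illustrative assumptions; check the AWS Entity Resolution documentation for which field types the ML-based matching model uses.

```python
# (fieldName, type, groupName, matchKey); None means "omit this key"
FIELDS = [
    ("patient_id", "UNIQUE_ID", None, None),
    ("first_name", "NAME_FIRST", "full_name", "NAME"),
    ("last_name", "NAME_LAST", "full_name", "NAME"),
    ("street", "ADDRESS_STREET1", "address", "ADDRESS"),
    ("city", "ADDRESS_CITY", "address", "ADDRESS"),
    ("state", "ADDRESS_STATE", "address", "ADDRESS"),
    ("postal_code", "ADDRESS_POSTALCODE", "address", "ADDRESS"),
    ("phone", "PHONE_NUMBER", "phone", "PHONE"),
    ("birth_date", "DATE", None, "DOB"),
    ("gender", "STRING", None, None),
]


def build_schema_mapping_request(schema_name):
    """Build kwargs for create_schema_mapping from the FIELDS table."""
    mapped = []
    for field, ftype, group, key in FIELDS:
        entry = {"fieldName": field, "type": ftype}
        if group:
            entry["groupName"] = group  # sub-fields combined into one attribute
        if key:
            entry["matchKey"] = key
        mapped.append(entry)
    return {"schemaName": schema_name, "mappedInputFields": mapped}


def build_matching_workflow_request(workflow_name, glue_table_arn, schema_name,
                                    output_s3, role_arn):
    """Build kwargs for create_matching_workflow using ML-based matching."""
    return {
        "workflowName": workflow_name,
        "inputSourceConfig": [{
            "inputSourceARN": glue_table_arn,  # Data Catalog table from the crawler
            "schemaName": schema_name,
            "applyNormalization": True,
        }],
        "outputSourceConfig": [{
            "outputS3Path": output_s3,
            "output": [{"name": field} for field, *_ in FIELDS],
            "applyNormalization": True,
        }],
        "resolutionTechniques": {"resolutionType": "ML_MATCHING"},
        "roleArn": role_arn,
    }
```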
An AWS Lambda function starts a job for the matching workflow. The job writes its results to a second Amazon S3 bucket, including an AWS Entity Resolution match ID and a confidence level for each record.
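A minimal sketch of the Lambda side follows, assuming one function starts the job and another reports its status so that the state machine can poll until the job finishes. The handler names, the `WORKFLOW_NAME` environment variable, and the event shape are hypothetical. The client parameter exists only so the logic can be exercised without AWS; in Lambda it defaults to `boto3.client("entityresolution")`.

```python
import os


def _client(client):
    """Return the injected client, or a real Entity Resolution client."""
    if client is None:
        import boto3  # available in the Lambda Python runtime
        client = boto3.client("entityresolution")
    return client


def handler(event, context, client=None):
    """Start a matching job and return its identifiers for the next state."""
    er = _client(client)
    workflow = event.get("workflowName") or os.environ["WORKFLOW_NAME"]
    response = er.start_matching_job(workflowName=workflow)
    return {"workflowName": workflow, "jobId": response["jobId"]}


def check_status(event, context, client=None):
    """Add the job status (for example RUNNING, SUCCEEDED, FAILED) to the event."""
    er = _client(client)
    job = er.get_matching_job(workflowName=event["workflowName"],
                              jobId=event["jobId"])
    return {**event, "status": job["status"]}
```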
Step 6
Once AWS Entity Resolution identifies matching patient records, a Lambda function reads and parses the AWS Entity Resolution results. It then writes each match ID whose confidence level meets a predefined threshold back into the corresponding patient resource as a new identifier attribute. Patient records can then be identified and matched across your HealthLake data store.
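The parsing and write-back logic might look like the following sketch. It assumes the results CSV has `patient_id`, `MatchID`, and `ConfidenceLevel` columns (confirm the exact column names and confidence scale in your own output), a threshold of 0.9, and a hypothetical identifier system URI. Replacing any earlier match identifier keeps repeated runs idempotent.

```python
import csv
import io

# Hypothetical identifier system used to tag Entity Resolution match IDs
MATCH_SYSTEM = "urn:example:entity-resolution:match-id"


def select_matches(csv_text, threshold=0.9, id_col="patient_id",
                   match_col="MatchID", conf_col="ConfidenceLevel"):
    """Return {patient_id: match_id} for rows at or above the threshold."""
    selected = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row.get(conf_col) or "")
        except ValueError:
            continue  # skip rows without a numeric confidence level
        if score >= threshold and row.get(match_col):
            selected[row[id_col]] = row[match_col]
    return selected


def add_match_identifier(patient, match_id):
    """Return a copy of a FHIR Patient with exactly one match-ID identifier."""
    updated = dict(patient)
    identifiers = [i for i in patient.get("identifier") or []
                   if i.get("system") != MATCH_SYSTEM]
    identifiers.append({"system": MATCH_SYSTEM, "value": match_id})
    updated["identifier"] = identifiers
    return updated
```

To persist the change, the function could send the updated resource with an HTTP `PUT` to the data store's FHIR endpoint at `Patient/<id>`, signed with SigV4 for the `healthlake` service; that request is omitted here.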
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Use Step Functions to orchestrate your entire workflow as a state machine. Step Functions coordinates the processing of multiple Lambda functions, so you can perform operations as code for automated processing, limit human error, and respond consistently to events. EventBridge can run the state machine on a schedule, reducing operational overhead and ensuring that entity resolution runs regularly. Additionally, you can automate your extract, transform, and load (ETL) process by using AWS Glue to crawl your patient dataset and populate the AWS Glue Data Catalog. Finally, monitor metrics and logs with Amazon CloudWatch to gain operational visibility and simplify troubleshooting.
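The orchestration can be sketched as an Amazon States Language definition generated in Python. It uses the Athena `startQueryExecution.sync` integration, AWS SDK integrations for the AWS Glue crawler, and `lambda:invoke` for the matching and write-back functions. The state names, function names, and 30-second polling interval are assumptions, and retries are left out to keep the sketch short.

```python
import json

LAMBDA_INVOKE = "arn:aws:states:::lambda:invoke"


def poll(prefix, check, status_path, done, next_state, failed=None, seconds=30):
    """Wait -> check -> Choice loop that exits when status_path equals done."""
    choices = [{"Variable": status_path, "StringEquals": done, "Next": next_state}]
    if failed:
        choices.append({"Variable": status_path, "StringEquals": failed,
                        "Next": "Failed"})
    return {
        f"{prefix}Wait": {"Type": "Wait", "Seconds": seconds, "Next": f"{prefix}Check"},
        f"{prefix}Check": {**check, "Next": f"{prefix}Choice"},
        f"{prefix}Choice": {"Type": "Choice", "Choices": choices,
                            "Default": f"{prefix}Wait"},
    }


def build_definition(sql, database, output_s3, crawler,
                     start_fn, status_fn, writeback_fn):
    """Return the state machine definition as a JSON string."""
    states = {
        "QueryPatients": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {
                "QueryString": sql,
                "QueryExecutionContext": {"Database": database},
                "ResultConfiguration": {"OutputLocation": output_s3},
            },
            "ResultPath": None,  # discard the query result, keep the input
            "Next": "StartCrawler",
        },
        "StartCrawler": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
            "Parameters": {"Name": crawler},
            "ResultPath": None,
            "Next": "CrawlerWait",
        },
        # The crawler returns to READY when it finishes
        **poll("Crawler", {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
            "Parameters": {"Name": crawler},
            "ResultSelector": {"state.$": "$.Crawler.State"},
            "ResultPath": "$.crawler",
        }, "$.crawler.state", "READY", "StartMatching"),
        "StartMatching": {
            "Type": "Task", "Resource": LAMBDA_INVOKE,
            "Parameters": {"FunctionName": start_fn, "Payload.$": "$"},
            "OutputPath": "$.Payload",  # e.g. {"workflowName": ..., "jobId": ...}
            "Next": "MatchingWait",
        },
        **poll("Matching", {
            "Type": "Task", "Resource": LAMBDA_INVOKE,
            "Parameters": {"FunctionName": status_fn, "Payload.$": "$"},
            "OutputPath": "$.Payload",  # adds a "status" field
        }, "$.status", "SUCCEEDED", "WriteBack", failed="FAILED"),
        "WriteBack": {
            "Type": "Task", "Resource": LAMBDA_INVOKE,
            "Parameters": {"FunctionName": writeback_fn, "Payload.$": "$"},
            "End": True,
        },
        "Failed": {"Type": "Fail", "Error": "MatchingJobFailed"},
    }
    return json.dumps({"StartAt": "QueryPatients", "States": states}, indent=2)
```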
Security
AWS Identity and Access Management (IAM) enforces least-privilege access and integrates with Lake Formation so that you can create and grant appropriate permissions to stakeholders. This allows your stakeholders to securely query your HealthLake data store using Athena. HealthLake encrypts data at rest and in transit by default. You can further strengthen your security posture by applying encryption, access controls, and versioning to your Amazon S3 buckets.
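As an illustration of least privilege, here is a sketch of an IAM policy for the write-back Lambda function from Step 6. It allows the function only to read and update resources in one HealthLake data store and to read objects under one results prefix. The data store ARN, bucket, and prefix are hypothetical; if the results bucket uses AWS KMS encryption, the role would also need `kms:Decrypt` on that key.

```python
import json


def writeback_policy(datastore_arn, results_bucket, results_prefix):
    """Return an IAM policy document scoped to one data store and one S3 prefix."""
    prefix = results_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadAndUpdatePatients",
                "Effect": "Allow",
                "Action": ["healthlake:ReadResource", "healthlake:UpdateResource"],
                "Resource": datastore_arn,
            },
            {
                "Sid": "ReadMatchResults",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{results_bucket}/{prefix}/*",
            },
            {
                "Sid": "ListMatchResults",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{results_bucket}",
                # Listing is limited to the results prefix
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }
```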
Reliability
Amazon S3 offers durable data storage by redundantly storing data across multiple Availability Zones (AZs). You can use Athena for reliable and highly available access to your data in Amazon S3. In addition, Step Functions orchestrates your workflows with built-in error handling and retry mechanisms. And by running your services on the AWS global infrastructure, which is designed for fault tolerance and high availability, you help ensure that issues in one Region do not impact services in other Regions.
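Step Functions retries are configured per state with a `Retry` field. As a worked example of how exponential backoff spaces those retries, the wait before retry *n* is `IntervalSeconds × BackoffRate^(n−1)`, optionally capped by `MaxDelaySeconds`. The helper below computes that schedule and ignores the optional jitter setting; the `RETRY` block and its values are an illustrative assumption you could attach to a Lambda task.

```python
# Example Retry field for a Lambda task state (values are assumptions)
RETRY = [{
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException",
                    "States.Timeout"],
    "IntervalSeconds": 2,
    "BackoffRate": 2.0,
    "MaxAttempts": 4,
}]


def retry_delays(interval_seconds=1, backoff_rate=2.0, max_attempts=3,
                 max_delay_seconds=None):
    """Seconds to wait before each retry under exponential backoff (no jitter)."""
    delays = []
    for attempt in range(1, max_attempts + 1):
        delay = interval_seconds * backoff_rate ** (attempt - 1)
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays
```

With the `RETRY` values above, a failing task waits 2, 4, 8, and 16 seconds (30 seconds in total) before Step Functions gives up and the error propagates to any `Catch` handler.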
Performance Efficiency
By using serverless technologies like EventBridge, Lambda, AWS Glue, Athena, and Amazon S3, this Guidance scales resources based on your workload demands. Furthermore, AWS Glue crawlers automate your ETL process by streamlining data preparation and minimizing manual effort. The advanced matching capabilities of AWS Entity Resolution accurately identify and link disparate patient records, optimizing resource utilization and reducing the need for manual intervention. You can then monitor the performance of your resources with CloudWatch to identify and address potential bottlenecks or inefficiencies.
Cost Optimization
Athena, Amazon S3, Lambda, AWS Glue, and EventBridge scale on demand and only charge you for the resources you use. With Athena, you can analyze data in your HealthLake data store without provisioning or managing any infrastructure, eliminating idle resource costs. AWS Entity Resolution follows a pay-per-use model, where you only pay for the number of source records processed by your workflows.
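Because AWS Entity Resolution charges by the number of source records processed, a scheduled job that reprocesses the full patient dataset on every run multiplies the record count by the number of runs. The arithmetic below is a sketch that assumes pricing is quoted per 1,000 records; the price is an input you take from the AWS Entity Resolution pricing page, not a published rate.

```python
def estimate_monthly_matching_cost(records_per_run, runs_per_month,
                                   price_per_1000_records):
    """Estimate monthly Entity Resolution cost (price supplied by the caller)."""
    billable_records = records_per_run * runs_per_month
    return billable_records / 1000 * price_per_1000_records
```

For example, with a hypothetical price of 0.50 per 1,000 records, processing 200,000 patients weekly (4 runs per month) bills 800,000 records, or 400.0 in that currency, so the run frequency directly drives this part of the cost.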
Sustainability
EventBridge and Step Functions orchestrate workflows in a resilient, efficient manner with minimal resources. And by using Amazon S3, Lambda, Athena, and other serverless services that utilize the renewable energy infrastructure of AWS, your architecture is equipped to scale efficiently, optimizing energy usage.
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.