This Guidance shows how to use AWS Entity Resolution to perform patient entity resolution on healthcare data stored in AWS HealthLake. With HealthLake, you can establish comprehensive patient profiles with confidence scores, facilitating more accurate data management and maintaining data integrity across your environment. Furthermore, by integrating machine learning capabilities, this Guidance can assist you in identifying and linking disparate patient records across data sources, a key step in processes such as Master Data Management (MDM) or Enterprise Master Patient Index (EMPI).

Architecture Diagram

[Architecture diagram description]

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • Use AWS Step Functions to orchestrate your entire workflow as a state machine. Step Functions coordinates multiple AWS Lambda functions, so you can perform operations as code, limit human error, and respond to events consistently. Amazon EventBridge can run the state machine on a schedule, reducing operational overhead and ensuring that entity resolution runs regularly. Additionally, you can automate your extract, transform, and load (ETL) process by using AWS Glue crawlers to crawl your patient dataset and populate the AWS Glue Data Catalog. Finally, monitor metrics and logs using Amazon CloudWatch to gain operational visibility and simplify troubleshooting.

    Read the Operational Excellence whitepaper 
  • Use AWS Identity and Access Management (IAM) policies to enforce least-privilege access, and integrate IAM with AWS Lake Formation to create and grant appropriate permissions to stakeholders. This allows your stakeholders to securely query your HealthLake data store using Amazon Athena. HealthLake encrypts data at rest and in transit by default. You can further enhance your security posture by enabling encryption, access controls, and versioning on your Amazon S3 buckets.

    Read the Security whitepaper 
  • Amazon S3 offers durable data storage with automatic replication across multiple Availability Zones (AZs). You can use Athena for reliable and highly available access to your data in Amazon S3. In addition, you can orchestrate your workflows using Step Functions, which provides built-in error handling and retry mechanisms. By running your services on the global infrastructure of AWS, which is designed for fault tolerance and high availability, you help ensure that issues in one Region do not impact services in other Regions.

    Read the Reliability whitepaper 
  • By using serverless technologies like EventBridge, Lambda, AWS Glue, Athena, and Amazon S3, this Guidance scales your configured resources based on your workload demands. Furthermore, with AWS Glue crawlers, you can automate your ETL process by streamlining data preparation and minimizing manual effort. Also, use the advanced matching capabilities of AWS Entity Resolution to accurately identify and link disparate patient records, optimizing resource utilization and reducing the need for manual intervention. You can then monitor the performance of your resources using CloudWatch, so you can identify and address potential bottlenecks or inefficiencies.

    Read the Performance Efficiency whitepaper 
  • Athena, Amazon S3, Lambda, AWS Glue, and EventBridge scale on demand and only charge you for the resources you use. With Athena, you can analyze data in your HealthLake data store without provisioning or managing any infrastructure, eliminating idle resource costs. AWS Entity Resolution follows a pay-per-use model, where you only pay for the number of source records processed by your workflows.

    Read the Cost Optimization whitepaper 
  • EventBridge and Step Functions orchestrate workflows in a resilient, efficient manner with minimal resources. And by using Amazon S3, Lambda, Athena, and other serverless services that utilize the renewable energy infrastructure of AWS, your architecture is equipped to scale efficiently, optimizing energy usage.

    Read the Sustainability whitepaper 
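The orchestration and retry behavior described in the Operational Excellence and Reliability points above can be sketched as a small Amazon States Language definition built in Python. This is a minimal illustration under stated assumptions, not the definition that ships with this Guidance: the state names, the two Lambda function ARNs, and the retry values are hypothetical placeholders.

```python
import json

# Hypothetical Lambda ARNs: the account ID, Region, and function names are
# placeholders for illustration and do not come from the Guidance.
EXPORT_FN = "arn:aws:lambda:us-east-1:111122223333:function:export-patients"
MATCH_FN = "arn:aws:lambda:us-east-1:111122223333:function:start-matching"


def task_state(resource_arn, next_state=None):
    """Build a Task state that retries failed invocations with backoff."""
    state = {
        "Type": "Task",
        "Resource": resource_arn,
        # Retry values are assumptions; tune them for your workload.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    # A state either transitions to the next state or ends the execution.
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition():
    """Assemble a two-step workflow: export patient data, then start matching."""
    return {
        "Comment": "Hypothetical patient entity resolution workflow",
        "StartAt": "ExportPatients",
        "States": {
            "ExportPatients": task_state(EXPORT_FN, next_state="StartMatching"),
            "StartMatching": task_state(MATCH_FN),
        },
    }


definition_json = json.dumps(build_definition(), indent=2)
print(definition_json)
```

A JSON string like this could be supplied as the definition when creating a state machine, and an EventBridge rule with a schedule expression such as rate(1 day) could then start it on a recurring basis. The actual states, error handling, and schedule depend on your deployment.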

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
