AWS for Industries

Clinical doppelgangers: Accelerating pediatric insights

Treating complex patients can be difficult. The ability to see how other patients with similar symptoms or conditions were treated and what their outcomes were can help clinicians better understand how to approach their own patients. Traditional search, however, falls short for nuanced clinical care scenarios (for example, post-Norwood arrhythmias responsive to alternative pacing).

Clinicians need to reason over context, not only keywords, to connect anatomy, intervention, and outcomes. The Clinical Doppelgangers project is a clinician-led initiative at Boston Children’s Hospital that uses large language models (LLMs), alongside Amazon Web Services (AWS) managed services, to surface patients with similar characteristics, such as clinical symptoms, conditions, and lab values.

Clinical Doppelgangers transforms unstructured narratives into queryable signals (semantic embeddings and sparse lexical vectors) and joins them with structured data. It serves as an agentic LLM system that helps clinicians and researchers ask natural language questions and receive interpretable, source-attributed answers.
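
To make the idea of combining semantic embeddings with sparse lexical vectors concrete, here is a minimal sketch in Python. It is not the project's implementation: the function names, the `alpha` blending weight, the toy vectors, and the note identifiers are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(query_terms, doc_terms):
    """Dot product of two sparse term-weight maps (term -> weight)."""
    return sum(w * doc_terms[t] for t, w in query_terms.items() if t in doc_terms)


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """Blend dense and sparse similarity; alpha is an assumed tuning weight.

    The sparse dot product is unbounded, so a real system would likely
    normalize both scores before blending them.
    """
    dense_part = cosine(q_dense, d_dense)
    sparse_part = sparse_dot(q_sparse, d_sparse)
    return alpha * dense_part + (1 - alpha) * sparse_part


def rank_notes(q_dense, q_sparse, notes, alpha=0.5):
    """Return (note_id, score) pairs sorted from best to worst match."""
    scored = [
        (note_id, hybrid_score(q_dense, q_sparse, dense, sparse, alpha))
        for note_id, (dense, sparse) in notes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Synthetic toy data: two-dimensional "embeddings" and hand-written term weights.
NOTES = {
    "note_a": ([1.0, 0.0], {"norwood": 1.0, "pacing": 0.2}),
    "note_b": ([0.0, 1.0], {"glenn": 1.0}),
}
QUERY_DENSE = [1.0, 0.0]
QUERY_SPARSE = {"norwood": 1.0, "arrhythmia": 0.5}

if __name__ == "__main__":
    for note_id, score in rank_notes(QUERY_DENSE, QUERY_SPARSE, NOTES):
        print(note_id, round(score, 3))
```

The dense part rewards notes that are close in meaning even when the wording differs, while the sparse part rewards notes that share specific weighted terms. Blending the two is one common way to get both behaviors from a single ranking.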

What we built

Boston Children’s Hospital implemented a complete pipeline that has processed over 6,000 complex pediatric cardiac intensive care unit (CICU) cases. In doing so, the team developed a knowledge base of over 250 structured measurements (including over 500 sparse vector representations and thousands of semantic embeddings) that captures clinical meaning alongside numerical measurements. An LLM agent backed by this knowledge base can transform a natural language query into a search object, use that object to query the repository, and assemble a cohort of relevant patients within seconds to minutes.
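
As an illustration of what a search object might look like, the sketch below parses a JSON string (standing in for LLM output) into a small validated structure. The field names (`semantic_text`, `lexical_terms`, `numeric_filters`), the allowed operators, and the measurement name in the example are hypothetical; the source does not describe the actual schema.

```python
import json
from dataclasses import dataclass

# Assumed set of comparison operators a numeric filter may use.
ALLOWED_OPS = {"<", "<=", ">", ">="}


@dataclass(frozen=True)
class NumericFilter:
    measurement: str  # hypothetical name of a structured measurement
    op: str           # one of ALLOWED_OPS
    value: float


@dataclass(frozen=True)
class CohortQuery:
    semantic_text: str          # text to embed for semantic search
    lexical_terms: tuple = ()   # terms for sparse lexical matching
    numeric_filters: tuple = () # constraints on structured data


def parse_cohort_query(raw_json: str) -> CohortQuery:
    """Validate a JSON string and convert it to a CohortQuery."""
    data = json.loads(raw_json)
    text = str(data.get("semantic_text", "")).strip()
    if not text:
        raise ValueError("semantic_text is required")
    filters = []
    for item in data.get("numeric_filters", []):
        if item["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {item['op']}")
        filters.append(
            NumericFilter(item["measurement"], item["op"], float(item["value"]))
        )
    terms = tuple(str(t) for t in data.get("lexical_terms", []))
    return CohortQuery(text, terms, tuple(filters))


# Example payload; the measurement name and threshold are made up.
EXAMPLE = json.dumps({
    "semantic_text": "arrhythmia after Glenn with good response to pacing",
    "lexical_terms": ["glenn", "arrhythmia", "pacing"],
    "numeric_filters": [{"measurement": "age_days", "op": "<", "value": 365}],
})

if __name__ == "__main__":
    print(parse_cohort_query(EXAMPLE))
```

Validating the object before it reaches the search layer is a design choice worth considering: it keeps malformed or unexpected model output from turning into an unintended query.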

When a clinician wants to find patients who have, for example, hypoplastic left heart and arrhythmia after Glenn with good response to pacing, the LLM interprets intent and plans a multistep search across our knowledge base to develop a relevant patient cohort. The clinician can then review these similar patients to help inform decisions for the patient at hand.
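
A multistep search of this kind can be sketched as an ordered plan of steps, each narrowing or reordering the cohort, with a trace that records what every step kept. Everything below is an assumption for illustration: the `Patient` fields, the step functions, the precomputed `note_similarity` score, and the synthetic records. None of it is real patient data or the project's actual planner.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    diagnoses: set
    procedures: set
    pacing_response: str
    note_similarity: float  # assumed precomputed similarity to the query


def has_diagnoses(required):
    """Step factory: keep patients whose diagnoses include all required ones."""
    return lambda cohort: [p for p in cohort if required <= p.diagnoses]


def has_procedure(name):
    """Step factory: keep patients who underwent the named procedure."""
    return lambda cohort: [p for p in cohort if name in p.procedures]


def has_pacing_response(value):
    """Step factory: keep patients with the given pacing response."""
    return lambda cohort: [p for p in cohort if p.pacing_response == value]


def top_k_by_similarity(k):
    """Step factory: keep the k patients most similar to the query."""
    return lambda cohort: sorted(
        cohort, key=lambda p: p.note_similarity, reverse=True
    )[:k]


def run_plan(patients, plan):
    """Apply each (name, step) in order; return the cohort and a step trace."""
    cohort = list(patients)
    trace = []
    for name, step in plan:
        cohort = step(cohort)
        trace.append((name, [p.patient_id for p in cohort]))
    return cohort, trace


# Synthetic records for illustration only.
PATIENTS = [
    Patient("P1", {"hlhs", "arrhythmia"}, {"glenn"}, "good", 0.90),
    Patient("P2", {"hlhs", "arrhythmia"}, {"norwood"}, "good", 0.80),
    Patient("P3", {"hlhs"}, {"glenn"}, "good", 0.70),
    Patient("P4", {"hlhs", "arrhythmia"}, {"glenn"}, "poor", 0.95),
    Patient("P5", {"hlhs", "arrhythmia"}, {"glenn"}, "good", 0.60),
]

PLAN = [
    ("diagnoses", has_diagnoses({"hlhs", "arrhythmia"})),
    ("procedure", has_procedure("glenn")),
    ("pacing", has_pacing_response("good")),
    ("rank", top_k_by_similarity(2)),
]

if __name__ == "__main__":
    cohort, trace = run_plan(PATIENTS, PLAN)
    for name, ids in trace:
        print(name, ids)
```

Keeping a per-step trace is one simple way to make the resulting cohort interpretable, since a reviewer can see which criterion excluded which patient.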

Leveraging AWS to scale and move fast

The high-level architecture for Clinical Doppelgangers is shown in Figure 1 and includes the following components:

  • Ingestion and orchestration
    • Clinical notes and PDFs land in Amazon Simple Storage Service (Amazon S3)
    • AWS Step Functions orchestrate the workflow
    • AWS Lambda validates payloads and coordinates next tasks
  • Extraction
    • Amazon Textract performs optical character recognition (OCR) and layout extraction on PDFs
    • Amazon Comprehend Medical and Amazon Bedrock tag diagnoses, medications, procedures, and attributes
    • A clinical embeddings model hosted on Amazon SageMaker generates vectors
    • A SPLADE (sparse lexical and expansion model) document encoder, also hosted on Amazon SageMaker, produces sparse maps of each note section and report
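  • Storage and search
    • Amazon OpenSearch Service indexes the vectors
    • Amazon Aurora PostgreSQL stores the extracted entities and structured data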
  • Agentic querying and reasoning
    • Amazon Bedrock Agents parse the clinician’s question, generate a search data structure, and iteratively search the knowledge repository
    • LLMs are also used to generate high-level cohort answers based on the clinician’s query
  • Security
    • Everything runs in a private Amazon Virtual Private Cloud (Amazon VPC) with interface endpoints
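
The validation and coordination role of AWS Lambda in the ingestion step can be sketched as a small handler. This is a hypothetical example, not the hospital's code: the payload shape (`bucket` and `key` fields passed in by the state machine), the supported file extensions, and the `next_task` labels are all assumptions.

```python
import os

# Assumed routing: PDFs need OCR first, plain-text notes can go straight to tagging.
NEXT_TASK_BY_EXTENSION = {
    ".pdf": "ocr",
    ".txt": "tagging",
}


def handler(event, context):
    """Validate an ingestion payload and decide which task should run next.

    `event` is assumed to be a dict such as {"bucket": "...", "key": "..."}.
    `context` is the Lambda context object and is not used here.
    """
    if not isinstance(event, dict):
        raise ValueError("payload must be a JSON object")

    bucket = event.get("bucket")
    key = event.get("key")
    if not isinstance(bucket, str) or not bucket:
        raise ValueError("payload must include a non-empty 'bucket'")
    if not isinstance(key, str) or not key:
        raise ValueError("payload must include a non-empty 'key'")

    extension = os.path.splitext(key)[1].lower()
    if extension not in NEXT_TASK_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")

    # The returned dict becomes the state output that later states can read.
    return {
        "bucket": bucket,
        "key": key,
        "next_task": NEXT_TASK_BY_EXTENSION[extension],
    }


if __name__ == "__main__":
    print(handler({"bucket": "example-bucket", "key": "notes/report.pdf"}, None))
```

In a Step Functions workflow, a Choice state could then branch on the returned `next_task` value, and a raised error could be caught and routed to a failure-handling state.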

The diagram shows a high-level architecture for the Clinical Doppelgangers solution. A clinician at the top gives a natural language query to the solution, which runs in a HIPAA-compliant Virtual Private Cloud (VPC) within the AWS Cloud. Within this VPC, five boxes are shown.

The leftmost box is labeled “Document Ingestion” and contains three small icons connected by arrows pointing to the right: “Amazon S3,” “AWS Step Functions,” and “AWS Lambda.” The second box from the left is labeled “Agentic Preprocessing” and contains four small icons. A dotted arrow runs from the first box’s “AWS Lambda” icon to the second box’s “Amazon Textract” icon, which has three outgoing arrows: one labeled “Full Text” to an “Amazon Bedrock (Embeddings)” icon, one labeled “Raw Text” to an “Amazon Comprehend Medical” icon, and one labeled “Structured Data” to an “Amazon SageMaker (Text-to-SQL)” icon. The third box is labeled “Storage & Search” and contains two icons: “Amazon OpenSearch Service,” which receives an arrow labeled “Vectors” from the second box’s “Amazon Bedrock” icon, and “Amazon Aurora PostgreSQL,” which receives one arrow labeled “Entities” from the second box’s “Amazon Comprehend Medical” icon and a second arrow labeled “SQL Insert” from the second box’s “Amazon SageMaker” icon.

Above the second and third boxes is a fourth box labeled “Query Interface,” which contains two icons: “Amazon Cognito” and “Amazon API Gateway.” The clinician’s natural language query arrow points to the “Amazon Cognito” icon, which has an arrow pointing to the “Amazon API Gateway” icon. One arrow points from “Amazon API Gateway” to the fifth box’s “Amazon Bedrock Agents” icon, and another points from the fifth box’s “Amazon Bedrock Guardrails” icon back to “Amazon API Gateway.” These two arrows have the word “Cohort” near them. The fifth box is labeled “Agentic Reasoning.”

Figure 1: High-level architecture for Clinical Doppelgangers

Early clinical impact and what’s next

Having received very positive feedback from its clinicians, Boston Children’s Hospital is working to further expand the Clinical Doppelgangers project and increase its clinical impact. For time to insight, the team is targeting an 80% reduction in chart review to improve decision time for complex patient cases.

Clinical Doppelgangers can also be instrumental for research acceleration, where faster cohort identification is vital for quality improvement and research projects. The roadmap for the Clinical Doppelgangers project includes expanding beyond the CICU to additional ICUs and specialties. The team is also looking to integrate with electronic health records (EHRs) to streamline workflows and to extend what it has learned to other institutions (including both pediatric and adult healthcare organizations).

Where to learn more

Contact an AWS Representative to learn how we can help accelerate your business.

Learn more about AWS for Healthcare & Life Sciences and our curated AWS services or check out the AWS Partner Network solutions used by thousands of healthcare and life sciences customers globally. Visit the AWS Healthcare Solutions webpage or check out the AWS Health Data Portfolio site. You can also read more blogs about AWS healthcare stories.

Dinesh Rai, MD

Dinesh Rai, MD, is a Clinical AI Engineer at Boston Children's Hospital, with a background in emergency medicine and clinical informatics. He focuses on applying AI, including large language models and natural language processing, to practical healthcare challenges. Dinesh works on projects aimed at improving clinical workflows, patient care, and medical education through AI.

Angela Zhang

Angela Zhang is a Program Manager in the Innovation and Digital Health Accelerator at Boston Children’s Hospital, where she oversees strategy and operational best practices across the accelerator and emerging tech/AI workstreams. She leads AI use case prioritization and triage through the development of products and contributes to the hospital’s AI governance strategy to support responsible, scalable innovation.

Christine Tsien Silvers, MD, PhD

Christine Tsien Silvers, MD, PhD, serves as Healthcare Executive Advisor at AWS. Her research at MIT and Harvard Medical School since the 1990s focused on AI/ML and their use to improve patient care. Trained at Massachusetts General Hospital and Brigham and Women’s Hospital, she is Board certified in Emergency Medicine as well as Clinical Informatics. In the 20+ years prior to joining AWS, Chris worked clinically and then served as Chief Medical Officer at two healthcare technology startups. She is passionate about leveraging technology to improve health.

John Brownstein, PhD

John Brownstein, PhD, is a Professor of Pediatrics at Harvard Medical School and SVP and Chief Innovation Officer of Boston Children’s Hospital. He directs the Computational Epidemiology Laboratory and the Innovation and Digital Health Accelerator. His research has been instrumental in furthering the control and prevention of disease, improving public health practice, and engaging the public in health issues.