Interact Conversationally with AWS HealthLake

Large language models (LLMs) are revolutionizing the way we interact with technology, and AWS HealthLake is no exception. HealthLake is a secure, HIPAA-eligible data lake that allows healthcare organizations to store, transform, and analyze their health data at scale. By combining HealthLake with an LLM, healthcare providers can interact conversationally with their data, gaining insights and making decisions faster than ever before. One way to use an LLM with HealthLake is through a chatbot interface. A chatbot is a computer program that uses natural language processing (NLP) to simulate a human conversation.

With a chatbot, healthcare providers can ask questions about their data and receive real-time answers. Users can ask questions about both their structured data (e.g., Electronic Health Records (EHR)) and their unstructured data (e.g., doctor’s notes). For example, a provider could ask the chatbot, “What is the average LDL value for all patients since 2017?” In this example, the chatbot would construct a query using structured query language (SQL), run this query against HealthLake, analyze the data, and provide the answer. Or, consider a case where Amazon Kendra has indexed the doctor’s notes in HealthLake. A clinician could ask to “Search doctors notes for Tommy814’s socioeconomic status.” The chatbot could extract the key words (Tommy814 and socioeconomic), query Amazon Kendra, analyze its top suggestions, and summarize them as they relate to the question.

Problem statement

Despite having aggregated their disparate data sources into HealthLake, healthcare providers still rely heavily on legacy document search methods and data engineering teams for their reporting. Relying on a data engineering team for impromptu queries of structured data delays cloud migration, and legacy search methods on unstructured data are time consuming. This results in long hours for administrators and clinicians during reporting periods.

This reduces clinicians’ focus on patient care, which in turn lowers the quality of care. Furthermore, delays in data migration have cascading effects as multiple teams rely on critical data pipelines to inform business and clinical decisions.

Therefore, healthcare providers need a way to interact with structured data like EHRs and unstructured data like doctor’s notes in HealthLake using natural language.

A two-pronged solution

This solution uses Amazon Kendra, Amazon Athena, and an LLM to provide natural language access to structured EHR data and unstructured doctor’s notes.

This architecture uses HealthLake to host FHIR formatted EHR data, Amazon Simple Storage Service (Amazon S3) as the storage layer of the data pipeline, AWS Glue Studio Jobs to extract, transform, and load (ETL) our EHR data, Amazon Kendra for machine learning (ML) powered search of our doctor’s notes, and Amazon SageMaker to serve an LLM for SQL generation and document summarization. Following modern data architecture best practices, this solution adheres to the foundational logical layers of the Lake House Architecture.

This LLM can work with many other data sources and is not limited to only FHIR formatted EHR data and doctor’s notes. Any structured and unstructured data you put into Athena and Amazon Kendra respectively can be made accessible by the chatbot, as shown in the following figure.

Our HealthLake chatbot offers a user journey that is streamlined and easy to use. Here are the steps:

First, we pre-populate a HealthLake datastore with synthetic patient and population health data from Synthea in the HL7 FHIR format. Then, AWS Glue automatically crawls this datastore to create a database, which can be queried using the Python package pyathena as part of our structured-data ETL.
Next, AWS Glue Studio Python Shell Jobs extract, transform, and load both our structured EHR data and unstructured doctor’s notes. The doctor’s notes are placed in a folder to be indexed by Amazon Kendra, while the EHR ETL output is ported to an Athena database stored in Amazon S3 using an AWS Glue Crawler.
Users can easily interact with our chatbot through Amazon Lex.
We use a SageMaker Endpoint to facilitate interactions with the LLM.
When a user asks a question without the ‘doctors notes’ keywords, the question is sent to the LLM along with around twenty-five examples of questions and their associated SQL queries. This is an example of few-shot learning, and it’s how the LLM learns the database schema and the lexicon of our queriers. If the generated query is valid, then it is run against the Athena database and the result is returned to the user through the chatbot.
If the user prepends their query with ‘search doctors notes for’, then the query is passed to Amazon Kendra to find the most relevant documents. Next, the text from the top two most relevant documents is passed to our LLM with a prompt instructing the LLM to summarize the doctor’s notes as they relate to the query. Then, the results are passed back to the user through the chatbot.
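
The routing rule in the last two steps can be sketched in a few lines of Python. This is a minimal illustration, not the deployed Lambda code: the function name `route_question` is hypothetical, and the case-insensitive matching and tolerance for an apostrophe in "doctor's" are assumptions.

```python
import re

# Prefix taken from the steps above. Case-insensitive matching and the
# optional apostrophe are assumptions made for this sketch.
NOTES_PREFIX = re.compile(r"^\s*search\s+doctor['’]?s\s+notes\s+for\s+", re.IGNORECASE)


def route_question(question):
    """Return ("kendra", search_text) or ("sql", question)."""
    match = NOTES_PREFIX.match(question)
    if match:
        # Document search path: strip the prefix and search for the rest.
        return "kendra", question[match.end():].strip()
    # Structured data path: the whole question goes to the SQL-generation prompt.
    return "sql", question.strip()


print(route_question("Search doctors notes for Tommy814's age."))
print(route_question("How many patients are there with the condition asthma?"))
```

Keeping the routing in one small function makes it easy to replace the keyword rule with something more flexible later without touching either downstream path.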

Walkthrough – Deploying the infrastructure stack

Prerequisites: you must have a default VPC set up and you must be in the us-east-1 Region.
Using the AWS Management Console, navigate to AWS Lake Formation. In the left pane, select ‘Administrative roles and tasks.’ Click the ‘Manage administrators’ button, and from the dropdown, find the IAM role you are going to use to launch the CloudFormation stack (likely the role you are currently logged in as). If you don’t know your IAM role, look to the right of the region dropdown in the upper right corner of the console screen. The ID displayed there follows the format of role/user@account-id.
Launch the Stack.
Enter a valid Docker user name (not email) and password (get one for free here).
Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
Select I acknowledge that AWS CloudFormation might require the following capability: AUTO_EXPAND.
Choose Create stack.
Wait approximately 60 minutes for AWS CloudFormation to create the infrastructure stack and data.
Navigate to CloudFormation and select the name of the stack you just created (default is ‘HealthLakeBot’).
Within that CloudFormation stack, select ‘Outputs’, and then the link associated with ‘WebAppUrl.’

Questions to ask HealthLakeBot

Patient journey questions:

What medications does the patient Tommy814 Sauer652 take?
What procedures has the patient Tommy814 Sauer652 had?

Population health questions:

What is the average medication cost per patient for patients taking the medication Epinephrine?
What is the average medication cost for patients taking the medication penicillin?
How many patients are there with the condition asthma?
What is the average LDL value for all patients since 2017?

Amazon Kendra questions:

Search doctors notes for Tommy814’s socioeconomic status.
Search doctors notes for whether Tommy814 has ever smoked.
Search doctors notes for Tommy814’s age.

Access to tables

This solution uses a pre-trained LLM and a question-and-answer style prompt to generate the SQL query, and then uses the LLM again to turn the query result into a response. The user’s interaction with the tabular data in HealthLake is enabled through Amazon Lex, which can host the chatbot as a full-page chatbot UI or embed it into an existing site as a chatbot widget.

The process begins with a user asking a question through the chatbot window. The question is sent to an LLM hosted on a SageMaker Endpoint through an AWS Lambda function. The user’s question is prepended with a set of questions and their associated SQL queries. This dataset is called a “few-shot learning” dataset, and it helps the LLM to learn the database schema and the lexicon of the queriers.
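
As a rough sketch of the call from the Lambda function to the SageMaker Endpoint, the function below sends a prompt and reads back the generated text. `invoke_endpoint` is the real SageMaker Runtime operation, but the request schema (`inputs`, `parameters`) and the response shape (`[{"generated_text": ...}]`) are assumptions that depend on the model container you deploy. The endpoint name and the `FakeRuntime` class are hypothetical and exist only so the example runs offline.

```python
import io
import json


def generate_text(client, endpoint_name, prompt):
    """Send a prompt to a SageMaker endpoint and return the generated text."""
    # Assumed request schema; check the documentation of your model container.
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(response["Body"].read())
    return body[0]["generated_text"]  # assumed response shape


class FakeRuntime:
    """Offline stand-in for boto3.client("sagemaker-runtime")."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        canned = [{"generated_text": "SELECT COUNT(*) FROM patients"}]
        return {"Body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


print(generate_text(FakeRuntime(), "hypothetical-llm-endpoint", "Question: ...\nAnswer:"))
```

In a real deployment you would pass `boto3.client("sagemaker-runtime")` instead of `FakeRuntime()`.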

The SQL query is sent to an Athena database, which runs the query and returns the results to the LLM. The LLM uses the original question and the database result to create a conversational answer. Then, this natural language answer is sent back to the user through the chatbot interface.
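
The query-then-answer step can be illustrated as follows. To keep the example runnable without an AWS account, it uses an in-memory SQLite database as a stand-in for Athena; the `conditions` table, its rows, and the wording of the answer prompt are all assumptions made for this sketch. Because `run_query` only relies on the standard Python DB-API cursor interface, a pyathena cursor could be passed in its place.

```python
import sqlite3


def run_query(cursor, sql):
    """Run a query on any DB-API 2.0 cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()


def answer_prompt(question, rows):
    """Combine the question and the raw result into a prompt for the LLM."""
    return (
        "Answer the question in one sentence using only the query result.\n"
        f"Question: {question}\n"
        f"Query result: {rows}\n"
        "Answer:"
    )


# Stand-in data. With Athena the cursor would come from pyathena instead, e.g.
# pyathena.connect(s3_staging_dir=..., region_name=...).cursor()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conditions (patient TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?)",
    [("p1", "Asthma"), ("p2", "Asthma"), ("p2", "Asthma"), ("p3", "Hypertension")],
)

question = "How many patients are there with the condition asthma?"
sql = "SELECT COUNT(DISTINCT patient) FROM conditions WHERE LOWER(description) LIKE '%asthma%'"
rows = run_query(conn.cursor(), sql)
print(answer_prompt(question, rows))
```

The printed prompt is what would be sent back to the LLM so that it can phrase the raw result as a conversational answer.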

This process allows users to conversationally interact with structured EHRs. It can be used to answer a variety of questions about a patient’s health, including questions about medications, diagnoses, and procedures. It can also be used to generate reports and summaries of a patient’s health data.

This process can help to improve the quality of care that patients receive by providing them with accurate and up-to-date information about their health. It can also help to reduce the time that healthcare providers spend on administrative tasks.

Access to documents

Amazon Kendra is a natural language search service that can be used to find information in a variety of documents, including doctor’s notes. The chatbot answers clinical questions about doctor’s notes stored in HealthLake by passing the user’s query to Amazon Kendra to find the most relevant documents. The top two most relevant documents are then passed to an LLM with a prompt instructing the model to summarize the doctor’s notes as they relate to the query. The results are then passed back to the user through the chatbot.

Here is an example of how this process might work:

A user asks the chatbot, “Search doctors notes for Tommy814’s socioeconomic status.” The chatbot passes the query to Kendra, which finds the most relevant doctor’s notes in HealthLake. The top two documents are then passed to the LLM, which is instructed to summarize the doctor’s notes as they relate to the query. Then, the results are concisely passed back to the user through the chatbot interface: “Tommy comes from a middle socioeconomic background.” Relevant excerpts from the doctor’s notes that were used to generate the answer are also passed to the user along with their associated hyperlinks, which can be used to download the full document.
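
A minimal sketch of how the summarization prompt might be assembled from a search response is shown below. The dictionary mimics the `ResultItems`, `DocumentTitle`, `DocumentExcerpt`, and `DocumentURI` fields of a Kendra Query response, but the documents, the bucket name, and the prompt wording are invented for illustration. The sketch also uses the returned excerpts rather than the full document text, which is an assumption.

```python
def summarization_prompt(query, kendra_response, top_n=2):
    """Build a prompt from the top_n excerpts of a Kendra Query response."""
    items = kendra_response.get("ResultItems", [])[:top_n]
    notes = "\n\n".join(
        f"Note {i} ({item['DocumentTitle']['Text']}):\n{item['DocumentExcerpt']['Text']}"
        for i, item in enumerate(items, start=1)
    )
    return (
        "Summarize the doctor's notes below as they relate to the question.\n\n"
        f"{notes}\n\n"
        f"Question: {query}\n"
        "Summary:"
    )


# Hand-written stand-in for the result of
# boto3.client("kendra").query(IndexId=..., QueryText=...)
fake_response = {
    "ResultItems": [
        {
            "DocumentTitle": {"Text": "note_001.txt"},
            "DocumentExcerpt": {"Text": "Patient reports full-time employment."},
            "DocumentURI": "s3://hypothetical-bucket/notes/note_001.txt",
        },
        {
            "DocumentTitle": {"Text": "note_002.txt"},
            "DocumentExcerpt": {"Text": "Patient has stable housing."},
            "DocumentURI": "s3://hypothetical-bucket/notes/note_002.txt",
        },
        {
            "DocumentTitle": {"Text": "note_003.txt"},
            "DocumentExcerpt": {"Text": "Routine follow-up, no changes."},
            "DocumentURI": "s3://hypothetical-bucket/notes/note_003.txt",
        },
    ]
}

prompt = summarization_prompt("Tommy814's socioeconomic status", fake_response)
print(prompt)
```

The `DocumentURI` values are what the chatbot could surface as the download links mentioned above.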

This process can answer a variety of clinical questions about doctor’s notes. It can help to improve the quality of care that patients receive by providing them with accurate and up-to-date information about their health.

LLMs

An LLM is a type of artificial intelligence (AI) that has been trained on a large corpus of text. LLMs can generate text, translate languages, and answer questions in a comprehensive and informative way.

Training on a massive amount of text data allows LLMs to learn the statistical relationships between words and phrases. As a result, they can generate text that resembles human writing, translate between languages with a high degree of accuracy, and answer questions even when they are open-ended or challenging.

You can think of LLMs as probable-next-word generators. For example, if you posed the following statement to an LLM:

“The capital of Washington State is”

Then the LLM would, drawing on its training on huge amounts of text data, choose the most probable next word in your sentence and reply:

“Olympia.”
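
To make the "probable next word" idea concrete, here is a toy illustration that counts which word follows each three-word context in a tiny, made-up corpus and returns the most frequent one. Real LLMs use neural networks over tokens rather than count tables, so treat this only as an analogy.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, lowercased for simplicity.
corpus = [
    "the capital of washington state is olympia",
    "olympia is the capital of washington state",
    "the capital of oregon is salem",
]

# Count which word follows each three-word context in the corpus.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 3):
        context = tuple(words[i:i + 3])
        following[context][words[i + 3]] += 1


def next_word(prompt):
    """Return the most frequent continuation of the prompt's last three words."""
    context = tuple(prompt.lower().split()[-3:])
    candidates = following.get(context)
    if not candidates:
        return None  # context never seen in the corpus
    return candidates.most_common(1)[0][0]


print(next_word("The capital of Washington State is"))
```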

Few-shot learning

The use case employed here is called question and answer, even though, as in our preceding example, the format is not always strictly a question followed by an answer. When we provide several of these question-answer pairs to an LLM, the technique is referred to as few-shot learning. We can think of few-shot learning as learning by example.

In our use case, we want a model that can generate healthcare domain SQL queries. Therefore, we must pass our LLM many (up to around 50) question-and-SQL examples, such as:

“Question: What is the average HDL value for all patients since 2017?”

“Answer: SELECT AVG(value) avg_hdl FROM observations WHERE LOWER(description) LIKE '%high density lipoprotein%' AND date > '2017'”

“Question: What is the average medication cost per patient for patients taking the medication Metformin?”

“Answer: SELECT patient, AVG(totalcost) avg_cost FROM medications WHERE LOWER(description) LIKE '%metformin%' GROUP BY patient”

“Question: What allergies does the patient Christal240 Brown30 have?”

“Answer: SELECT DISTINCT(a.description) FROM allergies a JOIN patients p ON a.patient = p.id WHERE p.first = 'Christal240' AND p.last = 'Brown30'”

Then, when a clinician or administrator asks their question, it is appended to the bottom of the prompt like this:

“Question: How many patients had a mammography procedure in 2020?”
“Answer:”
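
Assembling such a prompt is plain string concatenation. The sketch below uses two of the example pairs from this section; the function name `build_prompt` and the blank-line separator between pairs are assumptions.

```python
# Two of the question-and-SQL pairs shown above.
EXAMPLES = [
    (
        "What is the average HDL value for all patients since 2017?",
        "SELECT AVG(value) avg_hdl FROM observations "
        "WHERE LOWER(description) LIKE '%high density lipoprotein%' AND date > '2017'",
    ),
    (
        "What allergies does the patient Christal240 Brown30 have?",
        "SELECT DISTINCT(a.description) FROM allergies a JOIN patients p "
        "ON a.patient = p.id WHERE p.first = 'Christal240' AND p.last = 'Brown30'",
    ),
]


def build_prompt(examples, question):
    """Join the example pairs and end with the new question and an open answer."""
    blocks = [f"Question: {q}\nAnswer: {sql}" for q, sql in examples]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


prompt = build_prompt(EXAMPLES, "How many patients had a mammography procedure in 2020?")
print(prompt)
```

Ending the prompt with an open "Answer:" is what nudges the model to continue with a SQL statement rather than free text.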

And just like in our capital of Washington example, our LLM chooses the next most probable words in the sequence responding with:

“SELECT COUNT(DISTINCT patient) FROM procedures WHERE LOWER(description) LIKE '%mammography%'”

Then, this response is passed to one of Amazon’s database management services, like Amazon Athena or Amazon Relational Database Service (Amazon RDS), and the response is sent back to the user in a conversational form. All of this happens behind the scenes as the user simply interacts with our LLM through a chat window. In production, the user would not see the SQL code. However, for clarity the following image shows both the code and the response.

Amazon HealthLake ChatBot image

This process can improve the quality of care that patients receive by providing them with accurate and up-to-date information about their health. It can also help to reduce the time that healthcare providers spend on administrative tasks.

Curating few-shot examples

One of the challenges of few-shot learning is curating the right kind of examples. The examples should be representative of the types of questions that the model will be asked. This means including examples of questions that are phrased in different ways. For example, one user might ask “What medications does the patient Tommy814 Sauer652 take?”, while another user might ask “What medications does Tommy814 Sauer652 take?”. If the prompt only includes examples of the first type of question, then the model may fail to answer the second type of question.

Therefore, the few-shot prompt can be a mechanism to tune the model’s accuracy: including a variety of phrasings that map to the same SQL output makes the model more robust to different ways of asking. We can make the tool work for both of the preceding examples by rephrasing some of the sample questions in the few-shot prompt (no need to change the associated SQL queries), for example by changing ‘What allergies does the patient Christal240 Brown30 have?’ to ‘What allergies does Christal240 Brown30 have?’

In addition to the examples themselves, it is also important to consider the order in which the examples are presented to the model. The first few examples that the model sees have a significant impact on its learning. Therefore, it is important to start with examples that are representative of the types of questions that the model is asked.

The few-shot prompt has a length limit, which means you must be deliberate about which questions are included in the prompt. One strategy is last-in-first-out for distinct questions, deduplicating along the way. Regardless of what question and answer pairs are passed to the model in the few-shot prompt, it is critical that there is a strategy in place for storing the pairs and their validity. This dataset of valid questions and answers should be used later to fine-tune the LLM for increasingly accurate results using Amazon SageMaker JumpStart.
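
One way to read the last-in-first-out strategy is sketched below: walk the stored pairs from newest to oldest, skip questions already seen, and stop once a size budget is reached. The function name `select_examples` is hypothetical, and using a character count as the budget is an assumption; a real limit would be measured in model tokens.

```python
def select_examples(history, max_chars):
    """Pick the newest distinct question/SQL pairs that fit in max_chars.

    history is ordered oldest to newest, and so is the returned list.
    """
    seen = set()
    chosen = []
    used = 0
    for question, sql in reversed(history):  # newest first
        key = " ".join(question.lower().split())  # normalize case and whitespace
        if key in seen:
            continue  # an identical question was already selected
        cost = len(question) + len(sql)
        if used + cost > max_chars:
            break  # budget exhausted; older pairs are dropped
        seen.add(key)
        chosen.append((question, sql))
        used += cost
    chosen.reverse()
    return chosen


history = [
    ("A?", "SELECT 1"),
    ("B?", "SELECT 2"),
    ("a?", "SELECT 1b"),  # newer version of the first question
    ("C?", "SELECT 3"),
]
print(select_examples(history, max_chars=100))
print(select_examples(history, max_chars=25))
```

With the larger budget the older duplicate "A?" is dropped in favor of its newer version; with the smaller budget only the two most recent distinct questions survive.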

Validation

It’s critical that the responses from our model are validated for accuracy. The first line of defense is that we are not actually asking our LLM for the answer to the question and instead asking it to generate an SQL query. This reduces the chance that the model could make up a response (this is called hallucination). If the LLM makes a mistake, then the SQL query that was generated typically won’t run. This reduces the chances of our model creating false content.
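
Beyond relying on malformed queries failing to run, a lightweight guard can reject anything that is not a single read-only statement before it reaches the database. This guard is an assumption added for illustration, not a described part of the solution. It is also deliberately conservative: it rejects a query that merely mentions a forbidden keyword inside a string literal.

```python
import re

# Keywords that should never appear in a read-only reporting query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|merge)\b",
    re.IGNORECASE,
)


def looks_like_safe_select(sql):
    """Return True if sql is a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False  # free text or a non-query statement
    return FORBIDDEN.search(statement) is None


print(looks_like_safe_select(
    "SELECT COUNT(DISTINCT patient) FROM procedures "
    "WHERE LOWER(description) LIKE '%mammography%'"
))
print(looks_like_safe_select("DROP TABLE patients"))
```

A check like this complements, rather than replaces, restricting the database permissions of the role that runs the queries.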

In the rare instance where our model writes an incorrect SQL query that does return a valid response, we introduce a human-in-the-loop mechanism to verify the accuracy of the response. Each unique question and answer is captured in a database table along with whether the user likes or dislikes the response. This allows data engineers to set up the LLM for success by experimenting with questions they already know the answers to, and recording when the model produces an undesirable result. Questions where the LLM produced an invalid SQL query are subsequently withheld from the few-shot examples until data engineers can correct the queries. This dataset of valid questions and answers can be used later to fine-tune our LLM for even more accurate results.
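
The feedback table might look like the following sketch, which uses an in-memory SQLite database so that it runs anywhere. The table name `prompt_log`, its columns, and the helper functions are hypothetical, and the second SQL string is intentionally wrong so that it can be marked as disliked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE prompt_log (
        question TEXT NOT NULL,
        sql_text TEXT NOT NULL,
        liked    INTEGER,  -- 1 = like, 0 = dislike, NULL = no feedback yet
        UNIQUE (question, sql_text)
    )
    """
)


def record(question, sql_text, liked=None):
    """Store a question/SQL pair, replacing any earlier feedback for the same pair."""
    conn.execute(
        "INSERT OR REPLACE INTO prompt_log (question, sql_text, liked) VALUES (?, ?, ?)",
        (question, sql_text, liked),
    )


def few_shot_candidates():
    """Return only the pairs that users marked as correct."""
    rows = conn.execute(
        "SELECT question, sql_text FROM prompt_log WHERE liked = 1 ORDER BY rowid"
    )
    return rows.fetchall()


good_q = "How many patients are there with the condition asthma?"
good_sql = "SELECT COUNT(DISTINCT patient) FROM conditions WHERE LOWER(description) LIKE '%asthma%'"
bad_q = "What procedures has the patient Tommy814 Sauer652 had?"
bad_sql = "SELECT * FROM procedure"  # deliberately wrong for the example

record(good_q, good_sql, liked=True)
record(bad_q, bad_sql, liked=False)
print(few_shot_candidates())
```

Only the liked pair is returned, so the disliked query stays out of the few-shot prompt until someone corrects it.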

Fine-tuning

SageMaker JumpStart is a service that makes it easy to fine-tune pre-trained LLMs for your specific domain. Once you have a prompt table with around 250 KB of data, you can use JumpStart to fine-tune your model on your healthcare questions and SQL queries.
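
Before fine-tuning, the prompt table has to be exported to a training file. The sketch below writes question-and-SQL pairs as JSON Lines and reports the size in kilobytes so that it can be compared with the 250 KB guideline. The `prompt` and `completion` field names are an assumption; the expected format depends on the model you select, so check its documentation.

```python
import json


def to_jsonl(pairs):
    """Serialize question/SQL pairs as one JSON object per line."""
    lines = [
        # Field names are assumed; adjust them to the chosen model's format.
        json.dumps({"prompt": f"Question: {q}\nAnswer:", "completion": f" {sql}"})
        for q, sql in pairs
    ]
    return "\n".join(lines) + "\n"


pairs = [
    (
        "What is the average medication cost per patient for patients taking the medication Metformin?",
        "SELECT patient, AVG(totalcost) avg_cost FROM medications "
        "WHERE LOWER(description) LIKE '%metformin%' GROUP BY patient",
    ),
]

data = to_jsonl(pairs)
size_kb = len(data.encode("utf-8")) / 1024
print(f"{len(pairs)} example(s), {size_kb:.2f} KB")
```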

To use SageMaker JumpStart, you first need to create a SageMaker notebook instance and install the JumpStart library. Then, you can create a new JumpStart project and import your prompt table. Then, SageMaker JumpStart automatically generates a training script and a training job for you.

Alternatively, you can use SageMaker JumpStart through SageMaker Studio, which provides a simple user interface for accessing and fine-tuning a variety of open-source LLMs.

The training job fine-tunes your model on your healthcare questions and SQL queries. Once the training job is complete, you can deploy your model to a SageMaker endpoint. Then, you can use your model to generate SQL queries for your healthcare data.

SageMaker JumpStart is a great way to fine-tune pre-trained LLMs for your specific domain. It is easy to use and can save a lot of time and effort. Follow this AWS guide for a low code way to fine-tune your model on your healthcare questions and SQL queries.

Conclusion

The combination of Amazon Kendra, Athena, and a pre-trained LLM hosted on a SageMaker Endpoint provides a powerful tool for healthcare professionals to access both structured and unstructured data in HealthLake. With conversational access to this data, clinicians and healthcare administrators can quickly and easily retrieve the information they need to provide the best possible care to patients.

One of the benefits of this approach is that it enables clinicians and administrators to access electronic medical records and clinical documents without needing specialized knowledge of database query languages or data engineering. With validation best practices in place, the solution becomes increasingly accurate over time: the more validated question-and-query pairs it collects, the better its few-shot examples and fine-tuning data become.

This approach also reduces the need for impromptu queries to be routed to the data engineering team. This frees up the team to focus on more strategic work, such as data migration, while clinicians and administrators can easily access the data they need. By automating the process of generating SQL queries, healthcare professionals can quickly and easily find the information they need to make informed decisions about patient care.

By enabling conversational access to structured and unstructured data, thereby reducing the routing of impromptu queries to the data engineering team, this approach streamlines the process of accessing electronic medical records and clinical documents. In turn, this improves the quality of care and increases efficiency in healthcare organizations.