AWS for Industries
Revolutionizing Real-World Evidence: How Generative AI Can Simplify Data Exploration
Real-World Evidence (RWE) is a domain in which new medical evidence is generated from data collected in routine healthcare and other health services (“the Real World”), in contrast to clinical trials, which are controlled and designed specifically to generate medical evidence. This domain is an evolution of HEOR (Health Economics and Outcomes Research) and is particularly inspired by the methodologies of infectious disease epidemiology and the rigorous academic work done on establishing evidence-based standard of care protocols in clinical practice guidelines.
Data holds immense power to drive advancements in personalized patient care, pharmaceutical research, and clinical trials. As the volume of available data grows, researchers can use real-world evidence to help identify risks or benefits of new products. Generating RWE is a complex process that requires accessing structured and unstructured data in various formats, stored across different databases or storage systems. Traditional methods of data exploration and analysis demand extensive technical expertise, which makes it more difficult to draw insights.
In this blog post, we show how modern approaches, notably Generative AI and modern data services, can help us explore Real-World Data (RWD), such as electronic health records or health insurance claims, to simplify RWE generation.
Solution overview
This solution takes natural language questions from an RWE expert, queries the relevant data sources, and synthesizes the results into structured, sourced answers.
For instance, if a clinician wanted to know what medications one of their patients was taking, they could simply ask “What medications does patient Tommy814 Sauer652 take?”, and they would receive an answer in a user-friendly, conversational format.
In the example above, we use the Synthea dataset, a synthetic dataset based on the US population, which contains structured sources like electronic health records (EHRs) and claims data, as well as unstructured clinical narratives like doctors’ notes. These datasets are stored in Amazon Bedrock Knowledge Bases, AWS HealthLake and Amazon Kendra.
This solution leverages a combination of advanced technologies to seamlessly integrate various data sources and provide a user-friendly interface for healthcare professionals. By utilizing Amazon’s suite of AI and cloud services, including Amazon Bedrock, HealthLake, and Kendra, the system can process both structured and unstructured medical data. This comprehensive approach allows for a more holistic view of patient information, enabling healthcare providers to make more informed decisions quickly and efficiently.
Technical deep dive
When a user asks a question, the Bedrock Agent determines the intent of the question and where the relevant information is likely stored. It then routes the request to the appropriate Bedrock Agent Action Group. Action groups can interact with external systems, with the structured data queried through Amazon Athena, or with the unstructured data indexed in Kendra, to gather information or perform tasks.
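As a rough illustration of the entry point to this flow, the sketch below sends a question to a Bedrock agent and assembles the streamed answer. It assumes a boto3 `bedrock-agent-runtime` client is passed in. The function name `ask_agent` and the agent ID and alias ID values are placeholders of ours, not names taken from the deployed stack.

```python
import uuid


def ask_agent(client, agent_id, agent_alias_id, question, session_id=None):
    """Send a natural language question to a Bedrock agent and return its answer text.

    `client` is expected to behave like boto3.client("bedrock-agent-runtime").
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reusing a session ID keeps conversational context across questions.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=question,
    )
    parts = []
    # The answer arrives as an event stream; text is carried in "chunk" events.
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)
```

A call might look like `ask_agent(boto3.client("bedrock-agent-runtime", region_name="us-east-1"), "AGENT_ID", "ALIAS_ID", "What medications does patient Tommy814 Sauer652 take?")`, with the two IDs replaced by your own. The choice of action group happens inside the agent, so the caller only sees the final text.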
If the information is likely to be in a structured data source like a database, then the original question is combined with a tailored set of instructions and details about the database. The system also searches previous queries for similar ones that can be used for reference. All of this is then sent to Bedrock, which returns a SQL query that is used to retrieve the data from Athena. If there’s an error or no results, the system attempts to self-correct by providing error messages and more examples to improve the query.
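The generate, run, and self-correct loop described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the solution's actual code: `generate_sql` stands in for the call to a Bedrock model, `run_query` stands in for executing the SQL in Athena, and the prompt wording is hypothetical. On a retry, this sketch feeds back only the failure message; retrieving additional example queries is left out.

```python
def build_prompt(question, schema, examples, feedback=None):
    """Combine instructions, schema details, reference queries and the question."""
    lines = [
        "Write one SQL query for Amazon Athena that answers the question.",
        "Schema:",
        schema,
    ]
    for example_question, example_sql in examples:
        lines += [f"Example question: {example_question}", f"Example SQL: {example_sql}"]
    if feedback:
        # On a retry, show the model what went wrong with the previous attempt.
        lines += ["The previous attempt failed:", feedback]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def answer_with_sql(question, schema, examples, generate_sql, run_query, max_attempts=3):
    """Generate SQL, run it, and retry with feedback on errors or empty results."""
    feedback = None
    for _ in range(max_attempts):
        sql = generate_sql(build_prompt(question, schema, examples, feedback))
        try:
            rows = run_query(sql)
        except Exception as exc:  # e.g. a syntax error reported by the query engine
            feedback = f"Query: {sql}\nError: {exc}"
            continue
        if rows:
            return sql, rows
        feedback = f"Query: {sql}\nThe query returned no rows."
    raise RuntimeError(f"No usable query after {max_attempts} attempts")
```

Passing the model call and the query runner in as functions keeps the retry logic easy to test without any AWS access.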
An example would be a clinician asking, “What medications does the patient Tommy814 Sauer652 take?” The LLM would interpret this query, generate the appropriate SQL code, and retrieve the relevant information from the structured EHR data in HealthLake, presenting the results in a user-friendly, conversational format.
Alternatively, if the query involves unstructured data like “Search doctor’s notes for Tommy814’s socioeconomic status,” the LLM would leverage Amazon Kendra to find the most relevant clinical notes, summarize the pertinent information, and provide a concise response through the chatbot interface.
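For the unstructured path, a minimal sketch of the retrieval step might look like the following. It assumes a boto3 `kendra` client is passed in and that the clinical notes have already been indexed. The function name `search_notes` and the shape of the returned dictionaries are our own choices. Summarizing the excerpts with the LLM is a separate step and is not shown.

```python
def search_notes(kendra_client, index_id, query_text, top_k=3):
    """Return the title, excerpt and URI of the most relevant indexed documents.

    `kendra_client` is expected to behave like boto3.client("kendra").
    """
    response = kendra_client.query(
        IndexId=index_id,
        QueryText=query_text,
        PageSize=top_k,
    )
    results = []
    for item in response.get("ResultItems", []):
        results.append(
            {
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
                "uri": item.get("DocumentURI", ""),
            }
        )
    return results
```

Keeping the document URI alongside each excerpt is what would let the final answer cite its sources.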
Walkthrough – Deploying the infrastructure stack
1. Prerequisites: you must have a default VPC set up, and you must be working in the us-east-1 Region.
2. Using the AWS Management Console, navigate to AWS Lake Formation. In the left pane, select ‘Administrative roles and tasks.’ Click the ‘Manage administrators’ button, and from the dropdown, find the IAM role you are going to use to launch the CloudFormation stack (likely the role you are currently logged in as). If you don’t know your IAM role, look to the right of the region dropdown in the upper right corner of the console screen. The ID displayed there follows the format of role/user@account-id.
3. Using the AWS Management Console, navigate to Amazon Bedrock, click the three lines in the upper left to display the left pane, select ‘Model access’ in the left pane, select ‘Manage model access’ in the upper right, check the box next to ‘Claude 3 Sonnet,’ and then select ‘Save changes’ in the bottom right corner.
4. Launch the Stack.
5. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
6. Select I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
7. Choose Create stack.
8. Wait approximately 60 minutes for AWS CloudFormation to create the infrastructure stack and data.
9. Navigate to CloudFormation and select the name of the stack you just created (default is ‘HealthLakeBot’).
10. Within that CloudFormation stack, select ‘Outputs’, and then the link associated with ‘WebAppUrl.’
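If you prefer to retrieve the web app link programmatically instead of through the console, steps 9 and 10 can be approximated with the sketch below. It assumes a boto3 `cloudformation` client is passed in. The function name `get_stack_output` is our own, while the stack name ‘HealthLakeBot’ and the output key ‘WebAppUrl’ come from the steps above.

```python
def get_stack_output(cfn_client, stack_name, output_key):
    """Look up a single output value on a deployed CloudFormation stack.

    `cfn_client` is expected to behave like boto3.client("cloudformation").
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    stack = response["Stacks"][0]
    # Outputs may be missing while the stack is still being created.
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"Output {output_key!r} not found on stack {stack_name!r}")
```

For example, `get_stack_output(boto3.client("cloudformation", region_name="us-east-1"), "HealthLakeBot", "WebAppUrl")` should return the URL once stack creation has completed.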
Conclusion
This solution not only streamlines data exploration and analysis workflows but also democratizes data access across healthcare organizations. By abstracting away the complexities of query languages like SQL and enabling conversational interactions, it lets non-technical users, such as RWE experts, clinicians, researchers, and data analysts, easily access and analyze both structured and unstructured data sources, fostering cross-functional collaboration and knowledge-sharing. This combined view of structured and unstructured data can lead to better patient outcomes and accelerated research initiatives. Additionally, by simplifying the process of exploring real-world data (RWD), it allows experts to focus their efforts on innovation and generating real-world evidence (RWE), helping to provide better options for patients’ treatment and care.