AWS Machine Learning Blog

Use the AWS Cloud for observational life sciences studies

In this post, we discuss how to use the AWS Cloud and its services to accelerate observational studies for life sciences customers. We provide a reference architecture for architects, business owners, and technology decision-makers in the life sciences industry to automate the processes in clinical studies.

Observational studies lead the way in research, allowing you to formulate hypotheses and then test those hypotheses in controlled experiments. These studies are a powerful tool to help researchers learn what happens in real-life settings. You can use this research as a precursor to drug discovery and new drug indications. In drug development, the learnings from observational studies can weed out potential failures early and save millions of dollars. Running an observational study involves several steps: an independent body must define and approve the study, participants must be identified and recruited, staff who handle and collect data must be trained, and participant data must be collected. After the data has been captured, professional staff evaluate it, and the results are presented.

Advances in technology now let drug research apply genomics, proteomics, and other compute-intensive techniques at a considerably lower cost. Clinical study service providers are using technology and automation to support clinical trial data capture, but using these services can be cost prohibitive for observational studies. Companies performing observational studies for research tend to fall back to manual mechanisms for capturing and processing data. Imagine the challenge of capturing and managing data for studies that might have hundreds, if not thousands, of participants who are completing paper-based surveys. AWS services can automate and improve data capture and analytical reporting for both the participants and the researchers. This reduces cost, turnaround time, and errors, and the solution can scale when needed with minimal effort.

With the growth of Internet of Things (IoT) technologies, AWS has made it easier for participants and companies to provide valuable research data from observational studies. Participants can provide feedback using Alexa, a laptop, a mobile device, or even an Apple Watch without leaving the comfort of their home. For the company, data errors can be reduced and data is available almost as soon as it is captured.

IoT technology also offers huge amounts of data that you can use for future AI or machine learning (ML) insights. This is an added benefit for companies looking for value in unstructured data.

How can AWS services help?

Amazon Alexa has revolutionized how people interact with machines to get things done. When you use an Alexa custom skill integrated with other AWS services, participants can provide study feedback by simply invoking the skill with an utterance. You can use this technology to capture survey feedback from participants who prefer not to leave their home. The skill can also capture feedback from physically incapacitated individuals who can only use their voice (for example, patients with multiple sclerosis). The data Alexa captures can provide a wealth of participant information that might have been undiscoverable from simple survey feedback. Alexa data is unstructured and can be large in quantity, but Amazon has made it easy to gain insights using natural language processing (NLP) tools like Amazon Comprehend Medical and Amazon SageMaker Studio.


The following architecture diagram shows the art of the possible when using AWS technologies to capture observational study information and present findings. You can use this post and the proposed architecture to solve real-world problems.

At a high level, this architecture consists of the following services:

Amazon has introduced and expanded the HIPAA-eligible Alexa skill program, making it easier and more secure for developers and organizations to get their skills approved for privacy, compliance, and legal requirements. For other HIPAA-eligible AWS services, see HIPAA Eligible Services Reference. Security is a top priority, and you can manage it by following AWS best practices. For example, all data stored in Amazon S3 can be encrypted by default, or it can be encrypted with customer managed keys from AWS Key Management Service (AWS KMS).
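As a rough sketch of the customer managed key option, the following snippet builds the arguments for an S3 put_object call that requests server-side encryption with a KMS key. The bucket name, object key, payload shape, and key ARN are hypothetical placeholders, and s3_client is assumed to be a boto3 S3 client created elsewhere.

```python
import json


def build_encrypted_put(bucket, key, payload, kms_key_id):
    """Return keyword arguments for S3 put_object that request SSE-KMS."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": json.dumps(payload).encode("utf-8"),
        "ContentType": "application/json",
        "ServerSideEncryption": "aws:kms",  # encrypt the object with a KMS key
        "SSEKMSKeyId": kms_key_id,          # customer managed key ID or ARN
    }


def store_survey_response(s3_client, bucket, key, payload, kms_key_id):
    """Upload one survey response; s3_client is, for example, boto3.client("s3")."""
    return s3_client.put_object(
        **build_encrypted_put(bucket, key, payload, kms_key_id)
    )
```

Keeping the argument builder separate from the upload call makes it possible to check the encryption settings in a unit test without touching AWS.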

You can use Amazon Macie to scan the data in Amazon S3. If Macie finds sensitive data, including personally identifiable information (PII), you can use automation to secure the affected buckets and objects or to alert those responsible so they can take appropriate action.
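One way to wire up that automation is to route Macie findings through Amazon EventBridge to a small function that decides whether to alert someone. The sketch below assumes the finding arrives as an EventBridge event with source "aws.macie" and that the affected bucket and object sit under detail.resourcesAffected; verify the field names against the Macie finding schema before relying on them. The function names and the severity cutoff are illustrative.

```python
def summarize_macie_finding(event):
    """Extract the affected S3 location and severity from a Macie finding event.

    Returns None when the event did not come from Macie.
    """
    if event.get("source") != "aws.macie":
        return None
    detail = event.get("detail", {})
    affected = detail.get("resourcesAffected", {})
    return {
        "finding_type": detail.get("type"),
        "severity": detail.get("severity", {}).get("description"),
        "bucket": affected.get("s3Bucket", {}).get("name"),
        "key": affected.get("s3Object", {}).get("key"),
    }


def should_alert(summary, severities=("High",)):
    """Alert only on the severities the study team cares about (assumed: High)."""
    return summary is not None and summary["severity"] in severities
```

A caller could pass the summary to whatever notification channel the study team already uses, or use the bucket and key to tighten access on the affected object.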

An AWS-suggested best practice is to give AWS services and users roles that carry only the minimum permissions necessary to perform their intended actions. You can use AWS Identity and Access Management (IAM) to create the role and user entities with fine-grained access policies.
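To make the least-privilege idea concrete, here is a minimal sketch of a policy document for a function that reads transcripts from one S3 prefix, calls Amazon Comprehend, and writes results to one DynamoDB table. The bucket, prefix, table ARN, and statement IDs are hypothetical, and the exact set of actions your workload needs may differ.

```python
import json


def build_processing_policy(bucket, prefix, table_arn):
    """Return an IAM policy document limited to the actions the function needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTranscripts",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Only objects under the study prefix, not the whole bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                "Sid": "ScoreSentiment",
                "Effect": "Allow",
                "Action": ["comprehend:DetectSentiment"],
                # Assumption: this action is not scoped to a specific resource.
                "Resource": "*",
            },
            {
                "Sid": "WriteResults",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": [table_arn],
            },
        ],
    }


if __name__ == "__main__":
    policy = build_processing_policy(
        "study-bucket",
        "transcripts/",
        "arn:aws:dynamodb:us-east-1:111122223333:table/StudyResults",
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON string can be supplied as the PolicyDocument when you create the policy in IAM and attach it to the function's service role.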

The following is a detailed breakdown of a proposed workflow:

  • All data captured from the Alexa conversations is automatically pushed into a secure S3 bucket in JSON format.
  • All IoT data is streamed into a secure S3 bucket via AWS IoT Core services or Amazon Kinesis.
  • Files dropped into S3 buckets trigger AWS Lambda functions to perform the following tasks:
    • Pass data through Amazon Comprehend to perform sentiment analysis and store this information in DynamoDB for future reporting purposes. You can build automation so that if the negative sentiment score returned by the Amazon Comprehend API is greater than a threshold (for example, 0.8), an email alert is sent to the study manager for follow-up. You can use Amazon Comprehend to identify and redact PII, and Amazon Comprehend Medical has APIs for detecting protected health information (PHI) in the Alexa transcript.
    • Store the specific study information in DynamoDB to help facilitate study management, monitoring, and future reporting requirements.
  • Amazon EMR jobs transform and enrich the data for future analyses. You can schedule these jobs to run on a set or periodic basis.
  • SageMaker Studio is used on both raw and transformed data to gain hidden value from the vast amounts of structured and unstructured data. SageMaker tools like AutoML can help any organization use AI/ML opportunities without having extensive knowledge in data science. If your organization has a sophisticated data science team, you can use Studio to wrangle data; build, train, and validate models; and ultimately deploy models, all in a single, easy-to-use MLOps platform.
  • Amazon Kendra can be used to search for answers in the unstructured text data.
  • Amazon Redshift and QuickSight provide data visualizations, analytical reporting, and automated report distribution. QuickSight Q, powered by ML, allows you to ask natural language questions against your data, which saves weeks of effort building predefined data models and dashboards.
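The sentiment step of the workflow above can be sketched as a Lambda function. This is an illustration under several assumptions rather than the reference implementation: the transcript file is assumed to be JSON with participant_id and transcript fields, the email alert is assumed to go through an Amazon SNS topic with an email subscription, and the environment variable names RESULTS_TABLE and ALERT_TOPIC_ARN are made up for the example.

```python
import json
from decimal import Decimal

NEGATIVE_THRESHOLD = 0.8  # example cutoff from the workflow; tune per study


def needs_follow_up(result, threshold=NEGATIVE_THRESHOLD):
    """True when the Negative score from Comprehend exceeds the threshold."""
    return result["SentimentScore"]["Negative"] > threshold


def process_transcript(bucket, key, s3, comprehend, table, sns, topic_arn):
    """Score one transcript file, store the result, and alert if needed."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    record = json.loads(body)  # assumed shape: participant_id, transcript

    result = comprehend.detect_sentiment(
        Text=record["transcript"], LanguageCode="en"
    )

    item = {
        "participant_id": record["participant_id"],
        "object_key": key,
        "sentiment": result["Sentiment"],
        # The boto3 DynamoDB resource API expects Decimal rather than float.
        "negative_score": Decimal(str(result["SentimentScore"]["Negative"])),
    }
    table.put_item(Item=item)

    if needs_follow_up(result):
        sns.publish(
            TopicArn=topic_arn,
            Subject="Study follow-up needed",
            Message=f"Negative feedback detected in s3://{bucket}/{key}",
        )
    return item


def lambda_handler(event, context):
    """Entry point for an S3 object-created notification."""
    import os
    from urllib.parse import unquote_plus

    import boto3  # imported here so the helpers above can be tested offline

    s3 = boto3.client("s3")
    comprehend = boto3.client("comprehend")
    sns = boto3.client("sns")
    table = boto3.resource("dynamodb").Table(os.environ["RESULTS_TABLE"])

    for rec in event["Records"]:
        process_transcript(
            rec["s3"]["bucket"]["name"],
            unquote_plus(rec["s3"]["object"]["key"]),  # keys arrive URL-encoded
            s3, comprehend, table, sns,
            os.environ["ALERT_TOPIC_ARN"],
        )
```

Passing the clients into process_transcript keeps the scoring and alerting logic independent of AWS, so you can exercise it with stub objects before deploying.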


This solution shows what is possible when you use various AWS technologies to automate observational studies. We suggest you try out the solution for yourself and use the provided reference architecture. Perform your own due diligence and modify the architecture as necessary; you can also implement it partially based on your needs.

For more information related to implementing the proposed architecture, see the following:

For additional support, contact your Technical Account Manager or Solutions Architects.

About the Authors

Varad Ram is a Senior Solutions Architect at Amazon Web Services. He likes to help customers adopt cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth.

Susant Mallick is an industry specialist and digital evangelist in AWS’ Global Healthcare and Life Sciences practice. He has over 20 years of experience in the life sciences industry working with biopharmaceutical and medical device companies across the North America, APAC, and EMEA regions. He has built many digital health platforms and patient engagement solutions using mobile apps, AI/ML, IoT, and other technologies for customers in various therapeutic areas. He holds a B.Tech degree in Electrical Engineering and an MBA in Finance. His thought leadership and industry expertise have earned many accolades in pharma industry forums.

Pam McCaslin is a technical leader with over 30 years of experience. She has spent the last 15 years building innovative cloud-based business products in the life sciences area. Her focus in recent years has been to drive effective business process automation, incorporate artificial intelligence and machine learning, and leverage data to drive innovation that will improve healthcare for patients.