AWS for Industries

Building the foundation for Lab of the Future using AWS

Life science industries are transitioning from wet lab environments to digital labs. Digital labs decrease the “time to science” and de-risk R&D portfolios. Customers see computational methods as a way to increase the performance, throughput, and effectiveness of laboratory operations. This presents opportunities around long-standing challenges with experiment reproducibility and the ability to address lab workflow inefficiencies via automation and predictive analytics.

The need to securely share information beyond corporate walls is urgent; collaboration continues to be a key approach to drug development. As precision medicine evolves, evaluating large, complex datasets such as genomic data or medical imaging requires artificial intelligence or machine learning to derive insights. AWS services provide a unified vision around data capture, ingestion, storage, analytics, and AI/ML capabilities to empower the modernized biopharma lab.

In this post, we discuss the application of artificial intelligence and machine learning technologies in the end-to-end workflow of collecting, storing, processing, visualizing and acting on operational and experimental laboratory data. These tools are the foundation for building and operating the laboratory of the future.

Vision for the laboratory of the future

The life science value chain is modernizing to reduce the time and cost of developing and producing drugs. The commercialization costs of a new drug are estimated to be $2.5B, with 25% of those costs considered to be driven by inefficiency in the lab. To address this problem, life science companies are pursuing the laboratory of the future, where systems of electronic and physical devices monitor physical processes, create virtual copies of the physical world, and make decentralized decisions. Using the Internet of Things (IoT), these electro-physical systems communicate and collaborate with each other, and with humans, in real time. The capabilities of the laboratory of the future include data generation (via sensors, IoT, instruments and software/applications), data collection, aggregation, visualization, and analytics.

Laboratory operations are improved through the availability of real-time information, enabling scientists and research business leaders to make immediate, data-driven decisions. This leads to greater efficiency and provides a competitive advantage by improving early-stage pipeline performance.

Building the laboratory of the future

The architecture for the laboratory of the future has three main steps:

  • Collect data — Collect real-time, streaming data from devices and static sources like applications (for example, Electronic Lab Notebooks), databases, or SaaS offerings. This data is collected through standard APIs, ETL tools, native AWS services, and third-party offerings. Amazon Kinesis, AWS DataSync, and AWS Glue are a few of the services that support this phase of data acquisition.
  • Process, store, and catalog data — AWS Lake Formation allows you to extract, transform, load, and prepare data while securing it and granting the appropriate fine-grained access control. Metadata is captured through cataloging and master data management (MDM). The data is transformed into a format amenable to further processing, analysis, and reporting using Amazon Machine Learning services. Amazon S3 is used as a data lake. AWS Glue and Glue Crawlers are used to catalog the data and capture metadata. Amazon OpenSearch Service (formerly Amazon Elasticsearch Service, renamed on September 8, 2021) is used for exploring the available data. Amazon EMR processes vast amounts of data quickly and cost-effectively at scale. Amazon QuickSight lets you easily create and publish interactive dashboards that include Machine Learning insights.
  • Act on data — The insights obtained from the data, either through descriptive or predictive analytics, can be viewed using Amazon QuickSight or third-party business intelligence tools. Additionally, transformed insights from the data, including alerts and feedback, can be provided in real time via emails and messaging using Amazon SNS.
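
The collect and act steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture's actual implementation: the stream name, topic ARN, temperature limit, and reading fields are all hypothetical. The two client calls, `put_record` and `publish`, follow the boto3 Kinesis and SNS client interfaces, and the clients are passed in so the logic can be exercised without AWS credentials.

```python
import json
from typing import Any

# Hypothetical names and limits for illustration only; none come from the post.
STREAM_NAME = "lab-instrument-readings"
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:lab-alerts"
MAX_TEMP_C = 8.0


def collect_reading(kinesis: Any, reading: dict) -> None:
    """Collect: send one instrument reading to a Kinesis data stream."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(reading).encode("utf-8"),
        # Readings from the same instrument share a partition key.
        PartitionKey=reading["instrument_id"],
    )


def alert_if_out_of_range(sns: Any, reading: dict) -> bool:
    """Act: publish an SNS alert when the temperature exceeds the limit."""
    if reading["temperature_c"] <= MAX_TEMP_C:
        return False
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject="Lab alert: temperature out of range",
        Message=json.dumps(reading),
    )
    return True
```

With boto3 installed and credentials configured, the clients would typically be created with `boto3.client("kinesis")` and `boto3.client("sns")`. In a real deployment the alerting rule would more likely run downstream of the stream than next to the instrument.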

An architecture for building the laboratory of the future is shown below:

The AWS Cloud supports data capture, storage, processing, reporting, and analytics. It provides the technical infrastructure to collaborate effectively, while also supplying consistent support for data security, data privacy, data integrity, and compliance considerations such as HIPAA and Good Laboratory Practices (GLP).

Data security, compliance, and access control

It is imperative that we consider the data workflows and security frameworks when building the infrastructure for the laboratory of the future. Access to data must be granted quickly and revoked immediately, on an as-needed basis. This access must be tracked, audited, and available to appropriate internal resources and select third parties. This is important for data protection, sovereignty audits, and GxP-compliant practices.
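
To make the grant, revoke, and audit requirements concrete, here is a small in-memory model of time-bound access with an audit trail. It is a sketch of the policy only, under the assumption that every grant carries an expiry; the class and field names are hypothetical. On AWS, the grants themselves would be expressed through services such as AWS Lake Formation, as described earlier, rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class AccessLedger:
    """In-memory model of time-bound data access with an audit trail."""

    grants: dict = field(default_factory=dict)     # (user, dataset) -> expiry
    audit_log: list = field(default_factory=list)  # append-only event records

    def _audit(self, event: str, user: str, dataset: str, when: datetime) -> None:
        self.audit_log.append(
            {"event": event, "user": user, "dataset": dataset, "at": when.isoformat()}
        )

    def grant(self, user: str, dataset: str, ttl: timedelta, now: datetime) -> datetime:
        """Grant access that expires automatically after `ttl`."""
        expires = now + ttl
        self.grants[(user, dataset)] = expires
        self._audit("grant", user, dataset, now)
        return expires

    def revoke(self, user: str, dataset: str, now: datetime) -> None:
        """Revoke access immediately, whether or not it has expired."""
        self.grants.pop((user, dataset), None)
        self._audit("revoke", user, dataset, now)

    def is_allowed(self, user: str, dataset: str, now: datetime) -> bool:
        """Check access and record the decision so it can be audited."""
        expires = self.grants.get((user, dataset))
        allowed = expires is not None and now < expires
        self._audit("allow" if allowed else "deny", user, dataset, now)
        return allowed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    ledger = AccessLedger()
    ledger.grant("partner-analyst", "crystal-images", timedelta(hours=1), now)
    print(ledger.is_allowed("partner-analyst", "crystal-images", now))
    ledger.revoke("partner-analyst", "crystal-images", now)
    print(ledger.is_allowed("partner-analyst", "crystal-images", now))
```

Logging denied checks as well as successful ones keeps the trail complete for later review.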

The laboratory of the future in action

Protein crystallization is the process by which a protein crystal forms. Protein crystals are useful in the study of protein structures for use in medicine. To enable protein crystallization, proteins are dissolved in an aqueous environment in order to reach a supersaturated state. This supersaturated state allows researchers to study the internal structure of proteins. Developing protein crystals is difficult, as the process is influenced by many factors, including pH, temperature, ionic strength in the crystallization solution, and even gravity. Once properly developed, these crystals can be used in structural biology to study the molecular structure of the protein, particularly for various industrial or biotechnological purposes, such as developing cancer treatments.

Traditionally, protein structure is determined from the crystals using X-ray diffraction (XRD). Alternatively, cryo-electron microscopy (cryo-EM) and nuclear magnetic resonance (NMR) can be used for protein structure determination. Protein structure is central to structural analysis in biochemistry and translational medicine, and it is essential for developing targeted therapies in modern drug development.

Large runs of samples must be analyzed to see if the crystals have formed. Currently, each sample is placed under a microscope and a human determines whether a crystal has formed. Samples with crystallization are then moved forward for processing, and those without crystallization are discarded.

Using the laboratory of the future architecture above, images can be ingested from the computers connected to the microscopes through AWS Storage Gateway, and then deposited into the data lake for processing. Initial metadata tagging and indexing happens upon ingestion (for example, tagging the image with the barcode of the sample), and the next step in the workflow is submission to Amazon SageMaker to determine crystallization status. Using a Jupyter notebook and pre-trained algorithms, each microscope image is classified in one of three ways:

  • High certainty that a crystal has formed — These are tagged and made available to a workflow that can submit to X-ray crystallography, NMR, or any other workflow that the customer specifies. Each stage adds new metadata tags to the sample and allows the process to repeat as many times as required.
  • High certainty that there is no crystal — These are tagged and stored. The process ends here for this sample.
  • Undetermined state — These are flagged for a human to investigate. Depending on the observation, the image can be marked as crystal or no crystal and then sent to its corresponding workflow. An additional flag can be added so this image is marked as a training candidate and used for future learning.
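
The three-way routing described above can be sketched as a simple thresholding step applied to a classifier's output. This is an illustrative sketch only: the post does not specify how certainty is measured, so the thresholds, the `crystal_probability` field, and the tag names are assumptions.

```python
from dataclasses import dataclass, field

# Assumed cut-offs; real values would be tuned against validated samples.
CRYSTAL_THRESHOLD = 0.90
NO_CRYSTAL_THRESHOLD = 0.10


@dataclass
class SampleImage:
    barcode: str                  # tagged at ingestion
    crystal_probability: float    # hypothetical model output in [0, 1]
    tags: dict = field(default_factory=dict)


def triage(image: SampleImage) -> str:
    """Tag the image with one of the three outcomes and return it."""
    p = image.crystal_probability
    if p >= CRYSTAL_THRESHOLD:
        status = "crystal"        # forward to the structure-determination workflow
    elif p <= NO_CRYSTAL_THRESHOLD:
        status = "no_crystal"     # tag, store, and stop
    else:
        status = "needs_review"   # flag for a human to investigate
    image.tags["crystallization_status"] = status
    return status


def record_human_review(image: SampleImage, is_crystal: bool,
                        use_for_training: bool = True) -> str:
    """Apply a reviewer's decision and optionally mark a training candidate."""
    status = "crystal" if is_crystal else "no_crystal"
    image.tags["crystallization_status"] = status
    image.tags["reviewed_by_human"] = True
    if use_for_training:
        image.tags["training_candidate"] = True
    return status
```

Widening the gap between the two thresholds sends more images to human review and fewer to automatic decisions, which is the main tuning choice in a workflow like this.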

The goal of this architecture is to involve people only when required, removing the manual burden and freeing highly trained specialists for more complex scientific tasks.

Final thoughts

The life science industry is in the midst of a transformation driven by overall healthcare reform, advances in technology, and new scientific capabilities such as genomics. Historically, these problems have been difficult to address. However, with the advent of cloud technology, life science customers are addressing this critical component of the value chain in new ways by making data securely available at the right place, at the right time, and accessible with the right tools.

Sam Coker

Sanford is the Worldwide Technical Lead for Healthcare at AWS. In this role, he is challenged with coordinating and creating a unified technical healthcare roadmap for AWS and its customers. Sam has a long and varied background in academic research computing as well as hospital operations. He started at the University of Kansas, where he designed and built the 6th HPC cluster, and continued that work in molecular modelling with Schrodinger LLC and The Rockefeller University. Looking for new challenges, he moved to hospital IT operations, running and designing clinical systems at Weill Cornell Medical College. These included early work with LIMS, PACS/VNAs, and Epic electronic medical records. This led to engineering and operational responsibility for all clinical systems at NYU Langone Medical Center. After surviving two major blackouts, two hurricanes, and a superstorm named Sandy, he has had a lot of practice and experience in designing, running, and restoring complex healthcare systems.

Deven Atnoor, Ph.D

Deven Atnoor is an Industry Specialist in AWS’ Global Healthcare and Life-Sciences practice. Leveraging his domain knowledge, Deven builds digital transformation solutions that unlock the power of data using AWS, enabling healthcare and life sciences customers to generate insights from their data assets to fuel innovation and deliver better outcomes for patients. Deven received a Ph.D. in Chemical Engineering from the University of Cincinnati and a B.S. from the Indian Institute of Technology, Bombay, India.

Patrick Buckner

Patrick has over 20 years of experience in the Life Science industry working with biopharmaceutical and medical device companies across North and South America, Europe and Asia via software organizations and 9+ years with the engineering and consulting subsidiary of Novo Nordisk. Patrick has worked across the value chain including R&D, clinical development, manufacturing and supply chain and has led sales and marketing teams in North America and Europe. Currently, he is the WW Business Development Manager, leading the Life Science industry solution program. Patrick received his B.A. from the University of North Carolina-Chapel Hill and a Machine Learning Professional Certification from the Massachusetts Institute of Technology (MIT).