AWS for Industries
Building the foundation for Lab of the Future using AWS
Life science industries are transitioning from wet lab environments to digital labs. Digital labs decrease the “time to science” and de-risk R&D portfolios. Customers see computational methods as a way to increase the performance, throughput, and effectiveness of laboratory operations. This creates opportunities to address long-standing challenges with experiment reproducibility and to reduce lab workflow inefficiencies through automation and predictive analytics.
The need to securely share information beyond corporate walls is urgent; collaboration continues to be a key approach to drug development. As precision medicine evolves, evaluating large, complex datasets, such as genomic data or medical imaging, requires artificial intelligence and machine learning to derive insights at scale. AWS services provide a unified vision around data capture, ingestion, storage, analytics, and AI/ML capabilities to empower the modernized biopharma lab.
In this post, we discuss the application of artificial intelligence and machine learning technologies in the end-to-end workflow of collecting, storing, processing, visualizing, and acting on operational and experimental laboratory data. These tools are the foundation for building and operating the laboratory of the future.
Vision for the laboratory of the future
The life science value chain is modernizing to reduce the time and cost of developing and producing drugs. The commercialization costs of a new drug are estimated to be $2.5B, with 25% of those costs considered to be driven by inefficiency in the lab. To address this problem, life science companies are pursuing the laboratory of the future, where cyber-physical systems monitor physical processes, create virtual copies of the physical world, and make decentralized decisions. Using the Internet of Things (IoT), these cyber-physical systems communicate and collaborate with each other, and with humans, in real time. The capabilities of the laboratory of the future include data generation (via sensors, IoT, instruments, and software/applications), data collection, aggregation, visualization, and analytics.
Laboratory operations are improved through the availability of real-time information, enabling scientists and research business leaders to make immediate, data-driven decisions. This leads to better efficiencies and provides a competitive advantage by improving early-stage pipeline performance.
Building the laboratory of the future
The architecture for the laboratory of the future has three main steps:
- Collect data — Collect real-time streaming data from devices and static sources such as applications (for example, Electronic Lab Notebooks), databases, or SaaS offerings. This data is collected through standard APIs, ETL tools, native AWS services, and third-party offerings. Amazon Kinesis, AWS DataSync, and AWS Glue are a few of the services that support this phase of data acquisition (a minimal ingestion sketch appears after this list).
- Process, store, and catalog data — AWS Lake Formation allows you to extract, transform, load, and prepare data while securing it and then granting the appropriate fine-grained access control. Metadata is captured through cataloging and master data management (MDM). The data is transformed into a format amenable to further processing, analysis, and reporting using AWS machine learning services. Amazon S3 is used as a data lake. AWS Glue and Glue Crawlers are used to catalog the data and capture metadata (see the cataloging sketch after this list). Amazon OpenSearch Service (September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service) is used for exploring the available data. Amazon EMR processes vast amounts of data quickly and cost-effectively at scale. Amazon QuickSight lets you easily create and publish interactive dashboards that include machine learning insights.
- Act on data — The insights obtained from the data, whether through descriptive or predictive analytics, can be viewed using Amazon QuickSight or third-party business intelligence tools. Additionally, insights derived from the data, including alerts and feedback, can be delivered in real time via email and messaging using Amazon SNS (see the alerting sketch after this list).
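To make the collection step concrete, here is a minimal Python (boto3) sketch that streams a single instrument reading into Amazon Kinesis Data Streams. The stream name, record fields, and instrument identifier are hypothetical placeholders, not a prescribed schema.

```python
import json
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis")

def publish_instrument_reading(instrument_id: str, metric: str, value: float) -> None:
    """Stream one instrument reading into a Kinesis data stream."""
    record = {
        "instrument_id": instrument_id,
        "metric": metric,
        "value": value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    kinesis.put_record(
        StreamName="lab-instrument-stream",       # hypothetical stream name
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=instrument_id,               # preserves per-instrument ordering
    )

if __name__ == "__main__":
    publish_instrument_reading("microscope-07", "stage_temperature_c", 21.4)
```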
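For the process, store, and catalog step, the following sketch registers and runs an AWS Glue crawler over an S3 data lake prefix so that newly landed lab data is cataloged automatically. The bucket, database, IAM role, and crawler names are assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

# All names below are hypothetical placeholders.
CRAWLER_NAME = "lab-data-lake-crawler"

glue.create_crawler(
    Name=CRAWLER_NAME,
    Role="arn:aws:iam::123456789012:role/LabGlueCrawlerRole",  # assumed IAM role
    DatabaseName="lab_data_lake",  # target Glue Data Catalog database
    Targets={"S3Targets": [{"Path": "s3://lab-data-lake/raw/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)

# Discovered tables and their metadata land in the Data Catalog, where they
# become queryable by services such as Amazon EMR, Athena, and QuickSight.
glue.start_crawler(Name=CRAWLER_NAME)
```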
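And for the act step, a minimal sketch of delivering a real-time alert through Amazon SNS; the topic ARN and the alert text are hypothetical.

```python
import boto3

sns = boto3.client("sns")

# Hypothetical topic; subscribers (email, SMS, AWS Lambda) receive the alert.
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:lab-alerts",
    Subject="Incubator temperature out of range",
    Message="Incubator inc-03 reported 41.2 C; configured threshold is 38.0 C.",
)
```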
An architecture for building the laboratory of the future is shown below.
The AWS Cloud provides strong capabilities for data capture, storage, processing, reporting, and analytics. It provides the technical infrastructure to collaborate effectively, while also supporting data security, data privacy, data integrity, and compliance considerations such as HIPAA and Good Laboratory Practice (GLP).
Data security, compliance, and access control
It is imperative to consider data workflows and security frameworks when building the infrastructure for the laboratory of the future. Access to data must be quickly granted and then immediately revoked on an as-needed basis. This access must be tracked, audited, and limited to appropriate internal resources and select third parties. This is important for data protection, sovereignty audits, and GxP-compliant practices.
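As one way to implement the grant-then-revoke pattern described above, the following sketch uses the AWS Lake Formation permissions API to give a collaborator's IAM role column-level SELECT access to a cataloged table and then revoke it when the engagement ends. The account ID, role, database, table, and column names are illustrative assumptions.

```python
import boto3

lf = boto3.client("lakeformation")

# Hypothetical principal and table; substitute your own identifiers.
PARTNER_ROLE = {
    "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/PartnerAnalystRole"
}
TABLE = {
    "TableWithColumns": {
        "DatabaseName": "lab_data_lake",
        "Name": "assay_results",
        "ColumnNames": ["sample_id", "assay", "result"],  # exclude sensitive columns
    }
}

# Grant fine-grained, column-level read access for the collaboration...
lf.grant_permissions(Principal=PARTNER_ROLE, Resource=TABLE, Permissions=["SELECT"])

# ...and revoke it the moment the engagement ends. Both API calls are recorded
# by AWS CloudTrail, which supports the tracking and audit requirements above.
lf.revoke_permissions(Principal=PARTNER_ROLE, Resource=TABLE, Permissions=["SELECT"])
```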
The laboratory of the future in action
Protein crystallization is the process by which a protein forms a crystal. Protein crystals are useful in the study of protein structures for use in medicine. To enable protein crystallization, proteins are dissolved in an aqueous environment and brought to a supersaturated state, which drives crystal formation; the resulting crystals allow researchers to study the internal structure of proteins. Developing protein crystals is difficult, as the process is influenced by many factors, including pH, temperature, ionic strength of the crystallization solution, and even gravity. Once properly developed, these crystals can be used in structural biology to study the molecular structure of the protein, particularly for industrial or biotechnological purposes such as developing cancer treatments.
From these crystals, protein structure is traditionally determined using X-ray diffraction (XRD). Alternatively, cryo-electron microscopy (cryo-EM) and nuclear magnetic resonance (NMR) can also be used for protein structure determination. Protein structure is central to structural analysis in biochemistry and translational medicine, and it is essential for developing targeted therapies in modern drug development.
Large runs of samples must be analyzed to see whether crystals have formed. Currently, each sample is placed under a microscope and a human determines whether a crystal has formed. Samples with crystallization are moved forward for processing, and those without crystallization are discarded.
Using the laboratory of the future architecture above, images can be ingested from the computers connected to the microscopes through AWS Storage Gateway and deposited into the data lake for processing. Initial metadata tagging and indexing happens upon ingestion (for example, tagging the image with the barcode of the sample), and the next step in the workflow is submission to Amazon SageMaker to determine crystallization status. Using a Jupyter notebook and pre-trained algorithms, each microscope image is classified in one of three ways (a sketch of this triage logic follows the list):
- High certainty that a crystal has formed — These are tagged and made available to a workflow that can submit to X-ray crystallography, NMR, or any other workflow that the customer specifies. Each stage adds new metadata tagging to the sample and allows the process to repeat as many times as required.
- High certainty that no crystal has formed — These are tagged and stored. The process ends here for this sample.
- Undetermined state — These are flagged for a human to investigate. Depending on the observation, the image can be marked as crystal or no crystal and then sent to its corresponding workflow. An additional flag can be added so the image is marked as a training candidate and used for future learning.
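The following is a minimal sketch of this triage logic, assuming an image-classification model has already been trained and deployed to a SageMaker endpoint. The endpoint name, confidence thresholds, and response format are assumptions for illustration.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

ENDPOINT = "crystal-classifier"  # hypothetical endpoint hosting the trained model
CRYSTAL_THRESHOLD = 0.90         # illustrative confidence cutoffs
NO_CRYSTAL_THRESHOLD = 0.90

def classify_well_image(image_bytes: bytes) -> str:
    """Return 'crystal', 'no_crystal', or 'undetermined' for one microscope image."""
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/x-image",
        Body=image_bytes,
    )
    # Assumed response shape: {"crystal": 0.97, "no_crystal": 0.03}
    scores = json.loads(response["Body"].read())
    if scores["crystal"] >= CRYSTAL_THRESHOLD:
        return "crystal"        # route to XRD / NMR workflow
    if scores["no_crystal"] >= NO_CRYSTAL_THRESHOLD:
        return "no_crystal"     # tag, store, and end the workflow
    return "undetermined"       # flag for human review and future training
```

In practice, the returned label would be written back to the data lake as a metadata tag and used to trigger the corresponding downstream workflow.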
The goal of this architecture is to engage a human only when required, removing the manual review burden and allowing highly trained, specialized resources to focus on more complex scientific tasks.
Final thoughts
The life science industry is in the midst of a transformation driven by overall healthcare reform, advances in technology, and new scientific capabilities such as genomics. Historically, these challenges have been difficult to address. However, with the advent of cloud technology, life science customers are addressing this critical component of the value chain in new ways by making data securely available at the right place, at the right time, and accessible with the right tools.