AWS for Industries
Accelerating Drug Discovery with High-Throughput Cell Painting on AWS
Are you facing challenges in cell image processing? Let’s dive into how life sciences customers have transformed cell analysis using AWS’s Cell Painting Batch solution.
Introduction
In the field of drug discovery, analyzing cell images from microscopes plays a major role. Cell Painting, an innovative approach for high-content screening, has emerged to understand cellular behaviors and phenotypes. Leading companies in the biopharma industry have been leveraging tools such as CellProfiler software developed by the Broad Institute for cell analysis. However, exponential data growth and diverse imaging techniques pose significant challenges. Here, we will learn how life sciences customers have leveraged AWS to build a scalable, efficient, and distributed solution for cell analysis.
Current Situation
Cell Painting experiments require scalable storage to accommodate large image files and scalable compute to enable high-throughput image analysis. Scientists today use open-source tools like CellProfiler, but they need scalable infrastructure that can run automated pipelines without the burden of infrastructure management. In practice, they often end up running experiments while also managing huge volumes of microscopy image data and provisioning infrastructure. Researchers also need to collaborate across laboratories effectively and securely with easy-to-use solutions. Finally, scientific reproducibility is the foundation of research: scientists need to replicate findings published by others in high-impact journals as well as results produced within their own laboratory.
Challenges
The following are the challenges that life sciences customers faced while using tools like CellProfiler software on stand-alone instances:
- Challenges in adapting to fluctuating workloads
- Productivity issues with complex long running jobs
- Collaboration issues with teams spread across the organization
- Struggles to meet the demand for compute-heavy operations
- Cluster capacity issues often lead to incomplete jobs, delays and inefficiencies
- Absence of a centralized data hub causes data access issues
Solution Overview
To address these challenges, AWS solution architects worked with life sciences customers to develop a new solution called Cell Painting Batch (CPB). CPB uses the CellProfiler container image from the Broad Institute to run CellProfiler pipelines on AWS in a scalable and distributed architecture. CPB enables researchers to run large-scale image processing tasks without worrying about the complexity of infrastructure management. Additionally, the CPB solution is built using the AWS Cloud Development Kit (CDK), which makes it easy to deploy and manage the infrastructure.
The entire workflow is automated: once images are uploaded and an Amazon Simple Queue Service (SQS) message is sent, the message triggers the entire process, from image processing to result storage. This provides researchers with an efficient, automated, and scalable solution for their large-scale image processing needs.
Figure 1 – This diagram shows an Amazon S3 bucket to drop images from microscopes. When a user submits an SQS message, it triggers AWS Lambda. The Lambda function submits an AWS Batch job based on a container image pulled from the Amazon Elastic Container Registry (ECR). AWS Batch processes the images and publishes the results to an Amazon S3 bucket for further analysis.
Workflow
The Cell Painting Batch (CPB) solution on AWS is designed to streamline the complex process of cell image processing, ensuring that researchers can focus on what truly matters: deriving insights from the data. Here is a step-by-step breakdown of how the CPB solution operates:
- Researchers obtain images from microscopes or other sources.
- These images are then uploaded to a designated Amazon Simple Storage Service (S3) bucket, acting as an image repository.
- Once images are stored, researchers submit an Amazon Simple Queue Service (SQS) message with details about the image location and the desired CellProfiler pipeline. This message is sent to the SQS service, essentially acting as a request for image processing.
- On receipt of an SQS message, an AWS Lambda function is automatically triggered. This function’s primary role is to initiate the AWS Batch job for the particular image processing request.
- AWS Batch evaluates the job requirements. Depending on the task, AWS Batch dynamically provisions the necessary Amazon Elastic Compute Cloud (EC2) instances.
- The specified container image, stored in Amazon Elastic Container Registry (ECR), is fetched. Within AWS Batch, this container executes, running the defined CellProfiler pipeline. Amazon FSx for Lustre, integrated with the S3 bucket, ensures data is promptly available to containers.
- The CellProfiler software within the container processes the image following a defined pipeline. This can involve segmenting the image, extracting features, and other image processing tasks.
- After CellProfiler finishes processing, the results are saved back to the designated S3 bucket at a location specified in the SQS message.
- Researchers access the S3 bucket to download and analyze results for their studies.
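The request message described in the steps above could look something like the following. This is a minimal sketch, not the solution's actual schema: the field names (`bucket`, `input_prefix`, `pipeline_key`, `output_prefix`) and the helper names are assumptions made for illustration, and the README in the CPB repository defines the real message format. Only `send_message` with `QueueUrl` and `MessageBody` is a real boto3 SQS call.

```python
import json


def build_cpb_message(bucket: str, input_prefix: str,
                      pipeline_key: str, output_prefix: str) -> str:
    """Serialize a processing request as a JSON string for the SQS message body.

    All field names are hypothetical; the deployed solution defines its own schema.
    """
    fields = {
        "bucket": bucket,                # S3 bucket holding the microscope images
        "input_prefix": input_prefix,    # where the images were uploaded
        "pipeline_key": pipeline_key,    # CellProfiler pipeline file to run
        "output_prefix": output_prefix,  # where results should be written
    }
    for name, value in fields.items():
        if not value:
            raise ValueError(f"{name} must be non-empty")
    return json.dumps(fields, sort_keys=True)


def send_request(queue_url: str, body: str, sqs_client=None) -> dict:
    """Send the request to the queue.

    boto3 is imported lazily so the builder above has no AWS dependency;
    a stand-in client can be passed in for local testing.
    """
    if sqs_client is None:
        import boto3
        sqs_client = boto3.client("sqs")
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
```

Keeping the message builder separate from the sender means the request format can be validated locally before anything is sent to AWS.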
Figure 2 – This diagram shows an input cell image before processing (on the left) and the output cell image after processing (on the right). Image source: https://github.com/CellProfiler/examples/tree/master/ExampleHuman
The workflow is automated, ensuring that once an image is uploaded and an SQS message is sent, the solution initiates image processing and storage of results.
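The Lambda step that turns an SQS message into an AWS Batch job might be sketched as follows. This is an illustration under stated assumptions rather than the function shipped with CPB: the environment variable names `JOB_QUEUE` and `JOB_DEFINITION`, the `cpb-` job name prefix, and the choice to pass message fields to the container as environment variables are all hypothetical. The `Records`/`body`/`messageId` event shape and the `submit_job` parameters are the standard Lambda SQS event format and boto3 Batch API.

```python
import json
import os
import re


def job_request_from_record(record: dict, job_queue: str, job_definition: str) -> dict:
    """Translate one SQS record into keyword arguments for Batch submit_job."""
    message = json.loads(record["body"])
    # Batch job names allow only letters, digits, hyphens, and underscores (max 128 chars).
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "-", record["messageId"])
    return {
        "jobName": f"cpb-{safe_id}"[:128],
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical convention: expose each message field to the container
            # as an upper-cased environment variable.
            "environment": [
                {"name": key.upper(), "value": str(value)}
                for key, value in sorted(message.items())
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point: submit one Batch job per SQS record in the event."""
    if batch_client is None:
        import boto3
        batch_client = boto3.client("batch")
    job_queue = os.environ["JOB_QUEUE"]            # assumed variable name
    job_definition = os.environ["JOB_DEFINITION"]  # assumed variable name
    job_ids = []
    for record in event.get("Records", []):
        request = job_request_from_record(record, job_queue, job_definition)
        response = batch_client.submit_job(**request)
        job_ids.append(response["jobId"])
    return {"submitted": job_ids}
```

Accepting an optional `batch_client` argument lets the handler be exercised locally with a stand-in client before it is deployed.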
Deployment
Customers can use either AWS Cloud9 or their personal computer to clone the CPB solution from the GitHub repository. Once it’s cloned, customers can follow the instructions in the README file to deploy the infrastructure to their AWS account.
Security
The Cell Painting Batch (CPB) solution on AWS offers a robust security framework for cell painting datasets and pipelines. With data encrypted at rest and in transit, access regulated through AWS Identity and Access Management (IAM), and network protection provided by an isolated VPC, the solution is designed to keep data safe. Additionally, continuous monitoring with tools like Amazon CloudWatch improves the security posture. To further strengthen security, it’s advisable to implement additional mitigations such as strong authentication with multi-factor authentication (MFA), version control for system configurations, protection against excessive resource usage using Amazon CloudWatch and AWS Service Quotas, cost monitoring with AWS Budgets, and container scanning using Amazon Inspector.
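As one illustration of regulating access with IAM, the sketch below generates a least-privilege policy document that allows reading images from one prefix and writing results to another. It is not taken from the CPB code: the function name, the statement IDs, and the prefix layout are assumptions, and whether a job role needs direct S3 permissions at all depends on how a given deployment uses Amazon FSx for Lustre. The policy document structure and the S3 ARN format are standard IAM.

```python
import json


def s3_prefix_policy(bucket: str, input_prefix: str, output_prefix: str) -> dict:
    """Build an IAM policy document scoped to two prefixes in one bucket.

    Hypothetical helper: read-only on the input prefix, write-only on the output prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadInputImages",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{input_prefix}*",
            },
            {
                "Sid": "WriteResults",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{output_prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be reviewed before attaching it to a role.
    print(json.dumps(s3_prefix_policy("example-bucket", "plates/", "results/"), indent=2))
```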
Life Sciences Customers’ Success Story
The transition to CPB has been transformational for life sciences customers. Deploying it has reduced image processing times, streamlined processing pipelines, and fostered a culture of collaboration. The system’s inherent scalability prepares these companies to handle even larger datasets as they accelerate drug discovery.
Customizing the Solution
The CPB solution is designed with a modular architecture that allows for integration with other AWS services, such as AWS Step Functions for workflow orchestration, Amazon AppStream for accessing scientific tools in a browser, AWS Service Catalog for self-service deployments, and Amazon SageMaker for machine learning workloads. Additionally, the GitHub code includes a parameters file for applying customizations such as instance class, timeout duration, and more.
Conclusion
Implementing the Cell Painting Batch solution can substantially enhance researchers’ productivity by freeing them from the complexity of infrastructure management. The solution enables faster, scalable image analysis to accelerate therapeutics development. Furthermore, it equips researchers with self-service deployment and processing options, reducing their reliance on infrastructure teams.
The CPB solution on AWS has not only addressed the challenges of life sciences customers; it has also established a new approach to cell image analysis for the biopharma industry. With a solution that combines efficiency, scalability, and automation, life sciences organizations can now run large-scale cell imaging workloads more easily and accelerate drug discovery.
Call to Action
If your organization is experiencing similar challenges in cell image processing, it’s time to explore the Cell Painting Batch solution for image analysis. You can get in touch with your AWS account team or the AWS community, and explore the GitHub code to deploy this solution and start your journey with Cell Painting analysis.