AWS Partner Network (APN) Blog

Accelerating Collaborative Omics Research with EPAM’s HealthOmics Platform on AWS

By Kevin Prime – Managing Principal, Scientific Informatics Consulting – EPAM
By Anton Isaiev – Systems Architect – EPAM
By Sanjay Aggarwal – Sr. Solutions Architect – AWS
By Mohit Mehta – Principal Solutions Architect – AWS

EPAM Systems Logo
EPAM
EPAM-Systems-APN-Blog-CTA-2023

Life sciences and healthcare have always been crucial fields, as they directly impact our well-being. A groundbreaking discovery can revolutionize our lifestyles, quality of life, and longevity, while a mistake or a delay can have a dramatic impact on individuals or entire populations.

Recent advances have led to development of omics technologies, which enable the study of an individual’s genetic makeup. These technologies are driving a revolution in healthcare, guiding the development of personalized medicines and more accurate predictive or prognostic models for specific conditions or diseases.

Successful omics research and the development of omics technologies require accurate analysis of vast amounts of data. Performing this analysis cost-effectively and efficiently is incredibly challenging, considering not only the sheer volume of data that needs to be analyzed but also the critical importance of accurate analysis.

This blog post demonstrates how EPAM’s Collaborative Omics Research Accelerator™ (CORA™), powered by AWS HealthOmics, overcomes the challenges of performing cost-effective and efficient omics data analysis at scale.

Omics Computation Challenges

Organizations using omics data to further research and patient care face numerous challenges. One of them is the exponential growth of genomic data volumes, which are doubling every seven months. By 2025, the capacity needed for genomic data is estimated to reach 40 Exabytes.

Another major challenge is the growth of the role and volume of modeling and simulation (M&S), also known as biosimulation or model-informed drug discovery and development (MID3). M&S can deliver significant business, scientific, and clinical value to firms that fully utilize it as an integral part of their drug development strategy.

The need to conduct M&S at scale, coupled with exponential growth of omics data, necessitates storing and processing vast volumes of data. Storing the data locally is not only expensive but also a massive drain on resources, which is simply unsustainable. On-premises resources for genomics analysis are limited in capacity and require significant investment to keep with the continually increasing demand for both computation and storage resources.

EPAM CORA Significantly Boosts Omics Research

EPAM Collaborative Omics Research Accelerator (CORA™) for AWS HealthOmics provides life science and health care organizations with a user-friendly, secure and Good Practice (GxP) ready environment for analyzing and correlating omics data. EPAM CORA also takes full advantage of the power of AWS HealthOmics with features like multiomic and multimodal analysis, population sequencing, and fully managed bioinformatics computation.

EPAM CORA is a web-based accelerator for omics research. It provides a platform for managing highly-scalable computing environments hosted on AWS and for scripting simple-to-complex computational workflows to accelerate an organization’s omics research. A key advantage of EPAM CORA, when compared to commercial SaaS solutions, is that its platform is elastic, meaning that the resources for each computational job are allocated from AWS only when needed and then released again, helping to minimize the cost of performing multiple parallel computations on different compute instances.

How EPAM CORA Works

Built using EPAM’s proven open-source Cloud Pipeline platform, EPAM CORA enables users to integrate AWS HealthOmics workflows and data storage into complex computational workflows, managing the work of provisioning the correct computing environment type and size needed for each step.

EPAM CORA also provides detailed security and access controls within an organization’s cloud environment. With extensive version controls for input data, workflows, and environments, EPAM CORA enables organizations to reproduce computations over time — a key requirement for dealing with regulatory and IP challenges.

EPAM's Collaborative Omics Research Accelerator Architecture on AWS

Figure 1 –EPAM’s Collaborative Omics Research Accelerator Architecture on AWS.

The CORA platform provides a comprehensive and flexible cloud-based infrastructure for bioinformatics and data analysis. Its architecture is built on AWS services, with the core functionality hosted on Amazon Elastic Kubernetes Service (Amazon EKS). The platform offers a user-friendly GUI accessible via HTTPS, managed by an Elastic Load Balancing and AWS Load Balancer Controller. Key components include the CORA GUI, API, Authentication Service, and Jobs Scheduler, all supported by an Amazon Relational Database Service (Amazon RDS) instance for data storage. The platform accommodates diverse computational needs, from custom workloads executed on Amazon EKS Worker nodes to specialized bioinformatics pipelines run through AWS HealthOmics Workflows. This architecture enables CORA to deliver a scalable, secure, and adaptable environment for a wide range of scientific computing tasks.

User Experience

A scientist reviews a variant callout file calculated from the original sequencer output with EPAM CORA. The detailed sequence information is displayed in the EPAM Genome Browser, an open-source add-on to EPAM CORA.

Detailed sequencer output

Figure 2 – Detailed sequencer output.

EPAM CORA Key Benefits

With a convenient and user-friendly interface, EPAM CORA allows scientists to perform their research easily and flexibly in the cloud, benefitting from:

  • Time Savings: Rapidly deploy the computational tools used by scientists.
  • Increased Accessibility: Easily access any tool via a user-friendly interface with fine-grained access and version controls for all artifacts.
  • Streamlined Cloud Migration: Streamline migrations of on-premise clusters into the cloud and provide a versatile solution for application hosting and sharing.
  • Scalability: Harness elastically scalable cloud high performance computing with many CPUs and GPUs.
  • Optimized Workflows: Construct complex computational workflows using AWS HealthOmics and several other workflow languages.
  • Enhanced Accounting: Generate clear accounting reports for individual and team usage of AWS resources.
  • Cost Savings: Utilize cost-effective and secure storage of large-scale omics data in AWS HealthOmics’ sequence, variant, and annotation storage.

EPAM CORA Features

EPAM CORA provides a single platform that can support a wide variety of computation-related capabilities, particularly for organizations in life sciences and healthcare. A few examples of how it is being used are detailed below.

Computation & Data Processing

Create complex, highly scalable, and parallel computing and data processing pipelines, and then execute them in their Amazon Virtual Private Cloud (Amazon VPC) using the user interface, a command-line scriptable interface, or a REST API. Each pipeline represents a workflow script with versioned source code, documentation, and configuration. Users can create scripts in the EPAM CORA environment or upload them from a local machine. Pipelines can fully leverage AWS HealthOmics Worfklows, including Ready2Run Workflows, as well as the specialized storage engines provided by AWS HealthOmics. A major advantage of EPAM CORA is that users can view pipelines during execution, examine outputs from individual steps in a computation, and even stop pipeline that isn’t working as expected.

Data Storage & Tools Management

Create data storage zones, manage input and output data, and maintain version control to efficiently reproduce previous runs and store and share raw genomics data efficiently using the AWS HealthOmics data stores. Deploy customized calculation environments using Docker containers.

Scientific Computing GUI Applications

Launch and run Graphical User Interface (GUI)-based applications using a self-service web interface, allowing users to easily choose cloud instance configurations or use a cluster. Applications are launched as Docker containers exposing web endpoints or as a remote desktop.

HPC-Based Solutions

Migrate from an on-premise computing center to a cloud-native environment to easily “lift and shift” your High-Performance Compute (HPC) payloads, including schedulers and other tools and run them “as is.” From there, clients can replace cluster-based solutions with natively parallelized, autoscaling pipelines as needed or leave legacy solutions to run in cloud-based clusters.

Conclusion

EPAM CORA powered by AWS HealthOmics democratize access to high-performance omics computations. It is being used in client environments ranging from small biotech startups to global pharmaceutical companies to underpin successful omics research as well as development of omics technologies. In doing so, it is providing the springboard to groundbreaking advances in life science and healthcare.
.
EPAM-Systems-APN-Blog-Connect-2023.


EPAM Systems – AWS Partner Spotlight

EPAM Systems is an AWS Premier Tier Services Partner that simplifies your infrastructure and applications by moving them to the cloud and eliminating technical debt, allowing you to focus on your business objectives.

Contact EPAM | Partner Overview | AWS Marketplace