AWS for Industries

Democratize Omics Data Analysis with Basepair on AWS HealthOmics

Introduction

Advancements in next-generation sequencing (NGS) technology have created new opportunities for omics data analysis, unlocking valuable insights for precision medicine, clinical diagnostics, and drug discovery. To keep pace with this high-throughput technology and handle the fluctuations in data volume, healthcare and life sciences (HCLS) customers seek a secure, reliable, and scalable environment for data storage and analysis. However, they often have limited in-house expertise or resources to set up this infrastructure and perform analysis, which creates a roadblock for large-scale omics data analysis and collaborative research.

To address this challenge, AWS launched AWS HealthOmics, a purpose-built service that helps customers store, query, and analyze genomic and other omics data. By removing the undifferentiated heavy lifting of provisioning and managing infrastructure, it enables HCLS organizations to focus on scientific discoveries and improve patient outcomes. Basepair, an AWS Qualified Software Partner, offers a next-generation bioinformatics platform that accelerates migration, deployment, orchestration, and scaling of bioinformatics workflows on AWS. Its intuitive point-and-click interface enables scientists with little to no computational experience to connect to their data, process it, and then explore it through interactive reports customized for each data type.

In this blog post, we share how customers can leverage the Basepair bioinformatics platform, powered by AWS HealthOmics, to access user-friendly, scalable, and flexible infrastructure for omics data analysis with predictable costs. We give an overview of the platform and key benefits of this integration. We show how this Software-as-a-Service (SaaS) solution allows you to use storage and compute resources in your own AWS account. Finally, we demonstrate how the integrated out-of-the-box interactive visualization tools can accelerate time to scientific insight.

Basepair Overview

Basepair has been designed to make bioinformatics easier, faster, and more cost-effective on AWS. It offers a SaaS platform that democratizes not just access to, but analysis and interpretation of, omics data. The platform can be provisioned on a customer’s AWS account to use their own storage and compute resources, thereby eliminating the overhead and risk of moving data. Its point-and-click graphical user interface (GUI) enables end users with minimal programming experience to leverage existing industry-standard tools or build custom workflows, while supporting reproducibility. Its built-in visualization tools generate interactive reports to uncover valuable insights from the data, improving collaboration between R&D teams and freeing up bioinformaticians to focus on more advanced downstream analysis. Finally, it offers application programming interfaces (APIs) and a powerful command line interface (CLI) for automation and integration with third-party applications, as well as brand labeling so that the platform looks and feels like an organization’s own web portal.

Ease of Use

Basepair’s platform, powered by AWS, improves user experience by enabling scientists of all backgrounds to perform complex analyses and interpret the resulting data with ease. The point-and-click interface ensures that end users, even bioinformatics novices, can access and utilize the tools they need with minimal training. Moreover, the results of an analysis aren’t just a series of flat files and static web pages available for download. Instead, users are able to quickly assess the quality of their data and explore it through a series of dynamic, interactive reports optimized for each data type (Figure 1). Then, if there are questions about how to use the platform or how best to analyze and interpret the data, the samples, analyses, and results can be shared with either that organization’s bioinformatics team or Basepair’s technical support team, facilitating collaboration and accelerating support for R&D projects.

Figure 1: The Basepair platform, powered by AWS HealthOmics, offers an easy-to-use graphical user interface (GUI) that enables customers to leverage HealthOmics’ storage and workflow capabilities. Its built-in visualization tools generate interactive reports that can accelerate time to scientific insight.

Connected Cloud

Traditional bioinformatics platforms typically involve one of two deployment methods. They either require the movement of genomic data into the centralized environment of a hosted bioinformatics platform, or they require installation inside a customer’s AWS account, which may increase ongoing operations and maintenance. Basepair, on the other hand, can be configured to assume an AWS Identity and Access Management (IAM) role to interface with a customer’s existing environment. Via a series of API calls, it is then able to execute read/write operations against the customer’s Amazon Simple Storage Service (Amazon S3) bucket or AWS HealthOmics data stores. With this architecture, data movement is eliminated, which not only addresses most compliance, security, and data privacy concerns, but also allows customers to control cloud costs while staying connected to other tools and resources in their AWS account, as shown in Figure 2.

Figure 2: Diagram outlining Basepair’s connected cloud architecture. The orchestration plane on the left is in Basepair’s AWS account whilst the customer’s AWS account, hosting the compute and storage resources, is on the right, accessed via a limited IAM role.
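As a rough illustration of the limited IAM role described above, the following sketch builds the two JSON policy documents that a cross-account role typically needs: a trust policy naming the account allowed to assume the role, and a permissions policy scoped to a single bucket. This is not Basepair’s actual configuration. The account ID, external ID, and bucket name are hypothetical placeholders, and the function names are invented for this example.

```python
import json


def build_trust_policy(platform_account_id: str, external_id: str) -> dict:
    """Trust policy that lets one external AWS account assume the role.

    The external ID condition is a common safeguard for third-party access;
    whether a given platform uses one is an assumption of this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{platform_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_permissions_policy(bucket_name: str) -> dict:
    """Permissions policy limited to listing, reading, and writing one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
    print(json.dumps(build_permissions_policy("example-omics-bucket"), indent=2))
```

Access to AWS HealthOmics data stores would be granted in the same way, with additional statements under the `omics:` action prefix. Keeping the role this narrow is what lets the customer retain control over which resources the orchestration plane can reach.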

Integrating Basepair Platform with AWS HealthOmics

The Basepair platform comprises two fundamental components: a storage module and a workflow engine. The storage module is tasked with efficiently storing, retrieving, and organizing customers’ omics data. Historically, Basepair has utilized Amazon S3 as its storage layer, a capability we are now expanding to encompass AWS HealthOmics. This extension uses HealthOmics APIs to connect to AWS HealthOmics sequence and reference stores. The workflow engine is dedicated to the design, supervision, and execution of customers’ workflows. The integration with HealthOmics lets our customers use Ready2Run and private workflows within Basepair.

Features supported by the integration:

  1. Streamlined sample uploading facilitated through the Basepair Console and Basepair CLI (Figure 3).
  2. Direct, interactive visualization of HealthOmics read sets.
  3. Archived HealthOmics read sets automatically activated upon utilization in workflow execution.
  4. Custom workflow creation enabled by a user-friendly drag-and-drop interface, supporting workflows authored in Nextflow, Workflow Description Language (WDL), or Common Workflow Language (CWL).
  5. Interactive timeline chart available during workflow execution, providing insight into resource utilization and execution times.
  6. Comprehensive management and sharing capabilities for samples, workflows, and analyses among multiple users.
  7. Connected Cloud functionality allowing customers to integrate their own cloud infrastructure for sample storage and workflow execution.
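To make the third feature in the list above more concrete, here is a minimal sketch of how an archived read set could be activated before a workflow uses it. It assumes a boto3-style AWS HealthOmics client is passed in, and that the `get_read_set_metadata` and `start_read_set_activation_job` operations behave as in the public HealthOmics API. The helper name `ensure_read_set_active` is hypothetical, and this is not a description of Basepair’s internal implementation.

```python
def ensure_read_set_active(omics, sequence_store_id: str, read_set_id: str) -> str:
    """Request activation of a read set if it is archived.

    `omics` is expected to be a boto3-style HealthOmics client (an assumption
    of this sketch). Returns the read set status the caller should expect next.
    """
    metadata = omics.get_read_set_metadata(
        sequenceStoreId=sequence_store_id, id=read_set_id
    )
    status = metadata["status"]
    if status == "ARCHIVED":
        # Activation is asynchronous: the job is submitted here, and the
        # caller must poll the read set until it reports ACTIVE.
        omics.start_read_set_activation_job(
            sequenceStoreId=sequence_store_id,
            sources=[{"readSetId": read_set_id}],
        )
        return "ACTIVATING"
    return status
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("omics")`. Passing the client in as an argument keeps the helper easy to exercise against a stub.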

Figure 3: Steps for uploading an input data sample to Basepair platform.

Figure 4: Steps for starting an analysis by selecting a sample and one of the Ready2Run workflows offered by AWS HealthOmics.

Figure 5: Steps for accessing Basepair’s interactive visualization dashboard for data analysis.

Figure 6: Steps for viewing execution summary and monitoring performance of workflows, including resource utilization and runtime.

Benefits of the Integrated Platform

One of the primary benefits is that the integrated platform offers a user-friendly point-and-click GUI for scientists of all backgrounds to access AWS HealthOmics and its capabilities, including storage and workflows. This helps organizations build upon their existing investment in AWS HealthOmics and extend its inherent benefits, such as up to 50% cost savings (over traditional object storage), pricing predictability, and enhanced scalability, to the wider organization.

The out-of-the-box visualization tools from Basepair augment AWS HealthOmics by generating interactive reports that help researchers of all backgrounds explore their data before collaborating with a bioinformatician on an informed question. This ultimately improves collaboration between R&D teams and accelerates time to scientific and diagnostic insight by as much as 50%, as reported by Nkarta Therapeutics.

The integrated platform provides customers with an extensive list of Ready2Run workflows tailored to meet the diverse requirements of omics data analysis. Additionally, customers have the flexibility to incorporate their proprietary pipelines into private workflows, defined in workflow languages like Nextflow, WDL, and CWL. Through a seamless GUI, users can effortlessly upload their code and define parameters for workflow execution.

The platform also offers an execution summary, enabling bioinformaticians to monitor the performance of each workflow and identify any failing steps through task-level logs, as shown in Figure 6. An interactive timeline chart is also available during workflow execution, offering valuable insights into resource utilization and execution times. With these robust features, the platform delivers a comprehensive suite of tools necessary for designing, developing, running, and monitoring bioinformatics workflows.
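The kind of information behind an execution summary can be sketched in a few lines. The example below takes task records shaped like the items returned by the HealthOmics `ListRunTasks` operation (`name`, `status`, `startTime`, `stopTime`) and derives per-task runtimes and the list of failed steps. The function name and the sample records are hypothetical, and this is only an illustration, not how Basepair computes its reports.

```python
from datetime import datetime, timedelta, timezone


def summarize_tasks(tasks):
    """Compute per-task runtime in seconds and collect names of failed tasks.

    Each task is a dict with `name`, `status`, and optional `startTime` and
    `stopTime` datetimes. Tasks that have not started or stopped get a
    runtime of None.
    """
    rows, failed = [], []
    for task in tasks:
        start, stop = task.get("startTime"), task.get("stopTime")
        seconds = (stop - start).total_seconds() if start and stop else None
        rows.append({"name": task["name"], "status": task["status"], "seconds": seconds})
        if task["status"] == "FAILED":
            failed.append(task["name"])
    # Longest-running tasks first; tasks without a runtime sort last.
    rows.sort(key=lambda row: row["seconds"] or 0.0, reverse=True)
    return {"tasks": rows, "failed": failed}


if __name__ == "__main__":
    # Hypothetical task records, for illustration only.
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    example = [
        {"name": "align", "status": "COMPLETED", "startTime": t0,
         "stopTime": t0 + timedelta(minutes=10)},
        {"name": "call_variants", "status": "FAILED",
         "startTime": t0 + timedelta(minutes=10),
         "stopTime": t0 + timedelta(minutes=12)},
    ]
    print(summarize_tasks(example))
```

Sorting by runtime surfaces the steps that dominate execution time, which is the same question a timeline chart answers visually.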

Basepair’s connected cloud capability enables the compute and storage to be provisioned in the customer’s own AWS account, putting them in complete control of not just their data, but also the resources needed to store and analyze it. Because this federated approach eliminates the data movement that more traditional commercial platforms might require, it improves data security and privacy. It also enhances connectivity to other tools and resources, supports adherence to local data residency laws, and provides economies of scale from a cloud consumption perspective.

Finally, another benefit is not having to resource the DevOps that would otherwise be needed to build, extend, support, and maintain the infrastructure. By removing the undifferentiated heavy lifting, the AWS HealthOmics-powered Basepair platform can help customers significantly reduce time to market or production.

Conclusion

By making AWS HealthOmics storage and workflow capabilities accessible in Basepair’s Software-as-a-Service (SaaS) solution, HCLS organizations now have a push-button way of leveraging lower-cost, omics-optimized storage as well as deploying and running NGS analysis pipelines. This reduces development delays, security complications, and internal resource requirements, freeing them to focus their efforts on new scientific discoveries and getting critical therapies to patients. Furthermore, the integration makes it more efficient and cost effective for these organizations to build out their own infrastructure to process omics data at scale, and it has enabled Basepair to focus on the differentiating aspects of its platform. This helps HCLS organizations to quickly, easily, and securely analyze large, complex omics data to accelerate scientific discovery and time to market.

“As more healthcare and life science information moves to the cloud, a growing need is to create an environment where research scientists can execute their workflows and interactively visualize their data,” said Tehsin Syed, general manager of Health AI services at AWS. “Basepair helps bring a simplified GUI-driven experience to make it easier for scientists to execute their research. Moreover, this execution is done within a customer’s own AWS account, allowing them to maintain control of their data governance, security, and usage commitments.”

If you are interested in evaluating the AWS HealthOmics-powered Basepair platform, you can sign up for a free trial on AWS Marketplace, where more information on Basepair’s unique pay-as-you-go, per-sample licensing model can be found. Further information and resources on the platform, as well as information about Basepair, are also available from Basepair.

Olivia Choudhury

Olivia Choudhury, PhD, is a Senior Partner SA at AWS. She helps partners, in the Healthcare and Life Sciences domain, design, develop, and scale state-of-the-art solutions leveraging AWS. She has a background in genomics, healthcare analytics, federated learning, and privacy-preserving machine learning. Outside of work, she plays board games, paints landscapes, and collects manga.

Samkeet Jain

Samkeet Jain is an accomplished Engineering Lead at Basepair Inc, spearheading the design, development, and scaling of cutting-edge technological solutions. With a robust background in cloud architecture, Samkeet holds certifications as an AWS SA Professional and an AWS DevOps Professional, positioning him as a seasoned expert in cloud-based solutions. Samkeet brings extensive experience in crafting high-traffic, scalable systems across diverse domains, including Life Science, E-commerce, and Open Banking.

Simon Valentine

Simon is a Biological Sciences graduate from Birmingham University in the UK with over 25 years of commercial leadership experience in a variety of scientific software companies. Before joining Basepair as Chief Commercial Officer, Simon led the global enterprise informatics sales team at Illumina and also served as the VP of North America Sales at Seven Bridges Genomics.