[SEO Subhead]
Important: This Guidance requires the use of AWS CodeCommit, which is no longer available to new customers. Existing customers of AWS CodeCommit can continue using and deploying this Guidance as normal.
This Guidance demonstrates how to set up a continuous integration and continuous delivery (CI/CD) pipeline to automate the lifecycle of your bioinformatics workflows on AWS HealthOmics. By integrating your existing workflows with source control systems like Git, the CI/CD pipeline enables efficient development, testing, version management, and deployment of workflow updates. Whenever code changes are committed, the pipeline automatically builds, tests, and deploys the new workflow version. This approach streamlines your workflow processes, reducing manual effort while maintaining data provenance and consistent, repeatable results across all versioned workflows.
Please note: [Disclaimer]
Architecture Diagram
[Architecture diagram description]
Step 1
Workflow developers create a branch and write code for a new workflow or make changes to an existing workflow repository within AWS CodeCommit. Developers can specify major and minor versions as part of semantic versioning.
Since CodeCommit supports Git, developers use Git operations to push code, submit a pull request, and merge it to the 'main' branch. The merge triggers the CI/CD pipeline configured with AWS CodePipeline.
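For illustration only, the sketch below uses the AWS SDK for Python (Boto3) to open the pull request that, once merged, triggers the pipeline. The repository and branch names are placeholders rather than values defined by this Guidance, and most developers would perform the same steps with standard Git commands.

```python
import boto3

codecommit = boto3.client("codecommit")

# Open a pull request from a feature branch into 'main'; merging it
# triggers the CI/CD pipeline described in the following steps.
# Repository and branch names are hypothetical.
pr = codecommit.create_pull_request(
    title="Add variant-calling workflow v1.2",
    description="Bump minor version: new QC step",
    targets=[
        {
            "repositoryName": "healthomics-workflows",      # assumed repository name
            "sourceReference": "feature/variant-calling",    # assumed feature branch
            "destinationReference": "main",
        }
    ],
)
print(pr["pullRequest"]["pullRequestId"])
```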
Step 2
Within CodePipeline, the build stage starts an AWS CodeBuild job that downloads the latest source code.
Step 3
A pre-created AWS Step Functions state machine imports public Docker images or builds new Docker images and stores them within Amazon Elastic Container Registry (Amazon ECR) repositories.
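The exact input contract is defined by the Guidance's pre-created state machine; the sketch below simply shows how a build step might start such an execution with Boto3, using a placeholder state machine ARN and input document.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Start the container-build state machine. The ARN and the input schema
# (a manifest of public images to import or Dockerfiles to build) are
# assumptions for illustration.
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:container-builder",
    input=json.dumps({"manifest": "container_image_manifest.json"}),
)
print(execution["executionArn"])
```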
Step 4
The CodeBuild job prepares artifacts for the AWS HealthOmics workflow, stores them in Amazon Simple Storage Service (Amazon S3), and creates a HealthOmics workflow. The workflow carries the user-defined major and minor versions and an automatically incremented patch version.
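As a rough illustration of this step, the workflow could be registered with a Boto3 call similar to the one below; the workflow name, engine, S3 location, parameter template, and version tag are assumptions, not values prescribed by this Guidance.

```python
import boto3

omics = boto3.client("omics")

# Register the workflow in HealthOmics from the artifacts staged in S3.
# All identifiers below are placeholders.
workflow = omics.create_workflow(
    name="variant-calling-1.2.3",   # user-defined major.minor plus auto-incremented patch
    engine="WDL",
    definitionUri="s3://example-artifact-bucket/variant-calling/1.2.3/definition.zip",
    parameterTemplate={
        "sample_fastq": {"description": "Input FASTQ file", "optional": False},
    },
    tags={"semantic_version": "1.2.3"},
)
print(workflow["id"], workflow["status"])
```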
Step 5
After a successful build, CodePipeline runs a Step Functions state machine that tests the HealthOmics workflow with preconfigured test data and waits for it to complete.
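A simplified sketch of the test stage follows: it starts a run against placeholder test data and polls until the run reaches a terminal state. The workflow ID, IAM role, and S3 URIs are illustrative only.

```python
import time
import boto3

omics = boto3.client("omics")

# Start a test run of the newly created workflow; all values are placeholders.
run = omics.start_run(
    workflowId="1234567",
    workflowType="PRIVATE",
    roleArn="arn:aws:iam::111122223333:role/HealthOmicsTestRunRole",
    name="ci-test-run",
    parameters={"sample_fastq": "s3://example-test-data/sample.fastq.gz"},
    outputUri="s3://example-test-output/ci-runs/",
)

# Poll until the run finishes; the state machine in this Guidance handles
# the equivalent wait-and-check logic.
while True:
    status = omics.get_run(id=run["id"])["status"]
    if status in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(60)
print(status)
```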
Step 6
On successful completion of the test workflow, a workflow administrator reviews the test workflow outputs in Amazon S3 and, using CodePipeline, manually approves the deployment of the workflow to a production AWS account.
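The approval itself is typically given in the CodePipeline console; for completeness, the equivalent API call is sketched below with placeholder pipeline, stage, action, and token values (the token is obtained from the pipeline state).

```python
import boto3

codepipeline = boto3.client("codepipeline")

# Approve the manual approval action so the pipeline proceeds to the
# production deployment. All names and the token are placeholders.
codepipeline.put_approval_result(
    pipelineName="healthomics-workflow-pipeline",
    stageName="DeployToProduction",
    actionName="ManualApproval",
    result={
        "summary": "Test run outputs in S3 reviewed and look correct.",
        "status": "Approved",
    },
    token="example-approval-token",
)
```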
Step 7
The approval action invokes a CodeBuild job that prepares the workflow artifacts and uploads them to the production account’s Amazon S3 bucket.
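A minimal sketch of this hand-off is shown below; the artifact file names, production bucket, and key prefix are assumptions, and the CodeBuild role is presumed to already have cross-account write access to the bucket.

```python
import boto3

s3 = boto3.client("s3")

# Copy the prepared workflow artifacts into the production account's
# artifact bucket. Bucket, prefix, and file names are placeholders.
for artifact in ("definition.zip", "parameter-template.json", "test-parameters.json"):
    s3.upload_file(
        Filename=artifact,
        Bucket="example-prod-workflow-artifacts",
        Key=f"variant-calling/1.2.3/{artifact}",
    )
```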
Step 8
The CodeBuild job updates the Amazon ECR repository permissions for all applicable repositories to enable cross-account access for the production AWS account.
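One way to express this permission update with Boto3 is sketched below; the production account ID and repository names are placeholders.

```python
import json
import boto3

ecr = boto3.client("ecr")

PROD_ACCOUNT_ID = "444455556666"                     # placeholder production account ID
REPOSITORIES = ["variant-calling-tools", "fastqc"]   # placeholder repository names

# Grant the production account pull access to each workflow container
# repository. Depending on how HealthOmics pulls images in your setup,
# additional principals may also need access.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowProductionAccountPull",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{PROD_ACCOUNT_ID}:root"},
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
            ],
        }
    ],
}

for repository in REPOSITORIES:
    ecr.set_repository_policy(repositoryName=repository, policyText=json.dumps(policy))
```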
Step 9
Uploading the workflow artifacts to the production account's Amazon S3 bucket invokes an AWS Lambda function that verifies the necessary files are present and launches a CodeBuild job.
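A condensed sketch of such a Lambda handler is shown below; the required file names and CodeBuild project name are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")
codebuild = boto3.client("codebuild")

# Hypothetical required artifact files and deployment project name.
REQUIRED_FILES = {"definition.zip", "parameter-template.json"}
PROJECT_NAME = "healthomics-prod-deploy"


def handler(event, context):
    """Triggered by the S3 upload event; start the deployment build once
    all required workflow artifacts exist under the uploaded prefix."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    prefix = record["object"]["key"].rsplit("/", 1)[0] + "/"

    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    present = {obj["Key"].rsplit("/", 1)[-1] for obj in listing.get("Contents", [])}

    if REQUIRED_FILES.issubset(present):
        codebuild.start_build(
            projectName=PROJECT_NAME,
            environmentVariablesOverride=[
                {"name": "ARTIFACT_PREFIX", "value": f"s3://{bucket}/{prefix}", "type": "PLAINTEXT"},
            ],
        )
```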
Step 10
The CodeBuild job creates the workflow in HealthOmics using the artifacts in Amazon S3.
Step 11
The HealthOmics workflow is now available for use in the production account.
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
For efficient and effective operations, this Guidance helps you automate your build, test, and deployment processes with CodeBuild and CodePipeline. CodeCommit is another service that supports your operations; it stores code in private Git repositories for version control. You can track changes, test new versions, and roll back if necessary. You can also orchestrate your automated tests with Step Functions to validate the quality and reliability of your deployments. Finally, you can centralize the management of your workflows in HealthOmics to improve visibility and monitoring; HealthOmics is a managed service purpose-built for healthcare and life science organizations to store, query, and analyze omics data.
-
Security
CodeCommit, HealthOmics, and Amazon ECR work in tandem to protect your systems, applications, and data from potential threats. Specifically, CodeCommit provides secure storage and version control for your workflow code, with access controls, change tracking, and encryption. HealthOmics offers isolated, secure, and scalable processing of your bioinformatics workflows. Amazon ECR helps ensure secure storage and access control for your container images. Additionally, by separating your CI/CD and production environments, implementing least-privilege access, and securely managing your artifacts, you can achieve a higher level of isolation and security for your bioinformatics workflows.
-
Reliability
Building resilient, highly available systems that can withstand failures relies on services like CodePipeline, Step Functions, and HealthOmics. CodePipeline provides an automated way to build, test, and deploy new versions of your workflows. Step Functions orchestrates the steps in your CI/CD pipeline, providing a resilient, fault-tolerant way to coordinate them and automatically retry failed steps. HealthOmics handles the underlying infrastructure and resource management, supporting the reliability and availability of your workflow processing.
-
Performance Efficiency
You can optimize the use of computing resources while maximizing efficiency with CodeBuild, CodePipeline, Step Functions, and HealthOmics. CodeBuild provides a fully managed build and test workflow with capabilities such as caching and auto-discovery. The efficient deployment processes, powered by CodePipeline and Step Functions, minimize the risk of performance regressions. Finally, HealthOmics provides a managed service for running your bioinformatics workflows, handling the provisioning and scaling of the underlying compute resources and storage for optimal workflow performance.
-
Cost Optimization
By supporting cross-account deployments, this Guidance helps you maintain secure, isolated environments for development, testing, and production, reducing the risk of inadvertent resource usage and cost. To do this, it uses CodeBuild, CodePipeline, Lambda, Amazon ECR, and HealthOmics. For example, the automated build and deployment processes of CodeBuild and CodePipeline provision only the resources that are needed. By using Lambda for lightweight tasks, you reduce the need for always-on compute resources. Storing your built container images in Amazon ECR allows for reuse across multiple workflow deployments, saving time and compute costs. Finally, HealthOmics, as a managed service, eliminates the need for you to manage the underlying infrastructure and configuration complexities, reducing your operational costs.
-
Sustainability
Minimize your carbon footprint and support responsible resource utilization with CodeBuild, Lambda, Amazon ECR, and HealthOmics. CodeBuild provisions only the compute resources needed to perform build and deployment tasks, scaling up and down as required and reducing energy consumption and the associated environmental impact. Lambda avoids the need to provision and manage dedicated server infrastructure, running only when needed and shutting down when idle. Amazon ECR provides centralized, scalable, and durable storage for your container images, eliminating the need for additional container registries or storage solutions and reducing the overall hardware and energy footprint. With HealthOmics, the service's scalable, serverless architecture runs your bioinformatics workflows and helps lower your overall energy consumption.
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
[Title]
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.