Guidance for Migration & Storage of Sequence Data with AWS HealthOmics
Overview
How it works
These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance is implemented with the AWS Cloud Development Kit (AWS CDK), so the business logic, infrastructure, and configuration are all defined as code. Changes can therefore be tracked, reviewed, and integrated through a version control system.
Security
Amazon S3 is protected by the AWS secure global network infrastructure. Security and compliance are a shared responsibility between AWS and the customer. This shared model helps relieve the customer's operational burden because AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which the service operates.
Amazon S3 secures data from unauthorized access with encryption features and access management tools. HealthOmics encrypts sensitive customer data at rest by default, using a service-owned AWS Key Management Service (AWS KMS) key. Customer-managed KMS keys are also supported. For more on protection with HealthOmics, see Data protection in AWS HealthOmics.
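As a minimal sketch of the two encryption options described above, the following Python builds the request for the HealthOmics `create_sequence_store` API. The helper names, the store name, and the key ARN are hypothetical. The sketch assumes a boto3 `omics` client is passed in, and that omitting `sseConfig` leaves the service-owned default key in place.

```python
from typing import Any, Dict, Optional


def build_sequence_store_request(
    name: str, kms_key_arn: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for the HealthOmics create_sequence_store call.

    With no key ARN, sseConfig is omitted so the service default applies
    (assumed here to be the service-owned KMS key). With a key ARN, the
    store is encrypted with that customer-managed key.
    """
    request: Dict[str, Any] = {"name": name}
    if kms_key_arn:
        request["sseConfig"] = {"type": "KMS", "keyArn": kms_key_arn}
    return request


def create_sequence_store(
    client: Any, name: str, kms_key_arn: Optional[str] = None
) -> str:
    """Create the store with the given client and return its ID.

    `client` is expected to behave like boto3.client("omics").
    """
    response = client.create_sequence_store(
        **build_sequence_store_request(name, kms_key_arn)
    )
    return response["id"]


# Possible usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   store_id = create_sequence_store(
#       boto3.client("omics"),
#       "example-store",
#       "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",
#   )
```

Keeping the request builder separate from the API call makes the encryption choice easy to review and unit test without AWS access.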
Reliability
Because this Guidance is built on AWS serverless and managed services, AWS is responsible for operating those services, and the applications can scale with demand. This helps the workload perform its intended function correctly and consistently when it is expected to. It also allows customers to operate and test the workload throughout its lifecycle.
Performance Efficiency
The backbone of this Guidance is AWS serverless and managed services, which minimize operational overhead such as server management. HealthOmics Storage is purpose-built for omics sequence data, allowing customers to store, discover, and share raw sequence data efficiently, securely, and at low cost.
Cost Optimization
This Guidance includes functionality to move data into HealthOmics Storage. HealthOmics provides a cost-effective, omics-aware storage option for reference and sequence data that can reduce the Total Cost of Ownership (TCO) for storing raw sequence data. Such data can include BAM, CRAM, and FASTQ files.
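To illustrate moving files into HealthOmics Storage, here is a hedged sketch of how a read set import request could be assembled for the `start_read_set_import_job` API. It is not the Guidance's own implementation. The helper names and the extension-to-format mapping are assumptions made for illustration, as is the rule that BAM and CRAM sources need a reference ARN.

```python
from typing import Any, Dict, List, Optional


def infer_file_type(s3_uri: str) -> str:
    """Map a file extension to a HealthOmics sourceFileType (illustrative mapping)."""
    lowered = s3_uri.lower()
    if lowered.endswith((".fastq", ".fastq.gz", ".fq", ".fq.gz")):
        return "FASTQ"
    if lowered.endswith(".bam"):
        return "BAM"
    if lowered.endswith(".cram"):
        return "CRAM"
    raise ValueError(f"Unsupported sequence file: {s3_uri}")


def build_import_source(
    source1: str,
    subject_id: str,
    sample_id: str,
    source2: Optional[str] = None,
    reference_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one entry for the `sources` list of start_read_set_import_job."""
    file_type = infer_file_type(source1)
    # Assumption in this sketch: aligned formats must name their reference genome.
    if file_type in ("BAM", "CRAM") and not reference_arn:
        raise ValueError(f"{file_type} imports need a reference ARN")
    source_files = {"source1": source1}
    if source2:
        # Second file of a paired-end FASTQ.
        source_files["source2"] = source2
    source: Dict[str, Any] = {
        "sourceFiles": source_files,
        "sourceFileType": file_type,
        "subjectId": subject_id,
        "sampleId": sample_id,
    }
    if reference_arn:
        source["referenceArn"] = reference_arn
    return source


def start_import(
    client: Any, sequence_store_id: str, role_arn: str, sources: List[Dict[str, Any]]
) -> str:
    """Submit the import job via a boto3-like omics client and return the job ID."""
    response = client.start_read_set_import_job(
        sequenceStoreId=sequence_store_id, roleArn=role_arn, sources=sources
    )
    return response["id"]
```

The role passed as `role_arn` must allow HealthOmics to read the source objects in Amazon S3.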
HealthOmics automatically moves data to a less expensive storage class when the data is not regularly accessed (for example, data that has not been accessed for more than 30 days). This is similar to the Amazon S3 Intelligent-Tiering storage class, which automates storage cost savings by moving data when access patterns change.
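Data that has moved to the less expensive class can be brought back before it is read again. The sketch below, with hypothetical helper names, lists read sets whose status is `ARCHIVED` and requests their activation. It assumes a boto3-like `omics` client and uses the `list_read_sets` and `start_read_set_activation_job` operations.

```python
from typing import Any, Dict, List


def archived_read_set_ids(client: Any, sequence_store_id: str) -> List[str]:
    """Return IDs of all read sets in the store whose status is ARCHIVED."""
    ids: List[str] = []
    kwargs: Dict[str, Any] = {
        "sequenceStoreId": sequence_store_id,
        "filter": {"status": "ARCHIVED"},
    }
    while True:
        page = client.list_read_sets(**kwargs)
        ids.extend(read_set["id"] for read_set in page.get("readSets", []))
        token = page.get("nextToken")
        if not token:
            return ids
        # Follow pagination until the service stops returning a token.
        kwargs["nextToken"] = token


def activate_read_sets(
    client: Any, sequence_store_id: str, read_set_ids: List[str]
) -> str:
    """Start an activation job for the given read sets and return the job ID."""
    response = client.start_read_set_activation_job(
        sequenceStoreId=sequence_store_id,
        sources=[{"readSetId": read_set_id} for read_set_id in read_set_ids],
    )
    return response["id"]
```

Activation is asynchronous, so a caller would typically poll the job or the read set status before downloading the data.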
This Guidance is built with AWS Lambda, a serverless service for event-driven computing, and AWS Step Functions, which orchestrates and sequences the data import workflow. AWS serverless services allow applications to scale quickly with demand while using only the resources they need.
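One way such an orchestration could look is sketched below as an Amazon States Language definition built in Python. The state names, the polling interval, and the two Lambda function ARNs are hypothetical and are not taken from the Guidance's actual workflow. The sketch also assumes the status-checking function returns a payload with a top-level `status` field.

```python
import json
from typing import Any, Dict, Set


def build_import_state_machine(
    start_fn_arn: str, check_fn_arn: str, poll_seconds: int = 60
) -> Dict[str, Any]:
    """Return a start, wait, check, branch loop as an ASL definition."""
    return {
        "Comment": "Illustrative import workflow (hypothetical state names)",
        "StartAt": "StartImport",
        "States": {
            # Lambda function that submits the read set import job.
            "StartImport": {
                "Type": "Task",
                "Resource": start_fn_arn,
                "Next": "WaitForImport",
            },
            "WaitForImport": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "CheckImportStatus",
            },
            # Lambda function assumed to return {"status": "<job status>", ...}.
            "CheckImportStatus": {
                "Type": "Task",
                "Resource": check_fn_arn,
                "Next": "IsImportDone",
            },
            "IsImportDone": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.status",
                        "StringEquals": "COMPLETED",
                        "Next": "ImportSucceeded",
                    },
                    {
                        "Variable": "$.status",
                        "StringEquals": "FAILED",
                        "Next": "ImportFailed",
                    },
                ],
                # Any other status: keep polling.
                "Default": "WaitForImport",
            },
            "ImportSucceeded": {"Type": "Succeed"},
            "ImportFailed": {
                "Type": "Fail",
                "Error": "ImportFailed",
                "Cause": "The read set import job reported FAILED",
            },
        },
    }


def undefined_targets(definition: Dict[str, Any]) -> Set[str]:
    """Return transition targets that do not name a defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(states)


definition = build_import_state_machine(
    "arn:aws:lambda:us-east-1:111122223333:function:start-import",
    "arn:aws:lambda:us-east-1:111122223333:function:check-import",
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the form a state machine definition takes when it is deployed. Polling with a `Wait` state keeps each Lambda invocation short instead of holding a function open while the import runs.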
Sustainability
When building cloud workloads, practicing sustainability means understanding the impacts of the services used and applying design principles to reduce those impacts. Because this Guidance relies extensively on serverless and managed services, its resources scale continually to match the load while using only the minimum needed, which reduces the risk of over-provisioning.