Amazon Omics makes it easier to store, query, and analyze genomic, transcriptomic, and other omics data and then generate insights from that data. It simplifies and accelerates the process of storing and analyzing multiomic information for research and clinical applications, so you can focus on deriving deeper insights from your data.
With Amazon Omics storage, you can store petabytes of omics data efficiently and cost-effectively, allowing scientific discovery at population scale. Amazon Omics workflows automate provisioning and scaling of compute infrastructure, so you can run bioinformatics analysis pipelines at production scale and spend less time managing infrastructure and more time conducting research. Amazon Omics analytics simplifies preparing omics data for multimodal analyses, letting you bring multiomics and health record data together to support more targeted and personalized therapies. These features are also HIPAA eligible.
Amazon Omics storage is compatible with bioinformatics file formats such as FASTQ, BAM, and CRAM and allows you to store, discover, and share this data efficiently and at low cost. Files in these formats are stored as read-set objects within a sequence store. You can also store reference genomes in the FASTA format. Data is imported as immutable objects with unique identifiers to support workloads that require strict data provenance. Access to individual data objects, including references and read-set objects, can be controlled using tags and attribute-based access controls through AWS Identity and Access Management (IAM). To reduce long-term storage costs, data objects that have not been accessed within 30 days are automatically moved to an archive storage class. Archived objects can be reactivated at any time with an API call.
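The 30-day archival rule described above can be illustrated with a small sketch. This is not the service's implementation: the function name, the "ACTIVE" and "ARCHIVE" labels, and the treatment of the exact 30-day boundary are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the text above; everything else here is hypothetical.
ARCHIVE_AFTER = timedelta(days=30)


def storage_class(last_accessed: datetime, now: datetime) -> str:
    """Return an illustrative storage-class label for a data object.

    Assumption: an object idle for 30 days or longer counts as archived.
    """
    idle = now - last_accessed
    return "ARCHIVE" if idle >= ARCHIVE_AFTER else "ACTIVE"


if __name__ == "__main__":
    now = datetime(2023, 3, 1, tzinfo=timezone.utc)
    recently_read = now - timedelta(days=5)
    long_idle = now - timedelta(days=45)
    print(storage_class(recently_read, now))
    print(storage_class(long_idle, now))
```

A sketch like this can help estimate which objects in an inventory are likely to need reactivation before an analysis starts.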
Amazon Omics helps you run bioinformatics workflows at scale. Specify your workflow definition, the tools you want to use, and the data to analyze, and Amazon Omics will provision the underlying infrastructure and run the workflow. Workflow definitions compliant with the WDL 1.1 and Nextflow 22.10.0 DSL2 specifications are supported. Workflows use OCI-compliant containerized tooling stored in private registries in Amazon Elastic Container Registry (ECR). You can analyze data from S3 buckets or Amazon Omics sequence stores. You can control who has access to specific workflows, control the total amount of resources used, and manage the priority of runs through workflow run groups.
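To make the run-group idea concrete, here is a toy model of choosing which pending runs to start when a group limits concurrency and orders runs by priority. It is a sketch under stated assumptions, not the service's scheduler: `PendingRun`, `select_runs`, the "larger number means more urgent" convention, and the tie-breaking rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PendingRun:
    name: str
    priority: int  # assumption: a larger number means a more urgent run


def select_runs(pending: List[PendingRun], max_concurrent: int) -> List[str]:
    """Pick up to max_concurrent run names, highest priority first.

    sorted() is stable, so runs with equal priority keep submission order.
    """
    ranked = sorted(pending, key=lambda run: -run.priority)
    return [run.name for run in ranked[:max_concurrent]]


if __name__ == "__main__":
    queue = [
        PendingRun("qc-batch", 1),
        PendingRun("clinical-sample", 5),
        PendingRun("urgent-rerun", 5),
        PendingRun("cohort-joint-call", 3),
    ]
    print(select_runs(queue, max_concurrent=2))
```

Grouping runs this way keeps a large low-priority batch from starving a small number of time-sensitive analyses.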
Analysis at scale
With Amazon Omics, you can quickly ingest and transform genomics data formats such as (g)VCF, GFF3, and TSV/CSV into Apache Parquet. You can make the genomics data accessible through analytics services such as Amazon Athena. You can transform both variant data (data from an individual sample) and annotation data (known information about positions in the genome). You can control access to analytics stores with AWS Lake Formation, making it easier to perform queries across diverse data sources while implementing fine-grained access controls. For example, you can securely combine the genome data of individuals with their medical history from Amazon HealthLake (which can include prior treatments, medications, or lab reports) to facilitate precision medicine.
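As a rough illustration of what transforming variant data into a query-friendly tabular form involves, the sketch below flattens one VCF data line into one row per alternate allele, the kind of flat record a columnar format such as Parquet stores. The output column names (`contig`, `position`, `ref`, `alt`) and the function name are hypothetical and do not reflect the schema the service produces.

```python
from typing import Dict, List, Union

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

Row = Dict[str, Union[str, int]]


def vcf_line_to_rows(line: str) -> List[Row]:
    """Flatten one VCF data line into one row per alternate allele."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("expected at least 8 tab-separated VCF columns")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    rows: List[Row] = []
    # Multi-allelic sites list several ALT alleles separated by commas.
    for alt in record["ALT"].split(","):
        rows.append({
            "contig": record["CHROM"],
            "position": int(record["POS"]),
            "ref": record["REF"],
            "alt": alt,
        })
    return rows


if __name__ == "__main__":
    example = "chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=10"
    for row in vcf_line_to_rows(example):
        print(row)
```

Once variants are in a flat shape like this, a SQL engine can filter by position or join against annotation tables without parsing VCF text at query time.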
Data collaboration and provenance
Amazon Omics makes it easier for researchers to tag collaborators, set up their permissions, and share data securely with them. This simplifies how you make your omics data findable, accessible, interoperable, and reusable (FAIR). With domain-specific metadata, you can link Amazon Omics data stores with other omics and healthcare data to facilitate multiomic and multimodal analyses.
Security, privacy, and compliance
Amazon Omics is HIPAA eligible. You can apply attribute-based controls to define fine-grained data access and governance. Comprehensive logging and provenance capture are built in so you know what data was accessed, who accessed it, and when.