NIH NCBI Sequence Read Archive (SRA) on AWS

This product is part of the AWS Open Data Sponsorship Program and contains data sets that are publicly available for anyone to access and use. No subscription is required. Unless specifically stated in the applicable data set documentation, data sets available through the AWS Open Data Sponsorship Program are not provided and maintained by AWS.

Description

The Sequence Read Archive (SRA), produced by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) at the National Institutes of Health (NIH), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms. The SRA provides open access to these biological sequence data to support the research community's efforts to enhance reproducibility and make new discoveries by comparing data sets. Buckets in this registry contain public SRA data in the original (user-submitted) format from select high-value and newly released studies, as well as all public-access SRA-formatted ETL+BQS data. Also included is all SRA metadata, which can be used for attribute-based data discovery.

How to cite

NIH NCBI Sequence Read Archive (SRA) on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-sra.

Update frequency
Daily
Support information

Contact: sra@ncbi.nlm.nih.gov

General AWS Data Exchange support

Resources on AWS

Description

.bam, .cram, and .fastq files in a public S3 bucket. This is the first of two S3 buckets for source submissions from sequencing methodologies such as PacBio, Oxford Nanopore Technologies, and 10X Genomics.

Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::sra-pub-src-1
AWS Region
us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://sra-pub-src-1/
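
Each resource in this listing gives both an ARN and an equivalent s3:// listing command. As a minimal sketch of how the two relate, the helper below turns an S3 ARN into the URI used by the CLI. The function names arn_to_s3_uri and list_public_bucket are hypothetical and not part of the AWS CLI; only the aws s3 ls --no-sign-request call comes from this page.

```shell
# Convert an S3 ARN (arn:aws:s3:::bucket[/key]) into an s3:// URI.
# Hypothetical helper, shown for illustration only.
arn_to_s3_uri() {
  local arn="$1"
  local prefix="arn:aws:s3:::"
  case "$arn" in
    "$prefix"*)
      # Strip the ARN prefix and add a trailing slash for listing.
      printf 's3://%s/\n' "${arn#"$prefix"}"
      ;;
    *)
      echo "not an S3 ARN: $arn" >&2
      return 1
      ;;
  esac
}

# List a public bucket anonymously.
# Needs the AWS CLI and network access, so it is only defined here, not run.
list_public_bucket() {
  local uri
  uri=$(arn_to_s3_uri "$1") || return 1
  aws s3 ls --no-sign-request "$uri"
}

# Usage:
#   list_public_bucket arn:aws:s3:::sra-pub-src-1
```

The same helper should apply to the other public S3 ARNs in this listing, including the metadata ARN that carries a key prefix.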
Description

.bam, .cram, and .fastq files in a public S3 bucket. This is the second of two S3 buckets for source submissions from sequencing methodologies such as PacBio, Oxford Nanopore Technologies, and 10X Genomics.

Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::sra-pub-src-2
AWS Region
us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://sra-pub-src-2/
Description

.sra files in a public S3 bucket. This bucket contains all open-access SRA submissions in the SRA Normalized format with full base quality scores. The SRA Toolkit is required to convert these objects into FASTQ or SAM files. This bucket is updated daily; however, there may be a one- to two-day lag between an accession becoming findable via SRA Search and becoming accessible in this S3 bucket.

Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::sra-pub-run-odp
AWS Region
us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://sra-pub-run-odp/
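
The description above says the SRA Toolkit is needed to turn these objects into FASTQ. The following is a rough sketch of that workflow for a single run, under two assumptions that this page does not state: objects are stored under the key layout sra/<accession>/<accession>, and the SRA Toolkit's fasterq-dump is installed and on the PATH. The function names are hypothetical.

```shell
# Return success only for run accessions of the form SRR/ERR/DRR + digits.
is_run_accession() {
  case "$1" in
    # Reject a bare prefix, or any non-digit after the prefix.
    [SED]RR|[SED]RR*[!0-9]*) return 1 ;;
    [SED]RR[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build the object URI for a run.
# ASSUMPTION: key layout is sra/<accession>/<accession>.
sra_object_uri() {
  printf 's3://sra-pub-run-odp/sra/%s/%s\n' "$1" "$1"
}

# Download one run anonymously and convert it to FASTQ.
# Needs the AWS CLI, the SRA Toolkit, and network access; defined but not run.
fetch_and_convert() {
  local acc="$1"
  local outdir="${2:-fastq}"
  is_run_accession "$acc" || { echo "bad accession: $acc" >&2; return 1; }
  aws s3 cp --no-sign-request "$(sra_object_uri "$acc")" "./$acc.sra" || return 1
  fasterq-dump "./$acc.sra" --outdir "$outdir"
}

# Usage:
#   fetch_and_convert SRR000001 ./fastq
```

If the key layout differs, listing the bucket with the command above should show the actual prefix to use.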
Description

.sra files in a controlled-access S3 bucket. This bucket contains controlled-access dbGaP submissions in the SRA Normalized format with full base quality scores. The SRA Toolkit is required to convert these objects into FASTQ or SAM files. This bucket is updated daily; however, there may be a one- to two-day lag between an accession becoming findable via SRA Search and becoming accessible in this S3 bucket.

Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::sra-ca-run-odp
AWS Region
us-east-1
Description

Metadata files for the Sequence Read Archive, ready to load into AWS Glue and query with Amazon Athena.

Resource type
S3 Bucket
Amazon Resource Name (ARN)
arn:aws:s3:::sra-pub-metadata-us-east-1/sra/metadata
AWS Region
us-east-1

AWS CLI Access (No AWS account required)

aws s3 ls --no-sign-request s3://sra-pub-metadata-us-east-1/sra/metadata/
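
Once the metadata has been loaded into AWS Glue, attribute-based discovery with Amazon Athena could look roughly like the sketch below. Everything specific here is an assumption rather than something stated on this page: the database name sra, the table name metadata, the columns acc, organism, and assay_type, and the results location passed by the caller. Unlike the anonymous listing command, running an Athena query needs an AWS account and credentials.

```shell
# Build a SQL query that filters runs by organism.
# ASSUMPTION: a Glue table named "metadata" with columns acc, organism, assay_type.
build_organism_query() {
  local organism limit
  # Double any single quotes so the value is a valid SQL string literal.
  organism=$(printf '%s' "$1" | sed "s/'/''/g")
  limit="${2:-10}"
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
  esac
  printf "SELECT acc, organism, assay_type FROM metadata WHERE organism = '%s' LIMIT %s\n" \
    "$organism" "$limit"
}

# Submit the query to Athena.
# ASSUMPTION: database "sra"; $2 is a results location in a bucket you own.
# Needs credentials and network access; defined but not run.
run_organism_query() {
  local query
  query=$(build_organism_query "$1") || return 1
  aws athena start-query-execution \
    --region us-east-1 \
    --query-string "$query" \
    --query-execution-context Database=sra \
    --result-configuration "OutputLocation=$2"
}

# Usage (hypothetical results bucket):
#   run_organism_query 'Homo sapiens' s3://my-athena-results/sra/
```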
Description

Update notifications for s3://sra-pub-run-odp. Users can subscribe to this SNS topic with AWS Lambda or Amazon Simple Queue Service (SQS).

Resource type
SNS Topic
Amazon Resource Name (ARN)
arn:aws:sns:us-east-1:867126678632:sra-pub-run-odp-objects
AWS Region
us-east-1
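
Subscribing an SQS queue to this topic might be sketched as follows. The queue ARN is a placeholder for a queue in your own account, and the function names are hypothetical. The sketch also assumes the queue's access policy already allows this topic to send messages to it, which is not covered here. Subscribing needs an AWS account and credentials.

```shell
# Topic ARN as given in this listing.
TOPIC_ARN="arn:aws:sns:us-east-1:867126678632:sra-pub-run-odp-objects"

# Extract the region (fourth colon-separated field) from an ARN.
arn_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

# Subscribe an SQS queue to the topic.
# $1 is the ARN of a queue you own (placeholder, not provided by this page).
# Needs credentials and network access; defined but not run.
subscribe_queue() {
  local queue_arn="$1"
  aws sns subscribe \
    --region "$(arn_region "$TOPIC_ARN")" \
    --topic-arn "$TOPIC_ARN" \
    --protocol sqs \
    --notification-endpoint "$queue_arn"
}

# Usage (hypothetical queue ARN):
#   subscribe_queue arn:aws:sqs:us-east-1:123456789012:my-sra-updates
```

Taking the region from the topic ARN keeps the subscribe call in the topic's region, us-east-1, regardless of the CLI's default region.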