
    NIH NCBI Sequence Read Archive (SRA) on AWS

    Open data | Deployed on AWS
    The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms. The SRA provides open access to these biological sequence data to support the research community's efforts to enhance reproducibility and to make new discoveries by comparing datasets. Buckets in this registry contain public SRA data in the original (user-submitted) format from select high-value and newly released studies, as well as all public-access data in the SRA Normalized (ETL+BQS) format, which retains full base quality scores. All SRA metadata, which can be used for attribute-based data discovery, is also included.

    Features and programs

    Open Data Sponsorship Program

    This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

    Pricing

    This is a publicly available dataset. No subscription is required.

    Legal

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    AWS Data Exchange (ADX)

    AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

    Open data resources

    Available with or without an AWS account.

    How to use
    To access these resources, reference the Amazon Resource Name (ARN) of each resource using the AWS Command Line Interface (CLI).
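Each S3 ARN on this page has the form `arn:aws:s3:::<bucket>[/<prefix>]`, while `aws s3` commands take an `s3://` URI. The following is a minimal POSIX shell sketch that derives the URI from an ARN. The function name `arn_to_s3_uri` is hypothetical (it is not part of the AWS CLI), and it assumes an S3 ARN with empty Region and account fields, as in the resources listed here.

```shell
# Hypothetical helper (not part of the AWS CLI): convert an S3 ARN such as
#   arn:aws:s3:::sra-pub-metadata-us-east-1/sra/metadata
# into the s3:// URI form accepted by `aws s3 ls`.
arn_to_s3_uri() {
    arn=$1
    case $arn in
        arn:aws:s3:::*) ;;
        *) echo "not an S3 ARN: $arn" >&2; return 1 ;;
    esac
    path=${arn#arn:aws:s3:::}   # strip the fixed S3 ARN prefix
    path=${path%/}              # drop a trailing slash so exactly one is added below
    printf 's3://%s/\n' "$path"
}
```

For example, `aws s3 ls --no-sign-request "$(arn_to_s3_uri arn:aws:s3:::sra-pub-src-1)"` runs the same listing as the command given for the first source bucket.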
    Description
    .bam, .cram, and .fastq files in a public S3 bucket. This is the first of two S3 buckets for source submissions from sequencing methodologies such as PacBio, Oxford Nanopore Technologies, and 10X Genomics.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-src-1
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://sra-pub-src-1/
    Description
    .bam, .cram, and .fastq files in a public S3 bucket. This is the second of two S3 buckets for source submissions from sequencing methodologies such as PacBio, Oxford Nanopore Technologies, and 10X Genomics.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-src-2
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://sra-pub-src-2/
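The two source buckets hold `.bam`, `.cram`, and `.fastq` files. A minimal sketch for narrowing a recursive listing to those three extensions follows. The function name `filter_source_reads` is hypothetical, and it assumes the usual `aws s3 ls --recursive` line layout of date, time, size, and object key.

```shell
# Hypothetical helper: read `aws s3 ls --recursive` output on stdin and print
# only the object keys ending in .bam, .cram, or .fastq.
# Assumed input line layout: "2021-01-01 12:00:00     1024 some/key.bam"
filter_source_reads() {
    # Match on the last field (the end of the key), then strip the date,
    # time, and size fields so the full key is printed even if it has spaces.
    awk '$NF ~ /\.(bam|cram|fastq)$/ {
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +/, "")
        print
    }'
}
```

Because a recursive listing of a whole bucket can be very long, it is usually worth restricting it to a prefix taken from the top-level listing, for example `aws s3 ls --no-sign-request --recursive "s3://sra-pub-src-1/$PREFIX" | filter_source_reads`, where `PREFIX` is a placeholder you set yourself.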
    Description
    .sra files in a public S3 bucket. This bucket contains all open-access SRA submissions in the SRA Normalized format with full base quality scores. The SRA Toolkit is required to convert these objects into FASTQ or SAM files. This bucket is updated daily; however, there may be a one- to two-day lag between when an accession becomes findable via [SRA Search](https://www.ncbi.nlm.nih.gov/sra/advanced) and when it is accessible in this S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-run-odp
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://sra-pub-run-odp/
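To fetch a single run from this bucket you need its object key. The sketch below assumes, without confirmation from this page, that runs are stored under `sra/<accession>/<accession>`; verify this with `aws s3 ls --no-sign-request s3://sra-pub-run-odp/sra/` before relying on it. The function name `sra_run_uri` is hypothetical.

```shell
# Hypothetical helper: build the s3:// URI of one run in sra-pub-run-odp.
# ASSUMPTION: objects live under sra/<accession>/<accession>; check with
#   aws s3 ls --no-sign-request s3://sra-pub-run-odp/sra/<accession>/
sra_run_uri() {
    acc=$1
    # Accept run accessions: SRR, ERR, or DRR followed only by digits.
    case $acc in
        SRR*|ERR*|DRR*) ;;
        *) echo "unexpected run accession: $acc" >&2; return 1 ;;
    esac
    digits=${acc#???}           # everything after the three-letter prefix
    case $digits in
        ''|*[!0-9]*) echo "unexpected run accession: $acc" >&2; return 1 ;;
    esac
    printf 's3://sra-pub-run-odp/sra/%s/%s\n' "$acc" "$acc"
}
```

With `ACC` set to a run accession of your choice, `aws s3 cp --no-sign-request "$(sra_run_uri "$ACC")" "./$ACC.sra"` downloads the object; it can then be converted with an SRA Toolkit tool such as `fasterq-dump` (FASTQ) or `sam-dump` (SAM). Check the SRA Toolkit documentation for the options your version supports.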
    Description
    .sra files in a controlled-access S3 bucket. This bucket contains controlled-access dbGaP submissions in the SRA Normalized format with full base quality scores. The SRA Toolkit is required to convert these objects into FASTQ or SAM files. This bucket is updated daily; however, there may be a one- to two-day lag between when an accession becomes findable via [SRA Search](https://www.ncbi.nlm.nih.gov/sra/advanced) and when it is accessible in this S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-ca-run-odp
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://sra-ca-run-odp/
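Both run buckets note a one- to two-day lag between an accession appearing in SRA Search and appearing in S3. One way to cope with that lag is a bounded retry loop around the check command; the sketch below is a generic helper, and the name `retry_until` is hypothetical.

```shell
# Hypothetical helper: run a command until it succeeds, trying at most
# MAX_TRIES times and sleeping DELAY seconds between failed attempts.
# Usage: retry_until MAX_TRIES DELAY command [args...]
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        if "$@"; then
            return 0            # command succeeded
        fi
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts: $*" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For example, `retry_until 48 3600 aws s3 ls --no-sign-request "s3://sra-pub-run-odp/sra/$ACC/"` checks hourly for up to two days. This relies on `aws s3 ls` exiting with a non-zero status when nothing matches the prefix, and on the `sra/<accession>/` key layout, both of which are assumptions to confirm with your CLI version and a test listing.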
    Description
    Metadata files for the Sequence Read Archive, ready to load into [AWS Glue](https://aws.amazon.com/glue/) and query with [Amazon Athena](https://aws.amazon.com/athena/).
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::sra-pub-metadata-us-east-1/sra/metadata
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://sra-pub-metadata-us-east-1/sra/metadata/
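Once the metadata files are registered in the AWS Glue Data Catalog (for example by a Glue crawler), queries can be submitted from the CLI with `aws athena start-query-execution`. The sketch below only builds the SQL text. The table name `sra.metadata` and the columns `acc` and `organism` are assumptions that depend on how your table is defined, and the function name `build_metadata_query` is hypothetical.

```shell
# Hypothetical helper: print an Athena SQL query selecting runs for one organism.
# ASSUMPTIONS: the table is registered as sra.metadata with columns acc and
# organism; adjust both to match your Glue Data Catalog.
build_metadata_query() {
    organism=$1
    limit=$2
    case $limit in
        ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
    esac
    # Escape single quotes for a SQL string literal by doubling them.
    escaped=$(printf '%s' "$organism" | sed "s/'/''/g")
    printf "SELECT acc, organism FROM sra.metadata WHERE organism = '%s' LIMIT %s\n" "$escaped" "$limit"
}
```

The query can then be submitted with `aws athena start-query-execution --query-string "$(build_metadata_query 'Homo sapiens' 10)" --result-configuration OutputLocation=s3://RESULTS_BUCKET/`, where `RESULTS_BUCKET` is a placeholder for a bucket you own.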
    Description
    Update notifications for s3://sra-pub-run-odp. Users can subscribe to this Amazon SNS topic with [AWS Lambda](https://aws.amazon.com/lambda/) or [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/sqs/).
    Resource type
    SNS topic
    Amazon Resource Name (ARN)
    arn:aws:sns:us-east-1:867126678632:sra-pub-run-odp-objects
    AWS region
    us-east-1
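A subscription is created against the topic's own Region, which is the fourth colon-separated field of the topic ARN. A minimal sketch that extracts fields from an SNS topic ARN follows; the function name `sns_arn_field` is hypothetical.

```shell
# Hypothetical helper: print one field of an SNS topic ARN of the form
#   arn:<partition>:sns:<region>:<account-id>:<topic-name>
# Usage: sns_arn_field ARN {partition|region|account|topic}
sns_arn_field() {
    arn=$1
    field=$2
    old_ifs=$IFS
    IFS=:
    set -- $arn                 # split the ARN on colons into $1..$6
    IFS=$old_ifs
    if [ "$#" -ne 6 ] || [ "$1" != arn ] || [ "$3" != sns ]; then
        echo "not an SNS topic ARN: $arn" >&2
        return 1
    fi
    case $field in
        partition) printf '%s\n' "$2" ;;
        region)    printf '%s\n' "$4" ;;
        account)   printf '%s\n' "$5" ;;
        topic)     printf '%s\n' "$6" ;;
        *) echo "unknown field: $field" >&2; return 1 ;;
    esac
}
```

With `TOPIC=arn:aws:sns:us-east-1:867126678632:sra-pub-run-odp-objects`, an SQS queue can be subscribed with `aws sns subscribe --region "$(sns_arn_field "$TOPIC" region)" --topic-arn "$TOPIC" --protocol sqs --notification-endpoint QUEUE_ARN`, where `QUEUE_ARN` is a placeholder for your queue. The queue's access policy must also allow the topic to send messages to it.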

    How to cite

    NIH NCBI Sequence Read Archive (SRA) on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-sra .
