    Somatic Mosaicism across Human Tissues (SMaHT)

    Open data | Deployed on AWS
    Overview

    The Somatic Mosaicism across Human Tissues (SMaHT) project is an NIH Common Fund consortium (2023-) that aims to comprehensively characterize somatic variation ("mosaicism") in normal human tissues. While most genetic studies have relied on blood-derived DNA, SMaHT captures the full spectrum of DNA variation across cell types, tissues, and organs from phenotypically normal individuals to better understand the role of somatic mosaicism in human development, aging, and disease progression.

    Researchers in the consortium develop and apply experimental and computational methods, paired with state-of-the-art sequencing technologies, to accurately detect even rare mutations (frequency < 1%) in subpopulations of cells. In addition to generating production data across ~20 tissue types from 150 post-mortem donors, SMaHT also produces datasets from cell line and tissue homogenate samples to benchmark and develop new technologies and computational tools for mosaic variant detection.

    The resulting data include high-coverage whole-genome and transcriptome data generated with both short-read and long-read sequencing technologies from multiple platforms (e.g., Illumina, PacBio, Oxford Nanopore Technologies, Ultima Genomics). SMaHT will also generate comprehensive genome-wide catalogs of somatic variants. We anticipate that this resource will be valuable not only for researchers studying somatic mosaicism, but also for the broader scientific community interested in large-scale whole-genome sequencing (WGS) data from normal human tissues. More about the SMaHT project: program announcement, https://commonfund.nih.gov/smaht, and https://smaht.org/. More about the data portal: https://data.smaht.org/. Types of data generated: https://data.smaht.org/about/consortium/data

    Features and programs

    Open Data Sponsorship Program

    This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

    Pricing

    This is a publicly available dataset. No subscription is required.

    Legal

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    AWS Data Exchange (ADX)

    AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

    Open data resources

    Available with or without an AWS account.

    How to use
    To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
    Description
    SMaHT Open-Access Data - Publicly available data files without restriction, including aligned reads from WGS and RNA-Seq, as well as variants identified from cell line samples that are commercially available without restriction. Somatic (non-inherited) variants from donor tissue samples are also open-access data.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::smaht-open-data-public
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://smaht-open-data-public/
    Description
    SMaHT Controlled Access Data - Controlled-access data files, including aligned reads from WGS and RNA-Seq, as well as germline (inherited) variants from donor tissue samples. Access to these data is managed through dbGaP.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::smaht-open-data-protected
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://smaht-open-data-protected/
    Description
    Amazon SNS topic that publishes notifications when new open-access data is added for this dataset.
    Resource type
    SNS topic
    Amazon Resource Name (ARN)
    arn:aws:sns:us-east-1:874962955096:smaht-open-data-public-object_created
    AWS region
    us-east-1
    Description
    Amazon SNS topic that publishes notifications when new controlled-access data is added for this dataset.
    Resource type
    SNS topic
    Amazon Resource Name (ARN)
    arn:aws:sns:us-east-1:874962955096:smaht-open-data-protected-object_created
    AWS region
    us-east-1
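
    The ARNs listed above map directly onto AWS CLI arguments. The following is a minimal sketch of that mapping, assuming a POSIX-style shell with `cut` available. The helper names `arn_to_s3_uri` and `arn_region` are hypothetical and are not part of the AWS CLI. The script only prints the resulting command, so it runs without network access or AWS credentials.

```shell
#!/usr/bin/env bash
# Sketch: turn the ARNs from this listing into AWS CLI arguments.
# Helper names are illustrative only; they are not AWS CLI commands.

# Convert an S3 bucket ARN (arn:aws:s3:::NAME) into an s3:// URI.
arn_to_s3_uri() {
  case "$1" in
    arn:aws:s3:::?*) printf 's3://%s/\n' "${1#arn:aws:s3:::}" ;;
    *) echo "not an S3 bucket ARN: $1" >&2; return 1 ;;
  esac
}

# Print the region, which is the 4th colon-separated field of an ARN.
# S3 bucket ARNs leave this field empty.
arn_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

PUBLIC_ARN="arn:aws:s3:::smaht-open-data-public"
TOPIC_ARN="arn:aws:sns:us-east-1:874962955096:smaht-open-data-public-object_created"

# Echo the command instead of running it, so the sketch works offline.
echo "aws s3 ls --no-sign-request $(arn_to_s3_uri "$PUBLIC_ARN")"
echo "SNS topic region: $(arn_region "$TOPIC_ARN")"
```

    The first printed line should match the listing command given for the open-access bucket. Because S3 bucket ARNs carry no region field, the region for the buckets still has to be taken from the "AWS region" entry in the listing.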

    Resources

    Support

    Managed By

    SMaHT Data Analysis Center (DAC)

    How to cite

    Somatic Mosaicism across Human Tissues (SMaHT) was accessed on DATE from https://registry.opendata.aws/smaht.

    License
