
Overview
The Somatic Mosaicism across Human Tissues (SMaHT) project is an NIH Common Fund consortium (2023-present) that aims to comprehensively characterize somatic variation ("mosaicism") in normal human tissues. While most genetic studies have relied on blood-derived DNA, SMaHT captures the full spectrum of DNA variation across cell types, tissues, and organs from phenotypically normal individuals to better understand the role of somatic mosaicism in human development, aging, and disease progression.
Researchers in the consortium develop and apply experimental and computational methods, paired with state-of-the-art sequencing technologies, to accurately detect even rare mutations (frequency < 1%) in subpopulations of cells. In addition to generating production data across ~20 tissue types from 150 post-mortem donors, SMaHT also produces datasets from cell line and tissue homogenate samples, which are used to benchmark and develop new technologies and computational tools for mosaic variant detection.
The resulting data include high-coverage whole-genome and transcriptome data using both short-read and long-read sequencing technologies from multiple platforms (e.g., Illumina, PacBio, Oxford Nanopore Technologies, Ultima Genomics). SMaHT will also generate comprehensive genome-wide catalogs of somatic variants. We anticipate that this resource will be valuable not only for researchers studying somatic mosaicism, but also for the broader scientific community interested in large-scale WGS data from normal human tissues. More about the SMaHT project: program announcement, https://commonfund.nih.gov/smaht , and https://smaht.org/ . More about the data portal: https://data.smaht.org/ and types of data generated: https://data.smaht.org/about/consortium/data
Features and programs
Open Data Sponsorship Program
Pricing
This is a publicly available data set. No subscription is required.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
- How to use
- To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description
- SMaHT Open-Access Data - Data files that are publicly available without restriction, including aligned reads from WGS and RNA-Seq, as well as variants identified from cell line samples that are commercially available without restriction. Somatic (non-inherited) variants from donor tissue samples are also open-access data.
- Resource type
- S3 bucket
- Amazon Resource Name (ARN)
- arn:aws:s3:::smaht-open-data-public
- AWS region
- us-east-1
- AWS CLI access (No AWS account required)
- aws s3 ls --no-sign-request s3://smaht-open-data-public/
- Description
- SMaHT Controlled Access Data - Controlled-access data files, including aligned reads from WGS and RNA-Seq, as well as germline (inherited) variants from donor tissue samples. Access to these data is managed through dbGaP.
- Resource type
- S3 bucket
- Amazon Resource Name (ARN)
- arn:aws:s3:::smaht-open-data-protected
- AWS region
- us-east-1
- AWS CLI access (No AWS account required)
- aws s3 ls --no-sign-request s3://smaht-open-data-protected/
- Description
- Amazon SNS topic that publishes notifications when new open-access data is added for this dataset.
- Resource type
- SNS topic
- Amazon Resource Name (ARN)
- arn:aws:sns:us-east-1:874962955096:smaht-open-data-public-object_created
- AWS region
- us-east-1
- Description
- Amazon SNS topic that publishes notifications when new controlled access data is added for this dataset.
- Resource type
- SNS topic
- Amazon Resource Name (ARN)
- arn:aws:sns:us-east-1:874962955096:smaht-open-data-protected-object_created
- AWS region
- us-east-1
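
The resources listed above can be reached from the AWS CLI. The following is a minimal sketch, not an official SMaHT or AWS script. The helper names (arn_to_s3_uri, list_bucket, fetch_object, subscribe_sqs) are hypothetical, the object key in the usage comments is a placeholder rather than a real path, and the SQS queue ARN must be one you own. The sketch assumes the AWS CLI is installed. Subscribing to an SNS topic, unlike anonymous S3 listing, requires AWS credentials, and the queue's access policy must allow the topic to deliver messages to it.

```shell
#!/bin/sh
# Sketch only: the helper functions below are hypothetical conveniences,
# not part of the AWS CLI or the SMaHT tooling.

# Convert an S3 bucket ARN (arn:aws:s3:::<bucket>) to an s3:// URI.
arn_to_s3_uri() {
  # Strip everything up to and including the ":::" separator.
  printf 's3://%s/\n' "${1##*:::}"
}

# List the top level of a bucket without signing the request
# (needs network access and the AWS CLI).
list_bucket() {
  aws s3 ls --no-sign-request --region us-east-1 "$(arn_to_s3_uri "$1")"
}

# Copy one object into the current directory without signing the request.
# $2 is an object key taken from the listing output.
fetch_object() {
  aws s3 cp --no-sign-request --region us-east-1 "$(arn_to_s3_uri "$1")$2" .
}

# Subscribe an SQS queue you own to one of the SNS topics.
# $1 is the topic ARN, $2 is your queue ARN. Requires AWS credentials.
subscribe_sqs() {
  aws sns subscribe --region us-east-1 \
    --topic-arn "$1" --protocol sqs --notification-endpoint "$2"
}

OPEN_ARN="arn:aws:s3:::smaht-open-data-public"

# Print the URI derived from the ARN (no network access needed).
arn_to_s3_uri "$OPEN_ARN"

# Usage against the live resources (uncomment to run):
# list_bucket "$OPEN_ARN"
# fetch_object "$OPEN_ARN" "path/to/object"   # placeholder key, not a real path
# subscribe_sqs "arn:aws:sns:us-east-1:874962955096:smaht-open-data-public-object_created" \
#   "<your-sqs-queue-arn>"
```

Wrapping the commands in functions keeps the ARN-to-URI step separate from the network calls, so the same helpers work for either bucket by passing a different ARN.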
Support
Managed By
SMaHT Data Analysis Center (DAC)
How to cite
Somatic Mosaicism across Human Tissues (SMaHT) was accessed on DATE from https://registry.opendata.aws/smaht .
License
NIH Genomic Data Sharing Policy - https://gdc.cancer.gov/access-data/data-access-policies