The Cancer Genome Atlas (TCGA) is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to accelerate our understanding of the molecular basis of cancer. TCGA-funded researchers across the United States have produced a corpus of raw and processed genomic, transcriptomic, and epigenomic data from thousands of cancer patients.

These data are now freely available on AWS via the National Cancer Institute’s Cancer Genomics Cloud Pilot to credentialed researchers, subject to NIH data sharing policies. As the NIH Trusted Partner for this project, Seven Bridges Genomics is responsible for authorizing access to the data.

The Cancer Genome Atlas is one of the world’s largest collections of cancer genome data. Making the data available on a cloud platform greatly lowers the barrier to entry for researchers who want to use these data to build better models of disease and, ultimately, to develop new treatments for cancer. Qualified researchers can use the data on demand without worrying about download time or storage costs.

For more information, please visit http://www.cancergenomicscloud.org/. If you have any questions, please email cgc@sbgenomics.com.

While the data are hosted within Amazon S3, access is currently only possible through the National Cancer Institute’s Cancer Genomics Cloud Pilot. Researchers wishing to access the TCGA controlled data must be registered within that system, and also be listed on an approved TCGA Data Access Request.

For more information on gaining access to these data, visit http://www.cancergenomicscloud.org/controlled-access-data or http://docs.cancergenomicscloud.org/.

The Cancer Genomics Cloud provides visual and programmatic methods of querying, analyzing, and securely collaborating on TCGA data. A semantic triplestore allows you to query more than 120 properties of TCGA data to find the data that are most relevant to your work.
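
To illustrate the idea behind querying a triplestore, here is a minimal sketch in Python. It is not the Cancer Genomics Cloud API: the file identifiers, property names, and helper functions (`match`, `subjects_with`) are hypothetical and chosen only for illustration. A triplestore holds facts as (subject, property, value) triples, and a query selects the subjects whose properties match the values you ask for.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# Hypothetical toy triples; identifiers and property names are invented
# for illustration and are not taken from the real TCGA metadata.
TRIPLES: List[Triple] = [
    ("file_1", "disease_type", "Breast Invasive Carcinoma"),
    ("file_1", "experimental_strategy", "RNA-Seq"),
    ("file_2", "disease_type", "Breast Invasive Carcinoma"),
    ("file_2", "experimental_strategy", "WXS"),
    ("file_3", "disease_type", "Glioblastoma Multiforme"),
    ("file_3", "experimental_strategy", "RNA-Seq"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    prop: Optional[str] = None,
    value: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


def subjects_with(triples: List[Triple], **properties: str) -> List[str]:
    """Return the sorted subjects that carry every requested property value."""
    result = None
    for prop, value in properties.items():
        found = {s for s, _, _ in match(triples, prop=prop, value=value)}
        # Intersect so that a subject must satisfy all requested properties.
        result = found if result is None else result & found
    return sorted(result or [])


if __name__ == "__main__":
    print(
        subjects_with(
            TRIPLES,
            disease_type="Breast Invasive Carcinoma",
            experimental_strategy="RNA-Seq",
        )
    )
```

In this toy data set, only `file_1` carries both requested property values. A real triplestore would typically be queried with a dedicated query language rather than in-memory filtering; the sketch only shows the matching logic.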

Hundreds of Common Workflow Language-compliant tools and workflows are available, enabling you to immediately run the most common cancer genomics analyses. Additionally, a software development kit allows you to easily deploy your own tools in a reproducible and portable manner.

Tutorials (coming soon):

  • Visually querying and accessing TCGA data.
  • Programmatically querying and accessing TCGA data.
  • Building and executing a computational workflow.

Source

The Cancer Genome Atlas, a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute

Category

Genomics, Life Sciences

Format

Typical genomics data formats are used throughout. These vary with the type of analysis performed and range from raw files to delimited summary files and XML metadata files.

License

Data use is subject to the access and publication policies of the source, including the NIH Genomic Data Sharing policy. Distribution of the data is subject to NIH Trusted Partner policies.

Storage Service

Amazon S3

Location

Amazon S3 in the US East region (N. Virginia)

Update Frequency

Weekly

The Cancer Genomics Cloud Pilot, operated by Seven Bridges Genomics, has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.

If you are interested in using the TCGA data or learning more about this project, please fill out the form below.