The International Cancer Genome Consortium (ICGC) coordinates projects with the common aim of accelerating research into the causes and control of cancer. The PanCancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in whole genomes from ICGC. More than 2,400 consistently analyzed genomes corresponding to over 1,100 unique ICGC donors are now freely available on Amazon S3 to credentialed researchers, subject to ICGC data sharing policies. The data include reference genome alignments as well as SNV, indel, copy number, and structural variation calls. This is the first installment of ICGC data to be posted, and the dataset is expected to grow with the addition of data from more cancer patients.

Providing one of the world’s largest collections of curated cancer genome data in the cloud to qualified researchers will enhance collaboration and potentially accelerate the development of new treatments for cancer patients. Authorized researchers can now analyze the data on demand without worrying about storage costs or download time.

For more information, please visit https://dcc.icgc.org/icgc-in-the-cloud/aws. If you have any questions, please email dcc-support@icgc.org.

Users can search for files using the ICGC Data Portal and access individual or related sets of alignment and variant files through the ICGC Storage Client. Access to these data is controlled, and users must follow the procedures outlined at https://dcc.icgc.org/icgc-in-the-cloud/guide.

Users can access the experimental metadata as XML files within the Amazon S3 bucket at s3://oicr.icgc.meta/metadata. Access to the metadata is not restricted; users can retrieve the files with native AWS tools and SDKs or with a simple web request. The contents of the metadata files are described in a README file located at http://oicr.icgc.meta.s3.amazonaws.com/metadata/README.
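As a rough illustration of the "simple web request" route, the sketch below converts an s3:// location into the same HTTP URL form used for the README above and fetches it with the Python standard library. The helper names `s3_uri_to_http_url` and `fetch_text` are hypothetical, and the sketch assumes the bucket is still publicly readable at that address.

```python
from urllib.parse import quote, urlparse
from urllib.request import urlopen


def s3_uri_to_http_url(s3_uri: str) -> str:
    """Map s3://bucket/key to http://bucket.s3.amazonaws.com/key.

    This mirrors the URL form of the README link given in the text;
    it is an illustrative helper, not part of any ICGC or AWS tool.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # Percent-encode the object key but keep the path separators.
    key = quote(parsed.path.lstrip("/"))
    return f"http://{parsed.netloc}.s3.amazonaws.com/{key}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a public object and decode it as text (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumption: the metadata README is still served at this location.
    readme_url = s3_uri_to_http_url("s3://oicr.icgc.meta/metadata/README")
    print(readme_url)
    print(fetch_text(readme_url)[:500])
```

This route applies only to the unrestricted metadata; the controlled-access alignment and variant files still require the procedures described in the guide.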

Users can search for files using the ICGC Data Portal and access individual or related sets of alignment files through the project’s command line tool, the PanCancer Launcher, described below. The experimental metadata are also available as XML files within the S3 bucket, as noted above.

Based on the Consonance project, the Docker-based PanCancer Launcher provides pre-installed alignment workflows, allowing users to align their own sequence data on AWS identically to the method used for PanCancer genomes. More information can be found on the ICGC on the Cloud page.

Source

The International Cancer Genome Consortium

Category

Genomics, Life Sciences

Format

BAM and VCF

License

Data use is subject to the access and publication policies of the source. Distribution of the data is subject to ICGC Trusted Partner Approval. More information on terms of use is available at https://icgc.org/daco.

Storage Service

Amazon S3

Location

s3://icgc in US Standard (N. Virginia)

Update Frequency

New data are added as soon as they are available.

Educators, researchers and students can apply for free promotional credits to take advantage of Public Datasets on AWS. If you have a research project that could take advantage of ICGC on AWS, you can apply for AWS Cloud Credits for Research.