The 3000 Rice Genomes Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries. The collaborating organizations are the Chinese Academy of Agricultural Sciences, BGI Shenzhen, and the International Rice Research Institute (IRRI). Rice is the leading food source across the globe and a vital crop to study in addressing food security and other global issues. Through analysis of these genomes, researchers can potentially identify genes for important agronomic traits such as improved nutrition, climate change tolerance, and disease resistance.

AWS has made the 3000 Rice Genome data freely available on Amazon S3 so that anyone can use our on-demand computing resources to perform analysis and create new products without needing to worry about the cost of storing the data or the time required to download it.

For more information about the 3000 Rice Genomes Project, please visit http://iric.irri.org/resources/3000-genomes-project.  

The whole-genome sequence data were analyzed on the DNAnexus platform, comparing each of the 3,024 varieties against five different reference genomes. The results, totaling over 100 TB, consist of:

  1. Alignment of paired-end reads from whole-genome resequencing of 3,024 rice accessions to 5 published rice reference genomes (BWA-MEM version 0.7.10)
  2. Discovery of Single Nucleotide Polymorphisms and small indels (GATK version 3.2.2)

A description of the analysis steps is available at: s3://3kricegenome/README-snp_pipeline.txt or http://s3.amazonaws.com/3kricegenome/README-snp_pipeline.txt

The 3000 Rice Genome data set on AWS makes the reference alignments and variant calls available in sorted and indexed BAM files and indexed VCF files, respectively.

The data are organized using a simple directory structure based on the reference genome and source sample. For example, given the source sample IRIS_313-15896 analyzed against the 93-11 reference genome, you would find the associated BAM and VCF files in the following locations:

s3://3kricegenome/9311/IRIS_313-15896.realigned.bam

s3://3kricegenome/9311/IRIS_313-15896.snp.vcf.gz

Or:

http://s3.amazonaws.com/3kricegenome/9311/IRIS_313-15896.realigned.bam

http://s3.amazonaws.com/3kricegenome/9311/IRIS_313-15896.snp.vcf.gz
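
The naming pattern in these locations can be captured in a small helper. The following Python sketch is illustrative only: the function name sample_locations and its return layout are hypothetical, and only the 9311 directory name is confirmed by the example above. Directory names for the other four reference genomes are not stated here and should be checked against the MANIFEST file.

```python
BUCKET = "3kricegenome"
HTTP_ENDPOINT = "http://s3.amazonaws.com"

# File suffixes taken from the example locations in this document.
SUFFIXES = {"bam": ".realigned.bam", "vcf": ".snp.vcf.gz"}


def sample_locations(reference_dir, sample_id):
    """Return the S3 and HTTP locations of the BAM and VCF for one sample.

    reference_dir is the directory named after the reference genome
    (only "9311" is confirmed by this document; others are assumptions).
    """
    locations = {}
    for kind, suffix in SUFFIXES.items():
        key = f"{reference_dir}/{sample_id}{suffix}"
        locations[kind] = {
            "s3": f"s3://{BUCKET}/{key}",
            "http": f"{HTTP_ENDPOINT}/{BUCKET}/{key}",
        }
    return locations
```

For example, sample_locations("9311", "IRIS_313-15896")["vcf"]["http"] yields the HTTP location of the VCF file for that sample.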

The index files for the BAM and VCF files are co-located with them to allow fast random access. As an example, here we query for alignments on chromosome 1 from position 1000 to 1100 using samtools:

# Query chromosome 1 from base position 1000 to 1100

samtools view http://s3.amazonaws.com/3kricegenome/9311/IRIS_313-15896.realigned.bam 9311_chr01:1000-1100
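
For scripted use, the same query can be assembled and run from Python. This is a minimal sketch, assuming samtools is installed and on the PATH and that the machine has network access to the bucket; the helper names region_string, samtools_view_command, and run_query are hypothetical and not part of samtools.

```python
import subprocess


def region_string(contig, start, end):
    """Format a samtools region such as '9311_chr01:1000-1100' (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{contig}:{start}-{end}"


def samtools_view_command(bam_url, contig, start, end):
    """Build the argument list for `samtools view <bam> <region>`."""
    return ["samtools", "view", bam_url, region_string(contig, start, end)]


def run_query(bam_url, contig, start, end):
    """Run the query and return the alignments as a list of SAM text lines.

    Requires samtools on the PATH and network access to the bucket.
    """
    result = subprocess.run(
        samtools_view_command(bam_url, contig, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling run_query with the BAM URL above, the contig 9311_chr01, and positions 1000 and 1100 issues the same samtools command shown in the example.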

Experimental metadata for the study are available via the original publication (doi:10.1186/2047-217X-3-7). Summarized experimental metadata are available in ISA-TAB format at:

s3://3kricegenome/ERP005654.zip

Or:

http://s3.amazonaws.com/3kricegenome/ERP005654.zip

A manifest of all files in the bucket is also available at:

s3://3kricegenome/MANIFEST

Or:

http://s3.amazonaws.com/3kricegenome/MANIFEST

Source sequence data, as well as more details on the experimental data, are available from the Sequence Read Archives (SRA) at NCBI, EBI, and DDBJ.

The five reference genomes are not part of this Public Data Set, but are available from the following sources:

Reference Genome    File Name                                 URL
Nipponbare          IRGSP-1.0_genome.fasta.gz                 http://rapdb.dna.affrc.go.jp/
9311                9311.fa.gz                                ftp://public.genomics.org.cn/BGI/rice_seq/93-11/
IR64                os.ir64.cshl.draft.1.0.scaffold.fa.gz     http://schatzlab.cshl.edu/data/rice/
Kasalath            kasalath_genome.tar.gz                    http://rice50ks.dna.affrc.go.jp/
DJ123               os.dj123.cshl.draft.1.0.scaffold.fa.gz    http://schatzlab.cshl.edu/data/rice/

If you would like to show us what you can do with 3000 Rice Genome Data on AWS or would like to receive updates on IRRI data on AWS, please complete this form.

Educators, researchers and students can also apply for free credits to take advantage of the utility computing platform offered by AWS, along with Public Datasets such as the IRRI data on AWS. If you have a research project that could take advantage of 3000 Rice Genome data on AWS, you can apply for AWS Cloud Credits for Research.


Source: International Rice Research Institute
Category: Genomic
Format: BAM, VCF
License: This data is available for anyone to use under the terms of the Toronto Statement
Storage Service: Amazon S3
Location: s3://3kricegenome in US Standard (N. Virginia)
Update Frequency: None

The Rice SNP-Seek Database

The International Rice Informatics Consortium (IRIC) has integrated the data into its Rice SNP-Seek site, which provides genotype, phenotype, and variety information for rice.

IRIC seeks to centralize information access to rice research data and provide computational tools to facilitate rice improvement via discovery of new gene-trait associations and accelerated breeding.