The 3000 Rice Genomes Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries. The collaborating organizations are the Chinese Academy of Agricultural Sciences, BGI Shenzhen, and the International Rice Research Institute (IRRI). Rice is a staple food for much of the world's population, which makes it a vital crop to study in addressing food security and other global issues. By analyzing these genomes, researchers can potentially identify genes for important agronomic traits such as better nutrition, climate change tolerance, and disease resistance.

AWS has made the 3000 Rice Genome data freely available on Amazon S3 so that anyone can use our on-demand computing resources to perform analysis and create new products without needing to worry about the cost of storing the data or the time required to download it.

For more information about the 3000 Rice Genomes Project, please visit  

The whole-genome sequence data were analyzed on the DNAnexus platform, comparing each of the 3,024 varieties against five different reference genomes. The results, totaling over 100 TB, consist of:

  1. Alignment of paired-end reads from whole-genome resequencing of 3,024 rice accessions to 5 published rice reference genomes (BWA-MEM version 0.7.10)
  2. Discovery of Single Nucleotide Polymorphisms and small indels (GATK version 3.2.2)

A description of the analysis steps is available at: s3://3kricegenome/README-snp_pipeline.txt or

The 3000 Rice Genome on AWS data set provides the reference alignments and variant calls as sorted and indexed BAM files and indexed VCF files, respectively.
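To make the distinction between SNPs and small indels in the VCF files concrete, here is a minimal Python sketch that classifies each alternate allele on a VCF data line by comparing the lengths of the REF and ALT columns. The function names and the example record are made up for illustration and are not taken from the data set; symbolic and missing alleles are simply reported as "other".

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'SNP', 'indel', or 'other'."""
    # Missing ('.'), spanning-deletion ('*'), and symbolic ('<...>') alleles
    # are not classified by this sketch.
    if alt in (".", "*") or alt.startswith("<"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. a multi-nucleotide substitution


def parse_vcf_line(line: str):
    """Yield (chrom, pos, ref, alt, kind) for each ALT allele on a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    for alt in alts.split(","):
        yield chrom, int(pos), ref, alt, classify_variant(ref, alt)


# Hypothetical record, invented for illustration only.
example = "9311_chr01\t1050\t.\tA\tG,AT\t50\tPASS\t."
for record in parse_vcf_line(example):
    print(record)
```

Header lines (those starting with `#`) would need to be skipped before calling `parse_vcf_line` on a real file.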

The data are organized using a simple directory structure based on the reference genome and source sample. For example, given the source sample IRIS_313-15896 analyzed against the 93-11 reference genome, you would find the associated BAM and VCF files in the following locations:




The index files for the BAM and VCF files are co-located with them for fast random access. As an example, here we query for alignments on chromosome 1 from position 1000 to 1100 using samtools:

# Query chromosome 1 from base position 1000 to 1100

samtools view 9311_chr01:1000-1100
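For scripted use, the same kind of region query can be assembled in Python. The sketch below only builds the region string and the `samtools view` argument list; the helper names and the `sample.bam` path are hypothetical placeholders, not actual locations in the data set.

```python
import shlex
from typing import List


def region_string(contig: str, start: int, end: int) -> str:
    """Format a samtools region such as '9311_chr01:1000-1100' (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{contig}:{start}-{end}"


def samtools_view_command(bam_path: str, contig: str, start: int, end: int) -> List[str]:
    """Build the argument list for 'samtools view <bam> <region>'."""
    return ["samtools", "view", bam_path, region_string(contig, start, end)]


if __name__ == "__main__":
    # 'sample.bam' is a placeholder; substitute a real indexed BAM file.
    cmd = samtools_view_command("sample.bam", "9311_chr01", 1000, 1100)
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute the query, assuming samtools is installed and the BAM index sits next to the BAM file.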

Experimental metadata for the study are available via the original publication (doi:10.1186/2047-217X-3-7). Summarized experimental metadata are available in ISA-Tab format at



A manifest of all files in the bucket is also available at:



Source sequence data, as well as more details on the experimental data, are available from the Sequence Read Archives (SRA) at NCBI, EBI, and DDBJ:

The five reference genomes are not part of this Public Data Set, but are available from the following sources:

Reference Genome    File Name
Nipponbare          IRGSP-1.0_genome.fasta.gz
9311                9311.fa.gz




Source: International Rice Research Institute
Format: BAM, VCF
License: This data is available for anyone to use under the terms of the Toronto Statement
Storage Service: Amazon S3
Location: s3://3kricegenome in US Standard (N. Virginia)
Update Frequency: None
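Objects in the bucket are addressed with `s3://` URIs, but some tools expect an HTTPS URL. As a rough sketch, the Python helper below converts an S3 URI to the virtual-hosted-style HTTPS form. The function name is hypothetical, and whether a given object can be read anonymously over HTTPS is an assumption that should be checked against the bucket's current access settings.

```python
from urllib.parse import quote, urlparse


def s3_uri_to_https(uri: str) -> str:
    """Convert 's3://bucket/key' to 'https://bucket.s3.amazonaws.com/key'."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Percent-encode the key but keep '/' so nested prefixes stay intact.
    key = quote(parsed.path.lstrip("/"))
    return f"https://{parsed.netloc}.s3.amazonaws.com/{key}"


# The README path is the one given earlier in this document.
print(s3_uri_to_https("s3://3kricegenome/README-snp_pipeline.txt"))
```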

The Rice SNP-Seek Database

The International Rice Informatics Consortium (IRIC) has integrated the data into its Rice SNP-Seek site, which provides genotype, phenotype, and variety information for rice.

IRIC seeks to centralize information access to rice research data and provide computational tools to facilitate rice improvement via discovery of new gene-trait associations and accelerated breeding.

Educators, researchers and students can also apply for free credits to take advantage of the utility computing platform offered by AWS, along with Public Datasets such as 3000 Rice Genome on AWS. If you have a research project that could take advantage of 3000 Rice Genome data on AWS, you can apply for AWS Cloud Credits for Research.