The Cannabis Sativa Genome

Whole Genome Shotgun Sequencing of the Cannabis Sativa Cultivar "Chemdawg"


Submitted By: nimbusinformatics
US Snapshot ID (Linux/Unix): snap-f8af5298
US Snapshot ID (Windows): snap-f8af5298
Size: 1 TB
License: Free for academic use. For commercial use, please inquire about a license. Please repost any improvements to the data.
Source: Medicinal Genomics Corporation
Created On: August 22, 2011 10:33 PM GMT
Last Updated: August 22, 2011 10:33 PM GMT

This EBS snapshot contains the whole genome shotgun sequencing of the Cannabis sativa cultivar "Chemdawg." The data is provided by Medicinal Genomics with the help of Nimbus Informatics. Academic use is free of charge, but Amazon EC2 costs are the responsibility of the user. Commercial enterprises should inquire about a commercial license.
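
To work with the data, the snapshot first has to be restored to an EBS volume in the user's own account. The following is a minimal sketch of that step using boto3, not an official procedure. The region name `us-east-1` is an assumption based on the "US" label on the snapshot ID, and the availability zone passed in is a placeholder the user must choose; the helper names are hypothetical.

```python
SNAPSHOT_ID = "snap-f8af5298"  # snapshot ID from the listing above


def build_volume_request(availability_zone, snapshot_id=SNAPSHOT_ID):
    """Return keyword arguments for the EC2 CreateVolume call.

    The volume size is omitted, so EC2 defaults to the snapshot's size.
    """
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
    }


def create_volume(availability_zone, region="us-east-1"):
    """Create an EBS volume from the snapshot and return its volume ID.

    Assumption: the snapshot lives in us-east-1. Requires AWS credentials
    and incurs storage charges on the caller's account.
    """
    import boto3  # imported lazily so build_volume_request works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_volume(**build_volume_request(availability_zone))
    return response["VolumeId"]
```

The volume must be created in the same availability zone as the EC2 instance it will be attached to, for example `create_volume("us-east-1a")` for an instance running in that zone.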

The sequence data is derived from an Illumina HiSeq run using v2.0 chemistry with 2x100 reads. There are 7 lanes in total, which add up to 131 Gb of sequence. Quality statistics for the run are available online. The genome is estimated to be 400 Mb, which implies roughly 327X coverage.
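
The coverage figure follows directly from the numbers above: total sequenced bases divided by estimated genome size. A small sketch of that arithmetic, with the read count derived under the assumption that every read is exactly 100 bases long:

```python
TOTAL_BASES = 131e9    # 131 Gb of sequence across all lanes
GENOME_SIZE = 400e6    # estimated genome size, 400 Mb
READ_LENGTH = 100      # bases per read (2x100 paired-end run)


def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


coverage = fold_coverage(TOTAL_BASES, GENOME_SIZE)
read_count = TOTAL_BASES / READ_LENGTH  # assumes no trimmed or short reads

print(f"Estimated coverage: {coverage:.1f}X")
print(f"Approximate number of reads: {read_count:.2e}")
```

Because the genome size is itself an estimate, the coverage value should be read as approximate.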

There are several ways in which we anticipate people will want to use this data:

  • Reassembly of the data with different assemblers. Only two have been tried so far, SOAPdenovo and CLC bio, and neither has assembled more than 2 lanes of data. It's possible a far better assembly could be made using Contrail or the Celera Assembler, both available on the web.
  • SNP and indel calling. We have performed preliminary calls and are mapping these to BLASTX hits to prioritize functional variants. The Sativa strain is more polymorphic than the Indica strain currently being assembled.
  • Other cloud-based annotation tools.
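
Since the assemblers tried so far have not handled more than 2 lanes, anyone planning a reassembly may want to estimate how much data a given coverage target requires. The sketch below works that out from the figures above, assuming the 7 lanes contribute equally (the listing does not give per-lane yields). The function names and the example targets are hypothetical.

```python
import math

TOTAL_COVERAGE = 131e9 / 400e6  # about 327X over all lanes
LANES = 7


def lanes_for_coverage(target, total=TOTAL_COVERAGE, lanes=LANES):
    """Smallest number of whole lanes that reaches the target coverage.

    Assumes every lane yields the same amount of sequence.
    """
    per_lane = total / lanes  # about 47X per lane
    return min(lanes, math.ceil(target / per_lane))


def keep_fraction(target, total=TOTAL_COVERAGE):
    """Fraction of all reads to keep when randomly downsampling."""
    return min(1.0, target / total)


for target in (50, 100, 200):  # example targets, not recommendations
    print(f"{target}X: {lanes_for_coverage(target)} lane(s), "
          f"or keep {keep_fraction(target):.1%} of all reads")
```

Under the equal-yield assumption, 2 lanes correspond to roughly 94X coverage, so the existing assemblies already draw on substantial depth.
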
If you improve the assembly or variant calls, we ask that you post the results to Amazon as public EBS volumes and send us a note so we can link to your improvements from our website.
