This EBS snapshot contains whole-genome shotgun sequencing data for the Cannabis sativa cultivar "Chemdawg." The data is provided by Medicinal Genomics with the help of Nimbus Informatics. Academic use is free of charge; Amazon EC2 costs are the responsibility of the user. If you are a commercial enterprise, please contact Medicinalgenomics@gmail.com for a commercial license.
The sequence data is derived from an Illumina HiSeq run using v2.0 chemistry with 2x100 reads. There are 7 lanes in total, which add up to 131 Gb of sequence. Quality statistics for the run can be found here. The genome is estimated to be 400 Mb, which gives an estimated 327X coverage.
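The coverage figure follows from dividing total sequenced bases by the estimated genome size. The short Python sketch below reproduces that arithmetic using the numbers quoted above. The function name `fold_coverage` and the per-lane figure are illustrative only: the per-lane value assumes every lane yielded the same number of bases, which the description does not state.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


TOTAL_BASES = 131e9  # 131 Gb across all lanes (from the description)
GENOME_SIZE = 400e6  # estimated 400 Mb genome (from the description)
LANES = 7            # number of lanes (from the description)

total_cov = fold_coverage(TOTAL_BASES, GENOME_SIZE)

# Assumption (not stated in the description): equal yield per lane.
per_lane_cov = total_cov / LANES

print(f"total coverage: {total_cov:.1f}X")
print(f"per-lane coverage (equal-yield assumption): {per_lane_cov:.1f}X")
```

Because the genome size is itself an estimate, the resulting depth should be read as approximate; a different genome size estimate scales the coverage inversely.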
There are several ways in which we anticipate people will want to use this data:
- Reassembly of the data with different assemblers. Only two have been tried so far, SOAPdenovo and CLC bio, and neither has assembled more than 2 lanes of data. It's possible that a far better assembly could be made using Contrail or the Celera Assembler, both available on the web.
- SNP and indel calling. We have performed preliminary calls and are mapping these to BLASTX hits to prioritize functional variants. This sativa strain is more polymorphic than the indica strain currently being assembled.
- Use with other cloud-based annotation tools.