
    1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4

    Open data | Deployed on AWS

    Overview

    Description

    Overview

    This dataset contains alignment files and variant call files from a reanalysis of the 1000 Genomes Project (1KGP) Phase 3 dataset (3,202 individuals, 602 trios; https://www.internationalgenome.org/) with Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software. The call sets cover small variants (single nucleotide variants (SNVs) and indels), copy number variants (CNVs), short tandem repeats (STRs, i.e., repeat expansions), structural variants (SVs), and other variant types. All DRAGEN analyses were performed in the cloud using the Illumina Connected Analytics bioinformatics platform (https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html) powered by Amazon Web Services (see 'Data solution empowering population genomics', https://www.illumina.com/science/genomics-research/articles/data-solution-empowering-population-genomics-research.html, for more information). The v3.7.6, v4.2.7, and v4.4.7 datasets include trio small variant, de novo structural variant, and de novo copy number variant calls for 602 trio families composed of members of the 1KGP Phase 3 dataset. Trio repeat expansion calling was included in the v3.7.6 dataset only. Joint cohort analysis was also performed on the entire 1KGP sample set for the v3.7.6, v4.0.3, v4.2.7, and v4.4.7 reanalyses using DRAGEN Iterative gVCF Genotyper v3.8.3, v4.2.0, v4.2.7, and v4.4.7, respectively (see 'Genotyping variants at population scale using DRAGEN gVCF Genotyper' and 'Population Genotyping').

    DRAGEN Versions

    v3.7

    User Guide  | Release Notes 

    Improvements and new features in the v3.7.6 individual-sample analyses include CYP2D6 variant calling (see 'Overcoming high homology to detect variation in CYP21A2 with whole-genome sequencing in DRAGEN') and joint detection and use of graph-based hg19 and hg38 reference hash tables (see 'DRAGEN Wins at PrecisionFDA Truth Challenge V2 Showcase Accuracy Gains from Alt-aware Mapping and Graph Reference Genomes' and 'Demystifying the versions of GRCh38/hg38 reference genomes, how they are used in DRAGEN and their impact on accuracy' for details).

    v4.0

    User Guide  | Release Notes 

    The DRAGEN v4.0.3 dataset features improved small variant calling accuracy from newly integrated machine learning functionality combined with an updated graph-based reference for difficult-to-map regions (see 'DRAGEN Sets New Standard for Data Accuracy in PrecisionFDA Benchmark Data. Optimizing Variant Calling Performance with Illumina Machine Learning and DRAGEN Graph'); accuracy and runtime improvements in the SV caller; new targeted callers, including CYP2B6, GBA, SMN, and a Star Allele PGx caller; and an expanded catalog for use with the ExpansionHunter STR caller.

    v4.2

    User Guide  | Release Notes 

    DRAGEN v4.2.7 offers significant accuracy improvements in small variant, CNV, and SV calling, includes new targeted callers (HBA, LPA, RH, CYP21A2, SMN silent carrier variant), and supports Star Allele calling for five additional pharmacogenes (BCHE, ABCG2, NAT2, F5, and UGT2B17). Upgraded machine learning models further improve accuracy. See 'DRAGEN 4.2: Enhanced machine learning, new targeted callers, and more' for further details on these and other enhancements.

    v4.4

    User Guide  | Release Notes 

    DRAGEN v4.4.7 boosts the speed and accuracy of all callers via the official release of an optimized pangenome graph reference (see 'The quest for accuracy gains in the dark regions of the genomes: Presenting the DRAGEN multigenome mapper and pangenome reference updates in version 4.3'). In particular, SV calling accuracy is substantially increased by a multigenome mapper that exploits the pangenome reference. Runtime is further reduced by support for Amazon EC2 F2 instances (see 'Enabling Rapid Genomic and Multiomic Data Analysis with Illumina DRAGEN™ v4.4 on Amazon EC2 F2 Instances').

    Annotation

    Starting with the v4.0.3 reanalysis, annotation using Illumina Connected Annotations (also known as Illumina Annotation Engine or Nirvana) was included as part of the analysis (see the Illumina Connected Annotations documentation for more information). For the v4.0.3, v4.2.7, and v4.4.7 datasets, annotation was performed on the merged small variant VCF generated by the DRAGEN Iterative gVCF Genotyper for the entire 1KGP cohort. For v4.2.7 and v4.4.7, annotation was also performed on the merged CNV, SV, and STR VCFs for the entire cohort.

    Features and programs

    Open Data Sponsorship Program

    This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

    Pricing

    This is a publicly available dataset. No subscription is required.


    Legal

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    AWS Data Exchange (ADX)

    AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

    Open data resources

    Available with or without an AWS account.

    How to use
    To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
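    The listing gives each bucket as an ARN, while the AWS CLI commands take an s3:// URI. A minimal sketch of the conversion, assuming a POSIX shell; arn_to_s3_uri is a hypothetical helper name, not an AWS CLI feature:

```shell
# Sketch: turn an S3 bucket ARN from this listing into an s3:// URI.
# arn_to_s3_uri is a hypothetical helper name, not part of the AWS CLI.
arn_to_s3_uri() {
    arn=$1
    prefix='arn:aws:s3:::'
    case "$arn" in
        # Strip the fixed ARN prefix; what remains is the bucket name.
        "$prefix"*) printf 's3://%s/\n' "${arn#"$prefix"}" ;;
        *) printf 'not an S3 bucket ARN: %s\n' "$arn" >&2; return 1 ;;
    esac
}

# Example (needs the AWS CLI and network access, so it is left commented out):
# aws s3 ls --no-sign-request "$(arn_to_s3_uri 'arn:aws:s3:::1000genomes-dragen-v4-4-7')"
```

    The helper only rewrites the string; it does not check that the bucket exists or is publicly readable.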
    Description
    BAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, metrics and other supporting files from DRAGEN v3.5.7b analyses in a public S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen
    AWS region
    us-west-2
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen/
    Description
    BAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, cyp2d6-tsv, metrics and other supporting files from DRAGEN v3.7.6 analyses in a public S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen-3.7.6
    AWS region
    us-west-2
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen-3.7.6/
    Description
    BAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, cyp2d6-tsv, metrics and other supporting files from DRAGEN v3.7.6 analyses in a public S3 bucket. This bucket, hosted in the us-east-1 region, is a clone of the 1000genomes-dragen-3.7.6 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen-v3.7.6
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen-v3.7.6/
    Description
    CRAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, cyp2b6-tsv, cyp2d6-tsv, gba-tsv, smn-tsv, star-allele-tsv, metrics and other supporting files from DRAGEN v4.0.3 analyses and Nirvana Annotation in a public S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen-v4.0.3
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen-v4.0.3/
    Description
    CRAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, cyp2b6-tsv, cyp2d6-tsv, gba-tsv, smn-tsv, star-allele-tsv, hla-tsv, gvcf, json, metrics and other supporting files from DRAGEN v4.2.7 analyses and Nirvana Annotation in a public S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen-v4-2-7
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen-v4-2-7/
    Description
    CRAM, SNV-vcf, SNV-gvcf, STR-vcf, STR-bam, SV-vcf, ROH-vcf, CNV-vcf, CNV-bw, cyp2b6-tsv, cyp2d6-tsv, gba-tsv, smn-tsv, star-allele-tsv, hla-tsv, gvcf, json, metrics and other supporting files from DRAGEN v4.4.7 analyses and Nirvana Annotation in a public S3 bucket.
    Resource type
    S3 bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::1000genomes-dragen-v4-4-7
    AWS region
    us-east-1
    AWS CLI access (No AWS account required)
    aws s3 ls --no-sign-request s3://1000genomes-dragen-v4-4-7/
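    Because the bucket names follow no single pattern (dots in some, hyphens in others) and the v3.7 data sits in two regions, a small lookup can help avoid typos. The sketch below copies the bucket names and regions from the resource entries above; dragen_bucket and dragen_ls_cmd are hypothetical helper names, and the major.minor version keys are a convention chosen for this example:

```shell
# Sketch: look up the public bucket and region for a DRAGEN reanalysis.
# Bucket names and regions are copied from the resource entries in this
# listing; the helper names and version keys are illustrative assumptions.
dragen_bucket() {
    version=$1
    region=${2:-}   # optional; only the v3.7 data is listed in two regions
    case "$version" in
        3.5) echo "1000genomes-dragen us-west-2" ;;
        3.7)
            if [ "$region" = "us-east-1" ]; then
                echo "1000genomes-dragen-v3.7.6 us-east-1"
            else
                echo "1000genomes-dragen-3.7.6 us-west-2"
            fi ;;
        4.0) echo "1000genomes-dragen-v4.0.3 us-east-1" ;;
        4.2) echo "1000genomes-dragen-v4-2-7 us-east-1" ;;
        4.4) echo "1000genomes-dragen-v4-4-7 us-east-1" ;;
        *) echo "unknown DRAGEN version: $version" >&2; return 1 ;;
    esac
}

# Print (not run) the unauthenticated listing command for a version.
dragen_ls_cmd() {
    entry=$(dragen_bucket "$@") || return 1
    # entry is "<bucket> <region>": split on the single space.
    printf 'aws s3 ls --no-sign-request --region %s s3://%s/\n' \
        "${entry#* }" "${entry% *}"
}
```

    For example, `dragen_ls_cmd 3.7 us-east-1` prints the listing command for the us-east-1 clone. Printing the command rather than running it keeps the sketch usable without the AWS CLI or network access; the output can be reviewed and then pasted into a shell.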

    Resources

    How to cite

    1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4 was accessed on DATE from https://registry.opendata.aws/ilmn-dragen-1kgp.

    License

    TBD
