AWS Public Sector Blog

Alzheimer’s disease research portal enables data sharing and scientific discovery at scale

According to the World Health Organization (WHO), more than 55 million people globally have dementia, with the most common form, Alzheimer’s disease, accounting for approximately 60-70% of cases. The annual financial impact is estimated to be $1.3 trillion USD. Identifying causes of Alzheimer’s disease and developing diagnostics and potential cures for the condition requires multi-modal, multi-omics analysis. This is possible through public and private partnerships that enable access to large genetic, genomic, and neuroimaging datasets and expertise in big data processing, informatics, and algorithm development.

Unifying genomics data for Alzheimer’s disease research using AWS

Li San Wang, PhD, the Peter C. Nowell, M.D. Professor and Vice Chair for Research in the Department of Pathology and Laboratory Medicine at the University of Pennsylvania’s Perelman School of Medicine, is co-director for the Penn Neurodegeneration Genomics Center (PNGC) and directs multiple National Institutes of Health (NIH) funded projects on Alzheimer’s disease genetics.

In 2011, PNGC Director Dr. Gerard Schellenberg met with Dr. Wang and his research team to begin exploring cloud technology for large-scale genomic computations. They began by using Amazon Web Services (AWS) to build a small-scale proof of concept covering 36 exomes, or 1.1 terabytes (TB) of data. Since then, their work has evolved into one of the largest databases of genomic data for Alzheimer’s disease and related conditions: the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS DSS), powered by AWS.

Pictured: Li San Wang, PhD, the Peter C. Nowell, M.D. Professor and Vice Chair for Research in the Department of Pathology and Laboratory Medicine at the University of Pennsylvania’s Perelman School of Medicine.

The NIAGADS genomic database on AWS is a searchable annotation resource that provides access to publicly available datasets for Alzheimer’s disease and related neuropathologies. Created to make knowledge of Alzheimer’s disease genetics more accessible to researchers, NIAGADS has genomics data on 172,701 samples from 98 datasets and is now 1.3 petabytes (PB) in total size. Data types include whole-genome/exome sequencing; genome-wide association studies (GWAS) and imputation; RNASeq; single-neuron whole genome sequencing (WGS); proteomics; and metabolomics. The database’s interface is designed to guide users unfamiliar with genetic data in not only exploring, but also interpreting this ever-growing volume of data.

Researchers can identify and interpret genomic regions of interest compiled from harmonized datasets via interactive search and the NIAGADS genome browser. The data is curated along with variant and gene annotations, as well as their functional significance based on public or Alzheimer’s disease-related experimental data sources.

Enabling data sharing to accelerate Alzheimer’s disease research

NIAGADS is creating a system that promotes scientific discovery through data sharing with a large cadre of institutions. The NIAGADS Data Sharing Service facilitates the deposition and sharing of genomic data from the Alzheimer’s Disease Sequencing Project (ADSP) and other National Institute on Aging (NIA)-funded dementia genomic studies with approved researchers from the broader community. Identifying the genetic variants that increase the risk of Alzheimer’s disease or protect against it requires sequencing and analyzing the genomes of many individuals—something that’s impossible with data from a single institution alone.

To date, more than 90 genome-wide significant locations (loci) associated with Alzheimer’s disease risk have been discovered (Kunkle AD GWAS NG2019, Bellenguez AD GWAS NG2022, Bis Mol Psychiatry and Holstege WES 2022). The data housed in NIAGADS represents some of these major advances and findings in the field. Associations with other related clinical outcomes, such as age at onset and cerebrospinal fluid biomarker levels, have led to the discovery of hundreds of loci and associations with the potential to help researchers better understand the biology of dementia, test new hypotheses, and develop novel therapeutic strategies. The Alzheimer’s Disease Variant Portal (ADVP) at NIAGADS maintains a collection of such genetic findings with links to publications and annotations of genes and variants.

As a result of the vision and hard work of many individuals from academia, industry, and federal government, principal investigators can request available data through a data access request management system by logging in using their eRA Commons ID. Each data access request is reviewed by the NIAGADS data access committee.

AWS infrastructure supporting the NIAGADS DSS

NIAGADS uses AWS for the transfer, processing, storage, and archival of genomics data, as well as monitoring of data access patterns. For the data sharing infrastructure, NIAGADS uses Amazon Simple Storage Service (Amazon S3), Amazon S3 Glacier Deep Archive, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic File System (Amazon EFS), Amazon Elastic Block Store (Amazon EBS), and the AWS Transfer Family. For security and compliance, the team leverages services such as AWS CloudTrail, Amazon GuardDuty, AWS Config, AWS Security Hub, and Amazon CloudWatch.
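As one illustration of how Amazon S3 and Amazon S3 Glacier Deep Archive can work together for archival, the sketch below builds an S3 lifecycle configuration that transitions objects under a prefix to the DEEP_ARCHIVE storage class after a set number of days. This is not a description of NIAGADS's actual policy: the helper name `deep_archive_lifecycle`, the `raw-sequencing/` prefix, the 180-day threshold, and the bucket name in the comment are all hypothetical and chosen only for illustration.

```python
import json


def deep_archive_lifecycle(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration (hypothetical helper) that moves
    objects under `prefix` to the DEEP_ARCHIVE storage class after `days` days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                # Rule ID derived from the prefix; "all" if the prefix is empty.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


# Assumed values for illustration only.
config = deep_archive_lifecycle("raw-sequencing/", 180)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, a dict of this shape
# could be applied to a bucket (bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-genomics-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review or version before it is applied; the appropriate prefix and retention period would depend on how often a given dataset is accessed.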

Diversifying the datasets and more next steps for NIAGADS

With the WHO reporting that over 60% of individuals with a diagnosis of dementia live in low- and middle-income countries, expanding the pool of researchers, including international collaborators, is a key goal for the program. NIAGADS is excited to continue to build on AWS to further expand its global reach, ability to support collaborative analysis of all types of Alzheimer’s disease data, and the data sharing ecosystem.

It will take a village to help identify protective gene variants and pathways for therapy and prevention. Researchers from qualifying institutions are encouraged to visit the NIAGADS website and work with the NIAGADS team on contributing to and analyzing data.

Ashwini Davison, MD

Ashwini Davison is a board-certified physician in internal medicine and clinical informatics. In her role as a healthcare executive advisor at Amazon Web Services (AWS), she helps academic medical centers accelerate their journey to the cloud. Prior to joining AWS, Dr. Davison was a full-time faculty member at Johns Hopkins, where her work focused on evaluating clinical decision support in EHRs, incorporating health systems science curricula into medical education, and growing online education programs in population health management and health informatics.

Chris Griffin

Chris Griffin is a regional manager at Amazon Web Services (AWS), and has been supporting enterprise academic medical centers and R1 higher education customers since 2018. Chris has a background in management consulting and engineering, which enables him to work backwards from the needs of researchers and academic program builders who are leveraging the power of the AWS Cloud to transform the way they share data, accelerate scientific discovery and research, and ultimately improve the lives of populations.

Ken Harris

Ken leads the Amazon Web Services (AWS) vertical focused on academic medicine and state and local government providers. His team of principal trusted advisors in precision medicine, hospital modernization, clinical informatics, and artificial intelligence (AI) and machine learning (ML) consists of field-based, customer-facing healthcare executives. He has over 30 years of healthcare experience, including founding and taking public a cell and gene therapy company. His prior experience includes 10 years as a chief clinical officer, president, and chief executive officer (CEO) before joining AWS. Ken has a strong background in building and commercializing regulated products in the device, combination product, clinical software, and therapeutic biologics space.