AWS Public Sector Blog

Breaking Down Barriers: How AWS Democratizes Genomic Data for the World

Part 1 of 3: Democratizing Access to Genomic Data and Analytics

Every human deserves access to innovations that could save their life. Yet for decades, groundbreaking genomic research remained locked behind institutional walls, accessible only to well-funded laboratories.

At Amazon Web Services (AWS), we believe global IT infrastructure and advanced analytical capabilities are necessary tools to address these challenges. We’re transforming how researchers discover treatments, how clinicians diagnose conditions, and, ultimately, how millions of people receive care.

This blog is the first in a three-part series that explores how AWS is transforming genomic research through democratized data access, sovereign-by-design outbreak intelligence platforms, and population-scale biobanks that enable global collaboration while maintaining data security and privacy.

The Challenge: A World of Data, Islands of Access

Genomics has become a big data industry, and researchers everywhere face multifaceted challenges:

  • Volume of storage. A single Next-Generation Sequencing (NGS) instrument can produce over 1TB of data per day, with estimates predicting organizations will generate between two and 40 exabytes of genomic data within the next decade.
  • Computational barriers. Analyzing genomic data requires massive computing power previously available only to major research institutions.
  • Geographic and sovereignty barriers. Researchers in low- and middle-income countries often lack the necessary infrastructure, and countries want to maintain control over sensitive health data while still enabling collaborative research.

When only wealthy institutions can analyze genomic data, medical breakthroughs apply only to smaller, wealthier populations, leaving billions underserved by the promise of precision medicine.

Cloud Infrastructure for Global Health Equity

AWS addresses these challenges through open data initiatives, purpose-built genomic services, and strategic credit programs that put powerful technology within reach of organizations supporting underserved populations.

AWS Registry of Open Data: 95+ Genomic Datasets and Growing

The AWS Registry of Open Data democratizes data access. The registry hosts 95+ genomic data sources—including the Sequence Read Archive (40PB+), Cancer Genome Atlas (2.5PB+), and Human Cell Atlas (300TB)—making them readily available in Amazon Simple Storage Service (Amazon S3). Researchers can analyze data where it lives, dramatically reducing time-to-insight and costs.
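To make the "analyze data where it lives" point concrete, here is a rough back-of-the-envelope sketch of how long it would take to copy each of the datasets named above to another location before analysis. The dataset sizes come from the figures in this post; the 10 Gbps sustained link speed is purely an assumption chosen for illustration, and real transfers would be slower once protocol overhead and contention are included.

```python
# Rough estimate of bulk-copy time for the datasets named in the post.
# The link speed below is an illustrative assumption, not an AWS figure.

TB = 10**12    # bytes in a terabyte (decimal units)
PB = 10**15    # bytes in a petabyte (decimal units)
GBPS = 10**9   # bits per second in one gigabit per second

ASSUMED_LINK = 10 * GBPS  # assumption: a fully utilized 10 Gbps connection


def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to copy a dataset over a link at full, sustained speed."""
    seconds = dataset_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


# Lower-bound sizes as quoted in the post (e.g. "40PB+").
datasets = {
    "Sequence Read Archive": 40 * PB,
    "Cancer Genome Atlas": 2.5 * PB,
    "Human Cell Atlas": 300 * TB,
}

for name, size in datasets.items():
    print(f"{name}: about {transfer_days(size, ASSUMED_LINK):.1f} days")
```

Under that assumed link speed, copying the Sequence Read Archive alone would take roughly a year, which is why running compute next to the data in Amazon S3 is the more practical approach.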

AWS Services for Genomic Workloads

Real-World Impact: Opening Doors for Researchers Worldwide

At AWS, we believe our cloud and AI services are powerful tools to address the world’s urgent and complex health challenges.

The true measure of democratization isn’t in technology specifications; it’s in whose lives are changed. In 2021, we launched the AWS Health Equity Initiative (HEI), a three-year $60M commitment to advance global health equity.

Here are two examples of these investments:

Korea University: Advancing Female Autism Research in East Asia

Autism research historically focused on male subjects and Western populations, leaving critical gaps in our understanding of how the condition manifests in women and in East Asian populations. Dr. Joon-Yong An at Korea University set out to change this by analyzing 1.4 petabytes of genomic data to identify sex-specific genetic factors.

Using AWS credits, Dr. An’s team processed whole-genome sequencing data from over 42,000 individuals across Korean and international autism cohorts, facilitating large-scale collaborative studies in East Asia where autism research is limited.

Using scalable AWS services such as Amazon EC2 with GPU acceleration and Amazon S3, they applied a Category-Wide Association Study framework to prioritize noncoding variants associated with autism. The team also trained high-performance deep learning models to predict sex-specific genetic factors.

“By efficiently processing large-scale genomic datasets, our solution accelerates the discovery of female-specific risk factors, facilitating more accurate diagnoses and personalized interventions. Ultimately, our findings will contribute to reducing health disparities and improving health outcomes for autistic individuals,” reported Dr. An.

“Our work opened new avenues for researchers to explore sex-related genetic factors in various neurodevelopmental and psychiatric conditions. By democratizing access to computationally intensive genomic research, AWS empowered us to bridge gender disparities in autism diagnosis and treatment, ensuring that overlooked populations receive the medical attention and resources they deserve.”

Imagenomix: Precision Cancer Diagnostics for All Backgrounds

Approximately 98% of global cancer patients lack access to targeted genetic testing, largely due to the high cost (~$6,000 per patient) and slow turnaround (33+ days) of conventional NGS panels.

Using AWS infrastructure, Imagenomix developed IGX Predict™, an AI-powered platform that analyzes standard pathology slides to predict gene mutations in as little as three minutes, at a fraction of the cost of traditional NGS. By training their models on diverse patient populations, Imagenomix ensures their diagnostic tools work accurately across all racial and ethnic backgrounds.

The social impact of this approach is profound. IGX Predict dramatically lowers barriers to patients’ access to NGS by achieving a 100% tissue success rate compared to a 26% failure rate with traditional methods. This enables clinicians to rapidly identify the right patients for clinical trials and targeted therapies, regardless of where they are in the world.

“Finding the right patients is the biggest bottleneck in drug development, and that bottleneck disproportionately excludes patients from underrepresented populations,” said Travis Wold, CEO of Imagenomix. “Our mission is to make precision cancer diagnostics accessible to everyone — not just the few — by turning a standard pathology slide into genetic insight in minutes.”

With a growing product pipeline spanning lung, breast, and brain cancers, Imagenomix is positioning itself as the new standard in precision access, ensuring that advances in genomic medicine reach every patient, everywhere.

Bridging Data and Discovery: The AWS Open Data-NVIDIA Knowledge Graph Hackathon

Dr. An’s autism research and Imagenomix’s AI-driven precision diagnostics platform showcase how AWS credit programs and scalable infrastructure democratize computational power for genomic discovery. The second pillar of democratization, open access to foundational genomic datasets hosted freely on AWS, creates opportunities for collaborative innovation at unprecedented scale, exemplified by a hackathon that united researchers globally to advance trustworthy AI in biomedicine.

In October 2025, AWS partnered with NVIDIA to host a transatlantic hackathon bringing together 53 researchers from the US and UK. Over three days, seven teams built prototype systems combining knowledge graphs with GraphRAG to make AI outputs more trustworthy in biomedical research, using Amazon Neptune, Open Data on AWS, and NVIDIA’s PyTorch Geometric RAG resources.

Teams created solutions addressing critical challenges, from GeNETwork’s precision oncology knowledge graph integrating cancer genomics and pharmacological data, to BioGraphRAG’s citation-supported biomedical question answering system.
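As a loose illustration of why pairing a knowledge graph with retrieval can make AI outputs easier to trust, the toy sketch below retrieves facts connected to a query entity and keeps a source identifier attached to each one, so any generated answer can cite where its context came from. This is not any hackathon team's implementation and does not use Amazon Neptune or PyTorch Geometric; it is a standard-library-only sketch, and every entity name, relation, and document identifier in it is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge in a toy knowledge graph, with its provenance."""
    subject: str
    relation: str
    obj: str
    source: str  # hypothetical citation identifier


# Placeholder facts only: none of these names refer to real genes,
# drugs, diseases, or publications.
GRAPH = [
    Triple("GENE_A", "mutated_in", "CANCER_X", "doc-001"),
    Triple("DRUG_1", "targets", "GENE_A", "doc-002"),
    Triple("DRUG_2", "targets", "GENE_B", "doc-003"),
]


def retrieve(entity: str, graph: list[Triple], hops: int = 2) -> list[Triple]:
    """Collect triples reachable within `hops` edges of `entity` (breadth-first)."""
    seen = {entity}
    frontier = {entity}
    collected: list[Triple] = []
    for _ in range(hops):
        next_frontier = set()
        for triple in graph:
            touches = triple.subject in frontier or triple.obj in frontier
            if touches and triple not in collected:
                collected.append(triple)
                for node in (triple.subject, triple.obj):
                    if node not in seen:
                        seen.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return collected


def build_context(triples: list[Triple]) -> str:
    """Format retrieved facts with their citations for use as model context."""
    return "\n".join(
        f"{t.subject} {t.relation} {t.obj} [{t.source}]" for t in triples
    )


print(build_context(retrieve("CANCER_X", GRAPH)))
```

Because only facts connected to the query entity are passed along, and each carries its source tag, a downstream language model can be asked to answer strictly from this context and to quote the bracketed identifiers.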

The Path Forward: From Access to Action

Democratizing access to genomic data is just the beginning. Part 2 of this blog series explores how AWS enables global pathogen surveillance and outbreak intelligence through sovereign-by-design platforms, which allow countries to collaborate on infectious disease tracking while maintaining control over their sensitive health data within national borders.

The future of global health depends on ensuring that genomic insights benefit everyone, not just those in wealthy nations or well-funded institutions. Through open data initiatives, purpose-built services, and strategic support for underserved researchers, AWS helps build that future one genome at a time.

About the Authors:

This blog series was developed by AWS Skilling and Social Impact (SSI) in collaboration with AWS Healthcare and Life Sciences specialists. AWS SSI exists to fuel, align, and amplify the good that AWS does in the world, helping organizations transform health outcomes through cloud and AI technology.

Dr. Dawn Heisey-Grove

Dr. Dawn Heisey-Grove is the global health lead for AWS Skilling and Social Impact. She has spent her career finding new ways to innovate with health organizations using the best technology. Dawn brings deep expertise in public health systems, health informatics, and technology-driven solutions to advance health equity and resilience on a global scale.

Dr. Beryl Rabindran

Beryl is the life sciences lead for AWS Open Data. A cell biologist by training, she led clinical research for a medical technology AI startup in cancer imaging before joining AWS. She is passionate about working directly with researchers from around the world to grow the community of open life sciences data users.

Ankit Malhotra

Ankit is the Head of Genomics and Precision Medicine, AWS Healthcare and Life Sciences. He partners with healthcare organizations, biomedical research institutions, pharmaceutical companies, and life sciences enterprises worldwide to integrate genomics into their cloud workloads. A key focus of his work is oncology and enabling trusted research environments (TREs) that allow organizations to collaborate on sensitive genomic data while maintaining security, compliance, and data sovereignty. Ankit helps customers accelerate discovery and innovation by leveraging AWS’s purpose-built genomic services and secure infrastructure. With cross-training in computer science, molecular biology, and genetics, Ankit has over 20 years of experience in the genomics industry, including a decade as an NIH-funded computational genomic scientist.