CHOP Accelerates Pediatric Research Using AWS-Powered Data Resource


Industry Challenge

As medical research generates ever more clinical data, institutions face the challenge of storing and organizing it so that researchers can access, study, and cross-reference it to enable medical breakthroughs. The Children’s Hospital of Philadelphia (CHOP) used Amazon Web Services (AWS) to build the Gabriella Miller Kids First Data Resource Center (KFDRC). KFDRC is a healthcare discovery resource that brings together cross-disciplinary medical research. It makes genomic, clinical, and imaging data more widely available so that researchers can derive insights for children around the world across a wide spectrum of diseases.

HCLS Symposium 2021 - Children's Hospital of Philadelphia (CHOP)

CHOP’s Solution

CHOP seeks to support data sharing within the pediatric research community through KFDRC, a research data resource that focuses on pediatric cancer and structural birth defects. The data resource is open to anyone and lets researchers query, search, discover, build, and visualize synthetic cohorts. “A lot of times, diseases are classified by certain kinds of organs—like brain cancer, lung cancer, and congenital heart defects . . . but there might be connections between these organ systems,” says Allison Heath, director of data technology and innovation at CHOP’s Center for Data-Driven Discovery in Biomedicine. KFDRC brings together researchers, providing access to genomic, clinical, and imaging data that helps them cross-analyze diseases, think of new hypotheses, and make discoveries.

Supported by a variety of AWS services, KFDRC stores, organizes, and releases over 1.5 PB of genomic, clinical, and imaging data. A robust health data service is crucial for CHOP given the volume of data it must process. The current genomic variant database contains over 26 billion occurrences of over 215 million unique genomic variants, drawn from just 5,000 participants. “Using the scalability of the cloud, we’ve been able to solve a lot of those big-data problems,” says Heath.

To better understand how genetics affects treatment options, health outcomes, and follow-ups, KFDRC also integrates clinical data from electronic medical records and research forms. It does this using Amazon HealthLake, a HIPAA-eligible service that stores, transforms, queries, and analyzes health data at scale. With Amazon HealthLake, KFDRC can share clinical data using the Fast Healthcare Interoperability Resources (FHIR) open industry standard. HealthLake's integrated medical natural language processing also helps KFDRC further structure clinical data, which often arrives in many forms and formats.

Benefits of Using AWS

CHOP plans to scale KFDRC with data from hundreds of thousands more participants, and it is confident in its ability to do so, having already demonstrated this capability on AWS. “All of our system is currently built on AWS. . . . We went from zero to managing a few petabytes of genomic data within a year using this setup,” says Heath. 

And because Amazon HealthLake facilitates greater interoperability between healthcare systems, technological devices, and personnel, CHOP has increased the collaborative potential of KFDRC. “This project has highlighted how cloud-based approaches can create collaboration, bring together people across many different rare diseases in pediatrics, and help us understand and find new discoveries,” says Heath.

About the Children’s Hospital of Philadelphia (CHOP)

Founded in 1855, the Children’s Hospital of Philadelphia is the first US hospital devoted exclusively to pediatric care. Its main campus is in Philadelphia, and it operates several other facilities in Pennsylvania and New Jersey.

Benefits of AWS

  • Provides the research community with access to genomic and associated clinical data
  • Indexed 1.5 PB of genomic, clinical, and imaging data within 1 year
  • Increased KFDRC’s collaborative potential
  • Helps researchers visualize synthetic cohorts for data analysis
  • Achieved a scalable infrastructure
  • Stores 26 billion occurrences of 215 million unique genomic variants from 5,000 participants
  • Meets the FHIR industry standard

AWS Services Used

Amazon HealthLake

Amazon HealthLake is a HIPAA-eligible service offering healthcare and life sciences companies a complete view of individual or patient population health data for query and analytics at scale.

Amazon Comprehend

Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.
