Ancestry Accelerates Genomics Data Insights Using Amazon EFS

2020

Ancestry® is a global leader in family history and consumer genomics, using family trees, historical records, and DNA to help people on their journeys of personal discovery. Ancestry has 18 million-plus people in its consumer DNA network. AncestryDNA® uses advanced genomic science to help members uncover new details about their family history by giving them more ways to explore their DNA matches, connect to more precise regions, and gain insights into genetic health information and personal traits.

The AncestryDNA team includes dozens of scientists, among them population geneticists, computational biologists, statisticians, epidemiologists, genomic data scientists, and bioinformaticians, who develop algorithms to analyze genetic and other data representing multiple terabytes of storage. The team previously self-managed its scale-out Network Attached Storage (NAS) clusters on premises, but it could not quickly scale storage and compute resources without monitoring, provisioning, and advance planning to anticipate future requirements. “Our data is growing constantly, and one of our challenges was how to scale as our genetic network grew,” says Dr. Eurie Hong, PhD, vice president of genomics at AncestryDNA. “We wanted to expand compute capacity to meet a quadratically increasing dataset size for analysis.”

The AncestryDNA science team also needed more elasticity to support unpredictable workloads. “Our workflows can be very spiky, and it was difficult to allocate budget when we couldn’t predict how much disk and compute we would need for the year,” says Dr. Asher Baltzell, PhD, bioinformatics manager at Ancestry.

“Using Amazon EFS, we don’t have to worry about scaling research workloads—the system can grow automatically to meet our researchers’ needs, no matter what the compute and storage requirements are.”

Dr. Eurie Hong, PhD
Vice President of Genomics, AncestryDNA

Moving Genomics Research Workloads to AWS

The AncestryDNA science team decided to move to Amazon Web Services (AWS). “Our company overall had started moving to AWS, and we were interested in the scalability and flexibility of the cloud,” says Baltzell.

The team uses Amazon Elastic Compute Cloud (Amazon EC2) for on-demand compute and Amazon Elastic File System (Amazon EFS)—a scalable, fully managed, elastic Network File System (NFS)—as a shared data file system. “Scientists usually work with traditional file servers, so we knew we wanted something that was similar to what our researchers had used before,” says Hong. “Using Amazon EFS, we don’t have to worry about implementation or ongoing management, as Amazon EFS provides the scalability and elasticity to address our changing workload.” AncestryDNA also relies on Amazon Simple Storage Service (Amazon S3).
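As a rough illustration of what using Amazon EFS as a shared file system from Amazon EC2 can look like, the sketch below builds the NFS mount command that an instance might run. It is not taken from Ancestry's setup: the file system ID, Region, and mount point are placeholder assumptions, and the helper names `efs_dns_name` and `efs_mount_command` are hypothetical. The mount options are the ones commonly recommended in the Amazon EFS documentation for NFS clients.

```python
from typing import List

# NFS client options commonly recommended for Amazon EFS mounts.
EFS_NFS_OPTIONS = (
    "nfsvers=4.1,rsize=1048576,wsize=1048576,"
    "hard,timeo=600,retrans=2,noresvport"
)


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Return the regional DNS name of an EFS file system."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def efs_mount_command(file_system_id: str, region: str, mount_point: str) -> List[str]:
    """Build (but do not run) the mount command for an EFS file system."""
    target = f"{efs_dns_name(file_system_id, region)}:/"
    return ["sudo", "mount", "-t", "nfs4", "-o", EFS_NFS_OPTIONS, target, mount_point]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(efs_mount_command("fs-12345678", "us-east-1", "/mnt/efs")))
```

Every instance that runs the same command sees the same directory tree, which is what lets researchers keep working with familiar shared folders.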

The AncestryDNA science team was able to complete the migration ahead of schedule, with no impact on project timelines or disruption to the productivity of the data science team.

Easily Scaling to Meet Scientists’ Compute and Storage Needs

Ancestry can now perform research aligned with its Ancestry Human Diversity Project without having to worry about data storage limits. “Using Amazon EFS, we don’t have to worry about scaling research workloads—the system can grow automatically to meet our researchers’ needs, no matter what the compute and storage requirements are,” says Hong.

Additionally, because Amazon EFS is a fully managed cloud file system, AncestryDNA avoided the need to build and manage its own NFS servers. “We do not want to spend our time and money creating and managing our own file system—we want to focus on the research,” Hong says. “We can do that by using Amazon EFS.”

Gaining Elasticity to Support Workload Spikes and Optimize Costs

Ancestry now has the elasticity it needs when it has to manage unpredictable workload increases or decreases. “The elasticity and flexibility we get with Amazon EFS is huge for us,” says Baltzell. Additionally, using Amazon EC2, the team can optimize costs. “Researchers can use more resources at one time, and not pay for idle resources. Rather than running 10 servers all the time, we can run 100 servers only for the time they are needed. That also makes it easier for us to predict and manage costs.”
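The arithmetic behind the "10 servers versus 100 servers" comparison can be sketched as follows. All figures are illustrative assumptions rather than Ancestry's numbers: a hypothetical analysis needing 2,000 server-hours of compute, a 730-hour month, and identical per-hour pricing for every server.

```python
HOURS_PER_MONTH = 730        # assumed length of a billing month
JOB_SERVER_HOURS = 2000      # assumed total compute the analysis needs


def wall_clock_hours(job_server_hours: float, servers: int) -> float:
    """Time to finish the job if it parallelizes evenly across servers."""
    return job_server_hours / servers


def billed_server_hours(servers: int, hours_running: float) -> float:
    """Server-hours paid for when `servers` run for `hours_running` hours."""
    return servers * hours_running


# Fixed fleet: 10 servers kept running all month, busy or idle.
fixed_wait = wall_clock_hours(JOB_SERVER_HOURS, 10)
fixed_billed = billed_server_hours(10, HOURS_PER_MONTH)

# Elastic fleet: 100 servers started for the job, then shut down.
elastic_wait = wall_clock_hours(JOB_SERVER_HOURS, 100)
elastic_billed = billed_server_hours(100, elastic_wait)

if __name__ == "__main__":
    print(f"fixed:   {fixed_wait:.0f} h to finish, {fixed_billed:.0f} server-hours billed")
    print(f"elastic: {elastic_wait:.0f} h to finish, {elastic_billed:.0f} server-hours billed")
```

Under these assumptions the elastic fleet finishes ten times sooner and is billed only for the hours the job actually consumes, while the fixed fleet is billed for the whole month regardless of load.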

Onboarding New Scientists Faster

Amazon EFS provides an environment consistent with Ancestry’s prior on-premises system, giving data scientists shared project and personal folders that they can mount from their Jupyter and RStudio analysis notebooks for easy job management. Onboarding new scientists is also easier because the cloud environment uses the same methods for accessing and storing data that they already know. Because the file system is familiar, scientists can explore adjacent AWS services that could help them accelerate their pace of innovation, instead of spending their time learning a new way to run compute and analysis. The team’s scientists also use Amazon EMR to support research that relies on the Hadoop big data framework.

AncestryDNA scientists can now focus more on innovation. “Using AWS, we can spend more of our time identifying new ways to help customers discover their unique family history,” says Hong. “We will continue to try to find methods that help our customers better understand their families and find out how their genetics can inform them about their future health.”

About Ancestry

Ancestry is a leading provider of family history and consumer genomics. With a collection of over 27 billion records and more than 18 million people in its growing DNA network, Ancestry helps customers discover their family story and gain actionable insights about their health and wellness. For over 30 years, millions of people have chosen Ancestry as the platform for discovering, preserving, and sharing the most important information about themselves and their families.

Benefits of AWS

  • Enables multiple scientists to perform genomics research
  • Automatically scales compute and storage resources up or down
  • Onboards new scientists faster and more easily

AWS Services Used

Amazon Elastic File System

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Amazon Elastic Compute Cloud

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud.

Amazon EMR

Amazon EMR makes it easy to run and scale Apache Spark, Hive, Presto, and other big data frameworks.


Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.