Andy Nelson, Associate Director, Informatics & Cloud Operations, Illumina
  • About Illumina

    Illumina develops, manufactures, and markets integrated systems for analyzing genetic variation and biological function. The company’s primary data-analysis platform is BaseSpace Sequence Hub, used by global research organizations to perform data analysis for genome sequencing.

  • Benefits of AWS

    • Reduced monthly costs by nearly $400,000
    • Charges customers less and fuels global business expansion
    • Enables faster time to scientific discovery  

Illumina Massively Scales Its DNA Sequencing Technologies Using AWS

Across the globe, thousands of researchers use Illumina sequencing systems to perform genome sequencing—the process of determining the DNA sequence of an organism’s genome. Illumina also offers BaseSpace Sequence Hub, an integrated software platform designed for genomic data analysis. More than 90,000 users rely on the solution to process, analyze, and manage the genomic data generated on their systems. “With BaseSpace Sequence Hub, our goal is to assist researchers toward quickly and efficiently developing scientific insights from their next-generation sequencing data,” says Andy Nelson, associate director, informatics and cloud operations for Illumina. “The platform enables our customers to identify novel genetic variants, compare and contrast them with other reference genomes, and associate the variants with specific disease patterns and/or treatments.”

BaseSpace Sequence Hub has run on the Amazon Web Services (AWS) Cloud since day one. “Our instruments generate a terabyte of data every day, so we needed the scalability of AWS to support that,” Nelson says. “Also, our platform has peaks of incoming data, and it would be very expensive and slow to manage that using traditional infrastructure.”

As Illumina attracted more BaseSpace Sequence Hub customers, the company found itself spending more on AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). “Sequence Hub grew very rapidly, to the point where we were paying $1 million each month for AWS services to support internal and external customers,” says Nelson. “Whole-genome analysis involved pulling a 100-gigabyte file from Amazon S3, running a 10-hour calculation, and pushing the results back to S3. These were long-running calculations on compute-intensive Amazon EC2 instances, with each calculation costing $30 to run. We were spending more than $400,000 a month just on compute.”

As the cost of genome sequencing continues to drop, Illumina’s customers can now sequence an entire human genome for less than $1,000. With sequencing itself getting cheaper, Illumina knew it also needed to reduce what it spent on compute and storage for data analysis. “We internally sequence around 7,000 genomes every month, and as the cost of sequencing goes down, compute was becoming a bigger part of our overall costs,” says Al Maynard, associate director, software engineering for Illumina. “We needed to drive down those costs and make BaseSpace Sequence Hub the fastest and cheapest solution in the market.”

As part of an overall cost-optimization project, Illumina began to run Sequence Hub using Amazon EC2 Spot Instances, which are unused Amazon EC2 instances available at a significant discount compared with On-Demand prices.

“Amazon EC2 Spot Instances proved to be robust and reliable for us, and we were excited about the savings opportunity,” Nelson says. By using Amazon EC2 Spot Instances, Illumina reduced its monthly compute costs from more than $400,000 to just over $100,000, while gaining more compute power. “This really helps our bottom line, especially as we’re the second-biggest customer of BaseSpace Sequence Hub,” says Nelson. Illumina saved another $90,000 in monthly storage costs by tiering some of its data into Amazon S3 Standard-Infrequent Access (S3 Standard-IA).
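The monthly figures quoted above can be tallied in a few lines. The following is only an illustrative sketch: it rounds "more than $400,000" and "just over $100,000" to those round numbers, and the variable and function names are invented for this example rather than taken from Illumina or AWS.

```python
# Rounded monthly figures quoted in this case study (US dollars).
COMPUTE_BEFORE = 400_000   # "more than $400,000" on compute before Spot Instances
COMPUTE_AFTER = 100_000    # "just over $100,000" after moving to Spot Instances
STORAGE_SAVINGS = 90_000   # saved by tiering data into S3 Standard-IA


def monthly_savings(compute_before, compute_after, storage_savings):
    """Return (compute savings, total savings) in dollars per month."""
    compute_savings = compute_before - compute_after
    return compute_savings, compute_savings + storage_savings


compute_savings, total_savings = monthly_savings(
    COMPUTE_BEFORE, COMPUTE_AFTER, STORAGE_SAVINGS
)
print(f"Compute savings: ${compute_savings:,} per month")
print(f"Total savings:   ${total_savings:,} per month")
```

With these rounded inputs, the compute savings come to about $300,000 a month and the combined total to about $390,000, which is consistent with the "nearly $400,000" reduction in monthly costs cited for Illumina.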

Illumina is also able to pass its cost savings on to its customers. “Using Amazon EC2 Spot Instances, we can drive the cost of compute down for our largest customers to around $2.50 per sample, which is a huge savings for them. By dramatically dropping the prices we charge, we will also further increase usage of BaseSpace Sequence Hub,” says Nelson. “We offer a free 30-day trial for BaseSpace Sequence Hub, and we can now reduce the costs we incur for providing that service.”

By running BaseSpace Sequence Hub on AWS, Illumina has the agility to meet its customers’ need for on-demand research. “Using AWS, we can spin up 2,000 instances for just a few hours if we need to, without having to fill a data center with hardware,” Nelson remarks. “We can also run more workloads in parallel, faster than we could in an on-premises environment, without a massive upfront cost.” As a result, Illumina can speed business growth. “We can move faster, so we can expand to more customers in additional countries,” says Nelson. “For instance, creating a Sequence Hub instance in Australia takes much less time than if we had to build a data center there.”

By giving its customers faster and cheaper genomic analysis, Illumina is helping accelerate its customers’ research efforts. “Using AWS, we are able to offer our customers a lower cost, high-performance genomic analysis platform, which can help them speed their time to answers,” says Nelson. “This will become especially important for our customers in clinical markets. For example, in a children’s hospital, getting a diagnosis as quickly as possible for a child with a disease is key to a successful outcome. Getting answers quickly is very important, and we can enable that by running our platform on AWS.”

