AWS Case Study: Spiral Genetics
About Spiral Genetics
Spiral Genetics, a Seattle, Washington-based bioinformatics company formed in 2009, makes high-performance software that helps researchers analyze DNA in the cloud. The company provides computational processing and storage of DNA data through Spiral Cloud, a proprietary application that runs on Amazon Web Services (AWS). Spiral Cloud offers pay-as-you-go and subscription pricing so its customers can spend more money on science and less on hardware. Spiral Genetics attracts clients from a number of sectors, including healthcare, pharmaceutical, and agrogenomic companies as well as universities.
“When Spiral Genetics was founded, the cost of sequencing a human genome was $100,000,” says Adina Mangubat, Spiral Genetics’ Chief Executive Officer. The chemical process of sequencing took 30 days, and computational processing in a traditional hardware-based infrastructure took several weeks. Since then, various sequencers have been developed that can analyze a human genome in one day for a few thousand dollars. “We are headed toward a world where everybody has their genome sequenced and it becomes part of their medical record,” Mangubat says. “The amount of genetic information being sequenced is booming, thanks to innovations in the industry that have reduced the cost of sequencing.”
These innovations have created a new challenge for researchers: keeping up with the influx of data from sequencers. After a customer runs a DNA sample through a sequencer, they hand off the data to Spiral Genetics, either by uploading it via the company’s web front end or by putting it on disk and sending it to the company. “Transferring data to the cloud is not really a problem with sequencers such as Illumina’s, which allow for the data to stream to Amazon Simple Storage Service (Amazon S3) as the data is produced. Plug-ins that Spiral Genetics provides for Ion Torrent sequencers work the same way.” Customers then choose the algorithms they would like to use to analyze their data, and let Spiral Cloud do the rest.
When a DNA sequencer reads a human genome, it essentially cuts the DNA into millions of fragments and reads each piece. The result is millions of small data files totaling roughly 200 gigabytes of data.
Bioinformatics firms like Spiral Genetics take that output and perform massive pattern recognition to identify genetic markers for susceptibility to certain conditions or drug interactions. “Analyzing a human genome gives us insights into the genetic markers that a person may carry, indicating a higher susceptibility to conditions like Alzheimer’s or cancer,” Mangubat says. “It’s exciting to be a part of the toolset driving new genetic discoveries and to see how cloud computing can change the world.”
Why Amazon Web Services
Since its inception, Spiral Genetics has run its operations on the AWS Cloud. The team knew they needed two things to be successful: the computational power to process immense data sets, and the ability to scale to meet the increasing demands of analyzing genomics data. They also wanted to avoid making a costly infrastructure investment. “It wouldn’t be possible for us to pursue big-data business without AWS,” Mangubat says. “If we had to build an infrastructure in-house, we wouldn’t come close to the speed and scale needed in the industry.”
Starting off in the cloud allowed Spiral Genetics to avoid making a large hardware investment up front, since an on-premises infrastructure would have presented insurmountable challenges. The cost to build and maintain such an infrastructure was simply too high. AWS presented Spiral Genetics with a financially viable alternative that also provided the scalability and computing speed that the company required in a competitive and rapidly expanding market. “We can process thousands of datasets simultaneously, making this platform well suited to keep up with the rate of data production.”
In addition, with so much genetic data to protect, Spiral Genetics needed a solution that would comply with the strictest requirements of its customer base. “Genetic data is some of the most sensitive data on the planet,” Mangubat says. “When it comes to providing a highly secure and robust infrastructure, AWS blows the competition out of the water.”
AWS spins up instances as necessary, using Amazon Elastic Compute Cloud (Amazon EC2) for the computation. Spiral Genetics stores the results in Amazon S3 and uses Amazon Elastic Block Store (Amazon EBS) for its reference database.
After the analysis concludes, the Spiral Genetics platform allows researchers to review the results and, if they like, download variant data and insert it into their own analysis application. The company leverages encryption technologies to protect sensitive data and uses multiple regions to meet customer needs across the globe.
Spiral Genetics has benefitted from the ability to analyze DNA sequences quickly, keeping pace with demand. “We can serve very large labs that have 15 or 20 sequencers—so we’re analyzing 15-20 sequences a day from one lab,” Mangubat says. “Normally, analyzing one DNA sequence would take about a week. But between Spiral Genetics’ proprietary software and the scalability and computational speed available through AWS, Spiral Genetics has cut the analysis time down to three hours per whole genome sequence.”
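The figures quoted above can be combined into a rough back-of-the-envelope estimate. The sketch below is illustrative only: it takes the roughly 200 gigabytes per genome, the 15-20 sequences a day from one lab, and the one-week versus three-hour analysis times from this case study, and it assumes (this is not stated in the source) that sequences arrive evenly through the day. The function names are hypothetical and do not come from Spiral Genetics' software.

```python
# Rough throughput estimate using figures quoted in the case study.
GENOME_SIZE_GB = 200        # approximate sequencer output per human genome
BASELINE_HOURS = 7 * 24     # "about a week" per genome before Spiral Cloud
CLOUD_HOURS = 3             # reported analysis time per whole genome


def daily_data_tb(genomes_per_day, genome_size_gb=GENOME_SIZE_GB):
    """Data volume per day in terabytes (decimal, 1 TB = 1000 GB)."""
    return genomes_per_day * genome_size_gb / 1000


def speedup(baseline_hours=BASELINE_HOURS, cloud_hours=CLOUD_HOURS):
    """How many times faster the reported analysis time is."""
    return baseline_hours / cloud_hours


def analyses_in_flight(genomes_per_day, hours_per_genome=CLOUD_HOURS):
    """Average number of concurrent analyses (arrival rate x duration).

    Assumes sequences arrive evenly over 24 hours, which is a
    simplification made for this example.
    """
    return genomes_per_day * hours_per_genome / 24


if __name__ == "__main__":
    for n in (15, 20):
        print(f"{n} genomes/day: {daily_data_tb(n)} TB/day, "
              f"{analyses_in_flight(n)} analyses in flight at 3 h each, "
              f"{analyses_in_flight(n, BASELINE_HOURS)} at one week each")
    print(f"speedup: {speedup()}x")
```

Under these assumptions, a single large lab produces about 3 to 4 terabytes of data a day, and cutting analysis time from a week to three hours corresponds to a 56-fold speedup.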
With AWS, Spiral Genetics can provide the speed and scale its researchers require. “The consistency of AWS services has been great for us in terms of innovation,” Mangubat says. “If AWS didn’t offer that kind of consistency, it wouldn’t be possible for us to do some of the things we’re doing.”
Using multiple regions has also been helpful for Spiral Genetics. Not only does the company have customers all over the globe, but some customers require their sensitive data to be stored in a certain country or region to protect their clients’ data. “AWS allows us to do what we need to make sure our customers’ data is protected,” Mangubat says.
Spiral Genetics will continue to innovate as the industry expands. “With AWS, the types of algorithms we can write are very different—AWS allows for advancements in accuracy, speed, and flexibility that wouldn’t be possible otherwise,” Mangubat says.
By running Spiral Cloud on AWS, Spiral Genetics offers its customers the flexibility of paying for only what they use—and freedom from building their own internal computer environment. “Our customers don’t have to make a big capital expenditure,” Mangubat says, “and they don’t have to buy very expensive servers. We’re fast, and they don’t have to worry about scaling, storage or security. With AWS, we have no limitations on scale, and we can pursue very large-scale opportunities.”
To learn more about how AWS can help with your data needs, visit our Big Data details page: http://aws.amazon.com/big-data/.