AWS Case Study: Seven Bridges Genomics
About Seven Bridges Genomics
Seven Bridges Genomics, a Cambridge, Massachusetts-based bioinformatics firm, offers researchers and labs a cloud platform for analyzing genetic data generated through next-generation sequencing (NGS) technologies. Through its IGOR platform, Seven Bridges provides a one-stop solution for managing NGS projects and enables customers to create and run complex data analysis pipelines easily using a drag-and-drop interface. The AWS Cloud provides highly scalable computation and the means to easily consume, share and reproduce results.
Analyzing genomes promises to make personalized healthcare a reality by providing the gene markers for susceptibility to certain medical conditions or drug interactions. Advances in genomics have resulted in faster, less expensive DNA sequencing, and the amount of data being generated has increased swiftly. But once a genome has been sequenced, it must be analyzed. Researchers are challenged to keep up with the demand.
The DNA sequencing of a single human genome can produce several gigabytes of data, which includes hundreds of millions of DNA fragments. Analyzing that data is akin to pattern recognition on a massive scale and requires specialized knowledge of the methodology, the tools, and the genomes involved. “The work is computationally intensive,” says Seven Bridges CEO and Founder Deniz Kural. “It can easily take over 200 hours of computation (without parallelization) to analyze a single human genome.”
DNA sequencers have become significantly faster at sequencing data in recent years, which has created a data processing bottleneck for researchers. “The speed at which we’re generating biological data is outpacing Moore’s Law,” Kural says. “A few years ago, researchers could perform all their analysis on desktop computers, but now they have to look at server clusters and parallel computation. Biologists and biochemists are not comfortable with this because it’s not their area of expertise. We needed a way to provide them with an easily accessible, customizable, and powerful infrastructure.”
To overcome these challenges, Seven Bridges Genomics developed IGOR, a cloud-based genetic data analysis and discovery platform. When Seven Bridges began planning IGOR, the company quickly came to the conclusion that AWS was the cloud solution for them. “No other cloud provider came close to the range of offerings AWS provided,” says Sebastian Wernicke, Seven Bridges’ Head of Business Development. “We can provide our customers with exactly the computing power and memory they need at a highly competitive price.”
Why Amazon Web Services
Seven Bridges has used AWS since it started developing the platform. “Our customers need access to irregular but intense computational capacity,” Wernicke says. “For some projects, they may need to scale to 100 servers for a few days, but once analysis is done, they scale back down to virtually none while they evaluate the results.”
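To make this burst pattern concrete, here is a rough back-of-the-envelope sketch in Python. It uses the 200-hour-per-genome figure quoted above and assumes, as an idealization, that each genome is analyzed independently on one server with no scheduling or data-transfer overhead. The function name and the default value are illustrative assumptions, not part of the IGOR platform.

```python
import math


def wall_clock_hours(genomes, servers, hours_per_genome=200.0):
    """Idealized wall-clock time to analyze `genomes` genomes on `servers` servers.

    Assumes one genome per server at a time and no overhead, so the work
    proceeds in ceil(genomes / servers) rounds of `hours_per_genome` each.
    """
    if genomes < 0 or servers <= 0:
        raise ValueError("genomes must be >= 0 and servers must be > 0")
    rounds = math.ceil(genomes / servers)
    return rounds * hours_per_genome


if __name__ == "__main__":
    # 100 genomes on a single server versus a burst of 100 servers.
    print(wall_clock_hours(100, 1))    # sequential
    print(wall_clock_hours(100, 100))  # fully parallel across genomes
```

Under these assumptions, 100 genomes need 20,000 hours on one server but only 200 hours (a little over eight days) on 100 servers, after which the servers can be released.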
Seven Bridges’ IGOR platform provides customers with a graphical user interface to set up data processing pipelines, either by using an existing pipeline as a model or by creating a new one. “It’s important to provide some guidance to users, as data analysis for genomics is usually very complex and requires dozens of tools to be configured and linked together in the right way,” explains Wernicke. An example of this can be seen in the diagram below.
“Once the researcher is ready to begin the analysis, they flip a switch and walk away,” Kural says. “We enable our customers to use AWS in a very intuitive way.”
The company uses Reserved Instances (RI) to meet its customers’ diverse needs. “If we know someone needs a lot of power, we will reserve an instance with a lot of power,” Kural says. “If not, we will switch a user to something that doesn’t need as much power. And since we’re trying to optimize the instances that are being used for computation time and cost, it minimizes overhead for our customers.”
Seven Bridges is using Amazon Elastic Compute Cloud (Amazon EC2) for computation. “We give our customers access to the computational power they need through AWS, and we make it possible for them to link together open-source and proprietary analysis tools in different ways,” Wernicke says. “That way, they can customize the way they analyze a genetic sequence.”
The company also uses Amazon Simple Storage Service (Amazon S3) to store its data. “Our in-house file system can handle data on a petabyte scale,” Kural says, “so no matter how many genomes are thrown at us, we can handle it.” Wernicke adds, “Many customers are wondering about the cost and security of long-term data storage, which is why Seven Bridges is also enabling Amazon Glacier.”
To further optimize the efficiency of genome analysis, the IGOR platform stores reference data in Amazon Elastic Block Store (Amazon EBS) snapshots, so that it can dynamically mount Amazon EBS volumes containing the data customers need to run their pipelines. Seven Bridges also uses Amazon Route 53 to dynamically map DNS records for its customer-facing demonstration environment. See the diagram below for more information.
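The snapshot-to-volume step could look roughly like the following Python sketch. It is not Seven Bridges' implementation. It assumes `ec2` behaves like a boto3 EC2 client (for example, `boto3.client("ec2")`), and the function name, the default device name, and all IDs are hypothetical placeholders.

```python
def mount_reference_data(ec2, snapshot_id, instance_id, availability_zone,
                         device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to expose the boto3 EC2 client methods used below.
    The volume must be created in the same Availability Zone as the instance.
    """
    # Restore the reference data from its snapshot into a fresh volume.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume; the instance still has to mount the file system itself.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

Passing the client in as a parameter keeps the sketch easy to exercise with a stub, and it leaves credentials and region configuration to the caller.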
“We take full advantage of what AWS offers us,” Kural adds. “We can always offer the right instances for the computation we’re about to do by predicting how the tools will behave based on the parameters and input data. This way, we can optimize resource allocation and ensure that the sometimes very demanding tools don’t crash. Reserving the right Amazon EC2 instances also lets us find the sweet spot for our customers in terms of computation time and cost.”
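As an illustration of the idea Kural describes, the sketch below predicts a tool's memory need from its input size and then picks the cheapest instance that satisfies it. Everything here is an assumption made for the example: the linear memory model, the headroom factor, the catalog entries, and their prices are invented and do not reflect IGOR's actual predictor or real Amazon EC2 instance types.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_cost: float  # hypothetical price, for illustration only


def estimate_memory_gib(input_gib, gib_per_input_gib=2.0, base_gib=1.0):
    """Hypothetical linear model: fixed overhead plus memory per GiB of input."""
    return base_gib + gib_per_input_gib * input_gib


def pick_instance(catalog: Sequence[InstanceType], needed_vcpus: int,
                  needed_memory_gib: float,
                  headroom: float = 1.2) -> Optional[InstanceType]:
    """Return the cheapest instance meeting the requirements, or None.

    `headroom` pads the memory estimate so a demanding tool is less likely
    to run out of memory and crash.
    """
    required_memory = needed_memory_gib * headroom
    candidates = [
        inst for inst in catalog
        if inst.vcpus >= needed_vcpus and inst.memory_gib >= required_memory
    ]
    return min(candidates, key=lambda inst: inst.hourly_cost, default=None)


if __name__ == "__main__":
    catalog = [
        InstanceType("small", 2, 4.0, 0.10),
        InstanceType("medium", 4, 16.0, 0.40),
        InstanceType("large", 16, 64.0, 1.60),
    ]
    need = estimate_memory_gib(input_gib=5.0)
    print(pick_instance(catalog, needed_vcpus=2, needed_memory_gib=need))
```

Choosing the cheapest instance that still fits the padded estimate is one simple way to balance computation time against cost. A real scheduler would also have to weigh runtime, since a larger instance may finish sooner.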
The IGOR platform, powered by AWS, is enabling researchers to analyze massive data sets—even tackling hundreds of whole human genomes at once. “By using the AWS Cloud, researchers around the world are expanding our understanding of the human genome,” Kural says.
“DNA analysis is more accessible and financially viable as a result of the AWS Cloud,” Kural adds. Customers who use existing pipelines as models and run common types of analysis can save at least 40 percent compared with traditional analysis costs. “Thanks to AWS, we can offer our customers pay per use. They’re used to fixed prices, but once they run a couple of pipelines and see the prices we can offer, they love it.”
Seven Bridges customers also see several advantages from using Amazon Glacier for long-term storage, Wernicke adds, including the ability to create redundant backups of even very large data sets. It is also a benefit to publishing research, since publication schedules can be lengthy and storage can be expensive. “Amazon Glacier is a very convenient way to keep data around as long as you need it without having to pay for active storage,” Wernicke says.
“By basing our IGOR platform on AWS, we can empower researchers to do genomic analyses without breaking a sweat,” Kural says. “It wouldn’t be possible to do it any other way. We wouldn’t exist as a company without having access to the AWS Cloud.”
To learn more about how AWS can help with your data needs, visit our Big Data details page: http://aws.amazon.com/big-data/.