Seven Bridges Genomics Case Study

2014

Seven Bridges Genomics, a Cambridge, Massachusetts-based bioinformatics firm, offers researchers and labs a cloud platform for analyzing genetic data generated through next-generation sequencing (NGS) technologies. Through its IGOR platform, Seven Bridges provides a one-stop solution for managing NGS projects and enables customers to create and run complex data analysis pipelines easily using a drag-and-drop interface. The AWS Cloud provides highly scalable computation and the means to easily consume, share and reproduce results.

“By basing our IGOR platform on AWS, we can empower researchers to do genomic analyses without breaking a sweat. It wouldn’t be possible to do it any other way. We wouldn’t exist as a company without having access to the AWS Cloud.”

Deniz Kural
Founder and CEO, Seven Bridges

The Challenge

Analyzing genomes promises to make personalized healthcare a reality by providing the gene markers for susceptibility to certain medical conditions or drug interactions. Advances in genomics have resulted in faster, less expensive DNA sequencing, and the amount of data being generated has increased swiftly. But once a genome has been sequenced, it must be analyzed. Researchers are challenged to keep up with the demand.

Sequencing a single human genome can produce several gigabytes of data comprising hundreds of millions of DNA fragments. Analyzing that data is akin to pattern recognition on a massive scale and requires detailed knowledge of the methodology, the tools, and the genomes involved. “The work is computationally intensive,” says Seven Bridges CEO and Founder Deniz Kural. “It can easily take over 200 hours of computation (without parallelization) to analyze a single human genome.”

DNA sequencers have become significantly faster in recent years, which has shifted the bottleneck for researchers from generating data to processing it. “The speed at which we’re generating biological data is outpacing Moore’s Law,” Kural says. “A few years ago, researchers could perform all their analysis on desktop computers, but now they have to look at server clusters and parallel computation. Biologists and biochemists are not comfortable with this because it’s not their area of expertise. We needed a way to provide them with an easily accessible, customizable, and powerful infrastructure.”

To overcome these challenges, Seven Bridges Genomics developed IGOR, a cloud-based genetic data analysis and discovery platform. When Seven Bridges began planning IGOR, the company quickly came to the conclusion that AWS was the cloud solution for them. “No other cloud provider came close to the range of offerings AWS provided,” says Sebastian Wernicke, Seven Bridges’ Head of Business Development. “We can provide our customers with exactly the computing power and memory they need at a highly competitive price.”

Why Amazon Web Services

Seven Bridges has used AWS since it started developing the platform. “Our customers need access to irregular but intense computational capacity,” Wernicke says. “For some projects, they may need to scale to 100 servers for a few days, but once analysis is done, they scale back down to virtually none while they evaluate the results.”

Seven Bridges’ IGOR platform provides customers with a graphical user interface to set up data processing pipelines, either by using an existing pipeline as a model or by creating a new one. “It’s important to provide some guidance to users, as data analysis for genomics is usually very complex and requires dozens of tools to be configured and linked together in the right way,” explains Wernicke.

“Once the researcher is ready to begin the analysis, they flip a switch and walk away,” Kural says. “We enable our customers to use AWS in a very intuitive way.”

The company uses Reserved Instances (RIs) to meet its customers’ diverse needs. “If we know someone needs a lot of power, we will reserve an instance with a lot of power,” Kural says. “If not, we will switch a user to something that doesn’t need as much power. And since we’re trying to optimize the instances that are being used for computation time and cost, it minimizes overhead for our customers.”

Seven Bridges is using Amazon Elastic Compute Cloud (Amazon EC2) for computation. “We give our customers access to the computational power they need through AWS, and we make it possible for them to link together open-source and proprietary analysis tools in different ways,” Wernicke says. “That way, they can customize the way they analyze a genetic sequence.”

The company also uses Amazon Simple Storage Service (Amazon S3) to store its data. “Our in-house file system can handle data on a petabyte scale,” Kural says, “so no matter how many genomes are thrown at us, we can handle it.” Wernicke adds, “Many customers are wondering about the cost and security of long-term data storage, which is why Seven Bridges is also enabling Amazon Glacier.”

To further optimize the efficiency of genome analysis, the IGOR platform stores reference data on Amazon Elastic Block Store (Amazon EBS) snapshots so that it can dynamically mount Amazon EBS volumes containing the data each pipeline needs to run. Seven Bridges is also using Amazon Route 53 to dynamically map DNS records for its customer-facing demonstration environment. See the diagram below for more information.
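The snapshot-to-volume step described above can be sketched roughly as follows. This is a minimal illustration, not Seven Bridges' implementation: the function name `mount_reference_data`, the default device name, and all IDs are hypothetical, and the `ec2` argument is assumed to behave like a boto3 EC2 client (for example `boto3.client("ec2")`), whose `create_volume`, `get_waiter("volume_available")`, and `attach_volume` calls are used here.

```python
from typing import Any


def mount_reference_data(ec2: Any, snapshot_id: str, instance_id: str,
                         availability_zone: str,
                         device: str = "/dev/sdf") -> str:
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client. The function name,
    the default device name, and the argument layout are hypothetical.
    Returns the ID of the newly created volume.
    """
    # A volume can only be attached to an instance in the same Availability
    # Zone, so the caller passes the zone of the target instance.
    volume = ec2.create_volume(SnapshotId=snapshot_id,
                               AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]

    # Wait until the volume is available before trying to attach it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume; the operating system on the instance still has to
    # mount the file system afterwards.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device=device)
    return volume_id
```

Passing the client in as an argument keeps the sketch independent of credentials and region configuration, and makes it easy to exercise with a stand-in object.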

“We take full advantage of what AWS offers us,” Kural adds. “We can always offer the right instances for the computation we’re about to do by predicting how the tools will behave based on the parameters and input data. This way, we can optimize resource allocation and ensure that the sometimes very demanding tools don’t crash. Reserving the right Amazon EC2 instances also lets us find the sweet spot for our customers in terms of computation time and cost.”  
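The idea of matching instances to predicted tool behavior can be illustrated with a small sketch. The case study does not describe the actual prediction model, so everything here is an assumption: the linear memory model, the 20 percent headroom, and the catalog of instance sizes and prices are hypothetical and do not correspond to real Amazon EC2 instance types or prices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical label, not a real EC2 instance type
    vcpus: int
    memory_gib: float
    hourly_cost: float   # illustrative figure, not an AWS price


# Hypothetical catalog used only for this illustration.
CATALOG = (
    InstanceType("small", 2, 8.0, 0.10),
    InstanceType("medium", 8, 32.0, 0.40),
    InstanceType("large", 32, 128.0, 1.60),
)


def predict_memory_gib(input_gib: float, memory_per_input_gib: float,
                       base_gib: float = 1.0) -> float:
    """Toy linear model: peak memory grows with the size of the input."""
    return base_gib + memory_per_input_gib * input_gib


def choose_instance(required_memory_gib: float, required_vcpus: int,
                    catalog: Sequence[InstanceType] = CATALOG,
                    headroom: float = 1.2) -> Optional[InstanceType]:
    """Return the cheapest instance that fits the prediction plus headroom.

    The headroom factor guards against underestimates that would make a
    memory-hungry tool crash. Returns None if nothing in the catalog fits.
    """
    needed = required_memory_gib * headroom
    candidates = [t for t in catalog
                  if t.memory_gib >= needed and t.vcpus >= required_vcpus]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)
```

For example, a tool assumed to need 2 GiB of memory per GiB of input, given a 10 GiB input and 4 threads, has a predicted peak of 21 GiB; `choose_instance(predict_memory_gib(10, 2.0), 4)` then selects the hypothetical "medium" entry, the cheapest one with enough memory and cores.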

Example of a Seven Bridges Genomics Data Analysis Pipeline

Seven Bridges Genomics Architecture Diagram


The Benefits

The IGOR platform, powered by AWS, is enabling researchers to analyze massive data sets—even tackling hundreds of whole human genomes at once. “By using the AWS Cloud, researchers around the world are expanding our understanding of the human genome,” Kural says.

“DNA analysis is more accessible and financially viable as a result of the AWS Cloud,” Kural adds. Customers who use existing pipelines as models and run common types of analysis can save at least 40 percent compared with traditional analysis costs. “Thanks to AWS, we can offer our customers pay per use. They’re used to fixed prices, but once they run a couple of pipelines and see the prices we can offer, they love it.”

Seven Bridges customers also see several advantages from using Amazon Glacier for long-term storage, Wernicke adds, including the ability to create redundant backups of even very large data sets. Low-cost archiving also helps researchers who publish their work, since publication schedules can be lengthy and keeping the underlying data in active storage for that long can be expensive. “Amazon Glacier is a very convenient way to keep data around as long as you need it without having to pay for active storage,” Wernicke says.
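The case study does not say how the platform moves data into Amazon Glacier. One common approach, shown here only as an assumption, is an Amazon S3 lifecycle rule that transitions objects under a prefix to the `GLACIER` storage class after a set number of days. The helper name `glacier_lifecycle_rule`, the prefix, and the 90-day figure are hypothetical; the dictionary follows the shape accepted by the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration`.

```python
def glacier_lifecycle_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that archives objects to Glacier.

    Hypothetical helper: objects whose keys start with `prefix` are moved to
    the GLACIER storage class `days` days after they were created.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                # Rule ID derived from the prefix; purely a naming choice.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Usage sketch (requires boto3 and AWS credentials; the bucket name is a
# placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=glacier_lifecycle_rule("results/", 90),
#   )
```

With a rule like this, finished results stay in active storage while they are being evaluated and are archived automatically afterwards.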

“By basing our IGOR platform on AWS, we can empower researchers to do genomic analyses without breaking a sweat,” Kural says. “It wouldn’t be possible to do it any other way. We wouldn’t exist as a company without having access to the AWS Cloud.”


About Seven Bridges Genomics

Seven Bridges Genomics, a Cambridge, Massachusetts-based bioinformatics firm, offers researchers and labs a cloud platform for analyzing genetic data generated through next-generation sequencing (NGS) technologies.


AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.


Amazon Route 53

Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service.


Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.


Amazon Glacier

Amazon S3 Glacier and S3 Glacier Deep Archive are secure, durable, and extremely low-cost Amazon S3 storage classes for data archiving and long-term backup.


Amazon EBS

Amazon Elastic Block Store (Amazon EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (Amazon EC2) for both throughput-intensive and transaction-intensive workloads at any scale.


