DNAnexus offers data management, next-generation sequence analysis, and visualization for DNA sequencing centers and researchers. DNAnexus’ services are provided through a single, unified system that scales to meet its clients’ unique academic or commercial needs. This unified system includes on-demand infrastructure for computation and storage, quality control, bioinformatics support, and genome mapping, among many other features.
Because DNA data management and sequence analysis are complex and require large amounts of storage, DNAnexus carefully evaluated its cloud service provider options to find one that could meet its demanding requirements. The company ultimately chose Amazon Web Services (AWS) because AWS gives DNAnexus the flexibility to build its own software stack and the scalability to store 100 terabytes of data, with room to grow into petabytes of storage.
Andreas Sundquist, DNAnexus’ CEO and Co-Founder, says of AWS, “It is great that we don’t have to think about capacity planning or hiring IT people—it’s been a huge help to us.”
Today, DNAnexus’ entire data management and analysis system is built on AWS. The company uses Amazon Simple Storage Service (Amazon S3) for all of its own storage needs and those of its clients. DNAnexus created a command-line tool that lets clients upload their DNA data directly to Amazon S3; bypassing the company’s web server in this way keeps the server from becoming a bottleneck.
While Amazon S3 handles DNAnexus’ storage requirements, Amazon Elastic Compute Cloud (Amazon EC2) runs the company’s interactive services and, most importantly, the DNA analysis itself. For the analysis, the company developed a custom queuing system that runs on Amazon EC2 Spot Instances. Spot Instances let DNAnexus reduce overhead by bidding on unused Amazon EC2 capacity, and the company estimates that using them saves it approximately fifty percent. Because Spot Instances can be interrupted, the custom queuing system is designed to tolerate interruptions in processing and moves individual analysis jobs to new Spot Instances as necessary.
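DNAnexus has not published the design of its queuing system. The following is only a minimal sketch of the general idea, assuming a simple in-memory queue: jobs are dispatched to Spot Instances, and when an instance is reported as interrupted, the jobs it was running go back to the front of the queue so another instance can pick them up. The class name `SpotJobQueue`, the `max_attempts` retry limit, and the instance IDs are hypothetical; how the queue learns of an interruption is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    attempts: int = 0  # how many times this job has been dispatched


class SpotJobQueue:
    """Hypothetical queue that re-enqueues jobs whose Spot Instance was reclaimed."""

    def __init__(self, max_attempts: int = 5):
        self.pending = deque()   # jobs waiting for an instance
        self.running = {}        # job_id -> (Job, instance_id)
        self.completed = []      # job_ids that finished successfully
        self.failed = []         # job_ids that hit the retry limit
        self.max_attempts = max_attempts

    def submit(self, job_id: str) -> None:
        self.pending.append(Job(job_id))

    def dispatch(self, instance_id: str):
        """Assign the next pending job to an available instance, if any."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        job.attempts += 1
        self.running[job.job_id] = (job, instance_id)
        return job.job_id

    def complete(self, job_id: str) -> None:
        job, _ = self.running.pop(job_id)
        self.completed.append(job.job_id)

    def instance_interrupted(self, instance_id: str) -> list:
        """Move jobs from a reclaimed instance back to the front of the queue."""
        lost = [jid for jid, (_, iid) in self.running.items() if iid == instance_id]
        for jid in lost:
            job, _ = self.running.pop(jid)
            if job.attempts >= self.max_attempts:
                self.failed.append(jid)
            else:
                self.pending.appendleft(job)
        return lost
```

In this sketch, an interrupted job loses only its in-progress work on that instance; the job itself stays in the system and is retried on whichever instance is dispatched next.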
DNAnexus did not initially use Spot Instances for its analysis but would now recommend them. Andreas Sundquist explains that the company’s transition to Spot Instances was “a slightly different paradigm, not just flipping a switch. Using Spot Instances looks very different on the surface. But don’t be dissuaded—it’s actually very easy.”
Inspired by Amazon EC2 Spot Instances, DNAnexus is considering a spot pricing strategy of its own that would let clients pay for analysis jobs according to how time-sensitive they are. Under such a strategy, time-critical analyses could be prioritized, while clients with less urgent requirements could pay lower prices.
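DNAnexus has not described how such a scheme would work. As a minimal sketch of the scheduling half only, assuming clients choose one of three hypothetical urgency tiers (`URGENT`, `STANDARD`, `FLEXIBLE`) that would map to different price levels, a priority queue can dispatch more urgent jobs first while keeping first-come, first-served order within a tier. No prices are modeled here.

```python
import heapq
import itertools

# Hypothetical urgency tiers: a lower number means more time-critical.
URGENT, STANDARD, FLEXIBLE = 0, 1, 2


class TieredScheduler:
    """Dispatch jobs by urgency tier, FIFO within the same tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving submission order

    def submit(self, job_id: str, tier: int) -> None:
        heapq.heappush(self._heap, (tier, next(self._counter), job_id))

    def next_job(self):
        """Return the most urgent, earliest-submitted job, or None if empty."""
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id
```

The counter matters: without it, two jobs in the same tier would be ordered by their IDs rather than by when they were submitted.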
DNAnexus uses Amazon EC2 On-Demand Instances for its interactive services, such as its website, customer front-end portal, and DNA visualization tools. On-Demand Instances give DNAnexus the convenience of paying only for the computing capacity it needs, on an hourly basis.
Just as DNAnexus serves as a unified data management and analysis service for its clients, AWS serves as a comprehensive cloud infrastructure for DNAnexus. Andreas Sundquist says, “We don’t have to think about capacity and storage. We know AWS will support our needs today and grow with us in the future.”
To learn more about how AWS can help with your large-scale analysis needs, visit our Big Data details page: http://aws.amazon.com/big-data/.