Eagle Genomics has been using AWS since its beginning. According to Operations and Delivery Director and Co-Founder Richard Holland, when Eagle Genomics first launched, AWS was the only affordable and reliable source of on-demand compute power. “As a small company with limited resources, it was not an option to purchase our own machines or rent them full-time in someone else’s data center. Plus, the machine sizes we often needed were simply not available via this route,” says Holland. “Amazon was the market leader at the time and seemed the obvious choice. A simple experiment to see if it really worked rapidly turned into routine everyday use, and we now do almost everything on AWS.”
Although Eagle Genomics continues to investigate AWS competitors as they appear on the market, “we have yet to find anything that offers the same range of functionality and flexibility as AWS,” says Holland.
Eagle Genomics uses Amazon Elastic Block Store (Amazon EBS) to back the Amazon Elastic Compute Cloud (Amazon EC2) instances for its private dedicated mirrors, with databases provided by Amazon Relational Database Service (Amazon RDS) and security group settings acting as the firewall for the system.
For customers who require load balancing and auto scaling, Eagle Genomics uses Amazon Elastic Load Balancing and Amazon Auto Scaling. When end-to-end HTTPS and advanced traffic management are required, this arrangement is replaced with Zeus Traffic Manager via Amazon DevPay. For user logins via single sign-on (SSO), Eagle Genomics integrates OpenAM (formerly known as OpenSSO) into the system.
Eagle Genomics software is written mostly in Perl, although some projects, such as the OpenAM and Zeus integration, involve Java. At this time, Eagle Genomics uses eHive, architecture-agnostic software written by the Ensembl group at the European Bioinformatics Institute (EBI), for customers running on anything from Condor to SGE to LSF, as well as AWS. However, Eagle Genomics plans to build in-house, Amazon-only software based on eHive in order to take advantage of advanced AWS features, including Amazon’s messaging services.
Eagle Genomics finds Amazon’s command-line tools for automatically starting and terminating nodes highly useful, as they allow the pipeline to scale itself within set limits and keep pace with variations in the time each processing step requires. Where data is too big to share via Amazon RDS, it is shared instead through a virtual filesystem backed by Amazon Simple Storage Service (Amazon S3), using third-party Python tools to make the Amazon S3 bucket mimic a real Linux filesystem.
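The self-scaling behavior described above can be illustrated with a minimal Python sketch. It is not the Eagle Genomics implementation: the function names, the jobs-per-node figure, and the node limits are all hypothetical, and the actual launching or terminating of nodes (done in the original with Amazon's command-line tools) is left out. The sketch only shows how a pipeline might pick a node count that follows its backlog while staying within set limits.

```python
import math


def desired_nodes(pending_jobs, jobs_per_node, min_nodes, max_nodes):
    """Return the node count for the current backlog, clamped to set limits.

    All parameters are hypothetical; they are not taken from the source.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Nodes needed to clear the backlog if each node handles jobs_per_node jobs.
    needed = math.ceil(pending_jobs / jobs_per_node)
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Describe the change needed to move from current_nodes to target_nodes."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("launch", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    # Hypothetical pipeline step: 25 queued jobs, 10 jobs per node, 1 to 8 nodes.
    target = desired_nodes(pending_jobs=25, jobs_per_node=10, min_nodes=1, max_nodes=8)
    print(target, scaling_action(current_nodes=2, target_nodes=target))
```

Evaluating these two functions before each pipeline step would let the node count rise for slow, heavy steps and fall again for light ones, without ever exceeding the configured ceiling.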
The major benefit Eagle Genomics finds in AWS for its dedicated mirror service is the flexibility to scale resources up and down instantaneously. Stress testing of the proposed Eagle Genomics hosted mirror service, currently at the proof-of-concept (PoC) stage, has also shown positive results for scaling. In addition, the Eagle Genomics team is working on a Web interface for publishing pipelines for the more common tasks, so that users can configure and run those pipelines themselves.
“AWS is flexible, low-cost, and totally reliable,” says Holland. “The security features are top-notch, better than anything I’ve seen in any of the private corporations and public institutes I’ve worked at in the past.” Adds Holland with a smile, “We love you!”
Update, August 2011: Eagle Genomics has recently been using Spot instances in the development of a novel microRNA discovery pipeline for ARK Genomics at the Roslin Institute, Edinburgh, UK. Dubbed mirHive, the project developed a new, scalable method that uses parallel computing techniques to identify, verify, and report on potential miRNA features within the chicken genome. Because the results from each test cycle were not needed urgently, Spot instances allowed Eagle Genomics to save development costs by running test cycles overnight, when prices were lower. The production system currently runs on a traditional cluster but will migrate to the cloud in the future, almost certainly to Spot instances, given that they have worked so well in development.
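The cost reasoning behind running non-urgent test cycles when Spot prices are low can be sketched in a few lines of Python. This is an illustration only: the prices, the price ceiling, and the function names are hypothetical, and no AWS API is called. It assumes an hourly price list is already available from some other source.

```python
def should_run_cycle(spot_price, max_price, urgent=False):
    """Run now if the work is urgent or the current price is at or below the ceiling."""
    return urgent or spot_price <= max_price


def cheapest_hours(hourly_prices, hours_needed):
    """Return the indices of the cheapest hours, in chronological order.

    hourly_prices is a list of hypothetical prices, one per hour.
    Ties are broken in favor of the earlier hour.
    """
    if hours_needed < 0:
        raise ValueError("hours_needed must not be negative")
    ranked = sorted(range(len(hourly_prices)), key=lambda h: (hourly_prices[h], h))
    return sorted(ranked[:hours_needed])


if __name__ == "__main__":
    # Hypothetical prices for four consecutive hours (not real Spot prices).
    prices = [0.30, 0.10, 0.12, 0.25]
    print(cheapest_hours(prices, hours_needed=2))
    print(should_run_cycle(spot_price=0.08, max_price=0.10))
```

Because the test results were not time-critical, a rule of this kind trades turnaround time for lower cost, which matches the overnight pattern described in the update.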
To learn more, visit http://www.eaglegenomics.com/.
Added May 18, 2011