Due to the complex nature of Bioproximity’s services, the company requires substantial computing power and associated data storage. Bioproximity’s founder and managing partner, Brian Balgley, knew that building a private computing system, with all of the necessary peripherals, would be cost-prohibitive. In response, Balgley and his associates turned to Amazon Web Services (AWS) to fill the company’s computing and storage needs.
Brian Balgley says, “We did not have the cash to set up an advanced analytical laboratory and a compute cluster. The pay-as-you-go model of AWS suited us perfectly, and still does. We do not have to worry about hardware obsolescence, maintenance, etc. We have much greater control over our IT costs than we would if we had an on-site infrastructure.”
Bioproximity currently uses Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), and Amazon Simple Storage Service (Amazon S3). On Amazon EC2, the company uses Cluster Compute Instances to run on-demand Message Passing Interface (MPI) clusters. Bioproximity’s individual analysis processes and search algorithms can require dozens of computing hours.
Bioproximity has created a Ruby-based front-end application for uploading its data to the MPI clusters within Amazon EC2 and for managing the individual computing jobs. The MPI clusters receive data, such as protein sequence libraries, stored in both Amazon S3 and Amazon EBS. Upon completing their work, the MPI clusters write the resulting data to Amazon EBS, and that data is then backed up to Amazon S3 at regular intervals.
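The data flow above can be sketched in Ruby. This is a minimal illustration, not Bioproximity’s actual code: the module and method names (`JobStore`, `input_key`, `results_needing_backup`) and the key layout are hypothetical, and a real front end would make the corresponding AWS SDK calls (for example, with the aws-sdk-s3 gem) where the comments indicate.

```ruby
# Hypothetical sketch of the job data flow: inputs are staged under an
# S3 key before the MPI cluster starts, and result files written to the
# EBS volume are periodically backed up to S3.
module JobStore
  # S3 key under which a job's input (e.g. a protein sequence library)
  # would be uploaded; the real code would follow with an S3 put request.
  def self.input_key(job_id, filename)
    "jobs/#{job_id}/input/#{File.basename(filename)}"
  end

  # S3 key for backing up one result file produced on the EBS volume.
  def self.backup_key(job_id, path)
    "jobs/#{job_id}/results/#{File.basename(path)}"
  end

  # Given result files (path => modification time) and the time of the
  # last backup pass, select only the files changed since then -- a
  # periodic backup need only copy what is new.
  def self.results_needing_backup(files, last_backup_time)
    files.select { |_path, mtime| mtime > last_backup_time }.keys
  end
end
```

The interval-based selection keeps each backup pass cheap: unchanged result files already live in S3 from the previous pass and are skipped.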
Bioproximity benefits from the flexibility of AWS and is confident in AWS’s stability. Brian Balgley explains, “Having the ability to launch machine images from scratch in an automated fashion helps enforce good operations and development practices. Beyond any financial benefits from surge computing, development and deployment time is saved by having consistent environments that can be easily upgraded or replaced instantly. Large compute clusters in the cloud (or in a local data center) will eventually have node failures. As a tenant in a larger cloud, these outages can be adjusted for quickly, but in a proprietary data center there can be considerable downtime or overhead to resolve the issues.”
The availability of Amazon EC2 Cluster Compute Instances has improved the reliability of Bioproximity’s cluster computing application. Brian Balgley explains that before Bioproximity used Cluster Compute Instances, “we had occasional issues with MPI communications. We don’t know with certainty but suspect network latency caused these. The result was that searches would fail and have to be re-run. Since switching to the [Cluster Compute] instances we have not had a single failure.”
In the future, Bioproximity plans to increase the number of AWS services it uses. The company will use Amazon Virtual Private Cloud (Amazon VPC) to meet the security requirements of certain clients. Amazon VPC provides a secure bridge between a private technology infrastructure and AWS via a virtual private network (VPN) connection. Bioproximity is also testing Amazon Elastic MapReduce for mining the company’s search outputs. Amazon Elastic MapReduce is a cost-effective option for processing large amounts of data with a Hadoop framework running on Amazon EC2 and Amazon S3.
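The programming model that Amazon Elastic MapReduce runs at scale can be shown in miniature. The toy below is a hypothetical example, not Bioproximity’s pipeline: it assumes tab-separated search-output lines (protein accession, then peptide) and tallies how often each protein appears, the kind of mining a map step and a reduce step express; on Elastic MapReduce the same two steps would run as a Hadoop job over data in Amazon S3.

```ruby
# Map: emit a (protein, 1) pair for each identification line.
def map_step(lines)
  lines.map { |line| [line.split("\t").first, 1] }
end

# Reduce: sum the counts for each protein key.
def reduce_step(pairs)
  pairs.each_with_object(Hash.new(0)) { |(protein, n), tally| tally[protein] += n }
end

# Hypothetical search-output lines: accession, tab, peptide identifier.
searches = [
  "P01308\tpeptide_a",
  "P01308\tpeptide_b",
  "P68871\tpeptide_c",
]
reduce_step(map_step(searches))  # => {"P01308"=>2, "P68871"=>1}
```

Because the map step treats each line independently and the reduce step only needs all pairs sharing a key, Hadoop can split the input across many nodes and merge the tallies, which is what makes the framework suit large search outputs.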
Bioproximity believes AWS gives the company a competitive edge within the contract research industry. Brian Balgley says, “We could not turn around data to our clients in a reasonable amount of time and at a reasonable cost without AWS. Partly because of this, we have been able to offer some of the lowest pricing in the industry for these types of services.”
To learn more, visit http://www.bioproximity.com/ .
Added March 17, 2011