AWS Case Study: Harvard Medical School
About Harvard Medical School
The Laboratory for Personalized Medicine (LPM) of the Center for Biomedical Informatics at Harvard Medical School, run by Dr. Peter Tonellato, combined the power of high-throughput sequencing and biomedical data collection technologies with the flexibility of Amazon Web Services (AWS) to develop innovative whole genome analysis testing models in record time. “The combination of our approach to biomedical computing and AWS allowed us to focus our time and energy on simulation development, rather than technology, to get results quickly,” said Tonellato. “Without the benefits of AWS, we certainly would not be as far along as we are.”
Tonellato’s lab focuses on personalized medicine—preventive healthcare for individuals based on their genetic characteristics—by creating models and simulations to assess the clinical value of new genetic tests.
Other projects include simulating large patient populations to aid in clinical trial simulations and predictions. To overcome the difficulty of finding enough real patient data for modeling, LPM creates patient avatars: literally “virtual” patients. The lab can create different sets of avatars for different genetic tests and then replicate huge numbers of them based on the characteristics of hospital populations. Tonellato needed an efficient way to manipulate many avatars, sometimes as many as 100 million at a time. “In addition to being able to handle enormous amounts of data,” he said, “I wanted to devise a system in which postdoctoral researchers can scope a genetic risk situation, determine the appropriate simulation and analysis to create the avatars, and then quickly build web applications to run the simulations, rather than spend their time troubleshooting computing technology.”
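The case study does not describe how LPM's avatars are built, so the following is only a minimal, hypothetical sketch of the general idea of replicating virtual patients from a population profile. It assumes a single genetic variant with a known risk-allele frequency and an age range taken from a hospital population; the names `Avatar`, `make_avatars`, and `carrier_fraction` are illustrative, not part of the lab's software. Genotypes are drawn under Hardy-Weinberg equilibrium, meaning each of a patient's two alleles independently carries the variant with probability `allele_freq`.

```python
import random
from dataclasses import dataclass


@dataclass
class Avatar:
    """A hypothetical virtual patient: an age and a count of risk alleles (0, 1, or 2)."""
    age: int
    risk_alleles: int


def make_avatars(n, allele_freq, age_range, seed=None):
    """Draw n avatars from a hypothetical population profile.

    Each of the two alleles independently carries the risk variant with
    probability allele_freq (Hardy-Weinberg assumption); ages are uniform
    over the inclusive age_range.
    """
    rng = random.Random(seed)
    lo, hi = age_range
    avatars = []
    for _ in range(n):
        # Count how many of the two alleles carry the risk variant.
        alleles = sum(rng.random() < allele_freq for _ in range(2))
        avatars.append(Avatar(age=rng.randint(lo, hi), risk_alleles=alleles))
    return avatars


def carrier_fraction(avatars):
    """Fraction of avatars carrying at least one risk allele."""
    return sum(a.risk_alleles > 0 for a in avatars) / len(avatars)


if __name__ == "__main__":
    cohort = make_avatars(100_000, allele_freq=0.1, age_range=(40, 80), seed=1)
    # Expected carrier fraction is 1 - (1 - 0.1) ** 2 = 0.19.
    print(round(carrier_fraction(cohort), 3))
```

At the scale mentioned above (up to 100 million avatars), a single Python loop like this would be far too slow; such a workload would typically be split into batches and distributed across cluster worker nodes.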
Why Amazon Web Services
In 2006, Tonellato turned to cloud computing to address the lab's complex and highly variable computational needs. “I evaluated several alternatives but found nothing as flexible and robust as Amazon Web Services,” he said. Having built datacenters previously, Tonellato could not afford the time he knew would be required to set up servers and then write code. Instead, he decided to conduct a test to see how fast his team could put together a series of custom Amazon Machine Images (AMIs) that would reflect the optimal development environment for researchers’ web applications.
Now, Tonellato’s lab has extended its efforts by integrating Spot Instances into its workflows to stretch its grant money even further. According to Tonellato, “We leverage Spot Instances when running Amazon Elastic Compute Cloud (Amazon EC2) clusters to analyze entire genomes. We have the potential to run even more worker nodes at less cost when using Spot Instances, so it is a huge saving in both time and cost for us. To take advantage of these savings, it took us just a day of engineering, and we saw roughly 50% savings in cost.” Tonellato’s lab uses MIT’s StarCluster toolkit, which has built-in capabilities to manage an Oracle Grid Engine cluster on Spot Instances. Erik Gafni, a programmer in Tonellato’s lab, integrated StarCluster into the lab’s workflow. According to Gafni, “Using StarCluster, it was incredibly easy to configure, launch, and start using a running Spot cluster in less than 10 minutes.”
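The link between “roughly 50% savings” and “more worker nodes at less cost” is simple arithmetic, sketched below under stated assumptions. The hourly prices are hypothetical placeholders (a real Spot price varies over time and by instance type and region), and prices are kept in integer cents to avoid floating-point rounding. The sketch also ignores Spot interruptions, which a real workflow must tolerate.

```python
def cluster_cost_cents(nodes, hours, hourly_cents):
    """Cost in cents of running `nodes` identical instances for `hours` hours."""
    return nodes * hours * hourly_cents


def nodes_within_budget(budget_cents, hours, hourly_cents):
    """Largest number of nodes that can run for `hours` hours without exceeding the budget."""
    return budget_cents // (hours * hourly_cents)


def savings_fraction(on_demand_cents, spot_cents):
    """Fractional saving of the Spot price relative to the On-Demand price."""
    return 1 - spot_cents / on_demand_cents


if __name__ == "__main__":
    ON_DEMAND = 100  # hypothetical On-Demand price: $1.00 per node-hour
    SPOT = 50        # hypothetical Spot price at a 50% discount
    budget = 100_000  # hypothetical $1,000 budget for a 10-hour genome analysis run
    print(nodes_within_budget(budget, 10, ON_DEMAND))  # 100 nodes On-Demand
    print(nodes_within_budget(budget, 10, SPOT))       # 200 nodes on Spot
    print(savings_fraction(ON_DEMAND, SPOT))           # 0.5
```

Under these assumptions, the same grant budget either buys the same cluster for half the cost or doubles the number of worker nodes for the same run length.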
In addition, the LPM recognized the need for published resources on how to use cloud computing effectively in an academic environment, and it published an educational primer in PLoS Computational Biology to address this need. “We believe this article clearly shows how an academic lab can effectively use AWS to manage their computing needs. It also demonstrates how to think about computational problems in relation to AWS costs and computing resources,” said Vincent Fusaro, lead author and senior research fellow in the LPM.
“The AWS solution is stable, robust, flexible, and low cost,” Tonellato commented. “It has everything to recommend it.”
Tonellato runs his simulations on Amazon EC2, which provides customers with scalable compute capacity in the cloud. Designed to make web-scale computing easier for developers, Amazon EC2 makes it possible to create and provision compute capacity in the cloud within minutes.
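The lab provisions its clusters through StarCluster, but for readers curious about the underlying API, here is a hedged sketch of requesting Spot capacity directly with the AWS SDK for Python (boto3). It is not the lab's method. The AMI ID, instance type, and region in the example are hypothetical placeholders, and `spot_request` and `launch` are illustrative helper names. `launch` needs boto3 installed and AWS credentials configured; `spot_request` only builds the arguments and runs anywhere.

```python
def spot_request(ami_id, instance_type, count):
    """Build keyword arguments for the boto3 EC2 client's run_instances call,
    asking for `count` one-time Spot Instances from a (custom) AMI."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    }


def launch(ami_id, instance_type, count, region="us-east-1"):
    """Launch the Spot Instances and return their instance IDs.

    Requires boto3 and configured AWS credentials; launching instances incurs charges.
    """
    import boto3  # imported here so spot_request works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**spot_request(ami_id, instance_type, count))
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    # Hypothetical placeholder AMI; print the request rather than launching anything.
    print(spot_request("ami-0123456789abcdef0", "c5.xlarge", 4))
```

Building the request separately from the API call keeps the configuration easy to inspect before anything is launched.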
Tonellato’s lab is thrilled with its AWS solution. “The number of genetic tests available to doctors and hospitals is constantly increasing,” Tonellato explained, “and they can be very expensive. We’re interested in determining which tests will result in better patient care and better results.” He added, “We believe our models may dramatically reduce the time it usually takes to identify the tests, protocols, and trials that are worth pursuing aggressively for both FDA approval and clinical use.”
To learn more about how AWS can help with your big data needs, visit our Big Data details page: http://aws.amazon.com/big-data/.