Crowdsourcing a cure for COVID-19: How the cloud and Folding@home are accelerating research and drug discovery
Today more than 200,000 volunteers around the world are helping accelerate research toward COVID-19 therapies—by walking away from their computers. That’s because of a concept called distributed computing, which allows anyone with a home computer, laptop, or virtual machine to contribute computing power to a common cause. This month, nonprofit Folding@home has started sharing one of the world’s largest public protein simulation databases as an AWS Open Data Set so that researchers around the world can easily access this data to speed up the search for therapies for COVID-19.
The information Folding@home uploaded includes a batch of crowdsourced molecular simulations intended to help researchers develop an antiviral therapy to treat individuals infected with COVID-19. This data set is the result of Folding@home’s collaboration with COVID Moonshot, an open science and open source project to develop a low-cost, patent-free antiviral therapy that targets SARS-CoV-2, the coronavirus that causes COVID-19. Folding@home is currently supporting the COVID Moonshot project through a series of one-week “sprints,” during which volunteers (called “folders”) can choose to direct all of their computing power to simulations that refine promising potential therapies. Those candidates emerged from thousands of molecule ideas submitted by chemists around the world working together on a COVID antiviral. Molecules can be thought of as ingredients, which in the right combination can potentially yield a recipe for a COVID drug.
Moonshot has received over 13,000 design ideas from scientists around the world and experimentally tested 1,000 of these designs, finding more than 60 promising compounds to date. To carry these promising compounds all the way to drugs, these Folding@home sprints are helping researchers prioritize which compounds to make and test for efficacy against the virus, accelerating progress by allowing the team to focus finite resources on the compounds that are most likely to be useful. Thanks to Folding@home volunteers, tens of thousands of potential improvements to the current best potential therapies can be evaluated in a matter of days—rather than weeks—aided by servers donated by Amazon Web Services (AWS) handling the hundreds of terabytes flowing through Folding@home. The more volunteers joining the effort, the more potential designs can be evaluated, and the faster the Moonshot can identify a potent and safe therapy to bring to clinical trials.
The first Folding@home sprint for the COVID Moonshot completed on August 16, taking three weeks to assess over 1,000 molecules. As more folders joined forces, Sprint 2 took less than two weeks to evaluate over 6,000 potential inhibitors. Since then, multiple additional sprints have helped sort through tens of thousands of potential molecules to decide which to synthesize, addressing difficult questions in optimizing the binding of COVID Moonshot lead compounds. On the back end, these sprints were powered entirely by a single Amazon Elastic Compute Cloud (Amazon EC2) instance, which created and distributed work units at a rate of up to 16,000 per hour and processed over six million work units from folders in a single week during the latest sprint. Using the results of these work units, over 40 promising molecules are currently being synthesized for testing, with more molecules sent off for synthesis after each sprint. Once synthesis is complete, the new molecules are shipped to collaborators around the world to assess how well they inhibit SARS-CoV-2 in biochemical and viral assays, and the data is shared immediately online with scientists and the public through the COVID Moonshot website. With the Registry of Open Data on AWS, Folding@home is able to share the detailed datasets that come from these sprints, which are orders of magnitude larger than datasets generated in the past, to help accelerate progress in the science of prioritizing compounds for synthesis targeting COVID-19 and other diseases.
Beyond powering the Moonshot sprints, Folding@home is helping to advance the scientific community’s understanding of COVID-19. Simulating protein dynamics (how atoms in a protein move relative to one another) requires significant computing power and is crucial both for developing therapies and for understanding how viruses like SARS-CoV-2 take hold and progress. Anyone can participate in this research by simply installing the Folding@home program on their computer or virtual machine. While the computer sits idle, the program runs simulations of protein dynamics in the background. To date, Folding@home has produced over 100,000 times more data on SARS-CoV-2 than is typically created for other simulation studies, providing an unprecedentedly rich dataset to mine for insight, as reported in their recent bioRxiv preprint.
“AWS is helping to provide the added scale, speed, and technical expertise we need to manage the rapid pivot we’ve made to focus our community research efforts on the coronavirus,” said Dr. Greg Bowman, director of Folding@home and an associate professor at Washington University in St. Louis. “Among the various projects we are working on with AWS, we’re looking forward to this next milestone of publishing our data on the Registry of Open Data on AWS. Folding@home was founded on the idea of using scale to accelerate scientific progress, and working with AWS to publish the community-generated insights is an important extension of this concept that we hope will accelerate the search for a cure to COVID-19.”
Before Folding@home shared its data as an AWS Open Data Set, file sharing was time-consuming and cumbersome given the file sizes, sometimes even requiring physical hard drives to be mailed. Now that the data is in the cloud, any interested party can download it quickly on demand or compute on it directly, in parallel, within AWS.
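For readers who want to see what on-demand access can look like, here is a minimal Python sketch that lists the files in a publicly readable Amazon S3 bucket using only the standard library. It relies on the S3 ListObjectsV2 REST request and its XML response format. The bucket name is a hypothetical placeholder, not the actual Folding@home bucket (the real name is published in the data set's entry on the Registry of Open Data on AWS), and the sketch assumes the bucket permits anonymous listing through the global S3 endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XML namespace used in Amazon S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hypothetical placeholder, NOT the real Folding@home bucket name.
BUCKET = "example-open-data-bucket"


def build_list_url(bucket: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build an unsigned ListObjectsV2 request URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> List[Tuple[str, int]]:
    """Extract (object key, size in bytes) pairs from a list response."""
    root = ET.fromstring(xml_text)
    listing = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", namespaces=S3_NS))
        listing.append((key, size))
    return listing


def list_objects(bucket: str, prefix: str = "") -> List[Tuple[str, int]]:
    """Fetch one page of a bucket listing over HTTPS (needs network access)."""
    with urllib.request.urlopen(build_list_url(bucket, prefix)) as response:
        return parse_listing(response.read().decode("utf-8"))
```

Calling list_objects(BUCKET) with a real bucket name would return the first page of keys and sizes. Listings longer than one page need the continuation token from the response, which this sketch leaves out for brevity.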
Pivoting to fight the pandemic
Since its launch in 2000, Folding@home has studied numerous diseases, including cancer, Parkinson’s, and dengue, but when COVID-19 evolved into a global pandemic, the organization shifted its efforts to focus fully on better understanding the coronavirus. By May 2020, the number of devices running Folding@home had grown from 30,000 pre-pandemic to over one million, and their combined computing power had crossed one exaflop, the equivalent of running over 1,000,000,000,000,000,000 operations per second.
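To put those figures in perspective, the short Python sketch below works through the arithmetic. It treats "over one million" devices and "one exaflop" as exact values, which is an assumption made only for illustration, since the article gives both as lower bounds. The constant and function names are invented for this example.

```python
import math

# Figures quoted in the article, treated as exact for illustration only.
DEVICES_BEFORE = 30_000
DEVICES_MAY_2020 = 1_000_000
EXAFLOP = 10**18  # operations per second


def growth_factor(before: int, after: int) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def orders_of_magnitude(factor: float) -> float:
    """Base-10 logarithm of a growth factor."""
    return math.log10(factor)


def average_per_device(total_ops_per_sec: int, devices: int) -> float:
    """Mean operations per second contributed by each device."""
    return total_ops_per_sec / devices


if __name__ == "__main__":
    factor = growth_factor(DEVICES_BEFORE, DEVICES_MAY_2020)
    print(f"Device count grew about {factor:.0f}x")
    print(f"That is about {orders_of_magnitude(factor):.1f} orders of magnitude")
    per_device = average_per_device(EXAFLOP, DEVICES_MAY_2020)
    print(f"Average per device: {per_device:.0e} operations per second")
```

Under these assumptions the device count grew roughly 33-fold, and the average device contributed on the order of a trillion operations per second. The calculation covers device counts only and says nothing about the throughput the work servers had to handle.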
The surge in volunteer interest in Folding@home required additional server power and infrastructure capable of handling a sudden increase in throughput of nearly two orders of magnitude. The AWS Disaster Response Program, which has coordinated a number of initiatives to use AWS cloud resources to mitigate the impacts of COVID-19, began to explore how AWS could help. An AWS team of solution architects, technical account managers, software development engineers, program managers, and product managers engaged with the Folding@home team to collaborate on ways to combine the power of community-donated compute with the scalability, elasticity, and agility of the cloud to accelerate Folding@home work on COVID-19 drug discovery. AWS worked quickly with Folding@home to scale work servers so that more volunteer computers could be engaged to process more work units, eliminating infrastructure bottlenecks that slowed down scientific progress. The faster the work servers create work units, the faster COVID-19 research progresses. In a short time, several Amazon EC2 instances were up and distributing work units for Folding@home. As of August 2020, these work servers have created and distributed over 30 million work units.
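The relationship between work servers and volunteer machines can be pictured with a toy model. The Python sketch below is purely illustrative and is not Folding@home's actual software or protocol. The WorkUnit and WorkServer classes, the run_volunteer function, and the integer payloads are all hypothetical stand-ins. It shows only the basic loop the article describes: a server creates work units, a volunteer machine requests one, computes a result, and sends it back.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional


@dataclass(frozen=True)
class WorkUnit:
    unit_id: int
    payload: int  # stand-in for a chunk of simulation input


class WorkServer:
    """Toy work server: queues work units, hands them out, stores results."""

    def __init__(self) -> None:
        self._pending: Deque[WorkUnit] = deque()
        self.results: Dict[int, int] = {}
        self._next_id = 0

    def create_work_unit(self, payload: int) -> None:
        """Add a new unit of work to the queue."""
        self._pending.append(WorkUnit(self._next_id, payload))
        self._next_id += 1

    def request_work(self) -> Optional[WorkUnit]:
        """Hand out the oldest pending unit, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def submit_result(self, unit: WorkUnit, result: int) -> None:
        """Record the result a volunteer computed for a unit."""
        self.results[unit.unit_id] = result


def run_volunteer(server: WorkServer, compute: Callable[[int], int]) -> int:
    """Process work units until none are left; return how many were done."""
    done = 0
    while (unit := server.request_work()) is not None:
        server.submit_result(unit, compute(unit.payload))
        done += 1
    return done
```

In this model a volunteer stops as soon as the queue is empty, which mirrors the point made above: if the server cannot create work units quickly enough, volunteer machines have nothing to do.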
In the coming months, Folding@home will be pursuing as many parallel routes to accelerating the development of COVID-19 therapies as possible. The more volunteers who contribute their compute power, the better the project’s chances of making rapid progress.