AWS HPC Blog

Accelerating research and development of new medical treatments with HPC on AWS

The following article was written by Erick Jan-Vareschard, Head of AWS Public Sector; Gilles Tourpe, AWS Business Development Executive; Nicolas Malaval, Solution Architect at AWS; and the team at Sorbonne University.

Today, more than 290,000 researchers in France are working to provide better support and care for patients through modern medical treatment. To fulfill their mission, these researchers must be equipped with powerful tools. At AWS, we believe that technology has a critical role to play in medical research. Why? Because technology can take advantage of the significant amount of data generated in the healthcare system and in the research community to enable opportunities for more accurate diagnoses, and better treatments for many existing and future diseases.

Imagine a key that represents the cure and a lock that represents the disease. Researchers are trying to find the right key to match a particular lock, for example, a molecule that interacts with a disease receptor. Now imagine billions of keys and billions of locks. Matching them manually would take years. Today, the cloud makes it possible to accelerate this research, because it can store, structure, and analyze data at petabyte scale in minutes.

But capacity isn’t everything; security is also critical when dealing with health data. Security is a top priority at AWS, which enables us to build applications that comply with the regulatory framework for health data security and privacy under the French HDS certification, as well as the European General Data Protection Regulation (GDPR). It is essential to equip researchers with technologies that allow them to work in complete safety while preserving the confidentiality of the analyzed data.

To support elite research in France, we are proud to sponsor two French organizations: Gustave Roussy and Sorbonne University. AWS is providing them with the computing power and machine learning technologies needed to accelerate cancer research and develop a treatment for COVID-19.

Finding a cure for COVID-19 with the Sorbonne University Theoretical Chemistry Laboratory

In May 2020, the Theoretical Chemistry Laboratory (Sorbonne University / French National Centre for Scientific Research (CNRS)) began research to better understand the molecular functioning of the virus that causes COVID-19. For this project, it relies on the power of the national supercomputing center (GENCI) and of the AWS Cloud to perform high performance computing (HPC) simulations, modeling different proteins involved in the SARS-CoV-2 and SARS-CoV-1 viruses. In this way, researchers in the lab hope to better understand the virus and contribute to drug and treatment development.

Molecular dynamics (MD) is an active field of research that is continually progressing. Among the various evolutions of the field, the definition of force fields themselves is growing more complex. Beyond the popular pairwise additive models, which remain extensively used, polarizable force field (PFF) approaches are becoming increasingly mainstream. This is mainly because accounting for polarizability is often crucial for complex applications, and adding new physics to the model through many-body potentials can lead to significant accuracy gains.
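To make the "many-body" point concrete, here is a minimal sketch of why polarization cannot be written as a sum of independent pair terms. It is a toy model and not Tinker-HP or AMOEBA code: the collinear geometry, the reduced units, the uniform applied field, and the function names `induced_dipoles` and `polarization_energy` are all assumptions made for illustration. Each site's induced dipole depends on the field created by every other induced dipole, so the dipoles have to be solved self-consistently.

```python
def induced_dipoles(positions, alpha, e_field, tol=1e-12, max_iter=1000):
    """Self-consistent induced dipoles for polarizable sites on a line.

    Toy assumptions: all sites and dipoles lie on one axis, every site has
    the same polarizability `alpha`, and `e_field` is a uniform applied
    field along that axis (reduced units). A dipole mu at distance r on the
    axis then contributes a field of 2 * mu / r**3.
    """
    n = len(positions)
    mu = [alpha * e_field] * n  # initial guess: sites ignore each other
    for _ in range(max_iter):
        new_mu = []
        for i in range(n):
            field = e_field
            for j in range(n):
                if j != i:
                    r = abs(positions[i] - positions[j])
                    field += 2.0 * mu[j] / r**3  # field from dipole j at site i
            new_mu.append(alpha * field)
        change = max(abs(a - b) for a, b in zip(new_mu, mu))
        mu = new_mu
        if change < tol:
            return mu
    raise RuntimeError("induced dipoles did not converge")


def polarization_energy(mu, e_field):
    """Polarization energy of converged linear-response dipoles."""
    return -0.5 * sum(m * e_field for m in mu)


# Two sites, then the same two sites plus a third neighbor.
mu_two = induced_dipoles([0.0, 2.0], alpha=1.0, e_field=1.0)
mu_three = induced_dipoles([0.0, 2.0, 4.0], alpha=1.0, e_field=1.0)

print("dipole on site 0 with one neighbor: ", mu_two[0])
print("dipole on site 0 with two neighbors:", mu_three[0])
print("polarization energies:",
      polarization_energy(mu_two, 1.0), polarization_energy(mu_three, 1.0))
```

Adding the third site changes the dipole on the first site, including through its effect on the middle site. A pairwise additive model has no such coupling, and the iterative solve is one reason polarizable models cost more to evaluate.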

Numerous approaches are currently under development, but a few methodologies, such as the Drude and AMOEBA models, have emerged. These models are now commonly employed because their main bottleneck, a larger computational cost than classical pairwise models, has been alleviated.

The availability of HPC implementations of such models within popular packages such as NAMD or GROMACS for Drude or Tinker-HP for AMOEBA fosters the diffusion of these new generation techniques within the research community.

Tinker-HP, which is part of the Tinker distribution, was initially introduced as a double-precision, massively parallel MPI addition to Tinker dedicated to accelerating the various polarizable force fields (PFFs) and non-polarizable force fields (n-PFFs) present within the Tinker package. The code was shown to scale efficiently up to tens of thousands of cores on modern petascale supercomputers. Over the years, it has been optimized on various platforms to take advantage of vectorization and the evolution of recent CPUs (central processing units). However, in the last 15 years, the field has increasingly turned to GPUs (graphics processing units) to take advantage of low-precision arithmetic. Such platforms offer substantial computing capabilities at both low cost and high energy efficiency, making microsecond simulations with pair potentials routine on standard GPU cards.
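The trade-off behind low-precision arithmetic can be illustrated with a small, generic sketch. It is not taken from Tinker-HP and does not describe its actual precision scheme: the helper names `to_f32`, `sum_single`, and `sum_mixed` are hypothetical, and 32-bit floats are emulated on the CPU with Python's standard `struct` module. The sketch adds many small single-precision terms, once with a single-precision accumulator and once with a double-precision accumulator, and compares the accumulated rounding error.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate 32-bit terms in a 32-bit accumulator."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))  # round after every addition
    return acc


def sum_mixed(values):
    """Accumulate the same 32-bit terms in a 64-bit accumulator."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)  # terms are 32-bit, running sum stays 64-bit
    return acc


n = 100_000
terms = [0.1] * n
reference = n * to_f32(0.1)  # sum of the rounded terms, up to one rounding

err_single = abs(sum_single(terms) - reference)
err_mixed = abs(sum_mixed(terms) - reference)

print("error with 32-bit accumulator:", err_single)
print("error with 64-bit accumulator:", err_mixed)
```

In this sketch the rounding error of the all-single-precision sum grows with the number of terms, while keeping only the accumulator in higher precision removes most of it. That general idea is one way low-precision hardware can be used without giving up too much accuracy.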

Using AWS to run Tinker-HP, the laboratory estimates that it will be able to perform the necessary calculations in less than six months instead of several years. Based on the dynamic models identified for the proteins that make up the viruses, researchers will be able to refine their collaborations with experimental teams seeking to synthesize new active ingredients. The program, run as ‘open science’, will provide free access to data and results for use by other parties for the common good.

The outcome of the research will be a public data set that could help identify potent drugs. All data will be freely available and shared with the community. Key results will be shared in the MolSSI/BioExcel community GitHub (https://covid.molssi.org/). The AWS Open Data team supports MolSSI and their collaborators in their commitment to open science, especially around COVID-19.

Beyond technology: The common good

In France, with the strength of researchers, doctors, and available technology, we believe we can advance patient care and treatment. AWS’s complete portfolio of HPC-focused solutions supports workloads ranging from car design to drug discovery. With Sorbonne University’s innovative approach and AWS’s large computing infrastructure, we will be able to improve our understanding of the virus behind COVID-19, accelerate the development of new drugs, and improve patient outcomes.

References:

Journal of Chemical Theory and Computation – Tinker-HP: Accelerating Molecular Dynamics Simulations of Large Complex Systems with Advanced Point Dipole Polarizable Force Fields Using GPUs and Multi-GPU Systems (https://pubs.acs.org/doi/abs/10.1021/acs.jctc.0c01164)

Protease of SARS-CoV2 (https://pubs.rsc.org/am/content/articlehtml/2021/sc/d1sc00145k)

A Community Letter Regarding Sharing Biomolecular Simulation Data for COVID-19 (https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c00319)

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.