AWS HPC Blog

Quantum Chemistry Calculations with the FHI-aims Code on AWS

This article was contributed by Dr. Fabio Baruffa, Sr. HPC and QC Solutions Architect at AWS, and Dr. Jesús Pérez Ríos, Group Leader at the Fritz Haber Institute, Max-Planck Society.  

Introduction

Quantum chemistry – the study of the inherently quantum interactions between atoms forming part of molecules – is a cornerstone of modern chemistry. In particular, the study of quantum chemistry on polyatomic systems is vital for finding new materials [1] or new drugs [2], or for understanding chemical reactions at the interface between two media [3]. From a theoretical standpoint, all these applications share a common aspect: they require computationally expensive and demanding simulations. Having access to an affordable, scalable, and efficient computational platform will help the quantum chemistry community explore previously unreachable scenarios, increasing our knowledge about the intimate life of atoms and molecules.

Cloud computing is emerging as a robust, efficient, and affordable computational solution to address complex problems for the scientific community. One of the benefits of cloud computing is the possibility of deploying virtual clusters with different architectures within minutes to meet the requirements of different applications and workflows. Another benefit is being able to run your computation immediately once it is needed, without waiting on a queue for a shared compute resource. As a result, many scientists and companies worldwide are looking to use cloud computing to find solutions to their problems in an efficient and cost-effective manner.

In this blog post, we describe our collaboration with the Max Planck Institute, leveraging cloud computing for quantum chemistry problems. In particular, we study the dynamics of a molecule reacting over a graphene monolayer, i.e., a periodic system of carbon atoms in a hexagonal lattice. The molecule under consideration is carbon monoxide (CO), chosen because of its relevance in atmospheric chemistry and because it is one of the best-known diatomic molecules, with a binding energy (~11 eV) that is significant for gas-phase chemistry. On the computational front, we present a performance study of FHI-aims [4] using the test case just described.

AWS offers over 400 compute instance types to give customers the best price performance for different workloads. The variety of EC2 instances available allows you to find the best balance between an application’s performance and the cost of running it. For our study, we deployed a high performance computing (HPC) infrastructure using AWS ParallelCluster with multiple queues and performed a detailed comparison between the different instance types, using both single- and multi-instance configurations. From previous studies of HPC applications (OpenFOAM and GROMACS) we found that Amazon EC2 C6g instances, powered by AWS-designed Arm-based Graviton2 processors, gave better price performance than other instance types, and we tested whether this holds for our quantum chemistry application and workflow.
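
As an illustration of how such a multi-queue cluster can be organized, the sketch below expresses the queue-to-instance-type mapping as a plain Python data structure. The queue names and instance counts are hypothetical and are not taken from the actual ParallelCluster configuration used in this study.

    # Hypothetical sketch: one queue per EC2 instance type evaluated in this study.
    # Queue names and instance counts are illustrative, not the actual cluster settings.
    instance_queues = {
        "c5":   {"instance_type": "c5.24xlarge",   "max_count": 8},
        "c5n":  {"instance_type": "c5n.18xlarge",  "max_count": 8},
        "c5a":  {"instance_type": "c5a.24xlarge",  "max_count": 8},
        "m5n":  {"instance_type": "m5n.24xlarge",  "max_count": 8},
        "c6g":  {"instance_type": "c6g.16xlarge",  "max_count": 8},
        "c6gn": {"instance_type": "c6gn.16xlarge", "max_count": 8},
    }

    # A mapping like this can be translated into the SlurmQueues section of an
    # AWS ParallelCluster configuration, one queue per instance type.
    for name, queue in instance_queues.items():
        print(f"queue {name}: {queue['instance_type']} (up to {queue['max_count']} instances)")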

Method details

The dynamics of the system is simulated using FHI-aims, an all-electron, full-potential, numerical atomic orbital basis set code, via a molecular dynamics (MD) approach assuming a microcanonical ensemble. The calculations involve the dynamics of 20 atoms, 18 of which form part of a periodic structure while two are in the gas phase. In our approach, the interaction energy of the system is calculated on-the-fly using density functional theory with the generalized gradient approximation functional of Perdew, Burke, and Ernzerhof (PBE) [5]. The basis set is chosen as “tight”, and the integration is performed numerically on a grid. In addition, the PBE energy has been corrected to include proper van der Waals dispersion interactions [6]. In this work, graphene consists of 18 carbon atoms and is sampled with a 5x5x1 k-point grid. Finally, the initial conditions for the CO molecule are chosen based on the semiclassical quantization rules for the vibrational and rotational degrees of freedom within the framework of quasi-classical trajectory (QCT) calculations.
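
For readers unfamiliar with the QCT prescription, the initial conditions mentioned above are obtained, roughly speaking, by imposing the textbook semiclassical quantization conditions on the classical vibrational action and rotational angular momentum of the CO molecule (this is the standard form of the rule, not a detail taken from the simulation input):

    \oint p_r \, dr = \left(v + \tfrac{1}{2}\right) h,
    \qquad
    |\mathbf{L}| = \sqrt{j(j+1)}\,\hbar,

where v and j are the vibrational and rotational quantum numbers chosen for the initial CO state.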

We compiled FHI-aims version 21.02 from source with the latest Intel oneAPI compilers to take advantage of the performance optimizations provided by the AVX-512 and AVX2 vector instructions available on the x86-based instances. For runs on the Arm-based Graviton2 architecture, we used the GNU compiler version 10.2.0 and the Arm Performance Libraries version 21. All tests were performed on the Amazon Linux 2 operating system; a summary of the architectures, compilers, and libraries is shown in the following table.

                      x86 architecture                  Arm-based Graviton2
  Operating system    Amazon Linux 2                    Amazon Linux 2
  Compiler            Intel oneAPI compilers 2021.2     GNU 10.2.0
  Numerical library   Intel oneAPI MKL 2021.2           Arm Performance Libraries 21, ScaLAPACK 2.1
  MPI library         Intel oneAPI MPI 2021.2           Open MPI 4.1.1

Single-instance performance

In our analysis we compared the time to perform a single time step of the simulation for six different instance types. The instances are:

  • c5.24xlarge: 48 cores Intel Xeon Scalable Processors (Cascade Lake)
  • c5n.18xlarge: 36 cores Intel Xeon Scalable Processors (Skylake) and Elastic Fabric Adapter (EFA) network interface
  • c5a.24xlarge: 48 cores 2nd generation AMD EPYC 7002 series (AMD Rome)
  • m5n.24xlarge: 48 cores Intel Xeon Scalable Processors (Cascade Lake) with higher memory per instance and EFA network interface
  • c6g.16xlarge: 64 cores AWS Graviton2 Processor with 64-bit Arm Neoverse cores
  • c6gn.16xlarge: 64 cores AWS Graviton2 Processor with 64-bit Arm Neoverse cores with EFA support

More detailed information about the instance types can be found here: https://aws.amazon.com/ec2/instance-types/.

We ran the application with the number of MPI tasks equal to the number of physical cores per instance to achieve the maximum parallelization possible for this workload, and we disabled hyperthreading on every instance in use. In Figure 1, we plot the time to perform one single time step of the simulation for the different instance types. Here, a time step refers to the computational cost of a single step of time evolution of the MD simulation under consideration; an entire trajectory needed to characterize the system’s scattering properties typically requires 1,000 time steps.

We observe that the instance with the fastest time to solution is the c5.24xlarge, thanks to the latest-generation Intel Xeon Cascade Lake processors with more cores per instance (48) and a high number of memory channels (12). The second fastest instance is the M5n, which is based on the same Intel architecture as the C5 but runs at a lower clock frequency (3.1 GHz vs. 3.4 GHz). Even though the m5n.24xlarge has more memory per core (8 GiB vs. 4 GiB) than the c5.24xlarge, FHI-aims does not benefit because memory utilization for this workload is low (75 MB per core is sufficient). The c5n instance, based on the Skylake processor, is the third fastest, which reflects its lower number of cores per instance. The Intel-based instances (C5, M5n, C5n) perform best for FHI-aims thanks to their AVX-512 capability, which exposes a higher vectorization potential.
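
Since a full trajectory needs on the order of 1,000 time steps, a single-step timing translates directly into an estimate of the wall time per trajectory. A minimal sketch of that arithmetic follows; the per-step timings below are placeholders, not the measured values behind Figure 1.

    # Estimate trajectory wall time from a measured time per MD step.
    # The per-step times below are illustrative placeholders, not measured data.
    steps_per_trajectory = 1000

    time_per_step_s = {          # seconds per MD step on a single instance (hypothetical)
        "c5.24xlarge": 60.0,
        "c6g.16xlarge": 80.0,
    }

    for instance, t_step in time_per_step_s.items():
        hours = t_step * steps_per_trajectory / 3600
        print(f"{instance}: ~{hours:.1f} hours per trajectory")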

Figure 1: Single-instance time per time step for different instance types running the FHI-aims application with the MGS dataset. Lower bar is better.

In Figure 2, we compare the application time per time step with the cost of running 100 time steps. A lower cost, up to 27% less than C5, can be achieved using the C6g instance based on the AWS Graviton2 processor. The C6g performance on this workload is 28% lower than on C5, so if lower cost is prioritized over shorter runtime, the C6g instance gives the best results.
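
The cost comparison in Figure 2 is simple arithmetic: the wall time for a fixed number of steps multiplied by the on-demand hourly price of the instance. A hedged sketch of the C6g-versus-C5 trade-off is below; the time and price ratios are placeholders chosen only to illustrate the calculation, so consult Figures 1 and 2 and the EC2 pricing page for the actual numbers.

    def relative_cost(time_ratio, price_ratio):
        """Cost of instance B relative to instance A for a fixed number of MD steps.

        time_ratio  = (seconds per step on B) / (seconds per step on A)
        price_ratio = (hourly on-demand price of B) / (hourly price of A)
        """
        return time_ratio * price_ratio

    # Illustrative placeholder ratios: if C6g takes ~1.28x longer per step but costs
    # ~0.57x as much per hour, the same run comes out roughly 27% cheaper on C6g.
    print(f"{(1 - relative_cost(1.28, 0.57)) * 100:.0f}% cheaper")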

Figure 2: Cost comparison between the different instance types for a single instance running the simulation for 100 time steps. Lower is better.

Multi-instance performance

In order to identify the best combination of number of instances, parallel efficiency, and simulation cost, we took the same workload, scaled it out from 1 to 8 instances, and measured the time to solution. The results are shown in Figure 3, where we observe that the C5 instances provide the fastest solution for this workload, confirming the single-instance findings. The M5n is very close, but its higher memory per instance and EFA network interface do not provide a significant performance benefit at this scale. In terms of parallel efficiency, the instances equipped with EFA (M5n, C5n, C6gn) show the best results, up to 85% on 4 instances of C6gn. However, using more than 4 instances is not justified by the application’s scalability for this workload, whose parallel efficiency can drop to 55% when running on 8 instances.
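
Parallel efficiency here is the usual strong-scaling definition: the single-instance time divided by N times the N-instance time. A minimal sketch follows; the timings are placeholders, not the measured values behind Figure 3.

    def parallel_efficiency(t1, tn, n):
        """Strong-scaling parallel efficiency: E(N) = T(1) / (N * T(N))."""
        return t1 / (n * tn)

    # Placeholder timings (seconds per MD step); the real values are in Figure 3.
    t1 = 80.0                                # 1 instance
    timings = {2: 44.0, 4: 23.5, 8: 18.2}    # N instances -> time per step

    for n, tn in timings.items():
        print(f"{n} instances: speedup {t1 / tn:.2f}x, "
              f"efficiency {parallel_efficiency(t1, tn, n):.0%}")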

In Figure 4 we compare the on-demand price of running the simulation for 100 time steps for all tested instance types, using current Ohio Region pricing. This includes only the compute part, excluding I/O and filesystem costs. In terms of price/performance for the application running on several instances, the AWS Graviton2-based C6g instance is the best choice, with a 23% lower cost on 4-instance runs compared to the C5 instance, which is the fastest. The increase in time to solution is around 30%, which the lower cost more than compensates for.

Figure 3: Scalability comparison between different instance types. Lower is better.

Figure 4: Cost comparison between different instance types. Prices are for the Ohio Region. Lower is better.

Conclusion

We ran FHI-aims using different EC2 instance configurations for single- and multi-instance simulations. While time to solution can be the key metric for understanding when a given application will produce the desired scientific results, cost is an essential factor to consider. In Figure 5, we plot the cost of running 100 time steps of the simulation against the time to run a single time step for the best-performing instances. Each dot corresponds to the number of instances, from 1 to 8, going from left to right. The C6g provides significant savings at the cost of lower performance: close to 23% and 33% compared to the best-performing C5 and to the C5n, respectively. The study also shows that, for this FHI-aims workload, the high-speed EFA interconnect does not provide an additional time-to-solution benefit at such a small number of instances, even though it improves parallel efficiency.

Figure 5: Cost of the simulation against the time to perform one step for different instance types. Each dot corresponds to the number of instances, from 1 to 8, going from left to right of the plot. Lower is better.

Furthermore, and more importantly, the present study confirms that the AWS cloud computing infrastructure is well suited to solving scientific problems in quantum chemistry, in particular scenarios in which MD simulations need to be used in conjunction with on-the-fly ab initio quantum chemistry methods. However, many other scenarios relevant to the application of quantum chemistry remain unexplored and will be the topic of further investigation soon.

To get started running your application on AWS, we recommend the AWS HPC workshop, where you can learn how to build your HPC cluster and environment. To learn more about HPC on AWS, visit https://aws.amazon.com/hpc/.

The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.

References

[1] G. B. Olson, Designing a New Material World, Science 288, 993 (2000).

[2] W. L. Jorgensen, Efficient Drug Lead Discovery and Optimization, Accounts of Chemical Research 42, 724 (2009).

[3] J. C. Slater and K. H. Johnson, Quantum Chemistry and Catalysis, Physics Today 27, 10, 34 (1975).

[4] V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter and M. Scheffler, Ab initio molecular simulations with numeric atom-centered orbitals, Computer Physics Communications 180, 2175 (2009).

[5] J. P. Perdew, K. Burke and M. Ernzerhof, Generalized Gradient Approximation Made Simple, Phys. Rev. Lett. 77, 3865 (1996).

[6] A. Tkatchenko and M. Scheffler, Accurate Molecular Van Der Waals Interactions from Ground-State Electron Density and Free-Atom Reference Data, Phys. Rev. Lett. 102, 073005 (2009).

Dr. Jesús Pérez Ríos

Dr. Jesús Pérez Ríos is the group leader of the theoretical atomic, molecular and optical physics group in the department of molecular physics led by Prof. Dr. Gerard Meijer at the Fritz Haber Institute of the Max Planck Society. His research focuses on theoretical and computational problems at the borderline between chemical physics and other disciplines of physics and chemistry, such as data science, condensed matter physics, and physics beyond the Standard Model. Most of his research projects require high-performance computing resources to reveal how atoms and molecules live in harmony within the fundamental laws of physics.

Dr. Fabio Baruffa

Dr. Fabio Baruffa is a senior specialist solutions architect at AWS. He designs large-scale customer solutions in the high-performance computing area and helps accelerate quantum computing adoption using the cloud infrastructure. He has more than 10 years of experience in the HPC industry and academia, working as an application engineer at Intel and as an HPC specialist at some of the largest supercomputing centers in Europe, mainly the Leibniz Supercomputing Centre and the Max Planck Computing and Data Facility in Germany, as well as Cineca in Italy. He holds a PhD in physics from the University of Regensburg for his research on spintronic devices and quantum computing.