AWS HPC Blog

Lattice Boltzmann simulation with Palabos on AWS using Graviton-based Amazon EC2 Hpc7g instances

Computational fluid dynamics (CFD) has grown over several decades to become a widely used tool to study a broad range of important industrial and academic problems, ranging from automotive and aircraft design to the study of blood flow inside the human body.

Whilst traditional Navier-Stokes (NS) codes based on the finite-volume approach are still the most widely used, an alternative – the Lattice Boltzmann method (LBM) – has emerged over the last two decades, and it's particularly attractive to both industry and academia.

In this post we’ll show you the performance and cost benefits of running the Parallel Lattice Boltzmann Solver (Palabos) [1] – a specific LBM solver – on the latest generation of Amazon Elastic Compute Cloud (Amazon EC2) instances based on AWS Graviton processors.

Palabos is an open-source C++ solver developed by the University of Geneva. It’s designed to run and scale efficiently on HPC clusters. It’s also a highly versatile computational tool that is widely used in the LBM community and often serves as a reference implementation for many Lattice Boltzmann models.

Lattice Boltzmann models (LBMs)

We mentioned LBMs are popular with both industry and academia. There are several reasons for this.

First, they’re flexible. You can model complex geometries whether they’re stationary or moving. Second, they suffer from minimal numerical dissipation, which makes them ideal for high-fidelity scale-resolving methods and aeroacoustics.

They also have superior computational performance, which stems from two key characteristics: data locality, and the fact that they’re easier to vectorize and set up for multithreading (on both CPUs and GPUs) compared to NS-based methods.
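To make the data-locality point concrete, here’s a minimal collide-and-stream sketch for a D2Q9 lattice in C++. This is a toy illustration of the general LBM update, not Palabos code; the grid size, relaxation time, and periodic boundaries are arbitrary choices for the example.

```cpp
// Minimal D2Q9 BGK collide-and-stream sketch (toy code, not Palabos).
// Each cell update reads only its own distributions (collision) plus its
// immediate neighbours (streaming), which is why LBM kernels are so local.
#include <array>
#include <vector>
#include <cstdio>

constexpr int NX = 64, NY = 64, Q = 9;
constexpr double TAU = 0.6;   // arbitrary relaxation time for this sketch
constexpr int cx[Q] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
constexpr int cy[Q] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
constexpr double w[Q] = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                         1.0/36, 1.0/36, 1.0/36, 1.0/36};

using Field = std::vector<std::array<double, Q>>;
inline int idx(int x, int y) { return y * NX + x; }

int main() {
    Field f(NX * NY), fnew(NX * NY);
    // Start from rest: equilibrium at density 1, zero velocity.
    for (auto& cell : f)
        for (int k = 0; k < Q; ++k) cell[k] = w[k];

    for (int step = 0; step < 100; ++step) {
        // Collision: purely local, touches one cell at a time.
        for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            auto& cell = f[idx(x, y)];
            double rho = 0, ux = 0, uy = 0;
            for (int k = 0; k < Q; ++k) {
                rho += cell[k];
                ux  += cell[k] * cx[k];
                uy  += cell[k] * cy[k];
            }
            ux /= rho; uy /= rho;
            for (int k = 0; k < Q; ++k) {
                double cu  = 3.0 * (cx[k] * ux + cy[k] * uy);
                double feq = w[k] * rho *
                    (1.0 + cu + 0.5 * cu * cu - 1.5 * (ux * ux + uy * uy));
                cell[k] += (feq - cell[k]) / TAU;   // BGK relaxation
            }
        }
        // Streaming: each distribution hops to the neighbour it points at
        // (periodic wrap-around here, for simplicity).
        for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            for (int k = 0; k < Q; ++k) {
                int xn = (x + cx[k] + NX) % NX;
                int yn = (y + cy[k] + NY) % NY;
                fnew[idx(xn, yn)][k] = f[idx(x, y)][k];
            }
        f.swap(fnew);
    }
    std::printf("done: %d cells, %d steps\n", NX * NY, 100);
    return 0;
}
```

Because the collision step touches one cell at a time and the streaming step only immediate neighbours, the update is straightforward to partition across MPI ranks and to vectorize.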

To illustrate these points, we’re going to show you how Palabos simulates blood flows [2][3] that can help researchers better understand vascular diseases and cancer cell movement through blood vessels. This can also provide a controlled environment to evaluate different treatment options.

There’s another reason LBMs are interesting. Given they’re highly scalable, researchers can run them on thousands or tens of thousands of cores to either get answers very quickly, or drive up the fidelity of the simulations themselves – or often both. But the availability of HPC resources can become a bottleneck for many. As you can imagine, that causes a lot of researchers to look for large-scale resources beyond their usual environments: enter AWS, which can offer all the additional capacity they’re asking for. It’s also robust, secure by design, low cost, and incredibly flexible.

Introducing AWS Graviton3E

Amazon EC2 offers hundreds of instance types optimized to fit different use cases. Instances vary based on CPU, memory, storage, and networking bandwidth giving customers the flexibility to choose the right mix of resources for their applications.

Amazon EC2 Hpc7g instances are the latest addition to the extended family of Graviton-based EC2 instances. They pair the AWS Graviton3E processor with DDR5 memory and 200 Gbps of low-latency networking delivered by the Elastic Fabric Adapter (EFA) – especially important when you’re scaling to a large number of instances. They also consume up to 60% less energy for the same workloads than comparable x86-based instances tailored for HPC applications – which is great for the planet.

In an earlier post, we published some performance results for real world workloads from CFD, finite-element analysis (FEA), molecular dynamics, and numerical weather prediction (NWP).

For this post, we compared Palabos performance across two Graviton-based instance types offered by AWS. In our tests, Hpc7g delivered up to 75% better performance and up to 3x better price-performance compared to the previous generation Graviton2-based instances (C6gn).

Benchmark simulation result and performance

For our tests, we used two instance types: Hpc7g.16xlarge (powered by the latest Graviton processor, customized for HPC applications) and C6gn.16xlarge (based on the previous generation Graviton2 processor). We used AWS ParallelCluster to launch our environment, manage these fleets, and provide an Amazon FSx for Lustre file system (a popular parallel file system). ParallelCluster uses Slurm as its workload manager, which makes it familiar and easy to use. There’s an example ParallelCluster configuration file in the Graviton HPC best practices guide, and you can also find a one-click launchable recipe in the HPC Recipes Library.

Cavity 3D benchmark

In our first test, we used the “lid-driven cavity problem in a cuboid”, a three-dimensional benchmark with 1 billion cells (1001x1001x1001). The top wall moves to the right with a constant velocity, while the other walls are stationary. Figure 1 is a snapshot of the simulated velocity field at 10.1 seconds. We used Palabos v2.3.0, compiled with GNU Compiler v11.1.0 and Open MPI v4.1.4.

Figure 1. A display of the Palabos output velocity field at 10.1 seconds for the three-dimensional lid-driven cavity problem. Arrows show flow direction and speed.


We ran the same benchmark on both Hpc7g and C6gn, scaling from 2 to 128 instances (8,192 cores) in both cases. We calculated throughput in million lattice-site updates per second (MLUPS), which we’ve shown in Table 1.
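For reference, MLUPS is simply the total number of lattice-site updates divided by wall-clock time in microseconds. Here’s a small sketch of the calculation; the wall-clock value below is a placeholder, not one of our measured results.

```cpp
// MLUPS (million lattice-site updates per second) for a run:
//   MLUPS = (sites * iterations) / (wall_time_seconds * 1e6)
// The grid size matches the cavity case in this post; the timing value
// is a placeholder, not a measurement.
#include <cstdio>

int main() {
    const double sites      = 1001.0 * 1001.0 * 1001.0;  // ~1e9 cells
    const double iterations = 10000.0;
    const double wall_time  = 3600.0;   // hypothetical wall-clock seconds
    const double mlups = sites * iterations / (wall_time * 1e6);
    std::printf("throughput: %.1f MLUPS\n", mlups);
    return 0;
}
```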

At 8,192 cores, the solver maintains a strong-scaling efficiency of 60% on Hpc7g (Figure 2), which also had 75% higher throughput than the previous generation Graviton2-based (C6gn) instances. The simulation cost on Hpc7g is close to one third of the cost of running it on C6gn (Figure 3).
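The scaling-efficiency and cost numbers come from the same few inputs. The sketch below shows both calculations; the throughputs and hourly price are placeholders for illustration, not our measured results or actual instance pricing.

```cpp
// Strong-scaling efficiency and cost-per-10,000-iterations sketch.
// All inputs below are hypothetical placeholders, not the measured
// results or real instance prices from this post.
#include <cstdio>

int main() {
    // Strong scaling: efficiency at N instances relative to a baseline run.
    const double base_instances  = 2.0,   base_mlups  = 1000.0;   // hypothetical
    const double large_instances = 128.0, large_mlups = 38400.0;  // hypothetical
    const double efficiency = (large_mlups / base_mlups) /
                              (large_instances / base_instances);

    // Cost of 10,000 iterations on the ~1e9-cell grid at the large scale.
    const double sites      = 1001.0 * 1001.0 * 1001.0;
    const double iterations = 10000.0;
    const double seconds    = sites * iterations / (large_mlups * 1e6);
    const double price_per_instance_hour = 2.0;                   // hypothetical
    const double cost = large_instances * price_per_instance_hour *
                        seconds / 3600.0;

    std::printf("efficiency: %.0f%%  run time: %.0f s  cost: $%.2f\n",
                efficiency * 100.0, seconds, cost);
    return 0;
}
```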

Table 1. Cavity 3D benchmark performance (higher is better)


Figure 2. Cavity 3D benchmark throughput (higher is better) on our two instance types; Hpc7g shows good efficiency at 8,192 cores for this strong scaling test.


Figure 3. Cavity 3D benchmark cost for 10,000 iterations on our two instance types (lower is better).


Cellular blood flow simulation benchmark

Next, we studied a cellular blood flow computation, which has three components: the fluid solver, the solid solver, and the fluid-solid interaction.

The fluid solver solves the weakly-compressible Navier-Stokes equations. The solid solver, based on the nodal projective finite element method (npFEM), resolves the trajectories and deformations of the blood cells. The fluid-solid interaction is loosely coupled through an immersed boundary condition.
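To make that coupling a little more concrete, the sketch below shows the kind of interpolation-and-spreading step an immersed boundary scheme performs for a single membrane vertex: the fluid velocity is interpolated from the surrounding grid nodes to the vertex, and the vertex’s reaction force is spread back onto the same nodes with the same weights. This is a generic trilinear-weight illustration, not the Palabos or npFEM implementation.

```cpp
// Generic immersed-boundary coupling sketch for one membrane vertex:
// interpolate fluid velocity to the vertex, then spread the vertex force
// back onto the surrounding grid nodes with the same trilinear weights.
// Illustration only, not the Palabos/npFEM implementation.
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int N = 16;                               // toy grid, spacing = 1
using Vec3 = std::array<double, 3>;
inline int idx(int x, int y, int z) { return (z * N + y) * N + x; }

int main() {
    std::vector<Vec3> u(N * N * N, {0.05, 0.0, 0.0});    // fluid velocity field
    std::vector<Vec3> force(N * N * N, {0.0, 0.0, 0.0}); // force field on grid

    Vec3 vertex      = {7.3, 8.6, 5.1};   // Lagrangian membrane vertex position
    Vec3 vertexForce = {0.0, 1e-3, 0.0};  // reaction force from the solid solver

    int x0 = (int)std::floor(vertex[0]);
    int y0 = (int)std::floor(vertex[1]);
    int z0 = (int)std::floor(vertex[2]);

    Vec3 uVertex = {0.0, 0.0, 0.0};
    for (int dz = 0; dz <= 1; ++dz)
    for (int dy = 0; dy <= 1; ++dy)
    for (int dx = 0; dx <= 1; ++dx) {
        // Trilinear weight of this grid node for the vertex position.
        double wx = 1.0 - std::abs(vertex[0] - (x0 + dx));
        double wy = 1.0 - std::abs(vertex[1] - (y0 + dy));
        double wz = 1.0 - std::abs(vertex[2] - (z0 + dz));
        double w  = wx * wy * wz;
        int node  = idx(x0 + dx, y0 + dy, z0 + dz);
        for (int c = 0; c < 3; ++c) {
            uVertex[c]     += w * u[node][c];       // interpolation to vertex
            force[node][c] += w * vertexForce[c];   // spreading back to grid
        }
    }
    std::printf("interpolated velocity at vertex: (%.3f, %.3f, %.3f)\n",
                uVertex[0], uVertex[1], uVertex[2]);
    return 0;
}
```

Production codes typically use a smoothed discrete delta function over a wider stencil, but the interpolate-then-spread pattern is the same.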

The result of this hybrid simulation is shown in Figure 4. For this second benchmark, we simulated the blood flow in a 50x50x50 µm cube with 476 red blood cells (RBCs) and 95 platelets (PLTs).

Figure 4. Hybrid simulation of blood plasma (Palabos) and deformable RBCs/PLTs (npFEM).


Table 2 and Figure 5 show the performance as the number of iterations we can simulate per minute (so higher is better).

Hpc7g delivered up to 50% better performance and 2.45x better price-performance over C6gn for these tests.

Table 2. Cellular blood flow benchmark throughput (higher is better)


Figure 5. Cellular blood flow benchmark throughput (higher is better) on our two instance types.


Conclusions

In this post we showed you that a popular LBM code can easily be run on Graviton-based Amazon EC2 instances with up to 8,192 cores. The Amazon EC2 Hpc7g instances showed up to 75% better performance and up to 3x better price-performance over the previous generation Graviton instances for Palabos.

The Cavity 3D benchmark scaled efficiently on Hpc7g instances up to 8,192 cores. We think this illustrates how the combination of a computationally efficient code and a large-scale cloud HPC facility can enable you to run larger cases than you could likely complete using limited on-premises HPC resources.

You can find the instructions to set up and run HPC applications on Graviton in our best practices guide. And you can find easy-to-use recipes for building clusters to suit your needs in the HPC Recipes Library. These are open-source projects on GitHub, so you can let us know directly if you run into any technical issues. Let us know how you get on.

References

[1] Latt, J., Malaspinas, O., Kontaxakis, D., Parmigiani, A., Lagrava, D., Brogi, F., Belgacem, M. B., Thorimbert, Y., Leclaire, S., Li, S., Marson, F., Lemus, J., Kotsalos, C., Conradin, R., Coreixas, C., Petkantchin, R., Raynaud, F., Beny, J., & Chopard, B. (2021). Palabos: Parallel Lattice Boltzmann Solver. Computers and Mathematics with Applications, 81, 334–350. https://doi.org/10.1016/j.camwa.2020.03.022

[2] Kotsalos, C., Latt, J., Beny, J., & Chopard, B. (2020). Digital Blood in massively parallel CPU/GPU systems for the study of platelet transport. Interface Focus, 11(1), 20190116. https://doi.org/10.1098/rsfs.2019.0116

[3] Zavodszky, G., van Rooij, B., Azizi, V., Alowayyed, S., & Hoekstra, A. (2017). Hemocell: A high-performance microscopic cellular library. Procedia Computer Science, 108, 159–165. https://doi.org/10.1016/j.procs.2017.05.084

Jun Tang


Jun Tang works as a Software Development Engineer in the Annapurna Labs software team. Jun’s responsibilities at AWS include optimizing and benchmarking open-source HPC software on Graviton-based instances, and diagnosing and addressing customer issues. Jun has over 10 years of experience in software development for seismic imaging and medical imaging. He has a Master’s degree in Electrical and Computer Engineering from Rice University.