Deep-dive into Ansys Fluent performance on Ansys Gateway powered by AWS
This post was contributed by Dnyanesh Digraskar, Principal HPC Partner Solutions Architect, AWS; Ashwini Kumar, Senior Principal Engineer, Ansys; and Nicole Diana, Director, Fluids Business Unit, Ansys.
Today, we’re going to deep-dive into the performance and associated cost of running computational fluid dynamics (CFD) simulations on AWS using Ansys Fluent through the Ansys Gateway powered by AWS (or just “Ansys Gateway” for the rest of this post).
Ansys Gateway is an AWS Marketplace-hosted solution that lets users manage their complete Ansys virtual desktop and HPC simulation workflows, with more than fifty Ansys products, in their own AWS cloud environment. Emirates Team New Zealand and Turntide Technologies use Ansys Gateway to accelerate their design and engineering simulation cycles.
Ansys Fluent, an advanced physics modeling simulation software, is used by engineers and scientists across industries such as automotive, aerospace, manufacturing, and energy to innovate and optimize product development.
In this post, we’ll evaluate the performance and price characteristics for three test cases on various configurations of Amazon Elastic Compute Cloud (Amazon EC2) instance types. Using the results in this post, you’ll be able to make appropriate hardware choices for running Ansys Fluent simulations.
Ansys Gateway recap
Ansys Gateway is a secure, online platform that enables users to create, manage, and execute complete computer-aided engineering (CAE) workflows in their own AWS accounts. Earlier this year, we published a blog post that described solution architecture components, security implementation, and typical end-user workflows for using Ansys Gateway.
The Ansys applications (solvers) available through Ansys Gateway are pre-configured, validated, and extensively benchmarked for performance and price on various Amazon EC2 hardware. Ansys users can follow the recommended usage guidelines in the Ansys Gateway documentation in Ansys Help to set up their virtual desktop infrastructure (VDI) or HPC environment with the recommended Amazon EC2 instance types. Users can then run simulations with the solvers of their choice straight out of the box, without manually setting up and tuning the solvers, simulation environment, and hardware parameters.
Benchmark information and simulation environment
For this post, we used test cases from the standard Ansys Fluent Benchmarks suite. Table 1 shows the model description of each test case, including the mesh size (number of cells), the turbulence model used, and the fluid-flow condition. These benchmarks, shown in Figures 1a–1c for visual reference, represent the typical model sizes and physics that users simulate. We used Ansys Fluent version 2023 R1 to run the simulations.
AWS recently announced the Amazon EC2 Hpc7a instance type, powered by 4th generation AMD EPYC™ (Genoa) processors with up to 192 physical cores and 300 Gbps Elastic Fabric Adapter (EFA) network bandwidth. We compared the performance of these benchmarks on Hpc7a and the previous generation Hpc6a instances.
The following table (Table 2) summarizes Amazon EC2 instance configurations used:
Our objective for running these benchmarks was to quantify Ansys Fluent performance and the associated hardware, platform, and license costs, so that we could recommend the appropriate Amazon EC2 instance type to use on Ansys Gateway. With that in mind, we performed the analysis by measuring the following:
- Solver Rating to represent the solver performance
- Ratio of performance to hardware configuration cost
- Ratio of performance to total job cost
Solver Rating: the Solver Rating is defined as the number of times the benchmark can be run on a given machine in 24 hours. It’s computed by dividing the number of seconds in a day by the number of seconds required to run the benchmark. A higher Solver Rating indicates better performance. Solver Rating is the primary metric we used in this post to report the performance. We ran our simulations for 1000 iterations for steady-state flow or 1000 timesteps for transient flow.
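The Solver Rating is plain arithmetic, so a minimal Python sketch makes the definition concrete. The function names `solver_rating` and `runtime_from_rating` are our own illustrative helpers, not part of Ansys Fluent or Ansys Gateway, and the 30-minute runtime is a hypothetical input rather than a measured result.

```python
# Seconds in one day; the Solver Rating is defined against a 24-hour window.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def solver_rating(wall_clock_seconds: float) -> float:
    """Return how many times a benchmark could run back to back in 24 hours."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return SECONDS_PER_DAY / wall_clock_seconds


def runtime_from_rating(rating: float) -> float:
    """Invert the definition to recover the wall-clock seconds of one run."""
    if rating <= 0:
        raise ValueError("Solver Rating must be positive")
    return SECONDS_PER_DAY / rating


# Hypothetical example: a 1000-iteration run that takes 30 minutes (1800 s).
print(solver_rating(1800))        # 48.0
print(runtime_from_rating(48.0))  # 1800.0
```

Because the rating is inversely proportional to runtime, halving the wall-clock time doubles the Solver Rating.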
Job cost: The total job (or simulation) cost comprises three main components: the Amazon EC2 cost, the Ansys Gateway charge of $0.25 per running Amazon EC2 instance per hour, and the Ansys licensing cost.
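To see how the three components add up, here is a minimal Python sketch of the job cost model. The $0.25 per instance-hour Gateway charge comes from this post; the helper name `job_cost`, the $5.00/hour EC2 price, and the $100 licensing figure are hypothetical placeholders. Licensing is passed in as a single number because its actual structure depends on your agreement.

```python
GATEWAY_FEE_PER_INSTANCE_HOUR = 0.25  # USD, Ansys Gateway charge per running instance


def job_cost(num_instances: int, ec2_price_per_hour: float,
             runtime_hours: float, license_cost: float) -> dict:
    """Split a job's cost (USD) into EC2, Gateway, and licensing components."""
    instance_hours = num_instances * runtime_hours
    ec2 = instance_hours * ec2_price_per_hour
    gateway = instance_hours * GATEWAY_FEE_PER_INSTANCE_HOUR
    return {
        "ec2": ec2,
        "gateway": gateway,
        "license": license_cost,
        "total": ec2 + gateway + license_cost,
    }


# Placeholder inputs, not measured values from this post:
# 4 instances at a hypothetical $5.00/hour for 2 hours, plus $100 of licensing.
print(job_cost(4, 5.00, 2.0, 100.0))
# {'ec2': 40.0, 'gateway': 2.0, 'license': 100.0, 'total': 142.0}
```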
These cost representations can guide you to select the right HPC configuration to meet any one of these three simulation goals:
- minimize job cost
- maximize performance
- achieve the best performance/cost ratio
Note that for this post we used Ansys Elastic Licensing to account for licensing costs, which might not be fully representative if you're using leased or perpetual licenses. We also haven't accounted for storage and networking charges in our calculations, because compute makes up most (or nearly all) of the infrastructure cost for these simulations. We used Amazon EC2 On-Demand prices from the us-east-2 (Ohio) Region. You can take advantage of flexible Amazon EC2 pricing options like Compute Savings Plans or Reserved Instances (RI), which can provide significant discounts (up to 72%) compared to On-Demand pricing.
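As a rough illustration of why pricing discounts matter less when licensing dominates, the sketch below applies a discount to the EC2 component only. It assumes, which you should verify for your account, that Savings Plans and Reserved Instances discount EC2 usage but not the Ansys Gateway charge or Ansys licensing. The helper name `discounted_total` and the dollar amounts are hypothetical.

```python
def discounted_total(ec2_cost: float, gateway_cost: float, license_cost: float,
                     ec2_discount: float) -> float:
    """Total job cost (USD) when a discount applies to the EC2 component only.

    ec2_discount is a fraction: 0.0 is On-Demand pricing, and 0.72 is the
    "up to 72%" upper bound quoted for Savings Plans and Reserved Instances.
    Storage and networking are excluded, as in this post.
    """
    if not 0.0 <= ec2_discount < 1.0:
        raise ValueError("ec2_discount must be in [0, 1)")
    return ec2_cost * (1.0 - ec2_discount) + gateway_cost + license_cost


# Placeholder inputs: $40 of EC2, $2 of Gateway charges, $100 of licensing.
print(discounted_total(40.0, 2.0, 100.0, 0.0))  # 142.0
print(discounted_total(40.0, 2.0, 100.0, 0.5))  # 122.0
```

With these placeholder numbers, halving the EC2 bill lowers the total job cost by only about 14%.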
To understand Ansys Fluent performance, we plotted the Solver Rating against the number of instance cores for each of our three test cases, shown in Figures 2a–2c. A higher Solver Rating signifies better performance. The Vehicle Exhaust model with 33 million cells was scaled to 1536 cores, while the Combustor and F1 Race Car models were each scaled to over 6000 cores.
Because Hpc6a and Hpc7a instances have different numbers of physical cores, the relative performance of the two instance types differs depending on whether you compare them per core or per instance (node). In Figures 3a–3c, we plotted the Solver Rating against the number of instances.
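The core-to-instance mapping explains why the two comparisons differ. Here is a minimal sketch using the core counts stated in this post (96 for hpc6a.48xlarge, 192 for hpc7a.96xlarge), assuming one Fluent process per physical core. At an equal number of instances, Hpc7a runs twice as many cores, so a per-instance comparison favors Hpc7a more strongly than a per-core one. The helper names are hypothetical.

```python
# Physical cores per instance for the two full-size instance types in this post.
CORES_PER_INSTANCE = {
    "hpc6a.48xlarge": 96,
    "hpc7a.96xlarge": 192,
}


def total_cores(instance_type: str, num_instances: int) -> int:
    """Cores (one process per physical core) used by a fully subscribed run."""
    return CORES_PER_INSTANCE[instance_type] * num_instances


def instances_for_cores(instance_type: str, cores: int) -> int:
    """Instances needed to reach a given core count with full subscription."""
    per_instance = CORES_PER_INSTANCE[instance_type]
    if cores % per_instance:
        raise ValueError(f"{cores} is not a multiple of {per_instance}")
    return cores // per_instance


# Same node count: Hpc7a runs twice as many cores as Hpc6a.
for nodes in (8, 16, 32):
    print(nodes, total_cores("hpc6a.48xlarge", nodes),
          total_cores("hpc7a.96xlarge", nodes))
# 8 768 1536
# 16 1536 3072
# 32 3072 6144
```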
For the Vehicle Exhaust model with 33 million cells, the performance improvement with Hpc7a instances was 1.2x at 1536 cores on a per-core basis, and 1.78x at 32 instances on a per-instance basis. For the Combustor benchmark model with 71 million cells, the performance improvement with Hpc7a instances was 1.3x at 3072 cores on a per-core basis, and 1.83x at 32 instances on a per-instance basis.
As the benchmark model size increased, the instance topology and higher EFA network bandwidth of Hpc7a instances helped them achieve better scaling. For the Formula 1 Race Car model with 140 million cells, the per-core performance improvement with Hpc7a instances was 1.3x at 6144 cores, and the per-instance improvement was 2.1x at 32 instances.
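One common way to quantify "better scaling" is strong-scaling parallel efficiency: how much the Solver Rating grows relative to the growth in core count for the same model. This metric is our addition, not one reported in this post, and the ratings in the example are hypothetical.

```python
def speedup(rating_new: float, rating_baseline: float) -> float:
    """Performance ratio; a higher Solver Rating is better, so new / baseline."""
    return rating_new / rating_baseline


def parallel_efficiency(rating_n: float, cores_n: int,
                        rating_ref: float, cores_ref: int) -> float:
    """Strong-scaling efficiency of a run on cores_n relative to cores_ref.

    1.0 means the Solver Rating grew in proportion to the core count;
    values below 1.0 show how much of the added hardware was lost to
    communication and other parallel overheads.
    """
    return (rating_n / rating_ref) / (cores_n / cores_ref)


# Hypothetical ratings, not measured results from this post.
print(speedup(130.0, 100.0))                          # 1.3
print(parallel_efficiency(360.0, 6144, 100.0, 1536))  # 0.9
```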
Full cores vs partial cores
These plots show performance results when running Ansys Fluent on the full set of available physical cores: 192 for hpc7a.96xlarge and 96 for hpc6a.48xlarge. However, you can also manually disable some of the physical cores on each instance, or use process pinning, to achieve better per-core performance, because each active core then gets more memory bandwidth. We call this under-subscribing.
When running simulations on the Hpc6a instance type, which is available in only one size, Ansys Gateway implements under-subscribing through job setup scripts. On Hpc7a, no scripting is required, because the smaller Hpc7a instance sizes already provide fewer cores per instance.
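The sketch below illustrates the two sides of under-subscribing: on Hpc7a, picking the instance size whose core count matches the desired fraction, and on Hpc6a, choosing which cores to keep active. It is a hypothetical illustration, not the Ansys Gateway job setup scripts. The even-spacing strategy and the helper names are our assumptions, the smaller Hpc7a core counts are the published values for those sizes, and you would still need to pass the core list to a pinning mechanism such as your MPI library's binding options or Linux `os.sched_setaffinity`.

```python
# Physical cores per instance size; Hpc6a comes in one size only.
HPC7A_CORES = {
    "hpc7a.96xlarge": 192,
    "hpc7a.48xlarge": 96,
    "hpc7a.24xlarge": 48,
    "hpc7a.12xlarge": 24,
}
HPC6A_CORES = 96  # hpc6a.48xlarge


def hpc7a_size_for_fraction(fraction: float) -> str:
    """Hpc7a size whose core count equals `fraction` of the 192-core size."""
    target = round(HPC7A_CORES["hpc7a.96xlarge"] * fraction)
    for name, cores in HPC7A_CORES.items():
        if cores == target:
            return name
    raise ValueError(f"no Hpc7a size with {target} cores")


def spread_core_ids(total_cores: int, fraction: float) -> list[int]:
    """Evenly spaced core IDs to keep active when under-subscribing.

    Spacing the active cores out, instead of packing them onto the first
    IDs, lets each one share memory channels with fewer busy neighbours.
    Assumes IDs 0..total_cores-1 are physical cores in topology order;
    check this with `lscpu` on the real instance.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    stride = round(1 / fraction)
    if abs(stride * fraction - 1.0) > 1e-9 or total_cores % stride:
        raise ValueError("fraction must be 1/k for a k that divides total_cores")
    return list(range(0, total_cores, stride))


print(hpc7a_size_for_fraction(0.5))   # hpc7a.48xlarge
print(hpc7a_size_for_fraction(0.25))  # hpc7a.24xlarge
ids = spread_core_ids(HPC6A_CORES, 0.25)
print(len(ids), ids[:4])              # 24 [0, 4, 8, 12]
```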
In this section, we take a detailed look at the impact of under-subscribing on Ansys Fluent performance. Figures 4a, 4b, and 4c show the Solver Rating against the number of instance cores with 100%, 50%, and 25% of cores enabled per instance for Hpc6a, and for the corresponding Hpc7a instance sizes.
Figures 4a–4c make it clear that as we under-subscribe an instance (increasing the available memory bandwidth per core), simulation performance improves. For the Vehicle Exhaust benchmark, the performance improvement with 50% and 25% of cores enabled, compared to 100%, was between 1.3x and 1.8x on both Hpc6a and Hpc7a instances.
As the size of the simulation model increased, the performance improvement from under-subscribing became even more pronounced. For the Combustor model, the improvement with 50% and 25% of cores enabled, compared to 100%, was between 1.7x and 2.6x on both Hpc6a and Hpc7a.
Finally, for the F1 Race Car model, the improvement with 50% and 25% of cores enabled, compared to 100%, was between 1.8x and 2.9x on both Hpc6a and Hpc7a instances.
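Per-core speedup is only half of the picture, because keeping the same number of active cores at 50% or 25% subscription takes two or four times as many instances. The sketch below is a simple model of our own. It assumes the instance price does not change with the number of enabled cores and that license cost scales with active core-hours; the function name and the 2.0x speedup in the example are hypothetical.

```python
def undersubscription_tradeoff(fraction: float, speedup: float) -> dict:
    """Resource use of an under-subscribed run relative to a fully subscribed
    run with the same number of active (and licensed) cores.

    fraction: share of cores enabled per instance (e.g. 0.5 or 0.25).
    speedup:  Solver Rating ratio, under-subscribed / fully subscribed.
    """
    if not 0.0 < fraction <= 1.0 or speedup <= 0.0:
        raise ValueError("need 0 < fraction <= 1 and speedup > 0")
    instances = 1.0 / fraction  # same active cores spread over more instances
    runtime = 1.0 / speedup     # higher Solver Rating means shorter runs
    return {
        "instances": instances,
        "runtime": runtime,
        "instance_hours": instances * runtime,  # drives EC2 + Gateway cost
        "core_hours": runtime,                  # same cores, shorter run
    }


# Hypothetical: half the cores enabled, 2.0x faster at the same core count.
print(undersubscription_tradeoff(0.5, 2.0))
# {'instances': 2.0, 'runtime': 0.5, 'instance_hours': 1.0, 'core_hours': 0.5}
```

In this model, under-subscribing lowers instance-hours only when the speedup exceeds 1/fraction, but it lowers licensed core-hours whenever the speedup exceeds 1, which is consistent with the cost trends discussed next.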
Now let’s look at the cost impacts (based on cloud infrastructure and licensing) for running these benchmarks with our various instance configurations.
Figures 5a and 5b show how the ratio of performance to cloud infrastructure cost varies across a range of enabled cores per instance for Hpc6a and Hpc7a, for the smallest and the largest benchmark cases.
The cloud infrastructure costs plotted here represent the total hardware cost based on Amazon EC2 pricing plus the flat Ansys Gateway charge of $0.25 per instance per hour. This is a good metric to follow when you want to evaluate instances for the best performance while considering only cloud infrastructure costs.
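The post does not give the exact formula behind Figures 5a and 5b, so one reasonable way to express performance per infrastructure cost is runs per dollar: the inverse of what a single run costs in EC2 and Gateway charges. Here is a minimal sketch, with hypothetical helper names and a placeholder EC2 price.

```python
GATEWAY_FEE_PER_INSTANCE_HOUR = 0.25  # USD, from this post


def infra_cost_per_run(solver_rating: float, num_instances: int,
                       ec2_price_per_hour: float) -> float:
    """EC2 plus Ansys Gateway cost (USD) of one benchmark run."""
    runtime_hours = 24.0 / solver_rating  # Solver Rating = runs per 24 hours
    hourly = num_instances * (ec2_price_per_hour + GATEWAY_FEE_PER_INSTANCE_HOUR)
    return hourly * runtime_hours


def perf_per_infra_dollar(solver_rating: float, num_instances: int,
                          ec2_price_per_hour: float) -> float:
    """Runs per infrastructure dollar: higher is better."""
    return 1.0 / infra_cost_per_run(solver_rating, num_instances, ec2_price_per_hour)


# Placeholder: 4 instances at a hypothetical $5.75/hour, Solver Rating 48.
print(infra_cost_per_run(48.0, 4, 5.75))  # 12.0
```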
Figures 5a and 5b show that simulations run on fully subscribed Hpc6a and Hpc7a instances have the best performance per hardware-configuration cost. At higher core counts, cloud infrastructure costs start to increase, resulting in a drop in the performance-to-cost ratio.
In Figure 6, we plotted the performance to total-job-cost ratio, where the total job cost is the sum of the cloud infrastructure and Ansys licensing costs. Because Ansys licensing costs vary for each customer depending on their licensing agreement, we used Ansys Elastic Currency (AEC) to represent the licensing cost in this post, to keep things simple.
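Here is a minimal sketch of the total-cost view, extending the per-run cost with an hourly licensing charge. Actual AEC consumption depends on your agreement, so `license_cost_per_hour` is a hypothetical placeholder, as are the ratings and prices. The two configurations use the same licensed core count; the under-subscribed one needs twice as many instances but runs 1.6x faster in this made-up example.

```python
def total_cost_per_run(solver_rating: float, infra_cost_per_hour: float,
                       license_cost_per_hour: float) -> float:
    """Cloud infrastructure plus licensing cost (USD) of one benchmark run."""
    runtime_hours = 24.0 / solver_rating  # Solver Rating = runs per 24 hours
    return runtime_hours * (infra_cost_per_hour + license_cost_per_hour)


def perf_to_total_cost(solver_rating: float, infra_cost_per_hour: float,
                       license_cost_per_hour: float) -> float:
    """Runs per total dollar: higher is better."""
    return 1.0 / total_cost_per_run(solver_rating, infra_cost_per_hour,
                                    license_cost_per_hour)


# Hypothetical configurations with the same licensed core count, so the
# hourly license charge is identical; only the instance count differs.
full = total_cost_per_run(30.0, infra_cost_per_hour=48.0, license_cost_per_hour=200.0)
half = total_cost_per_run(48.0, infra_cost_per_hour=96.0, license_cost_per_hour=200.0)
print(round(full, 2), round(half, 2))  # 198.4 148.0
```

Here the fully subscribed run is cheaper on infrastructure alone ($38.40 versus $48.00 per run), yet the under-subscribed run wins on total cost because the license is billed for a shorter time.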
Figure 6 shows that simulations run on Hpc6a instances with 25% of cores enabled, and on hpc7a.24xlarge instances, have the best performance to total-cost ratio. Running simulations on Hpc6a with 50% of cores enabled, or on Amazon EC2 hpc7a.48xlarge, is a good option for customers with pre-existing Ansys licenses who want to balance performance and cloud infrastructure costs. All instance types provide better value at higher core counts, because per-core license costs decrease as you use more cores.
In our plots, we focused on performance and cost comparisons, but customers are also interested in making the right Amazon EC2 choices based on simulation runtime and costs. In Figures 7 and 8, we plotted the variation of cloud infrastructure and total costs with simulation runtime. By referring to these plots, you can choose the right instance for your preferred simulation runtime.
Figure 7 shows that when you prioritize cloud infrastructure cost for a desired simulation runtime, fully subscribed instances offer the lowest cost, because they use all the available cores on each instance.
When we add software licensing costs, which can be high for larger simulations, the performance gain from under-subscribed instances shortens the runtime enough to drive the total cost down. Figure 8 shows that under-subscribed instances offer the best total cost for a given simulation runtime.
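Choosing an instance for a target runtime amounts to filtering out the configurations that are too slow and then taking the cheapest remaining one by the cost you care about. Here is a minimal sketch with a hypothetical `RunOption` record and made-up numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RunOption:
    name: str
    runtime_hours: float
    infra_cost: float  # EC2 + Ansys Gateway, USD per run
    total_cost: float  # infra_cost + licensing, USD per run


def cheapest_within(options: Iterable[RunOption], max_runtime_hours: float,
                    cost_field: str = "total_cost") -> Optional[RunOption]:
    """Cheapest option that finishes within the runtime budget, or None."""
    if cost_field not in ("infra_cost", "total_cost"):
        raise ValueError("cost_field must be 'infra_cost' or 'total_cost'")
    feasible = [o for o in options if o.runtime_hours <= max_runtime_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda o: getattr(o, cost_field))


# Hypothetical options, not results from this post.
options = [
    RunOption("fully subscribed", 0.8, 38.4, 198.4),
    RunOption("50% subscribed", 0.5, 48.0, 148.0),
]
print(cheapest_within(options, 1.0, "infra_cost").name)  # fully subscribed
print(cheapest_within(options, 1.0, "total_cost").name)  # 50% subscribed
print(cheapest_within(options, 0.6, "infra_cost").name)  # 50% subscribed
```

Switching `cost_field` between `infra_cost` and `total_cost` mirrors the difference between the infrastructure-only and total-cost views.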
Today we described the price-performance characteristics of Ansys Fluent CFD simulations on Ansys Gateway powered by AWS. We shared some of our best practices for running simulations on different Amazon EC2 compute-optimized instances, and we described how the total job cost for Ansys Fluent simulations comprises infrastructure costs and Ansys license costs. With these results, you should be able to select the right instance types for running your simulations, whether your goal is to maximize performance or minimize costs.
You can get started with Ansys Gateway powered by AWS by subscribing through the AWS Marketplace. Follow the Ansys Gateway YouTube channel for in-depth, step-by-step tutorials and video guides on setting up and running Ansys applications. Finally, you can refer to the Ansys Innovation Space learning forums for Ansys Gateway-specific help.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.