Guidance for Computational Fluid Dynamics for Aircraft Design on AWS

This Guidance demonstrates how to design computational fluid dynamics (CFD) workloads on AWS. Large-scale aircraft simulations often require a large amount of compute for a short period. This Guidance moves CFD workloads to the cloud, where you can spin up thousands of compute cores and terminate them once a workload is complete. You get compute capacity on demand without the expense and delay of procuring servers. In the architecture, AWS ParallelCluster takes care of the undifferentiated heavy lifting involved in setting up an HPC cluster (including setting up Slurm): it configures autoscaling, mounts file systems, and tracks costs through tags.

Architecture Diagram

Download the architecture diagram PDF

Guidance Architecture Diagram for Computational Fluid Dynamics for Aircraft Design on AWS

Step 1
AWS ParallelCluster deploys the HeadNode and the compute queues (visualization, compute, and high memory).

Step 2
Connect to the HeadNode through Secure Shell (SSH) or NICE DCV. The HeadNode runs the Slurm scheduler and gives you an environment from which to submit jobs.

Step 3
When a job is submitted, Slurm scales up the appropriate number of instances in a queue with the given CPU and Memory.
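As a rough illustration of the scaling arithmetic in Step 3, the sketch below estimates how many instances a job needs from its requested tasks and memory per task. The function name and the per-instance figures in the example are assumptions for illustration. Slurm's actual placement depends on how the queues are configured.

```python
import math


def nodes_required(tasks: int, mem_per_task_gib: float,
                   cores_per_node: int, mem_per_node_gib: float) -> int:
    """Estimate the node count that satisfies both core and memory requests."""
    if tasks <= 0 or mem_per_task_gib <= 0:
        raise ValueError("tasks and mem_per_task_gib must be positive")
    # A node can host only as many tasks as both its cores and its memory allow.
    tasks_per_node = min(cores_per_node,
                         math.floor(mem_per_node_gib / mem_per_task_gib))
    if tasks_per_node < 1:
        raise ValueError("a single task does not fit on one node")
    return math.ceil(tasks / tasks_per_node)


# Example: 1,000 MPI ranks at 2 GiB each on 96-core, 384 GiB nodes.
print(nodes_required(1000, 2.0, 96, 384.0))
```

In this example the core count is the limiting factor. Raising the memory per task to 8 GiB would make memory the limit instead, halving the tasks that fit on each node.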

Step 4
You can define a queue for visualization. This is useful for model preparation and meshing. Meshing is the process of dividing the geometry into a 2D or 3D grid on which the simulation is solved. Typically, these instances have graphics processing units (GPUs) attached, such as the G4dn family.

Step 5
To run the solve portion of computational fluid dynamics (CFD), we recommend using an Elastic Fabric Adapter (EFA) enabled instance such as the hpc6a.48xlarge. This queue is set up with EFA enabled and is in a placement group, ensuring low-latency inter-node communication.
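A solve queue of this kind might look like the following sketch. It builds the queue as a Python dictionary and prints it as JSON, which YAML parsers also accept. The keys follow the AWS ParallelCluster version 3 configuration schema as we understand it. The subnet ID, resource name, and MaxCount are placeholders, so check them against the schema for your ParallelCluster version.

```python
import json

# Hypothetical solve queue: EFA enabled, placement group on, On-Demand capacity.
compute_queue = {
    "Name": "compute",
    "CapacityType": "ONDEMAND",
    "Networking": {
        "SubnetIds": ["subnet-EXAMPLE"],       # placeholder subnet ID
        "PlacementGroup": {"Enabled": True},   # keeps nodes physically close
    },
    "ComputeResources": [
        {
            "Name": "hpc6a",                   # hypothetical resource name
            "InstanceType": "hpc6a.48xlarge",
            "MinCount": 0,                     # scale to zero when idle
            "MaxCount": 16,                    # hypothetical upper bound
            "Efa": {"Enabled": True},
        }
    ],
}

scheduling = {"Scheduling": {"Scheduler": "slurm", "SlurmQueues": [compute_queue]}}
print(json.dumps(scheduling, indent=2))
```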

Step 6
For some CFD models, higher-memory instances are required, such as the hpc6id.32xlarge, which has 64 cores and 1024 GB of memory. These jobs can run on a high-memory queue.
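One way to decide between the compute and high-memory queues is to compare the memory available per core. The sketch below is illustrative only. The queue names are hypothetical, and the hpc6a.48xlarge figures (96 cores, 384 GiB) come from published instance specifications rather than from this Guidance, so verify them before relying on the result.

```python
# Hypothetical queue table; core and memory figures should be verified.
QUEUES = {
    "compute": {"instance": "hpc6a.48xlarge", "cores": 96, "mem_gib": 384},
    "highmem": {"instance": "hpc6id.32xlarge", "cores": 64, "mem_gib": 1024},
}


def mem_per_core(queue: dict) -> float:
    """Memory available to each core, in GiB."""
    return queue["mem_gib"] / queue["cores"]


def pick_queue(required_gib_per_core: float) -> str:
    """Return the queue with the least memory per core that still fits the job."""
    candidates = [name for name, q in QUEUES.items()
                  if mem_per_core(q) >= required_gib_per_core]
    if not candidates:
        raise ValueError("no queue offers enough memory per core")
    return min(candidates, key=lambda name: mem_per_core(QUEUES[name]))


print(pick_queue(3.0))   # fits within 4 GiB per core
print(pick_queue(10.0))  # needs the 16 GiB per core queue
```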

Step 7
Amazon FSx for Lustre provides a shared parallel file system (/shared) that allows instances to work in parallel. We recommend FSx for Lustre as the scratch file system. Amazon Elastic File System (Amazon EFS) is used to store CFD applications (/apps) such as Siemens Simcenter STAR-CCM+, Ansys Fluent, or OpenFOAM.
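The two mounts described in Step 7 could be declared roughly as follows. This is a sketch that assumes the ParallelCluster version 3 SharedStorage schema. The storage names, capacity, and throughput values are hypothetical choices, not recommendations from this Guidance.

```python
import json

# Hypothetical SharedStorage section: FSx for Lustre at /shared, Amazon EFS at /apps.
shared_storage = [
    {
        "Name": "fsx-shared",                  # hypothetical name
        "StorageType": "FsxLustre",
        "MountDir": "/shared",
        "FsxLustreSettings": {
            "StorageCapacity": 1200,           # GiB, hypothetical size
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": 250,   # MB/s per TiB, hypothetical
        },
    },
    {
        "Name": "efs-apps",                    # hypothetical name
        "StorageType": "Efs",
        "MountDir": "/apps",
        "EfsSettings": {"Encrypted": True},
    },
]

print(json.dumps({"SharedStorage": shared_storage}, indent=2))
```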

Step 8
Long term storage of the model results is accomplished using Amazon Simple Storage Service (Amazon S3). The results can be called back at any point for future analysis. Data is synced automatically between FSx for Lustre and Amazon S3 through a Data Repository Association (DRA).
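To make the link between the file system and the bucket concrete, the sketch below assembles the parameters for a Data Repository Association with automatic import and export. The helper name, file system ID, and bucket name are hypothetical. The parameter names are those of the Amazon FSx CreateDataRepositoryAssociation API as we understand it, and the code only builds the request without calling AWS.

```python
def build_dra_request(file_system_id: str, bucket: str,
                      prefix: str = "", fs_path: str = "/") -> dict:
    """Assemble request parameters for a Data Repository Association."""
    events = ["NEW", "CHANGED", "DELETED"]
    repository_path = f"s3://{bucket}/{prefix}".rstrip("/")
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": fs_path,               # directory on the Lustre file system
        "DataRepositoryPath": repository_path,   # bucket or prefix to link
        "BatchImportMetaDataOnCreate": True,     # load existing object metadata
        "S3": {
            "AutoImportPolicy": {"Events": list(events)},  # S3 -> file system
            "AutoExportPolicy": {"Events": list(events)},  # file system -> S3
        },
    }


# Hypothetical identifiers, for illustration only.
request = build_dra_request("fs-EXAMPLE", "example-cfd-results", "runs")
print(request["DataRepositoryPath"])
```

With boto3 installed and credentials configured, the dictionary could be passed as keyword arguments to the FSx client's create_data_repository_association call. That step is omitted here so the sketch runs without AWS access.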

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

This Guidance is designed to help you respond to incidents and deploy your own changes. The high performance computing (HPC) clusters can be deployed in another Availability Zone (AZ) or Region with minimal changes, provided that the data is stored in Amazon S3 and the cluster's configuration is kept in the YAML file as Infrastructure as Code (IaC).

Changes to the infrastructure of this Guidance should be made in the HPC cluster’s YAML configuration file. This file should be version controlled, and changes should be reviewed before deployment. This process can be turned into a continuous integration and continuous deployment (CI/CD) pipeline.
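A CI/CD pipeline could include a small pre-deployment check of the configuration. The sketch below assumes the YAML file has already been parsed into a dictionary (the parsing step is not shown). It flags queues that are not On-Demand and FSx for Lustre file systems that are not Persistent_2, in line with the recommendations in this Guidance. The function name and the rules are illustrative assumptions, not part of AWS ParallelCluster.

```python
from typing import List


def check_cluster_config(config: dict) -> List[str]:
    """Return a list of findings; an empty list means the checks passed."""
    findings = []
    for queue in config.get("Scheduling", {}).get("SlurmQueues", []):
        # ParallelCluster treats a missing CapacityType as ONDEMAND.
        if queue.get("CapacityType", "ONDEMAND") != "ONDEMAND":
            findings.append(f"queue {queue.get('Name')}: CapacityType is not ONDEMAND")
    for storage in config.get("SharedStorage", []):
        if storage.get("StorageType") != "FsxLustre":
            continue
        deployment = storage.get("FsxLustreSettings", {}).get("DeploymentType")
        if deployment != "PERSISTENT_2":
            findings.append(f"storage {storage.get('Name')}: DeploymentType is not PERSISTENT_2")
    return findings


# Hypothetical parsed configuration with one violation of each rule.
example = {
    "Scheduling": {"SlurmQueues": [{"Name": "compute", "CapacityType": "SPOT"}]},
    "SharedStorage": [{"Name": "fsx", "StorageType": "FsxLustre",
                       "FsxLustreSettings": {"DeploymentType": "SCRATCH_2"}}],
}
for finding in check_cluster_config(example):
    print(finding)
```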

Read the Operational Excellence whitepaper

Security

CFD simulations are often subject to regulations and compliance programs such as the International Traffic in Arms Regulations (ITAR) and FedRAMP. Therefore, we recommend running this Guidance in AWS GovCloud (US).

By default, AWS ParallelCluster allows connections on port 22 from 0.0.0.0/0 to the HeadNode. We recommend disabling all outside access and connecting through AWS Systems Manager (SSM) Session Manager.
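The sketch below shows one way to detect the open default and what a tightened HeadNode section might look like. It assumes the ParallelCluster version 3 keys Ssh.AllowedIps and Iam.AdditionalIamPolicies. The CIDR range is a hypothetical private network, and the policy ARN uses the standard partition, which differs in AWS GovCloud (US).

```python
import ipaddress


def ssh_open_to_world(head_node: dict) -> bool:
    """True if the HeadNode SSH rule admits every IPv4 address."""
    # An omitted AllowedIps falls back to the open default described above.
    allowed = head_node.get("Ssh", {}).get("AllowedIps", "0.0.0.0/0")
    if allowed.startswith("pl-"):   # a prefix list ID, not a CIDR range
        return False
    return ipaddress.ip_network(allowed, strict=False).prefixlen == 0


# Hypothetical hardened HeadNode: SSH limited to a private range, SSM enabled.
hardened_head_node = {
    "Ssh": {"AllowedIps": "10.0.0.0/16"},   # hypothetical VPC CIDR
    "Iam": {
        "AdditionalIamPolicies": [
            # In AWS GovCloud (US) the ARN partition is aws-us-gov.
            {"Policy": "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"}
        ]
    },
}

print(ssh_open_to_world({}))                  # default configuration
print(ssh_open_to_world(hardened_head_node))  # restricted configuration
```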

We also recommend following the recommendations found in DoD-Compliant Implementations on AWS. It gives details about deploying Impact Level 4 (IL4), IL5, ITAR, and export-controlled workloads on AWS GovCloud (US), and IL6 workloads on AWS Secret Regions.

File system data is protected using standard Amazon Elastic Block Store (Amazon EBS) encryption. In addition, FSx for Lustre is encrypted at rest and in transit. Locking down access from the file system to just the HPC cluster prevents inadvertent data access.

Files are accessed through standard Portable Operating System Interface for Unix (POSIX) file permissions. Access can be granted and revoked through user and group permissions.
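As a small illustration of granting and revoking group access with POSIX permissions, the sketch below toggles the group read and execute bits on a directory. It uses a temporary directory as a stand-in for a case directory on the shared file system, and the helper name is hypothetical.

```python
import os
import stat
import tempfile


def set_group_access(path: str, allow: bool) -> int:
    """Grant or revoke group read/execute on a path and return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    group_bits = stat.S_IRGRP | stat.S_IXGRP
    new_mode = (mode | group_bits) if allow else (mode & ~group_bits)
    os.chmod(path, new_mode)
    return new_mode


# A temporary directory stands in for a case directory under /shared.
with tempfile.TemporaryDirectory() as case_dir:
    os.chmod(case_dir, 0o700)                    # owner-only to start
    granted = set_group_access(case_dir, True)   # group may now list and enter
    revoked = set_group_access(case_dir, False)  # back to owner-only
    print(oct(granted), oct(revoked))
```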

Read the Security whitepaper

Reliability

A few key features provide high availability. These include selecting On-Demand Instances for compute nodes and selecting a file system such as FSx for Lustre for fault tolerance and to prevent job failures. FSx for Lustre provides two file system deployment options: scratch and persistent. It also supports two persistent deployment types, Persistent_1 and Persistent_2.

By defining clusters as IaC and making backups through Amazon S3, the workload can easily be spun up in another Availability Zone or Region should a fault occur. We also recommend using On-Demand Instances for tightly coupled CFD jobs and using FSx for Lustre Persistent_2 instead of scratch. Together, these choices prevent avoidable outages and allow for replication in the case of unavoidable outages.

Tightly coupled CFD simulations are inherently sensitive to instance failures and networking blips. By using the On-Demand pricing model and FSx for Lustre Persistent_2, you can reduce unexpected interruptions.

AWS ParallelCluster keeps logs in Amazon CloudWatch by default. It also monitors for idle instances and automatically shuts them down. It defines clusters as IaC, which allows them to be replicated in another Availability Zone. Configure alarms around parameters such as network utilization, disk usage, and spend. These alarms can then alert you to abnormal behavior so you can address it before an outage occurs.

Data in FSx for Lustre is backed up to Amazon S3 using a Data Repository Association (DRA). Should the file system fail, another one can be set up and reference the same Amazon S3 bucket.

Read the Reliability whitepaper

Performance Efficiency

Using FSx for Lustre with HPC instances and an Elastic Fabric Adapter (EFA) provides the optimal performance for tightly coupled CFD simulations.

The compute nodes should be located in the same Availability Zone as the FSx for Lustre file system and HeadNode. This reduces latency between the file system and instances and reduces cost by reducing inter-Availability Zone data transfer charges.

Read the Performance Efficiency whitepaper

Cost Optimization

For CFD, savings are realized in the engineering effort, such as research and development (R&D) costs and the cost of materials incurred in running physical tests.

Running simulations instead of physical tests yields additional savings. A CFD simulation in the cloud typically costs less than a physical test case once all the engineering time and material supplies are considered.

Data is kept in the cloud by using the remote desktop visualization software NICE DCV. This not only reduces data transfer cost, it also shortens the time between meshing and solve because the data stays on the same file system.

Tightly coupled simulations like CFD are especially sensitive to instances being terminated, such as by a Spot Instance reclamation. For this reason, we recommend using On-Demand as the purchasing model. You can reduce cost by using an HPC instance such as hpc6a.48xlarge, which is less expensive than the equivalent non-HPC instance.

AWS ParallelCluster dynamically scales up the compute nodes only when jobs are running. When those jobs complete, the instances scale down. This ensures no idle resources are running. In addition, this allows much larger bursts than an on-premises cluster provides. Typically, R&D fluctuates, with lots of compute used at key intervals such as the end of a project milestone. The cloud allows you to achieve this capacity when it’s needed and not pay for it when it’s not.
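The effect of paying only for bursts can be estimated with simple arithmetic. In the sketch below, the node count, hours, and hourly price are hypothetical placeholders, not AWS prices. Substitute current On-Demand pricing for your Region and instance type.

```python
def on_demand_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Cost of running a number of nodes for a number of hours."""
    return nodes * hours * price_per_node_hour


PRICE = 3.0            # hypothetical price per node-hour, not an AWS price
HOURS_IN_MONTH = 730   # average hours in a month

# A milestone burst: 100 nodes for 40 hours, then scaled down to zero.
burst = on_demand_cost(100, 40, PRICE)
# The same 100 nodes left running all month.
always_on = on_demand_cost(100, HOURS_IN_MONTH, PRICE)

print(f"burst: {burst:.0f}, always on: {always_on:.0f}")
print(f"burst is {burst / always_on:.1%} of the always-on cost")
```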

Read the Cost Optimization whitepaper

Sustainability

Instances are spun up only when needed, then terminated when no longer needed. You can review Slurm accounting logs to view the CPU utilization for the resources you requested. If you aren’t efficiently using the compute resources, you can reduce the amount requested in the next job.
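CPU utilization for a finished job can be derived from the Elapsed, TotalCPU, and AllocCPUS fields that Slurm accounting reports, for example through sacct with --format=JobID,Elapsed,TotalCPU,AllocCPUS. The sketch below computes efficiency as TotalCPU divided by Elapsed times AllocCPUS. The function names and the sample values are assumptions for illustration.

```python
def slurm_time_to_seconds(text: str) -> float:
    """Convert a Slurm duration such as 1-02:03:04 or 01:30.500 to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zero
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed: str, total_cpu: str, alloc_cpus: int) -> float:
    """Fraction of the allocated core time that the job actually used."""
    available = slurm_time_to_seconds(elapsed) * alloc_cpus
    return slurm_time_to_seconds(total_cpu) / available if available else 0.0


# Hypothetical job: 96 cores held for 2 hours, 96 core-hours of CPU time used.
efficiency = cpu_efficiency("02:00:00", "4-00:00:00", 96)
print(f"{efficiency:.0%}")
```

A low value suggests that the job requested more cores than it used, so the next submission can request fewer.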

FSx for Lustre provides a caching layer between Amazon S3 and the HPC cluster, which allows low-latency data access from the compute nodes. When the job is complete, results are stored in Amazon S3, which can be configured for Intelligent-Tiering, allowing for inexpensive storage of cold data.

By matching requested cores and memory, Slurm efficiently scales up only the required number of instances and can pack multiple jobs onto the same instance.

Read the Sustainability whitepaper

Implementation Resources

A detailed guide is provided for you to experiment with and use within your AWS account. It walks through each stage of building the Guidance, including deployment, usage, and cleanup.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open implementation guide

Open sample code on GitHub

Related Content

Computational Fluid Dynamics on AWS

Joby Aviation Uses AWS to Revolutionize Transportation

Simcenter STAR-CCM+ price-performance on AWS
