AWS HPC Blog

Accelerating CFD development from years to weeks with agentic AI and AWS

The computational fluid dynamics (CFD) landscape is changing rapidly. Experienced engineers who once spent months or years on painstaking manual coding, debugging, and iteration might now accomplish the same complex simulations in weeks through the strategic application of agentic AI platforms. This transformation isn’t about replacing expertise; it’s about amplifying the capabilities of skilled practitioners and freeing them from tedious syntax management to focus on what they do best: physics, innovation, and engineering judgment.

For business leaders, agentic AI enhanced physics simulation represents a fundamental change in how their expert technical teams can deliver value. Engineering projects that leverage your team’s specialized CFD knowledge might now be accomplished far more rapidly while maintaining the same scientific rigor. The key lies in understanding how your experienced engineers can effectively leverage these AI-powered tools as force multipliers while avoiding the pitfalls that can lead to non-physical or unrealistic results, something that still requires deep domain expertise to recognize and prevent.

The traditional CFD development challenge

Computational fluid dynamics has long been the domain of specialists, but even experienced CFD engineers face significant challenges when setting up complex simulations in CFD tools such as OpenFOAM, one of the most powerful open-source CFD platforms. The traditional workflow involves multiple intricate steps, each presenting opportunities for time-consuming errors:

  • Mesh generation: Creating high-quality computational grids that accurately represent complex geometries, a process requiring both geometric intuition and numerical expertise
  • Physics configuration: Properly defining boundary conditions, initial conditions, and selecting appropriate partial differential equations
  • Solver setup: Configuring numerical schemes, convergence criteria, and solution algorithms while navigating version-specific syntax variations
  • Post-processing: Extracting meaningful results and validating physical accuracy across multiple solution fields

Seasoned CFD practitioners know that each of these steps harbors potential pitfalls. A single misplaced boundary condition, an incompatible solver-physics combination, or a subtle syntax error can invalidate weeks of computational work. The expertise required isn’t just in understanding physics, it’s in navigating the intricate details of your physics solver’s implementation, staying current with syntax changes across versions, and managing the countless small decisions that can derail a simulation.

This is where the pain points become clear: experts spend disproportionate time on implementation details rather than physics and engineering judgment. The question isn’t whether these engineers have the knowledge, it’s whether they should be spending their valuable time debugging capitalization errors or tracking down version-specific syntax changes.

Agentic AI: a new paradigm

Agentic AI platforms like Amazon Q Developer represent a fundamental shift from traditional coding assistance towards autonomous problem-solving. Unlike simple code completion tools, Amazon Q Developer can understand context, reason through complex problems, and execute multi-step code correction and suggestion workflows independently.

Agentic coding assistants don’t just write code; they can act as knowledgeable partners that help you navigate the intricacies of physics solver syntax, debug configuration issues, and even suggest physically appropriate modeling approaches.

Real-world application: data center thermal management

Let’s take a real-world application as an example use case. Specifically, we will model the thermal dynamics of data center cooling systems, which is a complex multi-physics problem involving:

  • forced convection from HVAC systems
  • natural buoyancy effects from heat-generating equipment
  • conjugate heat transfer through solid components
  • porous media modeling for heat exchanger simplifications
  • heat generation from CPUs operating at 250W each

This scenario illustrates the complexity that agentic AI can help manage. We will focus on showing how to leverage agentic AI for the mesh generation step. The same principles, however, can be applied to use agentic AI for the physics configuration, solver setup, and post-processing steps.

Mesh generation with API-driven tools

API-driven 3D meshers available today work exceptionally well with agentic AI systems that can access the API directly. The combination enables procedural generation through JSON input configurations, automated quality assessment and refinement, and complex geometry handling through natural language descriptions.

Our approach uses an LLM to write Python code with the gmsh meshing API for procedural mesh generation, which enables visual validation and coupling to gradient-free optimizers. This method improves meshing correctness while providing an efficient script for both mesh creation and eventual integration with optimization workflows.

Step-by-step example

The following is an example session where we prompt the agent, receive a response, and refine as needed to get a result.

Initial prompt: “Create a python script that uses the gmsh API to generate meshes of a datacenter. The specifications of the datacenter should be in a json input file. We need to be able to define the total control volume dimensions, number of racks, and dimensions of the racks. Assume racks will be placed uniformly in the provided volume dimensions.”

AI response: The system generated a complete Python script with JSON configuration structure, including functions for datacenter volume creation, uniform rack placement algorithms, and gmsh mesh generation with appropriate boundary tagging.
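
A minimal sketch of what such a script might look like is shown below. The JSON keys (domain_size, rack_count, rack_dims, mesh_size) and function names are illustrative assumptions rather than the agent’s exact output, and the boundary tagging and refinement steps are omitted for brevity.

```python
import json
import gmsh

# Illustrative JSON configuration; in practice this lives in its own input file
CONFIG_JSON = """
{
  "domain_size": [20.0, 10.0, 4.0],
  "rack_count": 4,
  "rack_dims": [0.6, 1.2, 2.0],
  "mesh_size": 0.25
}
"""

def build_datacenter_mesh(cfg, out_file="datacenter.msh"):
    gmsh.initialize()
    gmsh.model.add("datacenter")

    dx, dy, dz = cfg["domain_size"]
    rx, ry, rz = cfg["rack_dims"]
    n = cfg["rack_count"]

    # Control volume for the room
    room = gmsh.model.occ.addBox(0, 0, 0, dx, dy, dz)

    # Place racks uniformly along x, centered in y
    pitch = dx / (n + 1)
    racks = [
        (3, gmsh.model.occ.addBox((i + 1) * pitch - rx / 2, dy / 2 - ry / 2, 0, rx, ry, rz))
        for i in range(n)
    ]

    # Subtract the rack solids so the fluid region excludes them
    gmsh.model.occ.cut([(3, room)], racks)
    gmsh.model.occ.synchronize()

    # Generate a tetrahedral mesh (boundary tagging omitted for brevity)
    gmsh.option.setNumber("Mesh.CharacteristicLengthMax", cfg["mesh_size"])
    gmsh.model.mesh.generate(3)
    gmsh.write(out_file)
    gmsh.finalize()

if __name__ == "__main__":
    build_datacenter_mesh(json.loads(CONFIG_JSON))
```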

Follow-up prompt: “Refine the mesh density around rack edges and add inlet/outlet boundary conditions for CFD analysis.”

Final result: The refined script produced a high-quality structured mesh with proper geometric representation (Figure 1), ready for direct use in OpenFOAM without additional cleanup.

Figure 1. Mesh of cuboids generated by the python code written by the LLM with json inputs for control.

The AI can create high-quality meshes with minimal human intervention for clearly described geometry. For our data center applications involving “cuboids in boxes,” we rarely needed to examine the underlying Python meshing code. When written descriptions proved insufficient, ASCII diagrams or tabular representations effectively communicated desired patterns to the LLM. For example, to specify CPU locations within server racks consisting of X vertically stacked trays (each with a 6×3 layout), we provided a graphical 6×3 table (Table 1) in the prompt showing coordinate positions where CPUs should be located within each tray (results shown in Figure 2).

Table 1. A CPU orientation and location map made in a spreadsheet that was copied into an LLM prompt.

                 Y dir
           1     2     3     4     5     6
X dir  1   1,1                           1,6
       2               2,3   2,4
       3         3,2               3,5

Context: Python procedural generation code we previously generated

Prompt: “Add CPUs in the following locations of the server tray. Use the following ASCII map to understand placement and assume equal distance on the tray. The CPUs should have cold plates on them with piping that attaches to the rear door heat exchangers.

                 Y dir
           1     2     3     4     5     6
X dir  1   1,1                           1,6
       2               2,3   2,4
       3         3,2               3,5”

Figure 2. The CPU pattern that was created after using ASCII layouts. The LLM was unable to understand the pattern via only natural language.
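
To illustrate how such a layout translates into geometry, here is a minimal sketch of the kind of helper the generated meshing code might contain. The (row, column) entries follow Table 1; the function name and tray dimensions are illustrative assumptions.

```python
# CPU positions from Table 1 as (row, column) indices on a 3 x 6 tray grid
CPU_CELLS = [(1, 1), (1, 6), (2, 3), (2, 4), (3, 2), (3, 5)]

def cpu_positions(tray_depth, tray_width, rows=3, cols=6):
    """Map (row, column) grid indices to evenly spaced (x, y) coordinates on a tray."""
    positions = []
    for r, c in CPU_CELLS:
        # Cell centers, assuming equal spacing across the tray (as stated in the prompt)
        x = (r - 0.5) * tray_depth / rows   # X dir follows the table rows
        y = (c - 0.5) * tray_width / cols   # Y dir follows the table columns
        positions.append((x, y))
    return positions

# Example: a 0.45 m deep by 0.75 m wide tray
print(cpu_positions(tray_depth=0.45, tray_width=0.75))
```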

Finally, we arrive at a full rack shown in Figure 3. What traditionally requires 8-12 hours of CAD modeling and manual meshing was completed in 1 hour of iterating with prompts and code, representing an 8-12x efficiency gain while maintaining professional mesh quality.

Figure 3. Mesh generated by the python code written by the LLM with json inputs for control.

In Figure 4 we put it all together and run a full datacenter CFD simulation, where we can see the heat transfer, flow velocities, vorticity, and other quantities for the domain we are analyzing. We also used agentic AI to accelerate the physics configuration, solver setup, and post-processing steps, which we will cover in future blog posts. In the meantime, we would like to share some of our learnings and best practices in the next section.

Figure 4. 3D visualization of data center rack layout with thermal contours showing temperature distribution and airflow patterns. All inputs and mesh were created via LLM interactions.

Best practices for agentic CFD development

Through our use of agentic AI for engineering simulations, including computational fluid dynamics, structural analysis, thermal analysis, and electromagnetics, we have learned several best practices, which we’ll share below.

1. Context engineering over prompt engineering

Traditional AI interactions focus heavily on crafting the perfect prompt. With agentic systems, context engineering becomes far more critical. The AI needs access to relevant documentation, source code, and examples to make informed decisions.

Critical strategy: Clone the entire OpenFOAM repository and make it available to the AI. This acts like an MCP (Model Context Protocol) lookup, enabling the AI to reference current syntax, examine source code for understanding feature interactions, and learn from official tutorials. We can then use natural language prompts to apply additional technical specifications such as adding heat sources to CPUs. The thermal engineer still needs to guide the agentic AI appropriately and verify the output. The AI doesn’t replace thermal engineering expertise, but allows engineers to focus on thermal analysis rather than coding syntax.

Context: OpenFOAM source code repository; fvModel OpenFOAM input file

Prompt: “I would like to add a heat source in the CPU volume. 1) Use the source code to understand how to enter energy in the form of W and not W/m^3. 2) I would like the heat insertion to begin at 0.1sec of the simulation and be at full power at 0.2sec.”

The LLM correctly configured the energy insertion parameters and recommended using ‘explicit’ rather than ‘implicit’ terms for improved realism in this scenario. Previously, this simple question required days of work:

  • Searching unclear documentation for energy input specifications
  • Navigating programmer APIs with limited engineering context
  • Examining source code across multiple files
  • Deciphering C++ syntax and cross-file dependencies
  • Figuring out syntax for transient power insertion

The LLM delivered a complete solution in under 30 seconds by:

  • Locating the exact code implementation
  • Clarifying energy input methodology with source evidence
  • Modifying OpenFOAM input files with correct syntax
  • Providing optimization recommendations

This represents an orders-of-magnitude productivity improvement, allowing thermal engineers to focus on analysis rather than software archaeology. The key insight here is that you should leverage agentic AI with proper context engineering to dramatically accelerate technical workflows while maintaining engineering rigor.

2. Incremental complexity building

Critical strategy: Start with simple geometries and physics, then gradually add complexity. This approach allows the AI to build understanding progressively and reduces the likelihood of introducing multiple errors simultaneously.

For example, when we initially attempted to describe our complete data center setup in a single prompt (Python mesh-generation code with JSON inputs, 10 racks with 48U geometry, rear door heat exchangers, and 8 CPUs with cold plates and heat pipes connected to the rear door), the LLM essentially ignored most of the complexity. It generated skeleton Python code with basic cuboid geometries, sometimes omitted proper JSON inputs, and made arbitrary choices for input parameters that weren’t logically structured for the intended mesh application.

We adopted an incremental development approach, starting with simplified rack geometry where the racks, CPUs, and cold plates are represented as porous media blocks (Figure 5, top). The figure shows the low-fidelity baseline model and includes the flow streamlines observed when we ran the simulation, illustrating the fluid dynamics behavior. We then added CPU placement and thermal components such as cold plates, and finally integrated heat exchangers (Figure 5, bottom). The figure shows the updated model with all the subcomponent details and includes the flow streamlines from running the simulation. We can see that this detailed model allows us to monitor individual CPU temperatures rather than averaged values across the domain. Each step allows the AI to maintain context and build upon verified foundations.

Figure 5. Progressive model refinement showing evolution from simple box geometry to complex multi-component data center rack. The top figure shows a simple block modelled as porous media. The bottom figure shows the CPUs and cold plates that transfer energy into the rear door heat exchanger.
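
To make the incremental approach concrete, the JSON configuration can grow in stages. The keys below are hypothetical; they simply show how each prompt adds detail on top of a verified baseline.

```python
# Stage 1: racks represented as simple porous-media blocks (Figure 5, top)
config_v1 = {
    "racks": {"count": 10, "dims": [0.6, 1.2, 2.2], "model": "porous_media"},
}

# Stage 2: add CPU placement and cold plates inside each rack
config_v2 = {
    **config_v1,
    "cpus": {"per_tray": 6, "trays_per_rack": 8, "power_w": 250},
    "cold_plates": {"enabled": True},
}

# Stage 3: integrate rear door heat exchangers (Figure 5, bottom)
config_v3 = {
    **config_v2,
    "rear_door_hx": {"enabled": True, "model": "porous_media"},
}
```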

3. Physical validation protocols

When validating simulation setups, the specificity of your questions to the AI directly impacts the quality and physical accuracy of the results.

Critical strategy: Consistently ask about the physical realism of boundary and initial conditions. If you only ask whether something is “logical,” the AI might optimize for numerical convergence rather than physical accuracy. LLMs will answer the question asked, not necessarily the question you meant to ask. Even with extensive context, an AI might create non-physical models while knowing they’re non-physical simply because you didn’t ask the right validation questions.

Example:

Context: specific OpenFOAM boundary condition files

Prompt:  “Are these boundary conditions physically realistic for a data center environment?”

rather than

Prompt: “Are these boundary conditions logical?”

4. Git-based version control integration

Git is one of the most common version control tools used today, yet it can be complex to set up, and many software developers use it suboptimally.

Critical strategy: Use an agentic AI to help software developers manage Git operations through natural language commands, enabling sophisticated workflow management:

  • Branching strategies for different modeling approaches
  • Commit management for saving working configurations
  • Automated rollback when changes introduce problems
  • Cross-branch feature integration for combining successful elements

Example:

Context: Python OpenFOAM input creation script; Specific OpenFOAM input file for a boundary condition or modelling configuration

Prompt: “It seems we may have made an error or lost code in our revisions. I would like to look at previous state using git. Find when the loop was setting up porous media and fix the BC function. First use git diff then look at previous commits to find the previous working code.”

5. The simplification problem

One of the most frustrating behaviors we’ve encountered is the AI’s propensity to inappropriately simplify when facing convergence or setup issues. When a boundary condition causes convergence problems, the AI often takes the path of least resistance, silently removing or modifying the boundary condition rather than addressing the underlying numerical issues. This behavior stems from the AI’s training to provide “helpful” solutions, but it can silently compromise the physics of your simulation.

While VS Code’s change tracking and Git integration make it easier to spot these modifications after the fact, catching them in real-time requires vigilance. The AI might remove a critical heat flux boundary condition because it’s causing solver instability, when the real solution might be adjusting relaxation factors, changing numerical schemes, or refining the mesh in that region.

Critical strategy:  When encountering convergence issues, explicitly instruct the AI to “diagnose the problem without changing any boundary conditions” or “identify the numerical cause of this convergence issue while maintaining all physics constraints.” This forces the AI to address the root cause rather than taking shortcuts that compromise your simulation’s physical validity.

6. Context management

Agentic frameworks can reset prompt history when context windows overflow, potentially erasing all previous work.

Critical strategy: Inform the LLM to use grep and other CLI tools to extract specific information rather than loading entire large files. This targeted approach maintains context while providing necessary information.
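
An illustrative prompt in this style: “Use grep on the case’s constant/ and system/ dictionaries and report only the lines that reference the porous media zone, rather than reading the entire files.” This keeps the context window small while still surfacing the relevant settings.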

AWS architecture for scalable CFD

Our AWS HPC environment delivers high-performance distributed computing through AWS Parallel Computing Service (PCS) with native EFA integration and Amazon FSx for Lustre storage, optimized for tightly coupled CFD workloads like OpenFOAM simulations. Figure 6 shows a reference architecture that demonstrates how the services connect to each other. Unlike prior HPC deployment architectures you may have seen, here we’ve included Amazon Q Developer in the architecture.

Amazon Q Developer functions as an intelligent diagnostic and optimization layer for HPC workloads, monitoring system metrics during CFD runs to identify bottlenecks, analyzing I/O patterns to recommend optimal Amazon FSx for Lustre configurations, and automatically generating Infrastructure as Code templates (leveraging AWS CDK) with performance-optimized settings.

A critical integration requirement is enabling MCP (Model Context Protocol) connection to AWS documentation to maintain Amazon Q Developer’s access to the latest HPC best practices and performance tuning guidelines.

Context: AWS documentation MCP server, which is searchable by the LLM. “search_documentation” is a keyword that triggers the LLM to use the AWS documentation MCP server.

Prompt:  “I am running an OpenFOAM simulation, it seems to be taking a long time during mesh conversion and decomposition. Review the current system performance, then search_documentation to understand how we can optimize our Lustre setup. Generate a CDK python script that deploys the optimized storage.”
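
A minimal sketch of the kind of AWS CDK output this prompt might produce is shown below, assuming a placeholder VPC and illustrative sizing values; the constructs come from the aws_cdk.aws_fsx module, but the capacity and throughput figures are placeholders rather than tuned recommendations.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_fsx as fsx
from constructs import Construct

class CfdScratchStorageStack(Stack):
    """Illustrative FSx for Lustre file system for an OpenFOAM cluster."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Placeholder VPC; in practice this would be the existing cluster VPC
        vpc = ec2.Vpc(self, "HpcVpc", max_azs=1)

        # Persistent Lustre file system; capacity and throughput are placeholders
        fsx.LustreFileSystem(
            self, "CfdLustre",
            vpc=vpc,
            vpc_subnet=vpc.private_subnets[0],
            storage_capacity_gib=2400,
            lustre_configuration=fsx.LustreConfiguration(
                deployment_type=fsx.LustreDeploymentType.PERSISTENT_2,
                per_unit_storage_throughput=250,
            ),
        )

app = App()
CfdScratchStorageStack(app, "CfdScratchStorage")
app.synth()
```

In practice, Amazon Q Developer would tailor the deployment type, storage capacity, and per-unit throughput to the I/O patterns it observed during the mesh conversion and decomposition steps.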

Figure 6. AWS architecture diagram showing job flow from Amazon Q Developer through PCS to HPC compute instances with FSx Lustre integration.

Strategic implications for organizations

Agentic AI doesn’t replace CFD expertise, it serves as an intelligent assistant that accelerates mastery of complex simulation frameworks. Engineers with solid physics understanding can now navigate OpenFOAM’s extensive codebase with unprecedented efficiency.

Consider the common scenario: an engineer knows they need to “setup a boundary condition for temperature,” but faces the learning curve of OpenFOAM’s specific syntax and structure. The AI tutor provides:

  • Contextual code guidance: Instantly translates physics concepts into OpenFOAM-specific implementations
  • Tutorial synthesis: Analyzes multiple examples and tutorials to create tailored solutions for unique scenarios
  • Source code intelligence: Rapidly reads and interprets OpenFOAM source code to understand parameter requirements and dependencies
  • Multiphysics navigation: Helps engineers merge examples from different physics domains when tackling coupled simulations

The traditional barrier of mastering a new codebase, which often requires months of trial and error, is dramatically reduced. Engineers can focus on the physics and engineering judgment while the AI manages the syntax, parameter discovery, and implementation details.

What previously required hiring specialized OpenFOAM consultants or extensive training programs might now be accomplished by domain experts with AI guidance, democratizing access to advanced CFD capabilities across engineering teams.

This approach transforms the AI from a replacement tool into an intelligent learning accelerator that preserves the engineer’s critical thinking while substantially minimizing the friction of codebase complexity.

Risk mitigation through rapid prototyping

The ability to quickly test multiple modeling approaches reduces project risk. Teams can explore different physics assumptions, mesh strategies, and solver configurations without committing months to each approach. Organizations that master agentic CFD development can respond to design challenges and optimization opportunities far more rapidly than competitors using traditional approaches.

The future of CFD: toward full automation

The integration of agentic AI with CFD represents the beginning of a broader transformation. We envision engineers submitting datacenter images and running simulations with minimal prompting, automating setup and post-processing so they can focus on engineering judgment rather than implementation details. While current implementations require human validation, progressive automation through sophisticated prompt engineering will minimize intervention. Engineers who embrace these tools will tackle previously unreachable challenges in weeks rather than years. The question isn’t whether to adopt these approaches, but how quickly your organization can master them for competitive advantage. To get started using Amazon Q Developer in your CFD workloads, check out the documentation at Amazon Q IDE integration.


Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.