AWS HPC Blog

Smashing computational barriers: data-driven ball-impact modeling on AWS

The engineering world continuously seeks faster – and more accurate – predictive models to design safer and more resilient products. Traditional finite element methods (FEM) have long been the backbone of impact analysis, but they come with substantial computational costs. In this context, machine learning (ML) offers a transformative opportunity to develop data-driven surrogates that can predict the transient response of structures to impact in a fraction of the time.

This blog post introduces a new approach to predict transient responses using machine learning models. Our focus is on ball-impact elastodynamics – a scenario critical to industries like consumer electronics, automotive, and aerospace.

We’ll explore how advanced ML models, like U-Nets and Fourier Neural Operators (FNOs), can deliver high-fidelity predictions around 10,000 times faster than FEM simulations. We’ve published this work in the Journal of Physics Communications.

The challenge of ball-impact simulations

Predicting the response of materials to impact is a notoriously complex problem. These simulations involve non-linear material properties, dynamic wave propagation, and localized stress concentrations – all of which require significant computational power.

Understanding the nuances of stress wave propagation, energy dissipation, and deformation mechanisms is vital for accurate modeling. Figure 1 provides an exaggerated example from the physics simulations to illustrate how deformations propagate through the material.

Figure 1 This video demonstrates a ball impact scenario with exaggerated deformations using a coarse mesh. The setup shows the relative sizes of the ball and plate, highlighting both the macro bounce and the amplified micro/nano scale deformation waves for illustrative purposes.

Traditionally, we use FEM to predict the transient response. While it’s accurate, it demands a lot of compute cycles. Running a single impact scenario can take several hours, and scaling this process to evaluate multiple designs becomes very time-consuming. This hinders rapid iteration and delays the introduction of improved designs into the design process (and – eventually – to market).

The need for more agile, cost-effective, and scalable approaches has driven research into data-driven modeling techniques.

Why machine learning?

Machine learning – specifically, physics-informed neural networks and neural operators – has steadily matured to handle larger problems and is now moving beyond purely academic use into more realistic applications, enabling faster and more efficient simulations.

These models learn to predict the response directly from training data, bypassing the need to solve complex sets of differential equations each time.

The benefits of using ML for transient response prediction include:

  • Speed: Predictions can be made in seconds instead of hours.
  • Scalability: Multiple impact scenarios can be evaluated quickly.
  • Generality: Models can generalize to new design configurations, reducing the need for re-simulation.

This study focuses on two ML architectures: U-Nets and Fourier Neural Operators (FNOs). We tested each of these models to determine their effectiveness at predicting transient structural responses.

By using these ML approaches, engineers can assess multiple impact scenarios in a much shorter timeframe.

Problem formulation

The central challenge is to predict the transient response of a laminated structure when impacted by a ball. A laminated structure consists of multiple layers of materials bonded together, which complicates simulations due to the need to model interactions between layers. This complex problem involves tracking the displacement of points within the material over time as the impact occurs, capturing the intricate interplay of stress wave propagation, energy dissipation, and deformation mechanisms.

When a ball strikes the material, it generates a rapid, localized force that propagates through the structure as stress waves. These waves travel at different speeds depending on the material properties, creating a complex pattern of compressions and rarefactions. As the waves interact with boundaries and internal features, they reflect, refract, and interfere, leading to a dynamic stress field that evolves over time.

The temporal aspect of this problem is crucial and particularly challenging. The initial impact creates high-frequency, short-duration effects that require fine time resolution to capture accurately. However, the overall structural response may continue for a much longer period, necessitating extended simulation times. This multi-scale temporal behavior poses a significant challenge for traditional finite element methods (FEM), which must resolve both the rapid initial dynamics and the longer-term response, often leading to prohibitively long computation times.
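
To make this time-scale gap concrete, here’s a back-of-the-envelope sketch (our own illustration, not a calculation from the paper): an explicit FEM solver can only step forward by roughly the time a stress wave needs to cross the smallest element, so a fine mesh and a kilometer-per-second wave speed force hundreds of thousands of steps to cover a few milliseconds of response. The material values and element size below are generic, assumed numbers.

```python
import math

# Back-of-the-envelope look at the time-scale gap in explicit FEM impact runs.
# The material values and element size are generic, assumed numbers.
E = 70e9      # Young's modulus [Pa] (aluminum-like)
rho = 2700.0  # density [kg/m^3]
h = 50e-6     # smallest element size [m] (assumed)

c = math.sqrt(E / rho)   # approximate elastic wave speed (~5 km/s)
dt_stable = h / c        # CFL-limited explicit time step

T_response = 5e-3        # assumed duration of the overall structural response [s]
n_steps = T_response / dt_stable

print(f"wave speed       : {c:,.0f} m/s")
print(f"stable time step : {dt_stable:.2e} s")
print(f"steps for {T_response * 1e3:.0f} ms : {n_steps:.1e}")
```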

Furthermore, the material’s response can be highly nonlinear, with potential for localized plasticity, damage, or even failure. These nonlinearities can cause sudden changes in the structure’s behavior, further complicating the temporal evolution of the system. Capturing these abrupt changes accurately requires adaptive time-stepping strategies in traditional methods, adding another layer of computational complexity.

Machine learning models

To predict transient response, we explored two spatial ML kernels: U-Nets and Fourier Neural Operators (FNOs). Both models have advantages and cater to different aspects of the problem. Let’s look at them in turn.

U-Nets

Originally developed for biomedical image segmentation, U-Nets employ an encoder-decoder structure with skip connections. This structure allows the network to retain fine spatial details.

Here’s how it works:

  • Encoder: Extracts features at multiple resolutions.
  • Bottleneck: Captures the global context of the data.
  • Decoder: Reconstructs the displacement field at the original resolution.

This approach is particularly useful for modeling spatially localized phenomena – like ball impacts – where high detail is required in specific regions. The U-Net’s skip-connections allow it to capture fine-grained information that would otherwise be lost during down-sampling.
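
As a concrete illustration of this architecture, here is a minimal PyTorch sketch of a two-level U-Net with skip connections. The channel counts, grid size, and input layout (a displacement snapshot plus one extra feature channel) are illustrative assumptions, not the configuration used in our study.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU: the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    """Two-level encoder/decoder with skip connections (illustrative sizes)."""
    def __init__(self, in_ch=2, out_ch=1, width=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, width)          # encoder, full resolution
        self.enc2 = conv_block(width, 2 * width)      # encoder, half resolution
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(2 * width, 4 * width)
        self.up2 = nn.ConvTranspose2d(4 * width, 2 * width, kernel_size=2, stride=2)
        self.dec2 = conv_block(4 * width, 2 * width)  # input is 4*width after skip concat
        self.up1 = nn.ConvTranspose2d(2 * width, width, kernel_size=2, stride=2)
        self.dec1 = conv_block(2 * width, width)
        self.head = nn.Conv2d(width, out_ch, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                       # (B, w,  H,   W)
        e2 = self.enc2(self.pool(e1))           # (B, 2w, H/2, W/2)
        b = self.bottleneck(self.pool(e2))      # (B, 4w, H/4, W/4)
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# quick shape check on a 64x64 field with 2 input channels
model = TinyUNet()
print(model(torch.randn(1, 2, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```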

Fourier Neural Operators (FNOs)

FNOs transform the input to the Fourier domain, apply learned linear operators, and then return the result to the spatial domain. This method is powerful because convolution in the spatial domain is equivalent to pointwise multiplication in the Fourier domain, enabling efficient global operations on large domains.

Key steps in FNOs:

  • Fourier transform: Transforms input displacement fields into the frequency domain.
  • Operator application: Applies pointwise multiplications in the Fourier domain.
  • Inverse transform: Converts the frequency-domain representation back to the spatial domain.

The advantage of FNOs lies in their ability to capture long-range dependencies, making them ideal for problems involving large-scale deformation or wave propagation.
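
The sketch below shows a simplified Fourier layer in PyTorch: a real FFT, a learned pointwise multiplication applied to a truncated block of low-frequency modes, and an inverse FFT back to the spatial domain. Full FNO implementations retain additional mode blocks and other details; the channel counts and mode cutoff here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """Single Fourier layer: FFT -> keep low modes -> learned pointwise multiply -> inverse FFT."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        # complex weights for the retained low-frequency modes
        self.w = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat)
        )

    def forward(self, x):                       # x: (B, C, H, W), real-valued
        B, C, H, W = x.shape
        x_ft = torch.fft.rfft2(x)               # (B, C, H, W//2 + 1), complex
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        # mix channels for the lowest m x m modes only; higher frequencies are dropped
        out_ft[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.w)
        return torch.fft.irfft2(out_ft, s=(H, W))   # back to the spatial domain

class TinyFNO(nn.Module):
    """A couple of Fourier layers with pointwise (1x1) convolutions and a projection head."""
    def __init__(self, in_ch=2, out_ch=1, width=32, modes=12):
        super().__init__()
        self.lift = nn.Conv2d(in_ch, width, 1)
        self.spectral = nn.ModuleList([SpectralConv2d(width, modes) for _ in range(2)])
        self.pointwise = nn.ModuleList([nn.Conv2d(width, width, 1) for _ in range(2)])
        self.proj = nn.Conv2d(width, out_ch, 1)

    def forward(self, x):
        x = self.lift(x)
        for spec, pw in zip(self.spectral, self.pointwise):
            x = torch.relu(spec(x) + pw(x))
        return self.proj(x)

print(TinyFNO()(torch.randn(1, 2, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```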

Incorporating kinetic energy as a physics-based guide

To further enhance the predictive capabilities of the models, we incorporate a physics-based guide, similar to the approach used in Physics-Informed Neural Networks (PINNs). Specifically, the kinetic energy of the impacting ball is used as an additional input to inform the models. By embedding kinetic-energy information, the models are encouraged to adhere to physical laws during training, leading to more consistent and realistic predictions. This approach helps the models maintain physical accuracy, especially in scenarios where purely data-driven methods might deviate from expected results.

While our approach shares similarities with PINNs in the way it uses physical principles to guide the model, it differs in key aspects.

Traditional PINNs typically incorporate physical laws directly into the loss function, enforcing these constraints during training. In contrast, our method uses kinetic energy as an additional input feature, rather than as a hard constraint in the loss function. This ‘soft guidance’ approach allows the model more flexibility in learning from the data while still benefiting from physical insights.
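
As a deliberately simplified illustration of this soft-guidance idea, the snippet below broadcasts each scenario’s ball kinetic energy into an extra input channel alongside the displacement field. The normalization and exact encoding are assumptions for illustration and may differ from what we used in the paper.

```python
import torch

def add_kinetic_energy_channel(disp_field, mass, velocity):
    """Append a constant kinetic-energy channel to the displacement input.

    disp_field: (B, C, H, W) tensor of displacement snapshots.
    mass, velocity: (B,) tensors describing each impacting ball.
    Returns a (B, C + 1, H, W) tensor for the surrogate model to consume.
    """
    B, _, H, W = disp_field.shape
    ke = 0.5 * mass * velocity**2                     # kinetic energy per scenario
    ke = ke / ke.max()                                # simple normalization (illustrative)
    ke_map = ke.view(B, 1, 1, 1).expand(B, 1, H, W)   # broadcast to a full feature map
    return torch.cat([disp_field, ke_map], dim=1)

x = add_kinetic_energy_channel(torch.randn(4, 1, 64, 64),
                               mass=torch.rand(4), velocity=torch.rand(4))
print(x.shape)  # torch.Size([4, 2, 64, 64])
```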

Temporal strategy: multi-resolution approach

One key challenge in predicting long-term transient response is error accumulation. As predictions roll forward in time, small errors compound, leading to divergence from the true solution. This can result in models drifting away from accurate predictions over long time horizons, undermining their reliability.

To address this, we developed a multi-resolution strategy. This strategy leverages two separate models, each operating at a different time scale to capture unique aspects of the transient response:

  • Fine-resolution model: Captures the fine details of displacement at small time intervals.
  • Coarse-resolution model: Predicts the global trend at larger time intervals. This model offers a broader view of the overall structural response, providing stability and reducing the influence of small, transient fluctuations.

The fine-resolution model provides precision, ensuring that rapid changes during impact are captured with high fidelity. In contrast, the coarse-resolution model stabilizes the predictions, preventing the accumulated error that can arise from small deviations. By blending the outputs of these two models, the hybrid approach reduces error accumulation and ensures long-term stability. This strategy enables accurate long-term predictions while maintaining computational efficiency, especially in cases where fine-scale events influence long-term system behavior.
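
Below is a hedged sketch of how such a two-time-scale rollout could be wired together. The 50/50 re-anchoring rule and the step counts are placeholders for illustration, not the blending scheme from the study.

```python
import torch

def multi_resolution_rollout(fine_model, coarse_model, u0, n_coarse_steps, substeps):
    """Illustrative two-time-scale rollout.

    The coarse model jumps ahead by large time intervals to pin down the global
    trend; the fine model fills in `substeps` small steps in between, and each
    fine segment is re-anchored toward the coarse prediction to limit error growth.
    """
    states = [u0]
    u_anchor = u0
    for _ in range(n_coarse_steps):
        u_coarse = coarse_model(u_anchor)        # global trend at the next coarse step
        u = u_anchor
        for _ in range(substeps):
            u = fine_model(u)                    # fine-grained evolution in between
            states.append(u)
        # re-anchor: pull the fine rollout back toward the coarse prediction
        u_anchor = 0.5 * u + 0.5 * u_coarse
        states[-1] = u_anchor
    return torch.stack(states)

# dummy single-channel models just to exercise the rollout logic
fine = torch.nn.Conv2d(1, 1, 3, padding=1)
coarse = torch.nn.Conv2d(1, 1, 3, padding=1)
traj = multi_resolution_rollout(fine, coarse, torch.randn(1, 1, 64, 64),
                                n_coarse_steps=3, substeps=5)
print(traj.shape)  # torch.Size([16, 1, 1, 64, 64])
```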

Training and results

To evaluate performance, we generated a synthetic dataset comprising 6,500 unique ball-impact scenarios using the MOOSE (Multiphysics Object-Oriented Simulation Environment) FEM solver. Each scenario was designed to simulate a wide range of impact conditions, such as inclusion sizes and locations, and different ball drop locations. This comprehensive dataset ensured diverse conditions for training and testing the machine learning models, allowing for a robust evaluation of generalization capability.
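
For illustration only, a scenario sweep like this can be driven by a simple parameter table that the simulation jobs consume. The parameter names and ranges below are hypothetical and are not taken from the study.

```python
import csv
import random

random.seed(0)

# Hypothetical parameter names and ranges -- the actual values used with the
# MOOSE input decks are not given in this post.
N_SCENARIOS = 6500  # matches the number of scenarios generated for the study
rows = []
for i in range(N_SCENARIOS):
    rows.append({
        "scenario_id": i,
        "inclusion_size_mm": round(random.uniform(0.5, 5.0), 3),
        "inclusion_x_mm": round(random.uniform(-20.0, 20.0), 3),
        "inclusion_y_mm": round(random.uniform(-20.0, 20.0), 3),
        "drop_x_mm": round(random.uniform(-25.0, 25.0), 3),
        "drop_y_mm": round(random.uniform(-25.0, 25.0), 3),
    })

with open("scenarios.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```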

We tested these ML models on unseen impact configurations to assess their ability to predict outcomes for new, never-before-seen scenarios. The evaluation focused on two critical metrics to determine model efficacy:

  • Maximum displacement error: This metric quantifies the difference between the maximum predicted displacement and the true maximum displacement, offering a measure of prediction accuracy.
  • Temporal stability: This metric tracks how much error accumulates over time, reflecting the model’s ability to maintain accurate predictions over long time horizons.
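
Before turning to the results in Table 1, here is a minimal sketch of how such metrics might be computed. These formulas are illustrative; they are not necessarily the exact definitions used in the paper.

```python
import torch

def max_displacement_error(pred, true):
    """Relative error in the peak displacement magnitude, as a percentage."""
    return 100.0 * (pred.abs().max() - true.abs().max()).abs() / true.abs().max()

def temporal_error_growth(pred_traj, true_traj):
    """Per-step relative L2 error along a rollout: a simple proxy for how
    quickly a model drifts from the reference solution over time."""
    errs = []
    for p, t in zip(pred_traj, true_traj):
        errs.append((torch.norm(p - t) / torch.norm(t)).item())
    return errs
```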

Table 1 Maximum displacement error and temporal stability metric for various ML methods in the ball-impact scenario. Values indicate each method’s performance in predicting displacement accuracy and maintaining simulation stability, crucial for engineering design decisions.

Fine-resolution models achieve better accuracy in maximum displacement predictions but suffer from poor temporal stability, while coarse-resolution models offer stability but at the cost of reduced accuracy.

When comparing U-Net and FNO models, the performance varies depending on the resolution. In the fine-resolution case, U-Net outperforms FNO with a lower maximum displacement error (1.2% vs 4.9%). However, for coarse-resolution models, FNO slightly outperforms U-Net (7.4% vs 7.7% error). Both fine-resolution models exhibit high instability, while their coarse-resolution counterparts are stable.

Notably, multi-resolution approaches show promise in balancing accuracy and stability, particularly for U-Net, which achieves high stability without compromising accuracy. This suggests a potential path forward for future research. The importance of temporal stability for long-term predictions is evident, often rivaling initial accuracy in significance.

Ultimately, the choice between U-Net and FNO models, as well as between fine and coarse resolutions, depends on the specific requirements of the application, weighing the need for accuracy against the demand for stability. These findings underscore the potential for further exploration of multi-resolution techniques to develop improved models that maintain both high accuracy and stability across diverse simulation scenarios.

Figure 2 shows the output of the trained U-Net for two different designs. We adjusted the color scale to intentionally saturate the high-deformation areas around the main ball depression zone. This adjustment allows for better visualization of the temporal wave propagation throughout the material.

Figure 3 uses a color scale that emphasizes the maximum depression zone, which is our primary area of interest. This focus allows us to precisely identify the values and locations of potential failure points, indicating where design modifications may be necessary. Figure 3 also illustrates the reduced resolution of the ML models. This reduction serves two purposes: it optimizes RAM usage, and it facilitates the conversion from unstructured mesh data to structured arrays. This conversion is essential for the convolution operations performed in neural networks.
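
As an illustration of that conversion step, the sketch below voxelizes unstructured nodal data onto a regular grid using simple nearest-bin averaging. The function and argument names are hypothetical, and the production pipeline may use a different interpolation scheme.

```python
import numpy as np

def voxelize(node_xyz, node_values, grid_shape, bounds):
    """Nearest-bin voxelization of unstructured FEM node data onto a regular grid.

    node_xyz:    (N, 3) node coordinates from the FEM mesh.
    node_values: (N,) nodal field values (e.g. z-displacement).
    grid_shape:  (nx, ny, nz) voxel counts.
    bounds:      ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    Returns a (nx, ny, nz) array where each voxel holds the mean of the nodes
    that fall inside it (NaN where no node landed).
    """
    grid_sum = np.zeros(grid_shape)
    grid_cnt = np.zeros(grid_shape)
    idx = []
    for d in range(3):
        lo, hi = bounds[d]
        i = ((node_xyz[:, d] - lo) / (hi - lo) * grid_shape[d]).astype(int)
        idx.append(np.clip(i, 0, grid_shape[d] - 1))
    np.add.at(grid_sum, tuple(idx), node_values)   # accumulate nodal values per voxel
    np.add.at(grid_cnt, tuple(idx), 1.0)           # count nodes per voxel
    with np.errstate(invalid="ignore"):
        return grid_sum / grid_cnt
```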

Figure 2a – The first of two examples of the output from a trained U-Net. The color scale is set up to ensure the wave propagation of the displacement magnitude is readily visible, which causes the main depression zone to be saturated in color.

Figure 2b – The second of two examples of the output from a trained U-Net. The color scale is set up to ensure the wave propagation of the displacement magnitude is readily visible, which causes the main depression zone to be saturated in color.

Figure 3 The figure compares z-component displacement predictions for a through-thickness cross-section at the impact point. From bottom to top: FNO, U-Net, and ground truth results. The color scale highlights the main depression zone, showing nearly identical predictions to the ground truth, with minor discretization error due to voxelization.

Scaling on AWS

Figure 4 AWS Architecture used for generating data and training ML models

The development and deployment of this predictive model required a robust computational pipeline. We used several essential AWS services to streamline the process and enable these large-scale simulations.

  1. AWS Batch for scalable simulation runs:
    1. We used MOOSE to generate the FEM simulations required to train the machine learning models (over 30 TB of data was generated).
    2. AWS Batch enabled scalable and on-demand execution of thousands of simulation jobs, ensuring efficient resource utilization (we used more than 150,000 CPUs).
    3. Batch allowed dynamic allocation of compute resources based on job demands, which reduced our wait times for simulation completion.
    4. By using Batch job dependencies, we created pipelines that switched between containers for each subtask. For instance, we ran a MOOSE simulation in one container, then switched to a different container with the SEACAS post-processing engine, which requires a separate environment. Once the jobs were complete, Batch loaded a final, lightweight container that aggregated the data, performed voxelization, and ensured the data was ready for training (a sketch of this job-dependency pattern follows this list).
  2. Data management and storage:
    1. We stored the raw simulation data in Amazon EFS, which allowed for secure, scalable, and cost-effective storage of large datasets.
    2. When training across a distributed cluster, EFS allowed any GPU node to access the data directly, without first downloading it from other network locations. Because our tasks are not I/O-bound, EFS served as an ideal solution: it provided a highly parallelizable file system, allowing for efficient data access during training and eliminating the need for a more specialized high-performance file system like Lustre.
  3. ML Training:
    1. Each training job used 56 GPUs.
    2. We trained the machine learning models, including U-Nets and FNOs, on AWS Batch, leveraging its multi-node parallel jobs to run containerized, distributed training at scale.
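
The job-dependency pattern from step 1 can be expressed with a few calls to the AWS SDK for Python. In the hedged sketch below, the queue name, job definitions, and scripts are placeholders; only the submit_job/dependsOn mechanics reflect what AWS Batch actually provides.

```python
import boto3

batch = boto3.client("batch")

# Hypothetical job queue name -- substitute your own resources.
QUEUE = "hpc-queue"

def submit(name, job_def, command, depends_on=None):
    """Submit one containerized step, optionally depending on an earlier job."""
    resp = batch.submit_job(
        jobName=name,
        jobQueue=QUEUE,
        jobDefinition=job_def,            # each step uses a different container image
        containerOverrides={"command": command},
        dependsOn=[{"jobId": depends_on}] if depends_on else [],
    )
    return resp["jobId"]

# Hypothetical job definitions and scripts for the three chained steps:
# MOOSE simulation -> SEACAS post-processing -> aggregation/voxelization.
scenario = "scenario-0042"
sim_id = submit(f"moose-{scenario}", "moose-jobdef", ["run_moose.sh", scenario])
post_id = submit(f"seacas-{scenario}", "seacas-jobdef", ["postprocess.sh", scenario],
                 depends_on=sim_id)
agg_id = submit(f"voxel-{scenario}", "voxelize-jobdef", ["voxelize.py", scenario],
                depends_on=post_id)
```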

Why choose AWS Batch over other services such as Amazon ParallelCluster? In our situation, MOOSE, the MOOSE post-processor SEACAS, and PyTorch each required significantly different, non-trivial installations and dependencies, including underlying drivers. While it’s possible to develop an Amazon Machine Image (AMI) that supports all of these simultaneously, the developers of each package provide managed containers. By using managed containers, we no longer needed to worry about keeping the software up to date or ensuring driver compatibility.

AWS Batch and Amazon ParallelCluster each have their own strengths. With AWS Batch, we needed to do some additional configuration to enable Elastic Fabric Adapter (EFA), which is important for running tightly coupled codes in the cloud. Although it required an initial effort to learn, this one-time cost let us take advantage of the managed containers available for each package, and Batch streamlined the deployment and use of these containers from that point on.

Conclusion

This study demonstrates how machine learning can accelerate ball-impact simulations, achieving prediction speeds up to 10,000 times faster than traditional FEM. Our multi-resolution approach, combining fine and coarse models, balanced accuracy with long-term stability – a critical advancement for reliable impact predictions.

By using AWS services – particularly AWS Batch and Amazon EFS – we efficiently managed some truly massive datasets and distributed complex computational tasks across large fleets of CPUs and GPUs.

This work opens exciting avenues for future research in complex materials and geometries, with far-reaching implications for industries like automotive, aerospace, and consumer electronics. As we continue to push boundaries, we invite collaboration to further advance this field and reshape how we approach complex design challenges.

If you want to request a proof of concept or if you have feedback on the AWS tools, please reach out to us at ask-hpc@amazon.com.

Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.

Fei Chen

Fei Chen has 15 years of industry experience leading teams in developing and productizing AI and machine learning at scale. At AWS, she leads the worldwide solution teams in advanced compute, including AI accelerators, HPC, IoT, Visual & Spatial Compute, and emerging technologies, focusing on technical innovations in AI and generative AI.

Katragadda Raghunath

Raghunath is a PE in the Amazon Devices organization specializing in device design and computational methods. He has over 15 years of experience in a combination of physics simulations and system design. He has experience across the consumer electronics, materials, and automotive industries.

Vidyasagar Ananthan

Vidyasagar specializes in high performance computing, numerical simulations, optimization techniques and software development across industrial and academic environments. At AWS, Vidyasagar is a Senior Solutions Architect developing predictive models and simulation technologies.