AWS Partner Network (APN) Blog
Accelerating Enterprise Simulations on AWS with Fastone Compute Cloud-Enterprise Edition
By Meiwen Wang, Sr. Solution Consultant – Shanghai Fastone Information Technology
By Wei Dai, Partner Solutions Architect – AWS
By Xueyao Bai, Solutions Architect – AWS
A growing number of simulations and calculations require large volumes of data to be processed in real time, particularly in the computer-aided engineering (CAE) and computational fluid dynamics (CFD) fields.
Advances in cloud computing have made it practical to migrate CAE/CFD simulation workloads to the cloud, enabling multi-site collaboration, greater efficiency, and lower costs.
However, users still face obstacles such as managing clusters effectively, devising optimal scheduling strategies, tuning performance, managing resources, and closely monitoring task progress.
In this post, we show how to use Fastone Compute Cloud-Enterprise Edition (Fastone FCC-E) to handle large-scale CAE/CFD computations and simulations more efficiently while making reliable use of cloud resources.
Shanghai Fastone Information Technology is an AWS Specialization Partner and AWS Marketplace Seller with the Manufacturing and Industrial Services Competency. Fastone delivers a readily available research and development (R&D) environment to help clients expedite application development across many industries, including biotech, electronic design automation, and financial technology.
Automating the Complexities of Cloud Simulations
Enterprises require considerable computational resources and storage capacity to successfully accomplish simulation tasks within a limited timeframe.
Cloud computing offers elastic compute and storage, a broad selection of instance types, global infrastructure, and effectively unlimited capacity. Fastone is a visual platform that puts these resources to work for simulation teams and can be deployed in a matter of hours.
The Fastone platform uses Amazon Web Services (AWS) to help customers run CAE/CFD simulations in the cloud in the following ways:
- Enhancing efficiency: Techniques such as parallel computing and task decomposition shorten the time to results.
- Managing large-scale clusters: Resources within the cluster are adjusted dynamically while hardware and software environments are kept consistent across nodes.
- Selecting suitable scheduling strategies: For example, jobs can be distributed evenly across nodes, or packed onto a single node until its resources are fully used before scaling out to the next node.
- Optimizing the computational process: The instance type and performance level are chosen to best fit the desired outcome.
- Incorporating auto-scaling: Cloud resources are allocated and used efficiently, supported by continuous integration and continuous delivery (CI/CD) mechanisms.
- Monitoring tasks in real time: Real-time monitoring detects potential issues early, so resource allocation can be adjusted, workloads optimized, and high availability and stability maintained across the entire cluster. Monitoring data also accumulates into a history that informs capacity planning and future system optimization.
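The two scheduling strategies in the list, spreading jobs evenly across nodes or filling one node before moving to the next, can be illustrated as simple placement policies. The following Python sketch is not Fastone code. It assumes each node is described only by its number of free CPU cores, and the function names are hypothetical.

```python
from typing import Callable, List, Optional

Policy = Callable[[List[int], int], Optional[int]]


def place_spread(free_cores: List[int], request: int) -> Optional[int]:
    """Spread: choose the node with the most free cores that can fit the request."""
    candidates = [i for i, free in enumerate(free_cores) if free >= request]
    if not candidates:
        return None  # no node can host the job right now
    return max(candidates, key=lambda i: free_cores[i])


def place_pack(free_cores: List[int], request: int) -> Optional[int]:
    """Pack: choose the first node, in order, that can still fit the request."""
    for i, free in enumerate(free_cores):
        if free >= request:
            return i
    return None  # every node is full, so the cluster would need to scale out


def schedule(free_cores: List[int], requests: List[int], policy: Policy) -> List[Optional[int]]:
    """Place each request in turn, deducting cores from the chosen node."""
    free = list(free_cores)
    placements: List[Optional[int]] = []
    for request in requests:
        node = policy(free, request)
        if node is not None:
            free[node] -= request
        placements.append(node)
    return placements
```

With three 8-core nodes and three 4-core jobs, `schedule([8, 8, 8], [4, 4, 4], place_spread)` puts one job on each node (`[0, 1, 2]`), while `place_pack` fills node 0 first (`[0, 0, 1]`). Spreading reduces contention between jobs; packing leaves whole nodes idle so they can be released sooner.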
Consider one customer’s structural analysis and simulation application as an illustrative example.
Figure 1 – Reference architecture diagram.
The application is used primarily for complex finite element analysis and calculations across a wide range of domains, including mechanical, structural, fluid, and geological. It requires substantial computational and storage resources along with parallel computing capabilities, demands that the customer’s existing local resources could not adequately meet.
Drawing on the wide range of instance types available on AWS, the Fastone FCC-E platform addresses these demands with the architecture shown in Figure 1, which is designed to keep both the architecture and the data flow secure.
After logging in to the Fastone FCC-E web portal, select New Job, set the desired number of CPU cores, specify the input file, and choose the appropriate instance type and number of data nodes.
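The New Job form collects only a few parameters. As a rough illustration of the kind of check a submission form might perform, the hypothetical `JobRequest` class below verifies that the requested cores fit on the chosen number of nodes. The vCPU table is an illustrative assumption that should be verified against current EC2 documentation. The sketch also treats one vCPU as one core, which is a simplification because HPC deployments often disable simultaneous multithreading.

```python
import math
from dataclasses import dataclass

# Illustrative vCPU counts for two EC2 instance types (assumption; a real
# deployment would look these up, e.g. with the EC2 DescribeInstanceTypes API).
VCPUS_PER_INSTANCE = {"c5.18xlarge": 72, "c6i.32xlarge": 128}


@dataclass
class JobRequest:
    """Hypothetical model of the New Job form fields."""
    cpu_cores: int
    input_file: str
    instance_type: str
    data_nodes: int

    def min_nodes(self) -> int:
        """Smallest node count whose vCPUs cover the requested cores."""
        vcpus = VCPUS_PER_INSTANCE[self.instance_type]
        return math.ceil(self.cpu_cores / vcpus)

    def validate(self) -> None:
        """Raise ValueError if the request cannot run on the selected nodes."""
        if self.cpu_cores <= 0 or self.data_nodes <= 0:
            raise ValueError("cores and nodes must be positive")
        if not self.input_file:
            raise ValueError("an input file is required")
        if self.data_nodes < self.min_nodes():
            raise ValueError(
                f"{self.cpu_cores} cores need at least {self.min_nodes()} "
                f"{self.instance_type} nodes"
            )
```

For example, 144 cores on `c5.18xlarge` need at least two nodes, so `JobRequest(144, "model.inp", "c5.18xlarge", 2).validate()` passes, while a request for 145 cores on the same two nodes is rejected.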
Figure 2 – Reference job submission.
The Fastone FCC-E platform then automatically creates a high-performance computing (HPC) cluster on AWS for the job through the AWS API, and starts the job as soon as the cluster is ready.
What’s Behind the Automation?
Now, let’s take a closer look at how the platform automatically builds the cluster and submits the CAE job for execution.
First, the Fastone FCC-E platform uses an infrastructure as code (IaC) tool to create a complete stack on AWS from a predefined YAML template. The template defines essential components such as key pairs, virtual private clouds (VPCs), subnets, security groups, and route tables.
Example YAML template used in the automation process.
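The original template is not reproduced in this post. The sketch below assumes the IaC tool is AWS CloudFormation (the post refers to a stack but does not name the tool) and builds a comparable template in Python. It emits JSON because CloudFormation accepts JSON and YAML templates interchangeably and the Python standard library has no YAML writer. Logical IDs, CIDR ranges, and the key name are hypothetical, and routes to an internet or NAT gateway are omitted for brevity.

```python
import json


def build_network_template(vpc_cidr: str, subnet_cidr: str, key_name: str) -> dict:
    """Return a minimal CloudFormation template for an HPC cluster's network."""
    vpc_ref = {"Ref": "ClusterVpc"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ClusterKeyPair": {
                "Type": "AWS::EC2::KeyPair",
                "Properties": {"KeyName": key_name},
            },
            "ClusterVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            },
            "ClusterSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": vpc_ref, "CidrBlock": subnet_cidr},
            },
            "ClusterSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"GroupDescription": "HPC cluster nodes", "VpcId": vpc_ref},
            },
            # Allow all traffic between members of the same security group
            "IntraClusterIngress": {
                "Type": "AWS::EC2::SecurityGroupIngress",
                "Properties": {
                    "GroupId": {"Ref": "ClusterSecurityGroup"},
                    "IpProtocol": "-1",
                    "SourceSecurityGroupId": {"Ref": "ClusterSecurityGroup"},
                },
            },
            "ClusterRouteTable": {
                "Type": "AWS::EC2::RouteTable",
                "Properties": {"VpcId": vpc_ref},
            },
            "SubnetRouteTableAssociation": {
                "Type": "AWS::EC2::SubnetRouteTableAssociation",
                "Properties": {
                    "SubnetId": {"Ref": "ClusterSubnet"},
                    "RouteTableId": {"Ref": "ClusterRouteTable"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_network_template("10.0.0.0/16", "10.0.1.0/24", "hpc-cluster-key"), indent=2))
```

The self-referencing ingress rule is declared as a separate `AWS::EC2::SecurityGroupIngress` resource, the usual way to let members of one security group reach each other without creating a circular reference inside the group’s own definition.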
Next, the Fastone FCC-E platform reads the configuration details of the job and determines the number and specifications of the nodes in the cluster. It then calls the AWS API to launch Amazon Elastic Compute Cloud (Amazon EC2) instances from the Fastone image, which includes the required HPC cluster libraries and dependencies as well as the installed CAE application.
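The post does not show the launch call itself. As a hedged sketch, assuming the platform uses the EC2 RunInstances API (exposed in boto3 as `run_instances`), the request could be assembled as shown here. The AMI, subnet, security group, and key name are placeholders, and the `hpc-cluster` tag key is a hypothetical convention for finding the nodes later.

```python
from typing import Any, Dict


def build_run_instances_params(
    image_id: str,
    instance_type: str,
    node_count: int,
    subnet_id: str,
    security_group_id: str,
    key_name: str,
    cluster_name: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EC2 RunInstances API."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "ImageId": image_id,  # image with HPC libraries and the CAE application
        "InstanceType": instance_type,
        # MinCount == MaxCount: launch every requested node or none at all
        "MinCount": node_count,
        "MaxCount": node_count,
        "SubnetId": subnet_id,
        "SecurityGroupIds": [security_group_id],
        "KeyName": key_name,
        # Tag the nodes so they can be located (and released) after the job
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "hpc-cluster", "Value": cluster_name}],
            }
        ],
    }
```

Setting `MinCount` equal to `MaxCount` makes the launch all-or-nothing: if EC2 cannot provide the full number of instances, none are launched, which avoids a partially built cluster. With boto3 installed and credentials configured, the parameters could be passed as `boto3.client("ec2").run_instances(**params)`.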
The platform also updates the hosts file automatically so the nodes can communicate with one another.
Node entries added to the hosts file.
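The actual hosts file entries are not shown. A minimal sketch, assuming the platform knows each node’s hostname and private IP address, is to rewrite a managed block of `IP hostname` lines. The marker comments and hostnames below are hypothetical.

```python
from typing import Dict

BEGIN = "# BEGIN hpc-cluster nodes"  # hypothetical marker comments
END = "# END hpc-cluster nodes"


def update_hosts(hosts_text: str, nodes: Dict[str, str]) -> str:
    """Replace (or append) a managed block of 'IP<TAB>hostname' lines."""
    lines = hosts_text.splitlines()
    # Drop any previously managed block so repeated runs stay idempotent
    if BEGIN in lines and END in lines:
        start, end = lines.index(BEGIN), lines.index(END)
        lines = lines[:start] + lines[end + 1:]
    block = [BEGIN]
    block += [f"{ip}\t{name}" for name, ip in sorted(nodes.items())]
    block.append(END)
    return "\n".join(lines + block) + "\n"
```

Removing the old managed block before writing the new one keeps repeated runs idempotent, so the file can be regenerated whenever nodes are added or replaced without disturbing entries outside the block.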
Fastone FCC-E then builds the HPC cluster on AWS using the slurm.conf and partitions.conf files.
The slurm.conf file is the main configuration file for a Slurm-based HPC cluster. Its global options define the behavior of the entire cluster, covering nodes, queues, resource limits, account management, and task scheduling policies.
Administrators can tailor the cluster’s scale, performance parameters, permission controls, and other settings to specific application requirements by editing slurm.conf.
Settings in slurm.conf that define the specifications of a Slurm-based cluster.
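The exact settings Fastone uses are not listed. The sketch below renders a few standard global slurm.conf options. The values are common choices rather than Fastone’s, and the host and file names are hypothetical. The `Include` directive pulls node and partition definitions from a separate file, matching the split between slurm.conf and partitions.conf described here.

```python
from typing import Dict


def render_slurm_conf(cluster_name: str, controller: str, include_path: str) -> str:
    """Render a minimal set of global slurm.conf options."""
    options: Dict[str, str] = {
        "ClusterName": cluster_name,
        "SlurmctldHost": controller,  # node that runs the slurmctld daemon
        "SchedulerType": "sched/backfill",  # backfill scheduling
        "SelectType": "select/cons_tres",  # consumable trackable resources
        "SelectTypeParameters": "CR_Core_Memory",  # allocate cores and memory
    }
    lines = [f"{key}={value}" for key, value in options.items()]
    # Node and partition definitions are kept in a separate file
    lines.append(f"Include {include_path}")
    return "\n".join(lines) + "\n"
```

For example, `render_slurm_conf("cae", "head-node", "/etc/slurm/partitions.conf")` starts with `ClusterName=cae` and ends with `Include /etc/slurm/partitions.conf`.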
The partitions.conf file defines the compute nodes and partitions along with their detailed specifications, so users can submit jobs to the partition that matches their requirements.
It also describes the characteristics of each partition, including its name, available nodes, allocation policies, and priority.
Settings in partitions.conf that define partition names and their nodes.
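As with slurm.conf, the original contents are not shown. This hypothetical sketch writes one `NodeName` line and one `PartitionName` line using Slurm’s hostlist syntax (for example, `compute[1-4]`). The node prefix, core count, memory size, and partition name are placeholders, and `RealMemory` is given in megabytes.

```python
from typing import List


def render_partitions_conf(
    node_prefix: str, node_count: int, cpus: int, real_memory_mb: int, partition: str
) -> str:
    """Render node and partition definitions in slurm.conf syntax."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Slurm hostlist expression, e.g. compute[1-4]; a single node needs no range
    nodes = f"{node_prefix}[1-{node_count}]" if node_count > 1 else f"{node_prefix}1"
    lines: List[str] = [
        f"NodeName={nodes} CPUs={cpus} RealMemory={real_memory_mb} State=UNKNOWN",
        f"PartitionName={partition} Nodes={nodes} Default=YES MaxTime=INFINITE State=UP",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_partitions_conf("compute", 4, 72, 140000, "cae")` produces `NodeName=compute[1-4] CPUs=72 RealMemory=140000 State=UNKNOWN` followed by `PartitionName=cae Nodes=compute[1-4] Default=YES MaxTime=INFINITE State=UP`.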
The Fastone FCC-E platform then submits the job to the cluster queue automatically, and the job starts as soon as the compute nodes are ready.
Slurm command showing the status of the submitted job in the queue.
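Job status in a Slurm queue is typically inspected with the `squeue` command. The following sketch assumes Slurm client commands are available where it runs. It requests a pipe-delimited format (`%i` job ID, `%P` partition, `%T` state, `%j` job name) and parses the result. The job name is placed last so that names containing `|` still parse correctly.

```python
import subprocess
from typing import Dict, List

# One line per job: job ID | partition | state | job name
SQUEUE_CMD: List[str] = ["squeue", "--noheader", "--format=%i|%P|%T|%j"]


def parse_squeue(output: str) -> Dict[str, Dict[str, str]]:
    """Map job ID -> {partition, state, name} from squeue output."""
    jobs: Dict[str, Dict[str, str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any '|' characters inside the job name
        job_id, partition, state, name = line.split("|", 3)
        jobs[job_id.strip()] = {
            "partition": partition.strip(),
            "state": state.strip(),
            "name": name.strip(),
        }
    return jobs


def current_jobs() -> Dict[str, Dict[str, str]]:
    """Run squeue and parse its output (requires a Slurm client installation)."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)
```

A job waiting for its nodes is reported as `PENDING` and moves to `RUNNING` once the compute nodes are ready.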
When the job completes, Fastone FCC-E automatically releases the allocated resources to avoid unnecessary costs. The job output is then available on the platform’s file system, and downloading the output files requires approval from the system administrator.
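Releasing resources comes down to waiting for the job to reach a state after which it will not run again, then tearing down its instances. In this sketch, `get_state` and `release` are hypothetical callables supplied by the caller. For example, `release` could call the EC2 TerminateInstances API (`boto3.client("ec2").terminate_instances(InstanceIds=ids)`). The set of terminal states is an assumption based on Slurm’s job state names, and a production version would also need a timeout and error handling.

```python
import time
from typing import Callable, List

# Slurm job states after which the job will not run again (assumption).
# PREEMPTED is left out because preempted jobs may be requeued.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT",
    "NODE_FAIL", "OUT_OF_MEMORY", "BOOT_FAIL", "DEADLINE",
}


def wait_and_release(
    get_state: Callable[[], str],
    release: Callable[[List[str]], None],
    instance_ids: List[str],
    poll_seconds: float = 30.0,
) -> str:
    """Poll a job's state and release its instances once the job has finished."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            release(instance_ids)  # e.g. terminate the EC2 instances
            return state
        time.sleep(poll_seconds)
```

Releasing on any terminal state, not only `COMPLETED`, ensures that failed or cancelled jobs do not leave idle instances running.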
Additional Benefits
Fastone FCC-E includes an intuitive billing management module for financial analysis, order details, and budget management, so users can monitor and manage what they spend on the platform’s services.
Figure 3 – Intuitive billing status.
In addition, Fastone FCC-E integrates Secure Shell (SSH) and web-based virtual network computing (VNC), so users can access the command line or desktop directly from the web portal.
For customers who want a richer user experience, Fastone FCC-E also supports commercial virtual desktop infrastructure (VDI) solutions, including NICE DCV and Amazon WorkSpaces, so users can keep working the way they are accustomed to.
Figure 4 – Integrated SSH and VNC.
During job execution, users can monitor job status and resource status in real time through the platform’s monitoring and alerting modules. Alerts are sent when a job fails or a threshold is reached; for example, when memory usage reaches 90%, an alert is raised suggesting an instance type change.
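The 90% memory example can be expressed as a simple rule. This sketch assumes a hypothetical sample structure holding used and total memory per node. It illustrates threshold alerting in general and is not the platform’s alerting module.

```python
from dataclasses import dataclass
from typing import List

MEMORY_THRESHOLD = 0.90  # 90%, matching the example in the text


@dataclass
class NodeSample:
    """One monitoring sample for a node (hypothetical structure)."""
    node: str
    mem_used_bytes: int
    mem_total_bytes: int


def evaluate_alerts(job_state: str, samples: List[NodeSample]) -> List[str]:
    """Return alert messages for a failed job or nodes at or above the memory threshold."""
    alerts: List[str] = []
    if job_state == "FAILED":
        alerts.append("job failed")
    for s in samples:
        usage = s.mem_used_bytes / s.mem_total_bytes
        if usage >= MEMORY_THRESHOLD:
            alerts.append(
                f"{s.node}: memory at {usage:.0%}; consider an instance type with more memory"
            )
    return alerts
```

A node using 92 of 100 units of memory produces the message `compute1: memory at 92%; consider an instance type with more memory`, while a node at 50% produces nothing.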
Figure 5 – Reference dashboard.
Alongside the built-in monitoring and alerting system, the platform offers a highly customizable monitoring and alerting stack built on Prometheus and Grafana, which lets users perform extensive, detailed data analysis tailored to their specific needs.
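Prometheus exposes an HTTP API that Grafana and scripts can query. The sketch below computes per-node memory usage with PromQL through the instant-query endpoint `/api/v1/query` and parses the JSON response. It assumes the cluster nodes run the Prometheus node_exporter, and the server URL is a placeholder. The post does not say which exporters or queries Fastone actually uses.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# node_exporter metrics: fraction of memory in use on each node
MEMORY_USAGE_QUERY = "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"


def build_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> Dict[str, float]:
    """Map each series' 'instance' label to its value from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def memory_usage(base_url: str) -> Dict[str, float]:
    """Fetch per-node memory usage (0.0 to 1.0) from a Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, MEMORY_USAGE_QUERY)) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

The same PromQL expression can be pasted into a Grafana panel, which keeps ad hoc scripts and dashboards consistent.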
Figure 6 – Advanced dashboard.
Conclusion
This post demonstrated how the Fastone FCC-E platform automatically creates clusters, executes tasks, and releases resources within an AWS environment to support enterprise simulations on the cloud.
The Fastone platform helps users overcome challenges in efficiency, cluster management, task scheduling, resource management, and monitoring.
Whether you’re a small startup or a large enterprise, Fastone’s solution is designed to help automate workflows and streamline your simulations while providing top-notch performance and security. Take your simulations to the next level with Fastone FCC-E on AWS.
Learn more about Fastone FCC-E in AWS Marketplace.
Fastone – AWS Partner Spotlight
Shanghai Fastone Information Technology is an AWS Partner that delivers a readily available R&D environment to help clients expedite application development across many industries, including biotech, electronic design automation, and financial technology.