AWS Partner Network (APN) Blog
Unleash Supercomputing Power with HPC-NOW: An Open-Source HPC Solution on AWS
By Zhenrong WANG, Founder and CEO – Shanghai HPC-NOW Technologies Co., Ltd
By Weiping Liu, Sr. Partner, SA – AWS
BACKGROUND
In today’s digital age, computing power has emerged as a vital resource alongside traditional utilities like water, electricity, and gas. By harnessing high-performance computing (HPC) platforms, organizations can centralize and efficiently allocate computing resources, driving innovation across a wide range of fields, from scientific computing and engineering simulations to financial modeling and life sciences. As the demand for capacity, scalability, and efficiency continues to grow, an increasing number of customers are turning to cloud-based solutions to build their own HPC clusters. This approach not only provides access to vast computing resources but also offers the flexibility to scale up or down as needed, improving resource utilization and cost-effectiveness.
However, customer expectations continue to evolve beyond mere resource provisioning. Customers want to free themselves from the complexities of infrastructure setup and to choose resource types and deployment locations flexibly, so they can focus on driving business growth. In the past, we frequently encountered universities and research institutions that lacked dedicated IT operations staff and therefore sought a tool to rapidly deploy and easily manage HPC clusters. Although IT infrastructure was not their primary expertise, these customers were proficient with Linux and comfortable carrying out a wide range of operations from the command line. They were even willing to modify code to optimize workflow efficiency. Cloud computing has undoubtedly helped these customers reduce the work of building infrastructure. Even so, as a long-standing provider of HPC technical services, Shanghai HPC-NOW Technologies Co., Ltd kept asking how to better address customer requirements in close partnership with AWS. This led us to launch the open-source HPC-NOW project.
In this post, we will introduce the innovative architecture, key features, and major workflow of this open-source project on AWS, and further illustrate its benefits through a real-world customer success story.
AWS ARCHITECTURE AND FEATURES
This project offers an open-source cluster management tool called “hpcopr”, a local command-line interface (CLI) tool, to meet customer requirements. In developing this tool, we collaborated closely with AWS to distill our past work experience into templates for deploying HPC clusters. This templating approach abstracts away the underlying cloud platform resources, so customers can create the clusters they require with a single command from their own operating system. “hpcopr” now supports various x86_64 client platforms, including Microsoft Windows, GNU/Linux distributions, and macOS. Customers whose internal security policies prohibit installing third-party tools on laptops can launch an Amazon EC2 instance of the T instance family on AWS, install “hpcopr” there, and manage their clusters remotely.
Figure 1 – “HPC-NOW” on AWS Architecture
“hpcopr” provides a unified interface with the following features:
- Multi-cluster management: Users can manage multiple clusters across different AWS Regions and Availability Zones, fully utilizing the scalability and flexibility offered by AWS. “HPC-NOW” also provides utilities for accounting and logging purposes.
- Full lifecycle management of clusters: Users can easily import cloud credentials, initialize clusters, scale them up or down, scale out or in, hibernate or resume, and terminate clusters as needed.
- Cluster connectivity: Users can quickly connect to their cloud clusters via Secure Shell (SSH), NICE DCV, and Remote Desktop Protocol (RDP) without manually managing keys or passwords.
- Data Management: Users can directly transfer their data from local devices to the cloud or from the cloud to their local file systems. “HPC-NOW” supports cloud file systems such as Amazon EFS and object storage services such as Amazon S3.
- Application Management: “HPC-NOW” offers a built-in application repository with packages for HPC applications across different domains and industries. These packages include compilers, mathematical libraries, and scientific/engineering software such as LAMMPS, WRF, WPS, GROMACS, and OpenFOAM, among others.
- Cluster User Management: Every cloud-based cluster supports multiple users. With “hpcopr”, administrators can perform create, read, update, and delete (CRUD) operations on cluster users.
- HPC Job Management: “HPC-NOW” integrates a job scheduler within each cloud-based cluster, so users can submit, list, and cancel their jobs or workloads through “hpcopr”.
THE JOURNEY ON AWS WITH HPC-NOW
AWS offers high performance, availability, scalability, and quality of service, enabling HPC users to overcome the performance and scalability bottlenecks of on-premises solutions. This makes AWS an excellent choice for HPC.
With HPC-NOW, users can embark on a seamless HPC journey on AWS.
I. The Workflow
The main workflow of “HPC-NOW” on AWS is as follows:
- Install “HPC-NOW” on the local x86_64 device (or an EC2 instance)
- Import the AWS credentials to the local registry using `hpcopr new-cluster`
- Initialize the registered cluster and check the connectivity with `hpcopr init`
- Deploy HPC applications and upload data using `hpcopr appman` and `hpcopr dataman`
- Run and manage HPC workloads with `hpcopr jobman`
- Process results and relevant data using `hpcopr rdp` and `hpcopr ssh`
- Deprovision the cluster and resources with `hpcopr destroy`
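The steps above can be summarized as a dry-run checklist. The following Python sketch only prints the subcommands named in this post. It does not invoke them, and it deliberately leaves out arguments and options, which this post does not cover. The names `WORKFLOW` and `render_checklist` are illustrative and are not part of “HPC-NOW”.

```python
# Dry-run checklist for the workflow described above.
# Assumption: only the subcommand names come from the post; any arguments or
# options they require are intentionally omitted and must be looked up in the
# HPC-NOW documentation.
WORKFLOW = [
    ("Install HPC-NOW on the local x86_64 device (or an EC2 instance)", []),
    ("Import AWS credentials into the local registry", ["hpcopr new-cluster"]),
    ("Initialize the cluster and check connectivity", ["hpcopr init"]),
    ("Deploy HPC applications and upload data",
     ["hpcopr appman", "hpcopr dataman"]),
    ("Run and manage HPC workloads", ["hpcopr jobman"]),
    ("Process results and relevant data", ["hpcopr rdp", "hpcopr ssh"]),
    ("Deprovision the cluster and resources", ["hpcopr destroy"]),
]


def render_checklist(workflow=WORKFLOW):
    """Return the workflow as numbered text lines, one command per line."""
    lines = []
    for number, (description, commands) in enumerate(workflow, start=1):
        lines.append(f"{number}. {description}")
        for command in commands:
            lines.append(f"   $ {command}")
    return lines


if __name__ == "__main__":
    # Print only; nothing is executed against AWS.
    print("\n".join(render_checklist()))
```

Keeping the order in a single list makes it easy to see that `hpcopr destroy` comes last, after results have been retrieved.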
II. The AWS Resources
After a cluster is created on AWS, it typically includes the following resources:
- 1 VPC and 2 subnets, one for public access, the other for private resources
- 1 internet gateway (IGW) and 2 route tables along with associations
- 5 security groups to provide network security controls for different resources
- 3 EC2 instances for managing the core infrastructure of the cluster
- 2 Amazon EFS file systems, one for applications, the other for HPC data
- 1 S3 bucket for data storage and transfer
- 2 AWS IAM entities for securely authorizing operations
- Several EC2 instances with Amazon EBS volumes as the compute node(s) of the cluster. The number of compute node(s) can be pre-configured.
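To make the footprint concrete, the list above can be written down as a small inventory and tallied for a given number of compute nodes. This is only a sketch based on the typical counts quoted in this post. Actual deployments may differ, route table associations and EBS volumes are not counted, and the names `FIXED_RESOURCES` and `expected_resources` are hypothetical.

```python
# Typical per-cluster resources, taken from the list above.
# Assumption: these are the "typical" counts from the post, not guaranteed values.
FIXED_RESOURCES = {
    "VPC": 1,
    "subnet": 2,
    "internet gateway": 1,
    "route table": 2,
    "security group": 5,
    "management EC2 instance": 3,
    "EFS file system": 2,
    "S3 bucket": 1,
    "IAM entity": 2,
}


def expected_resources(compute_nodes):
    """Return the expected inventory for a cluster with the given node count."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    inventory = dict(FIXED_RESOURCES)
    inventory["compute EC2 instance"] = compute_nodes
    return inventory


def total_resources(compute_nodes):
    """Sum all counted resources, fixed ones plus compute nodes."""
    return sum(expected_resources(compute_nodes).values())


if __name__ == "__main__":
    for name, count in expected_resources(4).items():
        print(f"{count:>3} x {name}")
    print("total:", total_resources(4))
```

A tally like this can serve as a quick checklist when reviewing what a cluster has created in an account before and after deprovisioning.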
CUSTOMER CASE AND BENEFITS
Computational Fluid Dynamics (CFD) has found widespread adoption across various research domains, including aerodynamics, hydraulics, energy, geoscience, and marine sciences. One research team embarked on a project focused on analyzing and predicting the hydrodynamics of waves.
For this project, the researchers leveraged OpenFOAM, an open-source CFD solution with specialized solvers, to simulate and examine the hydrodynamic processes. The simulations demanded large-scale computing power, involving thousands of CPU cores, and high-speed capabilities that exceeded the capacity of their on-premises HPC cluster.
By utilizing “HPC-NOW” on AWS, the customer could scale their computing resources tenfold, achieving higher resolution simulations. This approach not only reduced the runtime by 50% but also eliminated the costs associated with purchasing and maintaining the physical clusters.
SUMMARY
This post introduced “HPC-NOW”, an open-source, integrated solution that allows HPC practitioners to efficiently manage and scale their HPC workloads on AWS. Developed by Shanghai HPC-NOW Technologies Co., Ltd in close partnership with AWS, this solution streamlines the process of adopting the high performance, scalability, elasticity, and quality of service offered by the AWS Cloud.
For more information about the HPC-NOW project, please refer to the project homepage.
HPC-NOW – AWS Partner Spotlight
HPC-NOW is an AWS Technology Partner focused on providing open-source high-performance computing solutions for scientific and engineering research. It aims to bring the strengths of the AWS Cloud to HPC end users across industries, from bioscience, energy and power, chemistry, and atmosphere and environment to aeronautics and aerodynamics, deep learning and AI, and more.