AWS Public Sector Blog

The Evolution of High Performance Computing

A guest blog by Jeff Layton, Principal Tech, AWS Public Sector

The High Performance Computing (HPC) world is evolving rapidly. New workloads, such as pattern recognition, video and text processing, speech and facial recognition, deep learning, machine learning, and genomic sequencing, are being run on HPC systems. The motivations behind this evolution are both economic and technical: as HPC systems become more powerful, more agile, and less costly, they can be applied to problems that never before had access to high-scale, low-cost infrastructure.

The cloud has accelerated this evolution because it is scalable and elastic, allowing self-service provisioning of anywhere from one to thousands of processors in minutes. As a result, HPC users are coming to AWS with new and expanding application requirements, and they are seeing reduced time-to-results, faster deployment, greater architectural flexibility, and lower costs. Cloud computing also keeps HPC moving at the pace of computing innovation, as users benefit directly from advances in microprocessors, GPUs, networking, and storage.

The cloud and the evolving HPC world

The HPC world's need for more processing capability is what drives HPC system development. The current dominant HPC architecture, the cluster, arose when a common hardware architecture and operating system delivered price-performance far beyond proprietary systems. Once clusters built from commodity processors began doing production work for a number of companies and labs, their use in HPC exploded.

Clusters have come a long way and have greatly increased access to HPC resources at an affordable price, for both embarrassingly parallel applications and tightly coupled applications.

Issues with traditional HPC fixed architectures

The HPC cluster is a relatively fixed architecture: a set of servers (nodes), each with little or no internal storage, connected by a dedicated network and managed by software tools that handle user requests for resources. Once a cluster is in place, changes such as adding nodes, upgrading processors, adding node-local storage, or altering the network topology or technology are rare; the vast majority of dedicated cluster systems never change architecture at all.

The rise of the Hadoop architecture, which addresses a large class of HPC problems, makes this inflexibility an even greater challenge. The Hadoop (MapReduce) architecture calls for nodes with large amounts of local storage and uses only TCP/IP networking, whereas the typical on-premises HPC node carries the smallest, least expensive drive that is still reliable. Customers therefore often procure a separate system designed specifically for Hadoop workloads, leaving them with two HPC architectures with conflicting configurations. This is unnecessary when cloud computing is the platform, because both approaches rely on commodity systems, dynamically created clusters, and software stacks purpose-built for the needs of a particular problem.
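
For example, here is a rough sketch of what creating a transient Hadoop cluster on demand might look like using the boto3 SDK and Amazon EMR. The release label, instance types, node count, and S3 log bucket below are placeholders, and in practice the Hadoop job itself would be submitted as EMR steps.

```python
# A minimal sketch: spin up a transient Hadoop cluster on Amazon EMR
# instead of maintaining a second, Hadoop-specific on-premises system.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="transient-hadoop-cluster",
    ReleaseLabel="emr-5.36.0",                  # placeholder EMR release
    Applications=[{"Name": "Hadoop"}],
    LogUri="s3://example-bucket/emr-logs/",     # placeholder log bucket
    Instances={
        "MasterInstanceType": "m5.xlarge",      # placeholder instance types
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 10,                    # placeholder node count
        # With nothing to keep alive, the cluster shuts itself down as soon
        # as its submitted steps (omitted in this sketch) finish running.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Started cluster:", response["JobFlowId"])
```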

The cloud lets you move beyond the idea that HPC is only about clusters and that every application must adapt to that model. If you have a new architecture in mind for your application or your workflow, you can create it in the cloud quickly and easily.

Do you want to use a combination of containers and microservices for your application? The AWS Cloud allows you to construct what you need with some very simple code. If the architecture doesn’t work as well as you had hoped, you simply turn the system off and stop paying for it.
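
To give a sense of how little code this can take, here is a rough sketch using the boto3 SDK to launch a handful of experimental compute nodes and then tear them down again. The AMI ID, instance type, and node count are placeholders for whatever your own experiment needs.

```python
# A minimal sketch: provision a small experimental cluster, try out the
# architecture, and terminate it so you stop paying the moment you are done.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a handful of compute nodes for the experiment.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder AMI
    InstanceType="c5.4xlarge",                  # placeholder instance type
    MinCount=4,
    MaxCount=4,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "project", "Value": "hpc-experiment"}],
    }],
)
instance_ids = [i["InstanceId"] for i in response["Instances"]]

# ... deploy containers/microservices, run the workload, measure results ...

# If the architecture doesn't pan out, shut it down and stop paying for it.
ec2.terminate_instances(InstanceIds=instance_ids)
```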

Learn more about HPC with AWS in the video below.

https://www.youtube.com/watch?v=g8LnNXtOONk

In future blogs, I’ll discuss some of the pain points of HPC beyond architectural rigidity and how the cloud addresses them. Stay tuned! In the meantime, learn more about HPC with AWS here: https://aws.amazon.com/hpc/