AWS HPC Blog
Tag: Slurm
Call for participation: HPC tutorial series from the HPCIC
Interested in getting hands-on experience with cutting-edge HPC tools? Check out this blog post on an upcoming virtual training series from LLNL and AWS. Learn emerging technologies from the experts this August.
Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances
Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.
Build and deploy a 1 TB/s file system in under an hour
Want to set up a high-speed shared file system for your HPC or AI workloads in under an hour? This post shows you how.
Dynamic HPC budget control using a core-limit approach with AWS ParallelCluster
Balancing fixed budgets with fluctuating HPC needs is challenging. Discover a customizable solution for automatically setting weekly resource limits based on previous spending.
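As a rough illustration of the idea, the sketch below turns the unspent part of a budget into a weekly core cap. It is not the post's implementation: the function name, its parameters, and the flat cost-per-core-hour model are all assumptions made for this example, and applying the resulting limit to the scheduler is left out.

```python
import math


def weekly_core_limit(total_budget, spent_so_far, weeks_remaining,
                      cost_per_core_hour, hours_per_week=168):
    """Hypothetical helper: cores that can run all week within the budget.

    Assumes a single flat price per core-hour, which real clusters with
    mixed instance types will not have.
    """
    if weeks_remaining <= 0:
        raise ValueError("weeks_remaining must be positive")
    if cost_per_core_hour <= 0:
        raise ValueError("cost_per_core_hour must be positive")

    # Spread whatever is left evenly over the remaining weeks.
    remaining_budget = max(total_budget - spent_so_far, 0.0)
    weekly_budget = remaining_budget / weeks_remaining

    # Convert the weekly budget into core-hours, then into a core count
    # that could run continuously for the whole week.
    weekly_core_hours = weekly_budget / cost_per_core_hour
    return math.floor(weekly_core_hours / hours_per_week)


if __name__ == "__main__":
    # Half of a 52,000 budget spent, 26 weeks left, 0.05 per core-hour.
    print(weekly_core_limit(52_000, 26_000, 26, 0.05))
```

Recomputing the limit each week from actual spending means a quiet week automatically frees up capacity for a busier one later.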
Slurm REST API in AWS ParallelCluster
Looking to integrate AWS ParallelCluster into an automated workflow? This post shows how to submit and monitor jobs programmatically with the Slurm REST API (code examples included).
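For a sense of what a programmatic submission looks like, here is a minimal sketch that builds (but does not send) a job-submission request for `slurmrestd` using only the Python standard library. The host name, port, API version string, and job fields are assumptions for illustration; the payload schema varies between Slurm REST API versions, so check the OpenAPI specification served by your own `slurmrestd` before relying on any field name.

```python
import json
import urllib.request


def build_submit_request(base_url, api_version, user, token,
                         script, job_name, partition, workdir):
    """Hypothetical helper: build a POST request for the job-submit endpoint."""
    payload = {
        "script": script,
        "job": {
            "name": job_name,
            "partition": partition,
            "current_working_directory": workdir,
            # Assumption: some API versions expect a list of KEY=VALUE
            # strings here, others expect an object.
            "environment": ["PATH=/usr/bin:/bin"],
        },
    }
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/job/submit"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # slurmrestd authenticates with a user name and a JWT token.
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_submit_request(
        "https://head-node.example.internal:8082",  # placeholder endpoint
        "v0.0.39",                                  # assumed API version
        "alice", "REPLACE_WITH_JWT",
        "#!/bin/bash\nhostname\n",
        "demo-job", "queue1", "/home/alice",
    )
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; monitoring a job is then typically a GET against the job endpoint for the returned job ID, with the same two authentication headers.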
Introducing login nodes in AWS ParallelCluster
AWS ParallelCluster 3.7 now supports adding login nodes to your cluster, out of the box. Here, we’ll show you how to set this up, and highlight some important tunable options for tweaking the experience.
Financial services industry HPC migrations using AWS ParallelCluster with Slurm
In this post, we’ll walk you through how banks and other financial services firms migrate or burst their grid workloads onto AWS using AWS ParallelCluster and the Slurm scheduler.
Introducing a community recipe library for HPC infrastructure on AWS
Today we’re showing you our community library of HPC Recipes for AWS. It’s a public GitHub repository that will help you achieve feature-rich, reliable HPC deployments ready to run your workloads, no matter where you’re starting from.
Customize Slurm settings with AWS ParallelCluster 3.6
With AWS ParallelCluster 3.6, you can directly specify Slurm settings in the cluster config file, which improves reproducibility and takes another step towards self-documenting HPC infrastructure.
Multiple Availability Zones now supported in AWS ParallelCluster 3.4
In AWS ParallelCluster 3.4, you can now build HPC clusters that span multiple Amazon EC2 Availability Zones. In this post, we describe how the new feature works, how to use it, and some implications for cluster design that it raises.