AWS HPC Blog

Job queue snapshots: see what’s at the head of your queues in AWS Batch


AWS Batch just grew a neat new feature: job queue snapshots, which show you what's at the head of your queues. This gives you the visibility you need to manage throughput in a dynamic environment with competing priorities, across multiple queues and workloads. Today we give you the inside scoop on how this feature works, especially when you're using fair share scheduling.


Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more across your nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.