AWS PCS now provides a production-ready Deep Learning AMI
Today, AWS Parallel Computing Service (AWS PCS) launches PCS-ready DLAMI, an AWS-maintained Amazon Machine Image built on the Deep Learning Base GPU AMI (Ubuntu 24.04). It provides a production-quality foundation for AI/ML training and high performance computing (HPC), with core infrastructure components pre-installed and tested for compatibility.
AWS PCS is a managed service that makes it easier for you to run and scale your HPC workloads and build scientific and engineering models on AWS using Slurm. You can use AWS PCS to build complete, elastic environments that integrate compute, storage, networking, and visualization tools. AWS PCS simplifies cluster operations with managed updates and built-in observability features, helping to remove the burden of maintenance. You can work in a familiar environment, focusing on your research and innovation instead of worrying about infrastructure.
The AMI inherits the operating system, NVIDIA GPU drivers, CUDA toolkit, EFA drivers, and Lustre client from the source Deep Learning Base GPU AMI, and adds PCS Agent, Slurm for PCS, and EFS utilities. It includes multiple supported Slurm versions, and the correct version activates automatically based on your cluster configuration. You can add frameworks, libraries, and application software on top to complete your environment. AWS regularly releases updated AMIs as the source DLAMI or PCS components change, providing ongoing security patches and driver updates.
AWS PCS-ready DLAMI is available for x86_64 and arm64 architectures at no additional cost in all AWS Regions where AWS PCS is available. To get started, specify a PCS-ready AMI when configuring your compute node groups. For more information, see Using PCS-ready DLAMI in the AWS PCS User Guide. For a reference cluster architecture that builds on PCS-ready DLAMI, see the awsome-distributed-ai repository on GitHub.
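As a rough illustration of specifying the AMI for a compute node group, the following Python sketch assembles a request for the AWS PCS `CreateComputeNodeGroup` API and passes it to boto3. Every identifier in the usage example (cluster name, AMI ID, subnet, launch template, instance profile ARN, Region) is a hypothetical placeholder, and the helper names `build_node_group_request` and `create_node_group` are made up for this example. Look up the actual PCS-ready DLAMI ID for your Region and architecture as described in the AWS PCS User Guide.

```python
from typing import Any


def build_node_group_request(
    cluster_id: str,
    group_name: str,
    ami_id: str,
    subnet_ids: list[str],
    launch_template_id: str,
    launch_template_version: str,
    instance_profile_arn: str,
    instance_type: str,
    min_count: int = 0,
    max_count: int = 4,
) -> dict[str, Any]:
    """Assemble keyword arguments for the PCS CreateComputeNodeGroup call.

    This only builds and sanity-checks the request; it makes no AWS calls.
    """
    if not ami_id.startswith("ami-"):
        raise ValueError(f"expected an AMI ID such as 'ami-...', got {ami_id!r}")
    if min_count < 0 or max_count < min_count:
        raise ValueError("scaling bounds must satisfy 0 <= min_count <= max_count")
    return {
        "clusterIdentifier": cluster_id,
        "computeNodeGroupName": group_name,
        "amiId": ami_id,  # the PCS-ready DLAMI ID goes here
        "subnetIds": subnet_ids,
        "purchaseOption": "ONDEMAND",
        "customLaunchTemplate": {
            "id": launch_template_id,
            "version": launch_template_version,
        },
        "iamInstanceProfileArn": instance_profile_arn,
        "scalingConfiguration": {
            "minInstanceCount": min_count,
            "maxInstanceCount": max_count,
        },
        "instanceConfigs": [{"instanceType": instance_type}],
    }


def create_node_group(request: dict[str, Any], region: str) -> dict[str, Any]:
    """Send the request to AWS PCS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    pcs = boto3.client("pcs", region_name=region)
    return pcs.create_compute_node_group(**request)
```

A call might look like this, with all values being placeholders:

```python
request = build_node_group_request(
    cluster_id="my-cluster",                      # placeholder
    group_name="gpu-nodes",                       # placeholder
    ami_id="ami-0123456789abcdef0",               # placeholder, not a real PCS-ready DLAMI ID
    subnet_ids=["subnet-0123456789abcdef0"],      # placeholder
    launch_template_id="lt-0123456789abcdef0",    # placeholder
    launch_template_version="1",
    instance_profile_arn="arn:aws:iam::111122223333:instance-profile/AWSPCS-example",  # placeholder
    instance_type="p5.48xlarge",
)
# create_node_group(request, region="us-east-1")  # uncomment to call AWS
```

Keeping the request builder separate from the API call lets you review or log the exact parameters before anything is created in your account.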