AWS HPC Blog

Tag: Machine Learning

Deploying generative AI applications with NVIDIA NIMs on Amazon EKS

Deploying generative AI applications with NVIDIA NIMs on Amazon EKS

Learn how to deploy AI models at scale with @AWS using NVIDIA’s NIM and Amazon EKS! This step-by-step guide shows you how to create a GPU cluster for inference. Don’t miss part 1 of this 2-part blog series!

Large scale training with NeMo Megatron on AWS ParallelCluster using P5 instances

Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.