Posted On: Aug 4, 2023

Amazon SageMaker training jobs now support ml.p5 instances, which are powered by NVIDIA H100 Tensor Core GPUs and purpose-built for high-performance ML training in the cloud. You can use ml.p5 instances on SageMaker to train large language models (LLMs) and diffusion models that power demanding generative AI applications, including question answering, code generation, video and image generation, and speech recognition.

ml.p5 instances feature up to 8 NVIDIA H100 Tensor Core GPUs. Compared with previous-generation GPU-based instances, P5 instances pair these GPUs with 2x higher CPU performance, 2x higher system memory, and 4x higher local storage. They provide market-leading scale-out capabilities for distributed training and tightly coupled HPC workloads, with up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFA) technology.

SageMaker Model Training supports ml.p5 instances today in the ml.p5.48xlarge size, in the AWS US East (N. Virginia) and US West (Oregon) Regions.
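
As a rough illustration of requesting this instance size, the sketch below builds the request for the SageMaker CreateTrainingJob API and submits it with boto3. It is a minimal sketch under stated assumptions, not an official sample: the helper names (build_p5_training_request, submit_training_job) are hypothetical, the image URI, role ARN, and S3 path are placeholders you would supply, and the Region check only reflects the two Regions listed in this announcement, which may have changed since.

```python
# Regions listed in this announcement: US East (N. Virginia) and US West (Oregon).
# Assumption: this list reflects the announcement date only.
ANNOUNCED_REGIONS = {"us-east-1", "us-west-2"}
P5_INSTANCE_TYPE = "ml.p5.48xlarge"


def build_p5_training_request(job_name, image_uri, role_arn, output_s3_uri,
                              region, instance_count=1,
                              max_runtime_seconds=3600):
    """Build CreateTrainingJob keyword arguments for ml.p5.48xlarge.

    Hypothetical helper for illustration; all argument values are
    caller-supplied placeholders.
    """
    if region not in ANNOUNCED_REGIONS:
        raise ValueError(
            f"{region} is not one of the Regions named in the announcement: "
            f"{sorted(ANNOUNCED_REGIONS)}"
        )
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # your training container image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,              # IAM role that SageMaker assumes
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": P5_INSTANCE_TYPE,
            "InstanceCount": instance_count,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def submit_training_job(request, region):
    """Submit the request. Needs boto3, AWS credentials, and p5 quota."""
    import boto3  # imported lazily so the request builder has no dependencies

    client = boto3.client("sagemaker", region_name=region)
    return client.create_training_job(**request)
```

Keeping the request builder separate from the submit call lets you inspect or log the request before any billable resources are launched.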

To read more about ml.p5 instances, visit the P5 instance page. To get started using ml.p5 instances, sign in to the Amazon SageMaker console. To learn more about Amazon SageMaker Model Training, visit our web page.