Posted On: Oct 17, 2023

We are pleased to announce the preview of ml.p5.48xlarge instances for deploying machine learning (ML) models for real-time and asynchronous inference on Amazon SageMaker.

With 80 GB of memory per NVIDIA H100 Tensor Core GPU (640 GB total across 8 GPUs), 30 TB of local NVMe SSD storage, 192 vCPUs, and 2 TiB of instance memory, ml.p5.48xlarge instances are built to deliver high-performance machine learning inference for compute-intensive AI workloads such as question answering, code generation, video and image generation, and speech recognition.
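As a minimal sketch of what hosting on these instances can look like, the example below uses the SageMaker Python SDK to deploy a model to a real-time endpoint backed by ml.p5.48xlarge. The container image URI, model artifact location, IAM role, and endpoint name are placeholders, not values from this announcement.

```python
import sagemaker
from sagemaker.model import Model

# Placeholder values -- replace with your own container image, model
# artifact, and SageMaker execution role before running.
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"

model = Model(
    image_uri="<inference-container-image-uri>",
    model_data="s3://<your-bucket>/model.tar.gz",
    role=role,
    sagemaker_session=sagemaker.Session(),
)

# Deploy to a real-time endpoint backed by a single ml.p5.48xlarge instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p5.48xlarge",
    endpoint_name="p5-realtime-endpoint",
)

# Invoke the endpoint, then clean up when finished:
# result = predictor.predict(payload)
# predictor.delete_endpoint()
```

For asynchronous inference, the same deploy call also accepts an async_inference_config (sagemaker.async_inference.AsyncInferenceConfig) pointing at an S3 output location.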

The ml.p5.48xlarge instances are now available on SageMaker in the US East (N. Virginia) and US West (Oregon) AWS Regions.

To get access to the preview, request a limit increase through AWS Service Quotas. For pricing information on these instances, visit the Amazon SageMaker pricing page. For more information on deploying models with SageMaker, see the model deployment overview and documentation. To learn more about P5 instances in general, visit the Amazon EC2 P5 product page.
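The quota increase can also be requested programmatically. The sketch below uses boto3 and the Service Quotas API; the quota name matched for ml.p5.48xlarge endpoint usage is an assumption, so confirm the exact quota name and code in the Service Quotas console for your account and Region.

```python
import boto3

# The preview requires a quota increase for ml.p5.48xlarge endpoint usage.
quotas = boto3.client("service-quotas", region_name="us-east-1")

# Look up the quota code by name; the matching strings below are assumptions.
paginator = quotas.get_paginator("list_service_quotas")
quota_code = next(
    (
        quota["QuotaCode"]
        for page in paginator.paginate(ServiceCode="sagemaker")
        for quota in page["Quotas"]
        if "ml.p5.48xlarge" in quota["QuotaName"]
        and "endpoint" in quota["QuotaName"].lower()
    ),
    None,
)

if quota_code:
    # Submit the limit increase request and print its status.
    response = quotas.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota_code,
        DesiredValue=1.0,
    )
    print(response["RequestedQuota"]["Status"])
else:
    print("Quota not found; request the increase from the Service Quotas console.")
```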