Posted On: Jan 27, 2023

We are excited to announce the preview of ml.p4de.24xlarge instances for deploying machine learning (ML) models for inference on Amazon SageMaker.

The ml.p4de.24xlarge instances have 80 GB of memory per GPU (640 GB total) along with support for up to 8 TB of local NVMe SSD storage. This enables high-performance inference of compute-intensive workloads on SageMaker, such as large language models and generative AI models. These instances have 96 vCPUs, 1,152 GiB of instance memory, and 400 Gbps of network bandwidth.

You can use ml.p4de.24xlarge instances in US East (N. Virginia) and US West (Oregon).

To get access to the preview, request a limit increase through AWS Service Quotas. For pricing information on these instances, please visit our pricing page. For more information on deploying models with SageMaker, see the overview here and the documentation here. To learn more about P4de instances in general, please visit the P4 product page.
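As a rough sketch of what deployment on these instances looks like, the snippet below builds the request parameters you would pass to SageMaker's CreateEndpointConfig API (for example via boto3's `sagemaker.create_endpoint_config()`, after the model has been registered with CreateModel). The model, variant, and endpoint-config names are hypothetical placeholders, not values from this announcement.

```python
# Sketch of a CreateEndpointConfig request targeting ml.p4de.24xlarge.
# "my-llm-model", "my-llm-endpoint-config", and "primary" are hypothetical
# placeholders; in practice this dict is unpacked into boto3's
# sagemaker.create_endpoint_config(**endpoint_config) call.
endpoint_config = {
    "EndpointConfigName": "my-llm-endpoint-config",
    "ProductionVariants": [
        {
            "VariantName": "primary",            # hypothetical variant name
            "ModelName": "my-llm-model",         # model registered via CreateModel
            "InstanceType": "ml.p4de.24xlarge",  # instance type from this announcement
            "InitialInstanceCount": 1,
        }
    ],
}

print(endpoint_config["ProductionVariants"][0]["InstanceType"])
```

Once the endpoint config exists, a CreateEndpoint call referencing it spins up the ml.p4de.24xlarge-backed endpoint for real-time inference.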