Introducing Amazon SageMaker ml.p3dn.24xlarge instances, optimized for distributed machine learning with up to 4x the network bandwidth of ml.p3.16xlarge instances

Posted On: Oct 4, 2019

Amazon SageMaker now supports ml.p3dn.24xlarge, the most powerful P3 instance optimized for machine learning applications. This instance provides faster networking, which helps remove data transfer bottlenecks and improves GPU utilization to deliver maximum performance for training deep learning models.

The ml.p3dn.24xlarge instances provide up to 100 Gbps of networking throughput, 96 custom Intel® Xeon® Scalable (Skylake) vCPUs, 8 NVIDIA® V100 Tensor Core GPUs with 32 GB of memory each, a 300 GB/s NVLink GPU interconnect, and 1.8 TB of local NVMe-based SSD storage. Compared to the next largest P3 instance, ml.p3.16xlarge, the 4x increase in network throughput (100 Gbps versus 25 Gbps), combined with faster processors and local NVMe-based SSD storage, enables developers to efficiently distribute their machine learning training jobs across several ml.p3dn.24xlarge instances and remove data transfer and preprocessing bottlenecks.

The following table compares Amazon SageMaker ml.p3dn.24xlarge instances with the existing Amazon SageMaker ML P3 instances.

ML Instance Type	GPUs - Tesla V100	GPU Peer to Peer	GPU Memory (GB)	vCPUs	Memory (GB)	Network Bandwidth	EBS Bandwidth	Local Instance Storage
ml.p3.2xlarge	1	N/A	16	8 (Broadwell)	61	Up to 10 Gbps	1.5 Gbps	N/A
ml.p3.8xlarge	4	NVLink	64	32 (Broadwell)	244	10 Gbps	7 Gbps	N/A
ml.p3.16xlarge	8	NVLink	128	64 (Broadwell)	488	25 Gbps	14 Gbps	N/A
ml.p3dn.24xlarge	8	NVLink	256	96 (Skylake)	768	100 Gbps	14 Gbps	2 x 900 GB NVMe SSD
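As a quick check of the "4x" network figure and the other headline numbers, the ratios between ml.p3dn.24xlarge and ml.p3.16xlarge can be computed directly from the table values. This is a minimal illustrative sketch only: the P3Instance class and the ratios helper are hypothetical names introduced here, not part of any AWS SDK, and the numbers are copied from the table above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class P3Instance:
    """Hypothetical record holding one row of the comparison table (values copied from it)."""
    name: str
    gpus: int
    gpu_memory_gb: int
    vcpus: int
    memory_gb: int
    network_gbps: int


# Rows taken from the table above.
P3_16XLARGE = P3Instance("ml.p3.16xlarge", 8, 128, 64, 488, 25)
P3DN_24XLARGE = P3Instance("ml.p3dn.24xlarge", 8, 256, 96, 768, 100)


def ratios(new: P3Instance, old: P3Instance) -> Dict[str, float]:
    """Return new/old for each numeric resource (hypothetical helper)."""
    return {
        "network": new.network_gbps / old.network_gbps,
        "gpu_memory": new.gpu_memory_gb / old.gpu_memory_gb,
        "vcpus": new.vcpus / old.vcpus,
        "memory": new.memory_gb / old.memory_gb,
    }


# Per-GPU memory and total local NVMe storage quoted in the text.
GPU_MEMORY_PER_GPU_GB = P3DN_24XLARGE.gpu_memory_gb // P3DN_24XLARGE.gpus  # 256 / 8 = 32
LOCAL_NVME_GB = 2 * 900  # 2 x 900 GB = 1800 GB, i.e. 1.8 TB

if __name__ == "__main__":
    for resource, ratio in ratios(P3DN_24XLARGE, P3_16XLARGE).items():
        print(f"{resource}: {ratio:.2f}x")
    # network: 4.00x, gpu_memory: 2.00x, vcpus: 1.50x, memory: 1.57x
```

The GPU count is the same (8) on both instances, so the gains come from doubled GPU memory, 1.5x the vCPUs, and the 4x network bandwidth that matters most for multi-instance training.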

Amazon SageMaker ml.p3dn.24xlarge instances are available in the US East (N. Virginia) and US West (Oregon) AWS Regions. With these instances, customers can use the 1.8 TB of local NVMe-based SSD storage, eliminating the need to create and pay for additional ML storage volumes. Visit the Amazon SageMaker documentation to learn more about using local NVMe-based SSD storage on this instance type. Visit the P3 page to learn more about how AWS customers are using P3 instances.

