Announcing new Amazon EC2 P4d instances deployed in EC2 UltraClusters for highest performance ML training and HPC applications in the cloud

Posted On: Nov 2, 2020

We are excited to announce the availability of Amazon EC2 P4d instances, the next generation of GPU-based instances that provide the best performance for machine learning (ML) training and high performance computing (HPC) in the cloud for applications such as natural language processing, object detection and classification, seismic analysis, and genomics research. P4d instances are powered by the latest NVIDIA A100 Tensor Core GPUs and provide the first 400 Gbps instance networking in the cloud, with support for Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect RDMA (remote direct memory access) to enable efficient scale-out of multi-node ML training and HPC workloads.

Compared to previous generation P3 instances, P4d instances deliver up to 60% lower cost to train and over 2.5x better deep learning performance, with 2.5x the memory, twice the double precision floating point performance, 16x the network bandwidth, and 4x the local NVMe-based SSD storage.

P4d instances are deployed in hyperscale clusters, called EC2 UltraClusters, that provide more than 4,000 NVIDIA A100 GPUs, petabit-scale non-blocking networking infrastructure, and high throughput, low latency storage with FSx for Lustre. Each EC2 UltraCluster is one of the world’s top supercomputers, and gives everyday developers, data scientists, and researchers access to supercomputing without any setup or maintenance costs. Using EC2 UltraClusters, developers can scale their multi-node ML training or HPC applications to thousands of GPUs to solve their most complex problems, or scale down to just a few instances, paying only for the instances they use.

Announcing Amazon EC2 P4d Instances

Amazon EC2 P4d instances are built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations that enable the delivery of efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage.

P4d instances are now available in the AWS US East (N. Virginia) and US West (Oregon) Regions. They are available in a single size, p4d.24xlarge, which provides 96 vCPUs, 8 NVIDIA A100 GPUs, 1.1 TB of instance memory, 8 TB of local NVMe-based SSD storage, 400 Gbps of networking bandwidth with EFA and GPUDirect RDMA, and 19 Gbps of EBS burst bandwidth. P4d instances can be purchased On-Demand, as part of Savings Plans, as Reserved Instances, or as Spot Instances.
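To put the instance specifications above in per-GPU terms, the following minimal Python sketch divides the published figures evenly across the 8 GPUs and estimates how many instances a given GPU count requires. The names `P4D_SPEC`, `per_gpu_share`, and `instances_for_gpus` are hypothetical and used only for illustration, and the even split is an assumption made for back-of-the-envelope sizing, not a statement about how the hardware allocates resources.

```python
import math

# Figures taken from the announcement text for a single P4d instance.
P4D_SPEC = {
    "vcpus": 96,
    "gpus": 8,
    "memory_tb": 1.1,
    "local_nvme_tb": 8,
    "network_gbps": 400,
}


def per_gpu_share(spec):
    """Divide each instance-level resource evenly across the GPUs.

    This is a rough sizing aid only; it assumes an even split.
    """
    gpus = spec["gpus"]
    return {
        name: value / gpus
        for name, value in spec.items()
        if name != "gpus"
    }


def instances_for_gpus(gpu_count, gpus_per_instance=P4D_SPEC["gpus"]):
    """Return the number of instances needed to reach at least gpu_count GPUs."""
    return math.ceil(gpu_count / gpus_per_instance)


if __name__ == "__main__":
    for name, value in per_gpu_share(P4D_SPEC).items():
        print(f"{name} per GPU: {value:g}")
    print("instances for 4,000 GPUs:", instances_for_gpus(4000))
```

Under these assumptions, each GPU corresponds to 12 vCPUs, 50 Gbps of network bandwidth, and 1 TB of local NVMe storage, and a 4,000-GPU job spans 500 instances.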

Popular AWS services for ML and orchestration, such as Amazon SageMaker, Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Service (ECS), AWS ParallelCluster, and AWS Batch, will be adding support for P4d instances in the coming weeks. Customers from Fortune 500 companies to startups, including Toyota Research Institute, GE Healthcare, and Aon PathWise, participated in the preview program and are adopting P4d instances to cut training time and reduce the cost of training their ML models. To get started with Amazon EC2 P4d instances, use the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. To learn more, visit the product overview page or the product details page.
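As an illustration of the SDK route, here is a minimal sketch that uses the AWS SDK for Python (boto3) to request one P4d instance with an EFA network interface inside a placement group. It is a sketch under stated assumptions rather than a recommended configuration: the function names and every identifier passed in (AMI, subnet, security group, placement group) are hypothetical placeholders that must be replaced with resources from your own account, the placement group is assumed to already exist, and only a single EFA interface is attached for brevity. The instance type is written with its API name, `p4d.24xlarge`.

```python
from typing import Any, Dict


def build_p4d_launch_request(
    ami_id: str,
    subnet_id: str,
    security_group_id: str,
    placement_group: str,
) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 RunInstances call.

    All identifiers are supplied by the caller; none are real defaults.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": "p4d.24xlarge",
        "MinCount": 1,
        "MaxCount": 1,
        # Assumes this placement group was created beforehand.
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "SubnetId": subnet_id,
                "Groups": [security_group_id],
                # Request an Elastic Fabric Adapter instead of a standard ENI.
                "InterfaceType": "efa",
            }
        ],
    }


def launch_p4d(request: Dict[str, Any], region: str = "us-east-1") -> Dict[str, Any]:
    """Send the request to EC2. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**request)


if __name__ == "__main__":
    # Placeholder identifiers for illustration only; this prints the request
    # and does not call AWS.
    request = build_p4d_launch_request(
        ami_id="ami-EXAMPLE",
        subnet_id="subnet-EXAMPLE",
        security_group_id="sg-EXAMPLE",
        placement_group="example-placement-group",
    )
    print(request)
```

Keeping the request builder separate from the call that contacts AWS makes it possible to inspect or log the parameters before any billable instance is started.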
