Amazon EC2 P4d Instances
Highest performance for ML training and HPC applications in the cloud
Amazon EC2 P4d instances deliver the highest performance for machine learning (ML) training and high performance computing (HPC) applications in the cloud. P4d instances are powered by the latest NVIDIA A100 Tensor Core GPUs and deliver industry-leading high throughput and low latency networking. These instances are the first in the cloud to support 400 Gbps instance networking. P4d instances provide up to 60% lower cost to train ML models, including an average of 2.5x better performance for deep learning models compared to previous generation P3 and P3dn instances.
Amazon EC2 P4d instances are deployed in hyperscale clusters called EC2 UltraClusters that are comprised of the highest performance compute, networking, and storage in the cloud. Each EC2 UltraCluster is one of the most powerful supercomputers in the world, enabling customers to run their most complex multi-node ML training and distributed HPC workloads. Customers can easily scale from a few to thousands of NVIDIA A100 GPUs in the EC2 UltraClusters based on their ML or HPC project needs.
Researchers, data scientists, and developers can leverage P4d instances to train ML models for use cases such as natural language processing, object detection and classification, and recommendation engines, as well as run HPC applications such as pharmaceutical discovery, seismic analysis, and financial modeling. Unlike on-premises systems, customers can access virtually unlimited compute and storage capacity, scale their infrastructure based on business needs, and spin up a multi-node ML training job or a tightly coupled distributed HPC application in minutes, without any setup or maintenance costs.
High Scale ML Training and HPC with EC2 P4d UltraClusters
EC2 UltraClusters of P4d instances combine high performance compute, networking, and storage into one of the most powerful supercomputers in the world. Each EC2 UltraCluster of P4d instances comprises more than 4,000 of the latest NVIDIA A100 GPUs, petabit-scale non-blocking networking infrastructure, and high throughput, low latency storage with FSx for Lustre. Any ML developer, researcher, or data scientist can spin up P4d instances in EC2 UltraClusters to get supercomputer-class performance with a pay-as-you-go usage model and run their most complex multi-node ML training and HPC workloads.
For questions or assistance with EC2 UltraClusters, request help.
Reduce ML training time from days to minutes
With the latest generation NVIDIA A100 Tensor Core GPUs, each Amazon EC2 P4d instance delivers on average 2.5x better deep learning performance compared to previous generation P3 instances. EC2 UltraClusters of P4d instances enable everyday developers, data scientists, and researchers to run their most complex ML and HPC workloads by giving access to supercomputing-class performance without any upfront costs or long-term commitments. The reduced training time with P4d instances boosts productivity, enabling developers to focus on their core mission of building ML intelligence into business applications.
Run the most complex multi-node ML training with high efficiency
Developers can seamlessly scale to thousands of GPUs with EC2 UltraClusters of P4d instances. High throughput, low latency networking with support for 400 Gbps instance networking, Elastic Fabric Adapter (EFA), and GPUDirect RDMA technology helps rapidly train ML models using scale-out/distributed techniques. The NVIDIA Collective Communications Library (NCCL) runs over EFA to scale to thousands of GPUs, and GPUDirect RDMA technology enables low latency GPU to GPU communication between P4d instances.
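As a rough illustration of how a multi-node job might be started on such a cluster, the Python sketch below assembles a `torchrun` command line together with environment variables that are commonly set so that NCCL communicates over EFA. It is a sketch under assumptions, not a prescribed AWS procedure: the script name `train.py`, the head-node address `10.0.0.10`, and the port are hypothetical, and the code assumes the EFA software and the AWS OFI NCCL plugin are already installed on every instance.

```python
import shlex

GPUS_PER_INSTANCE = 8  # one p4d.24xlarge exposes 8 A100 GPUs


def efa_nccl_env():
    """Environment variables often used for NCCL over EFA (assumed settings)."""
    return {
        "FI_PROVIDER": "efa",           # select the libfabric EFA provider
        "FI_EFA_USE_DEVICE_RDMA": "1",  # request GPUDirect RDMA over EFA
        "NCCL_DEBUG": "INFO",           # make NCCL log the transport it picks
    }


def world_size(num_instances):
    """Total number of GPU worker processes across the whole job."""
    return num_instances * GPUS_PER_INSTANCE


def torchrun_command(num_instances, head_addr, script="train.py", port=29500):
    """Build the torchrun argument list to run on every instance."""
    return [
        "torchrun",
        f"--nnodes={num_instances}",
        f"--nproc_per_node={GPUS_PER_INSTANCE}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={head_addr}:{port}",
        script,  # hypothetical training script
    ]


if __name__ == "__main__":
    env = " ".join(f"{key}={value}" for key, value in efa_nccl_env().items())
    print(env, shlex.join(torchrun_command(4, "10.0.0.10")))
    print("world size:", world_size(4))
```

Inside the training script, a PyTorch job would typically call `torch.distributed.init_process_group(backend="nccl")`; setting `NCCL_DEBUG=INFO` is a simple way to check in the logs whether the EFA transport was actually selected.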
Lower the infrastructure costs for ML training and HPC
Amazon EC2 P4d instances deliver up to 60% lower cost to train ML models compared to P3 instances. Additionally, P4d instances are available for purchase as Spot Instances. Spot Instances take advantage of unused EC2 instance capacity and can lower your Amazon EC2 costs significantly with up to a 90% discount from On-Demand prices. With the lower cost of ML training with P4d instances, budgets can be reallocated to build more ML intelligence into business applications.
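To make the cost arithmetic concrete, the following Python sketch estimates the bill for a training job from the number of instances, the run time, and an optional Spot discount. The On-Demand rate is the p4d.24xlarge price listed on this page; the discount values passed in are hypothetical, since "up to 90%" is an upper bound and actual Spot prices vary.

```python
ON_DEMAND_USD_PER_HOUR = 32.77  # p4d.24xlarge On-Demand price listed on this page
MAX_SPOT_DISCOUNT = 0.90        # "up to a 90% discount" is an upper bound


def training_cost(instances, hours, spot_discount=0.0,
                  hourly_rate=ON_DEMAND_USD_PER_HOUR):
    """Estimated cost in USD: instances x hours x discounted hourly rate."""
    if not 0.0 <= spot_discount <= MAX_SPOT_DISCOUNT:
        raise ValueError("spot_discount must be between 0.0 and 0.90")
    return round(instances * hours * hourly_rate * (1.0 - spot_discount), 2)


if __name__ == "__main__":
    # Four instances for ten hours, On-Demand versus an assumed 50% Spot discount.
    print("On-Demand:", training_cost(4, 10))
    print("Spot (assumed 50% off):", training_cost(4, 10, spot_discount=0.5))
```

The estimate ignores storage, data transfer, and the possibility of Spot interruptions, all of which affect the real cost of a job.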
Easily get started and scale with AWS services
AWS Deep Learning AMIs and Deep Learning Containers make it easy to deploy P4d deep learning environments in minutes because they contain the required deep learning framework libraries and tools. You can also easily add your own libraries and tools to these images. P4d instances support popular ML frameworks such as TensorFlow, PyTorch, and MXNet. Additionally, Amazon EC2 P4d instances are supported by major AWS services for ML, management, and orchestration such as Amazon SageMaker, Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Service (ECS), AWS Batch, and AWS ParallelCluster.
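One way to launch a P4d instance from a Deep Learning AMI is through the EC2 API. The Python sketch below builds the request for boto3's `run_instances` call with an EFA network interface and a placement group. Every identifier in it (the AMI ID, subnet, security group, and placement group name) is a hypothetical placeholder that must be replaced with real values from your own account, and the single-interface layout is a simplifying assumption.

```python
import json

P4D_INSTANCE_TYPE = "p4d.24xlarge"


def build_launch_params(ami_id, subnet_id, security_group_id,
                        placement_group, count=1, use_spot=False):
    """Build keyword arguments for the EC2 run_instances API call."""
    params = {
        "ImageId": ami_id,  # e.g. a Deep Learning AMI ID for your region
        "InstanceType": P4D_INSTANCE_TYPE,
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "InterfaceType": "efa",  # attach an Elastic Fabric Adapter
            "SubnetId": subnet_id,
            "Groups": [security_group_id],
        }],
    }
    if use_spot:
        params["InstanceMarketOptions"] = {"MarketType": "spot"}
    return params


def launch(params, region="us-east-1"):
    """Send the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so building params needs no AWS SDK
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**params)


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    example = build_launch_params(
        ami_id="ami-EXAMPLE",
        subnet_id="subnet-EXAMPLE",
        security_group_id="sg-EXAMPLE",
        placement_group="example-p4d-cluster",
        count=2,
    )
    print(json.dumps(example, indent=2))
```

Keeping the request construction separate from the `launch` call makes it possible to inspect or log the parameters before any billable instance is started.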
Powered by NVIDIA A100 Tensor Core GPUs
NVIDIA A100 Tensor Core GPUs deliver unprecedented acceleration at scale for ML and high performance computing (HPC). NVIDIA A100’s third generation Tensor Cores accelerate every precision workload, speeding time to insight and time to market. Each A100 GPU offers over 2.5x the compute performance compared to the previous generation V100 GPU and comes with 40 GB of high-performance HBM2 GPU memory. NVIDIA A100 GPUs use the NVSwitch GPU interconnect so that each GPU can communicate with every other GPU in the same instance at the same 600 GB/s bidirectional throughput and with single-hop latency.
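The memory figures above translate into a simple sizing check. The Python sketch below estimates how much memory a model's weights alone occupy at a given precision and whether that fits in one 40 GB GPU or in the 8 x 40 GB = 320 GB of a single p4d.24xlarge. It is a deliberately rough, illustrative estimate: it counts weights only and ignores gradients, optimizer state, and activations, which usually dominate during training.

```python
GPU_MEMORY_GB = 40      # HBM2 memory per A100 GPU
GPUS_PER_INSTANCE = 8   # A100 GPUs in one p4d.24xlarge
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weights_gb(num_params, precision="fp16"):
    """Memory taken by the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def placement(num_params, precision="fp16"):
    """Smallest unit whose total GPU memory can hold the weights."""
    size = weights_gb(num_params, precision)
    if size <= GPU_MEMORY_GB:
        return "single GPU"
    if size <= GPU_MEMORY_GB * GPUS_PER_INSTANCE:
        return "single instance"
    return "multiple instances"


if __name__ == "__main__":
    for params in (10_000_000_000, 30_000_000_000, 200_000_000_000):
        print(params, weights_gb(params), "GB ->", placement(params))
```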
High performance networking
P4d instances provide 400 Gbps networking to help customers scale out distributed workloads such as multi-node training more efficiently, with high throughput networking between P4d instances as well as between a P4d instance and storage services such as Amazon S3 and FSx for Lustre. Elastic Fabric Adapter (EFA) is a custom network interface designed by AWS to help scale ML and HPC applications to thousands of GPUs. To further reduce latency, EFA is coupled with NVIDIA GPUDirect RDMA to enable low latency GPU to GPU communication between servers with OS bypass.
High throughput, low latency storage
Customers can access petabyte-scale high throughput, low latency storage with FSx for Lustre or virtually unlimited cost-effective storage with Amazon S3 at 400 Gbps speeds. For workloads that need fast access to large datasets, each P4d instance also includes 8 TB of NVMe-based SSD storage with 16 GB/s read throughput.
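To put these throughput numbers in perspective, the Python sketch below converts the 400 Gbps network rate to gigabytes per second and computes how long it would take to move a dataset at the quoted peak rates. The results are theoretical lower bounds under the assumption of sustained peak throughput with decimal units (1 TB = 1000 GB); real transfers are slower.

```python
NETWORK_GBPS = 400            # instance networking, gigabits per second
LOCAL_NVME_READ_GB_PER_S = 16  # local NVMe read throughput, gigabytes per second
LOCAL_NVME_TB = 8             # local NVMe capacity per instance


def gbps_to_gb_per_s(gigabits_per_second):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gigabits_per_second / 8


def seconds_to_move(terabytes, gb_per_s):
    """Best-case time to move a dataset, assuming 1 TB = 1000 GB."""
    return terabytes * 1000 / gb_per_s


if __name__ == "__main__":
    network_gb_per_s = gbps_to_gb_per_s(NETWORK_GBPS)
    print("Network peak:", network_gb_per_s, "GB/s")
    print("Read all local NVMe:",
          seconds_to_move(LOCAL_NVME_TB, LOCAL_NVME_READ_GB_PER_S), "s")
    print("Fill local NVMe over the network:",
          seconds_to_move(LOCAL_NVME_TB, network_gb_per_s), "s")
```

At these peak rates, reading the full 8 TB of local storage takes about 500 seconds, and pulling 8 TB over a 400 Gbps link takes about 160 seconds.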
Built on AWS Nitro System
The P4d instances are built on the AWS Nitro System, which is a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software to deliver high performance, high availability, and high security while also reducing virtualization overhead.
Toyota Research Institute (TRI), founded in 2015, is working to develop automated driving, robotics, and other human amplification technology for Toyota.
“At TRI, we're working to build a future where everyone has the freedom to move,” said Mike Garrison, Technical Lead, Infrastructure Engineering at TRI. "The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours and we are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed."
"At TRI-AD, we're working to build a future where everyone has the freedom to move and explore with a focus on reducing vehicle injuries and fatalities using adaptive driving and smart city. Through the use of Amazon EC2 P4d instances, we were able to reduce our training time for object recognition by 40% compared to previous generation GPU instances without any modification to existing codes." said Junya Inada, Director of Automated Driving (Recognition) at TRI-AD.
Jack Yan, Senior Director of Infrastructure Engineering at TRI-AD said, "Through the use of Amazon EC2 P4d instances, we were able to instantly reduce our cost to train compared to previous generation GPU instances enabling us to increase the number of teams working on model training. The networking improvements in P4d allowed us to efficiently scale to dozens of instances which gave us significant agility to rapidly optimize, retrain, and deploy models in test cars or simulation environments for further testing."
GE Healthcare is a leading global medical technology and digital solutions innovator. GE Healthcare enables clinicians to make faster, more informed decisions through intelligent devices, data analytics, applications and services, supported by its Edison intelligence platform.
“At GE Healthcare, we provide clinicians with tools that help them aggregate data, apply AI and analytics to that data and uncover insights that improve patient outcomes, drive efficiency and eliminate errors,” said Karley Yoder, VP & GM, Artificial Intelligence. “Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results. Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes, while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”
OmniSci is a pioneer in accelerated analytics. The OmniSci platform is used in business and government to find insights in data beyond the limits of mainstream analytics tools.
“At OmniSci, we’re working to build a future where Data Science and Analytics converge to break down and fuse data silos. Customers are leveraging their massive amounts of data that may include location and time to build a full picture of not only what is happening, but when and where through granular visualization of spatial temporal data. Our technology enables seeing both the forest and the trees,” said Ray Falcione, VP of US Public Sector at OmniSci. “Through the use of Amazon EC2 P4d instances, we were able to reduce the cost to deploy our platform significantly compared to previous generation GPU instances, thus enabling us to cost-effectively scale massive data sets. The networking improvements on A100 have increased our efficiencies in how we scale to billions of rows of data and enabled our customers to glean insights even faster.”
Zenotech Ltd is redefining engineering online through the use of HPC Clouds delivering on demand licensing models together with extreme performance benefits by leveraging GPUs.
“At Zenotech we are developing the tools to enable designers to create more efficient and environmentally friendly products. We work across industries and our tools provide greater product performance insight through the use of large scale simulation,” said Jamil Appa, Director of Zenotech. “The use of AWS P4d instances enables us to run our simulations 3.5 times faster compared to the previous generation of GPUs. This speedup cuts our time to solve significantly, allowing our customers to get designs to market faster or to do higher fidelity simulations than were previously possible.”
Aon is a leading global professional services firm providing a broad range of risk, retirement and health solutions. Aon PathWise is a GPU-based and scalable HPC risk management solution that insurers and re-insurers, banks, and pension funds can use to address today’s key challenges such as hedge strategy testing, regulatory and economic forecasting, and budgeting.
“At PathWise Solutions Group LLC, our product allows insurance companies, reinsurers and pension funds to access next generation technology to rapidly solve today’s key insurance challenges such as machine learning, hedge strategy testing, regulatory and financial reporting, business planning and economic forecasting, and new product development and pricing,” said Peter Phillips, President and CEO, PathWise Solutions Group. “Through the use of Amazon EC2 P4d instances we are able to deliver amazing improvements in speed for single and double precision calculations over previous generation GPU instances for the most demanding calculations, allowing a new range of calculations and forecasting to be done by clients for the very first time. Speed matters,” says Phillips, “and we continue to deliver meaningful value and the latest technology to our customers thanks to the new instances from AWS.”
Comprised of radiology and AI experts, Rad AI builds products that maximize radiologist productivity, ultimately making healthcare more widely accessible and improving patient outcomes.
“At Rad AI, our mission is to increase access to and quality of healthcare, for everyone. With a focus on medical imaging workflow, Rad AI saves radiologists time, reduces burnout, and enhances accuracy,” said Doktor Gurson, Co-founder of Rad AI. “We use AI to automate radiology workflows and help streamline radiology reporting. With the new EC2 P4d instances, we’ve seen faster inference and the ability to train models 2.4x faster, with higher accuracy than on previous generation P3 instances. This allows faster, more accurate diagnosis, and greater access to high quality radiology services provided by our customers across the US.”
| Instance Size | vCPUs | Instance Memory (GB) | GPUs (A100) | Network Bandwidth | GPUDirect RDMA | GPU Peer to Peer | Local Instance Storage | EBS Bandwidth | On-Demand Price/hr | 1-yr Reserved Instance Effective Hourly * | 3-yr Reserved Instance Effective Hourly * |
|---|---|---|---|---|---|---|---|---|---|---|---|
| p4d.24xlarge | 96 | 1152 | 8 | 400 Gbps ENA and EFA | Yes | 600 GB/s NVSwitch | 8 x 1 TB NVMe SSD | 19 Gbps | $32.77 | $19.22 | $11.57 |
Amazon EC2 P4d instances are available in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) regions. Customers can purchase P4d instances as On-Demand Instances, Reserved Instances, Spot Instances, or as part of a Savings Plan.
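The listed prices can be compared directly. The Python sketch below computes the saving of each Reserved Instance rate relative to On-Demand and the effective price per GPU-hour, using only the p4d.24xlarge figures from the table above. Prices change over time and by region, so treat the numbers as a snapshot of this page rather than current rates.

```python
PRICES_USD_PER_HOUR = {
    "on_demand": 32.77,
    "reserved_1yr": 19.22,   # effective hourly rate from the table
    "reserved_3yr": 11.57,   # effective hourly rate from the table
}
GPUS_PER_INSTANCE = 8


def savings_vs_on_demand(option):
    """Fractional saving of a purchase option relative to On-Demand."""
    return 1 - PRICES_USD_PER_HOUR[option] / PRICES_USD_PER_HOUR["on_demand"]


def price_per_gpu_hour(option):
    """Hourly instance price divided across its 8 GPUs."""
    return PRICES_USD_PER_HOUR[option] / GPUS_PER_INSTANCE


if __name__ == "__main__":
    for option in PRICES_USD_PER_HOUR:
        print(f"{option}: {savings_vs_on_demand(option):.1%} saving, "
              f"${price_per_gpu_hour(option):.2f} per GPU-hour")
```

With these figures, the 1-year and 3-year Reserved Instance rates come to roughly 41% and 65% below On-Demand, respectively.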
Getting started with Amazon EC2 P4d instances for machine learning
Using Amazon SageMaker
Using AWS Deep Learning AMI or Deep Learning Containers
Getting started with Amazon EC2 P4d instances for high performance computing
Amazon EC2 P4d instances are an ideal platform to run engineering simulations, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other GPU-based high performance computing (HPC) workloads. HPC applications often require high network performance, fast storage, large amounts of memory, high compute capabilities, or all of the above. P4d instances support Elastic Fabric Adapter (EFA), which enables HPC applications that use the Message Passing Interface (MPI) to scale to thousands of GPUs. AWS Batch and AWS ParallelCluster enable HPC developers to quickly build and scale distributed HPC applications.