Amazon EC2 P5 Instances
Highest performance GPU-based instances for deep learning and HPC applications
Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, deliver the highest performance in Amazon EC2 for deep learning (DL) and high performance computing (HPC) applications. They help you accelerate your time to solution by up to 6x compared to previous-generation GPU-based EC2 instances, and reduce cost to train ML models by up to 40%. P5 instances help you iterate on your solutions at a faster pace and get to market more quickly. You can use P5 instances for training and deploying increasingly complex large language models (LLMs) and diffusion models powering the most demanding generative artificial intelligence (AI) applications. These applications include question answering, code generation, video and image generation, and speech recognition. You can also use P5 instances to deploy demanding HPC applications at scale for pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling.
To deliver these performance improvements and cost savings, P5 instances complement NVIDIA H100 Tensor Core GPUs with 2x higher CPU performance, 2x higher system memory, and 4x higher local storage as compared to previous-generation GPU-based instances. They provide market-leading scale-out capabilities for distributed training and tightly coupled HPC workloads with up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFAv2). To deliver large-scale compute at low latency, P5 instances are deployed in Amazon EC2 UltraClusters that enable scaling up to 20,000 H100 GPUs. These are interconnected with a petabit-scale nonblocking network. P5 instances in EC2 UltraClusters deliver up to 20 exaflops of aggregate compute capability—performance equivalent to a supercomputer.
Train 100B+ parameter models at scale
P5 instances can train ultra-large generative AI models at scale and deliver up to 6x the performance of previous-generation GPU-based EC2 instances.
Reduce time to solution and iterate faster
P5 instances reduce training times and time to solution from weeks to just a few days. This helps you iterate at a faster pace and get to market more quickly.
Lower your DL and HPC infrastructure costs
P5 instances deliver up to 40% savings on DL training and HPC infrastructure costs compared to previous-generation GPU-based EC2 instances.
Run distributed training and HPC with exascale compute
P5 instances provide up to 3,200 Gbps of EFAv2 networking. These instances are deployed in EC2 UltraClusters and deliver up to 20 exaflops of aggregate compute capability.
NVIDIA H100 Tensor Core GPUs
P5 instances provide up to 8 NVIDIA H100 GPUs with a total of up to 640 GB HBM3 GPU memory per instance. P5 instances support up to 900 GB/s of NVSwitch GPU interconnect (total of 3.6 TB/s bisection bandwidth in each instance), so each GPU can communicate with every other GPU in the same instance with single-hop latency.
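The per-instance figures above fit together with simple arithmetic. The following sketch is illustrative only. It assumes that the HBM3 memory is split evenly across the 8 GPUs, and it reads the 3.6 TB/s figure as four GPUs on one side of a bisection each driving 900 GB/s. Neither assumption is stated in the text.

```python
# Back-of-the-envelope check of the per-instance GPU figures quoted above.
GPUS_PER_INSTANCE = 8
TOTAL_HBM3_GB = 640
NVSWITCH_GB_PER_S_PER_GPU = 900

# Assumption: HBM3 is divided evenly across the GPUs.
hbm3_per_gpu_gb = TOTAL_HBM3_GB / GPUS_PER_INSTANCE

# Assumption: a bisection splits the GPUs into two halves of four, and
# each GPU in one half can drive its full NVSwitch bandwidth across the cut.
bisection_tb_per_s = (GPUS_PER_INSTANCE // 2) * NVSWITCH_GB_PER_S_PER_GPU / 1000

print(f"HBM3 per GPU: {hbm3_per_gpu_gb:.0f} GB")
print(f"Bisection bandwidth: {bisection_tb_per_s:.1f} TB/s")
```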
New transformer engine and DPX instructions
NVIDIA H100 GPUs have a new transformer engine that intelligently manages and dynamically chooses between FP8 and 16-bit calculations. This feature helps deliver DL training speedups on LLMs compared to previous-generation A100 GPUs. For HPC workloads, NVIDIA H100 GPUs have new DPX instructions that further accelerate dynamic programming algorithms compared to A100 GPUs.
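Dynamic programming algorithms fill a table by repeatedly combining earlier entries, often by taking a minimum or maximum over sums. To illustrate that pattern (not DPX itself), here is a minimal CPU-only sketch of Floyd-Warshall all-pairs shortest paths, a classic dynamic programming algorithm. The function name and the sample graph are hypothetical, and the code uses neither a GPU nor DPX instructions.

```python
import math


def floyd_warshall(weights):
    """Return all-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the edge weight from i to j, or math.inf if absent.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not mutated
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Core recurrence: keep the shorter of the current path
                # and the path that detours through vertex k.
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


# Hypothetical three-vertex directed graph.
graph = [
    [0, 4, math.inf],
    [math.inf, 0, 1],
    [2, math.inf, 0],
]
shortest = floyd_warshall(graph)
print(shortest[0][2])  # length of the path 0 -> 1 -> 2
```

The inner `min` over a sum is the step that repeats for every table entry, which is why hardware support for this kind of operation can matter for these workloads.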
EFAv2 networking
P5 instances deliver up to 3,200 Gbps of EFAv2 networking. EFAv2 delivers up to 50% improvement in collective communications performance for distributed training workloads. EFAv2 is also coupled with NVIDIA GPUDirect RDMA to enable low-latency GPU-to-GPU communication between servers with operating system bypass.
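To get a feel for what 3,200 Gbps per instance means for collective communication, the sketch below converts the figure to bytes per second and applies the standard bandwidth-only cost model for a ring all-reduce, in which each participant sends and receives 2(N-1)/N times the buffer size. Treat the result as a rough lower bound: the model ignores latency, protocol overhead, and overlap with compute, and it assumes each instance acts as one participant with the full link available. The 10 GB buffer and 16-instance cluster are hypothetical values, not AWS figures.

```python
def ring_allreduce_seconds(buffer_bytes, participants, bytes_per_second):
    """Bandwidth-only estimate of ring all-reduce time (ignores latency)."""
    if participants < 2:
        return 0.0
    # Each participant moves 2 * (N - 1) / N times the buffer size.
    traffic = 2 * (participants - 1) / participants * buffer_bytes
    return traffic / bytes_per_second


EFA_GBPS_PER_INSTANCE = 3200  # from the text above
bytes_per_second = EFA_GBPS_PER_INSTANCE * 1e9 / 8  # bits to bytes

# Hypothetical workload: a 10 GB buffer reduced across 16 instances.
estimate = ring_allreduce_seconds(10e9, 16, bytes_per_second)
print(f"Per-instance bandwidth: {bytes_per_second / 1e9:.0f} GB/s")
print(f"Estimated all-reduce time: {estimate * 1000:.1f} ms")
```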
High-performance storage
P5 instances support Amazon FSx for Lustre file systems so you can access data at hundreds of GB/s of throughput and millions of IOPS, as required for large-scale DL and HPC workloads. Each P5 instance also supports up to 30 TB of local NVMe SSD storage for fast access to large datasets. You can also use virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).
Second-generation EC2 UltraClusters
P5 instances are deployed in second-generation EC2 UltraClusters, which provide a network fabric that enables greater scale, fewer network hops across the cluster, and lower latency than previous-generation UltraClusters. P5 instances in UltraClusters can scale up to 20,000 H100 GPUs interconnected with a petabit-scale network and deliver up to 20 exaflops of aggregate compute capability.
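The cluster-level figures can be cross-checked against the per-instance ones. The sketch below is plain arithmetic on numbers quoted on this page (8 GPUs and 3,200 Gbps per instance, 20,000 GPUs and 20 exaflops per UltraCluster). The page does not state which numeric precision the exaflops figure refers to, so the per-GPU value is only the quotient of the two quoted numbers.

```python
MAX_GPUS = 20_000
GPUS_PER_INSTANCE = 8
AGGREGATE_EXAFLOPS = 20
EFA_GBPS_PER_INSTANCE = 3_200

# Number of 8-GPU instances needed to reach the maximum GPU count.
instances = MAX_GPUS // GPUS_PER_INSTANCE

# 1 exaflop = 1,000 petaflops.
petaflops_per_gpu = AGGREGATE_EXAFLOPS * 1_000 / MAX_GPUS

# 1 Pbps = 1,000,000 Gbps; sum of per-instance EFAv2 bandwidth.
aggregate_pbps = instances * EFA_GBPS_PER_INSTANCE / 1_000_000

print(f"Instances at full scale: {instances}")
print(f"Implied compute per GPU: {petaflops_per_gpu:.1f} petaflops")
print(f"Summed instance bandwidth: {aggregate_pbps:.1f} Pbps")
```

The summed instance bandwidth lands in the petabit range, which is consistent with the description of the network as petabit-scale.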
Seamless integration with other AWS services
P5 instances can be deployed using AWS Deep Learning AMIs (DLAMI) and AWS Deep Learning Containers. They are available through managed services such as Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS Batch, and more.
Anthropic builds reliable, interpretable, and steerable AI systems that will have many opportunities to create value commercially and for public benefit.
"At Anthropic, we are working to build reliable, interpretable, and steerable AI systems. While the large general AI systems of today can have significant benefits, they can also be unpredictable, unreliable, and opaque. Our goal is to make progress on these issues and deploy systems that people find useful. Our organization is one of the few in the world that is building foundational models in DL research. These models are highly complex, and to develop and train these cutting-edge models, we need to distribute them efficiently across large clusters of GPUs. We are using Amazon EC2 P4 instances extensively today, and we are excited about the launch of P5 instances. We expect them to deliver substantial price-performance benefits over P4d instances, and they'll be available at the massive scale required for building next-generation LLMs and related products."
Tom Brown, Cofounder, Anthropic
Cohere, a leading pioneer in language AI, empowers every developer and enterprise to build incredible products with world-leading natural language processing (NLP) technology while keeping their data private and secure.
"Cohere leads the charge in helping every enterprise harness the power of language AI to explore, generate, search for, and act upon information in a natural and intuitive manner, deploying across multiple cloud platforms in the data environment that works best for each customer. NVIDIA H100-powered Amazon EC2 P5 instances will unleash the ability of businesses to create, grow, and scale faster with its computing power combined with Cohere's state-of-the-art LLM and generative AI capabilities."
Aidan Gomez, CEO, Cohere
Hugging Face is on a mission to democratize good ML.
"As the fastest-growing open-source community for ML, we now provide over 150,000 pretrained models and 25,000 datasets on our platform for NLP, computer vision, biology, reinforcement learning, and more. With significant advances in LLMs and generative AI, we're working with AWS to build and contribute the open-source models of tomorrow. We're looking forward to using Amazon EC2 P5 instances via Amazon SageMaker at scale in UltraClusters with EFA to accelerate the delivery of new foundation AI models for everyone."
Julien Chaumond, CTO and Cofounder, Hugging Face
|Instance Size|vCPU|Instance Memory (TiB)|GPU - H100|GPU Memory|Network Bandwidth|GPUDirect RDMA|GPU Peer to Peer|Instance Storage (TB)|EBS Bandwidth (Gbps)|
|---|---|---|---|---|---|---|---|---|---|
| | | |8|640 GB HBM3|3,200 Gbps EFAv2|Yes|900 GB/s NVSwitch|8 x 3.84 NVMe SSD| |
For full pricing details, see Amazon EC2 Pricing.
Getting started with P5 instances for ML
Using SageMaker
SageMaker is a fully managed service for building, training, and deploying ML models. When you use SageMaker with P5 instances, you can more easily scale to tens, hundreds, or thousands of GPUs to train a model quickly at any scale without worrying about setting up clusters and data pipelines.
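As a minimal sketch of what requesting P5 capacity for a SageMaker training job might look like, the following Python builds the keyword arguments for the SageMaker `CreateTrainingJob` API. The instance type `ml.p5.48xlarge`, the helper name, and every ARN, image URI, and S3 path are assumptions or placeholders that do not come from this page. A real job typically needs more fields, such as input data channels, so check the API reference before you use it.

```python
def build_training_job_request(job_name, image_uri, role_arn, output_s3_uri,
                               instance_count=2):
    """Build keyword arguments for SageMaker's CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.p5.48xlarge",  # assumed P5 training size
            "InstanceCount": instance_count,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 60 * 60},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="p5-demo-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>",
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
    output_s3_uri="s3://<bucket>/<prefix>/",
)

# To submit (requires boto3, AWS credentials, and P5 quota):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
print(request["ResourceConfig"])
```

Scaling to more GPUs is then a matter of raising `instance_count`, subject to your account quotas and available capacity.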
Using DLAMI or Deep Learning Containers
DLAMI provides ML practitioners and researchers with the infrastructure and tools to accelerate DL in the cloud, at any scale. Deep Learning Containers are Docker images preinstalled with DL frameworks to streamline the deployment of custom ML environments by letting you skip the complicated process of building and optimizing your environments from scratch.
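If you prefer to launch instances directly from a DLAMI, the request can be sketched as keyword arguments for the EC2 `RunInstances` API. This is a hedged illustration only: the instance size `p5.48xlarge`, the helper name, the tag, and the AMI ID are assumptions or placeholders, and a production launch would normally also set networking, key pair, and placement options.

```python
def build_run_instances_request(ami_id, count=1):
    """Build keyword arguments for the EC2 RunInstances API."""
    return {
        "ImageId": ami_id,              # a DLAMI ID for your Region
        "InstanceType": "p5.48xlarge",  # assumed P5 instance size
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": "p5-dlami-demo"}],
            },
        ],
    }


# "ami-EXAMPLE" is a placeholder, not a real AMI ID.
launch_request = build_run_instances_request("ami-EXAMPLE")

# To launch (requires boto3, AWS credentials, and available P5 capacity):
# import boto3
# boto3.client("ec2").run_instances(**launch_request)
print(launch_request["InstanceType"], launch_request["MinCount"])
```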
Getting started with P5 instances for HPC
P5 instances are an ideal platform to run engineering simulations, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other GPU-based HPC workloads. HPC applications often require high network performance, fast storage, large amounts of memory, high compute capabilities, or all of the above. P5 instances support EFAv2, which enables HPC applications that use the Message Passing Interface (MPI) to scale to thousands of GPUs. AWS Batch and AWS ParallelCluster help HPC developers quickly build and scale distributed HPC applications.
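MPI applications scale by giving each process (rank) its own share of the work. The sketch below shows one common way to do that, a contiguous block decomposition, in plain Python so that it runs without an MPI installation. The function name and problem size are hypothetical. In a real MPI program the rank and size would come from the communicator, for example through mpi4py as noted in the comment.

```python
def block_range(n_items, rank, size):
    """Return the [start, stop) slice of n_items owned by this rank.

    Items are split into contiguous blocks whose lengths differ by at most 1.
    """
    base, extra = divmod(n_items, size)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# In an MPI program, rank and size would come from the communicator,
# for example with mpi4py:
#   from mpi4py import MPI
#   rank, size = MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size()
size = 8  # hypothetical: one rank per GPU on a single instance
ranges = [block_range(1_000, rank, size) for rank in range(size)]
print(ranges[0], ranges[-1])
```

Because every rank computes its own range from only `rank` and `size`, no communication is needed to agree on who owns which items.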