Why Amazon EC2 P5 Instances?
Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, powered by NVIDIA H100 Tensor Core GPUs, and P5e and P5en instances, powered by NVIDIA H200 Tensor Core GPUs, deliver the highest performance in Amazon EC2 for deep learning (DL) and high performance computing (HPC) applications. They help you accelerate your time to solution by up to 4x compared to previous-generation GPU-based EC2 instances and reduce the cost to train machine learning (ML) models by up to 40%. These instances help you iterate on your solutions at a faster pace and get to market more quickly. You can use P5, P5e, and P5en instances for training and deploying increasingly complex large language models (LLMs) and diffusion models powering the most demanding generative artificial intelligence (AI) applications. These applications include question answering, code generation, video and image generation, and speech recognition. You can also use these instances to deploy demanding HPC applications at scale for pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling.
To deliver these performance improvements and cost savings, P5 and P5e instances complement NVIDIA H100 and H200 Tensor Core GPUs with 2x higher CPU performance, 2x higher system memory, and 4x higher local storage compared to previous-generation GPU-based instances. P5en instances pair NVIDIA H200 Tensor Core GPUs with high-performance Intel Sapphire Rapids CPUs, enabling PCIe Gen5 between CPU and GPU. P5en instances provide up to 4x the bandwidth between CPU and GPU and lower network latency than P5e and P5 instances, thereby improving distributed training performance.
P5 and P5e instances provide up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFA). P5en instances, which use third-generation EFA with Nitro v5, show up to 35% lower latency than P5 instances, which use the previous generation of EFA and Nitro. This helps improve collective communications performance for distributed workloads such as deep learning, generative AI, real-time data processing, and HPC applications. To deliver large-scale compute at low latency, these instances are deployed in Amazon EC2 UltraClusters that enable scaling up to 20,000 H100 or H200 GPUs interconnected with a petabit-scale nonblocking network. P5, P5e, and P5en instances in EC2 UltraClusters can deliver up to 20 exaflops of aggregate compute capability, performance equivalent to a supercomputer.
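To put the 3,200 Gbps figure in perspective for collective communications, the sketch below converts the link rate to bytes per second and applies the textbook bandwidth term of a ring all-reduce, in which each of N participants sends and receives 2(N - 1)/N of the message. The cost model, the function names, the hypothetical 10 GB gradient size, and the assumption that the full 3,200 Gbps is available to a single collective are all illustrative assumptions rather than AWS figures; real throughput also depends on latency, topology, and the collective library in use.

```python
def gbps_to_bytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to bytes per second."""
    return gbps * 1e9 / 8


def ring_allreduce_seconds(message_bytes: float, num_nodes: int, link_gbps: float) -> float:
    """Bandwidth-only estimate of a ring all-reduce (latency is ignored).

    Assumption: each node sends and receives 2 * (N - 1) / N of the message
    over a link that delivers the full nominal rate.
    """
    if num_nodes < 2:
        return 0.0  # nothing to exchange with a single participant
    bytes_per_s = gbps_to_bytes_per_s(link_gbps)
    volume = 2 * (num_nodes - 1) / num_nodes * message_bytes
    return volume / bytes_per_s


if __name__ == "__main__":
    grad_bytes = 10e9  # hypothetical 10 GB of gradients per training step
    for n in (2, 16, 128):
        t = ring_allreduce_seconds(grad_bytes, n, link_gbps=3200)
        print(f"{n:>4} instances: ~{t:.3f} s per all-reduce (bandwidth term only)")
```

Because the 2(N - 1)/N factor approaches 2 as N grows, the bandwidth term of a ring all-reduce stays nearly flat with cluster size, which is why per-instance network bandwidth matters so much for scale-out training.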
Customer testimonials
Here are some examples of how customers and partners are using, or plan to use, Amazon EC2 P4 and P5 instances to achieve their business goals.
Anthropic
Anthropic builds reliable, interpretable, and steerable AI systems that will have many opportunities to create value commercially and for public benefit.
At Anthropic, we are working to build reliable, interpretable, and steerable AI systems. While the large general AI systems of today can have significant benefits, they can also be unpredictable, unreliable, and opaque. Our goal is to make progress on these issues and deploy systems that people find useful. Our organization is one of the few in the world that is building foundational models in DL research. These models are highly complex, and to develop and train these cutting-edge models, we need to distribute them efficiently across large clusters of GPUs. We are using Amazon EC2 P4 instances extensively today, and we are excited about the launch of P5 instances. We expect them to deliver substantial price-performance benefits over P4d instances, and they'll be available at the massive scale required for building next-generation LLMs and related products.
Tom Brown, Cofounder, Anthropic
Cohere
Cohere, a leading pioneer in language AI, empowers every developer and enterprise to build incredible products with world-leading natural language processing (NLP) technology while keeping their data private and secure.
Cohere leads the charge in helping every enterprise harness the power of language AI to explore, generate, search for, and act upon information in a natural and intuitive manner, deploying across multiple cloud platforms in the data environment that works best for each customer. NVIDIA H100-powered Amazon EC2 P5 instances will unleash the ability of businesses to create, grow, and scale faster with its computing power combined with Cohere's state-of-the-art LLM and generative AI capabilities.
Aidan Gomez, CEO, Cohere
Hugging Face
Hugging Face is on a mission to democratize good ML.
As the fastest-growing open-source community for ML, we now provide over 150,000 pretrained models and 25,000 datasets on our platform for NLP, computer vision, biology, reinforcement learning, and more. With significant advances in LLMs and generative AI, we're working with AWS to build and contribute the open-source models of tomorrow. We're looking forward to using Amazon EC2 P5 instances via Amazon SageMaker at scale in UltraClusters with EFA to accelerate the delivery of new foundation AI models for everyone.
Julien Chaumond, CTO and Cofounder, Hugging Face
Product details
Instance Size | vCPUs | Instance Memory (TiB) | GPUs | GPU Memory | Network Bandwidth | GPUDirect RDMA | GPU Peer-to-Peer | Instance Storage (TB) | EBS Bandwidth (Gbps) |
---|---|---|---|---|---|---|---|---|---|
p5.48xlarge | 192 | 2 | 8 H100 | 640 GB HBM3 | 3,200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 80 |
p5e.48xlarge | 192 | 2 | 8 H200 | 1,128 GB HBM3e | 3,200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 80 |
p5en.48xlarge | 192 | 2 | 8 H200 | 1,128 GB HBM3e | 3,200 Gbps EFA | Yes | 900 GB/s NVSwitch | 8 x 3.84 NVMe SSD | 100 |
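The table can also be read per GPU, which is often the more useful unit when sizing a model. The sketch below encodes the GPU count, total GPU memory, and EFA bandwidth from the table and derives per-GPU figures, plus a rough check of whether a model's weights alone fit in an instance's aggregate GPU memory. The class and helper names are hypothetical, GB is taken as 10^9 bytes, and the check deliberately ignores activations, KV cache, optimizer state, and framework overhead, so treat it as a first filter rather than a capacity plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInstance:
    name: str
    num_gpus: int
    gpu_memory_gb: int  # total HBM across all GPUs, from the table
    network_gbps: int   # EFA bandwidth, from the table

    @property
    def memory_per_gpu_gb(self) -> float:
        return self.gpu_memory_gb / self.num_gpus

    @property
    def network_per_gpu_gbps(self) -> float:
        return self.network_gbps / self.num_gpus


INSTANCES = [
    GpuInstance("p5.48xlarge", 8, 640, 3200),
    GpuInstance("p5e.48xlarge", 8, 1128, 3200),
    GpuInstance("p5en.48xlarge", 8, 1128, 3200),
]


def weights_fit(instance: GpuInstance, params_billion: float, bytes_per_param: int = 2) -> bool:
    """Rough check: do the model weights alone fit in total GPU memory?

    bytes_per_param=2 assumes 16-bit weights (for example BF16).
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes per GB
    return weights_gb <= instance.gpu_memory_gb


if __name__ == "__main__":
    for inst in INSTANCES:
        print(
            f"{inst.name}: {inst.memory_per_gpu_gb:.0f} GB HBM per GPU, "
            f"{inst.network_per_gpu_gbps:.0f} Gbps EFA per GPU, "
            f"hypothetical 405B-parameter model in 16-bit fits: {weights_fit(inst, 405)}"
        )
```

With these assumptions, a hypothetical 405-billion-parameter model in 16-bit precision needs about 810 GB for weights, which exceeds the 640 GB on p5.48xlarge but fits within the 1,128 GB on the H200-based sizes.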
Getting started with HPC use cases
P5, P5e, and P5en instances are an ideal platform to run engineering simulations, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other GPU-based HPC workloads. HPC applications often require high network performance, fast storage, large amounts of memory, high compute capabilities, or all of the above. All three instance types support EFA, which enables HPC applications that use the Message Passing Interface (MPI) to scale to thousands of GPUs. AWS Batch and AWS ParallelCluster help HPC developers quickly build and scale distributed HPC applications.
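MPI collectives such as `MPI_Allreduce` are typical of the inter-node communication that EFA is designed to speed up. One classic algorithm for all-reduce on large messages is the ring: a reduce-scatter phase followed by an all-gather phase, in which each rank only ever exchanges data with its neighbor. The pure-Python simulation below is a teaching sketch of that algorithm, assuming an in-process "network" where each step's messages are snapshotted before delivery; the function name `ring_allreduce` is hypothetical, and this is not how any particular MPI library, NCCL, or EFA implements the collective.

```python
from typing import List


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) across len(data) ranks.

    data[r] is rank r's local vector; its length must be divisible by the
    number of ranks. Returns every rank's buffer after the collective, each
    holding the element-wise sum of all inputs.
    """
    n = len(data)
    size = len(data[0])
    if any(len(v) != size for v in data) or size % n != 0:
        raise ValueError("vectors must share a length divisible by the rank count")
    chunk = size // n
    buf = [list(v) for v in data]  # copy so the caller's inputs are untouched

    def span(c: int) -> range:
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % n to
    # rank (r + 1) % n, which adds it to its own copy. After n - 1 steps,
    # rank r holds the fully reduced chunk (r + 1) % n.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[buf[r][i] for i in span(c)] for r, c in sends]  # snapshot first
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, val in zip(span(c), payload):
                buf[dst][i] += val

    # Phase 2: all-gather. In step s, rank r forwards the completed chunk
    # (r + 1 - s) % n, and the receiver overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[buf[r][i] for i in span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, val in zip(span(c), payload):
                buf[dst][i] = val

    return buf


if __name__ == "__main__":
    ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(ranks))  # every rank ends with [111, 222, 333]
```

Each rank sends n - 1 chunks in each phase, so per-rank traffic is 2(n - 1)/n of the vector regardless of how many ranks join, which is why ring-style collectives scale well on high-bandwidth fabrics.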