Amazon EC2 P6e UltraServers and P6 instances

The highest GPU performance for AI training and inference

Why Amazon EC2 P6e UltraServers and P6 instances?

Amazon Elastic Compute Cloud (Amazon EC2) P6e UltraServers, accelerated by NVIDIA GB200 NVL72, offer the highest GPU performance in Amazon EC2. P6e-GB200 features over 20x the compute and over 11x the memory under NVIDIA NVLink™ compared to P5en instances. These UltraServers are ideal for the most compute- and memory-intensive AI workloads, such as training and deploying frontier models at the multi-trillion-parameter scale.

Amazon EC2 P6 instances, accelerated by NVIDIA Blackwell and Blackwell Ultra GPUs, are an ideal option for medium-to-large-scale training and inference applications. P6-B200 instances offer up to 2x the performance of P5en instances for AI training and inference, while P6-B300 instances deliver still higher performance for large-scale AI training and inference. These instances are well suited for sophisticated models such as mixture of experts (MoE) and reasoning models with trillions of parameters.

P6e UltraServers and P6 instances enable faster training for next-generation AI models and improve performance for real-time inference in production. You can use P6e UltraServers and P6 instances to train frontier foundation models (FMs) such as MoE and reasoning models and deploy them in generative and agentic AI applications such as content generation, enterprise copilots, and deep research agents.

Benefits

P6e UltraServers

With P6e-GB200 UltraServers, customers can access up to 72 Blackwell GPUs within one NVLink domain to use 360 petaflops of FP8 compute (without sparsity) and 13.4 TB of total high-bandwidth memory (HBM3e). P6e-GB200 UltraServers provide up to 130 TB/s of low-latency NVLink connectivity between GPUs and up to 28.8 Tbps of total Elastic Fabric Adapter (EFAv4) networking for AI training and inference. This UltraServer architecture enables a step-change improvement in compute and memory, with up to 20x the GPU TFLOPS, 11x the GPU memory, and 15x the aggregate GPU memory bandwidth under NVLink compared to P5en.
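
These aggregates decompose cleanly into per-GPU figures. A quick sanity check in Python, using only the numbers quoted above (the per-GPU breakdown is derived arithmetic, not a published spec):

```python
# Back-of-the-envelope check of the P6e-GB200 UltraServer figures above.
gpus = 72

total_fp8_pflops = 360      # FP8 compute, without sparsity
total_hbm3e_tb = 13.4       # total high-bandwidth memory (HBM3e)
efa_per_gpu_gbps = 400      # EFAv4 networking per GPU

print(total_fp8_pflops / gpus)         # -> 5.0 PFLOPS FP8 per GPU
print(total_hbm3e_tb / gpus * 1000)    # -> ~186 GB HBM3e per GPU
print(gpus * efa_per_gpu_gbps / 1000)  # -> 28.8 Tbps aggregate EFAv4
```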

P6 instances

P6-B300 instances provide 8x NVIDIA Blackwell Ultra GPUs with 2.1 TB of high-bandwidth GPU memory, 6.4 Tbps of EFA networking, 300 Gbps of dedicated ENA throughput, and 4 TB of system memory. P6-B300 instances deliver 2x the networking bandwidth, 1.5x the GPU memory size, and 1.5x the GPU TFLOPS (at FP4, without sparsity) compared to P6-B200 instances. These improvements make P6-B300 instances well suited for large-scale ML training and inference.

P6-B200 instances provide 8x NVIDIA Blackwell GPUs with 1,440 GB of high-bandwidth GPU memory, 5th Generation Intel Xeon Scalable processors (Emerald Rapids), 2 TiB of system memory, up to 14.4 TB/s of total bidirectional NVLink bandwidth, and 30 TB of local NVMe storage. These instances feature up to 2.25x the GPU TFLOPS, 1.27x the GPU memory size, and 1.6x the GPU memory bandwidth compared to P5en instances.

 

P6e UltraServers and P6 instances are powered by the AWS Nitro System with specialized hardware and firmware designed to enforce restrictions so that no one, including anyone at AWS, can access your sensitive AI workloads and data. The Nitro System, which handles networking, storage, and other I/O functions, can deploy firmware updates, bug fixes, and optimizations while it remains operational. This increases stability and reduces downtime, which is critical to meeting training timelines and running AI applications in production.

To enable efficient distributed training, P6e UltraServers and P6 instances use fourth-generation Elastic Fabric Adapter (EFAv4) networking. EFAv4 uses the Scalable Reliable Datagram (SRD) protocol to intelligently route traffic across multiple network paths, maintaining smooth operation even during congestion or failures.
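
In practice, a distributed training job picks up EFA through the Libfabric provider without application changes. Below is a minimal PyTorch sketch, assuming a job launched with torchrun on EFA-enabled instances; the environment variables follow AWS's published EFA guidance and may already be set on DLAMI-based images:

```python
import os

import torch
import torch.distributed as dist

# Route NCCL traffic over EFA via the Libfabric "efa" provider.
# These settings follow AWS's published EFA guidance; on DLAMI-based
# images they may already be the defaults.
os.environ.setdefault("FI_PROVIDER", "efa")
os.environ.setdefault("FI_EFA_USE_DEVICE_RDMA", "1")  # GPUDirect RDMA

# Rank, world size, and rendezvous info are injected by the launcher
# (e.g., torchrun), so init_process_group can read them from the env.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# A single all-reduce exercises the EFA/SRD fabric described above.
x = torch.ones(1024, device="cuda")
dist.all_reduce(x)

dist.destroy_process_group()
```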

P6e UltraServers and P6 instances are deployed in Amazon EC2 UltraClusters, which enable scaling up to tens of thousands of GPUs within a petabit-scale nonblocking network.

Features

Each NVIDIA Blackwell GPU in P6-B200 instances features a second-generation Transformer Engine and supports new precision formats such as FP4. It also supports fifth-generation NVLink, a faster, wider interconnect delivering up to 1.8 TB/s of bandwidth per GPU.

The Grace Blackwell Superchip, a key component of P6e-GB200, connects two high-performance NVIDIA Blackwell GPUs and an NVIDIA Grace CPU using the NVIDIA NVLink-C2C interconnect. Each Superchip delivers 10 petaflops of FP8 compute (without sparsity) and up to 372 GB of HBM3e. With the superchip architecture, two GPUs and one CPU are co-located within one compute module, increasing bandwidth between GPU and CPU by an order of magnitude compared to current-generation P5en instances.

The NVIDIA Blackwell Ultra GPUs powering P6-B300 instances deliver a 2x increase in network bandwidth, a 1.5x increase in GPU memory, and up to 1.5x the FP4 compute (without sparsity, in effective TFLOPS) compared to P6-B200 instances.

P6e UltraServers and P6 instances provide 400 Gbps of EFAv4 networking per GPU, for a total of 28.8 Tbps per P6e-GB200 UltraServer and 3.2 Tbps per P6-B200 instance.

P6-B300 instances offer 6.4 Tbps of networking bandwidth, 2x that of P6-B200 instances, enabled by PCIe Gen6, and are designed for large-scale distributed deep learning model training.

P6e UltraServers and P6 instances support Amazon FSx for Lustre file systems so you can access data at the hundreds of GB/s of throughput and millions of IOPS required for large-scale AI training and inference. P6e UltraServers support up to 405 TB of local NVMe SSD storage, while P6 instances support up to 30 TB of local NVMe SSD storage for fast access to large datasets. You can also use Amazon Simple Storage Service (Amazon S3) for virtually unlimited, cost-effective storage.
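
As an illustration, you might provision a persistent FSx for Lustre file system next to your training instances with boto3. This is a minimal sketch: the subnet ID is a placeholder, and the capacity and throughput values are examples to validate against the FSx documentation:

```python
import boto3

fsx = boto3.client("fsx")

# A minimal sketch of a persistent Lustre file system for training data.
# The subnet ID is a placeholder; StorageCapacity and
# PerUnitStorageThroughput must be a valid FSx for Lustre combination.
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=12000,  # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "PERSISTENT_2",
        "PerUnitStorageThroughput": 500,  # MB/s per TiB of storage
    },
    Tags=[{"Key": "workload", "Value": "p6-training"}],
)
print(response["FileSystem"]["FileSystemId"])
```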

Product Details

Instance types

| Instance Size | Blackwell GPUs | GPU memory (GB) | vCPUs | System memory (GiB) | Instance storage (TB) | Network bandwidth (Tbps) | EBS bandwidth (Gbps) | Available in EC2 UltraServers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| p6-b300.48xlarge | 8 (Blackwell Ultra) | 2,144 HBM3e | 192 | 4,096 | 8 x 3.84 | 6.4 | 100 | No |
| p6-b200.48xlarge | 8 | 1,432 HBM3e | 192 | 2,048 | 8 x 3.84 | 3.2 | 100 | No |
| p6e-gb200.36xlarge | 4 | 740 HBM3e | 144 | 960 | 3 x 7.5 | 3.2 | 60 | Yes* |

*P6e-GB200 instances are only available in UltraServers

UltraServer types

| UltraServer Size | Blackwell GPUs | GPU memory (GB) | vCPUs | System memory (GiB) | UltraServer storage (TB) | Aggregate EFA bandwidth (Gbps) | EBS bandwidth (Gbps) | Available in EC2 UltraServers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| u-p6e-gb200x72 | 72 | 13,320 | 2,592 | 17,280 | 405 | 28,800 | 1,080 | Yes |
| u-p6e-gb200x36 | 36 | 6,660 | 1,296 | 8,640 | 202.5 | 14,400 | 540 | Yes |

Getting started with ML use cases

Amazon SageMaker is a fully managed service for building, training, and deploying ML models. With Amazon SageMaker HyperPod, you can more easily scale to tens, hundreds, or thousands of GPUs to train a model quickly at any scale without worrying about setting up and managing resilient training clusters. (P6e-GB200 support coming soon)
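
For example, a HyperPod cluster is created through the SageMaker CreateCluster API. A hedged sketch with boto3 follows; the ml.p6-b200.48xlarge type name, IAM role ARN, and lifecycle-script S3 location are assumptions to verify against the SageMaker documentation:

```python
import boto3

sm = boto3.client("sagemaker")

# A minimal HyperPod cluster sketch. The instance type string, role ARN,
# and lifecycle-script location below are illustrative placeholders.
sm.create_cluster(
    ClusterName="p6-training-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p6-b200.48xlarge",  # assumed type name
            "InstanceCount": 2,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
        }
    ],
)
```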

AWS Deep Learning AMIs (DLAMI) provides ML practitioners and researchers with the infrastructure and tools to accelerate DL in the cloud, at any scale. AWS Deep Learning Containers are Docker images preinstalled with DL frameworks to streamline the deployment of custom ML environments by letting you skip the complicated process of building and optimizing your environments from scratch.
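
As a sketch, launching a P6-B200 instance from a DLAMI with boto3 looks like the following; the AMI ID, key pair, and subnet are placeholders (resolve the current DLAMI ID for your Region from the DLAMI release notes or SSM Parameter Store):

```python
import boto3

ec2 = boto3.client("ec2")

# ImageId, KeyName, and SubnetId below are illustrative placeholders.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # current DLAMI for your Region
    InstanceType="p6-b200.48xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",
    SubnetId="subnet-0123456789abcdef0",
)
print(resp["Instances"][0]["InstanceId"])
```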

If you prefer to manage your own containerized workloads through container orchestration services, you can deploy P6e-GB200 UltraServers and P6-B200 instances with Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS).
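
For instance, with an existing EKS cluster you could add a managed node group of P6-B200 instances; in this sketch the cluster name, subnet, node role, and GPU AMI type are assumptions to verify for your environment:

```python
import boto3

eks = boto3.client("eks")

# A minimal managed node group sketch; names, subnet, role, and the
# GPU-enabled amiType are placeholders to verify for your setup.
eks.create_nodegroup(
    clusterName="my-training-cluster",
    nodegroupName="p6-b200-workers",
    scalingConfig={"minSize": 0, "maxSize": 4, "desiredSize": 2},
    subnets=["subnet-0123456789abcdef0"],
    instanceTypes=["p6-b200.48xlarge"],
    amiType="AL2023_x86_64_NVIDIA",  # GPU EKS AMI type (verify)
    nodeRole="arn:aws:iam::123456789012:role/EKSNodeRole",
)
```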

P6e UltraServers will also be available through NVIDIA DGX Cloud, a fully managed environment with NVIDIA's complete AI software stack. With NVIDIA DGX Cloud, you get NVIDIA's latest optimizations, benchmarking recipes, and technical expertise.
