Amazon EC2 UltraClusters
Run HPC and ML applications at scale
Why Amazon EC2 UltraClusters?
Amazon Elastic Compute Cloud (Amazon EC2) UltraClusters can help you scale to thousands of GPUs or purpose-built AI chips, such as AWS Trainium, to get on-demand access to a supercomputer. They democratize access to supercomputing-class performance for machine learning (ML), generative AI, and high performance computing (HPC) developers through a simple pay-as-you-go usage model without any setup or maintenance costs. Amazon EC2 instances that are deployed in EC2 UltraClusters include P6e-GB200, P6-B200, P5en, P5e, P5, P4d, Trn2, and Trn1 instances.
EC2 UltraClusters consist of thousands of accelerated EC2 instances that are co-located in a given AWS Availability Zone and interconnected using Elastic Fabric Adapter (EFA) networking in a petabit-scale nonblocking network. EC2 UltraClusters also provide access to Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system, so you can process massive datasets on demand and at scale with sub-millisecond latencies. Together, these provide scale-out capabilities for distributed ML training and tightly coupled HPC workloads.
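As a rough illustration of the co-location and EFA networking described above, the following Python sketch builds the request parameters for the EC2 RunInstances API. It is a minimal sketch under stated assumptions, not the prescribed way to provision an UltraCluster: the helper name `build_efa_launch_params`, the placeholder IDs, the default `p5.48xlarge` instance type, and the use of a cluster placement group are all illustrative choices that do not come from this page. Capacity for these instance types may also need to be reserved separately.

```python
from typing import Any


def build_efa_launch_params(
    ami_id: str,
    subnet_id: str,
    security_group_id: str,
    placement_group: str,
    instance_type: str = "p5.48xlarge",  # assumption: any EFA-capable type works here
    count: int = 2,
) -> dict[str, Any]:
    """Build keyword arguments for the EC2 RunInstances API (hypothetical helper).

    The subnet pins all instances to one Availability Zone, the placement
    group (assumed to already exist with the 'cluster' strategy) asks EC2 to
    pack them close together, and InterfaceType='efa' attaches an Elastic
    Fabric Adapter as the primary network interface.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # all-or-nothing launch: min equals max
        "MaxCount": count,
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "InterfaceType": "efa",
                "SubnetId": subnet_id,
                "Groups": [security_group_id],
            }
        ],
    }


# Placeholder identifiers; replace with real values from your own account.
params = build_efa_launch_params(
    ami_id="ami-EXAMPLE",
    subnet_id="subnet-EXAMPLE",
    security_group_id="sg-EXAMPLE",
    placement_group="my-cluster-pg",
)
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("ec2").run_instances(**params)`. Building the parameters in a separate function keeps the sketch runnable and checkable without touching an AWS account.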
Benefits
Faster time to solution for distributed training and HPC
On-demand access to an exascale supercomputer
Flexibility to optimize performance and cost
Features
High-performance networking
High-performance storage
Instances and UltraServers supported
P6e-GB200 UltraServers
Accelerated by NVIDIA GB200 NVL72, P6e-GB200 instances in an UltraServer configuration offer the highest GPU AI training and inference performance in Amazon EC2.
P6-B200 instances
Amazon EC2 P6-B200 instances, accelerated by NVIDIA Blackwell GPUs, offer high-performance instances for AI training, inference, and HPC.
Trn2 instances and UltraServers
Powered by AWS Trainium2 AI chips, Trn2 instances offer 30 to 40% better price performance than comparable GPU-based instances.
P5en, P5e, and P5 instances
Powered by NVIDIA H200 Tensor Core GPUs, P5en and P5e instances provide high performance in Amazon EC2 for ML training and HPC applications. P5 instances are powered by NVIDIA H100 Tensor Core GPUs.
P4d instances
Powered by NVIDIA A100 Tensor Core GPUs, P4d instances provide high performance for ML training and HPC applications.
Trn1 instances
Powered by AWS Trainium AI chips, Trn1 instances are purpose built for high-performance ML training. They offer up to 50% cost-to-train savings over comparable EC2 instances.