Amazon EC2 UltraServers

AI training and inference at scale

Why Amazon EC2 UltraServers?

Amazon Elastic Compute Cloud (Amazon EC2) UltraServers are ideal for customers seeking the highest AI training and inference performance for models at the trillion-parameter scale. UltraServers connect multiple EC2 instances with a dedicated, high-bandwidth, low-latency accelerator interconnect, giving you a tightly coupled mesh of accelerators across EC2 instances and access to significantly more compute and memory than any standalone EC2 instance provides.

EC2 UltraServers are ideal for the largest models, which require more memory and more memory bandwidth than standalone EC2 instances can provide. The UltraServer design extends the accelerator interconnect across instances, combining multiple instances into one node and unlocking new capabilities. For inference, UltraServers help deliver industry-leading response times to create the best real-time experiences. For training, UltraServers boost model training speed and efficiency with faster collective communication for model parallelism as compared to standalone instances. EC2 UltraServers support EFA networking, and when deployed in EC2 UltraClusters they enable scale-out distributed training across tens of thousands of accelerators on a single petabit-scale, nonblocking network. By delivering higher performance for both training and inference, UltraServers accelerate your time to market and help you deliver real-time applications powered by the most performant, next-generation foundation models.

Benefits

UltraServers enable efficient training and inference of models with hundreds of billions to trillions of parameters by linking a larger set of accelerators with a high-bandwidth, low-latency interconnect to deliver more compute and memory than standalone EC2 instances.

UltraServers enable real-time inference for ultra-large models that demand substantial memory and memory bandwidth resources beyond what a single EC2 instance can offer.

UltraServers enable faster collective communication for model parallelism as compared to standalone instances, helping you reduce your time to train.
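To see why interconnect bandwidth matters for collective communication, consider the standard ring all-reduce estimate: each accelerator moves roughly 2(N−1)/N of the gradient payload over its link. The sketch below applies that formula with illustrative placeholder bandwidth figures, not published UltraServer specifications:

```python
def ring_allreduce_time_s(num_accelerators: int,
                          gradient_bytes: float,
                          link_bandwidth_gbps: float) -> float:
    """Estimate ring all-reduce time: each accelerator sends roughly
    2*(N-1)/N of the payload over its link (reduce-scatter + all-gather)."""
    n = num_accelerators
    bytes_on_wire = 2 * (n - 1) / n * gradient_bytes
    link_bytes_per_s = link_bandwidth_gbps * 1e9 / 8  # Gb/s -> bytes/s
    return bytes_on_wire / link_bytes_per_s

# Illustrative comparison: exchanging a 10 GB gradient across 64
# accelerators over a hypothetical 100 Gb/s scale-out fabric vs. a
# 1,000 Gb/s accelerator interconnect (placeholder numbers).
slow = ring_allreduce_time_s(64, 10e9, 100)
fast = ring_allreduce_time_s(64, 10e9, 1000)
print(f"{slow:.2f} s vs {fast:.2f} s per all-reduce")
```

With these placeholder numbers, a 10x faster link cuts each all-reduce by 10x, which compounds over the many collective operations per training step.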

Features

You can launch instances into an UltraServer and leverage a dedicated, high-bandwidth, and low-latency accelerator interconnect across these instances. UltraServers enable access to a larger number of accelerators connected with this dedicated interconnect, delivering significantly more compute and memory in a single node than standalone EC2 instances.
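A launch request for instances targeting an UltraServer can be sketched with the AWS SDK for Python (Boto3). The AMI ID, instance type, and capacity reservation ID below are illustrative placeholders, and targeting a reservation is an assumed workflow; consult the EC2 documentation for the exact UltraServer launch steps in your Region:

```python
# Sketch: parameters for launching an instance into an UltraServer.
# All identifiers below are placeholders, not real resources.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",    # placeholder AMI
    "InstanceType": "trn2u.48xlarge",      # assumed UltraServer instance type
    "MinCount": 1,
    "MaxCount": 1,
    "CapacityReservationSpecification": {
        "CapacityReservationTarget": {
            # Placeholder targeted capacity reservation ID
            "CapacityReservationId": "cr-0123456789abcdef0",
        }
    },
}

# With AWS credentials configured, you would then launch via:
#   import boto3
#   ec2 = boto3.client("ec2")
#   response = ec2.run_instances(**launch_params)
```

The parameters are built as a plain dictionary here so the request shape is visible; in practice you would pass them directly to `run_instances`.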

EC2 UltraServers deployed in EC2 UltraClusters are interconnected with petabit-scale EFA networking to improve performance for distributed training workloads.

You can use EC2 UltraServers together with high-performance storage solutions such as Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system. You can also use virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).
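A typical storage setup pairs an FSx for Lustre mount with data staged from Amazon S3. The sketch below is a configuration fragment with placeholder DNS name, mount name, and bucket; it assumes the Lustre client is already installed on the instance:

```shell
# Sketch: attach shared storage to UltraServer instances.
# File system DNS name, mount name, and bucket are placeholders.

# Mount an Amazon FSx for Lustre file system at /fsx:
sudo mkdir -p /fsx
sudo mount -t lustre fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com@tcp:/mountname /fsx

# Stage training data from Amazon S3 into the shared file system:
aws s3 sync s3://amzn-s3-demo-bucket/training-data/ /fsx/training-data/
```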

EC2 UltraServers are built on the AWS Nitro System, a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software. Nitro delivers high performance, high availability, and high security, reducing virtualization overhead.

Instances supported

Trn2 instances

Powered by AWS Trainium2 chips, Trn2 instances in a Trn2 UltraServer configuration (available in preview) enable you to scale up to 64 Trainium2 chips connected with NeuronLink, the dedicated high-bandwidth, low-latency interconnect for AWS AI chips. Trn2 UltraServers provide breakthrough performance in Amazon EC2 for generative AI training and inference.
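The 64-chip figure comes from linking multiple Trn2 instances over NeuronLink. The sketch below assumes 16 Trainium2 chips per Trn2 instance and 4 instances per UltraServer, with the per-chip HBM capacity shown as an illustrative assumption; check current AWS specifications for exact figures:

```python
# Trn2 UltraServer scale-up arithmetic. Chip and instance counts are
# assumed from the 64-chip figure; per-chip HBM is an assumption.
CHIPS_PER_TRN2_INSTANCE = 16     # assumed Trainium2 chips per instance
INSTANCES_PER_ULTRASERVER = 4    # assumed instances linked over NeuronLink
HBM_GB_PER_CHIP = 96             # assumed HBM per Trainium2 chip

total_chips = CHIPS_PER_TRN2_INSTANCE * INSTANCES_PER_ULTRASERVER
total_hbm_tb = total_chips * HBM_GB_PER_CHIP / 1000

print(f"{total_chips} chips, ~{total_hbm_tb:.1f} TB accelerator memory")
```

This is the scale-up that lets a single UltraServer node hold model states far larger than one instance's accelerator memory.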

Learn more