Amazon Elastic Compute Cloud (Amazon EC2) UltraClusters can help you scale to thousands of GPUs or purpose-built ML chips, such as AWS Trainium, giving you on-demand access to a supercomputer. They democratize access to supercomputing-class performance for machine learning (ML), generative AI, and high-performance computing (HPC) developers through a simple pay-as-you-go usage model without any setup or maintenance costs. Amazon EC2 instances that are deployed in EC2 UltraClusters include P5en, P5e, P5, P4d, Trn2, and Trn1 instances.
EC2 UltraClusters consist of thousands of accelerated EC2 instances that are co-located in a given AWS Availability Zone and interconnected using Elastic Fabric Adapter (EFA) networking in a petabit-scale nonblocking network. EC2 UltraClusters also provide access to Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system, so you can process massive datasets on demand and at scale with sub-millisecond latencies. EC2 UltraClusters provide scale-out capabilities for distributed ML training and tightly coupled HPC workloads.
Benefits
Faster time to solution for distributed training and HPC
EC2 UltraClusters help you reduce training times and time-to-solution from weeks to just a few days. This helps you iterate at a faster pace and get your deep learning (DL), generative AI, and HPC applications to market more quickly.
On-demand access to an exascale supercomputer
Because thousands of accelerated EC2 instances are co-located in a single AWS Availability Zone and interconnected with EFA networking in a petabit-scale nonblocking network, EC2 UltraClusters give you on-demand access to several exaflops of accelerated compute.
Flexibility to optimize performance and cost
EC2 UltraClusters are supported on a growing list of EC2 instances and give you the flexibility to choose the right compute option to maximize performance while keeping costs under control for your workload.
Features
High-performance networking
EC2 instances deployed in EC2 UltraClusters are interconnected with EFA networking to improve performance for distributed training workloads and tightly coupled HPC workloads. P5en, P5e, P5, and Trn2 instances deliver up to 3,200 Gbps; Trn1 instances deliver up to 1,600 Gbps; and P4d instances deliver up to 400 Gbps of EFA networking. EFA is also coupled with NVIDIA GPUDirect RDMA (P5en, P5e, P5, P4d) and NeuronLink (Trn2, Trn1) to enable low-latency accelerator-to-accelerator communication between servers with operating system bypass.
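As a rough illustration of what these figures mean in practice, the sketch below turns the quoted per-instance EFA limits into back-of-the-envelope numbers. The bandwidth values are the "up to" figures from the paragraph above; the helper names `aggregate_efa_tbps` and `seconds_to_send` are hypothetical and not part of any AWS SDK, and the results are idealized upper bounds that ignore protocol overhead, topology, and contention.

```python
# Peak EFA bandwidth per instance in Gbps, copied from the figures quoted above.
# These are "up to" limits, not measured throughput.
EFA_GBPS = {
    "P5en": 3200,
    "P5e": 3200,
    "P5": 3200,
    "Trn2": 3200,
    "Trn1": 1600,
    "P4d": 400,
}


def aggregate_efa_tbps(family: str, instance_count: int) -> float:
    """Upper bound on combined EFA bandwidth, in Tbps, across instance_count instances."""
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    return EFA_GBPS[family] * instance_count / 1000


def seconds_to_send(gigabytes: float, family: str) -> float:
    """Idealized time for one instance to send `gigabytes` of data at its peak EFA rate."""
    gigabits = gigabytes * 8  # 1 byte = 8 bits
    return gigabits / EFA_GBPS[family]


if __name__ == "__main__":
    # Hypothetical cluster size chosen only for illustration.
    print(aggregate_efa_tbps("P5", 1000), "Tbps across 1,000 P5 instances")
    print(seconds_to_send(400, "P5"), "s to send 400 GB from one P5 instance")
    print(seconds_to_send(400, "P4d"), "s to send 400 GB from one P4d instance")
```

Under these assumptions, the same 400 GB exchange that takes about a second on a P5 instance takes about eight seconds on a P4d instance, which is why the per-instance EFA limit matters for communication-heavy distributed training.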
High-performance storage
EC2 UltraClusters use FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system. With FSx for Lustre, you can quickly process massive datasets on demand and at scale, and deliver sub-millisecond latencies. The low-latency and high-throughput characteristics of FSx for Lustre are optimized for DL, generative AI, and HPC workloads on EC2 UltraClusters. FSx for Lustre keeps the GPUs and AI chips in EC2 UltraClusters fed with data, accelerating the most demanding workloads. These workloads include large language model (LLM) training, generative AI inferencing, DL, genomics, and financial risk modeling. You can also get access to virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).
Supported instances
Trn2 Instances
Powered by AWS Trainium2 AI chips, Trn2 instances offer up to 30-40% better price-performance over comparable GPU-based instances.
Trn1 Instances
Powered by AWS Trainium AI chips, Trn1 instances are purpose built for high-performance ML training. They offer up to 50% cost-to-train savings over comparable EC2 instances.