AWS Trainium

Get high performance for deep learning and generative AI training while lowering costs

Why AWS Trainium?

AWS Trainium is a second-generation machine learning (ML) accelerator that AWS purpose built for deep learning training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (EC2) Trn1 instance deploys up to 16 AWS Trainium accelerators to deliver a high-performance, low-cost solution for deep learning (DL) training in the cloud. Although the use of deep learning and generative AI is accelerating, many development teams are limited by fixed budgets, which caps the scope and frequency of the training needed to improve their models and applications. Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time to train while offering up to 50% cost-to-train savings over comparable Amazon EC2 instances. Trainium is optimized for training natural language processing, computer vision, and recommender models used in a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection.

The AWS Neuron SDK helps developers train models on AWS Trainium accelerators (and deploy them on AWS Inferentia accelerators). It integrates natively with popular frameworks, such as PyTorch and TensorFlow, so that you can continue to use your existing code and workflows to train on Trainium accelerators.

Benefits of AWS Trainium

Each Trainium accelerator includes two second-generation NeuronCores that are purpose-built for deep learning algorithms. To support efficient data and model parallelism, each Trainium accelerator has 32 GB of high-bandwidth memory, delivers up to 190 TFLOPS of FP16/BF16 compute power, and features NeuronLink, an intra-instance, ultra-high-speed nonblocking interconnect technology.

The AWS Neuron SDK, which supports Trainium, is natively integrated with PyTorch and TensorFlow. This ensures that you can continue using your existing workflows in these popular frameworks and get started with Trainium with only a few lines of code changes. For distributed model training, the Neuron SDK supports libraries, such as Megatron-LM and PyTorch Fully Sharded Data Parallel (FSDP). To get started quickly with Trainium powered EC2 Trn1 instances, see popular model examples in the Neuron documentation.

To deliver high performance while meeting accuracy goals, Trainium is optimized for FP32, TF32, BF16, FP16, UINT8, and the new configurable FP8 (cFP8) data type.
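To see why a reduced-precision format like BF16 can keep FP32's numeric range while trading away precision, the sketch below truncates an FP32 value to its top 16 bits, which is the BF16 bit layout (same sign bit and 8-bit exponent, but only 7 mantissa bits). This is an illustrative pure-Python model of the format, not how Trainium hardware performs the conversion; the function name is ours.

```python
import struct

def fp32_to_bf16(x: float) -> float:
    # Pack as big-endian IEEE-754 single precision: the sign, the 8-bit
    # exponent, and the top 7 mantissa bits all land in the first two bytes.
    b = struct.pack(">f", x)
    # BF16 keeps only those top 16 bits; zero the low 16 mantissa bits
    # (simple truncation rather than round-to-nearest, for clarity).
    return struct.unpack(">f", b[:2] + b"\x00\x00")[0]
```

Values whose mantissa fits in 7 bits (for example, 2.5) survive exactly, while a value like 3.14159 loses its low mantissa bits but keeps its magnitude, because the exponent field is untouched.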
To support the fast pace of DL innovation and generative AI, Trainium has several innovations that make it flexible and extensible for training constantly evolving DL models. Trainium has hardware optimizations and software support for dynamic input shapes. To allow support for new operators in the future, it supports custom operators written in C++. It also supports stochastic rounding, a method of rounding probabilistically that achieves higher performance and higher accuracy compared to legacy rounding modes.
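The key property of stochastic rounding mentioned above is that, unlike round-to-nearest, the expected value of the rounded result equals the original value, so small gradient updates are not systematically lost when accumulating in low precision. A minimal sketch of the idea (rounding to integers for simplicity; the function name and interface are ours, not a Neuron API):

```python
import math
import random

def stochastic_round(x: float, rng: random.Random) -> int:
    # Round x down to floor(x) or up to ceil(x) at random, with the
    # probability of rounding up equal to the fractional part of x.
    # E.g. 2.25 rounds up to 3 with probability 0.25 and down to 2 with
    # probability 0.75, so the expected result is exactly 2.25.
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)
```

Averaged over many rounding operations, the bias of always rounding in one direction disappears, which is why this mode helps preserve accuracy when training in reduced precision.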
Trn1 instances powered by Trainium are up to 25% more energy efficient for deep learning training than comparable accelerated computing EC2 instances. Trn1 instances help you meet your sustainability goals when training ultra-large models.

Videos

Behind-the-scenes look at generative AI infrastructure at Amazon
Accelerate deep learning and innovate faster with AWS Trainium
Introducing Amazon EC2 Trn1 instances powered by AWS Trainium