AWS Trainium

Get high performance for deep learning and generative AI training while lowering costs

Why Trainium?

AWS Trainium is the machine learning (ML) chip that AWS purpose built for deep learning (DL) training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instance deploys up to 16 Trainium accelerators to deliver a high-performance, low-cost solution for DL training in the cloud. Although use of DL and generative AI is accelerating, many development teams have fixed budgets, limiting the scope and frequency of training needed to improve their models and applications. Trainium-based Amazon EC2 Trn1 instances solve this challenge by delivering faster time to train while offering up to 50% cost-to-train savings over comparable EC2 instances. Trainium has been optimized for training natural language processing, computer vision, and recommender models used in a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection.

The AWS Neuron SDK helps developers train models on Trainium accelerators (and deploy them on AWS Inferentia accelerators). It natively integrates with popular frameworks, such as PyTorch and TensorFlow, so that you can keep your existing code and workflows when you train on Trainium accelerators.

Benefits of Trainium

Trainium-powered Trn1 instances deliver high performance while reducing training costs by up to 50% over other comparable Amazon EC2 instances. Each Trainium accelerator includes two second-generation NeuronCores that are purpose built for DL algorithms. To support efficient data and model parallelism, each Trainium accelerator has 32 GB of high-bandwidth memory, delivers up to 190 TFLOPS of FP16/BF16 compute power, and features NeuronLink, an intra-instance, ultra-high-speed nonblocking interconnect technology.
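
As a rough sanity check, the per-accelerator figures above can be multiplied out to instance level. The sketch below assumes a fully populated Trn1 instance with 16 accelerators (the stated maximum) and treats 190 TFLOPS as a peak rate, not a sustained one. The function names are illustrative only, and the model-size estimate counts BF16 weights alone (2 bytes per parameter), ignoring gradients, optimizer state, and activations.

```python
# Figures from the text above; names and structure are illustrative only.
ACCELERATORS_PER_INSTANCE = 16      # maximum per Trn1 instance
NEURONCORES_PER_ACCELERATOR = 2
HBM_GB_PER_ACCELERATOR = 32
PEAK_TFLOPS_PER_ACCELERATOR = 190   # FP16/BF16, an "up to" figure


def instance_totals(n_accelerators=ACCELERATORS_PER_INSTANCE):
    """Aggregate per-accelerator figures over one instance."""
    return {
        "neuron_cores": n_accelerators * NEURONCORES_PER_ACCELERATOR,
        "hbm_gb": n_accelerators * HBM_GB_PER_ACCELERATOR,
        "peak_pflops": n_accelerators * PEAK_TFLOPS_PER_ACCELERATOR / 1000,
    }


def weight_footprint_gb(n_params, bytes_per_param=2):
    """Memory for the weights alone (assumption: BF16, 2 bytes each)."""
    return n_params * bytes_per_param / 1e9


totals = instance_totals()
weights_gb = weight_footprint_gb(100e9)  # a 100B-parameter model

print(totals)
print(f"100B-parameter BF16 weights: {weights_gb:.0f} GB")
print(f"Fits in one accelerator's HBM: {weights_gb <= HBM_GB_PER_ACCELERATOR}")
print(f"Fits in one instance's HBM:    {weights_gb <= totals['hbm_gb']}")
```

Under these assumptions, the weights of a 100B-parameter model take about 200 GB. That is far more than the 32 GB on a single accelerator but less than the 512 GB available across a full instance, which is why models of this size are sharded across accelerators.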

The AWS Neuron SDK, which supports Trainium, is natively integrated with PyTorch and TensorFlow. This ensures that you can continue using your existing workflows in these popular frameworks and get started with Trainium with only a few lines of code changes. For distributed model training, the Neuron SDK supports libraries such as Megatron-LM and PyTorch Fully Sharded Data Parallel (FSDP). To quickly get started with Trainium-powered Amazon EC2 Trn1 instances, see popular model examples in the Neuron documentation.

To deliver high performance while meeting accuracy goals, Trainium is optimized for FP32, TF32, BF16, FP16, UINT8, and the new configurable FP8 (cFP8) data type.

To support the fast pace of DL innovation and generative AI, Trainium has several innovations that make it flexible and extendable to train constantly evolving DL models. Trainium has hardware optimizations and software support for dynamic input shapes. To allow support for new operators in the future, it supports custom operators written in C++. It also supports stochastic rounding, a method for probabilistically rounding to achieve high performance and higher accuracy compared to legacy rounding modes.

Trn1 instances powered by Trainium are up to 25% more energy efficient for DL training than comparable accelerated computing EC2 instances. Trn1 instances help you meet your sustainability goals when training ultra-large models.
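
To make the stochastic rounding mentioned above more concrete, here is a minimal software sketch of the general technique. It is not a description of how Trainium implements it in hardware. The sketch assumes BF16 as the target format, emulates it by keeping the top 16 bits of an FP32 value, and uses hypothetical helper names (`to_bf16_nearest`, `to_bf16_stochastic`). It handles finite values only, with no special treatment of NaN or overflow.

```python
import random
import struct


def f32_to_bits(x):
    """Bit pattern of x after conversion to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Inverse of f32_to_bits."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bf16_nearest(x):
    """Round to BF16 precision with round-to-nearest-even."""
    bits = f32_to_bits(x)
    lsb = (bits >> 16) & 1                      # lowest bit that is kept
    return bits_to_f32((bits + 0x7FFF + lsb) & 0xFFFF0000)


def to_bf16_stochastic(x, rng):
    """Round to BF16 precision, rounding away from zero with a
    probability equal to the fraction carried by the discarded bits."""
    bits = f32_to_bits(x)
    return bits_to_f32((bits + rng.getrandbits(16)) & 0xFFFF0000)


def accumulate(step, n_steps, round_fn):
    """Add `step` to 1.0 repeatedly, rounding the sum after each add."""
    acc = 1.0
    for _ in range(n_steps):
        acc = round_fn(acc + step)
    return acc


rng = random.Random(0)  # fixed seed so the run is repeatable
nearest = accumulate(0.001, 1000, to_bf16_nearest)
stochastic = accumulate(0.001, 1000, lambda v: to_bf16_stochastic(v, rng))

print("exact sum:          ", 1.0 + 0.001 * 1000)
print("round-to-nearest:   ", nearest)
print("stochastic rounding:", stochastic)
```

The BF16 spacing near 1.0 is 2^-7 (about 0.0078), so an update of 0.001 is always rounded away under round-to-nearest, and the accumulator stays at 1.0. With stochastic rounding, each update rounds up with a probability of roughly 13%, so the running sum tracks the exact value of 2.0 on average. Small updates of this kind are common in low-precision training, which is where this rounding mode can help accuracy.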

Videos

Behind the scenes look at generative AI infrastructure at Amazon
Accelerate DL and innovate faster with AWS Trainium
Introducing Amazon EC2 Trn1 instances powered by AWS Trainium