AWS Trainium is the second-generation machine learning (ML) accelerator that AWS purpose-built for deep learning training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (EC2) Trn1 instance deploys up to 16 AWS Trainium accelerators to deliver a high-performance, low-cost solution for deep learning (DL) training in the cloud. Although the use of deep learning is accelerating, many development teams are limited by fixed budgets, which puts a cap on the scope and frequency of training needed to improve their models and applications. Trainium-based EC2 Trn1 instances solve this challenge by delivering faster time to train while offering up to 50% cost-to-train savings over comparable Amazon EC2 instances. Trainium has been optimized for training natural language processing, computer vision, and recommender models used in a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection.
Purpose built for high-performance deep learning training
Each Trainium accelerator includes two second-generation NeuronCores that are purpose-built for deep learning algorithms. To support efficient data and model parallelism, each Trainium accelerator has 32 GB of high-bandwidth memory, delivers up to 190 TFLOPS of FP16/BF16 compute power, and features NeuronLink, an intra-instance, ultra-high-speed nonblocking interconnect technology.
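Combining the per-accelerator figures above with the up-to-16 accelerators per Trn1 instance gives a rough sense of aggregate instance capacity. The short sketch below simply multiplies out the quoted numbers (it assumes the maximum accelerator count; smaller Trn1 sizes carry fewer accelerators):

```python
# Per-accelerator figures quoted above
ACCELERATORS_PER_INSTANCE = 16   # maximum for a Trn1 instance
HBM_GB_PER_ACCELERATOR = 32      # high-bandwidth memory per accelerator
TFLOPS_PER_ACCELERATOR = 190     # peak FP16/BF16 compute per accelerator

# Aggregate capacity of a fully populated instance
total_hbm_gb = ACCELERATORS_PER_INSTANCE * HBM_GB_PER_ACCELERATOR
total_pflops = ACCELERATORS_PER_INSTANCE * TFLOPS_PER_ACCELERATOR / 1000

print(f"Accelerator memory: {total_hbm_gb} GB")   # 512 GB
print(f"Peak FP16/BF16:     {total_pflops} PFLOPS")  # 3.04 PFLOPS
```

That 512 GB of pooled accelerator memory, reachable over NeuronLink without leaving the instance, is what makes model-parallel sharding of very large models practical on a single Trn1 instance.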
Optimized for state-of-the-art models
Trainium has native support for a wide range of data types (FP32, TF32, BF16, FP16, UINT8, and configurable FP8). It supports hardware-accelerated stochastic rounding to deliver high performance and higher accuracy as compared to legacy rounding modes. Trainium also provides full-stack support for dynamic tensor shapes, control flow, and custom operators written in C++ to deliver flexible, future-proofed infrastructure for your training needs.
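To illustrate why hardware-accelerated stochastic rounding improves accuracy at low precision, here is a minimal pure-Python sketch of the technique (integer granularity for simplicity; Trainium applies the same idea at the floating-point level in hardware). A value is rounded up or down with probability proportional to its fractional part, so the result is unbiased in expectation, whereas round-to-nearest silently discards small updates:

```python
import random

def stochastic_round(x: float) -> int:
    """Round x to an adjacent integer with probability proportional to
    proximity, so the expected result equals x (unbiased)."""
    floor_x = int(x // 1)
    frac = x - floor_x
    return floor_x + (1 if random.random() < frac else 0)

# Accumulating many small increments, as gradient updates do:
# round-to-nearest loses every one of them, stochastic rounding
# preserves the total in expectation.
random.seed(0)
n, inc = 10_000, 0.3
nearest_sum = sum(round(inc) for _ in range(n))            # 0.3 -> 0 every time
stochastic_sum = sum(stochastic_round(inc) for _ in range(n))

print(nearest_sum)     # 0
print(stochastic_sum)  # close to n * inc = 3000
```

This is the failure mode legacy rounding modes hit when small gradient contributions repeatedly fall below half the representable step; stochastic rounding keeps training numerically faithful at FP16/BF16 precision.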
Native support for ML frameworks and libraries
The AWS Neuron SDK, which supports Trainium, is natively integrated with PyTorch and TensorFlow. This ensures that you can continue using your existing workflows in these popular frameworks and get started with Trainium with only a few lines of code changes. For distributed model training, the Neuron SDK supports libraries such as Megatron-LM and PyTorch Fully Sharded Data Parallel (FSDP). To get started quickly with Trainium-powered EC2 Trn1 instances, see popular model examples in the Neuron documentation.
AWS Neuron SDK
AWS Neuron is an SDK consisting of a compiler, runtime, and profiling tools that you can use to run high-performance training on AWS Trainium-powered Amazon EC2 Trn1 instances. By using Neuron, you can use your existing workflows in popular frameworks, such as TensorFlow and PyTorch, and train optimally on EC2 Trn1 instances with minimal code changes. Neuron comes preconfigured in AWS Deep Learning AMIs (DLAMI) and AWS Deep Learning Containers, making it easy to get started with Trn1 instances.
AWS Inferentia is an AWS-designed ML inference accelerator that delivers high-performance, low-cost ML inference in the cloud. Amazon EC2 Inf1 instances that are based on AWS Inferentia accelerators deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable Amazon EC2 instances.