AWS Trainium

Trainium3, our first 3 nm AWS AI chip, is purpose-built to deliver the best token economics for next-generation agentic, reasoning, and video generation applications.

Why Trainium?

AWS Trainium is a family of purpose-built AI accelerators (Trn1, Trn2, and Trn3) designed to deliver scalable performance and cost efficiency for training and inference across a broad range of generative AI workloads.

The AWS Trainium Family

Trainium1

The first-generation AWS Trainium chip powers Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances, which offer up to 50% lower training costs than comparable Amazon EC2 instances. Many customers, including Ricoh, Karakuri, SplashMusic, and Arcee AI, are realizing the performance and cost benefits of Trn1 instances.

Trainium2

The AWS Trainium2 chip delivers up to 4x the performance of first-generation Trainium. Trainium2-based Amazon EC2 Trn2 instances and Trn2 UltraServers are purpose-built for generative AI and offer 30-40% better price performance than GPU-based EC2 P5e and P5en instances. Trn2 instances feature up to 16 Trainium2 chips, and Trn2 UltraServers feature up to 64 Trainium2 chips interconnected with NeuronLink, our proprietary chip-to-chip interconnect. You can use Trn2 instances and UltraServers to train and deploy the most demanding models, including large language models (LLMs), multimodal models, and diffusion transformers, to build a broad set of next-generation generative AI applications.

Trainium3

Trn3 UltraServers, powered by our fourth-generation AI chip, AWS Trainium3 (AWS's first 3 nm AI chip), are purpose-built to deliver the best token economics for next-generation agentic, reasoning, and video generation applications. Trn3 UltraServers deliver up to 4.4× higher performance, 3.9× higher memory bandwidth, and over 4× better energy efficiency compared to Trn2 UltraServers, providing the best price performance for training and serving frontier-scale models, including reinforcement learning, Mixture-of-Experts (MoE), reasoning, and long-context architectures.

Each AWS Trainium3 chip provides 2.52 petaflops (PFLOPs) of FP8 compute, 144 GB of HBM3e memory, and 4.9 TB/s of memory bandwidth, increasing memory capacity by 1.5x and bandwidth by 1.7x over Trainium2. Trainium3 is designed for both dense and expert-parallel workloads, with advanced data types (MXFP8 and MXFP4) and an improved memory-to-compute balance for real-time, multimodal, and reasoning tasks.

On Amazon Bedrock, Trainium3 is the fastest accelerator, delivering up to 3× faster performance than Trainium2 and 3× better power efficiency than any other accelerator on the service. In large-scale serving tests (e.g., GPT-OSS), Trn3 delivers over 5× higher output tokens per megawatt than Trn2 at similar latency per user, enabling more sustainable, higher-throughput inference at scale.

Built for Developers

New Trainium3-based instances are built for AI researchers and powered by the AWS Neuron SDK to unlock breakthrough performance.

With native PyTorch integration, developers can train and deploy without changing a single line of code. For AI performance engineers, we've enabled deeper access to Trainium3, so developers can fine-tune performance, customize kernels, and push their models even further. Because innovation thrives on openness, we are committed to engaging with our developers through open source tools and resources.
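
As an illustration of that PyTorch-native flow, here is a minimal training-loop sketch in the torch-neuronx / PyTorch-XLA style that the Neuron SDK builds on. The model, shapes, and data below are placeholders, and package setup may differ by Neuron release:

```python
import torch
import torch_xla.core.xla_model as xm  # installed as part of the AWS Neuron SDK (torch-neuronx)

# Placeholder model and data; any standard PyTorch module works the same way.
model = torch.nn.Linear(1024, 1024)
device = xm.xla_device()  # resolves to a NeuronCore on a Trn instance
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(10):
    x = torch.randn(32, 1024).to(device)
    y = torch.randn(32, 1024).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # executes the lazily recorded XLA graph on the device
```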

To learn more, visit Amazon EC2 Trn3 instances, explore the AWS Neuron SDK, or sign up for preview access.

Benefits

Trn3 UltraServers feature the latest innovations in scale-up UltraServer technology, with NeuronSwitch-v1 for faster all-to-all collectives across up to 144 Trainium3 chips. In aggregate, a single Trn3 UltraServer provides up to 20.7 TB of HBM3e, 706 TB/s of memory bandwidth, and 362 FP8 PFLOPs, delivering up to 4.4× more performance and over 4× better energy efficiency than Trn2 UltraServers. Trn3 provides the highest performance at the lowest cost for training and inference with the latest 1T+ parameter MoE and reasoning-type models, and drives significantly higher throughput for GPT-OSS serving at scale compared to Trainium2-based instances.
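
The aggregate figures follow directly from the per-chip Trainium3 numbers quoted earlier on this page; a quick sanity check:

```python
# Per-chip Trainium3 figures from this page, aggregated over a 144-chip Trn3 UltraServer.
chips = 144
hbm_per_chip_gb = 144        # HBM3e capacity per chip
bw_per_chip_tbps = 4.9       # memory bandwidth per chip
fp8_per_chip_pflops = 2.52   # FP8 compute per chip

print(chips * hbm_per_chip_gb / 1000)   # ~20.7 TB of HBM3e
print(chips * bw_per_chip_tbps)         # ~706 TB/s of memory bandwidth
print(chips * fp8_per_chip_pflops)      # ~363 FP8 PFLOPs (quoted as up to 362)
```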

Trn2 UltraServers remain a high-performance, cost-effective option for generative AI training and inference of models up to 1T parameters. Trn2 instances feature up to 16 Trainium2 chips, and Trn2 UltraServers feature up to 64 Trainium2 chips connected with NeuronLink, our proprietary chip-to-chip interconnect.

Trn1 instances feature up to 16 Trainium chips and deliver up to 3 FP8 PFLOPs, 512 GB of HBM with 9.8 TB/s of memory bandwidth, and up to 1.6 Tbps of EFA networking.

Built for Research and Experimentation

The AWS Neuron SDK helps you extract the full performance from Trn3, Trn2, and Trn1 instances so you can focus on building and deploying models and accelerating your time to market. AWS Neuron integrates natively with PyTorch and JAX, and with essential libraries like Hugging Face, vLLM, PyTorch Lightning, and others. It optimizes models out of the box for distributed training and inference, while providing deep insights for profiling and debugging. AWS Neuron integrates with services such as Amazon SageMaker, Amazon SageMaker HyperPod, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS ParallelCluster, and AWS Batch, as well as third-party services like Ray (Anyscale), Domino Data Lab, and Datadog.
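
For inference, the typical Neuron workflow is to compile a model ahead of time and then use it like any TorchScript module. A minimal sketch with torch_neuronx follows; the toy model and file name are placeholders:

```python
import torch
import torch_neuronx  # part of the AWS Neuron SDK

# Placeholder model; in practice this would be an LLM or other trained network.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example_input = torch.rand(1, 128)

# Ahead-of-time compilation for NeuronCores; the result is a TorchScript
# module that can be saved, reloaded, and called like the original model.
neuron_model = torch_neuronx.trace(model, example_input)
torch.jit.save(neuron_model, "model_neuron.pt")

restored = torch.jit.load("model_neuron.pt")
print(restored(example_input).shape)  # torch.Size([1, 64])
```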

To deliver high performance while meeting accuracy goals, AWS Trainium supports a range of mixed-precision data types, including BF16, FP16, FP8, MXFP8, and MXFP4. To support the fast pace of innovation in generative AI, Trainium2 and Trainium3 feature hardware optimizations for 4x sparsity (16:4), micro-scaling, stochastic rounding, and dedicated collective engines.
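
As a generic PyTorch illustration of the mixed-precision idea (not a Neuron-specific API), autocast runs matmul-heavy ops in a lower-precision type while parameters stay in FP32:

```python
import torch

model = torch.nn.Linear(512, 512)   # parameters remain FP32
x = torch.randn(8, 512)

# Ops inside the autocast region run in BF16 where it is safe to do so.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```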

Neuron enables developers to optimize their workloads using the Neuron Kernel Interface (NKI) for kernel development. NKI exposes the full Trainium ISA, enabling complete control over instruction-level programming, memory allocation, and execution scheduling. Along with building their own kernels, developers can use the Neuron Kernel Library, a collection of open source, ready-to-deploy optimized kernels. Finally, Neuron Explore provides full-stack visibility, connecting developer code down to the engines in the hardware.
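
For a sense of what NKI code looks like, here is a minimal element-wise add kernel in the style of the documented NKI examples; treat the exact module paths as release-dependent:

```python
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def nki_tensor_add(a_input, b_input):
    """Element-wise add of two tensors, computed on a NeuronCore."""
    # Allocate the kernel output in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load the inputs from HBM into on-chip memory, compute, and store back.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```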

Customers

Customers including Anthropic, Databricks, Poolside, Ricoh, and NinjaTech AI are realizing significant performance and cost benefits on Trn1 and Trn2 instances.

Early adopters of Trn3 are achieving new levels of efficiency and scalability for the next generation of large-scale generative AI models.
