Amazon EC2 Trn1 instances

Best price performance for training deep learning models in the cloud

Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances deliver the best price performance for training deep learning models in the cloud for use cases such as natural language processing (NLP), computer vision, search, recommendation, ranking, and more. Trn1 instances are powered by AWS Trainium, the second machine learning (ML) chip built by AWS, optimized for high-performance deep learning training.

Trn1 instances support up to 16 AWS Trainium accelerators, up to 800 Gbps of Elastic Fabric Adapter (EFA) networking bandwidth, and 768 GB/s of ultra-high-speed NeuronLink connectivity.

Trn1 instances are deployed in Amazon EC2 UltraClusters consisting of tens of thousands of Trainium accelerators to rapidly train even the most complex deep learning models with trillions of parameters.

Developers can get started quickly on Trn1 instances using the AWS Neuron SDK and easily train models using leading ML frameworks.


Benefits

Best price performance for model training

Trn1 instances are powered by AWS Trainium accelerators that are purpose-built for ML training to deliver the best price performance for training deep learning models in the cloud.

Reduce model training from months to days

Deploy Trn1 instances in EC2 UltraClusters to scale model training to 10,000+ accelerators interconnected with petabit-scale networking for the fastest ML training in Amazon EC2.

Ease of use

You can get started easily with Trn1 instances using the AWS Neuron SDK that comes integrated with leading ML frameworks such as PyTorch and TensorFlow, and continue using existing ML workflows with minimal code changes.

Maximized resource efficiency

Trn1 instances are built on the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor that provides a rich collection of flexible building blocks for assembling the compute, storage, memory, and networking resources you need for better overall performance and security.

Features

AWS Trainium accelerators

Trn1 instances are powered by up to 16 AWS Trainium accelerators with dedicated math engines for deep learning algorithms, making the accelerators more efficient than general-purpose GPUs for training deep learning models. Each accelerator delivers up to 210 trillion operations per second (TOPS) of compute power, includes 32 GB of high-bandwidth memory (HBM2e), and features NeuronLink, a 768 GB/s intra-instance, ultra-high-speed, nonblocking interconnect.

High-performance networking and storage

Trn1 instances deliver up to 800 Gbps of high-performance networking. They also support Elastic Fabric Adapter (EFA), a custom network interface designed by AWS to improve scaling efficiency and deliver low latencies for faster training. Each Trn1 instance also supports up to 8 TB of local nonvolatile memory express solid-state drive (NVMe SSD) storage for fast workload access to large datasets.

Amazon EC2 UltraClusters

Trn1 instances are deployed in EC2 UltraClusters consisting of tens of thousands of Trainium accelerators interconnected with fully nonblocking, petabit-scale networking. Developers can access petabyte-scale, high-throughput, low-latency storage with Amazon FSx for Lustre.

AWS Neuron SDK

Get started with Amazon EC2 Trn1 instances easily with the AWS Neuron SDK. The Neuron SDK consists of a compiler, framework extensions, a runtime library, and developer tools, natively integrated with ML frameworks such as TensorFlow and PyTorch. You can use distributed training libraries, such as Megatron-LM and DeepSpeed, for efficient distributed model training. The Neuron SDK supports a large number of operators for state-of-the-art natural language processing and computer vision models. Advanced developers can implement custom operators in C++.
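For example, the Neuron SDK's PyTorch support builds on PyTorch/XLA, so a standard training loop typically needs only a device change and an XLA-aware optimizer step. The following is a minimal sketch, assuming a Trn1 instance with the Neuron SDK's torch-neuronx and torch-xla packages installed; the model, batch size, and random data are illustrative placeholders, not part of the SDK:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # installed as part of the torch-neuronx stack

# On a Trn1 instance, xla_device() maps to a Trainium NeuronCore.
device = xm.xla_device()

# Illustrative model and synthetic data; replace with your own.
model = nn.Linear(784, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(64, 784).to(device)
labels = torch.randint(0, 10, (64,)).to(device)

for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    # optimizer_step() runs the accumulated XLA graph on the accelerator.
    xm.optimizer_step(optimizer)
```

Because the loop is otherwise standard PyTorch, an existing GPU training script can usually be moved to Trn1 with these few changes rather than a rewrite.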

Built on the AWS Nitro System

Trn1 instances are built on the AWS Nitro System, which offloads many of the traditional virtualization functions to dedicated hardware and software to deliver high performance, high availability, and high security while reducing virtualization overhead.

Customers

Anthropic
"At Anthropic we build reliable, interpretable, and steerable AI systems that will have many opportunities to create value commercially and for public benefit. Our research interests span multiple areas including natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability. A major key to our success is access to modern infrastructure that allows us to spin up very large fleets of high performance deep learning accelerators. We are looking forward to using AWS Trainium, as its unprecedented ability to scale to tens of thousands of nodes and higher network bandwidth will enable us to iterate faster while keeping our costs under control."

Tom Brown, Co-founder at Anthropic

Sprinklr
"Sprinklr's natural language processing and computer vision ML models analyze different data formats sourced from publicly available social media posts, blog posts, video content, and other content available on public domains across more than 30 channels. Based on our value from using AWS Inferentia we are eager to try AWS Trainium to improve time to train and lower training costs for our models. We look forward to developing our models on these high performance, and low-cost training instances.”

Vasant Srinivasan, Senior Vice President of Product Engineering at Sprinklr

Get started with AWS

Sign up for an AWS account

Instantly get access to the AWS Free Tier.

Learn with 10-minute tutorials

Explore and learn with simple tutorials.

Start building with EC2 in the console

Begin building with step-by-step guides to help you launch your AWS project.