Amazon EC2 Trn1 Instances

High-performance, cost-effective training of generative AI models

Get started with Trn1 instances using AWS Neuron

Amazon Elastic Compute Cloud (EC2) Trn1 instances, powered by AWS Trainium chips, are purpose built for high-performance deep learning (DL) training of generative AI models, including large language models (LLMs) and latent diffusion models. Trn1 instances offer up to 50% cost-to-train savings over other comparable Amazon EC2 instances. You can use Trn1 instances to train 100B+ parameter DL and generative AI models across a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection.

The AWS Neuron SDK helps developers train models on AWS Trainium (and deploy models on the AWS Inferentia chips). It integrates natively with frameworks such as PyTorch and TensorFlow, so that you can continue using your existing code and workflows to train models on Trn1 instances. To learn about the current Neuron support for machine learning (ML) frameworks and libraries, model architectures, and hardware optimizations, see the Neuron documentation.

Trn1n instances are now available

Trn1n instances double the network bandwidth (compared to Trn1 instances) to 1600 Gbps of Elastic Fabric Adapter (EFAv2). The increased bandwidth delivers up to 20% faster time-to-train relative to Trn1 for training network-intensive generative AI models, such as large language models (LLMs) and mixture of experts (MoE).

New Amazon EC2 Trn1 instances | Amazon Web Services (1:34)

Benefits

Reduce training times for 100B+ parameter models

Trn1 instances are purpose built for high-performance DL and reduce training times from months to weeks, or even days. With reduced training times, you can iterate faster, build more innovative models, and increase productivity. Trn1n instances deliver up to 20% faster time-to-train than Trn1 instances for models that benefit from increased network bandwidth.

Lower your fine-tuning and pre-training costs

Trn1 instances deliver high performance while offering up to 50% cost-to-train savings over other comparable Amazon EC2 instances.

Use your existing ML frameworks and libraries

Use the AWS Neuron SDK to extract the full performance of Trn1 instances. With Neuron, you can use popular ML frameworks like PyTorch and TensorFlow and continue to use your existing code and workflows to train models on Trn1 instances. To quickly get started with Trn1 instances, see popular model examples in the Neuron documentation.

Scale up to 6 exaflops with EC2 UltraClusters

Trn1 instances support up to 800 Gbps of second-generation Elastic Fabric Adapter (EFAv2) network bandwidth. Trn1n instances support up to 1600 Gbps of EFAv2 network bandwidth to deliver even higher performance for network-intensive models. Both instances are deployed in EC2 UltraClusters that enable scaling up to 30,000 Trainium chips, which are interconnected with a nonblocking petabit-scale network to provide 6 exaflops of compute performance.

How it works

Using AWS DLAMI
Using Amazon EKS
Using Amazon ECS
Using Amazon SageMaker

Using AWS DLAMI
Enlarge and read image description.

The first section titled “User application” shows the major methods that you can use to automatically launch the AWS Deep Learning AMI (DLAMI) and Amazon EC2 Trn1 instances: AWS Command Line Interface (AWS CLI), AWS Tools and SDKs, and AWS Cloud Control API. The second major method to launch a DLAMI from the web interface is the AWS Management Console.

Moving from the DLAMI, the next section shows the Amazon EC2 Trn1 instances that are launched from the selected DLAMI.

Another grouping shows a local terminal, EC2 remote terminal, and application script that can be used to update and manage a DLAMI to launch EC2 instances based on the updates.
Using Amazon EKS
Enlarge and read image description.

The first box shows how Amazon Elastic Kubernetes Service (EKS) is used to create Kubernetes clusters that are powered by Amazon EKS Distro.

After creating the clusters, you can deploy Trn1 or Trn1n worker nodes for your EKS cluster.

You can then run your training workloads on Kubernetes.
Using Amazon ECS
Enlarge and read image description.

In the first stage of the workflow, Amazon Elastic Container Registry (ECR) is used to build images and store them by using ECR or any other repository.

Next, you can use Amazon Elastic Container Service (ECS) to select the deep learning container (DLC) image for your workload.

Then, deploy your training workload on the Amazon EC2 Trn1 or Trn1n instance server.

Finally, use Amazon ECS to manage your containers.
Using Amazon SageMaker
Enlarge and read image description.

First, select Trn1 or Trn1n instances (ml.trn1 or ml.trn1n) as your SageMaker training option.

A per-second billing model is applied, and you pay for what you use. You can use distributed training libraries and the SageMaker Training Compiler to scale and boost performance.

Next, apply automation model tuning for hyperparameter optimization.

Then, interactivity and monitoring are achieved through debugging, profiling, and experiment management. Tune your cost with managed Spot training.

Finally, save the resulting model artifacts for your trained model in an Amazon S3 bucket.

Features

Up to 3 petaflops with AWS Trainium

Trn1 instances are powered by up to 16 AWS Trainium chips purpose built to accelerate DL training and deliver up to 3 petaflops of FP16/BF16 compute power. Each chip includes two second-generation NeuronCores.

Up to 512 GB high-bandwidth accelerator memory

To support efficient data and model parallelism, each Trn1 instance has 512 GB of shared accelerator memory (HBM) with 9.8 TB/s of total memory bandwidth.

High-performance networking and storage

To support training of network-intensive models, such as Mixture of Experts (MoE) and Generative Pre-Trained Transformers (GPT), each Trn1n instance delivers up to 1600 Gbps of EFAv2 networking bandwidth. Each Trn1 instance supports up to 800 Gbps of EFAv2 bandwidth. EFAv2 speeds up distributed training by delivering up to 50% improvement in collective communications performance over first-generation EFA. These instances also support up to 80 Gbps of Amazon Elastic Block Store (EBS) bandwidth and up to 8 TB of local NVMe solid state drive (SSD) storage for fast workload access to large datasets.

NeuronLink interconnect

For fast connectivity between Trainium chips and streamlined collective communications, Trn1 instances support up to 768 GB/s of NeuronLink, a high-speed, nonblocking interconnect.

Optimized for novel data types

To deliver high performance while meeting accuracy goals, Trn1 instances are optimized for FP32, TF32, BF16, FP16, UINT8, and the new configurable FP8 (cFP8) data type.

State-of-the-art DL optimizations

To support the fast pace of DL innovation and generative AI, Trn1 instances have several innovations that make them flexible and extendable to train constantly evolving DL models. Trn1 instances have hardware optimizations and software support for dynamic input shapes. To allow support for new operators in the future, they support custom operators written in C++. They also support stochastic rounding, a method for rounding probabilistically to achieve high performance and higher accuracy compared to legacy rounding modes.

Customers

More than 10,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks to unify their data, analytics and AI.

“Thousands of customers have implemented Databricks on AWS, giving them the ability to use MosaicML to pre-train, fine-tune, and serve foundation models for a variety of use cases. AWS Trainium gives us the scale and high performance needed to train our Mosaic MPT models, and at a low cost. As we train our next generation Mosaic MPT models, Trainium2 will make it possible to build models even faster, allowing us to provide our customers unprecedented scale and performance so they can bring their own generative AI applications to market more rapidly.”

Naveen Rao, VP of Generative AI, Databricks

With the mission of “reinventing the mechanism of value creation and advancing humanity,” Stockmark helps many companies create and build innovative businesses by providing cutting-edge natural language processing technology.

"With 16 nodes of Amazon EC2 Trn1 instances powered by AWS Trainium chips, we have developed and released stockmark-13b, a large language model with 13 billion parameters, pre-trained from scratch on a Japanese corpus of 220B tokens. The corpus includes the latest business domain texts up to September 2023. The model achieved the highest JSQuAD score (0.813) on the JGLUE (Japanese General Language Understanding Evaluation) benchmark compared to other equivalent models. It is available at Hugging Face Hub and can be used commercially with the MIT license. Trn1 instances helped us to achieve 20% training cost reduction compared to equivalent GPU instances."

Kosuke Arima, CTO, Stockmark Co., Ltd.

RICOH offers workplace solutions and digital transformation services designed to manage and optimize the flow of information across businesses.

"The migration to Trn1 instances was quite straightforward. We were able to complete the training of our 13B parameter model in just 8 days. Building on this success, we are looking forward to developing and training our 70B parameter model on Trainium and are excited about the potential of these instances in training our models faster and more cost-effectively."

Yoshiaki Umetsu, Director, Digital Technology Development Center, RICOH

Helixon

“At HeliXon, we build next-generation AI solutions to protein-based therapeutics. We aim to develop AI tools that empower scientists to decipher protein function and interaction, interrogate large-scale genomic datasets for target identification, and design therapeutics such as antibodies and cell therapies. Today, we use training distribution libraries like FSDP to parallelize model training over many GPU-based servers, but this still takes us weeks to train a single model. We are excited to utilize Amazon EC2 Trn1 instances, featuring the highest networking bandwidth (800 Gbps) available in AWS to improve the performance of our distributed training jobs and reduce our model training times, while also reducing our training costs."

Jian Peng, CEO, Helixon

Money Forward

Money Forward, Inc. serves businesses and individuals with an open and fair financial platform.

“We launched a large-scale AI chatbot service on the Amazon EC2 Inf1 instances and reduced our inference latency by 97% over comparable GPU-based instances while also reducing costs. As we keep fine-tuning tailored NLP models periodically, reducing model training times and costs is also important. Based on our experience from successful migration of inference workload on Inf1 instances and our initial work on AWS Trainium-based EC2 Trn1 instances, we expect Trn1 instances will provide additional value in improving end-to-end ML performance and cost.”

Takuya Nakade, CTO, Money Forward, Inc.

Magic

Magic is an integrated product and research company developing AI that feels like a colleague to make the world more productive.

“Training large autoregressive Transformer-based models is an essential component of our work. AWS Trainium-powered Trn1 instances are designed specifically for these workloads, offering near infinite scalability, fast inter-node networking, and advanced support for 16- and 8-bit data types. Trn1 instances will help us train large models faster, at a lower cost. We are particularly excited about the native support for BF16 stochastic rounding in Trainium, increasing performance while numerical accuracy is indistinguishable from full precision.”

Eric Steinberger, Cofounder and CEO, Magic

Cactus

CACTUS has a suite of products and solutions for researchers, and organizations that improve how research gets funded, published, communicated and discovered.

“At Cactus Labs, we harness the power of AI, with research focused on natural language processing, ranking and recommendation, conversational AI, large language models, computer vision, AR/VR and XAI. In line with our quest to enable faster training of machine learning models as well as enable our researchers to run more experiments while managing the infrastructure cost, we were delighted to evaluate AWS Trainium. AWS Trainium’s out of the box features like XLA optimization, multi-worker data parallel training, and graph caching are really useful for us to reduce our training times and help us run more experiments faster and cheaper.”

Nishchay Shah, CTO and Head of Emerging Products, Cactus Communications

Watashiha

Watashiha offers an innovative and interactive AI chatbot service, “OGIRI AI,” which incorporates humor to provide a funny answer on the spot for a question.

“We use Large Language Models to incorporate humor and offer a more relevant and conversational experience to our customers on our AI services. This requires us to pre-train and fine-tune these models frequently. We pre-trained a GPT-based Japanese model on the EC2 Trn1.32xlarge instance, leveraging tensor and data parallelism. The training was completed within 28 days at a 33% cost reduction over our previous GPU based infrastructure. As our models rapidly continue to grow in complexity, we are looking forward to Trn1n instances which has double the network bandwidth of Trn1 to speed up training of larger models.”

Yohei Kobashi, CTO, Watashiha, K.K.

Partners

"At PyTorch, we accelerate taking machine learning from research prototyping to production ready for customers. We have collaborated extensively with the AWS team to provide native PyTorch support for the new AWS Trainium powered Amazon EC2 Trn1 instances that are purpose built for training deep learning models. Developers building PyTorch models can start training on Trn1 instances with minimal code changes. Additionally, we have worked with the OpenXLA community to enable PyTorch Distributed libraries for easy model migration from GPU-based instances to Trn1 instances. We are excited about the innovation that Trn1 instances bring to the PyTorch community, including more efficient data types, dynamic shapes, custom operators, hardware-optimized stochastic rounding, and eager debug mode. All these makes Trn1 well suited for wide adoption by PyTorch developers and we look forward to future joint contributions to PyTorch to further optimize training performance."

Geeta Chauhan, Applied AI, Engineering Manager, PyTorch

"Hugging Face’s mission is to democratize good ML to help ML developers around the world solve real-world problems. And key to that is ensuring the latest and greatest models run as fast and efficiently as possible on the best ML chips in the cloud. We are incredibly excited about the potential for Inferentia2 to become the new standard way to deploy generative AI models at scale. With Inf1, we saw up to 70% lower cost than traditional GPU-based instances, and with Inf2 we have seen up to 8x lower latency for BERT-like transformers compared to Inferentia1. With Inferentia2, our community will be able to easily scale this performance to LLMs at the 100B+ parameters scale, and to the latest diffusion and computer vision models as well."

Amazon services using Trn1 instances

Amazon

Amazon’s product search engine indexes billions of products, serves billions of customer queries daily, and is one of the most heavily used services in the world.

“We are training large language models (LLM) that are multi-modal (text + image), multilingual, multi-locale, pre-trained on multiple tasks, and span multiple entities (products, queries, brands, reviews, etc.) to improve the customer shopping experience. Trn1 instances provide a more sustainable way to train LLMs by delivering the best performance/watt compared to other accelerated machine-learning solutions and offers us high performance at the lowest cost. We plan to explore the new configurable FP8 datatype, and hardware-accelerated stochastic rounding to further increase our training efficiency and development velocity.”

Trishul Chilimbi, VP, Amazon Search

Getting started

Using Amazon SageMaker

You can easily train models on Trn1 instances by using Amazon SageMaker. Significantly reduce the time and cost to train and tune ML models without the need to manage infrastructure. With SageMaker, you can use built-in tools to manage and track training experiments, automatically choose optimal hyperparameters, debug training jobs, and monitor the use of system resources.

Using the AWS Deep Learning AMIs

The AWS Deep Learning AMIs (DLAMI) provide deep learning (DL) practitioners and researchers with the infrastructure and tools to accelerate DL on AWS, at any scale. AWS Neuron drivers comes preconfigured in the DLAMI to train your DL models optimally on Trn1 instances.

Using AWS Deep Learning Containers

You can now deploy Trn1 instances in Amazon Elastic Kubernetes Service (EKS), a fully managed Kubernetes service, and in Amazon Elastic Container Service (ECS), a fully managed container orchestration service. Neuron is also available pre-installed in AWS Deep Learning Containers. To learn more about running containers on Trn1 instances, see the Neuron containers tutorials.

Product details

Instance Size	Trainium Chips	Accelerator Memory (GB)	vCPUs	Instance Memory (GiB)	Local NVMe Storage (TB)	Network Bandwidth (Gbps)	EFA and RDMA Support	EBS Bandwidth (Gbps)	On-Demand Price per Hour	1-Year Reserved Instance Effective Hourly*	3-Year Reserved Instance Effective Hourly*
trn1.2xlarge	1	32	8	32	0.5	Up to 12.5	No	Up to 20	$1.34	$0.79	$0.4744
trn1.32xlarge	16	512	128	512	8	800	Yes	80	$21.50	$12.60	$7.59
trn1n.32xlarge	16	512	128	512	8	1600	Yes	80	$24.78	$14.52	$8.59

Sign up for an AWS account

Sign up for an AWS account

Instantly get access to the AWS Free Tier.

Learn with simple tutorials

Learn with 10-minute tutorials

Explore and learn with simple tutorials.

Start building with EC2 in the console

Start building in the console

Begin building with step-by-step guides to help you launch your AWS project.