Amazon EC2 Trn1 Instances
High-performance, cost-effective deep learning training in the cloud
Amazon EC2 Trn1 instances, powered by AWS Trainium accelerators, are purpose built for high-performance deep learning (DL) training while offering up to 50% cost-to-train savings over comparable GPU-based instances. Trn1 instances deliver the highest performance on deep learning training of popular natural language processing (NLP) models on AWS. You can use Trn1 instances to train NLP, computer vision, and recommender models across a broad set of applications, such as speech recognition, recommendation, fraud detection, and image and video classification. You can get started on Trn1 instances by using your existing workflows in popular machine learning (ML) frameworks, such as PyTorch and TensorFlow. The AWS Neuron SDK integrates seamlessly with these frameworks so that you can get started with only a few lines of code changes. To learn about the current Neuron support for ML frameworks and libraries, model architectures, and hardware optimizations, visit the Neuron documentation.
Reduce training times
Trn1 instances are purpose built for high-performance deep learning and reduce training times from months to weeks or even days. With reduced training times you can iterate faster, build more innovative models, and increase productivity.
Lower your deep learning training costs
Trn1 instances deliver high performance while offering up to 50% cost-to-train savings over comparable GPU-based instances.
Build with native support for ML frameworks and libraries
You can get started on Trn1 instances by using popular ML frameworks, such as PyTorch and TensorFlow. The AWS Neuron SDK integrates seamlessly with these frameworks to help you get started with only a few lines of code changes. To get started quickly with Trn1 instances, see popular model examples in the Neuron documentation.
Scale up to 6.3 exaflops of compute on demand
Trn1 instances are the first EC2 instances with up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth. They are deployed in EC2 UltraClusters that enable scaling up to 30,000 Trainium accelerators, which are interconnected with a nonblocking petabit-scale network, to provide 6.3 exaflops of compute.
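As a sanity check, the cluster-scale figure follows from the per-instance specifications quoted elsewhere on this page (up to 3.4 petaflops of FP16/BF16 compute across 16 Trainium accelerators per instance):

```python
# Back-of-the-envelope check of the UltraCluster compute figure.
# Assumes the per-instance numbers quoted on this page: 3.4 petaflops
# of FP16/BF16 compute across 16 Trainium accelerators.
petaflops_per_instance = 3.4
accelerators_per_instance = 16
cluster_accelerators = 30_000

petaflops_per_accelerator = petaflops_per_instance / accelerators_per_instance
total_petaflops = petaflops_per_accelerator * cluster_accelerators
total_exaflops = total_petaflops / 1_000  # 1 exaflop = 1,000 petaflops

print(f"{total_exaflops:.3f} exaflops")  # prints "6.375 exaflops"
```

The raw product is about 6.4 exaflops; the page quotes the slightly more conservative "up to 6.3 exaflops."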
AWS Trainium accelerators
Trn1 instances are powered by up to 16 AWS Trainium accelerators purpose built to accelerate DL training. Each accelerator includes two second-generation NeuronCores. To support efficient data and model parallelism, each Trn1 instance has 512 GB of high-bandwidth memory (HBM2e), delivers up to 3.4 petaflops of FP16/BF16 compute power, and features NeuronLink, an intra-instance, ultra-high-speed, nonblocking interconnect. To deliver high performance while meeting accuracy goals, Trainium has native support for a wide range of data types, such as FP32, TF32, BF16, FP16, UINT8, and configurable FP8. It also provides hardware support for stochastic rounding, enabling high performance and higher accuracy compared with legacy rounding modes. Trainium also supports dynamic tensor shapes and custom operators written in C++ to deliver flexible, future-proof infrastructure for your training needs.
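To illustrate why hardware stochastic rounding helps at reduced precision, here is a minimal pure-Python sketch (illustrative only, not Neuron code): conventional round-to-nearest introduces a systematic bias when the same value is quantized repeatedly, while stochastic rounding is unbiased in expectation.

```python
import random

def stochastic_round(x):
    """Round x to an integer, rounding up with probability equal to
    the fractional part, so the expected result equals x."""
    floor = int(x // 1)
    frac = x - floor
    return floor + (1 if random.random() < frac else 0)

random.seed(0)
x = 2.3
n = 100_000

# Averaging many stochastically rounded samples recovers x...
sr_mean = sum(stochastic_round(x) for _ in range(n)) / n

# ...while round-to-nearest always gives 2, a persistent bias of -0.3.
rn_mean = round(x)

print(f"stochastic: {sr_mean:.3f}, nearest: {rn_mean}")
```

The same effect matters in low-precision training: accumulating many small gradient updates with round-to-nearest can lose them entirely, whereas stochastic rounding preserves their contribution on average.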
AWS Neuron SDK
The AWS Neuron SDK consists of a compiler, framework extensions, a runtime library, and developer tools. It is natively integrated with ML frameworks, such as TensorFlow and PyTorch. AWS Neuron also supports distributed training libraries, such as Megatron-LM, PyTorch FSDP, and others. To get started quickly with Trn1 instances, see popular model examples in the Neuron documentation.
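As a rough sketch of what "a few lines of code changes" can look like, Neuron's PyTorch support is built on PyTorch/XLA: the main delta from a stock GPU training loop is selecting the XLA device and stepping the optimizer through the XLA-aware helper. This is an illustrative sketch only (it requires a Trn1 instance with the Neuron SDK installed, and names follow the public torch-xla API); see the Neuron documentation for the current recommended flow.

```python
# Illustrative PyTorch/XLA training-loop sketch for Trainium.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # change 1: XLA device instead of "cuda"

model = torch.nn.Linear(784, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(10):
    x = torch.randn(64, 784).to(device)
    y = torch.randint(0, 10, (64,)).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)  # change 2: XLA-aware optimizer step
```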
High-performance networking and storage
Each Trn1 instance supports up to 800 Gbps of Elastic Fabric Adapter networking bandwidth. Each Trn1 instance also supports up to 80 Gbps of Amazon Elastic Block Store (EBS) bandwidth and up to 8 TB of local NVMe solid state drive (SSD) storage for fast workload access to large datasets.
Amazon EC2 UltraClusters
Trn1 instances are deployed in EC2 UltraClusters that enable scaling up to 30,000 Trainium accelerators. These accelerators are interconnected with a nonblocking petabit-scale network to provide up to 6.3 exaflops of compute, and the clusters connect to storage solutions such as Amazon S3. With Amazon FSx for Lustre, you can access shared storage that provides sub-millisecond latencies and up to hundreds of gigabytes per second of throughput.
"At PyTorch, we accelerate taking machine learning from research prototyping to production readiness for customers. We have collaborated extensively with the AWS team to provide native PyTorch support for the new AWS Trainium-powered Amazon EC2 Trn1 instances that are purpose built for training deep learning models. Developers building PyTorch models can start training on Trn1 instances with minimal code changes. Additionally, we have worked with the OpenXLA community to enable PyTorch Distributed libraries for easy model migration from GPU-based instances to Trn1 instances. We are excited about the innovation that Trn1 instances bring to the PyTorch community, including more efficient data types, dynamic shapes, custom operators, hardware-optimized stochastic rounding, and eager debug mode. All of these make Trn1 instances well suited for wide adoption by PyTorch developers, and we look forward to future joint contributions to PyTorch to further optimize training performance."
Geeta Chauhan, Applied AI, Engineering Manager
“At HeliXon, we build next-generation AI solutions for protein-based therapeutics. We aim to develop AI tools that empower scientists to decipher protein function and interaction, interrogate large-scale genomic datasets for target identification, and design therapeutics such as antibodies and cell therapies. Today, we use distributed training libraries like FSDP to parallelize model training over many GPU-based servers, but it still takes us weeks to train a single model. We are excited to use Amazon EC2 Trn1 instances, featuring the highest networking bandwidth (800 Gbps) available in AWS, to improve the performance of our distributed training jobs and reduce our model training times, while also reducing our training costs.”
Jian Peng, CEO, Helixon
Money Forward, Inc. serves businesses and individuals with an open and fair financial platform.
“We launched a large-scale AI chatbot service on Amazon EC2 Inf1 instances and reduced our inference latency by 97% over comparable GPU-based instances while also reducing costs. Because we periodically fine-tune tailored NLP models, reducing model training times and costs is also important. Based on our experience from the successful migration of our inference workload to Inf1 instances and our initial work on AWS Trainium-based EC2 Trn1 instances, we expect Trn1 instances will provide additional value in improving end-to-end ML performance and cost.”
Takuya Nakade, CTO, Money Forward, Inc.
Magic is an integrated product and research company developing AI that feels like a colleague to make the world more productive.
“Training large autoregressive Transformer-based models is an essential component of our work. AWS Trainium-powered Trn1 instances are designed specifically for these workloads, offering near-infinite scalability, fast inter-node networking, and advanced support for 16-bit and 8-bit data types. Trn1 instances will help us train large models faster, at a lower cost. We are particularly excited about the native support for BF16 stochastic rounding in Trainium, which increases performance while keeping numerical accuracy indistinguishable from full precision.”
Eric Steinberger, Cofounder and CEO, Magic
CACTUS has a suite of products and solutions for researchers and organizations that improve how research gets funded, published, communicated, and discovered.
“At Cactus Labs, we harness the power of AI, with research focused on natural language processing, ranking and recommendation, conversational AI, large language models, computer vision, AR/VR, and XAI. In line with our quest to enable faster training of machine learning models, as well as to enable our researchers to run more experiments while managing infrastructure costs, we were delighted to evaluate AWS Trainium. AWS Trainium’s out-of-the-box features, such as XLA optimization, multi-worker data-parallel training, and graph caching, are really useful for reducing our training times and help us run more experiments faster and cheaper.”
Nishchay Shah, CTO and Head of Emerging Products, Cactus Communications
Amazon services using Trn1 instances
Amazon’s product search engine indexes billions of products, serves billions of customer queries daily, and is one of the most heavily used services in the world.
“We are training large language models (LLM) that are multi-modal (text + image), multilingual, multi-locale, pre-trained on multiple tasks, and span multiple entities (products, queries, brands, reviews, etc.) to improve the customer shopping experience. Trn1 instances provide a more sustainable way to train LLMs by delivering the best performance per watt compared to other accelerated machine-learning solutions, and they offer us high performance at the lowest cost. We plan to explore the new configurable FP8 data type and hardware-accelerated stochastic rounding to further increase our training efficiency and development velocity.”
Trishul Chilimbi, VP, Amazon Search
You can train models on Trn1 instances easily by using Amazon SageMaker, significantly reducing the time and cost to train and tune ML models without the need to manage infrastructure. With SageMaker, you can use built-in tools to manage and track training experiments, automatically choose optimal hyperparameters, debug training jobs, and monitor the utilization of system resources.
Price per Hour

| Instance Size | Trainium Accelerators | Accelerator Memory (GB) | vCPUs | Instance Memory (GiB) | Instance Storage (TB) | Network Bandwidth (Gbps) | EFA and RDMA | EBS Bandwidth (Gbps) | On-Demand Price/Hr | 1-Yr Reserved Effective Hourly | 3-Yr Reserved Effective Hourly |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| trn1.2xlarge | 1 | 32 | 8 | 32 | 0.5 | Up to 12.5 | No | Up to 20 | $1.34 | $0.79 | $0.4744 |