AWS Trainium is the second custom machine learning (ML) chip designed by AWS, built to deliver the best price performance for training deep learning models in the cloud. Trainium offers the highest performance, with the most teraflops (TFLOPS) of compute power, for the fastest ML training in Amazon EC2, and it enables a broader set of ML applications. The Trainium chip is specifically optimized for deep learning training workloads in applications including image classification, semantic search, translation, voice recognition, natural language processing, and recommendation engines.
As the use of ML accelerates, there is a pressing need to increase performance and reduce the infrastructure costs driven by both training and inference. AWS launched AWS Inferentia, a custom chip that provides customers with high-performance ML inference at the lowest cost in the cloud. While Inferentia addresses the cost of inference, which can constitute up to 90% of ML infrastructure costs, many development teams are also limited by fixed ML training budgets, which caps the scope and frequency of the training needed to improve their models and applications. AWS Trainium addresses this challenge by providing the most cost-efficient model training in the cloud. With both Trainium and Inferentia, customers will have an end-to-end flow of ML compute, from scaling training workloads to deploying accelerated inference.
AWS Trainium shares the same AWS Neuron SDK as AWS Inferentia, making it easy for developers already using Inferentia to get started with Trainium. Because the Neuron SDK is integrated with popular ML frameworks, including TensorFlow and PyTorch, developers can easily migrate to AWS Trainium from GPU-based instances with minimal code changes. AWS Trainium is available for preview in Amazon EC2 Trn1 instances.
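As a rough sketch of what "minimal code changes" looks like in PyTorch, the Neuron SDK's PyTorch support routes training through PyTorch/XLA, so the main change from a GPU loop is the device handoff. The snippet below is illustrative only: it assumes a Trn1 instance with the Neuron PyTorch packages installed, and the model, shapes, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # PyTorch/XLA, used by the Neuron SDK's PyTorch support

# Placeholder model and data, purely for illustration.
model = nn.Linear(128, 10)
inputs = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))

# On a Trn1 instance, xm.xla_device() resolves to a NeuronCore;
# everything else is a standard PyTorch training loop.
device = xm.xla_device()
model = model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs.to(device)), labels.to(device))
    loss.backward()
    # Steps the optimizer and forces execution of the pending XLA graph.
    xm.optimizer_step(optimizer, barrier=True)
```

The same loop runs on a GPU by swapping the device target, which is the sense in which migration from GPU-based instances requires minimal code changes.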
AWS Inferentia is an ML inference chip custom-built by AWS to deliver high-performance machine learning inference at the lowest cost in the cloud. Inferentia-based instances enable up to 30% higher throughput and up to 45% lower cost per inference than Amazon EC2 G4 instances, which were already the lowest-cost instances for ML inference in the cloud.
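To illustrate the inference side of the shared Neuron flow, the sketch below compiles a PyTorch model for Inferentia with the torch-neuron package. It assumes an Inf1 instance with torch-neuron installed; ResNet-50 stands in for any traceable model.

```python
import torch
import torch_neuron  # Neuron SDK's PyTorch integration; registers torch.neuron
from torchvision import models

# Example only: compile a stock ResNet-50 for Inferentia.
model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# torch.neuron.trace compiles the traced graph to run on NeuronCores;
# operators that cannot be compiled fall back to CPU.
neuron_model = torch.neuron.trace(model, example_inputs=[example])
neuron_model.save("resnet50_neuron.pt")

# At serving time, the compiled artifact loads like any TorchScript module.
loaded = torch.jit.load("resnet50_neuron.pt")
output = loaded(example)
```

Because Trainium uses the same Neuron SDK, a model trained on Trn1 can move into this compile-and-deploy flow on Inferentia, giving the end-to-end training-to-inference path described above.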