AWS Trainium Customers
See how customers are using AWS Trainium to build, train, and fine-tune deep learning models.
Anthropic
At Anthropic, millions of people rely on Claude daily for their work. We're announcing two major advances with AWS: first, a new "latency-optimized mode" for Claude 3.5 Haiku, which runs 60% faster on Trainium2 via Amazon Bedrock; and second, Project Rainier, a new cluster with hundreds of thousands of Trainium2 chips delivering hundreds of exaflops, over five times the size of our previous cluster. Project Rainier will help power both our research and our next generation of scaling. For our customers, this means more intelligence, lower prices, and faster speeds. We're not just building faster AI, we're building trustworthy AI that scales.

Databricks
Databricks’ Mosaic AI enables organizations to build and deploy quality Agent Systems. It is built natively on top of the data lakehouse, enabling customers to easily and securely customize their models with enterprise data and deliver more accurate and domain-specific outputs. Thanks to Trainium's high performance and cost-effectiveness, customers can scale model training on Mosaic AI at a low cost. Trainium2’s availability will be a major benefit to Databricks and its customers as demand for Mosaic AI continues to scale across all customer segments and around the world. Databricks, one of the largest data and AI companies in the world, plans to use Trn2 to deliver better results and lower TCO by up to 30% for its customers.

poolside
At poolside, we are setting out to build a world where AI drives the majority of economically valuable work and scientific progress. We believe that software development will be the first major capability in neural networks to reach human-level intelligence, because it's the domain where we can best combine Search and Learning approaches. To enable that, we're building foundation models, an API, and an Assistant to bring the power of generative AI to your developers' hands (or keyboards). A major key to enabling this technology is the infrastructure we use to build and run our products. With AWS Trainium2, our customers will be able to scale their usage of poolside at a price-performance ratio unmatched by other AI accelerators. In addition, we plan to train future models with Trainium2 UltraServers, with expected savings of 40% compared to EC2 P5 instances.

Itaú Unibanco
We have tested AWS Trainium and Inferentia across various tasks, ranging from standard inference to fine-tuned applications. The performance of these AI chips has enabled us to achieve significant milestones in our research and development. For both batch and online inference tasks, we have seen a 7x improvement in throughput compared to GPUs. This enhanced performance is driving the expansion of more use cases across the organization. The latest generation of Trainium2 chips unlocks groundbreaking features for GenAI and opens the door for innovation at Itaú.

NinjaTech AI
We are extremely excited for the launch of AWS Trn2 because we believe it will offer the best cost-per-token performance and the fastest speeds currently possible for our core model, Ninja LLM, which is based on Llama 3.1 405B. It's amazing to see Trn2's low latency coupled with competitive pricing and on-demand availability; we couldn't be more excited about Trn2's arrival!

Ricoh
The migration to Trn1 instances was easy and straightforward. We were able to pretrain our 13B-parameter LLM in just 8 days, utilizing a cluster of 4,096 Trainium chips! After the success we saw with our smaller model, we fine-tuned a new, larger LLM based on Llama-3-Swallow-70B, and by leveraging Trainium we were able to reduce our training costs by 50% and improve energy efficiency by 25% compared to the latest GPU machines on AWS. We are excited to leverage the latest generation of AWS AI chips, Trainium2, to continue providing our customers with the best performance at the lowest cost.

PyTorch
What I liked most about the AWS Neuron NxD Inference library is how seamlessly it integrates with PyTorch models. NxD's approach is straightforward and user-friendly. Our team was able to onboard Hugging Face PyTorch models with minimal code changes in a short time frame. Enabling advanced features like continuous batching and speculative decoding was straightforward. This ease of use enhances developer productivity, allowing teams to focus more on innovation and less on integration challenges.
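To give a flavor of the "minimal code changes" workflow described above: the testimonial refers to AWS's NxD Inference library, but a loosely analogous entry point for running Hugging Face PyTorch models on Trainium/Inferentia is the separate optimum-neuron package. The sketch below uses that package; the model ID, batch size, sequence length, and core count are illustrative assumptions, not values from the testimonial.

    # Hypothetical sketch: compiling and running a Hugging Face causal LM on
    # AWS Neuron devices with the optimum-neuron package (a simpler entry
    # point than the NxD Inference library named in the testimonial).
    from optimum.neuron import NeuronModelForCausalLM
    from transformers import AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative model choice

    # export=True compiles the PyTorch model for Neuron cores; input shapes
    # (batch_size, sequence_length) are fixed at compile time.
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        export=True,
        batch_size=1,
        sequence_length=2048,
        num_cores=2,            # tensor-parallel degree across Neuron cores
        auto_cast_type="bf16",  # compile-time precision choice
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("Trainium makes it easy to", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

After compilation, the model object exposes the familiar generate() interface, which is the sense in which existing PyTorch/Hugging Face code carries over with few changes.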

Refact.ai
Customers have seen up to 20% higher performance and 1.5x more tokens per dollar with EC2 Inf2 instances compared to EC2 G5 instances. Refact.ai’s fine-tuning capabilities further enhance our customers’ ability to understand and adapt to their organizations’ unique codebases and environments. We are also excited to offer the capabilities of Trainium2, which will bring even faster, more efficient processing to our workflows. This advanced technology will enable our customers to accelerate their software development process by boosting developer productivity while maintaining strict security standards for their codebase.

Karakuri Inc.
KARAKURI builds AI tools to improve the efficiency of web-based customer support and simplify customer experiences. These tools include AI chatbots equipped with generative AI functions, FAQ centralization tools, and an email response tool, all of which improve the efficiency and quality of customer support. Utilizing AWS Trainium, we succeeded in training KARAKURI LM 8x7B Chat v0.1. As a startup, we need to optimize the time and cost required to build and train LLMs. With the support of AWS Trainium and the AWS team, we were able to develop a practical-level LLM in a short period of time. Also, by adopting AWS Inferentia, we were able to build a fast and cost-effective inference service. We're energized about Trainium2 because it will revolutionize our training process, reducing our training time by 2x and driving efficiency to new heights!

Stockmark Inc.
With the mission of “reinventing the mechanism of value creation and advancing humanity,” Stockmark helps many companies create and build innovative businesses by providing cutting-edge natural language processing technology. Stockmark’s new offerings, Anews, a data analysis and gathering service, and SAT, a data structuring service that dramatically improves generative AI use by organizing all forms of information stored in an organization, required us to rethink how we built and deployed models to support these products. With 256 Trainium accelerators, we developed and released stockmark-13b, a large language model with 13 billion parameters, pre-trained from scratch on a Japanese corpus of 220B tokens. Trn1 instances helped us reduce our training costs by 20%. Leveraging Trainium, we successfully developed an LLM that can answer business-critical questions for professionals with unprecedented accuracy and speed. This achievement is particularly noteworthy given the widespread challenge companies face in securing adequate computational resources for model development. With the impressive speed and cost reduction of Trn1 instances, we are excited to see the additional benefits that Trainium2 will bring to our workflows and customers.
