AWS Inferentia | AWS News Blog

AWS Week in Review: New Service for Generative AI and Amazon EC2 Trn1n, Inf2, and CodeWhisperer now GA – April 17, 2023

I could almost title this blog post the “AWS AI/ML Week in Review.” This past week, we announced several new innovations and tools for building with generative AI on AWS. Let’s dive right into it. Last Week’s Launches Here are some launches that got my attention during the previous week: Announcing Amazon Bedrock and Amazon […]

Amazon EC2 Inf2 Instances for Low-Cost, High-Performance Generative AI Inference are Now Generally Available

Innovations in deep learning (DL), especially the rapid growth of large language models (LLMs), have taken the industry by storm. DL models have grown from millions to billions of parameters and are demonstrating exciting new capabilities. They are fueling new applications such as generative AI or advanced research in healthcare and life sciences. AWS has […]

Scaling Ad Verification with Machine Learning and AWS Inferentia

Amazon Advertising helps companies build their brand and connect with shoppers, through ads shown both within and beyond Amazon’s store, including websites, apps, and streaming TV content in more than 15 countries. Businesses or brands of all sizes including registered sellers, vendors, book vendors, Kindle Direct Publishing (KDP) authors, app developers, and agencies on Amazon […]

Majority of Alexa Now Running on Faster, More Cost-Effective Amazon EC2 Inf1 Instances

Today, we are announcing that the Amazon Alexa team has migrated the vast majority of their GPU-based machine learning inference workloads to Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances, powered by AWS Inferentia. This resulted in 25% lower end-to-end latency, and 30% lower cost compared to GPU-based instances for Alexa’s text-to-speech workloads. The lower […]

Amazon ECS Now Supports EC2 Inf1 Instances

As machine learning and deep learning models become more sophisticated, hardware acceleration is increasingly required to deliver fast predictions at high throughput. Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon Elastic Container Service (Amazon ECS), for high performance and the lowest prediction cost in […]

Amazon EKS Now Supports EC2 Inf1 Instances

Amazon Elastic Kubernetes Service (Amazon EKS) (EKS) has quickly become a leading choice for machine learning workloads. It combines the developer agility and the scalability of Kubernetes, with the wide selection of Amazon Elastic Compute Cloud (Amazon EC2) instance types available on AWS, such as the C5, P3, and G4 families. As models become more […]

AWS News Blog