AWS Inferentia

High performance machine learning inference chip, custom designed by AWS

AWS Inferentia is a machine learning inference chip designed to deliver high performance at low cost. AWS Inferentia will support the TensorFlow, Apache MXNet, and PyTorch deep learning frameworks, as well as models that use the ONNX format.

Making predictions using a trained machine learning model–a process called inference–can drive as much as 90% of the compute costs of the application. Using Amazon Elastic Inference, developers can reduce inference costs by up to 75% by attaching GPU-powered inference acceleration to Amazon EC2 and Amazon SageMaker instances. However, some inference workloads require an entire GPU or have extremely low latency requirements. Solving this challenge at low cost requires a dedicated inference chip.

AWS Inferentia provides high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput. AWS Inferentia will be available for use with Amazon SageMaker, Amazon EC2, and Amazon Elastic Inference.

Sign-up for service availability notifications

To be notified about AWS Inferentia availability, sign up here, and we'll send you an e-mail when more information becomes available.

Get service availability updates

AWS Inferentia is coming soon. Sign-up to be notified when more information is available.

Learn more 
Sign up for a free account

Instantly get access to the AWS Free Tier. 

Sign up 
Start building in the console

Get started with machine learning in the AWS Console.

Sign in