High performance machine learning inference chip, custom designed by AWS
AWS Inferentia is a machine learning inference chip designed to deliver high performance at low cost. AWS Inferentia will support the TensorFlow, Apache MXNet, and PyTorch deep learning frameworks, as well as models that use the ONNX format.
Making predictions using a trained machine learning model–a process called inference–can drive as much as 90% of the compute costs of the application. Using Amazon Elastic Inference, developers can reduce inference costs by up to 75% by attaching GPU-powered inference acceleration to Amazon EC2 and Amazon SageMaker instances. However, some inference workloads require an entire GPU or have extremely low latency requirements. Solving this challenge at low cost requires a dedicated inference chip.
AWS Inferentia provides high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput. AWS Inferentia will be available for use with Amazon SageMaker, Amazon EC2, and Amazon Elastic Inference.
Sign-up for service availability notifications
To be notified about AWS Inferentia availability, sign up here, and we'll send you an e-mail when more information becomes available.
AWS Inferentia is coming soon. Sign-up to be notified when more information is available.
Instantly get access to the AWS Free Tier.
Get started with machine learning in the AWS Console.