Announcing AWS Inferentia: Machine Learning Inference Chip

Posted on: Nov 28, 2018

AWS Inferentia is a machine learning inference chip, custom-designed by AWS to deliver high-throughput, low-latency inference performance at an extremely low cost. AWS Inferentia will support the TensorFlow, Apache MXNet, and PyTorch deep learning frameworks, as well as models that use the ONNX format.

AWS Inferentia provides hundreds of TOPS (tera operations per second) of inference throughput, allowing complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to deliver thousands of TOPS of throughput.

AWS Inferentia will be available for use with Amazon SageMaker, Amazon EC2, and Amazon Elastic Inference. For more information about AWS Inferentia, see the web page.
