Posted On: Nov 28, 2018

Amazon Elastic Inference allows you to attach just the right amount of GPU-powered acceleration to any Amazon EC2 or Amazon SageMaker instance to reduce the cost of running deep learning inference by up to 75%. Amazon Elastic Inference supports TensorFlow, Apache MXNet, and ONNX models, with more frameworks coming soon.
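As a minimal sketch of attaching an accelerator to an EC2 instance with boto3, you can pass an ElasticInferenceAccelerators entry to run_instances. The AMI ID, key pair, and subnet below are placeholders; substitute your own values:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Launch a CPU instance sized for the application, and attach a
# medium Elastic Inference accelerator for GPU-powered inference.
# The ImageId, KeyName, and SubnetId are placeholder values.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # e.g. an AWS Deep Learning AMI
    InstanceType="c5.large",           # chosen for CPU/memory needs
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",
    SubnetId="subnet-0123456789abcdef0",
    ElasticInferenceAccelerators=[
        {"Type": "eia1.medium"}        # acceleration sized independently
    ],
)
print(response["Instances"][0]["InstanceId"])
```

Note that Elastic Inference also requires some additional setup, such as a VPC endpoint for the service and an instance role with permission to connect to the accelerator; see the documentation for details.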

With Amazon Elastic Inference, you can now choose the instance type that is best suited to the overall CPU and memory needs of your application, and then separately configure the amount of inference acceleration that you need with no code changes. This allows you to use resources efficiently and to reduce the cost of running inference. For more information about Amazon Elastic Inference, see the service detail page.
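For Amazon SageMaker, the same idea is expressed through the accelerator_type argument to deploy in the SageMaker Python SDK. Below is a minimal sketch for a TensorFlow model; the S3 model artifact path, IAM role, and framework version are placeholder assumptions:

```python
from sagemaker.tensorflow import TensorFlowModel

# The model artifact location and execution role are placeholders.
model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    framework_version="1.12",
)

# Host the model on a CPU instance sized for the application, and
# attach an Elastic Inference accelerator for GPU-powered inference.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia1.medium",
)
```

Because the accelerator is specified separately from the hosting instance, you can resize either one independently as your application's CPU, memory, or inference throughput needs change.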