Reduce ML inference costs on PyTorch with Amazon Elastic Inference

Posted on: Mar 18, 2020

You can now use Amazon Elastic Inference to accelerate inference and reduce inference costs for PyTorch models in Amazon SageMaker, Amazon EC2, and Amazon ECS. Enhanced PyTorch libraries for Elastic Inference are available automatically in Amazon SageMaker, AWS Deep Learning AMIs, and AWS Deep Learning Containers, so you can deploy your PyTorch models in production with minimal code changes. Elastic Inference supports TorchScript-compiled models on PyTorch: to use Elastic Inference with PyTorch, convert your PyTorch models to TorchScript and run inference through the Elastic Inference API. Today, PyTorch joins TensorFlow and Apache MXNet as a deep learning framework supported by Elastic Inference.
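As a minimal sketch of that workflow, the snippet below traces a torchvision ResNet-50 into TorchScript and runs inference inside `torch.jit.optimized_execution`. The two-argument form that targets an attached accelerator is specific to the Elastic Inference-enabled PyTorch build; treat the `{'target_device': 'eia:0'}` argument and the file name as assumptions for illustration.

```python
import torch
import torchvision

# Convert an eager-mode model to TorchScript by tracing it with a
# representative input (torch.jit.script also works, and is needed for
# models with data-dependent control flow).
model = torchvision.models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save('resnet50_traced.pt')

# Inference with the EI-enabled PyTorch build: load the TorchScript
# artifact onto the CPU, then run it inside optimized_execution pointing
# at the attached accelerator. The target_device dictionary argument is
# part of the Elastic Inference build of PyTorch, not stock PyTorch;
# 'eia:0' assumes the first attached accelerator.
loaded = torch.jit.load('resnet50_traced.pt', map_location=torch.device('cpu'))
with torch.no_grad():
    with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
        output = loaded(example)
print(output.shape)
```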

Elastic Inference allows you to attach just the right amount of GPU-powered acceleration to any Amazon SageMaker instance, EC2 instance, or ECS task to reduce the cost of running deep learning inference by up to 75%.
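For example, with the SageMaker Python SDK you attach an accelerator to an endpoint by passing `accelerator_type` to `deploy()`. A minimal sketch, assuming a TorchScript model archive already uploaded to S3; the bucket path, role name, and entry-point script are placeholders:

```python
from sagemaker.pytorch import PyTorchModel

# Placeholders: substitute your own S3 model artifact, IAM role, and
# inference handler script.
pytorch_model = PyTorchModel(
    model_data='s3://my-bucket/model/model.tar.gz',
    role='my-sagemaker-execution-role',
    entry_point='inference.py',
    framework_version='1.3.1',  # EI-enabled PyTorch version at launch
)

# accelerator_type attaches an Elastic Inference accelerator to the
# endpoint's instance; ml.eia2.medium is one of the available sizes.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    accelerator_type='ml.eia2.medium',
)
```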

PyTorch support for Elastic Inference is available in all regions where Amazon Elastic Inference is offered. For more information, see Using PyTorch Models with Elastic Inference in the developer guide and our blog post, "Reduce ML inference costs on Amazon SageMaker for PyTorch models using Amazon Elastic Inference."