Posted On: Oct 25, 2019

Amazon Elastic Inference has introduced new Elastic Inference Accelerators called EIA2, with up to 8GB of GPU memory. Customers can now use Amazon Elastic Inference with larger models, or with models that have larger input sizes, for image processing, object detection, image classification, automated speech processing, natural language processing, and other deep learning use cases.

Amazon Elastic Inference allows you to attach just the right amount of GPU-powered acceleration to any Amazon EC2 instance, Amazon SageMaker instance, or Amazon ECS task to reduce the cost of running deep learning inference by up to 75%. With Amazon Elastic Inference, you can choose the instance type that is best suited to the overall CPU and memory needs of your application, and separately configure the amount of inference acceleration that you need, with no code changes. Until now, you could provision a maximum of 4GB of GPU memory on Elastic Inference. Now, you can choose among three new accelerator types, which have 2GB, 4GB, and 8GB of GPU memory respectively. Amazon Elastic Inference supports TensorFlow, Apache MXNet, and ONNX models, with more frameworks coming soon.
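As a rough illustration of how an accelerator is attached at launch time, the sketch below builds the parameters for an EC2 `run_instances` request with one Elastic Inference accelerator. The accelerator type names (`eia2.medium`, `eia2.large`, `eia2.xlarge`) are assumed to map to the 2GB, 4GB, and 8GB tiers described above, and the AMI ID, subnet, and instance type are placeholders you would replace with your own values.

```python
# Sketch: preparing an EC2 run_instances request that attaches one
# Elastic Inference accelerator. Type names and sizes below are assumptions
# mapping to the 2GB/4GB/8GB tiers in the announcement.

# Assumed GPU memory (GB) per EIA2 accelerator type.
EIA2_TYPES = {
    "eia2.medium": 2,
    "eia2.large": 4,
    "eia2.xlarge": 8,
}

def build_run_instances_params(accelerator_type, ami_id, instance_type="c5.large"):
    """Build keyword arguments for boto3's ec2.run_instances call,
    attaching a single Elastic Inference accelerator of the given type."""
    if accelerator_type not in EIA2_TYPES:
        raise ValueError(f"Unknown accelerator type: {accelerator_type}")
    return {
        "ImageId": ami_id,                      # placeholder AMI
        "InstanceType": instance_type,          # CPU/memory chosen independently
        "MinCount": 1,
        "MaxCount": 1,
        "ElasticInferenceAccelerators": [
            {"Type": accelerator_type, "Count": 1}
        ],
    }

params = build_run_instances_params("eia2.xlarge", ami_id="ami-0123456789abcdef0")
# To launch for real (requires AWS credentials and a configured region):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.run_instances(**params)
```

The point of the split is visible in the parameters: the instance type sets CPU and memory, while the accelerator entry independently sets the amount of inference acceleration.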

The new Elastic Inference Accelerators are available in US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Seoul), and EU (Ireland). Support for other regions is coming soon.

For more information, see the Amazon Elastic Inference product page and documentation.