Introducing Amazon EC2 Inf1 Instances, high performance and the lowest cost machine learning inference in the cloud

Posted on: Dec 3, 2019

Today, we are announcing the general availability of Amazon EC2 Inf1 instances, built from the ground up to support machine learning inference applications. Inf1 instances feature up to 16 AWS Inferentia chips, high-performance machine learning inference chips designed and built by AWS. In addition, we’ve coupled the Inferentia chips with the latest custom 2nd Gen Intel® Xeon® Scalable processors and up to 100 Gbps networking to enable high-throughput inference. This powerful configuration enables Inf1 instances to deliver up to 3x higher throughput and up to 40% lower cost per inference than Amazon EC2 G4 instances, which were already the lowest-cost instances for machine learning inference available in the cloud.

Amazon EC2 Inf1 instances offer high performance and the lowest cost machine learning inference in the cloud. Using Inf1 instances, customers can run large-scale machine learning inference applications such as image recognition, speech recognition, natural language processing, personalization, and fraud detection.

AWS makes it easy for you to deploy your machine learning application on Amazon EC2 Inf1 instances. Once your model is trained, you can compile and run it with AWS Neuron, an SDK for running inference on AWS Inferentia chips that consists of a compiler, a runtime, and profiling tools. Neuron is pre-integrated into popular machine learning frameworks, including TensorFlow, PyTorch, and MXNet, to deliver optimal performance on EC2 Inf1 instances; a short compilation sketch follows below. Inf1 instances can be deployed using AWS Deep Learning AMIs and will be available via managed services such as Amazon SageMaker, Amazon EKS, and Amazon ECS.
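As an illustration, here is a minimal sketch of the PyTorch workflow: compiling a pretrained model with the Neuron SDK so it can run on Inferentia. It assumes the torch-neuron and torchvision packages are installed (typically via a Neuron-enabled Deep Learning AMI or pip); the exact API surface may vary across Neuron releases.

```python
import torch
import torch_neuron  # AWS Neuron integration for PyTorch; registers torch.neuron
from torchvision import models

# Load a pretrained model and switch it to evaluation mode
model = models.resnet50(pretrained=True)
model.eval()

# Example input matching the shape the compiled model will expect
example_input = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

# Compile the model for Inferentia; operators that Neuron cannot
# compile fall back to running on the CPU
model_neuron = torch.neuron.trace(model, example_inputs=[example_input])

# Save the compiled artifact for deployment on an Inf1 instance
model_neuron.save("resnet50_neuron.pt")
```

The saved artifact is a TorchScript module: on an Inf1 instance it can be loaded with torch.jit.load and invoked like any other PyTorch model, with the compiled portions executing on the Inferentia chips.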

Amazon EC2 Inf1 instances come in four sizes and are available in the US East (N. Virginia) and US West (Oregon) AWS Regions as On-Demand, Reserved, and Spot Instances, or as part of Savings Plans. To learn more about Inf1 instances, visit the Inf1 page.