Posted On: Nov 29, 2022

Today, AWS announces the preview of Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances, which are designed to deliver high performance at the lowest cost in Amazon EC2 for the most demanding deep learning (DL) inference applications. Inf2 instances are powered by up to 12 AWS Inferentia2 accelerators, the third DL accelerator designed by AWS. Inf2 instances offer up to 3x higher compute performance, up to 4x higher throughput, and up to 10x lower latency compared to Inf1 instances.

You can use Inf2 instances to run DL applications for natural language understanding, translation, video and image generation, speech recognition, personalization, and more. They are optimized to deploy complex models, such as large language models (LLMs) and vision transformers, at scale, while also improving on the price-performance of Inf1 instances for smaller models. To support ultra-large models with 100B+ parameters, Inf2 instances are the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference with ultra-high-speed connectivity between accelerators.

Inf2 instances offer up to 2.3 petaflops of DL performance, up to 384 GB of accelerator memory with 9.8 TB/s bandwidth, and NeuronLink, an ultra-high-speed, nonblocking intra-instance interconnect. Inf2 instances also offer up to 50% better performance per watt than GPU-based instances in Amazon EC2, helping you meet your sustainability goals. The AWS Neuron SDK is natively integrated with popular ML frameworks, such as PyTorch and TensorFlow, so you can deploy your DL applications on Inf2 with a few lines of code.
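As a rough illustration of that workflow, here is a minimal sketch of compiling a PyTorch model for Inferentia2 with torch-neuronx, the Neuron SDK's PyTorch integration. The ResNet-50 model and input shape are illustrative choices, and the example assumes it is run on an Inf2 instance with the Neuron SDK installed.

```python
# Minimal sketch: compile a PyTorch model for AWS Inferentia2 with torch-neuronx.
# Assumes an Inf2 instance with the AWS Neuron SDK installed; the model and
# input shape below are illustrative, not prescriptive.
import torch
import torch_neuronx
from torchvision import models

# Load a pretrained model and put it in evaluation mode.
model = models.resnet50(weights="IMAGENET1K_V1")
model.eval()

# Example input matching the shape the model will see at inference time.
example = torch.rand(1, 3, 224, 224)

# Compile the model for the Inferentia2 accelerator. torch_neuronx.trace
# returns a TorchScript module that executes on the NeuronCores.
neuron_model = torch_neuronx.trace(model, example)

# Save the compiled artifact; it can be reloaded later with torch.jit.load.
torch.jit.save(neuron_model, "resnet50_neuron.pt")

# Run inference on the accelerator.
output = neuron_model(example)
```

The compiled artifact is a standard TorchScript module, so the rest of a serving stack (loading, batching, invoking) can stay in ordinary PyTorch code.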

To learn more and sign up for a preview of the Inf2 instances, see the Inf2 product detail page.