Posted On: Jun 26, 2023

Starting today, you can choose Inferentia 2 and Trainium 1 as additional target devices when compiling your PyTorch and TensorFlow models with Amazon SageMaker Neo. Neo is a capability of Amazon SageMaker that optimizes machine learning (ML) models for inference on SageMaker, delivering faster inference without any loss in accuracy. Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances deliver high performance at the lowest cost for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers. AWS Trainium is an ML accelerator that AWS purpose-built for deep learning training of 100B+ parameter models.

Inferentia 2 instances are available in us-east-2, and Trainium 1 instances are available in us-east-1. To get started in the console, select ml_inf2 or ml_trn1 as the Target Device. If you use an SDK to compile models with Neo, set the TargetDevice field in the output config to ml_inf2 or ml_trn1. Supported frameworks are PyTorch 1.13 and TensorFlow 2.10. Learn more about it here.
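As an illustration of the SDK path described above, here is a minimal sketch using the AWS SDK for Python (boto3) and its SageMaker create_compilation_job call. The job name, role ARN, S3 URIs, input name, and input shape are hypothetical placeholders, and the helper function names are invented for this example; only the TargetDevice values, regions, and framework versions come from this announcement.

```python
import json

# Regions named in the announcement for each new Neo target device.
REGION_FOR_TARGET = {"ml_inf2": "us-east-2", "ml_trn1": "us-east-1"}


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              framework, framework_version, input_shapes,
                              target_device):
    """Build the keyword arguments for create_compilation_job (hypothetical helper)."""
    if target_device not in REGION_FOR_TARGET:
        raise ValueError(f"expected one of {sorted(REGION_FOR_TARGET)}, got {target_device!r}")
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo expects the input names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shapes),
            "Framework": framework,  # "PYTORCH" or "TENSORFLOW"
            "FrameworkVersion": framework_version,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # "ml_inf2" or "ml_trn1"
        },
        # Assumed time limit; adjust for your model.
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def submit_compilation_job(request):
    """Submit the request in the region listed for its target device."""
    import boto3  # imported here so the request can be built without AWS access

    region = REGION_FOR_TARGET[request["OutputConfig"]["TargetDevice"]]
    client = boto3.client("sagemaker", region_name=region)
    return client.create_compilation_job(**request)


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    request = build_compilation_request(
        job_name="example-neo-inf2-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        model_s3_uri="s3://example-bucket/model/model.tar.gz",
        output_s3_uri="s3://example-bucket/compiled/",
        framework="PYTORCH",
        framework_version="1.13",
        input_shapes={"input0": [1, 3, 224, 224]},
        target_device="ml_inf2",
    )
    print(json.dumps(request, indent=2))
    # To actually start the job (requires AWS credentials and real resources):
    # submit_compilation_job(request)
```

Building the request separately from submitting it lets you inspect or validate the TargetDevice setting before any call is made to AWS.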

To learn more about Amazon SageMaker Neo and the console experience, see the documentation here. To get started, log in to the Amazon SageMaker console.