Posted On: May 4, 2023

We are excited to announce the availability of the ml.inf2 and ml.trn1 families of instances on Amazon SageMaker for deploying machine learning (ML) models for Real-time and Asynchronous inference. You can use these instances on SageMaker to achieve high performance at a low cost for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers. In addition, you can use SageMaker Inference Recommender to run load tests and evaluate the price-performance benefits of deploying your model on these instances.

ml.inf2 and ml.trn1 instances are powered by AWS Inferentia2 and AWS Trainium accelerators, respectively.

  • You can use ml.inf2 instances to run your ML applications on SageMaker for text summarization, code generation, video and image generation, speech recognition, and more. ml.inf2 instances offer up to 384 GB of shared accelerator memory for performant generative AI inference.
  • ml.trn1 instances are similar to ml.inf2 instances but have 512 GB of shared accelerator memory; you can use these instances to deploy even larger models on SageMaker. In addition, these instances have up to 8 TB of local NVMe solid state drive (SSD) storage for fast workload access to large datasets and models.

ml.inf2 instances are available for model deployment on SageMaker in US East (Ohio) and ml.trn1 instances in US East (N. Virginia).

You can easily get started using ml.trn1- and ml.inf2-compatible AWS Deep Learning Containers (DLCs) for PyTorch, TensorFlow, Hugging Face, and Large Model Inference (LMI) when deploying endpoints (details). For pricing, please visit our pricing page.
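
As a rough illustration of what deploying an endpoint on these instances can look like, here is a minimal sketch using the boto3 SageMaker client. It assumes a SageMaker model has already been created from a compatible container; the helper names (`build_endpoint_config`, `deploy_endpoint`), the resource names, and the chosen instance size (`ml.inf2.xlarge`) are illustrative assumptions, not part of this announcement. The region `us-east-2` corresponds to US East (Ohio); use `us-east-1` for ml.trn1 in US East (N. Virginia).

```python
from typing import Any, Dict


def build_endpoint_config(
    config_name: str,
    model_name: str,
    instance_type: str = "ml.inf2.xlarge",  # assumed size; pick one that fits your model
    instance_count: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateEndpointConfig API.

    Hypothetical helper: it only assembles the request and checks that the
    instance type belongs to the ml.inf2 or ml.trn1 family.
    """
    if not instance_type.startswith(("ml.inf2.", "ml.trn1.")):
        raise ValueError(
            f"expected an ml.inf2 or ml.trn1 instance type, got {instance_type!r}"
        )
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,  # an existing SageMaker model (assumed)
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }


def deploy_endpoint(endpoint_name: str, config: Dict[str, Any], region: str) -> None:
    """Create the endpoint config and a real-time endpoint (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    client = boto3.client("sagemaker", region_name=region)
    client.create_endpoint_config(**config)
    client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config["EndpointConfigName"],
    )


if __name__ == "__main__":
    # Placeholder names; replace with your own resources before deploying.
    cfg = build_endpoint_config("demo-inf2-config", "demo-inf2-model")
    print(cfg)
    # deploy_endpoint("demo-inf2-endpoint", cfg, region="us-east-2")
```

The request is built separately from the API call so the configuration can be inspected or validated before any resources are created; the `deploy_endpoint` call is left commented out because it creates billable resources.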