Artificial Intelligence
Large model inference container – latest capabilities and performance enhancements
AWS recently released significant updates to the Large Model Inference (LMI) container that improve performance, expand model support, and simplify deployment for customers hosting large language models (LLMs) on AWS. The releases aim to reduce operational complexity while delivering measurable performance gains across popular model architectures.
Deploy DeepSeek-R1 distilled models on Amazon SageMaker using a Large Model Inference container
Deploying DeepSeek models on SageMaker AI provides a robust solution for organizations that want to use state-of-the-art language models in their applications. In this post, we show how to deploy the distilled versions of the R1 model on SageMaker AI, which offers several deployment options for them.
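As a rough illustration of the last step, calling a distilled model once it is deployed, here is a minimal sketch. It assumes the model is already running on a SageMaker endpoint backed by an LMI container, and that the container accepts the `inputs`/`parameters` JSON schema commonly used with LMI. The function names, endpoint name, and default sampling values are hypothetical and illustrative, not settings taken from the post.

```python
import json


def build_lmi_payload(prompt, max_new_tokens=512, temperature=0.6):
    """Serialize a prompt into an inputs/parameters JSON body (schema assumed for the LMI container)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    })


def invoke_distilled_model(endpoint_name, prompt, region="us-east-1", **params):
    """Send a prompt to an existing SageMaker endpoint and return the parsed JSON response."""
    import boto3  # imported here so build_lmi_payload can be used without AWS credentials

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_lmi_payload(prompt, **params),
    )
    # The response body is a stream; read it fully and decode the JSON.
    return json.loads(response["Body"].read())
```

For example, `invoke_distilled_model("my-r1-distill-endpoint", "How many primes are below 20?", max_new_tokens=1024)` would target a hypothetical endpoint. The exact response fields depend on how the container is configured, so the sketch returns the parsed JSON without assuming a particular shape.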
Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference
The new efficient multi-adapter inference feature of Amazon SageMaker makes it easier for customers to serve fine-tuned models at scale. This capability integrates with SageMaker inference components so that you can deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. In this post, we show how to use the new efficient multi-adapter inference feature in SageMaker.
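To show how adapters map onto inference components, here is a minimal sketch based on our reading of the boto3 SageMaker APIs. It assumes that an endpoint and a base-model inference component already exist. Each LoRA adapter is registered as its own inference component that references the base component through `BaseInferenceComponentName` and points at adapter weights in Amazon S3 through `ArtifactUrl`. Requests are then routed to a specific adapter with the `InferenceComponentName` parameter of `invoke_endpoint`. All helper names are hypothetical, and the request field names should be checked against the current boto3 documentation.

```python
import json
import time


def adapter_component_spec(adapter_ic_name, endpoint_name, base_ic_name, adapter_s3_uri):
    """kwargs for create_inference_component that attach one LoRA adapter (field names assumed)."""
    if not adapter_s3_uri.startswith("s3://"):
        raise ValueError("adapter artifacts are expected at an s3:// URI")
    return {
        "InferenceComponentName": adapter_ic_name,
        "EndpointName": endpoint_name,
        "Specification": {
            # Points the adapter at the base-model component it runs on top of.
            "BaseInferenceComponentName": base_ic_name,
            "Container": {"ArtifactUrl": adapter_s3_uri},
        },
    }


def adapter_invoke_args(endpoint_name, adapter_ic_name, prompt, max_new_tokens=128):
    """kwargs for sagemaker-runtime invoke_endpoint that route one prompt to one adapter."""
    return {
        "EndpointName": endpoint_name,
        "InferenceComponentName": adapter_ic_name,  # selects which adapter serves the request
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}),
    }


def wait_until_in_service(sm_client, ic_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_inference_component until the component reports InService."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = sm_client.describe_inference_component(
            InferenceComponentName=ic_name
        )["InferenceComponentStatus"]
        if status == "InService":
            return
        if status == "Failed":
            raise RuntimeError(f"inference component {ic_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"inference component {ic_name} not InService after {timeout_seconds}s")


def deploy_adapters(endpoint_name, base_ic_name, adapters, region="us-east-1"):
    """Register each {adapter_ic_name: s3_uri} pair, then wait for all of them to be InService."""
    import boto3  # lazy import so the helpers above run without AWS credentials

    sm = boto3.client("sagemaker", region_name=region)
    for name, uri in adapters.items():
        sm.create_inference_component(**adapter_component_spec(name, endpoint_name, base_ic_name, uri))
    for name in adapters:
        wait_until_in_service(sm, name)
```

When the adapters are in service, a caller can send a prompt to one of them with `boto3.client("sagemaker-runtime").invoke_endpoint(**adapter_invoke_args("my-endpoint", "adapter-a", "Summarize this ticket."))`. The endpoint and adapter names here are placeholders. Because each adapter is addressed by name, adding an adapter means creating one more inference component. The endpoint and base model stay as they are.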


