Artificial Intelligence

Author: Michael Nguyen

Michael Nguyen is a Senior Solutions Architect at AWS, where he has helped startups and fintechs build innovative solutions. His areas of focus include AI/ML and the financial services industry. Prior to AWS, he worked for over 20 years as a lead architect in the banking industry, specializing in payment card services. Michael is 13x AWS Certified and holds a B.S. in Electrical Engineering from Penn State University, an M.S. in Electrical Engineering from Binghamton University, and an MBA from the University of Delaware.

Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker

In this post, we explore a solution that addresses these challenges head-on using LoRA serving with Amazon SageMaker. By using the new LoRA performance optimizations in SageMaker large model inference (LMI) containers along with inference components, we demonstrate how organizations can efficiently manage and serve their growing portfolio of fine-tuned models while optimizing costs and providing seamless performance for their customers. The latest SageMaker LMI container offers unmerged-LoRA inference accelerated by our LMI-Dist inference engine, along with support for an OpenAI-style chat schema. To learn more about LMI, refer to the LMI Starting Guide, the LMI handlers Inference API Schema, and the Chat Completions API Schema.

Boost inference performance for LLMs with new Amazon SageMaker containers

Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) and adds support for NVIDIA’s TensorRT-LLM library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits: Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% […]

Train fraudulent payment detection with Amazon SageMaker

The ability to detect fraudulent card payments is becoming increasingly important as the world moves towards a cashless society. For decades, banks have relied on building complex mathematical models to predict whether a given card payment transaction is likely to be fraudulent. These models must be both accurate and precise: they must catch fraudulent […]
