AWS Machine Learning Blog

End-to-end LLM training on instance clusters with over 100 nodes using AWS Trainium

In this post, we show you how to accelerate the full pre-training of LLM models by scaling up to 128 trn1.32xlarge nodes, using a Llama 2-7B model as an example. We share best practices for training LLMs on AWS Trainium, scaling the training on a cluster with over 100 nodes, improving efficiency of recovery from system and hardware failures, improving training stability, and achieving convergence.

Fine-tune large multimodal models using Amazon SageMaker

Large multimodal models (LMMs) integrate multiple data types into a single model. By combining text data with images and other modalities during training, multimodal models such as Claude3, GPT-4V, and Gemini Pro Vision gain more comprehensive understanding and improved ability to process diverse data types. The multimodal approach allows models to handle a wider range […]

Accelerate Mixtral 8x7B pre-training with expert parallelism on Amazon SageMaker

Mixture of Experts (MoE) architectures for large language models (LLMs) have recently gained popularity due to their ability to increase model capacity and computational efficiency compared to fully dense models. By utilizing sparse expert subnetworks that process different subsets of tokens, MoE models can effectively increase the number of parameters while requiring less computation per […]

Solution architecture

Generating fashion product descriptions by fine-tuning a vision-language model with SageMaker and Amazon Bedrock

This post shows you how to predict domain-specific product attributes from product images by fine-tuning a VLM on a fashion dataset using Amazon SageMaker, and then using Amazon Bedrock to generate product descriptions using the predicted attributes as input. So you can follow along, we’re sharing the code in a GitHub repository.

Create a multimodal assistant with advanced RAG and Amazon Bedrock

In this post, we present a new approach named multimodal RAG (mmRAG) to tackle those existing limitations in greater detail. The solution intends to address these limitations for practical generative artificial intelligence (AI) assistant use cases. Additionally, we examine potential solutions to enhance the capabilities of large language models (LLMs) and visual language models (VLMs) with advanced LangChain capabilities, enabling them to generate more comprehensive, coherent, and accurate outputs while effectively handling multimodal data

How 20 Minutes empowers journalists and boosts audience engagement with generative AI on Amazon Bedrock

This post is co-written with Aurélien Capdecomme and Bertrand d’Aure from 20 Minutes. With 19 million monthly readers, 20 Minutes is a major player in the French media landscape. The media organization delivers useful, relevant, and accessible information to an audience that consists primarily of young and active urban readers. Every month, nearly 8.3 million 25–49-year-olds choose […]

Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker

In this post, we explore a solution that addresses these challenges head-on using LoRA serving with Amazon SageMaker. By using the new performance optimizations of LoRA techniques in SageMaker large model inference (LMI) containers along with inference components, we demonstrate how organizations can efficiently manage and serve their growing portfolio of fine-tuned models, while optimizing costs and providing seamless performance for their customers. The latest SageMaker LMI container offers unmerged-LoRA inference, sped up with our LMI-Dist inference engine and OpenAI style chat schema. To learn more about LMI, refer to LMI Starting Guide, LMI handlers Inference API Schema, and Chat Completions API Schema.

Mixtral 8x22B is now available in Amazon SageMaker JumpStart

Today, we are excited to announce the Mixtral-8x22B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you […]

Building Generative AI prompt chaining workflows with human in the loop

While Generative AI can create highly realistic content, including text, images, and videos, it can also generate outputs that appear plausible but are verifiably incorrect. Incorporating human judgment is crucial, especially in complex and high-risk decision-making scenarios. This involves building a human-in-the-loop process where humans play an active role in decision making alongside the AI system. In this blog post, you will learn about prompt chaining, how to break a complex task into multiple tasks to use prompt chaining with an LLM in a specific order, and how to involve a human to review the response generated by the LLM.

How LotteON built a personalized recommendation system using Amazon SageMaker and MLOps

This post is co-written with HyeKyung Yang, Jieun Lim, and SeungBum Shim from LotteON. LotteON aims to be a platform that not only sells products, but also provides a personalized recommendation experience tailored to your preferred lifestyle. LotteON operates various specialty stores, including fashion, beauty, luxury, and kids, and strives to provide a personalized shopping […]