AWS Machine Learning Blog

Category: Compute

Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM

In this post, we walk through the steps to deploy the Meta Llama 3.1-8B model on Inferentia 2 instances using Amazon EKS. This solution combines the performance and cost-effectiveness of Inferentia 2 chips with the flexibility of Amazon EKS. Inferentia 2 chips deliver high-throughput, low-latency inference, which makes them well suited to LLMs.

How Crexi achieved ML models deployment on AWS at scale and boosted efficiency

Commercial Real Estate Exchange, Inc. (Crexi) is a digital marketplace and platform designed to streamline commercial real estate transactions. In this post, we review how Crexi met its business needs and developed a versatile framework for creating and deploying AI/ML pipelines. This customizable, scalable solution lets Crexi deploy and manage its ML models efficiently across diverse project requirements.

Generate and evaluate images in Amazon Bedrock with Amazon Titan Image Generator G1 v2 and Anthropic Claude 3.5 Sonnet

In this post, we demonstrate how to interact with the Amazon Titan Image Generator G1 v2 model on Amazon Bedrock to generate an image. Then, we show you how to use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock to describe it, evaluate it with a score from 1–10, explain the reason behind the given score, and suggest improvements to the image.

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

AWS offers powerful generative AI services, including Amazon Bedrock, which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. Many businesses want to integrate these cutting-edge AI capabilities with their existing collaboration tools, such as Google Chat, to […]

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

Although batch inference offers numerous benefits, it’s limited to 10 batch inference jobs submitted per model per Region. To address this consideration and enhance your use of batch inference, we’ve developed a scalable solution using AWS Lambda and Amazon DynamoDB. This post guides you through implementing a queue management system that automatically monitors available job slots and submits new jobs as slots become available.
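The slot-management idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not the post's implementation: the function names `fill_available_slots` and `release_finished` are hypothetical, the in-memory `deque` and `set` stand in for the roles the post assigns to Amazon DynamoDB, and the `submit` and `is_finished` callables stand in for whatever calls the AWS Lambda functions would make.

```python
from collections import deque

# Limit stated in the post: 10 batch inference jobs per model per Region.
MAX_CONCURRENT_JOBS = 10


def fill_available_slots(pending, active, submit, limit=MAX_CONCURRENT_JOBS):
    """Submit pending jobs until the active set reaches the limit.

    pending: deque of job descriptions waiting to run (hypothetical stand-in
             for a persistent queue table)
    active:  set of identifiers for jobs currently in progress
    submit:  callable that starts one job and returns its identifier
    Returns the identifiers submitted during this pass, in order.
    """
    submitted = []
    while pending and len(active) < limit:
        job = pending.popleft()
        job_id = submit(job)
        active.add(job_id)
        submitted.append(job_id)
    return submitted


def release_finished(active, is_finished):
    """Remove finished jobs from the active set and return them."""
    finished = {job_id for job_id in active if is_finished(job_id)}
    active.difference_update(finished)
    return finished
```

A scheduler built this way would call `release_finished` and then `fill_available_slots` on every polling interval, so that queued jobs start as soon as running ones complete.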

Deploy a serverless web application to edit images using Amazon Bedrock

In this post, we explore a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We use Amazon Bedrock and an Amazon Titan FM that allow you to edit images by using prompts.

Create a multimodal chatbot tailored to your unique dataset with Amazon Bedrock FMs

In this post, we show how to create a multimodal chat assistant on Amazon Web Services (AWS) using Amazon Bedrock models, where users can submit images and questions, and text responses will be sourced from a closed set of proprietary documents.

Improve employee productivity using generative AI with Amazon Bedrock

In this post, we show you the Employee Productivity GenAI Assistant Example, a solution built on AWS technologies like Amazon Bedrock, to automate writing tasks and enhance employee productivity.

Accelerate pre-training of Mistral’s Mathstral model with highly resilient clusters on Amazon SageMaker HyperPod

In this post, we present an in-depth guide to starting a continual pre-training job for Mistral AI’s Mathstral model using PyTorch Fully Sharded Data Parallel (FSDP) on SageMaker HyperPod.

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

In this post, we show you how Zeta Global, a data-driven marketing technology company, has built an efficient MLOps platform to streamline the end-to-end ML workflow, from data ingestion to model deployment, while optimizing resource utilization and cost efficiency.