AWS Machine Learning Blog

Category: Compute

Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model

Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model

In this post, we use the multi-agent feature of Amazon Bedrock to demonstrate a powerful and innovative approach to AWS cost management. By using the advanced capabilities of Amazon Nova FMs, we’ve developed a solution that showcases how AI-driven agents can revolutionize the way organizations analyze, optimize, and manage their AWS costs.

Architecture Diagram

Automate Amazon EKS troubleshooting using an Amazon Bedrock agentic workflow

In this post, we demonstrate how to orchestrate multiple Amazon Bedrock agents to create a sophisticated Amazon EKS troubleshooting system. By enabling collaboration between specialized agents—deriving insights from K8sGPT and performing actions through the ArgoCD framework—you can build a comprehensive automation that identifies, analyzes, and resolves cluster issues with minimal human intervention.

Host concurrent LLMs with LoRAX

In this post, we explore how Low-Rank Adaptation (LoRA) can be used to address these challenges effectively. Specifically, we discuss using LoRA serving with LoRA eXchange (LoRAX) and Amazon Elastic Compute Cloud (Amazon EC2) GPU instances, allowing organizations to efficiently manage and serve their growing portfolio of fine-tuned models, optimize costs, and provide seamless performance for their customers.

Build a computer vision-based asset inventory application with low or no training

In this post, we present a solution using generative AI and large language models (LLMs) to alleviate the time-consuming and labor-intensive tasks required to build a computer vision application, enabling you to immediately start taking pictures of your asset labels and extract the necessary information to update the inventory using AWS services

Optimizing Mixtral 8x7B on Amazon SageMaker with AWS Inferentia2

This post demonstrates how to deploy and serve the Mixtral 8x7B language model on AWS Inferentia2 instances for cost-effective, high-performance inference. We’ll walk through model compilation using Hugging Face Optimum Neuron, which provides a set of tools enabling straightforward model loading, training, and inference, and the Text Generation Inference (TGI) Container, which has the toolkit for deploying and serving LLMs with Hugging Face.

Streamline AWS resource troubleshooting with Amazon Bedrock Agents and AWS Support Automation Workflows

AWS provides a powerful tool called AWS Support Automation Workflows, which is a collection of curated AWS Systems Manager self-service automation runbooks. These runbooks are created by AWS Support Engineering with best practices learned from solving customer issues. They enable AWS customers to troubleshoot, diagnose, and remediate common issues with their AWS resources. In this post, we explore how to use the power of Amazon Bedrock Agents and AWS Support Automation Workflows to create an intelligent agent capable of troubleshooting issues with AWS resources.

Integrate generative AI capabilities into Microsoft Office using Amazon Bedrock

In this blog post, we showcase a powerful solution that seamlessly integrates AWS generative AI capabilities in the form of large language models (LLMs) based on Amazon Bedrock into the Office experience. By harnessing the latest advancements in generative AI, we empower employees to unlock new levels of efficiency and creativity within the tools they already use every day.

Unleash AI innovation with Amazon SageMaker HyperPod

In this post, we show how SageMaker HyperPod, and its new features introduced at AWS re:Invent 2024, is designed to meet the demands of modern AI workloads, offering a persistent and optimized cluster tailored for distributed training and accelerated inference at cloud scale and attractive price-performance.

Reduce conversational AI response time through inference at the edge with AWS Local Zones

This guide demonstrates how to deploy an open source foundation model from Hugging Face on Amazon EC2 instances across three locations: a commercial AWS Region and two AWS Local Zones. Through comparative benchmarking tests, we illustrate how deploying foundation models in Local Zones closer to end users can significantly reduce latency—a critical factor for real-time applications such as conversational AI assistants.

Optimizing AI implementation costs with Automat-it

In this guest post, we explain how AWS Partner Automat-it helped their customer achieve a more than twelvefold cost savings while keeping AI model performance within the required performance thresholds. This was accomplished through careful tuning of architecture, algorithm selection, and infrastructure management.