AWS Machine Learning Blog

Category: Advanced (300)

Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock

In this post, we show you how to build an internal SaaS layer to access foundation models with Amazon Bedrock in a multi-tenant (team) architecture. We specifically focus on usage and cost tracking per tenant and also controls such as usage throttling per tenant. We describe how the solution and Amazon Bedrock consumption plans map to the general SaaS journey framework. The code for the solution and an AWS Cloud Development Kit (AWS CDK) template is available in the GitHub repository.

Automate mortgage document fraud detection using an ML model and business-defined rules with Amazon Fraud Detector: Part 3

In the first post of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In the second post, we discussed an approach to develop a deep learning-based computer vision model […]

Integrate QnABot on AWS with ServiceNow

Do your employees wait for hours on the telephone to open an IT ticket? Do they wait for an agent to triage an issue, which sometimes only requires restarting the computer? Providing excellent IT support is crucial for any organization, but legacy systems have relied heavily on human agents being available to intake reports and […]

Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpStart

One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). In the RAG pattern, we find pieces of reference content related to an input prompt by performing similarity searches on embeddings. Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with […]

Build a movie chatbot for TV/OTT platforms using Retrieval Augmented Generation in Amazon Bedrock

In this post, we show you how to securely create a movie chatbot by implementing RAG with your own data using Knowledge Bases for Amazon Bedrock. We use the IMDb and Box Office Mojo dataset to simulate a catalog for media and entertainment customers and showcase how you can build your own RAG solution in just a couple of steps.

Train and host a computer vision model for tampering detection on Amazon SageMaker: Part 2

In the first part of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In this post, we present an approach to develop a deep learning-based computer vision model to […]

Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock and Amazon SageMaker – Part 1

With the advent of generative AI, today’s foundation models (FMs), such as the large language models (LLMs) Claude 2 and Llama 2, can perform a range of generative tasks such as question answering, summarization, and content creation on text data. However, real-world data exists in multiple modalities, such as text, images, video, and audio. Take […]

Benchmark and optimize endpoint deployment in Amazon SageMaker JumpStart 

When deploying a large language model (LLM), machine learning (ML) practitioners typically care about two measurements for model serving performance: latency, defined by the time it takes to generate a single token, and throughput, defined by the number of tokens generated per second. Although a single request to the deployed endpoint would exhibit a throughput […]

Build enterprise-ready generative AI solutions with Cohere foundation models in Amazon Bedrock and Weaviate vector database on AWS Marketplace

This post discusses how enterprises can build accurate, transparent, and secure generative AI applications while keeping full control over proprietary data. The proposed solution is a RAG pipeline using an AI-native technology stack, whose components are designed from the ground up with AI at their core, rather than having AI capabilities added as an afterthought. We demonstrate how to build an end-to-end RAG application using Cohere’s language models through Amazon Bedrock and a Weaviate vector database on AWS Marketplace.

Reduce inference time for BERT models using neural architecture search and SageMaker Automated Model Tuning

In this post, we demonstrate how to use neural architecture search (NAS) based structural pruning to compress a fine-tuned BERT model to improve model performance and reduce inference times. Pre-trained language models (PLMs) are undergoing rapid commercial and enterprise adoption in the areas of productivity tools, customer service, search and recommendations, business process automation, and […]