Simplify your RAG implementation using foundation models

Learn how to deploy production-ready vector storage capabilities using tried-and-tested components that are already part of most organizations' tech stacks

New capabilities without added complexity

In this article, we'll look at machine learning (ML) as a new source of system complexity and developer cognitive load, and at how those challenges can be solved with familiar, reliable components that most enterprise IT organizations likely already have in place.

We will focus on the core components of a very common ML use case: large language models (LLMs) enriched using the Retrieval-Augmented Generation (RAG) technique. Then we'll see how the infrastructure requirements behind these new use cases can be met without introducing new tools, more complexity, or additional cognitive load for developers, while still satisfying the expectations of a production-ready system running fully in the cloud.

Simplify your RAG implementation on AWS

ML and growing infrastructure complexity

Complexity has been a consistent and mounting concern for most IT organizations: complex systems are harder to operate, more costly to maintain, and more prone to failure.

Nevertheless, as new patterns for designing systems are adopted, prioritizing development velocity and leading-edge capabilities over simplicity has usually been the norm.

The cloud has helped by eliminating a good percentage of the undifferentiated heavy lifting involved in managing the many pieces of infrastructure that most modern software systems depend on. But it hasn't solved the cognitive load on developers and operations teams, who now have to work with a growing number of tools and learn a seemingly endless array of technologies.

ML is now adding yet another layer of complexity to the environments that developers and infrastructure engineers have to support, introducing the need for specialized computing capabilities, as well as a new set of requirements related to data processing and storage.

We will drill down into some of the challenges related to tooling and infrastructure for a specific type of data: vectors.

What’s the deal with vector storage?

Vectors are numerical representations of other types of data. Generating, storing, and querying them is central to applications built around the technique known as RAG.

RAG has arguably become the fastest and simplest way to provide custom responses based on proprietary data, without directly manipulating an LLM through fine-tuning, training, or other mechanisms that are dramatically more expensive and time-consuming.

Vectors are the data type used to store custom knowledge that can be queried and incorporated into the responses generated by LLMs, making those responses more accurate thanks to the additional context.

Representing data as vectors requires at least the following two components, the first of which is sketched right after this list:

1. An embedding model
2. A database capable of storing, indexing, and querying vector data
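
To make the first component concrete, here is a minimal sketch of generating an embedding with one of the embedding models available in Amazon Bedrock. I'm assuming the Titan text embedding model, the boto3 SDK, and the us-east-1 Region; any embedding model would work, and the second component, the vector database, is covered further down.

```python
import json

import boto3

# Bedrock runtime client; adjust the Region to wherever Bedrock is enabled for you.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Ask the embedding model to turn a piece of text into a vector.
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",  # assumed embedding model
    body=json.dumps({"inputText": "Redis can store and search vector data."}),
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # Titan Text Embeddings v1 returns a 1,536-dimensional vector
```

That list of floats is all a vector is; the second component's job is to store large numbers of them and quickly find the ones closest to a query vector.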

That means most teams today are having to figure out how to add these new moving pieces to their infrastructure while satisfying the stringent demands of production-grade systems, which in turn means more surface area to support, more services to maintain, and more tooling to learn.

My personal journey with vectors and LLMs

As I started to play around with RAG and LLMs, as most of my peers were, the need for vector storage became evident, which sent me down the rabbit hole of finding the right new specialized database to add to my stack. The database needed to work effortlessly with the groundbreaking LLM concept I was building (spoiler: it wasn't as groundbreaking as I first thought).

I soon realized that with frameworks like LlamaIndex or LangChain, integrating with pretty much any vector database was trivial; usually only a handful of lines of code needed to change for me to connect to yet another of the vector databases I was trying out.
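
As an illustration of how little code that swap involves, here is a rough sketch using LangChain. The import paths and class names are assumptions based on the langchain_community package and tend to move around between versions, so treat it as a sketch rather than copy-paste code.

```python
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS, Redis

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
docs = [
    "Order 1234 shipped on Monday.",
    "Returns are accepted within 30 days of delivery.",
]

# Prototype against an in-memory store...
store = FAISS.from_texts(docs, embeddings)

# ...then point the same application at Redis by changing a couple of lines.
store = Redis.from_texts(docs, embeddings, redis_url="redis://localhost:6379")

print(store.similarity_search("What is the return policy?", k=1))
```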

And as soon as I realized that, it also dawned on me that I was spending an inordinate amount of time figuring out how to monitor, operate, query, and in general interact with every new database I was trying out, compared to the time it actually took me to write the code to get my prototype off the ground.

That's when I concluded that maybe I didn't have to look beyond my existing stack to solve the problem. And that's when I learned that Redis was actually able to work with vector data!

Old dog, new tricks

I have been a Redis user for more years than I can remember, which means I’m very familiar with its query syntax. I have a solid development environment already set up. I even have some handy Visual Studio Code plugins in place to help me quickly look at collections and query data.

Redis Query Data

But I must admit, the role of Redis pretty much anywhere I've used it throughout the years has been as a key-value store, usually working as a cache, sometimes as a queue, and in general in places where data was ephemeral and query response time was the key attribute. And, needless to say, it has served that purpose remarkably well, not just for me but for the many other enterprise architects who make the same choice in their own architectures.

It came as a surprise to me that Redis, my trusty old friend, was able to handle vector data, and to do it with the same remarkable efficiency and reliability I knew from my other use cases. And I could interact with this wholly new type of data using the same tooling I was already accustomed to, without having to learn a new syntax, install new libraries, or do anything else. I was basically ready to go.
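
To show what I mean, here is a minimal sketch of the vector side of Redis using redis-py and the same search commands (FT.CREATE and FT.SEARCH) I already knew. The index name, key prefix, and field names are made up for the example, and the 1,536 dimension matches the Titan embedding model assumed earlier.

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Create an index with a text field and a 1,536-dimensional vector field.
r.ft("idx:docs").create_index(
    fields=[
        TextField("content"),
        VectorField(
            "embedding",
            "HNSW",
            {"TYPE": "FLOAT32", "DIM": 1536, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Stand-ins for real embeddings coming out of the embedding model.
doc_vector = np.random.rand(1536).astype(np.float32)
query_vector = np.random.rand(1536).astype(np.float32)

# Storing a document is just a hash write, with the vector as raw bytes.
r.hset("doc:1", mapping={
    "content": "Redis can store and search vector data.",
    "embedding": doc_vector.tobytes(),
})

# Querying is the familiar FT.SEARCH, now with a KNN clause.
query = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)
results = r.ft("idx:docs").search(query, query_params={"vec": query_vector.tobytes()})
print([doc.content for doc in results.docs])
```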

And then there was Amazon Bedrock

So Redis solved one of my problems: storing vectors without increasing the complexity of my tech stack. But what about generating embeddings and integrating my vectorized data as context for my LLM to use?

In my mind, that would also require new pieces of machinery: something to pull the data from Amazon Simple Storage Service (Amazon S3), where I had it stored; compute to run the embedding model; and all the glue to tie it together so my vectors would stay up to date as the data in Amazon S3 changed.

And this is where Amazon Bedrock, with its incredibly powerful agents and the concept of Knowledge Bases for Amazon Bedrock, became the solution to my complexity problem.

Knowledge for your LLM with a handful of clicks

Let’s dive into the architecture of the overall solution first:

Redis Cloud on AWS Reference Architecture

I’m still using LangChain in my containerized application running in Amazon Elastic Kubernetes Service (Amazon EKS), since I need to establish a WebSocket connection as well as handle the business logic and requests between the user and the ML model.

LangChain: Prompt input to generated output

The magic starts to happen when you look further to the right.

First off, I'm using Guardrails for Amazon Bedrock. Any production-ready LLM implementation must have a way to provide safety and security to users, which means everything from filtering harmful and undesirable topics to protecting personal and sensitive information. All these capabilities are available right off the shelf with Guardrails for Amazon Bedrock, and I was able to apply them to any foundation model I chose to use.
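
As a sketch of what that looks like from application code, this is roughly how a guardrail gets attached to a model call through the Bedrock Converse API with boto3. The guardrail identifier, version, and model ID are placeholders for resources created in the console.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # any FM available in Bedrock
    messages=[
        {"role": "user", "content": [{"text": "How do I reset my account password?"}]}
    ],
    guardrailConfig={
        "guardrailIdentifier": "gr-EXAMPLE123",  # placeholder guardrail ID
        "guardrailVersion": "1",                 # placeholder version
    },
)

print(response["output"]["message"]["content"][0]["text"])
```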

Amazon Bedrock provides instant access to many of the most relevant foundation models (FMs) available and allows incredibly quick yet powerful integrations using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock.

With these two capabilities I was able to fully automate the process of consuming data from Amazon S3, connecting it to my trusty Redis, and having my model of choice use that data as RAG context!

And I was able to do all this without adding a single additional component to my tech stack outside of Amazon Bedrock, which I already needed anyway to run the model itself.
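
For completeness, this is roughly what querying that setup looks like from my application code: a single call to the Bedrock agent runtime that retrieves context from the knowledge base (backed by Redis) and generates the answer. The knowledge base ID and model ARN are placeholders for the resources created in the console.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy for enterprise customers?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # placeholder knowledge base ID
            "modelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-sonnet-20240229-v1:0"
            ),
        },
    },
)

print(response["output"]["text"])
```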

Redis as vector storage

As I mentioned before, getting Redis up and running as vector storage was just as simple as using Redis for any other use case. It becomes even simpler with Redis Cloud, available in AWS Marketplace, since it sets up everything you need for Redis in your AWS account and VPC.

Getting started with Redis Cloud on AWS Marketplace: select cloud vendor.

After setting up your Redis Cloud account, go to Amazon Bedrock and configure a new knowledge base with Knowledge Bases for Amazon Bedrock. Select Redis Cloud in the vector database section, and then connect the knowledge base to your LLM using Agents for Amazon Bedrock.

Production-ready performance

The performance you've come to expect from Redis in more "traditional" use cases, usually one of the deciding factors in picking the tool, carries over to its vector search capabilities.

Of course, this holds for scenarios where the dataset fits in memory, which is a key characteristic that makes Redis really stand out.

The bottom line

What do you get with this setup? A fully managed set of infrastructure components for all the moving pieces of your FM-based RAG implementation, all while leveraging a tool that you are already familiar with, that is likely already present in your organization's stack, and that you are fully set up to develop with effectively.

Getting production-ready LLM deployments using the RAG technique could not be easier, and using Redis Cloud, which you can acquire in AWS Marketplace, makes the process even smoother.

Be on the lookout for our upcoming lab where I’ll show you, step by step, how to actually build this exact solution!

Why AWS Marketplace?

Try SaaS products free with your AWS account to establish your proof of concept, then pay as you go in production with AWS Billing.

Add capabilities to your tech stack using fast procurement and deployment, with flexible pricing and standardized licensing.

Consolidate and optimize costs for your cloud infrastructure and third-party software, all centrally managed with AWS.