AWS Database Blog

Multi-tenant vector search with Amazon Aurora PostgreSQL and Amazon Bedrock Knowledge Bases

This post is a continuation of our series on building multi-tenant vector stores with Amazon Aurora PostgreSQL-Compatible Edition. In Part 1, we explored a self-managed approach to building multi-tenant vector search. The self-managed approach uses direct SQL queries and the RDS Data API to ingest and retrieve data from the vector store, and enforces multi-tenant data isolation using built-in row-level security policies.

In this post, we discuss a fully managed approach that uses Amazon Bedrock Knowledge Bases to simplify the integration of an Aurora-based vector store with your generative AI application. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.

Solution overview

Consider a multi-tenant use case where users raise home survey requests for the property they are planning to buy. Home surveyors conduct a survey of the property and update their findings. The home survey report with the updated findings is stored in an Amazon Simple Storage Service (Amazon S3) bucket. The home survey company is now planning to provide a feature to allow their users to ask natural language questions about the property. Embedding models are used to convert the home survey document into vector embeddings. The vector embeddings of the document and the original document data are then ingested into a vector store. Finally, the Retrieval Augmented Generation (RAG) approach enhances the prompt to the large language model (LLM) with contextual data to generate a response back to the user.

Amazon Bedrock Knowledge Bases is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows.
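
As an illustration of how little integration code this managed workflow requires, the following is a minimal sketch of the RetrieveAndGenerate API, which performs retrieval, prompt augmentation, and response generation in a single call. The client setup, knowledge base ID, and model ARN are placeholder assumptions; later in this post we use the lower-level Retrieve API instead, so that we can control prompt construction and apply tenant filters ourselves.

import boto3

# Placeholder identifiers -- substitute your own knowledge base ID and model ARN
kb_id = "YOUR_KB_ID"
model_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# A single call embeds the question, searches the vector store,
# augments the prompt, and generates the final answer
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is the condition of the roof in my survey report?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
        },
    },
)
print(response["output"]["text"])
Python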

Part 1 explored the self-managed approach, in which you handle the conversion of documents into vector embeddings yourself and write your own SQL queries to insert data into the table. An alternative is the fully managed approach, which uses Amazon Bedrock Knowledge Bases to offload those complexities with a low-code experience. We demonstrate how to use this feature to manage the creation and persistence of vector embeddings in an Aurora PostgreSQL vector store. An example implementation using Amazon Bedrock Knowledge Bases and an Amazon Aurora PostgreSQL-compatible vector store is shown in the following diagram.

High-level architecture diagram showing Amazon Bedrock Knowledge Bases with Amazon Aurora as part of a RAG architecture

The high-level steps in this architecture are:

  1. Data is ingested from an S3 bucket through Amazon Bedrock Knowledge Bases.
  2. Amazon Bedrock Knowledge Bases calls an embeddings model in Amazon Bedrock to convert the documents to vector embeddings.
  3. The vector embeddings along with the data chunks and metadata are stored in Aurora with pgvector.
  4. A user asks a natural language query.
  5. The embeddings model configured in Amazon Bedrock Knowledge Bases converts the query to embeddings. This is the same embeddings model used for data ingestion.
  6. Amazon Bedrock Knowledge Bases runs a query against the vector store to retrieve similar documents.
  7. The matching document chunks are used to augment the prompt, which is sent to an LLM in Amazon Bedrock to generate a response.
  8. The final response is returned to the user.

In the following sections, we walk through the steps to create a vector store; ingest, retrieve, and augment the vector data; and enforce multi-tenant data isolation.

Prerequisites

To follow along with the steps in this post, you need the following resources:

Additionally, clone the AWS samples repository for data-for-saas-patterns and move to the folder samples/multi-tenant-vector-database/amazon-aurora/aws-managed:

git clone https://github.com/aws-samples/data-for-saas-patterns.git
cd samples/multi-tenant-vector-database/amazon-aurora/aws-managed
Bash

Create a vector store with Amazon Aurora PostgreSQL-compatible

Start by configuring the Aurora PostgreSQL database to enable the pgvector extension and build the required schema for the vector store. These steps vary slightly between the self-managed and fully managed approaches, so we use a different schema and table name for each approach. Run all the SQL commands from the 1_build_vector_db_on_aurora.sql script, using psql, the Amazon RDS console query editor, or any other PostgreSQL query editor tool, to build the vector store configuration.

  1. Create and verify the pgvector extension:
CREATE EXTENSION IF NOT EXISTS vector;

SELECT extversion FROM pg_extension WHERE extname='vector';
SQL
  2. Create a schema and vector table:
CREATE SCHEMA aws_managed;

CREATE TABLE aws_managed.kb (id uuid PRIMARY KEY, embedding vector(1024), chunks text, metadata jsonb, tenantid varchar(10));
SQL
  3. Create the index:
CREATE INDEX ON aws_managed.kb USING hnsw (embedding vector_cosine_ops);
SQL
  4. Create a user and grant permissions:
CREATE ROLE bedrock_user LOGIN;
\password bedrock_user
GRANT ALL ON SCHEMA aws_managed TO bedrock_user;
GRANT ALL ON TABLE aws_managed.kb TO bedrock_user;
SQL

After you run the commands, the schema should contain the vector table and index:

\d aws_managed.kb;
                       Table "aws_managed.kb"
  Column   |         Type          | Collation | Nullable | Default 
-----------+-----------------------+-----------+----------+---------
 id        | uuid                  |           | not null | 
 embedding | vector(1024)          |           |          | 
 chunks    | text                  |           |          | 
 metadata  | jsonb                 |           |          | 
 tenantid  | character varying(10) |           |          | 
Indexes:
    "kb_pkey" PRIMARY KEY, btree (id)
    "kb_embedding_idx" hnsw (embedding vector_cosine_ops)
SQL

We use the following fields for the vector table:

  • id – The UUID field will be the primary key for the vector store.
  • embedding – The vector field that will be used to store the vector embeddings. The argument 1024 denotes the dimensions or the size of the vector used by the embeddings model. The Amazon Titan Embeddings V2 model supports flexible embeddings dimensions (1024, 512, 256).
  • chunks – A text field to store the raw text from your source data in chunks.
  • metadata – The JSON metadata field (stored using the jsonb data type), which is used to store source attribution, particularly when using the managed ingestion through Amazon Bedrock Knowledge Bases.
  • tenantid – This field is used to identify and tie the data and chunks to specific tenants in a software as a service (SaaS) multi-tenant pooled environment. We also use this field as the key for filtering the data during retrieval.

Ingest the vector data

To ingest the data, you need to create the knowledge base and configure the data source, embedding model, and underlying vector store. You can create the knowledge base on the Amazon Bedrock console or through code. To create it through code, review the bedrock_knowledgebase_managed_rag.ipynb notebook, which provides the required IAM policies and role along with step-by-step instructions.
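
For reference, the following is a minimal sketch of what creating the knowledge base through code might look like with the boto3 bedrock-agent client, mapping the aws_managed.kb columns created earlier to the knowledge base field mapping. The ARNs, database name, and role are placeholder assumptions for illustration; the notebook contains the complete, tested version.

import boto3

bedrock_agent_client = boto3.client("bedrock-agent")

# Placeholder ARNs -- replace with your cluster, secret, and service role
create_kb_response = bedrock_agent_client.create_knowledge_base(
    name="multi-tenant-survey-kb",
    roleArn="arn:aws:iam::111122223333:role/BedrockKnowledgeBaseRole",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            # Titan Text Embeddings V2, matching the vector(1024) column
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        },
    },
    storageConfiguration={
        "type": "RDS",
        "rdsConfiguration": {
            "resourceArn": "arn:aws:rds:us-east-1:111122223333:cluster:aurora-vector-cluster",
            "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:bedrock-user-secret",
            "databaseName": "postgres",
            "tableName": "aws_managed.kb",
            # Map knowledge base fields to the columns created earlier
            "fieldMapping": {
                "primaryKeyField": "id",
                "vectorField": "embedding",
                "textField": "chunks",
                "metadataField": "metadata",
            },
        },
    },
)
kb_id = create_kb_response["knowledgeBase"]["knowledgeBaseId"]
Python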

After you configure the knowledge base with the data source and the vector store, you can start uploading the documents to an S3 bucket and ingest them into the vector store. Amazon Bedrock Knowledge Bases simplifies the data ingestion into the Aurora PostgreSQL-compatible vector store with a single API call:

upload_file_to_s3(
    "../multi_tenant_survey_reports/Home_Survey_Tenant1.pdf", bucket_name, object_name="multi_tenant_survey_reports/Home_Survey_Tenant1.pdf"
)

start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id, dataSourceId=ds_id
)
wait_for_ingestion(start_job_response)
Python

Wait for the ingestion job to complete, then query the Aurora PostgreSQL-compatible database to verify that the vector embeddings have been ingested and stored in the vector table. The number of rows created for the single ingested document depends on how many chunks the document was split into by the configured fixed-size chunking strategy. How you split documents into manageable chunks can affect the efficiency and quality of data retrieval. To learn more about different chunking strategies, see How content chunking works for knowledge bases. You can verify that the data has been stored in the vector store, and how many chunks were created, by running the following SQL command:

SELECT count(*) FROM aws_managed.kb ;
 count 
-------
     2
(1 row)
SQL
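
The number of chunks depends on the chunking configuration of the knowledge base data source. As a hedged illustration of where that is set, the following sketch shows a data source created with the boto3 bedrock-agent client using a fixed-size chunking strategy; the data source name, bucket ARN, maxTokens, and overlapPercentage values are placeholder assumptions, not recommendations.

# Sketch of configuring the chunking strategy when creating the data source
create_ds_response = bedrock_agent_client.create_data_source(
    knowledgeBaseId=kb_id,
    name="survey-reports-s3",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-survey-reports-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,         # tokens per chunk
                "overlapPercentage": 20,  # overlap between consecutive chunks
            },
        }
    },
)
ds_id = create_ds_response["dataSource"]["dataSourceId"]
Python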

Prepare for prompt augmentation with vector similarity search

We perform a vector similarity search in RAG to find the appropriate source data (in this case, text chunks) to enhance the prompt with domain-specific data before sending the question to the LLM. The natural language question from the end user is first converted into vector embeddings, and then the vector data that most closely matches the input embedding is retrieved from the database.

The Amazon Bedrock Knowledge Bases APIs abstract the retrieval mechanism for the configured vector store and relieve you of writing complex SQL-based search queries. The following code shows an example of the Amazon Bedrock Knowledge Bases Retrieve API:

def retrieve(query, kbId, numberOfResults=5):
    response = bedrock_agent_runtime.retrieve(
        retrievalQuery={"text": query},
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": numberOfResults}
        },
    )
    return response
Python

You can use the Retrieve API to retrieve the vector data chunks from the Tenant1 document based on any given natural language question from the end-user:

question = "What is the condition of the roof in my survey report ? "
response = retrieve(question, kb_id)
print(response)
Python
#Sample Output (trimmed) 
'retrievalResults': [{'content': {'text': 'Home Survey      Property Information    .....(trimmed)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant1.pdf'}, 'type': 'S3'}, 
'score': 0.6019703224377453}, {'content': {'text': 'Walls and Ceilings    Condition: ...... (trimmed)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant1.pdf'}, 'type': 'S3'}, 
'score': 0.48701057728935115}]}
Output

Build an augmented prompt

The next step is building the augmented prompt that will be sent to a foundation model (FM). In Amazon Bedrock, you can choose which FM to use. This example uses Anthropic’s Claude on Amazon Bedrock to generate an answer to the user’s question along with the augmented context. The data chunks retrieved from the vector store are used to enhance the prompt with contextual, domain-specific data before sending it to the FM. The following code is an example of how to invoke Anthropic’s Claude on Amazon Bedrock using the InvokeModel API:

# Function to invoke the FM
def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }  
    )  
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
    return response_body

def invoke_llm_with_rag(messages):
    model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
    response = generate_message(bedrock_runtime, model_id, "", messages, 300)
    return response
Python
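
The prompt construction in the next code block also relies on a get_contexts helper from the sample notebook that is not shown in this post. A minimal sketch of what such a helper might look like (an assumption; the notebook’s implementation may differ) simply concatenates the text of the retrieved chunks:

# Sketch of a helper that flattens the Retrieve API results into prompt context
def get_contexts(retrieval_results):
    # Each result carries the chunk text under content.text
    return "\n\n".join(result["content"]["text"] for result in retrieval_results)
Python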

When using Amazon Bedrock Knowledge Bases, you invoke the Retrieve API, passing the user question and knowledge base ID. Amazon Bedrock Knowledge Bases generates the vector embeddings for the question and queries the vector store to retrieve the most closely related chunks. The following code is an example of how to frame the prompt and augment it with the domain data retrieved from the vector store:

question = "What is condition of the roof in my survey report?"
response = retrieve(question, kb_id)
contexts = get_contexts(response['retrievalResults'])

prompt = f"""
Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{contexts}
</context>
Question: {question}
Assistant:
"""

messages=[{ "role":'user', "content":[{'type':'text','text': prompt.format(contexts, question)}]}]
llm_response = invoke_llm_with_rag(messages)
print(llm_response['content'][0]['text'])
Python
#Sample output
Based on the home survey report, the roof is in good condition. The notes state: "A few missing shingles on the southwest corner."
Output

Enforce multi-tenant data isolation

To explore multi-tenancy and data isolation, you can onboard a few more tenants by uploading their documents into the knowledge base data source. You need to ingest these new documents and wait for the ingestion to complete. See the following code:

upload_file_to_s3("../multi_tenant_survey_reports/Home_Survey_Tenant2.pdf", bucket_name, object_name="multi_tenant_survey_reports/Home_Survey_Tenant2.pdf")
...
...
upload_file_to_s3("../multi_tenant_survey_reports/Home_Survey_Tenant5.pdf", bucket_name, object_name="multi_tenant_survey_reports/Home_Survey_Tenant5.pdf")


start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id, dataSourceId=ds_id
)
wait_for_ingestion(start_job_response)
Python

After all the new documents are ingested, retrieve the vector data using the same natural language question as in the previous example. This returns data chunks from all the documents, because the data for all tenants is pooled in a single table. You can observe from the sample output of the Retrieve API that the results contain chunks from multiple tenants’ documents:

question = "What is the condition of the roof in my survey report ? "
response = retrieve(question, kb_id)
print(response)
Python
#Sample Output (trimmed):
'retrievalResults': [{'content': {'text': 'Home Survey      Property Information.........(masked)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant1.pdf'}, 'type': 'S3'}, 
'score': 0.6003477504255206}, {'content': {'text': 'Walls and Ceilings......(trimmed)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant5.pdf'}, 'type': 'S3'}, 
'score': 0.5328443502932442}, {'content': {'text': 'Flooring    Types: .....(trimmed)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant4.pdf'}, 'type': 'S3'}, 
'score': 0.4506853735594597}]
Output

Tenant data isolation is critical in a multi-tenant SaaS deployment, and you need a way to enforce isolation such that the Retrieve API is tenant-aware and retrieves only tenant-scoped data from the vector store. Amazon Bedrock Knowledge Bases supports metadata filtering, which you can use to implement tenant data isolation in the vector store. To enable the filters, first tag each document in the data source with its respective tenant metadata by adding a corresponding .metadata.json file containing the tenantid attribute, as in the following example:

{
    "metadataAttributes": {
        "tenantid": "Tenant1"
    }
}
JSON

When tagging is complete for all the documents, upload the .metadata.json files to the S3 bucket and ingest them into the knowledge base:

# Upload the metadata tags for each tenant's document

upload_file_to_s3(
    "../metadata_tags/Home_Survey_Tenant1.pdf.metadata.json",
    bucket_name,
    "multi_tenant_survey_reports/Home_Survey_Tenant1.pdf.metadata.json",
)
...
...
...
upload_file_to_s3(
    "../metadata_tags/Home_Survey_Tenant5.pdf.metadata.json",
    bucket_name,
    "multi_tenant_survey_reports/Home_Survey_Tenant5.pdf.metadata.json",
)

# Ingest the metadata JSON files into the knowledge base
start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb_id, dataSourceId=ds_id
)
Python

Next, update the retrieve function to add filtering based on the tenantid tag. During retrieval, a filter configuration uses the tenantid to make sure that only tenant-specific data chunks are retrieved from the underlying vector store of the knowledge base. The following code is the updated retrieve function with metadata filtering enabled:

# Function to retrieve chunks from vector store through KB
def retrieve_with_filters(query, kbId, tenantId, numberOfResults=5):
    tenant_filter = {"equals": {"key": "tenantid", "value": tenantId}}
    response = bedrock_agent_runtime.retrieve(
        retrievalQuery={"text": query},
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": numberOfResults,
                "filter": tenant_filter,
            }
        },
    )
    return response
Python

Finally, you can ask the same question and retrieve the tenant-specific document chunks using the knowledge base metadata filtering feature. The output contains only document chunks belonging to the tenantid passed as the filter value:

question = "What is the condition of the roof in my survey report  ? "
response = retrieve_with_filters(question, kb_id, "Tenant3")
Python
Sample Output (trimmed): 
'retrievalResults': [{'content': {'text': 'Home Survey      Property Information      .....(trimmed)'}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant3.pdf'}, 'type': 'S3'}, 
'metadata': {'tenantid': 'Tenant3'}, 'score': 0.6927450177600427}, {'content': {'text': 'Walls and Ceilings      Condition: ......(trimmed)}, 
'location': {'s3Location': {'uri': 's3://<bucket-name>/multi_tenant_survey_reports/Home_Survey_Tenant3.pdf'}, 'type': 'S3'}, 
'metadata': {'tenantid': 'Tenant3'}, 'score': 0.5551651590990989}]}
Output

Best practices for multi-tenant vector store deployments

There are several factors like performance, indexing strategies, and semantic search capabilities that need to be considered when choosing a vector store for generative AI applications. For more information, refer to Key considerations when choosing a database for your generative AI applications.

When deploying a multi-tenant vector store solution in production, consider the following best practices for scaling and performance:

  • Optimize the chunk size and chunking strategy for your specific use case. Smaller chunks are suitable for smaller documents or cases where some loss of context is acceptable, such as simple Q&A use cases. Larger chunks preserve more context, which helps when answers drawn from larger documents need to be specific, but they can exceed the model’s context length and increase cost. For more details, see A practitioner’s guide to data for Generative AI.
  • Test and validate an appropriate embedding model for your use case. The embedding model’s characteristics (including its dimensions) impact both query performance and search quality. Different embedding models have different recall rates, with some smaller models potentially performing better than larger ones.
  • For highly selective queries that filter out most of the results, consider using a B-tree index (such as on the tenantid attribute) to guide the query planner; a sketch of creating such an index follows this list. For low-selectivity queries, consider an approximate index such as HNSW. For more details on configuring HNSW indexes, see Best practices for querying vector data for gen AI apps in PostgreSQL.
  • Use Amazon Aurora Optimized Reads to improve query performance when your vector workload exceeds available memory. For more details, see Improve the performance of generative AI workloads on Amazon Aurora with Optimized Reads and pgvector.
  • Monitor and optimize query performance using PostgreSQL query plans. For more details, see Monitor query plans for Amazon Aurora PostgreSQL.
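
As referenced in the indexing recommendation above, the following is a minimal sketch of adding a B-tree index on the tenantid column using the RDS Data API (the interface used in Part 1). The cluster ARN, secret ARN, and database name are placeholder assumptions; you could equally run the same CREATE INDEX statement through psql.

import boto3

rds_data = boto3.client("rds-data")

# Placeholder identifiers -- substitute your own cluster ARN, secret ARN, and database name
rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:111122223333:cluster:aurora-vector-cluster",
    secretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:bedrock-user-secret",
    database="postgres",
    sql="CREATE INDEX IF NOT EXISTS kb_tenantid_idx ON aws_managed.kb (tenantid);",
)
Python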

In general, fully managed features help remove undifferentiated work, such as pipeline management, so you can focus on building the features that will delight your customers.

Clean up

To avoid incurring future charges, delete all the resources created in the prerequisites section and the knowledge bases you created.

Conclusion

In this post, we showed you how to work with an Aurora vector store using Amazon Bedrock Knowledge Bases and how to enforce tenant isolation with metadata filtering. In Part 1 of this series, we showed how you can do this with a self-managed ingestion pipeline. Tenant data isolation is important when operating a pooled data model of the vector store for your multi-tenant SaaS generative AI application. You now have the choice to adopt either approach depending on your specific requirements.

We invite you to try the self-managed and fully managed approaches to build a multi-tenant vector store and leave your feedback in the comments.


About the Authors

Josh Hart is a Principal Solutions Architect at AWS. He works with ISV customers in the UK to help them build and modernize their SaaS applications on AWS.

Nihilson Gnanadason is a Senior Solutions Architect at AWS. He works with ISVs in the UK to build, run, and scale their software products on AWS.