AWS Database Blog
Accelerate generative AI use cases with Amazon Bedrock and Oracle Database@AWS
Oracle AI Database 26ai enables the development of generative AI applications such as voice assistants, chat assistants, language translators, recommendation systems, anomaly detection, and video search and recognition by providing the ability to store vector embeddings and query data based on semantics rather than keywords. By using AI Vector Search in Oracle AI Database 26ai, you can build Retrieval Augmented Generation (RAG) applications with relevant context without having to retrain the large language model (LLM). The context is stored, searched, and retrieved from Oracle AI Database 26ai and passed to the LLM to generate accurate, up-to-date, and targeted responses to your prompts. Customers can use RAG with AI Vector Search in Oracle AI Database 26ai and an LLM to securely respond to important business questions and generate content for many use cases using private, internal business information.
Oracle AI Database 26ai supports the VECTOR data type, providing the foundation to store vector embeddings alongside business data in the database. With embedding models, you can transform unstructured data into vector embeddings that can then be used for semantic queries on business data. You can generate vector embeddings outside the Oracle database using pre-trained embedding models hosted in Amazon Bedrock, open source embedding models, or your own embedding models. AWS provides various vector database options, such as Amazon OpenSearch Service, Amazon Aurora PostgreSQL-Compatible Edition, and Amazon DocumentDB, to choose from depending on your use case and requirements. In this post, we walk through the steps of integrating Oracle Database@AWS (ODB@AWS) with Amazon Bedrock by creating a RAG assistant application using an Amazon Titan embedding model in Amazon Bedrock and vectors stored in Oracle AI Database 26ai.
ODB@AWS enables you to access Oracle Exadata infrastructure managed by Oracle Cloud Infrastructure (OCI) within AWS data centers. You can use it to migrate your Oracle Exadata workloads to AWS while maintaining the same performance and features as your on-premises Exadata deployments. ODB@AWS supports both the Oracle Database 19c and 26ai versions in Exadata Database Service or Autonomous Database Service on Dedicated Infrastructure. It integrates with various AWS services, such as zero-ETL integration with Amazon Redshift, Amazon S3 for backups, Amazon CloudWatch for monitoring, and Amazon Bedrock, Amazon SageMaker AI, and other services for building generative AI applications.
Overview of Oracle AI Database 26ai vector capabilities
Oracle AI Database 26ai provides vector capabilities through its AI Vector Search feature, enabling AI workloads to run on ODB@AWS. Key capabilities include:
Native VECTOR data type and operations
Oracle AI Database 26ai introduces a new VECTOR data type, allowing for the native storage of high-dimensional vectors directly in table columns. You can store, query, and perform analytics operations on vector data within a single, unified database environment. It integrates AI Vector Search capabilities natively, enabling similarity searches, hybrid queries that combine vector and relational data, and advanced analytics without requiring external services.
This allows vectors to coexist with unstructured, semi-structured, and structured data, including documents, PDFs, and images stored in SecureFiles LOBs, JSON data for metadata and context, and graph and spatial data for advanced analytics and AI-driven recommendations. This reduces complexity and improves security while enabling hybrid queries that combine semantic search with business context.
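To make this concrete, the following is a minimal sketch of the VECTOR data type in use through python-oracledb; the table name, vector dimensions, and connection details are illustrative assumptions, not code from this post's application.

```python
import array
import os

import oracledb

# Connection details are illustrative; point these at your 26ai database.
conn = oracledb.connect(
    user=os.environ.get("ORACLE_USER", "demo"),
    password=os.environ["ORACLE_PASSWORD"],
    dsn=os.environ.get("ORACLE_DSN", "myhost:1521/mypdb"),
)
cur = conn.cursor()

# A VECTOR column lives alongside ordinary relational columns.
cur.execute("""
    CREATE TABLE IF NOT EXISTS product_docs (
        id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        title     VARCHAR2(200),
        embedding VECTOR(1024, FLOAT32))""")

# python-oracledb binds array.array('f', ...) to a FLOAT32 vector.
vec = array.array("f", [0.12, -0.03, 0.55] + [0.0] * 1021)
cur.execute("INSERT INTO product_docs (title, embedding) VALUES (:1, :2)",
            ["Exadata overview", vec])

# Similarity search: order rows by cosine distance to a query vector.
cur.execute("""
    SELECT title FROM product_docs
    ORDER BY VECTOR_DISTANCE(embedding, :qv, COSINE)
    FETCH FIRST 5 ROWS ONLY""", {"qv": vec})
print(cur.fetchall())
```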
Flexible vector generation
Oracle provides flexibility to integrate with external embedding models and RAG frameworks such as LangChain and LlamaIndex through Python or REST APIs. It also enables importing pre-trained embedding models directly into the database in ONNX format, allowing vector embeddings to be generated within the database environment. However, in-database generation adds performance overhead and requires ongoing updates and maintenance of the models.
AI Smart Scan (Exadata-optimized)
Because the Oracle Database@AWS offering runs on Exadata, it uses AI Smart Scan to query vector data, which offloads vector operations to Exadata storage servers. This optimizes bandwidth, reduces CPU usage on the database tier, and is ideal for high-scale artificial intelligence and machine learning (AI/ML) applications such as real-time semantic search and recommendation engines.
Dedicated memory allocation
Oracle allocates a specialized memory area called vector_pool to store Hierarchical Navigable Small World (HNSW) vector indexes and associated metadata, separate from the database buffer cache. It is also used to speed up Inverted File Flat (IVF) index creation as well as DML operations on base tables with IVF indexes. This feature enables predictable performance for AI-driven workloads without impacting transactional operations.
Simplified development and data management
Developers can work with both relational and vector data within a single environment using standard SQL, reducing the learning curve and streamlining application development. It allows vector operations using PL/SQL packages. By integrating vector capabilities into a general-purpose database, Oracle AI Database 26ai uses existing enterprise-grade features like security, high availability (such as RAC), partitioning, sharding, and disaster recovery.
External table support for vector data
External table support for vectors in Oracle AI Database 26ai makes it possible to store and query embeddings directly from external files for better cost and flexibility. It’s useful for quickly exploring or staging vector data, running similarity searches or hybrid queries that mix relational and vector information, and performing proof-of-concept analysis on large external datasets before committing to full ingestion. You can create external tables over vector data in Amazon S3 and run semantic searches against them.
Solution overview
In this post, we demonstrate how to build a RAG AI assistant application using Oracle AI Database 26ai as the vector data store, the LangChain framework, an Amazon Titan embedding model, and an Anthropic Claude LLM hosted in Amazon Bedrock.
In the following diagram, we use Oracle AI Database 26ai as the vector store deployed in ODB@AWS. The AI chat assistant application is deployed on an Amazon Elastic Compute Cloud (Amazon EC2) instance in a virtual private cloud (VPC) peered with an ODB network using ODB peering. An ODB network is a private, isolated network that hosts Oracle infrastructure within an AWS Availability Zone. Unlike a standard VPC, an ODB network lacks internet connectivity and supports only ODB@AWS resources. ODB peering establishes private network connectivity between your ODB network and an Amazon VPC, enabling applications to communicate with Oracle databases as if they were on the same network. ODB peering bridges the AWS and Oracle environments and supports routing that enables traffic from specific AWS services connected to your peered VPC to reach the ODB network. The Amazon Titan embedding model from Amazon Bedrock is used to create the vector embeddings, and Anthropic’s Claude LLM is invoked from Amazon Bedrock to generate responses for semantic queries. Access to Amazon Bedrock is granted to the EC2 instance through its associated AWS Identity and Access Management (IAM) role.

An end-to-end RAG workflow for our demo consists of the following high-level steps:
- Data Ingestion (PDF in this demo): Ingest PDFs into the app, which reads them and extracts their text content.
- Text Chunking: Extracted text is divided into smaller chunks that can be processed effectively. Chunking the text is important for retrieval quality and rate-limit safety.
- Generate Embeddings: The application uses the Amazon Titan Text Embeddings v2 model from Amazon Bedrock to generate embeddings, which are vector representations of the text chunks.
- Vector store: Store vectors and metadata in Oracle AI Database 26ai with Oracle Database@AWS (Oracle AI Vector Search).
- User Question: A user types a question in natural language in the chatbot.
- Similarity Matching: When the user asks a question, the app compares it with the text chunks and identifies the most semantically similar ones.
- RAG: Run retrieval augmented Q&A with Anthropic’s Claude 3 Sonnet model on Amazon Bedrock through an application built on the Streamlit UI.
- Response: The LLM generates a response based on the relevant content of the PDFs.
For the complete code, refer to the GitHub repo.
Prerequisites
To implement this solution, complete the following prerequisites and create the required resources:
- An AWS account with Amazon Bedrock access in your AWS Region. Amazon Bedrock requires you to request access to its foundation models (FMs) before you can start invoking the model using Amazon Bedrock APIs. You must configure model access in Amazon Bedrock in order to build and run generative AI applications. Amazon Bedrock provides a variety of FMs from several providers, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon.
- An Oracle AI Database 26ai installed on Oracle Database@AWS, which is used as the vector store. In this demo architecture, 26ai is deployed on the Exadata Database Service in ODB@AWS.
- An EC2 instance deployed in a VPC that is ODB peered with an ODB network that hosts the Oracle Database@AWS service. Make sure the EC2 instance used as the client here has network connectivity to Oracle Database in ODB@AWS.
- An IAM role attached to the EC2 instance with policies granting access to Amazon Bedrock models and SageMaker AI services. See the following example policy. Refer to GitHub for the full policy JSON document.
Application Build
Follow these steps to build an end-to-end RAG workflow, including an AI chat assistant built on Streamlit, using Oracle AI Database 26ai as the vector store and integrating with Amazon Bedrock for the embedding and LLM models.
You can also repurpose the full code available on GitHub. Jump to the Demonstration section for instructions on how to deploy the code from GitHub and run it.
- Import libraries:
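The original import block isn’t reproduced here; the following is a representative set, assuming the PyPDF2, LangChain, Streamlit, and python-oracledb stack used in the rest of this walkthrough. The later sketches in this section assume these imports.

```python
import logging
import os
import time
from concurrent.futures import ThreadPoolExecutor

import boto3
import oracledb
import streamlit as st
from PyPDF2 import PdfReader
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_community.vectorstores import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
```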
- Ingest documents:
The document in our use case is in PDF format. We load a PDF document and transform each page of the PDF document to text. Note: For simplicity, in our example we load the PDF documents into the app from the local desktop, but the data source (PDF document) can reside inside the Oracle database (PDF as a BLOB) or external to the database, such as in an Amazon S3 bucket.
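A minimal sketch of the ingestion step, assuming PyPDF2 and files coming from Streamlit’s uploader; the function name is illustrative.

```python
def extract_pdf_text(uploaded_files):
    """Read each uploaded PDF and concatenate the text of its pages."""
    text = ""
    for pdf in uploaded_files:
        reader = PdfReader(pdf)
        for page in reader.pages:
            # extract_text() can return None for image-only pages.
            text += page.extract_text() or ""
    return text
```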
- Chunk the text:
Given the extracted text, this function splits it into smaller chunks using LangChain’s RecursiveCharacterTextSplitter module. The chunk size, overlap, and other parameters are configured to optimize processing efficiency.
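A sketch of the chunking step; the chunk size and overlap values below are assumptions to tune for your embedding model and rate limits.

```python
def chunk_text(text):
    """Split the extracted text into overlapping chunks for embedding."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,    # characters per chunk (assumed value)
        chunk_overlap=200,  # overlap preserves context across chunk boundaries
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    return splitter.split_text(text)
```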
The create_documents function takes a list of text chunks and transforms them into Document objects with metadata. This function creates a new Document object for each text chunk, assigning an ID and a page link as metadata, and uses a list comprehension to process all chunks efficiently.

The process_chunk_with_delay function handles the processing of a single document chunk while implementing rate limiting through a time delay. It first introduces a pause using time.sleep to respect rate limits, then uses the embedder to convert the document’s content into a numerical embedding vector.

The batch_process_embeddings function manages the parallel processing of multiple documents to generate their embeddings while maintaining rate limits. It uses a ThreadPoolExecutor to process multiple documents concurrently (up to MAX_CONCURRENT_REQUESTS) and maps each document through the process_chunk_with_delay function, finally returning a list of generated embeddings.
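A sketch of these three helpers, assuming the imports shown earlier; the delay and concurrency values are illustrative.

```python
REQUEST_DELAY = 0.2          # seconds between Bedrock calls (assumed)
MAX_CONCURRENT_REQUESTS = 5  # parallel embedding workers (assumed)

def create_documents(chunks):
    """Wrap each text chunk in a Document with an ID and page link as metadata."""
    return [Document(page_content=chunk, metadata={"id": i, "link": f"page-{i}"})
            for i, chunk in enumerate(chunks)]

def process_chunk_with_delay(embedder, doc):
    """Embed one chunk, pausing first to respect Bedrock rate limits."""
    time.sleep(REQUEST_DELAY)
    return embedder.embed_query(doc.page_content)

def batch_process_embeddings(embedder, docs):
    """Embed documents in parallel while staying within rate limits."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
        return list(pool.map(lambda d: process_chunk_with_delay(embedder, d),
                             docs))
```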
- Store vector embeddings:
Next, we load the vector embeddings generated with the amazon.titan-embed-text-v2 model into Oracle Database@AWS as the vector database. This function takes the text chunks as input and creates a vector store using Amazon Titan Embeddings. You can configure the connection from environment variables as shown in the following code; change the values according to your database setup.
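A sketch of the connection handling and embedder initialization described next; the environment variable names are illustrative assumptions.

```python
def get_oracle_connection():
    """Create a database connection if one doesn't exist in the session state."""
    if "oracle_conn" not in st.session_state:
        st.session_state.oracle_conn = oracledb.connect(
            user=os.environ["ORACLE_USER"],          # variable names are assumed
            password=os.environ["ORACLE_PASSWORD"],
            dsn=os.environ["ORACLE_DSN"],
        )
    return st.session_state.oracle_conn

def check_connection():
    """Verify the cached connection is healthy; recreate it if necessary."""
    try:
        st.session_state.oracle_conn.ping()
    except Exception:
        st.session_state.pop("oracle_conn", None)
    return get_oracle_connection()

def safe_close_connection():
    """Close the database connection when it's no longer needed."""
    conn = st.session_state.pop("oracle_conn", None)
    if conn is not None:
        conn.close()

def init_bedrock_embeddings():
    """Set up the Amazon Titan embedding model through Amazon Bedrock."""
    try:
        client = boto3.client("bedrock-runtime", region_name="us-east-1")
        return BedrockEmbeddings(client=client,
                                 model_id="amazon.titan-embed-text-v2:0")
    except Exception as err:
        logging.error("Failed to initialize Bedrock embeddings: %s", err)
        raise
```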

The preceding code manages Oracle database connections in a Streamlit application through three main functions. The get_oracle_connection() function creates a new database connection if one doesn’t exist in the session state, check_connection() verifies that the existing connection is healthy and recreates it if necessary, and safe_close_connection() properly closes the database connection when it’s no longer needed. Together, these functions provide reliable database connectivity while preventing connection leaks and handling errors gracefully.

The code also initializes the Amazon Bedrock embedding service, which converts text into numerical vectors (embeddings) using the Amazon Titan model. It creates an Amazon Bedrock client for the us-east-1 Region and sets up the BedrockEmbeddings object with the Amazon Titan embedding model. If anything goes wrong during initialization, it logs the error and raises the exception.

The document embedding workflow then performs three main operations:
- Initialize the Amazon Bedrock embeddings service using the previously defined function.
- Process a batch of documents (docs) to create their vector embeddings using the embedder.
- Create a vector store in Oracle Database (OracleVS) by storing the documents and their embeddings, using dot product as the similarity measure between vectors, in a table named ORAVSEMBEDDING.
The next step enables vector similarity search for the documents.
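A sketch of this step using LangChain’s OracleVS integration; the ORAVSEMBEDDING table name and dot product strategy come from this post, while the helper names come from the earlier sketches. In this sketch, OracleVS calls the embedder itself; the post’s batch_process_embeddings variant pre-computes the embeddings under rate limiting.

```python
def build_vector_store(docs):
    """Persist documents and their embeddings in Oracle AI Database 26ai."""
    embedder = init_bedrock_embeddings()
    return OracleVS.from_documents(
        documents=docs,
        embedding=embedder,
        client=check_connection(),
        table_name="ORAVSEMBEDDING",
        distance_strategy=DistanceStrategy.DOT_PRODUCT,  # similarity measure
    )
```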
- Create conversational chain:
In this function, a conversation chain is created using the conversational AI model (Anthropic’s Claude 3 Sonnet on Amazon Bedrock) and the vector store created in the previous step. This chain allows the generative AI application to engage in conversational interactions.
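A sketch of the chain, assuming the langchain_aws ChatBedrock wrapper; the model ID and generation parameters are illustrative and depend on your Amazon Bedrock model access.

```python
def get_conversation_chain(vector_store):
    """Wire Anthropic's Claude on Amazon Bedrock to the Oracle vector store."""
    llm = ChatBedrock(
        client=boto3.client("bedrock-runtime", region_name="us-east-1"),
        model_id="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        model_kwargs={"temperature": 0.0, "max_tokens": 1024},
    )
    memory = ConversationBufferMemory(memory_key="chat_history",
                                      return_messages=True)
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
        memory=memory,
    )
```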
- Create function to handle the questions:
This function is responsible for processing the user’s input question and generating a response from the AI assistant.
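A minimal sketch of the handler; it assumes the conversation chain is kept in Streamlit session state under a key named conversation.

```python
def handle_user_input(question):
    """Send the user's question through the RAG chain and render the answer."""
    if st.session_state.get("conversation") is None:
        st.warning("Process your documents before asking a question.")
        return
    response = st.session_state.conversation.invoke({"question": question})
    st.write(response["answer"])
```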
- Create Streamlit components:
Streamlit is an open source Python library that makes it simple to create and share custom web applications for ML and data science. In just a few minutes, you can build and deploy powerful data applications. The following code creates the Streamlit components.
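The original snippet isn’t reproduced here; the following is a minimal sketch that wires together the helpers from the earlier steps (names assumed from those sketches).

```python
def main():
    st.set_page_config(page_title="RAG assistant on Oracle Database@AWS")
    st.header("Chat with your PDFs")

    # Question box and Ask button drive the RAG chain.
    question = st.text_input("Enter your question")
    if st.button("Ask") and question:
        handle_user_input(question)

    # Sidebar handles PDF upload and document processing.
    with st.sidebar:
        pdfs = st.file_uploader("Upload PDFs", type="pdf",
                                accept_multiple_files=True)
        if st.button("Process Documents") and pdfs:
            with st.spinner("Processing..."):
                docs = create_documents(chunk_text(extract_pdf_text(pdfs)))
                st.session_state.conversation = get_conversation_chain(
                    build_vector_store(docs))

if __name__ == "__main__":
    main()
```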
Demonstration
Now that you have successfully written the code for your generative AI assistant application, it’s time to run the application using Streamlit. Follow these steps to deploy the code from GitHub.
- Clone the GitHub repo.
- Navigate to the folder where you cloned the repo.
- Create a .env file in your project directory to add your Oracle Database@AWS details. Your .env file should look like the example shown after this list.
- The GitHub repository you cloned earlier includes the file requirements.txt, which lists the libraries you need to install for building the AI assistant application. Install them by running the pip command shown after this list.
- Navigate to the cloned repo folder and start the application with the streamlit command shown after this list.
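The repository’s actual variable names and entry point may differ; the following are hedged examples of the .env contents and the two commands.

```
# .env (variable names and values are illustrative)
ORACLE_USER=admin
ORACLE_PASSWORD=your_password
ORACLE_DSN=your-odb-scan-host:1521/your_service_name
AWS_DEFAULT_REGION=us-east-1
```

```
pip install -r requirements.txt
streamlit run app.py   # replace app.py with the script name from the repo
```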
This starts the application and launches the URL in a browser window, which opens the application as shown.

- Upload the PDF file by choosing Browse files.
In this example, we uploaded the Oracle AI Database 26ai user guide in PDF format.
- Choose Process Documents.

- Enter your question and choose Ask.
The AI assistant processes the question through the RAG workflow and generates a response as shown in the following screenshot.


Clean up
When they are no longer required, make sure to delete the resources provisioned for this walkthrough so that you don’t incur charges for them. You can manually delete the Streamlit application and the EC2 instance, along with Oracle AI Database 26ai on the Exadata Database Service, if they are no longer in use.
Conclusion
In this post, we walked through the key vector capabilities of Oracle AI Database 26ai on Oracle Database@AWS and demonstrated how to integrate Amazon Bedrock and Amazon SageMaker AI with ODB@AWS to build generative AI applications.