AWS Partner Network (APN) Blog
How Shellkode Uses Amazon Bedrock to Convert Natural Language Queries to NoSQL Statements
By Bakrudeen K, Head AI/ML Practice – Shellkode
By Mukesh S M, AI/ML Engineer – Shellkode
By Rony K Roy, Sr. Specialist Partner Solutions Architect, AI/ML – AWS
In today’s data-driven world, efficient querying and retrieval of information from databases plays a vital role in various applications. While traditional database systems have relied on structured query languages and indexing techniques, emerging technologies like large language models (LLMs) offer new and powerful approaches to data querying in the context of MongoDB and NoSQL databases.
Natural language capabilities allow business users to query data through conversational English rather than creating MongoDB queries. However, realizing the full benefits requires overcoming some challenges like schema complexity, query optimization, and performance.
The artificial intelligence (AI) and language models must identify the appropriate MongoDB collections and generate effective queries. They also need a user-friendly interface for natural language questions tailored to MongoDB’s document-oriented structure.
In this post, we showcase how a business user can query data residing in MongoDB, a popular NoSQL database, using natural language questions. The implementation aims to enhance business productivity through generative AI powered by LLMs.
This post captures the value Shellkode delivered to a software-as-a-service (SaaS) customer: reducing workload by up to 90% by decreasing the dependency on developers and database administrators (DBAs), allowing them to focus on other critical workloads.
Shellkode is an AWS Partner and born-in-cloud company with multiple service delivery locations spread across the globe. Shellkode helps organizations harness the power of change, speed, and innovation to create all-around business value by putting data and cloud at the core of their solutions.
Solution Overview
Shellkode’s solution incorporates MongoDB as a flexible data repository, uses the PyMongo library for streamlined connectivity to the database, and harnesses the LangChain framework to develop applications enriched by language models.
The integration of Amazon Bedrock empowers the system to generate MongoDB queries in response to user inquiries phrased in natural language and to transform the data retrieved from MongoDB into coherent natural language answers, ensuring accessibility for business users.
Figure 1 – Solution deployment architecture.
The steps involved in this workflow are:
- User presents a natural language question in English.
- The natural language question, along with a prompt, is fed into the Amazon Bedrock large language model.
- LangChain, a versatile tool for working with LLMs and prompts, is used in AWS Lambda. LangChain requires an LLM to be defined. As part of the chain sequence, the prompt and data catalog metadata are passed to the LLM, hosted on Amazon Bedrock, to create a MongoDB query.
- To execute the generated MongoDB query and retrieve the data, the PyMongo library, a Python-based tool for MongoDB interaction, establishes a connection with the database.
- The results are subsequently passed back to the LLM to craft a natural language answer based on the data.
- Finally, the user receives an English response to their query.
Figure 2 – Solution flow architecture.
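The workflow above can be sketched as two helper functions plus a stub standing in for the Bedrock LLM. The function and variable names here are illustrative, not from the original solution; the stub only exists so the flow can be exercised without AWS access.

```python
# Sketch of the request flow (steps 2-6). Names are illustrative assumptions.

def generate_mongo_query(question: str, schema_hint: str, llm) -> str:
    """Steps 2-3: combine the question with schema metadata and ask the LLM."""
    prompt = (
        "You write MongoDB queries.\n"
        f"Collections and fields:\n{schema_hint}\n"
        f"Question: {question}\n"
        "Return only the query."
    )
    return llm(prompt)

def answer_from_data(question: str, data, llm) -> str:
    """Steps 5-6: turn the retrieved documents back into an English answer."""
    prompt = f"Question: {question}\nData: {data}\nAnswer in one sentence."
    return llm(prompt)

# Stub LLM so the flow runs locally; a real deployment would call Amazon Bedrock.
def stub_llm(prompt: str) -> str:
    if "Return only the query" in prompt:
        return 'db.movies.find({"title": "Blacksmith Scene"})'
    return "The cast members are Charles Kayser and John Ott."

query = generate_mongo_query(
    "Who acted in the movie titled Blacksmith Scene?",
    "movies: title, cast, year",
    stub_llm,
)
```

In the real system, step 4 (executing `query` with PyMongo) sits between the two calls; the sections below show how each piece is implemented.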
Prerequisites
To use this solution, you must have access to the MongoDB database from which you intend to retrieve data. You also need access to LLMs, which can be obtained through either Amazon SageMaker JumpStart or Amazon Bedrock.
Connect to Databases Using PyMongo
To establish a connection with the database, PyMongo employs the MongoDB Uniform Resource Identifier (URI). The solution initializes the Mongo Client from PyMongo and utilizes the URI along with it for database connectivity. It also utilizes AWS Secrets Manager to securely store and manage the database credentials.
Here’s how to initialize MongoDB and establish the secret manager connection:
import json

import boto3
from pymongo import MongoClient

aws_region = 'aws_region'
secret_name = 'secret_name'

# Retrieve the MongoDB credentials from AWS Secrets Manager
session = boto3.session.Session()
secrets_client = session.client(
    service_name='secretsmanager',
    region_name=aws_region,
)
response = secrets_client.get_secret_value(SecretId=secret_name)
secret_dict = json.loads(response['SecretString'])
mongo_username = secret_dict['username']
mongo_password = secret_dict['password']

# Build the MongoDB URI and connect
mongo_uri = f"mongodb+srv://{mongo_username}:{mongo_password}@<db_name>.9cmu69a.mongodb.net/?retryWrites=true&w=majority"
client = MongoClient(mongo_uri)
Build Prompts and LLM Chain to Generate MongoDB Query
Using LangChain, create an LLM chain with Amazon Bedrock as the LLM and a prompt template:
from langchain.llms import Bedrock
from langchain import PromptTemplate, LLMChain

llm = Bedrock(
    model_id='anthropic.claude-v2',
    model_kwargs={'temperature': 1e-10}
)

prompt = PromptTemplate(input_variables=["query"], template=template)
llm_chain = LLMChain(prompt=prompt, llm=llm)
mongodb_query = llm_chain.run(query)
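The `template` variable referenced above is not shown in the post. A minimal, illustrative example of what such a prompt template might contain follows; the collection and field names are assumptions based on the sample data shown later.

```python
# Illustrative prompt template (the actual template is not shown in the post).
# The {query} placeholder matches input_variables=["query"] above.
template = """You are a MongoDB expert. The database contains a collection
sample_mflix.movies with fields: title, cast, year, genres.

Write a single PyMongo-style query, starting with db., that answers the
question below. Return only the query, nothing else.

Question: {query}"""

# LangChain's PromptTemplate fills the placeholder the same way str.format does:
prompt_text = template.format(query="Who acted in the movie titled Blacksmith Scene?")
```

Including the schema metadata directly in the prompt is what lets the LLM pick the right collection and fields without inspecting the database itself.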
The output of the LLM, which is a MongoDB query, is executed using PyMongo and the data is retrieved from the Mongo database:
result = eval(mongodb_query)  # executes the generated query against the MongoClient
data = list(result)           # materialize the returned cursor into a list
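Because `eval()` will execute whatever string the model returns, a guard worth adding (our suggestion, not part of the original solution) is to accept only queries that look read-only before evaluating them:

```python
# Suggested safeguard, not part of the original solution: reject generated
# queries that are not plain read operations before passing them to eval().

def is_read_only(query: str) -> bool:
    """Accept only queries that call read operations on the `db` object."""
    forbidden = ("insert", "update", "delete", "drop", "remove", "import", "__")
    q = query.strip()
    return q.startswith("db.") and not any(tok in q.lower() for tok in forbidden)
```

A stricter production setup could parse the query instead of string-matching, or run it under a MongoDB user that only has read permissions.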
Convert Data into Natural Language Answer
To convert the data retrieved from the database into a natural language answer (in English), utilize a second LLM chain:
prompt2 = PromptTemplate(input_variables=["input","data"], template=template2)
llm_chain2 = LLMChain(prompt=prompt2, llm=llm)
llm_chain2.run({"input":query,"data":data})
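As with the first chain, `template2` is not shown in the post. An illustrative example, with placeholders matching `input_variables=["input", "data"]` above:

```python
# Illustrative answer template (the actual template2 is not shown in the post).
template2 = """Question: {input}
Data retrieved from MongoDB: {data}

Using only the data above, answer the question in plain English."""

# PromptTemplate fills both placeholders the same way str.format does:
prompt_text = template2.format(
    input="Who acted in the movie titled Blacksmith Scene?",
    data=[["Charles Kayser", "John Ott"]],
)
```

Grounding the answer in the retrieved data, rather than the model's own knowledge, keeps the response consistent with what is actually in the database.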
For example:
- The sample data available in MongoDB is:
Figure 3 – MongoDB sample data.
- For the user query “Who acted in the movie titled Blacksmith Scene?” the answer is as follows:
Question: Who acted in the movie titled Blacksmith Scene?
MongoDB_Query: db.sample_mflix.movies.find({ "title": "Blacksmith Scene" })
Data_from_DB: [["Charles Kayser", "John Ott"]]
Final Response: The cast members in the movie titled "Blacksmith Scene" are Charles Kayser and John Ott.
Cleanup
After running this generative AI architecture, it’s crucial to clean up any resources that won’t be used further. Also stop any SageMaker Studio notebook instances to avoid incurring unnecessary charges.
If you’ve used SageMaker JumpStart to deploy an LLM as a SageMaker endpoint, remember to delete the endpoint, either through the SageMaker console or Studio.
Conclusion
The integration of large language models (LLMs) and MongoDB presents exciting possibilities for efficient and user-friendly data querying in today’s data-driven landscape. This solution serves as a bridge between business users and database systems, enabling natural language questions to be seamlessly translated into MongoDB queries.
By harnessing the capabilities of LLMs, such as those available through Amazon SageMaker JumpStart or Amazon Bedrock, and combining them with the LangChain framework, this architecture empowers users to interact with MongoDB databases in a conversational and intuitive manner. The approach enhances business productivity by facilitating quick and meaningful data insights and ensures accessibility to a broader audience.
The architecture presented here demonstrates the synergy between AI-powered language models, database technologies like MongoDB, and user-friendly interfaces, paving the way for more efficient and inclusive data-driven decision-making. As organizations continue to embrace these advancements, we can expect further innovations and improvements in the field of data querying and analysis.
Shellkode – AWS Partner Spotlight
Shellkode is an AWS Partner that helps organizations harness the power of change, speed, and innovation to create all-around business value by putting data and cloud at the core of their solutions.