Use everyday language to search and retrieve data with Mixtral 8x7B on Amazon SageMaker JumpStart

With the widespread adoption of generative artificial intelligence (AI) solutions, organizations are trying to use these technologies to make their teams more productive. One exciting use case is enabling natural language interactions with relational databases. Rather than writing complex SQL queries, you can describe in plain language what data you want to retrieve or manipulate. The large language model (LLM) can understand the intent behind your natural language input and data topography and automatically generate the appropriate SQL code. This allows analysts to be more productive by not having to context switch into rigid query syntax, while also opening up relational databases to less technical users.

In this post, we show you how to set up and deploy a solution to chat with your databases using natural language, allowing users to gain insights into their data without writing any code or SQL queries.

Benefits of text-to-SQL generative AI and the Mixtral 8x7B model

Consider Michelle, a business analyst responsible for preparing weekly sales reports by running complex SQL queries on their data warehouse to aggregate numbers by product, region, and time period. In the past, this manual process took 2–3 hours per week working with the analyst team to write these queries by hand. Now with text-to-SQL generative AI, Michelle simply describes the report she needs in plain English, such as “Show total revenue last week for shoes in the Western region grouped by sub-category.” The AI assistant automatically generates the required SQL query, runs it on the data warehouse, and returns a formatted report in seconds.

By eliminating the SQL bottleneck, Michelle saves hours per week, now spent on more impactful analysis instead of query writing. She can iterate faster and answer questions on demand. Other business users like Michelle gain similar productivity benefits from this conversational access to relational data. The generative AI tool essentially turns self-service analytics aspirations into reality by allowing business teams to leave the SQL to the machines.

For this implementation, Mixtral 8x7B MoE was used. Mixtral 8x7B is a state-of-the-art Sparse Mixture of Experts (MoE) foundation model released by Mistral AI. It supports multiple use cases such as text summarization, classification, text generation, and code generation. It is an 8x model, which means it contains eight distinct groups of parameters. The model has about 45 billion total parameters and supports a context length of 32,000 tokens. MoE is a type of neural network architecture that consists of multiple “experts,” where each expert is a neural network. In the context of transformer models, MoE replaces some feed-forward layers with sparse MoE layers. These layers have a certain number of experts, and a router network selects which experts process each token at each layer. MoE models enable more compute-efficient and faster inference compared to dense models. Compared to traditional LLMs, Mixtral 8x7B offers the advantage of faster decoding at the speed of a smaller parameter-dense model despite containing more parameters. It also outperforms other open-access models on certain benchmarks and supports a longer context length.

You can currently deploy Mixtral 8x7B on Amazon SageMaker JumpStart with one click. Amazon SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. Instead of having to manually integrate, optimize, and configure each foundation model yourself, SageMaker JumpStart handles those complex tasks for you. With just a few clicks, you can deploy state-of-the-art models from Hugging Face, Cohere, AI21 Labs, Stability AI, and more using optimized containers and SageMaker endpoints. SageMaker JumpStart eliminates the heavy lifting involved in foundation model deployment. You get access to a huge catalog of prebuilt models that you can quickly put to use for inference. It’s a scalable, cost-effective way to implement powerful AI solutions without machine learning (ML) expertise.

Solution overview

The following diagram illustrates the solution architecture.

At a high level, the overall solution consists of three core components:

Structured data source – This can be any relational data source such as Amazon Relational Database Service (Amazon RDS), Amazon Aurora, Amazon Athena, or Snowflake. It contains the business data to query.
Language model – This LLM is able to understand the data schema of the source database and map natural language questions (NQLs) into corresponding SQL queries.
Orchestrator backend – The code scripts can be run in environments such as an Amazon SageMaker Studio notebook, an AWS Lambda function, Amazon Elastic Compute Cloud (Amazon EC2), or Amazon Elastic Container Service (Amazon ECS). Additionally, you could optionally add an orchestration service, such as AWS Step Functions.

The end-to-end flow is as follows:

The user asks a natural language question, which is passed to the Mixtral 8x7B Instruct model, hosted in SageMaker.
The LLM analyzes the question and uses the schema fetched from the connected Amazon Redshift database to generate a SQL query.
The SQL query is run against the database. In case of an error, a retry workflow is run.
Tabular results received are passed back to the LLM to interpret and convert them into a natural language response to the user’s original question.

Prerequisites

To launch an endpoint to host Mixtral 8x7B from SageMaker JumpStart, you may need to request a service quota increase to access an ml.g5.48xlarge instance for endpoint usage. You can request service quota increases through the AWS Management Console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources.

To follow along with this example, you also need access to a relational data source. Amazon Redshift is used as the primary data source in this post with the TICKIT database. This database helps analysts track sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In particular, analysts can identify ticket movement over time, success rates for sellers, and the best-selling events, venues, and seasons. You can also experiment with other AWS data sources like Amazon RDS, Athena, or your own relational databases. Make sure to have the connection details for your data source available, such as database URL, user name, and password.

To follow the demo using Amazon Redshift, you first need to set up a Redshift cluster if you don’t already have one. Use the Amazon Redshift console or AWS CLI to launch a cluster with your desired node type and number of nodes. When the cluster is available, create a new database and tables in it to hold your sample relational data. You can load data from Amazon Simple Storage Service (Amazon S3) or directly insert rows. When storing data in Amazon S3, make sure that all public access is blocked and the data is encrypted at rest and in transit. For more information, refer to Security best practices for Amazon S3. Finally, make sure to note the cluster endpoint, database name, and credentials to connect. With a Redshift cluster provisioned and loaded with data, you will have an ideal relational backend ready to pair for natural language access.

To test that you successfully added data to your Redshift cluster, complete the following steps:

On the Amazon Redshift console, choose Clusters in the navigation pane.
Choose the cluster you want to query.
Navigate to the Query Editor tab to open the query editor.
Run the following sample queries or write your own SQL queries:

Find total sales on a given date:

SELECT sum(qtysold)
FROM sales, date
WHERE sales.dateid = date.dateid AND caldate = '2008-01-05';

Find top 10 buyers:

SELECT firstname, lastname, total_quantity
FROM (SELECT buyerid, sum(qtysold) total_quantity 
FROM sales GROUP BY buyerid ORDER BY total_quantity desc limit 10) Q, users
WHERE Q.buyerid = userid ORDER BY Q.total_quantity desc;

The query editor allows saving, scheduling, and sharing queries. You can also view query plans, inspect run details, and monitor query performance.

Implement the solution

The code consists of a number of functions that are invoked by the logic shown in the solution diagram. We show you the relevant code blocks in this breakdown that match with the diagram. You can see the complete code for the solution in the GitHub repository.

To implement this solution, complete the following steps:

Set up a Redshift cluster. For this post, we use an RA3 type cluster.
Load the TICKIT sales dataset into the Redshift cluster. For instructions, see Load data from Amazon S3 to Amazon Redshift.
To confirm that Amazon Redshift access is private and restricted only to your VPC, refer to the steps in Enable private access to Amazon Redshift from your client applications in another VPC.
Set up a SageMaker domain, making sure it has the appropriate permissions to interact with Amazon Redshift.
Clone the following GitHub repository into SageMaker Studio Classic.

The first step is to deploy the Mixtral 8x7B Instruct SageMaker endpoint. We use the default size ml.g5.48xlarge instance. Make sure that you have an ml.g5.48xlarge for endpoint usage service quota of at least 1.

# Note this requires an ml.g5.48xlarge instance.
model_id = "huggingface-llm-mixtral-8x7b-instruct"
from sagemaker.jumpstart.model import JumpStartModel
model = JumpStartModel(model_id=model_id)
predictor = model.deploy(endpoint_name=MIXTRAL_ENDPOINT)

Set up the connectivity to the Redshift cluster. Make sure to replace these placeholders with your Redshift identifiers. For security purposes, you should have the credentials secured using AWS Secrets Manager. For instructions, see Enhance your security posture by storing Amazon Redshift admin credentials without human intervention using AWS Secrets Manager integration
```
redshift_client = boto3.client('redshift-data')
CLUSTER_IDENTIFIER = 'redshift-cluster-1'
DATABASE = 'dev'
DB_USER = 'awsuser'
```

Set up the natural language question and the prompt parameters for the model

prompt = "What are the top five seller names in San Diego, based on the number of tickets sold in 2008?"

params={'sql-len':700,'text-token':500,'tables':tables,'db':schm,'temp':0.01,
'model_id':'mixtral','prompt':prompt}

The Redshift cluster is queried to generate the relevant database schema and example records, as shown in Step 2:

%%time
ress=redshift_qna(params)
"""
    Execute a Q&A process for generating SQL queries based on user questions.
    Args:
        params (dict): A dictionary containing parameters including table name, database name, prompt, etc.
    Returns:
        tuple: A tuple containing the response, generated SQL statement, and query output.
    """
    sql1=f"SELECT table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type FROM information_schema.columns WHERE table_schema='{params['db']}'"
    sql2=[]
    for table in params['tables']:
        sql2.append(f"SELECT * from dev.{params['db']}.{table} LIMIT 3")
    sqls=[sql1]+sql2
    
    question=params['prompt']
    results=execute_query_with_pagination(sqls, CLUSTER_IDENTIFIER, DATABASE, DB_USER)    
    
    col_names=results[0].split('\n')[0]
    observations="\n".join(sorted(results[0].split('\n')[1:])).strip()
    params['schema']=f"{col_names}\n{observations}"
    params['sample']=''
    for examples in results[1:]:
        params['sample']+=f"{examples}\n\n"

The generated SQL query is run on the Redshift cluster (Steps 6–8):

q_s=query_llm(prompts,200)
sql_pattern = re.compile(r'<sql>(.*?)(?:</sql>|$)', re.DOTALL)           
sql_match = re.search(sql_pattern, q_s)
q_s = sql_match.group(1) 
print(f" FIRST ATTEMPT SQL:\n{q_s}")
output, q_s=single_execute_query(q_s, CLUSTER_IDENTIFIER, DATABASE, DB_USER,question) 
"""
    Execute a single SQL query on an Amazon Redshift cluster and process the result.

    Args:
        sql_query (str): The SQL query to execute.
        cluster_identifier (str): The identifier of the Redshift cluster.
        database (str): The name of the database.
        db_user (str): The username used to authenticate with the Redshift cluster.
        question (str): A descriptive label or question associated with the query.

    Returns:
        pandas.DataFrame: DataFrame containing the processed result of the SQL query.

    """
    result_sets = []
    response = execute_query_redshift(sql_query, cluster_identifier, database, db_user)

The query might fail because of errors in the LLM-generated SQL. This is why we have a debugging step, which can iterate for a certain number of times, asking the LLM to look at the Amazon Redshift error message and the previous context (user question, DB schema, table samples, and past SQL query generated) and generate a new query addressing it. Guidance is provided to the model using prompt engineering and instructions to come up with a different query. The new query is then run on the cluster again. This process is configured to repeat up to five times in the sample code, or until the query successfully runs. If the query doesn’t run successfully within the number of retries specified, a failure message is returned back to the user. This step highlighted in red in the diagram.

def llm_debugger(question, statement, error, params): 
    """
    Generate debugging guidance and expected SQL correction for a PostgreSQL error.
    Args:
        question (str): The user's question or intent.
        statement (str): The SQL statement that caused the error.
        error (str): The error message encountered.
        params (dict): Additional parameters including schema, sample data, and length.
    Returns:
        str: Formatted debugging guidance and expected SQL correction.
    """
    prompts=f'''<s><<SYS>>[INST]
You are a PostgreSQL developer who is an expert at debugging errors.  

Here are the schema definition of table(s):
{params['schema']}
#############################
Here are example records for each table:
{params['sample']}
#############################
Here is the sql statement that threw the error below:
{statement}
#############################
Here is the error to debug:
{error}
#############################
Here is the intent of the user:
{params['prompt']}
<</SYS>>
First understand the error and think about how you can fix the error.
Use the provided schema and sample row to guide your thought process for a solution.
Do all this thinking inside <thinking></thinking> XML tags. This is a space for you to write down relevant content and will not be shown to the user.

Once your are done debugging, provide the the correct SQL statement without any additional text.
When generating the correct SQL statement:
1. Pay attention to the schema and table name and use them correctly in your generated sql. 
2. Never query for all columns from a table unless the question says so. You must query only the columns that are needed to answer the question.
3. Wrap each column name in double quotes (") to denote them as delimited identifiers. Do not use backslash (\) to escape underscores (_) in column names. 

Format your response as:
<sql> Correct SQL Statement </sql>[/INST]'''
    answer=query_llm(prompts,round(params['sql-len']))
    return answer

If the query successfully runs, we pass the tabular results from Amazon Redshift to the LLM to interpret them and, based on the initial question, provide an answer in natural language to be returned to the user (Steps 10–13):

if len(input_token)>28000:    
        csv_rows=output.split('\n')
        chunk_rows=chunk_csv_rows(csv_rows, 20000)
        initial_summary=[]
        for chunk in chunk_rows:
            prompts=f'''<s><<SYS>>[INST]You are a helpful and truthful assistant. Your job is provide answers based on samples of a tabular data provided.

Here is the tabular data:
#######
{chunk}
#######
<</SYS>>
Question: {question}

When providing your response:
- First, review the result to understand the information within. Then provide a complete answer to the my question, based on the result.
- If you can't answer the question, please say so[/INST]'''
            initial_summary.append(qna_llm(prompts,params))
        prompts = f'''<s><<SYS>>[INST]You are a helpful and truthful assistant.

Here are multiple answer for a question on different subset of a tabular data:
#######
{initial_summary}
#######
<</SYS>>
Question: {question}
Based on the given question above, merege all answers provided in a coherent singular answer[/INST]'''
        response=qna_llm(prompts,params)
        
    else:        
        prompts=f'''<s><<SYS>>[INST]You are a helpful and truthful assistant. Your job is to examine a sql statement and its generated result, then provide a response to my question.

Here is the sql query:
{q_s}

Here is the corresponding sql query result:
{output}
<</SYS>>
question: {question}

When providing your response:
- First, review the sql query and the corresponding result. Then provide a complete answer to the my question, based on the result.
- If you can't answer the question, please say so[/INST]'''
        response=qna_llm(prompts, params) 
    return response, q_s,output

Test the solution

Let’s explore an example run of the solution. We ask the question, “What are the top five seller names in San Diego, based on the number of tickets sold in 2008?” The following is the SQL query generated:

SELECT MAX("u"."firstname") AS firstname, MAX("u"."lastname") AS lastname, SUM("l"."numtickets") AS total_tickets_sold
FROM "users" AS u
JOIN "sales" AS s ON "u"."userid" = "s"."sellerid"
JOIN "listing" AS l ON "s"."listid" = "l"."listid"
JOIN "event" AS e ON "l"."eventid" = "e"."eventid"
JOIN "date" AS d ON "l"."dateid" = "d"."dateid"
WHERE "d"."year" = 2008 AND "u"."city" = 'San Diego'
GROUP BY "u"."userid"
ORDER BY total_tickets_sold DESC
LIMIT 5;

The following is the query response from Amazon Redshift:

firstname,lastname,total_tickets_sold
Julie,Hanson,156
Joan,Wright,115
Ocean,Sears,100
Hiram,Turner,99
Jena,Burch,90

The following is the final answer provided by the LLM:

The top five sellers in San Diego, based on the number of tickets sold in 2008, are:
1. Julie Hanson with a total of 156 tickets sold.
2. Joan Wright with a total of 115 tickets sold.
3. Ocean Sears with a total of 100 tickets sold.
4. Hiram Turner with a total of 99 tickets sold.
5. Jena Burch with a total of 90 tickets sold.

Best practices

Enhancing response efficiency in text-to-SQL systems involves incorporating several key best practices:

Caching parsed SQL – To improve response times and avoid reprocessing repeated queries, parsed SQL and recognized query prompts can be cached from the system. This cache can be checked before invoking the LLM for each new text query.
Monitoring – Usage logs and metrics around query parsing, SQL generation latency, and result set sizes should be collected. Monitoring this data enables optimization by revealing pain points—whether from inadequate training data, limitations in prompt engineering, or data model issues.
Scheduled data refresh – To keep materialized view data current, refresh schedules using batch or incremental approaches are needed. The right balance mitigates the overhead of the refresh while making sure that text queries generate results using the latest data.
Central data catalog – Maintaining a centralized data catalog provides a unified metadata layer across data sources, which is critical for guiding LLM SQL generation. This catalog enables selecting appropriate tables and schemas to handle text queries.
Guardrails – Use prompt engineering to prevent the LLM from generating SQL that would alter tables or logic to prevent running queries that would alter any tables. One important recommendation is to use a user role that only has read privileges.

By considering these optimization dimensions, natural language-to-SQL solutions can scale efficiently while delivering intuitive data access. As with any generative AI system, keeping an eye on performance is key while enabling more users to benefit.

These are just a few of the different best practices that you can follow. For a deeper dive, see Generating value from enterprise data: Best practices for Text2SQL and generative AI.

Clean up

To clean up your resources, complete the steps in this section.

Delete the SageMaker endpoint

To delete a SageMaker model endpoint, follow these steps:

On the SageMaker console, in the navigation pane, choose Inference, then choose Endpoints.
On the Endpoints page, select the endpoint you want to delete.
On the Actions menu, select Delete.
On the confirmation page, choose Delete to delete the endpoint.

The endpoint deletion process will begin. You can check the endpoint status on the Endpoints page to confirm it has been deleted.

Delete the Redshift cluster

Complete the following steps to delete your Redshift cluster:

On the Amazon Redshift console, in the navigation pane, choose Clusters to display your list of clusters.
Choose the cluster you want to delete.
On the Actions menu, choose Delete.
Confirm the cluster to be deleted, then choose Delete cluster.

The cluster status will be updated as the cluster is deleted. This process usually takes a few minutes.

Conclusion

The ability to query data through intuitive natural language interfaces unlocks huge potential for business users. Instead of struggling with complex SQL syntax, teams can self-serve the analytical insights they need, on demand. This improves time-to-value while allowing less technical users to access and extract meaning from enterprise data.

As highlighted in this post, the latest advances in generative AI make robust NLQ-to-SQL systems achievable. With foundation models such as Mixtral 8x7B running on SageMaker and tools and libraries for connecting to different data sources, organizations can now have an enterprise-grade solution to convert natural language queries into efficient SQL. By eliminating the traditional SQL bottleneck, generative NLQ-to-SQL systems give back countless hours each week for analysts and non-technical roles, driving greater business agility and democratization in self-service analytics.

As generative AI continues to mature rapidly, keeping up with the latest models and optimization techniques is critical. This post only scratched the surface of what will be possible in the near future as these technologies improve. Natural language interfaces for accessing and manipulating data still have huge runways for innovation ahead. To learn more about how AWS is helping customers make their ideas a reality, refer to the Generative AI Innovation Center.

About the Authors

Jose Navarro is an AI/ML Solutions Architect at AWS, based in Spain. Jose helps AWS customers—from small startups to large enterprises—architect and take their end-to-end machine learning use cases to production. In his spare time, he loves to exercise, spend quality time with friends and family, and catch up on AI news and papers.

Prashanth Ganapathy is a Senior Solutions Architect in the Small Medium Business (SMB) segment at AWS. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them. Outside of work, Prashanth enjoys photography, travel, and trying out different cuisines.

Uchenna Egbe is an Associate Solutions Architect at AWS. He spends his free time researching about herbs, teas, superfoods, and how to incorporate them into his daily diet.

Sebastian Bustillo is a Solutions Architect at AWS. He focuses on AI/ML technologies with a with a profound passion for generative AI and compute accelerators. At AWS, he helps customers unlock business value through generative AI, assisting with the overall process from ideation to production. When he’s not at work, he enjoys brewing a perfect cup of specialty coffee and exploring the world with his wife.