AWS Big Data Blog

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL brings the capabilities of generative AI directly into the Amazon Redshift query editor. Amazon Q generative SQL for Amazon Redshift launched in preview at AWS re:Invent 2023, and after more than 85,000 queries were executed during the preview, the feature became generally available in September 2024.

Amazon Q generative SQL for Amazon Redshift uses generative AI to analyze user intent, query patterns, and schema metadata, and to identify common SQL query patterns directly within Amazon Redshift. This accelerates the query authoring process and reduces the time required to derive actionable data insights. It provides a conversational interface where users can submit queries in natural language within the scope of their current data permissions. Generative SQL uses query history for better accuracy, and you can further improve accuracy through custom context, such as table descriptions, column descriptions, foreign key and primary key definitions, and sample queries. Custom context enhances the AI model’s understanding of your specific data model, business logic, and query patterns, allowing it to generate more relevant and accurate SQL recommendations. As a result, you can get insights faster without extensive knowledge of your organization’s complex database schema and metadata.

With this feature, your data remains secure and private. Your data is not shared across accounts, and your queries, data, and database schemas are not used to train a generative AI foundation model (FM). Your input is used only as contextual prompts to the FM to answer your queries.

In this post, we show you how to enable the Amazon Q generative SQL feature in the Redshift query editor and use the feature to get tailored SQL commands based on your natural language queries. With Amazon Q, you can spend less time worrying about the nuances of SQL syntax and optimizations, allowing you to concentrate your efforts on extracting invaluable business insights from your data.

Solution overview

At a high level, the feature works as follows:

  1. For generating the SQL code, you can write your query request in plain English within the conversational interface in the Redshift query editor.
  2. The query editor sends the query context to the underlying Amazon Q generative SQL platform, which uses generative AI to generate SQL code recommendations based on your Redshift metadata.
  3. You receive the generated SQL code suggestions within the same chat interface.

The following diagram illustrates this workflow.

Your content processed by generative SQL is not stored or used by AWS for service improvement.

Amazon Q generative SQL uses a large language model (LLM) and Amazon Bedrock to generate the SQL query. AWS uses different techniques, such as prompt engineering and Retrieval Augmented Generation (RAG), to query the model based on your context:

  • The database you’re connected to
  • The schema you’re working on
  • Your query history
  • Optionally, the query history of other users connected to the same endpoint

Amazon Q generative SQL is conversational, and you can ask it to refine a previously generated query.

In the following sections, we demonstrate how to enable the generative SQL feature in the Redshift query editor and use it to generate SQL queries using natural language.

Prerequisites

To get started, you need an Amazon Redshift Serverless endpoint or an Amazon Redshift provisioned cluster. For this post, we use Redshift Serverless. Refer to Easy analytics and cost-optimization with Amazon Redshift Serverless to get started.

Enable the Amazon Q generative SQL feature in the Redshift query editor

If you’re using the feature for the first time, you need to enable the Amazon Q generative SQL feature in the Redshift query editor.

To enable the feature, complete the following steps:

  1. On the Amazon Redshift console, open the Redshift Serverless dashboard.
  2. Choose Query data.

You can also choose Query Editor V2 in the navigation pane of the Amazon Redshift console.

When you open the Redshift query editor, you will see the new Amazon Q icon next to the database dropdown menu at the top of the query editor console.

If you choose the Amazon Q icon, you will see the message “Amazon Redshift query editor V2 now supports generative SQL functionality. Contact your administrator to activate this feature in Settings.” If you’re not the administrator, you need to work with the account administrator to enable this feature.

  3. If you’re the administrator, choose the hyperlink in the message, or choose the settings icon and then Generative SQL settings.
  4. In the Generative SQL settings section, select Q generative SQL, which turns on Amazon Q generative SQL for all users of the account.

Amazon Q generative SQL is personalized to your database. Based on the conversations you have had with the feature, it applies those learnings to the conversations of other users who connect to the same database with their own credentials. In the generative SQL settings, you can see instructions for granting the sys:monitor role to a user or role.

  5. Choose Save.

You will receive a confirmation that the Amazon Q generative SQL settings have been successfully updated.

Load notebooks with sample TPC-DS data

The Redshift query editor comes with sample data and SQL notebooks that you can load into a sample database and corresponding schema. For this post, we use TPC-DS for a decision support benchmark.

We start by loading the TPC-DS data into the Redshift database. When you load this data, the schema tpcds is updated with sample data. We also use the provided notebooks with the tpcds schema to run queries to build a query history.

Complete the following steps:

  1. Connect to your Redshift Serverless workgroup or Redshift provisioned cluster.
  2. Navigate to the sample_data_dev database to view the sample databases available for running the generative SQL feature.
  3. Hover over the tpcds schema and choose Open sample notebooks.
  4. In the Create sample database pop-up message, choose Create.

In a few seconds, you will see a notification that the sample_data_dev database has been created and the tpcds sample data has been loaded successfully. Two sample notebooks for the schema are also generated.

  5. Choose Run all on each notebook tab.

This will take a few minutes to run and will establish a query history for the tpcds data.

This step is only needed to build a query history for the sample data; it is not mandatory when using the feature with your organization’s own data warehouse.

Use Amazon Q to generate SQL queries from natural language

Now that the Amazon Q generative SQL feature is enabled and ready for use, open a new notebook and choose the Amazon Q icon to open a chat pane in the Redshift query editor.

Amazon Q generative SQL is personalized to your schema. It uses metadata from database schemas to improve the SQL query suggestions. Optionally, administrators can allow the use of the account’s query history to further improve the generated SQL. This can be enabled by running the following GRANT commands to provide access to your query history to other roles or users:

GRANT ROLE SYS:MONITOR TO "IAMR:role-name";
GRANT ROLE SYS:MONITOR TO "IAM:user-name";
GRANT ROLE SYS:MONITOR TO "database-username";

This optional step allows users to make query monitoring history available to other users connected to the same database.
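If you want to confirm which users have been granted the role, one possible check is the following sketch, which assumes the Redshift system view SVV_USER_GRANTS (which lists roles granted to users) is visible to your user:

```sql
-- Sketch: list users who have been granted the sys:monitor role.
SELECT user_name, role_name
FROM svv_user_grants
WHERE role_name = 'sys:monitor';
```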

Let’s get started with some query examples.

  1. First, make sure you’re connected to the sample_data_dev database.
  2. Ask the question “What are the top 10 stores in sales in 1998?”

This generates a SQL query. Amazon Q generative SQL is also personalized to your data domain. You will notice that it joins to the Store table to retrieve store_name.
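The exact SQL that Amazon Q produces can vary; the following sketch shows the kind of query you might see, assuming the standard TPC-DS column names and ss_net_paid as the sales measure:

```sql
-- Illustrative sketch only: the generated query may differ.
SELECT s.s_store_name,
       SUM(ss.ss_net_paid) AS total_sales
FROM tpcds.store_sales ss
JOIN tpcds.date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
JOIN tpcds.store s ON ss.ss_store_sk = s.s_store_sk
WHERE d.d_year = 1998
GROUP BY s.s_store_name
ORDER BY total_sales DESC
LIMIT 10;
```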

  3. Choose Add to notebook under the query to add the generated SQL.

Our query runs successfully and shows that the store ‘able’ has the most sales.

  4. Amazon Q is personalized to your conversation. Suppose you want to know the top selling item for the store ‘able’. You can ask “What was the unique identifier of the top selling item for the store ‘able’?”

The results show the top selling item. However, the query didn’t filter on the year.

  5. Let’s ask Amazon Q for the top selling item for the store ‘able’ in 1998. Instead of repeating the whole question, you can simply ask “Can you filter by the year 1998?”

Now we have the top selling item for the store ‘able’ for 1998.

  6. To display the item description, ask “Can you modify the query to include its name and description?”

Amazon Q added the join to the item table and the query ran successfully.
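After these refinements, the conversation converges on a query along these lines (a sketch assuming standard TPC-DS column names; ss_quantity is one plausible measure of “top selling”):

```sql
-- Sketch: top selling item for store 'able' in 1998, with name and description.
SELECT i.i_item_id,
       i.i_product_name,
       i.i_item_desc,
       SUM(ss.ss_quantity) AS total_quantity
FROM tpcds.store_sales ss
JOIN tpcds.store s ON ss.ss_store_sk = s.s_store_sk
JOIN tpcds.date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
JOIN tpcds.item i ON ss.ss_item_sk = i.i_item_sk
WHERE s.s_store_name = 'able'
  AND d.d_year = 1998
GROUP BY i.i_item_id, i.i_product_name, i.i_item_desc
ORDER BY total_quantity DESC
LIMIT 1;
```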

Now that we have done some basic queries, let’s do some deeper analysis.

  7. Let’s ask Amazon Q “Can you give me aggregated store sales, for each county by quarter for all years?”

The answer is correct, but let’s ask a follow-up to include the state.

  8. Ask the follow-up question: “Can you include state?”

This answer looks good; you can also add an ORDER BY clause if you want the data sorted or ask Amazon Q to add that.
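The final query from this exchange might resemble the following sketch (standard TPC-DS column names assumed; d_qoy is the quarter-of-year column):

```sql
-- Sketch: aggregated store sales by state, county, year, and quarter.
SELECT s.s_state,
       s.s_county,
       d.d_year,
       d.d_qoy AS quarter,
       SUM(ss.ss_net_paid) AS total_sales
FROM tpcds.store_sales ss
JOIN tpcds.store s ON ss.ss_store_sk = s.s_store_sk
JOIN tpcds.date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
GROUP BY s.s_state, s.s_county, d.d_year, d.d_qoy
ORDER BY s.s_state, s.s_county, d.d_year, quarter;
```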

So far, we have only been looking at store_sales data. The TPC-DS data contains data for other sales channels, including web_sales and catalog_sales.

  9. Let’s ask Amazon Q “Can you give me the total sales for 1998, from different sales channels, using a union of the sales data from different channels?”
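A union across the three channels might look like this sketch (assuming the standard TPC-DS column names for each channel’s date key and net paid amount):

```sql
-- Sketch: total 1998 sales by channel via UNION ALL.
SELECT u.channel, SUM(u.sales) AS total_sales
FROM (
    SELECT 'store' AS channel, ss_net_paid AS sales, ss_sold_date_sk AS date_sk
    FROM tpcds.store_sales
    UNION ALL
    SELECT 'web', ws_net_paid, ws_sold_date_sk FROM tpcds.web_sales
    UNION ALL
    SELECT 'catalog', cs_net_paid, cs_sold_date_sk FROM tpcds.catalog_sales
) u
JOIN tpcds.date_dim d ON u.date_sk = d.d_date_sk
WHERE d.d_year = 1998
GROUP BY u.channel;
```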

Let’s dive deeper into some other capabilities of Amazon Q generative SQL.

  10. Let’s try logging in with a different user and see how Amazon Q generative SQL interacts with that user. We have created User3 and granted that user the sys:monitor role.
  11. Logged in as User3, let’s ask the original question: “What are the top 10 stores in sales in 1998?”

Amazon Q generative SQL is able to use the query history and provide SQL recommendations for User3’s prompts because they have access to the system metadata provided through the role sys:monitor.

Safety features

Amazon Q generative SQL has built-in safety features: it warns you when a generated SQL statement will modify data, and queries run only with the permissions of the connected user. To test this, let’s ask Amazon Q to “delete data from web_sales table.”

Amazon Q gives a message “I detected that this query changes your database. Only run this SQL command if that is appropriate.”

Now, still logged in as User3, choose Run to try to delete the web_sales data.

As expected, User3 gets a permission denied error, because they don’t have the necessary privileges to delete data from the web_sales table.

Custom context

Custom context is a feature that allows you to provide domain-specific knowledge and preferences, giving you fine-grained control over the SQL generation process.

The custom context is defined in a JSON file, which the query editor administrator can upload or add directly in the Custom context section of the Amazon Q generative SQL settings.

This JSON file contains information that helps Amazon Q generative SQL better understand the specific requirements and constraints of your domain, enabling it to generate more targeted and relevant SQL queries.

By providing a custom context, you can influence factors such as:

  • The terminology and vocabulary used in the generated SQL
  • The level of complexity and optimization of the SQL queries
  • The formatting and structure of the SQL statements
  • The data sources and tables that should be considered

The custom context feature empowers you to take a more active role in shaping the SQL generation process, leading to SQL queries that are better suited to your data and business requirements.

In this post, we use the BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) sample dataset, consisting of three tables. BIRD represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing.

You can load the BIRD sample dataset into your Redshift data warehouse to experiment with custom contexts.

For this post, we demonstrate three custom context fields.

TablesToInclude

TablesToInclude specifies a set of tables that are considered for SQL generation. This field is crucial when you want to limit the scope of SQL queries to a defined subset of available tables. It can help optimize the generation process by reducing unnecessary table references.

Let’s ask Amazon Q “List the distinct translated title and the set code of all cards translated into Spanish.”

The generated SQL unnecessarily joins the public.cards table; the public.set_translations table alone contains sufficient data to answer the question.

We can add the following TablesToInclude custom context JSON:

{
  "resources": [
    {
      "ResourceId":"Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "TablesToInclude": [
        "bird.public.set_translations"
      ]
    }
  ]
}

After adding the custom context, the unwanted joins are eliminated and the correct SQL is generated.
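With the table scope narrowed, the generated SQL should resemble the following sketch (column names assume the BIRD set_translations schema, where translation and setcode hold the translated title and set code):

```sql
-- Sketch: distinct translated titles and set codes for Spanish translations.
SELECT DISTINCT translation, setcode
FROM bird.public.set_translations
WHERE language = 'Spanish';
```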

ColumnAnnotations

ColumnAnnotations allows you to provide metadata or annotations specific to individual columns in your data tables. These annotations can offer valuable insights into the definitions and characteristics of the columns, which can be beneficial in guiding the SQL generation process.

Let’s ask Amazon Q to “Show me the unconverted mana cost and name for all the cards created by Rob Alexander.”

The generated SQL points to the convertedmanacost column, which doesn’t hold the unconverted mana cost; the manacost column does.

Let’s add this using ColumnAnnotations in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless: Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "ColumnAnnotations":
         {"bird.public.cards": { "manaCost": "manaCost is the unconverted mana"} }
    }
  ]
}

After the custom context is added, the correct SQL gets generated.
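With the annotation in place, the generated SQL should select the manacost column, along the lines of this sketch:

```sql
-- Sketch: unconverted mana cost and name for cards by Rob Alexander.
SELECT manacost, name
FROM bird.public.cards
WHERE artist = 'Rob Alexander';
```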

CuratedQueries

CuratedQueries provides a set of predefined question and answer pairs. In this set, the questions are written in natural language and the corresponding answers are the SQL queries that should be generated to address those questions.

These examples serve as a valuable reference point for Amazon Q generative SQL, helping it understand the types of queries it is expected to generate. You can guide Amazon Q generative SQL with the desired format, structure, and content of the SQL queries it should produce.

Let’s ask Amazon Q “List down the name of artists for cards in Chinese Simplified.”

Although a multiverseid join key exists, it is not the correct one; the tables should be joined on uuid.

Let’s add the following using CuratedQueries in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless: Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "CuratedQueries": [
        {
          "Question": "List down the name of artists for cards in Spanish.",
          "Answer": "SELECT artist FROM public.cards c JOIN public.foreign_data f ON c.uuid = f.uuid WHERE f.language = 'Spanish';"
        }
      ]
    }
  ]
}

After the custom context is added, the correct SQL gets generated.
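Guided by the curated Spanish example, the generated SQL should now join on uuid, along the lines of the following sketch (the curated query with the language value swapped):

```sql
-- Sketch: artists for cards with Chinese Simplified translations.
SELECT artist
FROM bird.public.cards c
JOIN bird.public.foreign_data f ON c.uuid = f.uuid
WHERE f.language = 'Chinese Simplified';
```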

Additional features

In this section, we discuss the supporting features available with the Amazon Q generative SQL feature in the Redshift query editor.

Provide feedback

Amazon Q generative SQL allows you to provide feedback on the SQL queries it generates, helping improve the quality and relevance of the SQL over time. This feedback mechanism is accessible through the Amazon Q generative SQL interface, where you can indicate whether the generated SQL was helpful or not.

If you find the generated SQL to not be helpful, you can categorize the feedback into the following areas:

  • Incorrect Tables/Columns – This indicates that the SQL references the wrong tables or columns, or is missing essential tables or columns
  • Incorrect Predicates/Literals/Group By – This category covers issues with the SQL’s filter conditions, literal values, or grouping logic
  • Incorrect SQL Structure – This feedback suggests that the overall structure or syntax of the generated SQL is not correct
  • Other – This option allows you to provide feedback that doesn’t fit into the preceding categories

In addition to selecting the appropriate feedback category, you can also provide free text comments to elaborate on the specific issues or inaccuracies you found in the generated SQL. This additional information can be valuable for Amazon Q to better understand the problems and make improvements.

By actively providing this feedback, you play a crucial role in refining the generation capabilities of Amazon Q generative SQL. The feedback you provide helps the service learn from its mistakes, leading to more accurate and relevant SQL queries that better meet your needs over time.

This feedback loop is an important part of Amazon Q generative SQL’s continuous improvement, because it allows the service to adapt and evolve based on your specific requirements and use cases.

Regenerate SQL

The Regenerate SQL option will prompt Amazon Q to generate a new SQL query based on the same natural language prompt, using its learning and improvement capabilities to provide a potentially better-suited response.

Refresh database

By choosing Refresh database, you can instruct Amazon Q generative SQL to re-fetch and update the metadata information about the connected database.

This metadata includes:

  • Schema definitions – The structure and organization of your database schemas
  • Table definitions – The names, columns, and other properties of the tables in your database
  • Column definitions – The data types, names, and other characteristics of the columns within your database tables

Tips and techniques

To get more accurate SQL recommendations from Amazon Q generative SQL, keep in mind the following best practices:

  • Be as specific as possible. Instead of asking for total store sales, ask for total sales across all sales channels if that is what you need.
  • Add your schema to the path. For example:
    set search_path to tpcds;
  • Iterate when you have complex requests and verify the results. For example, ask which county has the most sales in 2000 and follow up with which item had the most sales.
  • Ask follow-up questions to make queries more specific.
  • If an incomplete response is generated, instead of rephrasing the entire request, provide specific instructions to Amazon Q as a continuation to the prior question.

Clean up

To avoid incurring future charges, delete the Redshift Serverless workgroup and namespace or the Redshift provisioned cluster you created as part of this post.

Conclusion

Amazon Q generative SQL for Amazon Redshift simplifies query authoring and increases productivity by allowing you to express queries in natural language and receive SQL code recommendations. This post demonstrated how the Amazon Q generative SQL feature can accelerate data analysis by reducing the time required to write SQL queries. By using natural language processing and seamlessly converting it into SQL, you can boost productivity without requiring an in-depth understanding of your organization’s database structures. Importantly, the robust security measures of Amazon Redshift remain fully enforced, and the quality of the generated SQL continues to improve over time by enabling query history sharing across users.

Get started on your Amazon Q generative SQL journey with Amazon Redshift today by implementing the solution in this post or by referring to Interacting with Amazon Q generative SQL. For pricing information, refer to Amazon Q generative SQL pricing. Also, please try other Redshift generative AI features such as Amazon Redshift Integration with Amazon Bedrock and Amazon Redshift Serverless AI-driven scaling and optimization.


About the authors

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), supporting Enterprise customers architect their data workloads on AWS. With a strong background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses and data analytics solutions, both on-premises and the cloud. Sushmita is based out of Tampa, FL and enjoys traveling, reading and playing tennis.

Xiao Qin is a senior applied scientist with the Learned Systems Group (LSG) at Amazon Web Services (AWS). He studies and applies machine learning techniques to solve data management problems. He is one of the developers that build the Amazon Q generative SQL capability.

Erol Murtezaoglu, a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Phil Bates was a Senior Analytics Specialist Solutions Architect at AWS, with over 25 years of data warehouse experience, before retiring.