Posted On: Nov 20, 2023

Amazon Redshift can now use the column-level statistics stored in the AWS Glue Data Catalog to generate optimized query plans and improve data lake query performance.

AWS Glue supports column-level statistics in the AWS Glue Data Catalog, which lets customers store statistical information for each column, such as minimum and maximum values and the number of distinct values. Amazon Redshift now automatically retrieves this information from AWS Glue and uses it to optimize query plans, improving performance for your data lake queries. With the recently introduced AWS Glue capability to generate column-level statistics, you can automatically collect statistical information from your data lake tables and keep the column-level statistics up to date instead of populating them manually.
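To illustrate why minimum, maximum, and distinct-value counts help a query planner, here is a minimal sketch of textbook selectivity estimation. It is not Amazon Redshift's actual cost model: the `ColumnStats` type, the uniform-distribution assumption, and the `row_count` input (Glue column statistics do not include a table row count) are all hypothetical, chosen only to show how such statistics turn a predicate into a row estimate.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical container mirroring the kinds of values Glue stores per column
    min_value: float
    max_value: float
    distinct_values: int


def equality_selectivity(stats: ColumnStats) -> float:
    """Estimate the fraction of rows matching `col = constant`.

    Assumes each distinct value is equally frequent (a common textbook assumption).
    """
    return 1.0 / max(stats.distinct_values, 1)


def range_selectivity(stats: ColumnStats, low: float, high: float) -> float:
    """Estimate the fraction of rows matching `col BETWEEN low AND high`.

    Assumes values are uniformly distributed between the column's min and max.
    """
    span = stats.max_value - stats.min_value
    if span <= 0:
        # Single-valued column: the range either contains that value or not
        return 1.0 if low <= stats.min_value <= high else 0.0
    lo = max(low, stats.min_value)
    hi = min(high, stats.max_value)
    if hi < lo:
        # The predicate range falls outside [min, max], so no rows can match
        return 0.0
    return (hi - lo) / span


def estimate_rows(row_count: int, selectivity: float) -> float:
    # row_count is an assumed input; it is not part of Glue column statistics
    return row_count * selectivity
```

For example, with `ColumnStats(min_value=0, max_value=1000, distinct_values=200)` and one million rows, an equality predicate is estimated at roughly 5,000 rows and `BETWEEN 0 AND 100` at 10% of the table. A planner can use estimates like these to pick join order or decide which side of a join to build a hash table on; without statistics it has to fall back on fixed guesses.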

To get started, use the AWS Glue console or the AWS Glue APIs to generate column statistics for your data lake tables, then query these tables in Amazon Redshift using the auto-mounted AWS Glue Data Catalog or external schemas.
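As a rough illustration of the API path, the sketch below starts a Glue column statistics task run with the boto3 Glue client and polls until it reaches a terminal state. It assumes the `start_column_statistics_task_run` and `get_column_statistics_task_run` operations and an IAM role that lets AWS Glue read the table's data; the database, table, and role names are placeholders. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
import time
from typing import Any, Dict, List, Optional

# Terminal states of a Glue column statistics task run
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "STOPPED")


def generate_column_statistics(
    glue: Any,
    database: str,
    table: str,
    role_arn: str,
    columns: Optional[List[str]] = None,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 3600.0,
) -> Dict[str, Any]:
    """Start a column statistics task run and wait for it to finish.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns the final ColumnStatisticsTaskRun description.
    """
    params: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,  # IAM role Glue assumes to read the table data
    }
    if columns:
        # Restrict the run to specific columns; omit to let Glue pick the columns
        params["ColumnNameList"] = columns
    run_id = glue.start_column_statistics_task_run(**params)["ColumnStatisticsTaskRunId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        run = glue.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)[
            "ColumnStatisticsTaskRun"
        ]
        if run["Status"] in TERMINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"column statistics run {run_id} is still {run['Status']}")
        time.sleep(poll_seconds)


# Example usage (requires AWS credentials; all names are placeholders):
# import boto3
# run = generate_column_statistics(
#     boto3.client("glue"), "my_database", "my_table",
#     "arn:aws:iam::123456789012:role/MyGlueStatsRole",
# )
# print(run["Status"])
```

Once the run succeeds, the statistics live in the Data Catalog, and Amazon Redshift picks them up on its own when it plans queries against the table; no extra step on the Redshift side is shown here.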

Amazon Redshift data lake query plan optimizations using AWS Glue column-level statistics are generally available in all AWS Regions where Amazon Redshift Spectrum or Amazon Redshift Serverless is available. To learn more, visit the Amazon Redshift Database Developer Guide and the AWS Glue documentation.