Posted On: Apr 24, 2023

AWS Glue Crawlers extract the data schema and partitions from Amazon S3 and populate the AWS Glue Data Catalog, keeping metadata current. Starting today, AWS Glue Crawlers automatically add partition indexes to newly discovered tables. Partition indexes help analytics services such as Amazon Athena and AWS Glue optimize partition processing, which improves query performance on highly partitioned tables.

The number of partitions in a given table can grow significantly over time. When analytics services like Amazon Athena query a table containing millions of partitions, the time needed to retrieve partition metadata increases, which can lengthen query runtime. With this release, when an AWS Glue Crawler creates a new AWS Glue Data Catalog table, it also creates a partition index by default, so you no longer need to create one manually. The AWS Glue Data Catalog then builds a fast, searchable index based on the partition index keys, reducing the time required to retrieve and filter partition metadata on tables with millions of partitions. Partition indexes benefit analytics workloads running on Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and AWS Glue.
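As a rough illustration, the Python sketch below shows one way to check which partition indexes exist on a crawled table and to build a crawler configuration string that turns the default on or off. It assumes a boto3 Glue client is passed in as `glue`, and that the crawler configuration option is named `CreatePartitionIndex`; confirm that key against the AWS Glue Crawler documentation before relying on it. The helper names `crawler_configuration` and `list_partition_indexes` are hypothetical, not part of any AWS SDK.

```python
import json


def crawler_configuration(create_partition_index=True):
    """Build the JSON string for a crawler's Configuration field.

    Assumption: "CreatePartitionIndex" is the configuration key that
    controls automatic partition index creation.
    """
    return json.dumps(
        {"Version": 1.0, "CreatePartitionIndex": create_partition_index}
    )


def list_partition_indexes(glue, database, table):
    """Return the name, keys, and status of each partition index on a table.

    `glue` is expected to behave like boto3.client("glue"); this helper
    follows NextToken until every page has been read.
    """
    indexes = []
    kwargs = {"DatabaseName": database, "TableName": table}
    while True:
        page = glue.get_partition_indexes(**kwargs)
        for descriptor in page.get("PartitionIndexDescriptorList", []):
            indexes.append(
                {
                    "name": descriptor["IndexName"],
                    "keys": [key["Name"] for key in descriptor["Keys"]],
                    "status": descriptor["IndexStatus"],
                }
            )
        token = page.get("NextToken")
        if not token:
            return indexes
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, `list_partition_indexes(boto3.client("glue"), "my_database", "my_table")` would report the indexes on a table (both names here are placeholders), and `glue.update_crawler(Name="my-crawler", Configuration=crawler_configuration(False))` would be one way to opt an existing crawler out of the default.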

AWS Glue Crawler support for creating partition indexes is generally available in all commercial AWS Regions where AWS Glue is available. To learn more, visit the AWS Glue Crawler documentation.