Posted On: Nov 19, 2021
Today, we're excited to announce that Amazon Athena supports AWS Glue Data Catalog partition indexes to optimize query planning and reduce query runtime. When you query a table with a large number of partitions, Athena retrieves the available partitions from the AWS Glue Data Catalog and determines which ones your query requires. As new partitions are added, retrieving them takes longer, which can increase query runtime. With the AWS Glue Data Catalog, customers can create partition indexes, which reduce the time required to retrieve and filter partition metadata on tables with tens or hundreds of thousands of partitions.
Using partition indexes with Athena is a simple, two-step process. First, in the AWS Glue Data Catalog, select the partition columns you want to index and start index creation. Next, enable partition filtering on your tables, then return to Athena to run your query. For more information, see AWS Glue Partition Indexing and Filtering.
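As a rough illustration, the two steps could be scripted with boto3 along the following lines. This is a minimal sketch, not an official procedure: the helper names, the example database, table, index, and key names, and the S3 output location are all hypothetical, and it assumes partition filtering is enabled by setting the `partition_filtering.enabled` table property to `true`.

```python
from __future__ import annotations

from typing import Any


def build_partition_index_request(
    database: str, table: str, index_name: str, keys: list[str]
) -> dict[str, Any]:
    """Build keyword arguments for the Glue CreatePartitionIndex call."""
    if not keys:
        raise ValueError("at least one partition key is required")
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"IndexName": index_name, "Keys": list(keys)},
    }


def build_enable_filtering_sql(table: str) -> str:
    """Build the DDL that turns on partition filtering for a table.

    Assumption: filtering is controlled by the table property
    'partition_filtering.enabled'.
    """
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')"
    )


def apply_partition_index(
    database: str,
    table: str,
    index_name: str,
    keys: list[str],
    output_location: str,
) -> None:
    """Step 1: create the index in Glue. Step 2: enable filtering via Athena."""
    import boto3  # imported here so the builders above work without boto3

    glue = boto3.client("glue")
    glue.create_partition_index(
        **build_partition_index_request(database, table, index_name, keys)
    )

    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString=build_enable_filtering_sql(table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )


if __name__ == "__main__":
    # Hypothetical names; replace them with your own before running.
    apply_partition_index(
        database="sales_db",
        table="orders",
        index_name="year_month_idx",
        keys=["year", "month"],
        output_location="s3://example-bucket/athena-results/",
    )
```

Index creation runs asynchronously, so a newly created index may not be usable right away; checking its status before relying on it is a sensible precaution.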
Partition indexes are supported on new and existing tables, so you don’t need to rebuild datasets or rewrite queries to unlock the performance benefits. To learn more, see Improve Amazon Athena query performance using AWS Glue Data Catalog partition indexes.
In addition to Amazon Athena, partition indexes benefit analytics workloads running on Amazon EMR, Amazon Redshift Spectrum, and AWS Glue. To learn more, see Improve query performance using AWS Glue partition indexes.