How do I resolve a MSCK REPAIR TABLE command that takes too long to run or times out in Amazon Athena?

3 minute read
0

I want to resolve a MSCK REPAIR TABLE command that takes too long to run or times out in Amazon Athena.

Short description

When you run the MSCK REPAIR TABLE command, Athena lists prefixes and objects in Amazon Simple Storage Service (Amazon S3). If you have too many Amazon S3 prefixes or objects, then the command takes a long time to run or times out with an error.

To resolve this issue, use one of the following methods:

  • Use Partition projection with Athena.
  • Use an AWS Glue crawler to add partitions to your Athena tables.
  • Use an Athena DDL statement to load partitions

Resolution

Use Partition projection with Athena

Use Partition projection with Athena to generate partitions in-memory. You don't need to add the partitions to the AWS Glue Data Catalog or retrieve them from the Data Catalog. Athena calculates the value of the table instead of scanning through a large list of partitions. Also, partition projection calculates values and locations from the configuration instead of from a repository such as AWS Glue Data Catalog.

Use an AWS Glue crawler to add partitions to your Athena tables

To use an AWS Glue crawler to add partitions to your Athena tables, complete the following steps:

  1. Open the AWS Glue Console.
  2. On the navigation pane, choose Crawlers, and then choose Create crawler.
  3. Create a name for the crawler and then choose Next.
  4. Under Choose data sources and classifiers, and under Data source configuration, select yes for Is your data already mapped to Glue tables?
  5. Under Glue tables, select Add tables, and then select the required database and table. Choose Next.
  6. Select or Create an IAM role for AWS Glue. Choose Next.
  7. Under Set output and scheduling, expand the Advanced options, and then select the following:
    Ignore the change and don't update the table in the Data Catalog.
    Update all new and existing partitions with metadata from the table.
    Ignore the change and don't update the table in the Data Catalog.
  8. Choose Next, and then choose Create crawler to save the crawler configuration.
  9. Choose Run crawler. When you run the crawler, the partitions are loaded to the table.

For more information, see How crawlers work and Incremental crawls for adding new partitions in AWS Glue.

Use an Athena DDL statement to load partitions

To use an Athena DDL statement to load partitions, complete the following steps:

  1. Open the Amazon Athena console.
  2. On the Query editor tab, run the ALTER TABLE ADD PARTITION command to load the partitions.

Related information

Why do my Athena queries take a long time to run?

Creating tables, updating schema, and adding new partitions in the Data Catalog from AWS Glue ETL jobs

AWS Glue Pricing

AWS OFFICIAL
AWS OFFICIALUpdated 3 months ago