The AWS Glue crawler fails with an internal service exception

Last updated: 2019-09-24

How do I prevent AWS Glue crawlers from failing with "ERROR : Internal Service Exception"?

Resolution

Crawler internal service exceptions are sometimes caused by transient issues. Before you start troubleshooting, run the crawler again. If you still get an internal service exception, check for the following common problems:

AWS Glue Data Catalog

  • Be sure that column names don't exceed 255 characters and don't contain special characters. For more information about column requirements, see Column.
  • Check for malformed data. For example, the crawler fails if a column name doesn't conform to the regular expression pattern "[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]".
  • Check for columns that have a length of 0. This happens when columns in the data don't match the data format of the table.
  • If your data contains DECIMAL columns with the "(precision, scale)" format, be sure that the scale value is less than or equal to the precision value.
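
The column checks above can be sketched as a small pre-flight validation that runs before the crawler does. This is a minimal illustration, not part of AWS Glue: the function name check_column is hypothetical, "special characters" is assumed to mean anything other than ASCII letters, digits, and underscores, and the zero-length check is interpreted as an empty column name.

```python
import re

MAX_COLUMN_NAME_LENGTH = 255

# Assumption: only ASCII letters, digits, and underscores count as "safe".
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")

# Matches type strings such as "decimal(10,2)" and captures precision and scale.
DECIMAL_TYPE = re.compile(r"^decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)$", re.IGNORECASE)


def check_column(name, col_type):
    """Return a list of human-readable problems for one column (empty if none)."""
    problems = []

    if len(name) == 0:
        # Assumption: a zero-length column is treated here as an empty name.
        problems.append("column name is empty")
    elif len(name) > MAX_COLUMN_NAME_LENGTH:
        problems.append(f"column name is longer than {MAX_COLUMN_NAME_LENGTH} characters")
    elif not SAFE_NAME.match(name):
        problems.append("column name contains special characters")

    match = DECIMAL_TYPE.match(col_type.strip())
    if match:
        precision, scale = int(match.group(1)), int(match.group(2))
        if scale > precision:
            problems.append(f"DECIMAL scale {scale} is greater than precision {precision}")

    return problems


if __name__ == "__main__":
    for name, col_type in [("order_id", "bigint"), ("price", "decimal(2,10)")]:
        print(name, col_type, check_column(name, col_type))
```

A column that returns an empty list passes these particular checks only; it can still fail for reasons that this sketch doesn't cover.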

Amazon Simple Storage Service (Amazon S3)

  • Be sure that the Amazon S3 path doesn't contain special characters.
  • Confirm that the AWS Identity and Access Management (IAM) role for the crawler has permissions to access the Amazon S3 path. For more information, see Create an IAM Role for AWS Glue.
  • A large number of small files can cause the crawler to fail with an internal service exception. To avoid this problem, use the S3DistCp tool to combine smaller files. Note that you incur additional Amazon EMR charges when you use S3DistCp. Alternatively, set exclude patterns and then crawl the files in batches.
  • Remove special ASCII characters such as ^, %, and ~ from your data, if possible. If that's not possible, use custom classifiers to classify your data.
  • Confirm that the S3 objects use the STANDARD storage class. To restore objects to the STANDARD storage class, see How Do I Restore an S3 Object That Has Been Archived?
  • Confirm that the include and exclude patterns in the crawler configuration match the S3 bucket paths.
  • If you're crawling an encrypted S3 bucket, confirm that the IAM role for the crawler has the appropriate permissions for the AWS Key Management Service (AWS KMS) key. For more information, see Working with Security Configurations on the AWS Glue Console and Setting Up Encryption in AWS Glue.
  • If you're crawling an encrypted S3 bucket, be sure that the bucket, KMS key, and AWS Glue crawler are in the same AWS Region.
  • Check the request rate on the S3 bucket that you're crawling. If it's high, consider creating more prefixes to parallelize reads. For more information, see Best Practices Design Patterns: Optimizing Amazon S3 Performance.
  • Be sure that the S3 bucket partitions and keys are consistent. For example, if the crawler expects the objects to use the path s3://mybucket/yyyy=xxxx/mm=xxx/dd=xx/[files], but some of the objects use the path s3://mybucket/yyyy=xxxx/mm=xxx/[files], the crawler fails with an internal service exception.
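
The partition consistency check in the last item can be sketched as follows. This is a minimal illustration that works on a list of object keys that you have already retrieved (for example, from an S3 inventory report). The function names are hypothetical, and the sketch assumes Hive-style "name=value" folders and treats the most common layout as the expected one.

```python
from collections import Counter


def partition_layout(key):
    """Return the partition column names found in a key, e.g. ('yyyy', 'mm', 'dd')."""
    folders = key.split("/")[:-1]  # drop the object name itself
    return tuple(folder.split("=", 1)[0] for folder in folders if "=" in folder)


def find_inconsistent_keys(keys):
    """Return the keys whose partition layout differs from the most common layout."""
    layouts = Counter(partition_layout(key) for key in keys)
    if len(layouts) <= 1:
        return []  # every key uses the same layout
    expected, _count = layouts.most_common(1)[0]
    return [key for key in keys if partition_layout(key) != expected]


if __name__ == "__main__":
    sample_keys = [
        "yyyy=2019/mm=09/dd=24/part-0000.csv",
        "yyyy=2019/mm=09/dd=25/part-0001.csv",
        "yyyy=2019/mm=09/part-0002.csv",  # missing the dd= level
    ]
    for key in find_inconsistent_keys(sample_keys):
        print("inconsistent partition layout:", key)
```

Keys that the function reports are candidates to move, rename, or exclude before you run the crawler again.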

Amazon DynamoDB

JDBC

  • If you're crawling a JDBC data source that is encrypted with AWS KMS, check the subnet that you're using for the connection. The subnet's route table must have a route to the KMS endpoint.
  • Be sure that you're using the correct Include path syntax. For more information, see Defining Crawlers.
  • If you're crawling a JDBC data store, confirm that the SSL connection is configured correctly. If you're not using an SSL connection, be sure that Require SSL connection is not selected when you configure the crawler.
  • Confirm that the database name in the AWS Glue connection matches the database name in the crawler's Include path, and be sure that you entered the Include path correctly. For more information, see Working with Connections on the AWS Glue Console and Include and Exclude Patterns.
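
The comparison between the connection's database name and the crawler's Include path can be sketched as follows. This is a rough illustration only: the function names are hypothetical, and the sketch assumes a MySQL-style or PostgreSQL-style JDBC URL of the form jdbc:engine://host:port/database and an Include path that starts with the database name. Other engines use different URL formats and aren't handled here.

```python
def database_from_jdbc_url(jdbc_url):
    """Extract the database name from a URL such as jdbc:postgresql://host:5432/sales."""
    if "//" not in jdbc_url:
        return ""
    after_scheme = jdbc_url.split("//", 1)[1]
    parts = after_scheme.split("/", 1)
    if len(parts) < 2:
        return ""  # the URL has no database segment
    return parts[1].split("?", 1)[0]  # drop any query string


def include_path_matches(jdbc_url, include_path):
    """Return True if the first segment of the Include path equals the URL's database name."""
    database = database_from_jdbc_url(jdbc_url)
    first_segment = include_path.strip("/").split("/", 1)[0]
    # Exact, case-sensitive comparison; relax this if your engine ignores case.
    return database != "" and first_segment == database


if __name__ == "__main__":
    url = "jdbc:postgresql://db.example.com:5432/sales"
    print(include_path_matches(url, "sales/public/%"))
    print(include_path_matches(url, "orders/%"))
```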

AWS KMS

  • If you're using AWS KMS, the AWS Glue crawler must have access to AWS KMS. To grant access, select the Enable Private DNS Name option when you create the KMS endpoint. Then, add the KMS endpoint to the VPC subnet configuration for the AWS Glue connection. For more information, see Creating an AWS KMS VPC Endpoint (VPC Console).
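
If you prefer to create the KMS endpoint programmatically instead of in the VPC console, the request might look like the following sketch. The helper name kms_endpoint_request and all resource IDs are hypothetical placeholders. The helper only builds the arguments; the commented lines show how they could be passed to the boto3 EC2 client's create_vpc_endpoint call, which requires boto3 and valid AWS credentials.

```python
def kms_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for an interface VPC endpoint to AWS KMS."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.kms",
        # Use the subnet that the AWS Glue connection uses.
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Equivalent to selecting "Enable Private DNS Name" in the console.
        "PrivateDnsEnabled": True,
    }


if __name__ == "__main__":
    request = kms_endpoint_request(
        "us-east-1", "vpc-0example", ["subnet-0example"], ["sg-0example"]
    )
    print(request)
    # To send the request (not executed here):
    # import boto3
    # ec2 = boto3.client("ec2", region_name="us-east-1")
    # ec2.create_vpc_endpoint(**request)
```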