How do I resolve the "HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://awsdoc-example-bucket/: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down;)" error in Athena?
Last updated: 2020-07-14
My Amazon Athena query fails with an error like this:
"HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://awsdoc-example-bucket/date=2020-05-29/ingest_date=2020-04-25/part-00000.snappy.parquet (offset=0, length=18614): Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down;)"
This error usually happens when the Amazon S3 request rate to a single bucket prefix exceeds the limit — for example, when you query an Amazon Simple Storage Service (Amazon S3) prefix that contains a large number of objects. You can send 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in an S3 bucket. There is no limit on the number of prefixes that you can have in your bucket.
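Because those limits apply per prefix, the aggregate request budget scales with the number of distinct prefixes a query touches. A minimal sketch of that arithmetic — the object keys below are hypothetical, modeled on the partitioned path in the error message:

```python
# Hypothetical partitioned object keys; not real bucket contents.
from collections import Counter

KEYS = [
    "date=2020-05-29/part-00000.snappy.parquet",
    "date=2020-05-29/part-00001.snappy.parquet",
    "date=2020-05-30/part-00000.snappy.parquet",
    "date=2020-05-31/part-00000.snappy.parquet",
]

def prefix_of(key: str) -> str:
    """Return the prefix of a key (everything before the last '/')."""
    return key.rsplit("/", 1)[0]

# S3 throttles requests per prefix, so each additional prefix in use
# adds its own 5,500 GET/HEAD-per-second allowance.
GET_LIMIT_PER_PREFIX = 5500
prefixes = Counter(prefix_of(k) for k in KEYS)
aggregate_get_budget = GET_LIMIT_PER_PREFIX * len(prefixes)
print(len(prefixes), aggregate_get_budget)  # 3 16500
```

With three date partitions instead of one flat prefix, the same data supports three times the sustained GET rate before S3 returns 503 Slow Down.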
Use one of the following methods to prevent request throttling:
- Distribute the objects and requests among multiple prefixes. For more information, see Partitioning data.
- To reduce the number of Amazon S3 requests, reduce the number of files. For example, use the S3DistCp tool to merge many small files (smaller than 128 MB) into fewer, larger files. For more information, see Top 10 performance tuning tips for Amazon Athena, and review the "4. Optimize file sizes" section.
Note: S3DistCp doesn't support concatenation for Parquet files. Use PySpark instead. For more information, see How can I concatenate Parquet files in Amazon EMR?
- Use the Amazon CloudWatch 5xxErrors metric and Amazon S3 server access logs to see if other applications or AWS services were using the same prefix when the Athena query failed. To avoid throttling, use different Amazon S3 prefixes for the Athena data source and the application data source.
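The file-consolidation idea in the second bullet can be sketched locally. Plain byte concatenation works for row-oriented text formats such as CSV — not for Parquet, which is why the note above points to PySpark instead — and the file names here are throwaway examples, not S3DistCp usage:

```python
# Sketch of file consolidation: concatenate many small files into one
# larger file. Valid for row-oriented text formats (e.g. CSV); byte
# concatenation does NOT produce a valid Parquet file.
import pathlib
import tempfile

def merge_small_files(src_dir: pathlib.Path, dest: pathlib.Path) -> int:
    """Concatenate every part-* file in src_dir into dest; return the count."""
    merged = 0
    with dest.open("wb") as out:
        for part in sorted(src_dir.glob("part-*")):
            out.write(part.read_bytes())
            merged += 1
    return merged

# Demo with throwaway files in a temporary directory.
tmp = pathlib.Path(tempfile.mkdtemp())
for i in range(5):
    (tmp / f"part-{i:05d}.csv").write_bytes(b"row\n")
merged = merge_small_files(tmp, tmp / "merged.csv")
print(merged)  # 5
```

Fewer, larger files mean fewer GET requests per query, which keeps the per-prefix request rate under the throttling limit.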
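For the third bullet, S3 server access log entries record the operation, object key, HTTP status, and error code, so throttled requests can be tallied by prefix to see who was hitting it. A rough sketch, assuming simplified, hypothetical log lines rather than the full S3 access log format:

```python
# Tally 503 SlowDown responses by key prefix from simplified,
# hypothetical access-log lines (not the complete S3 log format).
import re
from collections import Counter

LOG_LINES = [
    '... REST.GET.OBJECT date=2020-05-29/part-00000.snappy.parquet "GET ..." 503 SlowDown ...',
    '... REST.GET.OBJECT date=2020-05-29/part-00001.snappy.parquet "GET ..." 200 - ...',
]

# operation, key, quoted request-URI, HTTP status, error code
PATTERN = re.compile(r'REST\.\w+\.\w+ (\S+) ".*?" (\d{3}) (\S+)')

def slowdown_counts(lines):
    """Count 503 responses per key prefix."""
    hits = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m and m.group(2) == "503":
            hits[m.group(1).rsplit("/", 1)[0]] += 1
    return hits

print(slowdown_counts(LOG_LINES))  # Counter({'date=2020-05-29': 1})
```

If the tally shows heavy traffic on the Athena prefix from another application, moving that application's data to its own prefix stops the two workloads from sharing one request-rate budget.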