My Amazon EMR Hive query fails with an intermittent hive-staging FileNotFoundException
Last updated: 2020-08-28
When I try to write data to Apache Hive tables located in an Amazon Simple Storage Service (Amazon S3) bucket using an Amazon EMR cluster, the query fails with one of the following errors:
- java.io.FileNotFoundException File s3://awsdoc-example-bucket/.hive-staging_hive_xxx_xxxx does not exist.
- java.io.IOException: rename for src path ERROR
When you run INSERT INTO, INSERT OVERWRITE, or other PARTITION commands, Hive creates staging directories in the same S3 bucket as the table. To write the staging query data to that S3 bucket, Hive runs a RENAME operation.
The RENAME operation includes low-level S3 API calls such as HEAD, GET, and PUT. Amazon S3 provides read-after-write consistency for new objects, but if Hive makes a HEAD or GET request to a key name before creating that file, Amazon S3 provides only eventual consistency for later reads of that key. When this happens, Hive can't rename the temporary directory to the final output directory. This causes an error such as java.io.IOException or java.io.FileNotFoundException. For more information, see Amazon S3 data consistency model.
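The sequence described here can be illustrated with a toy model. The following Python sketch is not Amazon S3 or Hive code. ToyStore, its stale_reads parameter, and the key names are hypothetical, and the model assumes only that a HEAD on a missing key can make the key look absent for a short time after it is written.

```python
class ToyStore:
    """Toy object store: a HEAD on a missing key caches the 'not found' answer."""

    def __init__(self, stale_reads=1):
        self._objects = {}
        self._negative_cache = {}  # key -> reads that will still report "missing"
        self._stale_reads = stale_reads  # hypothetical knob, not an S3 setting

    def head(self, key):
        remaining = self._negative_cache.get(key, 0)
        if remaining > 0:
            # A cached negative answer hides the object for a few more reads.
            self._negative_cache[key] = remaining - 1
            return False
        if key not in self._objects:
            self._negative_cache[key] = self._stale_reads
            return False
        return True

    def put(self, key, body):
        self._objects[key] = body

    def rename(self, src, dst):
        # A rename checks the source first, then moves the object.
        if not self.head(src):
            raise FileNotFoundError(f"File {src} does not exist.")
        self._objects[dst] = self._objects.pop(src)


def demo(head_before_put):
    store = ToyStore(stale_reads=1)
    src, dst = ".hive-staging/_tmp.000000_0", "table/000000_0"
    if head_before_put:
        store.head(src)  # request made before the file is created
    store.put(src, b"rows")
    try:
        store.rename(src, dst)
        return "renamed"
    except FileNotFoundError as err:
        return f"failed: {err}"
```

In this model, demo(False) renames the file, and demo(True) fails because the earlier HEAD still hides the newly written key.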
Note: The following steps apply to Amazon EMR release version 3.2.1 and later. If your cluster uses Amazon EMR version 5.7.0 or earlier, we recommend upgrading to version 5.8.0 or later. Versions 5.8.0 and later include Hive 2.3.x. The java.io.IOException and java.io.FileNotFoundException errors can still happen in Hive 2.3.x, but only with tables that are stored in Amazon S3. These errors don't happen with HDFS tables, because Hive creates the staging directory in a strongly consistent HDFS location, rather than in the same directory as the table that you're querying.
1. Connect to the master node using SSH.
2. Locate the Hive error log at /mnt/var/log/hive/user/hadoop/hive.log or the YARN application container log under your Amazon S3 log URI. For more information, see View log files.
3. Look for error messages like this:
2020-08-27T11:53:28,837 ERROR [HiveServer2-Background-Pool: Thread-64()]: ql.Driver (SessionState.java:printError(1097)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 6, vertexId=vertex_1525862550243_0001_1_03, diagnostics=[Vertex vertex_1525862550243_0001_1_03 [Map 6] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: r initializer failed, vertex=vertex_1525862550243_0001_1_03 [Map 6], java.io.FileNotFoundException: File s3://awsdoc-example-bucket/folder/subfolder/subfolder/.hive-staging_hive_2020-08-25_09-36-30_835_6368934499747071892-1 does not exist. at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:972)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: s3://awsdoc-example-bucket/demo.db/folder/ingestion_date=20200827/.hive-staging_hive_2020-08-27_13-52-51_942_3098569974412217069-5/_task_tmp.-ext-10000/_tmp.000000_2 to: s3://awsdoc-example-bucket/demo.db/folder/ingestion_date=20200827/.hive-staging_hive_2019-10-27_13-52-51_942_3098569974412217069-5/_tmp.-ext-10000/000000_2 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:247)
4. If either of these errors is in your logs, Hive made a HEAD request during the RENAME operation before the file was created. To resolve these errors, enable EMRFS consistent view. For more information, see Consistent view.
If neither of these errors is in your logs, see How can I use logs to troubleshoot issues with Hive queries in Amazon EMR?
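To check a large log quickly, a short script can count lines that match the two error signatures. This is a minimal sketch. The regular expressions are derived only from the two example messages in this article, the function names are hypothetical, and the default path assumes the hive.log location named in step 2.

```python
import re

# Signatures taken from the two example log lines in this article.
PATTERNS = {
    "staging_not_found": re.compile(
        r"java\.io\.FileNotFoundException: File \S*\.hive-staging\S* does not exist"
    ),
    "rename_failed": re.compile(
        r"HiveException: Unable to rename output from: \S*\.hive-staging"
    ),
}


def scan_lines(lines):
    """Count how many lines match each error signature."""
    hits = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def scan_file(path="/mnt/var/log/hive/user/hadoop/hive.log"):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    print(scan_file())
```

A nonzero count for either signature suggests the consistency problem described in this article. If both counts are zero, the failure likely has another cause.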
5. If you still get these errors after enabling consistent view, configure additional settings for consistent view. For example, if Amazon DynamoDB is throttling the EMRFS metadata table, change the following parameters in emrfs-site.xml to increase the table's read and write capacity units:
- fs.s3.consistent.metadata.read.capacity
- fs.s3.consistent.metadata.write.capacity
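One way to apply these settings is through an emrfs-site configuration classification supplied when you create the cluster. The following Python sketch builds that JSON. The function name is hypothetical and the capacity and retry values are illustrative placeholders, not recommendations. Choose values based on the throttling that you observe.

```python
import json


def emrfs_consistent_view_config(read_capacity, write_capacity, retry_count):
    """Build an emrfs-site classification that enables consistent view.

    The numeric arguments are caller-chosen; none of them are defaults
    taken from Amazon EMR.
    """
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                "fs.s3.consistent": "true",
                "fs.s3.consistent.metadata.read.capacity": str(read_capacity),
                "fs.s3.consistent.metadata.write.capacity": str(write_capacity),
                "fs.s3.consistent.retryCount": str(retry_count),
            },
        }
    ]


if __name__ == "__main__":
    # Illustrative values only.
    print(json.dumps(emrfs_consistent_view_config(600, 300, 10), indent=2))
```

Property values are written as strings because configuration classifications expect string values.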
When a request fails because of java.io.FileNotFoundException or java.io.IOException, EMRFS retries the request using the default values in emrfs-site.xml. EMRFS continues to retry the request until Amazon S3 is consistent or until it reaches the number of retries defined in fs.s3.consistent.retryCount. If EMRFS reaches the retry count before the operation succeeds, you get a ConsistencyException. To resolve this problem, increase fs.s3.consistent.retryCount.
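The retry behavior can be pictured as a bounded loop. The following Python sketch illustrates the idea only and is not the EMRFS implementation. ConsistencyError, retry_until_visible, and the period_seconds parameter are hypothetical stand-ins, and check represents any call that reports whether the object is visible yet.

```python
import time


class ConsistencyError(Exception):
    """Stand-in for the ConsistencyException described in this article."""


def retry_until_visible(check, retry_count, period_seconds=0.0, sleep=time.sleep):
    """Call check() until it returns True or the retries are exhausted.

    Makes one initial attempt plus up to retry_count retries and returns
    the number of retries that were needed.
    """
    for attempt in range(retry_count + 1):
        if check():
            return attempt
        if attempt < retry_count:
            sleep(period_seconds)  # wait before the next retry
    raise ConsistencyError(f"object still not visible after {retry_count} retries")
```

With a larger retry_count, a read that becomes consistent late still succeeds. With a smaller one, the same read ends in the exception, which is why increasing fs.s3.consistent.retryCount can resolve the error.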