How can I use logs to troubleshoot issues with Hive queries in Amazon EMR?

I'm having trouble with Apache Hive queries in Amazon EMR. I want to collect logs so that I can troubleshoot these issues.

Short description

Amazon EMR supports the following methods for working with Hive. Troubleshooting steps differ depending on which method you use:

  • Hive shell
  • Hue, JDBC, or ODBC
  • Amazon EMR steps

Resolution

Hive shell

Hive logs are stored in the following directories on the cluster's master node. For more information, see View log files on the master node.

  • /mnt/var/log/hive/
  • /mnt/var/log/hive/user/

Depending on where you submitted your Hive query, the query logs are written to different locations under /mnt/var/log/hive/ on the Amazon EMR master node. Logs in this location are also pushed to the Amazon S3 LogUri that you configured when you created the Amazon EMR cluster.

Example:

s3://example-location/example-cluster-id/node/example-instance-id/applications/hive
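
The example path above follows a fixed pattern, so it can be assembled from the LogUri, the cluster ID, and the instance ID. The following is a minimal sketch; the function name hive_s3_log_prefix is hypothetical, and the arguments in the usage comment are the same placeholders used in the example path.

```shell
# Build the S3 prefix where a node's Hive logs are archived.
# Arguments: LogUri, cluster ID, instance ID (placeholders in this sketch).
hive_s3_log_prefix() {
  log_uri="${1%/}"   # drop one trailing slash from the LogUri, if present
  printf '%s/%s/node/%s/applications/hive/\n' "$log_uri" "$2" "$3"
}

# Usage (requires the AWS CLI and read access to the log bucket):
# aws s3 ls "$(hive_s3_log_prefix s3://example-location example-cluster-id example-instance-id)"
```

Only the path layout shown in the example is assumed here; substitute the values for your own cluster.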

For example, if you run queries from the Hive shell as hadoop (the default user), query errors are logged in the following directory:

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/user/hadoop
[hadoop@ip-172-xx-xx-x hadoop]$ tail -20 hive.log
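
When the log is long, the last 20 lines might not include the failure. As a rough sketch, the helper below prints only the lines that contain ERROR or FATAL, with their line numbers. The function name hive_log_errors is hypothetical, and the default path assumes that you ran the query from the Hive shell as the hadoop user.

```shell
# Print ERROR and FATAL lines, prefixed with line numbers, from a Hive log.
# Default path: the hadoop user's Hive shell log (an assumption; pass another path if needed).
hive_log_errors() {
  log_file="${1:-/mnt/var/log/hive/user/hadoop/hive.log}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # -n adds line numbers; -E enables the ERROR|FATAL alternation.
  # grep exits with status 1 when nothing matches.
  grep -nE 'ERROR|FATAL' "$log_file"
}

# Usage:
# hive_log_errors | tail -20
```

The line numbers make it easier to open the log at the right place and read the full stack trace that follows the matching line.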

Hue, JDBC, or ODBC

HiveServer2 allows clients, such as Beeline and other JDBC or ODBC clients (SQL Workbench/J, for example), to run queries against Hive.

For more information about the clients that HiveServer2 supports, see HiveServer2 clients on the Confluence website.

Check for errors in the hive-server2 logs under the following conditions:

  • You need to troubleshoot a failed query submitted by one of these clients.
  • You have trouble connecting to Hive from clients using JDBC or ODBC drivers.

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/
[hadoop@ip-172-xx-xx-x hive]$ ls -ltr
total 52
-rw-r--r-- 1 hive hive 42 May 25 19:29 hive-server2.out
drwxrwxrwt 4 root root 30 May 25 19:29 user
-rw-r--r-- 1 hive hive 49075 May 25 19:29 hive-server2.log

[hadoop@ip-172-xx-xx-x hive]$ tail -20 hive-server2.log

Note: By default, all Hive queries on Amazon EMR use the Tez engine, so a query might launch a YARN application. To troubleshoot a failed YARN application, check the YARN container logs. For more information, see the YARN application history section of this article.
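
To find the container logs for a query, you first need the YARN application ID. The sketch below pulls unique application IDs out of a log file. It assumes that the log mentions IDs in the standard application_<timestamp>_<sequence> form, which might not hold for every query or log level; the function name yarn_app_ids is hypothetical.

```shell
# List the unique YARN application IDs mentioned in a log file.
# Default path: the HiveServer2 log shown above (pass another file if needed).
yarn_app_ids() {
  log_file="${1:-/mnt/var/log/hive/hive-server2.log}"
  # -o prints only the matching text; sort -u removes duplicates.
  grep -oE 'application_[0-9]+_[0-9]+' "$log_file" | sort -u
}

# Usage:
# yarn_app_ids
# yarn_app_ids /mnt/var/log/hive/user/hadoop/hive.log
```

Each ID corresponds to a folder under containers/ in the archived logs. On the master node, you can also pass an ID to yarn logs -applicationId, provided that YARN log aggregation is turned on for the cluster.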

Amazon EMR steps

Check the step logs, which are located in /var/log/hadoop/steps/. For example:

[hadoop@ip-172-xx-xx-x ~]$ cd /var/log/hadoop/steps/s-3C4CZ9G05FEAX
[hadoop@ip-172-xx-xx-x s-3C4CZ9G05FEAX]$ ls -ltr
total 12
-rw-rw-r-- 1 hadoop hadoop 0 May 25 21:09 syslog
-rw-rw-r-- 1 hadoop hadoop 1304 May 25 21:09 stdout
-rw-rw-r-- 1 hadoop hadoop 213 May 25 21:09 stderr
-rw-rw-r-- 1 hadoop hadoop 2589 May 25 21:09 controller
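
Some of these files can be empty, as syslog is in the listing above. The following sketch prints the tail of each non-empty log file for one step. The function name show_step_logs and the optional second argument are hypothetical conveniences; the default directory is the one given in this section.

```shell
# Print the last 20 lines of each non-empty log file for one EMR step.
# Arguments: step ID, optional base directory (defaults to /var/log/hadoop/steps).
show_step_logs() {
  step_id="$1"
  base_dir="${2:-/var/log/hadoop/steps}"
  if [ -z "$step_id" ]; then
    echo "usage: show_step_logs STEP_ID [BASE_DIR]" >&2
    return 1
  fi
  for name in controller stderr stdout syslog; do
    f="$base_dir/$step_id/$name"
    [ -s "$f" ] || continue   # skip files that are missing or empty
    echo "== $name =="
    tail -n 20 "$f"
  done
}

# Usage:
# show_step_logs s-3C4CZ9G05FEAX
```

For a failed Hive step, stderr and controller are usually the first files to read.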

YARN application history

The easiest way to view and monitor YARN application details is to open the Amazon EMR console and check the Application history tab on the cluster's details page. For more information, see View application history.

To see if errors occurred in a Tez or MapReduce application that runs in the background when you run a Hive query, check the YARN application logs on Amazon Simple Storage Service (Amazon S3). For more information, see View log files archived to Amazon S3. For example:

$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/
                           PRE containers/
                           PRE node/
                           PRE steps/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/node/i-045d100a1fcd13ef2/
                           PRE applications/
                           PRE bootstrap-actions/
                           PRE daemons/
                           PRE provision-node/
                           PRE setup-devices/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/containers/application_1527279117205_0001/container_1527279117205_0001_01_000001/
2020-10-25 15:46:04 842 stdout.gz
2020-10-25 15:46:04 4089 syslog.gz

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.
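
The archived container logs are gzip-compressed, so they must be decompressed before you can search them. As a minimal sketch, the helper below reads a compressed log from standard input and prints the lines that contain ERROR or Exception. The function name scan_gz_log and the search pattern are assumptions, and the S3 path in the usage comment is a placeholder.

```shell
# Read a gzip-compressed log from stdin and print lines that look like errors.
scan_gz_log() {
  # gunzip -c decompresses to stdout; grep -n adds line numbers.
  gunzip -c | grep -nE 'ERROR|Exception'
}

# Usage (requires the AWS CLI; "-" streams the object to stdout):
# aws s3 cp s3://<log-bucket>/<prefix>/<cluster-id>/containers/<application-id>/<container-id>/syslog.gz - | scan_gz_log
```

Streaming the object this way avoids writing a temporary copy of the log to the local disk.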


Related information

How do I resolve "OutOfMemoryError" Hive Java heap space exceptions on Amazon EMR that occur when Hive outputs the query results?

Hive cluster errors

AWS OFFICIAL
Updated 2 years ago