How can I use logs to troubleshoot issues with Hive queries in Amazon EMR?

Last updated: 2020-12-03

I'm having trouble with Apache Hive queries in Amazon EMR. How do I collect logs so that I can troubleshoot these issues?

Short description

Amazon EMR supports the following methods for working with Hive. Troubleshooting steps differ depending on which method you use:

  • Hive shell
  • Hue, JDBC, or ODBC
  • Amazon EMR steps

Resolution

Hive shell

Hive logs are stored in the following directories on the cluster's master node. For more information, see View log files on the master node.

  • /mnt/var/log/hive/
  • /mnt/var/log/hive/user/
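If you aren't sure which log file a failing query wrote to, it can help to list the files in these directories that changed recently. The following is a minimal sketch, not an official tool. The function name recent_hive_logs and the HIVE_LOG_DIR override are hypothetical, and the function assumes a find that supports -mmin (GNU findutils, as on Amazon Linux).

```shell
# Hypothetical helper: list Hive log files modified in the last N minutes,
# newest first. HIVE_LOG_DIR is an assumed override that defaults to the EMR path.
recent_hive_logs() {
  # $1 = age limit in minutes (default 60)
  local minutes
  minutes="${1:-60}"
  find "${HIVE_LOG_DIR:-/mnt/var/log/hive}" -type f -mmin "-$minutes" -exec ls -1t {} +
}
```

For example, recent_hive_logs 15 run right after a failed query usually narrows the search to one or two files.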

Errors from queries that you run in the Hive shell are logged in per-user subdirectories of /mnt/var/log/hive/user/. For example, if you run queries from the Hive shell as hadoop (the default user), query errors are logged in the following directory:

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/user/hadoop
[hadoop@ip-172-xx-xx-x hadoop]$ tail -20 hive.log

If you run queries from the Hive shell as root (sudo), check the following log for query errors:

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/user/root
[hadoop@ip-172-xx-xx-x root]$ tail -20 hive.log
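To pull only the error lines out of a user's Hive shell log, you can wrap the commands above in a small function. This is a minimal sketch: the function names and the HIVE_USER_LOG_DIR override are hypothetical, and the ERROR|FAILED pattern is an assumption based on the FAILED: lines that the Hive shell typically prints for failed queries. Adjust it to the messages you actually see.

```shell
# Hypothetical helpers for the per-user Hive shell logs described above.
# HIVE_USER_LOG_DIR is an assumed override; it defaults to the EMR location.
hive_user_log() {
  # $1 = Linux user that ran the Hive shell (defaults to the current user)
  printf '%s/%s/hive.log\n' "${HIVE_USER_LOG_DIR:-/mnt/var/log/hive/user}" "${1:-$(id -un)}"
}

show_hive_errors() {
  # $1 = user, $2 = number of matching lines to show (default 20)
  local log
  log="$(hive_user_log "$1")"
  if [ ! -r "$log" ]; then
    echo "No readable Hive log at $log" >&2
    return 1
  fi
  # Print matching lines with their line numbers, keeping only the most recent ones.
  grep -n -E 'ERROR|FAILED' "$log" | tail -n "${2:-20}"
}
```

For example, show_hive_errors root 50 shows up to 50 matching lines from the log that is written when the Hive shell runs as root.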

Hue, JDBC, or ODBC

HiveServer2 lets clients such as Hue, Beeline, or SQL Workbench/J run queries against Hive. For more information, see HiveServer2 overview on the Hive website. If you have trouble connecting to Hive from clients that use JDBC or ODBC drivers, check the HiveServer2 logs for errors:

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/
[hadoop@ip-172-xx-xx-x hive]$ ls -ltr
total 52
-rw-r--r-- 1 hive hive 42 May 25 19:29 hive-server2.out
drwxrwxrwt 4 root root 30 May 25 19:29 user
-rw-r--r-- 1 hive hive 49075 May 25 19:29 hive-server2.log

[hadoop@ip-172-xx-xx-x hive]$ tail -20 hive-server2.log

You can also use the HiveServer2 logs to troubleshoot service-related problems, such as slow queries, HiveServer2 startup failures, and problems submitting queries.
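Java exceptions in hive-server2.log span several lines, so a plain grep for ERROR shows the message but drops the stack trace. The sketch below prints each line that contains ERROR together with the continuation lines that follow it. It assumes, rather than guarantees, that stack-trace lines start with whitespace or with "Caused by:", which is the usual Java format. The function name hs2_errors is hypothetical.

```shell
# Hypothetical helper: print ERROR entries from a log4j-style log together with
# the stack-trace lines that follow them.
hs2_errors() {
  # $1 = log file (defaults to the HiveServer2 log on the master node)
  awk '
    /ERROR/                              { show = 1; print; next }  # start of an error entry
    show && (/^[ \t]/ || /^Caused by:/)  { print; next }            # stack-trace continuation
                                         { show = 0 }               # any other line ends the entry
  ' "${1:-/mnt/var/log/hive/hive-server2.log}"
}
```

Piping the result through tail, for example hs2_errors | tail -n 40, keeps the output focused on the most recent failure.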

Amazon EMR steps

Check the step logs on the master node. Each step writes its logs to a subdirectory of /var/log/hadoop/steps/ that is named after the step ID. For example:

[hadoop@ip-172-xx-xx-x ~]$ cd /var/log/hadoop/steps/s-3C4CZ9G05FEAX
[hadoop@ip-172-xx-xx-x s-3C4CZ9G05FEAX]$ ls -ltr
total 12
-rw-rw-r-- 1 hadoop hadoop 0 May 25 21:09 syslog
-rw-rw-r-- 1 hadoop hadoop 1304 May 25 21:09 stdout
-rw-rw-r-- 1 hadoop hadoop 213 May 25 21:09 stderr
-rw-rw-r-- 1 hadoop hadoop 2589 May 25 21:09 controller
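The Amazon EMR documentation describes controller as information about how the step was processed and stderr as Hadoop's standard error channel while it processes the step, so those two files are often a good place to start. The following sketch prints both for the most recently modified step directory. The function names and the STEP_LOG_ROOT override are hypothetical.

```shell
# Hypothetical helpers for the step log layout shown above.
# STEP_LOG_ROOT is an assumed override; it defaults to /var/log/hadoop/steps.
latest_step_dir() {
  # Print the most recently modified s-* step directory, without a trailing slash.
  local dir
  dir="$(ls -1td "${STEP_LOG_ROOT:-/var/log/hadoop/steps}"/s-*/ 2>/dev/null | head -n 1)"
  if [ -n "$dir" ]; then
    printf '%s\n' "${dir%/}"
  fi
}

show_step_logs() {
  # $1 = step directory (defaults to the most recent one)
  local dir f
  dir="${1:-$(latest_step_dir)}"
  if [ -z "$dir" ]; then
    echo "No step log directories found" >&2
    return 1
  fi
  for f in controller stderr; do
    if [ -f "$dir/$f" ]; then
      printf '==> %s <==\n' "$dir/$f"
      cat "$dir/$f"
    fi
  done
}
```

To inspect a specific step instead, pass its directory, for example show_step_logs /var/log/hadoop/steps/s-3C4CZ9G05FEAX.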

YARN application history

The easiest way to view and monitor YARN application details is to open the Amazon EMR console and then check the Application history tab of the cluster's detail page. For more information, see View application history.
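If you prefer the command line, the YARN CLI on the master node gives similar information: yarn application -list -appStates FAILED,KILLED lists failed or killed applications, and yarn logs -applicationId prints the aggregated container logs for one of them. The wrapper functions below are a hypothetical sketch. They assume that YARN log aggregation is enabled on the cluster and that you run them as a user who can read the application's logs, and the ERROR|Exception pattern is only a starting point.

```shell
# Hypothetical wrappers around the standard YARN CLI.
failed_yarn_apps() {
  # Keep only the application IDs from the tabular output.
  yarn application -list -appStates FAILED,KILLED 2>/dev/null |
    awk '$1 ~ /^application_/ { print $1 }'
}

yarn_app_errors() {
  # $1 = application ID, for example one printed by failed_yarn_apps
  yarn logs -applicationId "$1" 2>/dev/null | grep -n -E 'ERROR|Exception'
}
```

For example: for app in $(failed_yarn_apps); do yarn_app_errors "$app"; done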

To see if errors occurred in a Tez or MapReduce application that runs in the background when you run a Hive query, check the YARN application logs on Amazon Simple Storage Service (Amazon S3). For more information, see View log files archived to Amazon S3. For example:

$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/
                           PRE containers/
                           PRE node/
                           PRE steps/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/node/i-045d100a1fcd13ef2/
                           PRE applications/
                           PRE bootstrap-actions/
                           PRE daemons/
                           PRE provision-node/
                           PRE setup-devices/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/containers/application_1527279117205_0001/container_1527279117205_0001_01_000001/
2020-10-25 15:46:04 842 stdout.gz
2020-10-25 15:46:04 4089 syslog.gz
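To search the archived container logs without opening them one by one, you can copy an application's container directory locally with aws s3 cp --recursive and scan the gzipped files. In this sketch the function names are hypothetical, the log URI, cluster ID, and application ID are placeholders that you pass in, and the ERROR|Exception pattern is an assumption.

```shell
# Hypothetical helpers for searching archived YARN container logs.
container_log_prefix() {
  # $1 = cluster log URI (for example s3://aws-logs-ACCOUNT-REGION/elasticmapreduce)
  # $2 = cluster ID, $3 = YARN application ID
  printf '%s/%s/containers/%s/\n' "${1%/}" "$2" "$3"
}

search_gz_logs() {
  # Print "file:line:text" for every matching line in *.gz files under $1.
  find "$1" -type f -name '*.gz' | while IFS= read -r f; do
    gzip -dc "$f" | grep -n -E 'ERROR|Exception' | awk -v f="$f" '{ print f ":" $0 }'
  done
}

fetch_and_search_container_logs() {
  # $1-$3 as for container_log_prefix, $4 = local directory (default: ./APPLICATION_ID)
  local dest
  dest="${4:-./$3}"
  aws s3 cp --recursive "$(container_log_prefix "$1" "$2" "$3")" "$dest" >/dev/null || return 1
  search_gz_logs "$dest"
}
```

Copying the logs first keeps repeated searches local, so you can refine the pattern without downloading the files again.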

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.