I'm having trouble with Apache Hive queries in Amazon EMR. How do I collect logs so that I can troubleshoot these issues?
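Amazon EMR supports several methods for working with Hive, and the troubleshooting steps differ depending on which method you use. The sections below cover the Hive shell, Hue, JDBC, or ODBC clients, Amazon EMR steps, and YARN application history.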
Hive logs are stored in the following directories on the master node of the Amazon EMR cluster. For more information, see View Log Files.
Hive shell
All query-related errors are logged under the /mnt/var/log/hive/user/ directory, in a subdirectory for each user. For example, if you run queries from the Hive shell as the hadoop user (the default user), errors from those queries are logged in the following directory:
[hadoop@ip-172-31-33-9 ~]$ cd /mnt/var/log/hive/user/hadoop
[hadoop@ip-172-31-33-9 hadoop]$ tail -20 hive.log
If you run queries from the Hive shell as root (sudo), check the following log for query-related errors:
[hadoop@ip-172-31-33-9 ~]$ cd /mnt/var/log/hive/user/root
[hadoop@ip-172-31-33-9 root]$ tail -20 hive.log
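To check every user's log in one pass instead of tailing each file separately, a small shell function can help. This is a minimal sketch. The function name hive_user_errors is hypothetical. It assumes the default layout shown above (/mnt/var/log/hive/user/<user>/hive.log) and assumes that failures contain the string ERROR or Exception.

```shell
# Hypothetical helper: print the most recent ERROR/Exception lines from each
# user's hive.log under the Hive shell log root (default: /mnt/var/log/hive/user).
hive_user_errors() {
  local root="${1:-/mnt/var/log/hive/user}"  # log root; override to point elsewhere
  local lines="${2:-20}"                     # number of matching lines to keep per user
  local log
  for log in "$root"/*/hive.log; do
    [ -f "$log" ] || continue                # skip when the glob matched nothing
    echo "== $log =="
    # -E enables the alternation; tail keeps only the newest matches
    grep -E 'ERROR|Exception' "$log" | tail -n "$lines"
  done
}
```

Running hive_user_errors with no arguments checks both the hadoop and root logs in one pass. Running hive_user_errors /mnt/var/log/hive/user 50 keeps 50 matches per user.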
Hue, JDBC, or ODBC
HiveServer2 allows clients such as Beeline or SQL Workbench/J to run queries against Hive. For more information, see HiveServer2 Overview on the Apache Hive website. If you have trouble connecting to Hive from clients using JDBC or ODBC drivers, check for errors in the hive-server2 logs:
[hadoop@ip-172-31-33-9 ~]$ cd /mnt/var/log/hive/
[hadoop@ip-172-31-27-169 hive]$ ls -ltr
total 52
-rw-r--r-- 1 hive hive    42 May 25 19:29 hive-server2.out
drwxrwxrwt 4 root root    30 May 25 19:29 user
-rw-r--r-- 1 hive hive 49075 May 25 19:29 hive-server2.log
[hadoop@ip-172-31-33-9 hive]$ tail -20 hive-server2.log
You can also use the hive-server2 logs to troubleshoot service-related problems, such as slow queries, HiveServer2 startup failures, and query submission issues.
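A Java stack trace usually follows the line that reports an error, so tail -20 can cut it off. Printing each ERROR line together with a few trailing lines of context keeps the trace intact. The sketch below is one way to do this. The function name hs2_errors is hypothetical. It assumes the log path from the listing above and assumes that error entries contain the string ERROR.

```shell
# Hypothetical helper: show each ERROR line in hive-server2.log together with
# the lines that follow it (where a stack trace usually appears).
hs2_errors() {
  local log="${1:-/mnt/var/log/hive/hive-server2.log}"  # default path from the listing above
  local context="${2:-5}"                               # trailing lines to print after each match
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -n prefixes line numbers; -A adds trailing context after each match
  grep -n -A "$context" 'ERROR' "$log"
}
```

For example, hs2_errors /mnt/var/log/hive/hive-server2.log 20 prints 20 lines after each error, which is usually enough to reach the "Caused by" section of a trace.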
Amazon EMR steps
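Check the step logs on the master node. Each step writes its controller, stderr, stdout, and syslog files to a separate directory under /var/log/hadoop/steps/. The directory is named after the step ID. For example: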
[hadoop@ip-172-31-33-9 ~]$ cd /var/log/hadoop/steps/s-3C4CZ9G05FEAX
[hadoop@ip-172-31-33-9 s-3C4CZ9G05FEAX]$ ls -ltr
total 12
-rw-rw-r-- 1 hadoop hadoop    0 May 25 21:09 syslog
-rw-rw-r-- 1 hadoop hadoop 1304 May 25 21:09 stdout
-rw-rw-r-- 1 hadoop hadoop  213 May 25 21:09 stderr
-rw-rw-r-- 1 hadoop hadoop 2589 May 25 21:09 controller
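To print all of a step's logs at once, you can loop over the four files shown in the listing. This is a minimal sketch. The function name emr_step_logs and the STEPS_ROOT override variable are hypothetical. The function assumes the directory layout shown above. If you omit the step ID, it falls back to the most recently modified step directory.

```shell
# Hypothetical helper: print the log files of one EMR step on the master node.
# Assumes the layout /var/log/hadoop/steps/<step-id>/{controller,stderr,stdout,syslog}.
emr_step_logs() {
  local steps_root="${STEPS_ROOT:-/var/log/hadoop/steps}"  # overridable via STEPS_ROOT
  local step_id="${1:-}"
  if [ -z "$step_id" ]; then
    # No step ID given: use the most recently modified step directory
    step_id=$(ls -t "$steps_root" 2>/dev/null | head -n 1)
  fi
  local dir="$steps_root/$step_id"
  if [ -z "$step_id" ] || [ ! -d "$dir" ]; then
    echo "no step logs found under $steps_root" >&2
    return 1
  fi
  local f
  for f in controller stderr stdout syslog; do
    # Skip files that are missing or empty (syslog is 0 bytes in the listing above)
    [ -s "$dir/$f" ] || continue
    echo "===== $step_id/$f ====="
    cat "$dir/$f"
  done
}
```

For example, emr_step_logs s-3C4CZ9G05FEAX prints the controller, stderr, and stdout files for that step and skips the empty syslog.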
YARN application history
The easiest way to view and monitor YARN application details is to open the Amazon EMR console and then use the Application history tab of the cluster's detail page. For more information, see View Application History.
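When you run a Hive query, Hive launches a Tez or MapReduce application in the background. To check whether that application reported errors, review the YARN container logs that Amazon EMR archives to Amazon Simple Storage Service (Amazon S3). Amazon EMR archives these logs only if you enabled logging to Amazon S3 when you launched the cluster. For more information, see View Log Files Archived to Amazon S3. For example: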
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/
                           PRE containers/
                           PRE node/
                           PRE steps/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/node/i-045d100a1fcd13ef2/
                           PRE applications/
                           PRE bootstrap-actions/
                           PRE daemons/
                           PRE provision-node/
                           PRE setup-devices/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/containers/application_1527279117205_0001/container_1527279117205_0001_01_000001/
2018-05-25 15:46:04        842 stdout.gz
2018-05-25 15:46:04       4089 syslog.gz
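An application can run many containers, so browsing them one at a time with aws s3 ls is slow. Instead, you can download all of an application's container logs and decompress them locally. The sketch below assumes the S3 layout shown in the listing (<log-uri>/<cluster-id>/containers/<application-id>/<container-id>/). It also assumes that the AWS CLI is configured with read access to the log bucket. The function names container_log_prefix and fetch_container_logs are hypothetical.

```shell
# Hypothetical helpers for the S3 layout shown above:
#   <log-uri>/<cluster-id>/containers/<application-id>/<container-id>/*.gz
# Assumes the AWS CLI is installed and can read the log bucket.

# Build the S3 prefix that holds one YARN application's container logs.
container_log_prefix() {
  local log_uri="${1%/}"   # strip one trailing slash so the result has no "//"
  local cluster_id="$2" app_id="$3"
  printf '%s/%s/containers/%s/\n' "$log_uri" "$cluster_id" "$app_id"
}

# Download every container log for the application, then decompress it in place.
fetch_container_logs() {
  local prefix dest="${4:-./$3}"   # default destination: a directory named after the app ID
  prefix=$(container_log_prefix "$1" "$2" "$3")
  aws s3 cp "$prefix" "$dest" --recursive || return 1
  # The archived logs are gzip-compressed (stdout.gz, syslog.gz, ...)
  find "$dest" -name '*.gz' -exec gunzip -f {} +
}
```

For example, the following command fetches the logs for the application in the listing:

fetch_container_logs s3://aws-logs-223377617334-us-west-2/elasticmapreduce/ j-3MCDUQO2MWNJ5 application_1527279117205_0001

You can then search the unpacked syslog and stderr files with grep. Amazon EMR uploads logs to Amazon S3 periodically, so logs for an application that just finished might take a few minutes to appear.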