How can I use logs to troubleshoot issues with Hive queries in Amazon EMR?


I'm having trouble with Apache Hive queries in Amazon EMR. I want to collect logs so that I can troubleshoot these issues.

Short description

Where you find the relevant logs depends on how you run your Hive queries on Amazon EMR. Troubleshooting steps differ for each of the following:

Hive shell
Hadoop User Experience (Hue), Java Database Connectivity (JDBC), or Open Database Connectivity (ODBC) (used with clients such as Beeline and SQL Workbench/J)
Amazon EMR steps
YARN applications

Resolution

Hive shell

Hive logs are stored in the following directories on the cluster's master node. For more information, see View log files on the master node.

/mnt/var/log/hive/
/mnt/var/log/hive/user/
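If you aren't sure which of these files your query wrote to, it can help to list the log files that changed recently. The following is a minimal sketch, assuming GNU find (as on Amazon Linux); the function name recent_hive_logs and its default 30-minute window are hypothetical choices, not part of Amazon EMR.

```shell
# Hypothetical helper: list files under a log directory that were modified in
# the last N minutes, newest first. Defaults to the Hive log directory on the
# master node. Assumes GNU find, which supports -printf.
recent_hive_logs() {
  dir="${1:-/mnt/var/log/hive}"   # base directory to search
  minutes="${2:-30}"              # look-back window, in minutes
  # %T@ prints the modification time in epoch seconds so that the results can
  # be sorted numerically; cut then drops the timestamp and keeps the path.
  find "$dir" -type f -mmin "-$minutes" -printf '%T@ %p\n' 2>/dev/null |
    sort -rn |
    cut -d' ' -f2-
}
```

For example, run recent_hive_logs /mnt/var/log/hive 10 right after a failed query to see which log files were written during the last ten minutes.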

Depending on where you submitted your Hive query, the query logs are written to different locations under /mnt/var/log/hive/ on the Amazon EMR master node. If you specified an Amazon S3 log URI (LogUri) when you created the cluster, Amazon EMR also archives the logs in this location to that S3 location.

Example:

s3://example-location/example-cluster-id/node/example-instance-id/applications/hive
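The archived path combines the log URI, the cluster ID, and the node's EC2 instance ID, as in the example above. The following sketch builds that prefix so that you can pass it to the AWS CLI. The function name hive_s3_log_prefix is hypothetical, and it assumes the node/instance-id/applications/hive layout shown in the example.

```shell
# Hypothetical helper: build the S3 prefix where Hive logs for one node are
# archived, following the layout <log-uri>/<cluster-id>/node/<instance-id>/applications/hive/.
#   $1  log URI configured for the cluster (for example, s3://example-location/)
#   $2  cluster ID (for example, j-3MCDUQO2MWNJ5)
#   $3  EC2 instance ID of the node (for example, i-045d100a1fcd13ef2)
hive_s3_log_prefix() {
  if [ "$#" -ne 3 ]; then
    echo "usage: hive_s3_log_prefix <log-uri> <cluster-id> <instance-id>" >&2
    return 2
  fi
  log_uri="${1%/}"   # drop one trailing slash so the result has no "//"
  printf '%s/%s/node/%s/applications/hive/\n' "$log_uri" "$2" "$3"
}
```

For example, aws s3 ls --recursive "$(hive_s3_log_prefix s3://example-location/ j-3MCDUQO2MWNJ5 i-045d100a1fcd13ef2)" lists the archived Hive log files for that node.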

For example, if you run queries from the Hive shell as the hadoop user (the default), query errors are logged to hive.log in the following directory:

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/user/hadoop
[hadoop@ip-172-xx-xx-x hadoop]$ tail -20 hive.log
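Instead of reading the last lines by hand, you can filter them for errors. This sketch assumes the default log4j layout, in which each line carries its level name (such as ERROR), and also matches Java exception names; hive_log_errors is a hypothetical function name, and you might need to adjust the pattern for a customized logging configuration.

```shell
# Hypothetical helper: print ERROR-level lines and lines that mention a Java
# exception from the last N lines of a Hive log file.
#   $1  log file (for example, /mnt/var/log/hive/user/hadoop/hive.log)
#   $2  number of trailing lines to scan (default 500)
# Returns 2 if the file can't be read, and 1 (from grep) if nothing matched.
hive_log_errors() {
  log_file="$1"
  lines="${2:-500}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  tail -n "$lines" "$log_file" | grep -E 'ERROR|Exception'
}
```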

Hue, JDBC, or ODBC

HiveServer2 lets clients such as Beeline, Hue, and JDBC or ODBC applications (for example, SQL Workbench/J) run queries against Hive.

For more information about the clients that HiveServer2 supports, see HiveServer2 Clients on the Apache Hive Confluence wiki.

Check the hive-server2 logs for errors in the following cases:

You need to troubleshoot a failed query submitted by one of these clients.
You have trouble connecting to Hive from clients using JDBC or ODBC drivers.

[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/
[hadoop@ip-172-xx-xx-x hive]$ ls -ltr
total 52
-rw-r--r-- 1 hive hive 42 May 25 19:29 hive-server2.out
drwxrwxrwt 4 root root 30 May 25 19:29 user
-rw-r--r-- 1 hive hive 49075 May 25 19:29 hive-server2.log

[hadoop@ip-172-xx-xx-x hive]$ tail -20 hive-server2.log
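When a JDBC or ODBC client can't connect, one way to narrow down the cause is to connect with Beeline from the master node itself. The following is a sketch, not an official procedure: the function names are hypothetical, and it assumes the HiveServer2 default Thrift port (10000), the default database, and a cluster without Kerberos or LDAP authentication (those need extra JDBC URL settings).

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL. 10000 is the default
# value of hive.server2.thrift.port; "default" is Hive's default database.
hs2_jdbc_url() {
  host="${1:-localhost}"
  port="${2:-10000}"
  db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Hypothetical helper: run a trivial query through Beeline to check that
# HiveServer2 accepts connections. Assumes beeline is on the PATH.
#   $1 host, $2 port, $3 database (empty values fall back to the defaults)
#   $4 user name (default: hadoop)
hs2_smoke_test() {
  url="$(hs2_jdbc_url "$1" "$2" "$3")"
  user="${4:-hadoop}"
  echo "Connecting to $url as $user" >&2
  beeline -u "$url" -n "$user" -e 'SELECT 1;'
}
```

If hs2_smoke_test succeeds on the master node but your remote client still fails, the issue is more likely in the network path (for example, security group rules) or the client's driver configuration than in HiveServer2 itself.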

Note: By default, Hive on Amazon EMR uses Apache Tez as its execution engine, so a Hive query might launch a YARN application. To troubleshoot a failed YARN application, check the YARN container logs. For more information, see the YARN application history section in this article.
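To find which YARN application a query launched, you can search the Hive or HiveServer2 log for YARN application IDs, which have the form application_<clusterTimestamp>_<sequenceNumber>. This is a minimal sketch; yarn_app_ids_in_log is a hypothetical name, and it simply matches that ID pattern wherever it appears in the log.

```shell
# Hypothetical helper: print the unique YARN application IDs mentioned in a
# log file, in the order in which they first appear.
#   $1  log file (for example, /mnt/var/log/hive/hive-server2.log)
yarn_app_ids_in_log() {
  # -o prints only the matching text; awk keeps the first occurrence of each ID.
  grep -oE 'application_[0-9]+_[0-9]+' "$1" | awk '!seen[$0]++'
}
```

With an application ID in hand, yarn logs -applicationId <application-id> on the master node prints the aggregated container logs for that application, provided that YARN log aggregation is enabled and the logs are still retained.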

Amazon EMR steps

Check the step logs on the master node. Each step has its own subdirectory, named after the step ID, under /var/log/hadoop/steps/. For example:

[hadoop@ip-172-xx-xx-x ~]$ cd /var/log/hadoop/steps/s-3C4CZ9G05FEAX
[hadoop@ip-172-xx-xx-x s-3C4CZ9G05FEAX]$ ls -ltr
total 12
-rw-rw-r-- 1 hadoop hadoop 0 May 25 21:09 syslog
-rw-rw-r-- 1 hadoop hadoop 1304 May 25 21:09 stdout
-rw-rw-r-- 1 hadoop hadoop 213 May 25 21:09 stderr
-rw-rw-r-- 1 hadoop hadoop 2589 May 25 21:09 controller
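To read all four step log files in one pass, the following sketch prints each one and flags empty files, such as the zero-byte syslog in the listing above. It also handles gzip-compressed copies (.gz), which is how the archived step logs in Amazon S3 are typically stored. The function name show_step_logs is hypothetical; the controller file is printed first because it typically records how the step was launched and how it exited.

```shell
# Hypothetical helper: print the controller, stderr, stdout, and syslog files
# for one Amazon EMR step, reading either plain or .gz copies.
#   $1  step directory (for example, /var/log/hadoop/steps/s-3C4CZ9G05FEAX)
show_step_logs() {
  step_dir="$1"
  for name in controller stderr stdout syslog; do
    for file in "$step_dir/$name" "$step_dir/$name.gz"; do
      [ -f "$file" ] || continue            # skip files that don't exist
      echo "===== $file ====="
      if [ ! -s "$file" ]; then
        echo "(empty)"                      # zero-byte file
      else
        case "$file" in
          *.gz) zcat "$file" ;;             # decompress archived copies
          *) cat "$file" ;;
        esac
      fi
    done
  done
}
```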

YARN application history

The easiest way to view and monitor YARN application details is to first open the Amazon EMR console. Then, check the Application history tab of the cluster's detail page. For more information, see View application history.

To see if errors occurred in a Tez or MapReduce application that runs in the background when you run a Hive query, check the YARN application logs on Amazon Simple Storage Service (Amazon S3). For more information, see View log files archived to Amazon S3. For example:

$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/
                           PRE containers/
                           PRE node/
                           PRE steps/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/node/i-045d100a1fcd13ef2/
                           PRE applications/
                           PRE bootstrap-actions/
                           PRE daemons/
                           PRE provision-node/
                           PRE setup-devices/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/containers/application_1527279117205_0001/container_1527279117205_0001_01_000001/
2020-10-25 15:46:04 842 stdout.gz
2020-10-25 15:46:04 4089 syslog.gz
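Container logs are grouped under the ID of the application that owns them. A container ID has the form container_[e<epoch>_]<clusterTimestamp>_<appNumber>_<attempt>_<containerNumber>, so container_1527279117205_0001_01_000001 belongs to application_1527279117205_0001. The following sketch derives the application ID and builds the S3 prefix for one container. Both function names are hypothetical, and the prefix assumes the containers/<application-id>/<container-id>/ layout shown above.

```shell
# Hypothetical helper: derive a YARN application ID from a container ID.
# Handles both container_<ts>_<app>_<attempt>_<n> and the newer
# container_e<epoch>_<ts>_<app>_<attempt>_<n> form.
app_id_from_container() {
  printf '%s\n' "$1" |
    sed -E 's/^container_(e[0-9]+_)?([0-9]+)_([0-9]+)_[0-9]+_[0-9]+$/application_\2_\3/'
}

# Hypothetical helper: build the S3 prefix for one container's archived logs.
#   $1  log URI (for example, s3://aws-logs-.../elasticmapreduce/)
#   $2  cluster ID
#   $3  container ID
container_s3_log_prefix() {
  log_uri="${1%/}"
  app_id="$(app_id_from_container "$3")"
  case "$app_id" in
    application_*) ;;                       # sed matched: valid container ID
    *) echo "unrecognized container ID: $3" >&2; return 2 ;;
  esac
  printf '%s/%s/containers/%s/%s/\n' "$log_uri" "$2" "$app_id" "$3"
}
```

For example, you can download a container's logs with aws s3 cp --recursive "$(container_s3_log_prefix <log-uri> <cluster-id> <container-id>)" ./container-logs/ and then search them with zcat ./container-logs/*.gz | grep -E 'ERROR|Exception'.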

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

Related information

How do I resolve "OutOfMemoryError" Hive Java heap space exceptions on Amazon EMR that occur when Hive outputs the query results?

Hive cluster errors
