After executing a query, Apache Hive throws an OutOfMemoryError exception while outputting the query results. How do I resolve this?

The OutOfMemoryError exception commonly occurs during INSERT OVERWRITE commands when there is insufficient heap space on hive-server2, on the Hive metastore, or on the client side while Hive writes the query output.

Note: This resolution doesn't cover OutOfMemoryError exceptions that occur inside Tez containers, which call for Tez container memory tuning.

Increase the maximum memory allocation for the JVM

When you launch a Hive shell, the process is allowed to use up to 1000 MB of memory by default. The maximum memory allocation is defined by the -Xmx parameter. If the process attempts to use more than this maximum, the JVM throws the OutOfMemoryError exception and the process is killed. To resolve this issue, increase the -Xmx value in the Hive shell script as appropriate for your use case, and then run your Hive query again.
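
As a lighter-weight alternative to editing the shell script, you can often raise the client heap for a single session through the standard Hadoop client options variable. The following is a sketch only; the 2048 MB value is an example, and it assumes the Hive CLI picks up HADOOP_CLIENT_OPTS the way most Hadoop client tools do:

# Example only: raise the client JVM heap to 2 GB for this session
export HADOOP_CLIENT_OPTS="-Xmx2048m"
hive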

Increase HADOOP_HEAPSIZE

Check for the following error in /mnt/var/log/hive/user/hadoop/hive.log:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 12345"...
Killed
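
For example, you can search the log for this message with a command similar to the following (using the log path given above):

grep -i "OutOfMemoryError" /mnt/var/log/hive/user/hadoop/hive.log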

If you find this error message, the JVM is running out of heap space. Increase HADOOP_HEAPSIZE for the Hive CLI service in /etc/hive/conf/hive-env.sh, as shown in the following example. The default value is 1000. Increase it as appropriate for your use case, and then run your Hive query again.

export HADOOP_HEAPSIZE=2000

Important: This setting applies to hive-server2, the Hive metastore, and the Hive CLI after these services are restarted. Optionally, you can set separate values for each of these services.

Beeline or SQL Workbench/J

1.    If you are running the same query from Beeline or SQL Workbench/J, check /mnt/var/log/hive/hive-server2.log and hive-server2.out for heap space or garbage collection errors similar to the following:

Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://emr-analytics_master.abc.aws.> select id, name, x.* from mydb.location a, curated_admin.nxpepnd1_tpn_prvdr_pra_fclt b, curated_admin.test_table c where a.test = b._id and a._id = b._id and b._prod_id = c.prod_f_id;
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 27745"...
Killed

2.    If you find errors like this, increase HADOOP_HEAPSIZE for hive-server2 in /etc/hive/conf/hive-env.sh. This setting also applies to the Hive metastore and the Hive client.

export HADOOP_HEAPSIZE=2048

Optionally, use conditional statements to specify different heap sizes for hive-server2, the metastore, and the client, as shown in the following example: 

export HIVE_CLIENT_HEAPSIZE=1024
export HIVE_METASTORE_HEAPSIZE=2048
export HIVE_SERVER2_HEAPSIZE=3072
if [ "$SERVICE" = "metastore" ]
then
  export HADOOP_HEAPSIZE=$HIVE_METASTORE_HEAPSIZE
elif [ "$SERVICE" = "hiveserver2" ]
then
  export HADOOP_HEAPSIZE=$HIVE_SERVER2_HEAPSIZE
  export HADOOP_OPTS="$HADOOP_OPTS -server -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive"
else
  export HADOOP_HEAPSIZE=$HIVE_CLIENT_HEAPSIZE
fi

3.    Run the Hive query again after you update these settings. If you still get the OutOfMemoryError exception and you are running multiple clients at the same time, continue to step 4. If you still get the OutOfMemoryError exception and you aren't running multiple clients at the same time, skip to step 8.

4.    Increase the -Xmx parameters for each client, as appropriate for your use case.
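
For example, for a Beeline session you can raise the client heap before launching the client. This is a sketch only; the value is illustrative, and it assumes Beeline honors HADOOP_CLIENT_OPTS the way other Hadoop client tools do:

# Example only: give the Beeline client a 2 GB heap
export HADOOP_CLIENT_OPTS="-Xmx2048m"
beeline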

5.    Choose the garbage collector that is appropriate for your use case by adding -XX:+UseParNewGC (parallel new-generation collector) or -XX:+UseConcMarkSweepGC (concurrent mark sweep collector) to the HADOOP_OPTS lines, as shown in the following example. For more information about choosing a garbage collector, see Tuning Garbage Collection with the 1.4.2 Java[tm] Virtual Machine and Java HotSpot VM Options.

export HADOOP_HEAPSIZE=2048
if [ "$SERVICE" = "cli" ]; then
  if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx12288m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx12288m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi
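
If the concurrent mark sweep collector fits your workload better, swap the collector flag on the same line. The following line is a sketch that reuses the heap settings from the preceding example:

export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx12288m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"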

6.    By default, the JVM throws an OutOfMemoryError exception when it spends an excessive amount of time in garbage collection while recovering very little heap space (the GC overhead limit). The -XX:-UseGCOverheadLimit option in the preceding examples disables this check. To restore the default time limit, remove the option or replace it with -XX:+UseGCOverheadLimit. For more information, see The Parallel Collector.

7.    Run the Hive query again.

8.    If a heap space error similar to the following appears in the Beeline or SQL Workbench/J terminal while the query runs, and there are no corresponding errors in hive.log or hive-server2.log, then the Hive client itself is probably running out of memory. To resolve this error, increase the client memory and run the query again.

Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://emr-analytics_master.abc.aws.> select id, name, x.* from mydb.location a, curated_admin.nxpepnd1_tpn_prvdr_pra_fclt b, curated_admin.test_table c where a.test = b._id and a._id = b._id and b._prod_id = c.prod_f_id;
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 27745"...
Killed

9.    If Hive still throws the OutOfMemoryError exception, complete the following steps for your client.

Beeline:

By default, Beeline buffers the entire query result set before printing it to stdout. This behavior can cause an OutOfMemoryError exception when the result set is large. To resolve the OutOfMemoryError exception in Beeline, launch Beeline with incremental result fetching enabled, and then retry the Hive query:

beeline --incremental=true
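
If Beeline is already running, you can also turn on incremental fetching from within the session. This assumes your Beeline version supports setting the option through the !set command:

!set incremental true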

SQL Workbench/J:

In a 32-bit Java Runtime Environment (JRE), the application can use up to 1 GB of memory by default. In a 64-bit JRE, the application can use up to 65% of the available physical memory by default. To verify the amount of memory available to the application, choose Help and then choose About in SQL Workbench/J.

For macOS, increase the -Xmx1024m value in the Info.plist file as needed. Info.plist is usually located in the /Applications/SQLWorkbenchJ.app/Contents directory. For example, to double the amount of memory available to the application, change the value from -Xmx1024m to -Xmx2048m and run the query again.

For Windows, create an INI file and then add the vm.heapsize.preferred parameter to the INI file to increase the amount of memory that is available to the application.
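
For example, the INI file might contain an entry similar to the following, with the value in megabytes. The file name and location depend on the launcher you use (for the 64-bit Windows launcher, SQLWorkbench64.ini next to the executable is a common assumption); check the SQL Workbench/J documentation for your version:

vm.heapsize.preferred=2048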

If you launch SQL Workbench/J from a shell or batch script, you can increase the available memory on the java command line. The command in the following example makes 3 GB of memory available to the application:

java -Xmx3g -jar sqlworkbench.jar

Note: If the OutOfMemoryError exception is on the client side rather than on hive-server2 or the Hive CLI, save the output to Amazon Simple Storage Service (Amazon S3) or HDFS rather than viewing the query results on Beeline or SQL Workbench/J. Using these output options reduces the memory load on the client side.
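
For example, a query run from Beeline can write its results directly to Amazon S3 instead of returning them to the terminal. The JDBC URL, bucket name, and query in the following sketch are placeholders for your own values:

beeline -u "jdbc:hive2://localhost:10000/default" --incremental=true \
  -e "INSERT OVERWRITE DIRECTORY 's3://example-bucket/query-results/' SELECT id, name FROM mydb.location;"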

Restarting Hive

If you modify Hive properties in hive-site.xml or hive-env.sh, you might need to restart Hive for the updated settings to take effect on Hive services such as hive-server2 and the metastore.

For Amazon EMR release versions 4.7.0 and later:

1.    Connect to the master node using SSH.

2.    Run the following commands to restart the metastore:

sudo stop hive-hcatalog-server
sudo start hive-hcatalog-server
sudo status hive-hcatalog-server

Note: Don't attempt to restart the metastore using the sudo restart hive-hcatalog-server command, because Hive throws an error.

3.    Run the following commands to restart HiveServer2:

sudo stop hive-server2
sudo start hive-server2
sudo status hive-server2

For Amazon EMR release versions 4.0.0 to 4.6:

1.    Connect to the master node using SSH.

2.    Run the following commands to restart the metastore:

sudo stop hive-metastore
sudo start hive-metastore
sudo status hive-metastore

3.    Run the following commands to restart HiveServer2:

sudo stop hive-server2
sudo start hive-server2
sudo status hive-server2

Published: 2018-10-24