How do I resolve the "java.lang.OutOfMemoryError: GC overhead limit exceeded" exception in Amazon EMR?

Last updated: 2019-10-10

The NameNode service in Amazon EMR fails with the following exception: "java.lang.OutOfMemoryError: GC overhead limit exceeded."

Short Description

The NameNode service uses memory to store namespace objects and metadata for files stored in HDFS. The more files that you have in HDFS, the more memory NameNode uses. The "java.lang.OutOfMemoryError: GC overhead limit exceeded" error means that the Java virtual machine (JVM) is spending almost all of its time on garbage collection while recovering very little heap. For NameNode, this usually indicates that the heap size is too small for the number of files and blocks in HDFS. Increase the heap size to prevent out-of-memory exceptions.
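To show how heap demand grows with the namespace, the following Python sketch applies a widely cited rule of thumb: each file, directory, and block object takes roughly 150 bytes of NameNode heap. That figure, the BYTES_PER_OBJECT constant, and the estimate_namespace_heap_mb helper are assumptions for illustration. They are not values documented by Amazon EMR. You could take the directory and file counts from the output of hdfs dfs -count / and the block count from the summary that hdfs fsck / prints.

```python
# Rough estimate of the NameNode heap taken up by namespace objects.
# ASSUMPTION: the common rule of thumb of ~150 bytes per file, directory,
# or block object. This is not a value documented by Amazon EMR.
BYTES_PER_OBJECT = 150


def estimate_namespace_heap_mb(files: int, directories: int, blocks: int) -> float:
    """Return the approximate heap (in MB) used by namespace objects alone."""
    if min(files, directories, blocks) < 0:
        raise ValueError("object counts must be non-negative")
    objects = files + directories + blocks
    return objects * BYTES_PER_OBJECT / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical cluster: 10 million files, 1 million directories,
    # and 12 million blocks (some files span more than one block).
    mb = estimate_namespace_heap_mb(10_000_000, 1_000_000, 12_000_000)
    print(f"Namespace objects alone need roughly {mb:.0f} MB of heap")
```

Treat the result as a rough floor, not as a heap setting. The NameNode JVM also needs headroom for its other data structures and for garbage collection to run efficiently.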

Resolution

Check the logs to confirm the error

1.    Connect to the master node using SSH.

2.    Run the following command on the master node to check the status of the NameNode service:

initctl list

The following output indicates that the NameNode service has stopped:

hadoop-hdfs-namenode stop/waiting

3.    Check the NameNode log at the following path to confirm the OutOfMemoryError: /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-xxxx.out. Replace xxxx with the private IP address of the master node, using hyphens instead of dots (for example: /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-0-1-109.out).

Output similar to the following confirms that the NameNode service failed because of an OutOfMemoryError:

# java.lang.OutOfMemoryError: GC overhead limit exceeded
# -XX:OnOutOfMemoryError="kill -9 %p
kill -9 %p
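If you check many clusters, you can script the status and log checks from steps 2 and 3. The following Python sketch parses initctl list output for the hadoop-hdfs-namenode job and searches the NameNode .out log for the GC overhead message. It assumes that Python 3.7 or later is available on the master node. The script name check_namenode.py and the helper names are hypothetical, and you might need sudo to read the log file.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Upstart job name and error text taken from the steps above.
NAMENODE_JOB = "hadoop-hdfs-namenode"
GC_OVERHEAD_MSG = "java.lang.OutOfMemoryError: GC overhead limit exceeded"


def namenode_state(initctl_output: str) -> Optional[str]:
    """Return the job's goal/state from initctl list output, e.g. 'stop/waiting'."""
    for line in initctl_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == NAMENODE_JOB:
            # A running job prints 'start/running, process <pid>'; drop the comma.
            return parts[1].rstrip(",")
    return None  # the job is not listed


def namenode_log_path(private_ip: str) -> Path:
    """Build the .out log path; the IP's dots become hyphens in the file name."""
    ip_part = private_ip.replace(".", "-")
    return Path(f"/var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-{ip_part}.out")


def contains_gc_overhead(log_text: str) -> bool:
    """True if the log text contains the GC overhead OutOfMemoryError message."""
    return GC_OVERHEAD_MSG in log_text


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_namenode.py 10.0.1.109
    private_ip = sys.argv[1]
    listing = subprocess.run(
        ["initctl", "list"], capture_output=True, text=True, check=True
    ).stdout
    print("NameNode job state:", namenode_state(listing))
    log_text = namenode_log_path(private_ip).read_text(errors="replace")
    print("GC overhead limit error in log:", contains_gc_overhead(log_text))
```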

Increase the NameNode heap size

Important: This configuration change requires a restart of the NameNode service. Be sure that no HDFS read or write operations are performed while you're making the change.

For Amazon EMR release versions 5.21.0 and later:

To increase the heap size on a running cluster, reconfigure the master instance group (where NameNode runs) with a hadoop-env configuration object. For a new cluster, supply the same configuration object when you launch the cluster. The following configuration object increases the heap size from 1 GB to 2 GB (HADOOP_NAMENODE_HEAPSIZE is in MB). Choose a size that's appropriate for your workload.

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_NAMENODE_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]

Amazon EMR applies your new configurations and gracefully restarts the NameNode process.
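If you manage clusters programmatically, the following Python sketch builds the same hadoop-env configuration object for any heap size. It also wraps the object in the instance-group structure that the EMR ModifyInstanceGroups API accepts. The helper names and the ig-EXAMPLE placeholder are hypothetical. Replace the placeholder with the ID of your master instance group.

```python
import json


def namenode_heap_config(heap_mb: int) -> list:
    """Return a hadoop-env configuration that sets HADOOP_NAMENODE_HEAPSIZE (MB)."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return [
        {
            "Classification": "hadoop-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    # EMR configuration property values are strings.
                    "Properties": {"HADOOP_NAMENODE_HEAPSIZE": str(heap_mb)},
                    "Configurations": [],
                }
            ],
        }
    ]


def instance_group_update(instance_group_id: str, heap_mb: int) -> list:
    """Wrap the configuration for the master instance group in a modify request."""
    return [
        {
            "InstanceGroupId": instance_group_id,
            "Configurations": namenode_heap_config(heap_mb),
        }
    ]


if __name__ == "__main__":
    # ig-EXAMPLE is a placeholder for your master instance group ID.
    print(json.dumps(instance_group_update("ig-EXAMPLE", 2048), indent=2))
```

You could save the printed JSON to a file (for example, update.json) and apply it with aws emr modify-instance-groups --cluster-id j-EXAMPLE --instance-groups file://update.json. Alternatively, pass the same list as the InstanceGroups argument of the boto3 EMR client's modify_instance_groups method. The names j-EXAMPLE and update.json are placeholders.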

For Amazon EMR release versions 5.20.0 and earlier:

1.    Connect to the master node using SSH.

2.    In the /etc/hadoop/conf/hadoop-env.sh file, increase the NameNode heap size. Choose a size that's appropriate for your workload. For example:

export HADOOP_NAMENODE_HEAPSIZE=2048

3.    Save your changes.

4.    Restart the NameNode service:

sudo stop hadoop-hdfs-namenode
sudo start hadoop-hdfs-namenode

5.    Confirm that the NameNode process is running:

initctl list

Output similar to the following indicates success (your process ID will differ):

hadoop-hdfs-namenode start/running, process 6324

6.    Confirm that HDFS commands are working:

hdfs dfs -ls /

Output similar to the following indicates success:

Found 4 items
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /apps
drwxrwxrwt   - hdfs hadoop          0 2019-09-26 14:03 /tmp
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /user
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /var
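Steps 2 and 3 can also be scripted. The following Python sketch rewrites any existing export HADOOP_NAMENODE_HEAPSIZE line in hadoop-env.sh, or appends one if none exists. It also keeps a backup copy of the original file. The set_namenode_heapsize helper and the .bak backup name are assumptions for illustration. Commented-out lines are left alone. The script needs permission to write to /etc/hadoop/conf (for example, through sudo). You still restart and verify the NameNode service as described in steps 4 through 6.

```python
import re
import sys
from pathlib import Path

HADOOP_ENV = Path("/etc/hadoop/conf/hadoop-env.sh")
# Matches active (uncommented) export lines only; [ \t] avoids spanning lines.
EXPORT_RE = re.compile(
    r"^[ \t]*export[ \t]+HADOOP_NAMENODE_HEAPSIZE=.*$", re.MULTILINE
)


def set_namenode_heapsize(env_text: str, heap_mb: int) -> str:
    """Return env_text with HADOOP_NAMENODE_HEAPSIZE set to heap_mb (in MB).

    Existing export lines are replaced in place; otherwise one is appended.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    line = f"export HADOOP_NAMENODE_HEAPSIZE={heap_mb}"
    if EXPORT_RE.search(env_text):
        return EXPORT_RE.sub(line, env_text)
    if env_text and not env_text.endswith("\n"):
        env_text += "\n"
    return env_text + line + "\n"


if __name__ == "__main__":
    heap_mb = int(sys.argv[1]) if len(sys.argv) > 1 else 2048
    original = HADOOP_ENV.read_text()
    backup = HADOOP_ENV.with_name(HADOOP_ENV.name + ".bak")
    backup.write_text(original)  # keep a copy in case you need to roll back
    HADOOP_ENV.write_text(set_namenode_heapsize(original, heap_mb))
    print(f"Set HADOOP_NAMENODE_HEAPSIZE={heap_mb} in {HADOOP_ENV}")
```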