How do I resolve the "java.lang.OutOfMemoryError: GC overhead limit exceeded" exception in Amazon EMR?

The NameNode service in Amazon EMR fails with the following exception: "java.lang.OutOfMemoryError: GC overhead limit exceeded."

Short description

The NameNode service keeps the namespace objects and metadata for all files stored in HDFS in memory. The more files that you have in HDFS, the more memory NameNode uses. The "java.lang.OutOfMemoryError: GC overhead limit exceeded" error indicates that the NameNode heap size is too small for the amount of HDFS metadata in the cluster. To prevent out-of-memory exceptions, increase the heap size.
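
Because NameNode memory use grows with the number of namespace objects, it can help to see how many directories and files HDFS currently holds. The following is a minimal sketch, not an official sizing tool: the function name count_namespace_objects is hypothetical, it assumes the standard column order of `hdfs dfs -count` output (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), and it does not count blocks. It works only while the NameNode is running.

```shell
# Sum directories and files from `hdfs dfs -count` output read on stdin.
# Assumed input columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
count_namespace_objects() {
  awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Usage on the master node (requires a running NameNode):
#   hdfs dfs -count / | count_namespace_objects
```

A count that keeps growing between runs suggests that the heap size will need to grow with it.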

Resolution

Check the logs to confirm the error

1. Connect to the master node using SSH.

2. Run the following command on the master node to check the status of the NameNode service:

initctl list

The following output indicates that the NameNode service has stopped:

hadoop-hdfs-namenode stop/waiting

3. Check the NameNode log at the following path to confirm the OutOfMemory exception: /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-xxxx.out. Replace xxxx with the private IP address of the master node (for example: /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-0-1-109.out).

Output similar to the following confirms that the NameNode service failed because of an OutOfMemory exception:

# java.lang.OutOfMemoryError: GC overhead limit exceeded
# -XX:OnOutOfMemoryError="kill -9 %p
kill -9 %p
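
The two checks above can be scripted. The following is a rough sketch: the function names namenode_state and has_gc_overhead_error are hypothetical helpers, and the sketch assumes that `initctl list` prints the job name in the first column, as in the output shown earlier.

```shell
# Print the NameNode job state from `initctl list`-style input on stdin,
# for example "stop/waiting" or "start/running".
namenode_state() {
  awk '$1 == "hadoop-hdfs-namenode" { sub(/,$/, "", $2); print $2; exit }'
}

# Return 0 if the log file passed as $1 contains the GC overhead error.
has_gc_overhead_error() {
  grep -qF 'java.lang.OutOfMemoryError: GC overhead limit exceeded' "$1"
}

# Usage on the master node (replace xxxx as described in step 3):
#   initctl list | namenode_state
#   has_gc_overhead_error /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-xxxx.out \
#     && echo "GC overhead error found"
```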

Increase the NameNode heap size

Important: This configuration change requires a restart of the NameNode service. Be sure that no HDFS read or write operations are performed while you're making the change.

For Amazon EMR release versions 5.21.0 and later:

To increase the heap size, supply a hadoop-env configuration object for the instance group on a running cluster. Or, add the configuration object when you launch a new cluster. The following configuration object increases the heap size from 1 GB to 2 GB. Choose a size that's appropriate for your workload.

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_NAMENODE_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]

Amazon EMR applies your new configurations and gracefully restarts the NameNode process.
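
On a running cluster, one way to supply the configuration object is the AWS CLI `aws emr modify-instance-groups` command. The following is a sketch under stated assumptions: the function name instance_group_config is hypothetical, and the cluster ID and instance group ID in the usage comment are placeholders that you replace with your own values. The function only prints the JSON document; it doesn't call AWS.

```shell
# Print a reconfiguration document for one instance group.
# $1: instance group ID (placeholder in the usage below)
# $2: NameNode heap size value, for example 2048
instance_group_config() {
  cat <<EOF
[
  {
    "InstanceGroupId": "$1",
    "Configurations": [
      {
        "Classification": "hadoop-env",
        "Properties": {},
        "Configurations": [
          {
            "Classification": "export",
            "Properties": { "HADOOP_NAMENODE_HEAPSIZE": "$2" },
            "Configurations": []
          }
        ]
      }
    ]
  }
]
EOF
}

# Usage (IDs are placeholders; requires the AWS CLI and credentials):
#   instance_group_config ig-XXXXXXXXXXXX 2048 > instanceGroups.json
#   aws emr modify-instance-groups --cluster-id j-XXXXXXXXXXXX \
#     --instance-groups file://instanceGroups.json
```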

For Amazon EMR release versions 5.20.0 and earlier:

1. Connect to the master node using SSH.

2. In the /etc/hadoop/conf/hadoop-env.sh file, increase the NameNode heap size. Choose a size that's appropriate for your workload. Example:

export HADOOP_NAMENODE_HEAPSIZE=2048

3. Save your changes.

4. Restart the NameNode service:

sudo stop hadoop-hdfs-namenode
sudo start hadoop-hdfs-namenode

5. Confirm that the NameNode process is running:

initctl list

A successful output looks like this:

hadoop-hdfs-namenode start/running, process 6324

6. Confirm that HDFS commands are working:

hdfs dfs -ls /

A successful output looks like this:

Found 4 items
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /apps
drwxrwxrwt   - hdfs hadoop          0 2019-09-26 14:03 /tmp
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /user
drwxr-xr-x   - hdfs hadoop          0 2019-09-26 14:02 /var
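
Step 2 of this procedure can also be scripted instead of edited by hand. The following is a minimal sketch: the function name set_namenode_heap is hypothetical, and it assumes that the setting, if present, is on a line that starts with "export HADOOP_NAMENODE_HEAPSIZE=". The sed call keeps a backup copy of the file with a .bak suffix.

```shell
# Set HADOOP_NAMENODE_HEAPSIZE in an environment file.
# Updates the existing export line if present, otherwise appends one.
# $1: path to the file, $2: heap size value, for example 2048
set_namenode_heap() {
  local file="$1" size="$2"
  if grep -q '^export HADOOP_NAMENODE_HEAPSIZE=' "$file"; then
    sed -i.bak \
      "s/^export HADOOP_NAMENODE_HEAPSIZE=.*/export HADOOP_NAMENODE_HEAPSIZE=${size}/" \
      "$file"
  else
    printf 'export HADOOP_NAMENODE_HEAPSIZE=%s\n' "$size" >> "$file"
  fi
}

# Usage on the master node, from a shell with write access to the file:
#   set_namenode_heap /etc/hadoop/conf/hadoop-env.sh 2048
# Then restart the NameNode service as shown in step 4.
```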

Related information

Configuring applications

How do I resolve "OutOfMemoryError" Hive Java heap space exceptions on Amazon EMR that occur when Hive outputs the query results?
