Short description
------------------



This issue affects Amazon EMR release versions **5.19.0 - 5.21.0**. In these versions, Amazon EMR stores node label files in HDFS:


* DEFAULT\_DIR\_NAME = "node-labels"
* MIRROR\_FILENAME = "nodelabel.mirror"
* EDITLOG\_FILENAME = "nodelabel.editlog"


The storage location is set in **yarn-site.xml** on all nodes: **yarn.node-labels.fs-store.root-dir: '/apps/yarn/nodelabels'**. The files become corrupted if a resize operation removes every node that holds the files' blocks. ResourceManager then restarts and gets stuck in a restart loop, and CommonNodeLabelsManager throws an exception.


To find the exception, search for "org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager" in **/var/log/hadoop-yarn/yarn-yarn-resourcemanager-\*.log**.
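As a minimal sketch, the log search described above can be wrapped in a small shell function. The function name `find_node_label_errors` and the 20-line limit are illustrative assumptions, not part of Amazon EMR; the class name and log path are the ones given in this article.

```shell
# Print the most recent log lines that mention CommonNodeLabelsManager.
# With no arguments, search the ResourceManager log path named in this article.
# (find_node_label_errors is a hypothetical helper name used for illustration.)
find_node_label_errors() {
  if [ "$#" -eq 0 ]; then
    set -- /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log
  fi
  # -h drops file name prefixes; tail keeps only the last 20 matches.
  grep -h "org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager" "$@" | tail -n 20
}
```

Run `find_node_label_errors` on the master node to search the default logs, or pass one or more log file paths as arguments to search those files instead.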


To resolve this error, delete the node label files. Then, restart ResourceManager to recreate the files.



Resolution
----------



1.    Check file system health and locate the blocks:





```plaintext
hdfs fsck /apps/yarn/nodelabels/ -locations -blocks -files
```



2.    Remove the files:





```plaintext
hdfs dfs -rm -skipTrash /apps/yarn/nodelabels/*
```



3.    Restart ResourceManager:





```plaintext
sudo stop hadoop-yarn-resourcemanager
sudo start hadoop-yarn-resourcemanager
```



4.    When ResourceManager restarts, it recreates the node label files. This resolves the restart loop. However, you can't submit YARN applications yet. Before you can submit YARN applications, manually add node label entries:





```plaintext
yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"
```



5.    List the labels to confirm that the node label that you added is present:





```plaintext
yarn cluster --list-node-labels
```
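The steps above can be sketched as a single shell function. This is an illustration under stated assumptions rather than an official procedure: the helper names `run` and `recover_node_labels` and the `DRY_RUN` variable are hypothetical, and the commands themselves are copied from the steps in this article. By default the sketch only prints each command so that you can review the sequence before running it.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the five steps in this article.
recover_node_labels() {
  label_dir="/apps/yarn/nodelabels"

  # Step 1: check file system health and locate the blocks.
  run hdfs fsck "$label_dir/" -locations -blocks -files

  # Step 2: remove the node label files. The glob is quoted so that
  # hdfs, not the local shell, expands it.
  run hdfs dfs -rm -skipTrash "$label_dir/*"

  # Step 3: restart ResourceManager.
  run sudo stop hadoop-yarn-resourcemanager
  run sudo start hadoop-yarn-resourcemanager

  # Step 4: add the node label entry again. ResourceManager might need a
  # short time after the restart before it accepts this command.
  run yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"

  # Step 5: list the labels to confirm the result.
  run yarn cluster --list-node-labels
}
```

Calling `recover_node_labels` prints the six commands in order. Calling `DRY_RUN=0 recover_node_labels` on the master node runs them; if the `rmadmin` step fails because ResourceManager is still starting, run steps 4 and 5 again by hand.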




---




Related information
-------------------



[Understand node types: master, core, and task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html)







I enabled node labels on an Amazon EMR cluster. Then, YARN ResourceManager failed.

Resolve node label and YARN ResourceManager Failures in Amazon EMR

How can I resolve node label and YARN ResourceManager failures in Amazon EMR?
