How can I configure or modify node labeling with the Amazon EMR YARN Scheduler queue?

Last updated: 2022-12-13

I want to configure or modify node labeling using the Amazon EMR Apache YARN Scheduler queue.

Short description

The default YARN node label settings for EMR clusters are as follows:

Amazon EMR 5.19.0 and later 5.x releases:

The YARN node labels feature is turned on by default. Amazon EMR creates a CORE node label for core nodes and sets the following properties. As a result, YARN ApplicationMaster containers are allocated only on core nodes. Other containers have no partition restriction, so they can be allocated on either core or task nodes.

yarn.node-labels.enabled: true
yarn.node-labels.am.default-node-label-expression: 'CORE'

Amazon EMR 6.x and later:

The YARN node labels feature is turned off by default. The application master processes can run on both core and task nodes.
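To see which of these defaults a running cluster actually uses, you can read the two properties from /etc/hadoop/conf/yarn-site.xml on the primary node. The following is a minimal, line-oriented awk sketch, not a full XML parser. It assumes that each <name> element is followed by its <value> element, either on the same line or on a later line, as in the yarn-site.xml snippets in this article. The function name get_yarn_property is hypothetical.

```shell
#!/bin/bash
# get_yarn_property FILE NAME
# Print the <value> of the yarn-site.xml property whose <name> is NAME.
# Line-oriented sketch: assumes <name> and <value> each start on their own
# line (or share one line), which is how yarn-site.xml is usually laid out.
get_yarn_property() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)      # drop everything up to <name>
      sub(/[ \t]*<\/name>.*/, "", n)    # drop </name> and everything after it
      found = (n == want)               # remember whether this is the wanted property
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v                           # may be empty, for example <value></value>
      exit
    }
  ' "$file"
}
```

For example, get_yarn_property /etc/hadoop/conf/yarn-site.xml yarn.node-labels.enabled prints true on a cluster that uses the 5.19.0+ default.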

Resolution

Note: It's a best practice to test changes in a test environment before you apply them in your production environment. Also, when you turn off the YARN node labels feature, the ApplicationMaster container can launch on any node type, core or task, because there's no restriction on task nodes. If you configure your task nodes with Spot Instances, then running jobs might fail when a task node that hosts an ApplicationMaster is reclaimed because of Spot capacity constraints.

Turn off the YARN node labels feature in Amazon EMR 5.19.0 and later 5.x releases

Turn off the default YARN node labels feature when you create a new EMR cluster:

1.    Under Edit software settings, select Enter configuration, and then add the following properties:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "false",
      "yarn.node-labels.am.default-node-label-expression": ""
    }
  }
]

2.    Create a script with the .sh extension that contains the following content, and then upload it to an Amazon Simple Storage Service (Amazon S3) bucket:

#!/bin/bash
# Replace the command that creates the CORE node label with a harmless echo, so no label is created
sudo sed -i 's/yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp

3.    In the Bootstrap actions section, add the script as a custom action, and then proceed with cluster creation.

4.    After the cluster is created, confirm that the change was applied by running the following command on the primary node:

yarn cluster --list-node-labels

The expected output of the preceding command shows an empty node label list:

<<<<< Node Labels: >>>>>>
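The console steps above can also be scripted with the AWS CLI. The following is a minimal sketch, not an official procedure: it writes the same configuration JSON and bootstrap script shown above to local files, uploads the script with aws s3 cp, and calls aws emr create-cluster with --configurations and --bootstrap-actions. The function names, file names, S3 prefix, release label (emr-5.36.0), and instance settings are hypothetical placeholders, and the create step assumes that the default EMR roles already exist (--use-default-roles).

```shell
#!/bin/bash
# Sketch only: all names, the bucket prefix, release label, and instance
# settings below are hypothetical placeholders.

# write_no_labels_files DIR
# Write configurations.json and disable-node-labels.sh into DIR.
write_no_labels_files() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/configurations.json" <<'EOF'
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "false",
      "yarn.node-labels.am.default-node-label-expression": ""
    }
  }
]
EOF
  cat > "$dir/disable-node-labels.sh" <<'EOF'
#!/bin/bash
# Replace the command that creates the CORE node label with a harmless echo
sudo sed -i 's/yarn rmadmin -addToClusterNodeLabels "CORE(exclusive=false)"/echo "NO LABELS"/g' /var/aws/emr/bigtop-deploy/puppet/modules/hadoop/manifests/init.pp
EOF
}

# create_no_labels_cluster DIR S3_PREFIX
# Upload the bootstrap script, then create the cluster.
# S3_PREFIX is a placeholder such as s3://amzn-s3-demo-bucket/bootstrap.
# Requires AWS CLI credentials with EMR and S3 permissions.
create_no_labels_cluster() {
  local dir="$1" s3_prefix="$2"
  aws s3 cp "$dir/disable-node-labels.sh" "$s3_prefix/disable-node-labels.sh" || return 1
  aws emr create-cluster \
    --name "no-node-labels" \
    --release-label emr-5.36.0 \
    --applications Name=Hadoop \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --configurations "file://$dir/configurations.json" \
    --bootstrap-actions "Path=$s3_prefix/disable-node-labels.sh,Name=DisableNodeLabels"
}
```

For example, write_no_labels_files ./emr-no-labels followed by create_no_labels_cluster ./emr-no-labels s3://amzn-s3-demo-bucket/bootstrap. Keeping file generation separate from the AWS calls lets you review the generated files before anything is created.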

Turn off the default YARN node labels feature on an existing EMR cluster:

1.    Connect to the Amazon EMR primary node using SSH.

2.    Back up the existing yarn-site.xml file, which is located at /etc/hadoop/conf/yarn-site.xml.

3.    Open yarn-site.xml in a text editor with root permissions, for example:

sudo vi /etc/hadoop/conf/yarn-site.xml

4.    Change the yarn.node-labels.enabled property value to false.

<property>
  <name>yarn.node-labels.enabled</name>
  <value>false</value>
</property>

5.    Remove the value CORE from the yarn.node-labels.am.default-node-label-expression property, as shown in the following example:

<property>
  <name>yarn.node-labels.am.default-node-label-expression</name>
  <value></value>
</property>

6.    Restart the YARN ResourceManager service, and then check its status:

sudo systemctl restart hadoop-yarn-resourcemanager.service

sudo systemctl status hadoop-yarn-resourcemanager.service

7.    Confirm that the change was applied by running the following command:

yarn cluster --list-node-labels

The expected output of the preceding command shows an empty node label list:

<<<<< Node Labels: >>>>>>
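Steps 2 through 6 can be combined into a small script. The following is a minimal sketch that assumes GNU sed (as shipped on Amazon Linux) and a yarn-site.xml layout where each <value> element sits on the line directly after its <name> element, as in the snippets above. The function names are hypothetical, and the edit must run as root because /etc/hadoop/conf/yarn-site.xml is owned by root.

```shell
#!/bin/bash
# disable_node_labels_in_file FILE
# Back up FILE, then set yarn.node-labels.enabled to false and clear
# yarn.node-labels.am.default-node-label-expression.
# Assumption: each <value> line directly follows its <name> line.
disable_node_labels_in_file() {
  local file="$1"
  # Timestamped backup, for example yarn-site.xml.bak.1670900000
  cp -p "$file" "$file.bak.$(date +%s)" || return 1
  # On a matching <name> line, "n" moves to the next line, where the
  # <value> element is rewritten.
  sed -i \
    -e '/<name>yarn\.node-labels\.enabled<\/name>/{n;s|<value>.*</value>|<value>false</value>|;}' \
    -e '/<name>yarn\.node-labels\.am\.default-node-label-expression<\/name>/{n;s|<value>.*</value>|<value></value>|;}' \
    "$file"
}

# restart_resourcemanager
# Restart the ResourceManager so that it rereads yarn-site.xml, then show its status.
restart_resourcemanager() {
  sudo systemctl restart hadoop-yarn-resourcemanager.service
  systemctl status hadoop-yarn-resourcemanager.service --no-pager
}
```

On the primary node, from a root shell (for example, one opened with sudo -i), source the file and run disable_node_labels_in_file /etc/hadoop/conf/yarn-site.xml followed by restart_resourcemanager. Then verify the result with yarn cluster --list-node-labels as in step 7.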

Turn on the YARN node labels feature in Amazon EMR 6.x and later

Turn on the YARN node labels feature when you create a new EMR cluster:

1.    Under Edit software settings, select Enter configuration, add the following properties, and then proceed with cluster creation:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]

2.    After the cluster is created, confirm that the change was applied by running the following command on the primary node:

yarn cluster --list-node-labels

The following is the expected output of the preceding command:

<<<<< Node Labels: <CORE:exclusivity=false>  >>>>>
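If you verify clusters from a script, you can extract the label names from the command output instead of reading it by eye. The following is a minimal sketch that assumes each label appears as <NAME:exclusivity=true|false>, as in the expected output shown in this article; the function names are hypothetical.

```shell
#!/bin/bash
# node_label_names
# Read "yarn cluster --list-node-labels" output on stdin and print one
# label name per line. Prints nothing when the label list is empty.
# Assumption: labels are printed as <NAME:exclusivity=true|false>.
node_label_names() {
  grep -o '<[A-Za-z0-9_-]*:exclusivity=[a-z]*>' | sed 's/^<\([^:]*\):.*$/\1/'
}

# has_node_label NAME
# Succeed if NAME is one of the labels in the output read from stdin.
has_node_label() {
  node_label_names | grep -qx "$1"
}
```

For example, yarn cluster --list-node-labels | has_node_label CORE succeeds after the CORE label is created, and yarn cluster --list-node-labels | node_label_names prints nothing when labels are turned off.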

Turn on the YARN node labels feature on an existing EMR cluster:

1.    From the Amazon EMR console, select Clusters, and then select the cluster that you want to edit.

2.    Choose the Configurations tab.

3.    Under Reconfigure, select Edit in JSON, and then add the following properties:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE"
    }
  }
]

4.    Select the Apply this configuration to all active instance groups option, and then save the changes.

5.    Confirm that the change was applied by running the following command on the primary node:

yarn cluster --list-node-labels

The following is the expected output of the preceding command:

<<<<< Node Labels: <CORE:exclusivity=false>  >>>>>>
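The AWS CLI counterpart of this console reconfiguration is aws emr modify-instance-groups, which takes a JSON list of instance group IDs and the new configurations for each. Unlike the console option, the CLI requires you to list each instance group explicitly; you can look up the IDs with aws emr list-instance-groups --cluster-id followed by your cluster ID. The following is a minimal sketch that builds that JSON for the properties above. The function names and temporary-file handling are hypothetical.

```shell
#!/bin/bash
# instance_groups_json IG_ID...
# Print a modify-instance-groups JSON document that applies the node label
# properties to every instance group ID given as an argument.
instance_groups_json() {
  local first=1 id
  printf '[\n'
  for id in "$@"; do
    [ "$first" -eq 1 ] || printf ',\n'   # comma between entries, not after the last one
    first=0
    printf '  {\n'
    printf '    "InstanceGroupId": "%s",\n' "$id"
    printf '    "Configurations": [\n'
    printf '      {\n'
    printf '        "Classification": "yarn-site",\n'
    printf '        "Properties": {\n'
    printf '          "yarn.node-labels.enabled": "true",\n'
    printf '          "yarn.node-labels.am.default-node-label-expression": "CORE"\n'
    printf '        }\n'
    printf '      }\n'
    printf '    ]\n'
    printf '  }'
  done
  printf '\n]\n'
}

# apply_enable_labels CLUSTER_ID IG_ID...
# Write the JSON to a temporary file and submit the reconfiguration.
apply_enable_labels() {
  local cluster_id="$1" tmp
  shift
  tmp=$(mktemp) || return 1
  instance_groups_json "$@" > "$tmp"
  aws emr modify-instance-groups --cluster-id "$cluster_id" --instance-groups "file://$tmp"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

After the reconfiguration finishes, run the verification command in step 5 on the primary node.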
