How can I create queues on my Amazon EMR YARN CapacityScheduler?


How do I create queues on my Amazon EMR Hadoop YARN CapacityScheduler?

Short description

By default, Amazon EMR clusters have a single queue. You can add queues to your cluster and allocate a share of the cluster's resource capacity to each new queue.

Resolution

Create a reconfiguration command

The following example reconfiguration does the following:

  • Creates two additional queues, alpha and beta.
  • Allocates 30% of the total resource capacity of your cluster to each of the new queues. When adding queues and allocating cluster capacity, the sum of capacities for all queues must be equal to 100. So, in the following example reconfiguration, the capacity of the default queue decreases to 40%.
  • Gives both new queues access to all node labels (designated by "*"). This allows both queues to run on labeled core nodes.
  • Maps the hadoop user to the newly created queue alpha. To route a user's jobs to a particular queue, specify the mapping in the yarn.scheduler.capacity.queue-mappings parameter in the form u:user:queue. In the following example, the parameter is set to u:hadoop:alpha. The parameter can also map each user to a queue with the same name as the user. In that case, the parent queue name must be the same as the primary group of the user, such as u:%user:%primary_group.%user.

Note: The capacity for each queue’s access to the core label matches the capacity of the queue itself. So, the core partition splits between queues at the same ratio as the rest of the cluster.
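
The capacity rule above can be checked before you submit a reconfiguration. The following Python sketch is one way to do that, not an official tool: it sums the queue capacities and the CORE label capacities from a dictionary of properties and reports whether both totals equal 100. The property values are copied from the example reconfiguration in this article; the function names are hypothetical.

```python
# Properties copied from the example reconfiguration in this article.
PROPERTIES = {
    "yarn.scheduler.capacity.root.queues": "default,alpha,beta",
    "yarn.scheduler.capacity.root.default.capacity": "40",
    "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity": "40",
    "yarn.scheduler.capacity.root.alpha.capacity": "30",
    "yarn.scheduler.capacity.root.alpha.accessible-node-labels.CORE.capacity": "30",
    "yarn.scheduler.capacity.root.beta.capacity": "30",
    "yarn.scheduler.capacity.root.beta.accessible-node-labels.CORE.capacity": "30",
}

PREFIX = "yarn.scheduler.capacity.root"


def capacity_totals(properties, label="CORE"):
    """Return (queue capacity total, label capacity total) for the root's child queues."""
    queues = [q.strip() for q in properties[f"{PREFIX}.queues"].split(",")]
    queue_total = sum(float(properties[f"{PREFIX}.{q}.capacity"]) for q in queues)
    label_total = sum(
        float(properties[f"{PREFIX}.{q}.accessible-node-labels.{label}.capacity"])
        for q in queues
    )
    return queue_total, label_total


def check_capacities(properties, label="CORE"):
    """Raise ValueError if either total differs from 100; otherwise return True."""
    queue_total, label_total = capacity_totals(properties, label)
    if abs(queue_total - 100) > 1e-9 or abs(label_total - 100) > 1e-9:
        raise ValueError(
            f"queue capacities sum to {queue_total}, "
            f"{label} label capacities sum to {label_total}; both must be 100"
        )
    return True
```

Calling check_capacities(PROPERTIES) returns True for the example values. If you change one queue's capacity without adjusting another, the function raises ValueError.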

- Classification: capacity-scheduler
  Properties:
    yarn.scheduler.capacity.root.queues: 'default,alpha,beta'
    yarn.scheduler.capacity.root.default.capacity: '40'
    yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity: '40'
    yarn.scheduler.capacity.root.alpha.capacity: '30'
    yarn.scheduler.capacity.root.alpha.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.alpha.accessible-node-labels.CORE.capacity: '30'
    yarn.scheduler.capacity.root.beta.capacity: '30'
    yarn.scheduler.capacity.root.beta.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.beta.accessible-node-labels.CORE.capacity: '30'
- Classification: yarn-site
  Properties:
    yarn.scheduler.capacity.queue-mappings: 'u:hadoop:alpha'
  Configurations: []
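
Amazon EMR configuration objects are commonly supplied as a JSON list, where each object has a Classification key, a Properties key, and an optional nested Configurations list. As a minimal sketch, the following Python code expresses the same example reconfiguration as that JSON structure. The variable and function names are hypothetical, and how you pass the resulting JSON to Amazon EMR (console, AWS CLI, or SDK) depends on your workflow.

```python
import json

# The example reconfiguration from this article, expressed as a list of
# configuration objects with Classification / Properties / Configurations keys.
CONFIGURATIONS = [
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.root.queues": "default,alpha,beta",
            "yarn.scheduler.capacity.root.default.capacity": "40",
            "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity": "40",
            "yarn.scheduler.capacity.root.alpha.capacity": "30",
            "yarn.scheduler.capacity.root.alpha.accessible-node-labels": "*",
            "yarn.scheduler.capacity.root.alpha.accessible-node-labels.CORE.capacity": "30",
            "yarn.scheduler.capacity.root.beta.capacity": "30",
            "yarn.scheduler.capacity.root.beta.accessible-node-labels": "*",
            "yarn.scheduler.capacity.root.beta.accessible-node-labels.CORE.capacity": "30",
        },
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.scheduler.capacity.queue-mappings": "u:hadoop:alpha",
        },
        "Configurations": [],
    },
]


def to_json(configurations):
    """Serialize the configuration list to an indented JSON string."""
    return json.dumps(configurations, indent=2)


if __name__ == "__main__":
    print(to_json(CONFIGURATIONS))
```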

Note: To override the default queue mapping settings, set the yarn.scheduler.capacity.queue-mappings-override.enable parameter to true. By default, this parameter is set to false. When the parameter is set to true, users can submit jobs to queues other than the designated queue. For more information, see Enable override of default queue mappings on the Hortonworks Docs website.
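
To make the mapping syntax concrete, the following Python sketch parses a yarn.scheduler.capacity.queue-mappings value such as u:hadoop:alpha and looks up the queue for a given user. It is a simplified illustration and not YARN's own implementation: it handles only literal u:user:queue and g:group:queue entries, resolves user mappings only, and ignores placeholders such as %user and the override setting. The class and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class QueueMapping(NamedTuple):
    kind: str    # "u" for a user mapping, "g" for a group mapping
    source: str  # user or group name
    queue: str   # target queue name


def parse_queue_mappings(value: str) -> List[QueueMapping]:
    """Split a comma-separated mapping string into QueueMapping entries."""
    mappings = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 3 or parts[0] not in ("u", "g"):
            raise ValueError(f"unsupported mapping entry: {entry!r}")
        mappings.append(QueueMapping(*parts))
    return mappings


def queue_for_user(mappings: List[QueueMapping], user: str) -> Optional[str]:
    """Return the queue of the first user mapping that matches, or None."""
    for mapping in mappings:
        if mapping.kind == "u" and mapping.source == user:
            return mapping.queue
    return None
```

With the value from the example reconfiguration, queue_for_user(parse_queue_mappings("u:hadoop:alpha"), "hadoop") returns "alpha". A user without a mapping gets None in this sketch.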

Verify your modifications

Access the YARN ResourceManager Web UI to verify that your modifications have taken effect.
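
Besides the web UI, the ResourceManager exposes its scheduler state through a REST endpoint, /ws/v1/cluster/scheduler. The following Python sketch shows one way to list queue names and capacities from that response. The sample payload is a heavily trimmed, illustrative stand-in and not output captured from a real cluster. The function names are hypothetical, and the URL that you pass to fetch_scheduler (typically the primary node on port 8088, the YARN default) is an assumption about your environment.

```python
import json
from urllib.request import urlopen

# Illustrative, heavily trimmed stand-in for a /ws/v1/cluster/scheduler response.
SAMPLE_RESPONSE = """
{
  "scheduler": {
    "schedulerInfo": {
      "type": "capacityScheduler",
      "queueName": "root",
      "queues": {
        "queue": [
          {"queueName": "default", "capacity": 40.0},
          {"queueName": "alpha", "capacity": 30.0},
          {"queueName": "beta", "capacity": 30.0}
        ]
      }
    }
  }
}
"""


def fetch_scheduler(resource_manager_url):
    """Fetch the scheduler document from a ResourceManager base URL."""
    with urlopen(f"{resource_manager_url}/ws/v1/cluster/scheduler") as response:
        return json.load(response)


def queue_capacities(payload):
    """Map each child queue of root to its configured capacity."""
    info = payload["scheduler"]["schedulerInfo"]
    queues = info.get("queues", {}).get("queue", [])
    return {queue["queueName"]: queue["capacity"] for queue in queues}
```

For the sample payload, queue_capacities(json.loads(SAMPLE_RESPONSE)) returns the default, alpha, and beta queues with capacities 40.0, 30.0, and 30.0. On a live cluster, pass the result of fetch_scheduler to queue_capacities instead.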

The following is an example of a Spark job submitted on an Amazon EMR 6.4.0 cluster that uses the preceding example reconfiguration:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=512 --conf spark.executor.memoryOverhead=512 /usr/lib/spark/examples/jars/spark-examples.jar 100

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
...
...
...
22/11/29 07:58:07 INFO Client: Application report for application_1669707794547_0001 (state: ACCEPTED)
22/11/29 07:58:08 INFO Client: Application report for application_1669707794547_0001 (state: RUNNING)

The application application_1669707794547_0001 is submitted to the queue alpha.

Related information

Hadoop: Capacity Scheduler on the Apache Hadoop website

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads

AWS OFFICIAL
Updated a year ago