How can I create queues on my Amazon EMR YARN CapacityScheduler?


How do I create queues on my Amazon EMR Hadoop YARN CapacityScheduler?

Short description

By default, an Amazon EMR cluster has a single YARN queue, named default. You can add queues to your cluster and allocate the cluster's available resource capacity among them.

Resolution

Create a reconfiguration command

The following example reconfiguration makes these changes:

Creates two additional queues, alpha and beta.
Allocates 30% of the cluster's total resource capacity to each of the new queues. The capacities of all queues under the same parent queue (here, root) must add up to 100, so in this example the capacity of the default queue decreases to 40%.
Grants both new queues access to all node labels (the "*" value), so that they can use the core nodes, which carry the CORE node label.
Maps the hadoop user to the alpha queue with the yarn.scheduler.capacity.queue-mappings parameter. Each mapping has the form u:user:queue for a user or g:group:queue for a group, and you separate multiple mappings with commas. You can also use placeholders: u:%user:%user maps each user to a queue with the same name as the user, and u:%user:%primary_group.%user maps each user to a queue named after the user under a parent queue named after the user's primary group. In this example, the value u:hadoop:alpha routes applications from the hadoop user to the newly created alpha queue.

Note: The capacity for each queue’s access to the core label matches the capacity of the queue itself. So, the core partition splits between queues at the same ratio as the rest of the cluster.

- Classification: capacity-scheduler
  Properties:
    yarn.scheduler.capacity.root.queues: 'default,alpha,beta'
    yarn.scheduler.capacity.root.default.capacity: '40'
    yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity: '40'
    yarn.scheduler.capacity.root.alpha.capacity: '30'
    yarn.scheduler.capacity.root.alpha.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.alpha.accessible-node-labels.CORE.capacity: '30'
    yarn.scheduler.capacity.root.beta.capacity: '30'
    yarn.scheduler.capacity.root.beta.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.beta.accessible-node-labels.CORE.capacity: '30'
    yarn.scheduler.capacity.queue-mappings: 'u:hadoop:alpha'
  Configurations: []
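Before you apply the reconfiguration, you can check the capacity rule described above programmatically. The following Python sketch holds the capacity-scheduler properties from the example in a dictionary, checks that the capacities of root's child queues add up to 100 both for the default (unlabeled) partition and for the CORE label, and then builds keyword arguments for the boto3 EMR client's modify_instance_groups call. The cluster ID and instance group ID are hypothetical placeholders, which instance group you reconfigure depends on your cluster, and the queue mapping property is left out for brevity.

```python
import json
import math

# Capacity properties copied from the example reconfiguration.
CAPACITY_SCHEDULER_PROPS = {
    "yarn.scheduler.capacity.root.queues": "default,alpha,beta",
    "yarn.scheduler.capacity.root.default.capacity": "40",
    "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity": "40",
    "yarn.scheduler.capacity.root.alpha.capacity": "30",
    "yarn.scheduler.capacity.root.alpha.accessible-node-labels": "*",
    "yarn.scheduler.capacity.root.alpha.accessible-node-labels.CORE.capacity": "30",
    "yarn.scheduler.capacity.root.beta.capacity": "30",
    "yarn.scheduler.capacity.root.beta.accessible-node-labels": "*",
    "yarn.scheduler.capacity.root.beta.accessible-node-labels.CORE.capacity": "30",
}


def child_capacity_total(props, parent="root", label=None):
    """Sum the capacities of the child queues of `parent`.

    With label=None this sums <queue>.capacity; with a label such as "CORE"
    it sums <queue>.accessible-node-labels.<label>.capacity instead.
    """
    prefix = f"yarn.scheduler.capacity.{parent}."
    children = [q.strip() for q in props[prefix + "queues"].split(",")]
    suffix = f"accessible-node-labels.{label}.capacity" if label else "capacity"
    return sum(float(props[f"{prefix}{q}.{suffix}"]) for q in children)


def validate(props, labels=("CORE",)):
    """Raise ValueError unless root's children add up to 100 in every partition."""
    for label in (None, *labels):
        total = child_capacity_total(props, label=label)
        if not math.isclose(total, 100.0):
            name = label or "default partition"
            raise ValueError(f"capacities for {name} sum to {total}, not 100")


def build_request(cluster_id, instance_group_id, props):
    """Build keyword arguments for boto3's EMR modify_instance_groups call."""
    validate(props)
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "Configurations": [
                    {"Classification": "capacity-scheduler", "Properties": props},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical IDs: replace them with your own cluster and instance group IDs.
    request = build_request("j-XXXXXXXXXXXXX", "ig-XXXXXXXXXXXXX", CAPACITY_SCHEDULER_PROPS)
    print(json.dumps(request, indent=2))
    # To send the request (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("emr").modify_instance_groups(**request)
```

Validating before building the request means that a typo such as alpha at 40% fails locally, instead of surfacing only after the reconfiguration reaches the cluster.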

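Note: By default, the yarn.scheduler.capacity.queue-mappings-override.enable parameter is set to false. With this default, a queue mapping applies only to applications that are submitted to the default queue, which is where applications go when they don't name a queue. If a user names another queue at submission, for example with spark-submit --queue beta, then YARN uses that queue. To make the mapping override the queue that users specify, set yarn.scheduler.capacity.queue-mappings-override.enable to true. For more information, see Enable override of default queue mappings on the Hortonworks Docs website.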
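To see how a mapping value resolves to a queue, the following Python sketch models the lookup for an application that a user submits without naming a queue. It is a simplified model, not YARN's implementation: mappings are tried left to right and the first match wins, the first entry of the hypothetical groups list stands in for the user's primary group, and unlike YARN it does not check that the target queue exists.

```python
def _matches(kind, source, user, groups):
    """Check whether one mapping's source matches the submitting user."""
    if kind == "u":
        # "%user" matches any user; otherwise the user name must match exactly.
        return source in (user, "%user")
    if kind == "g":
        return source in groups
    return False


def resolve_queue(mappings, user, groups):
    """Return the queue for an application that `user` submits without a queue.

    `mappings` is a value such as "u:hadoop:alpha" or "u:%user:%user,g:dev:beta".
    """
    primary_group = groups[0] if groups else ""
    for entry in mappings.split(","):
        kind, source, target = entry.strip().split(":", 2)
        if _matches(kind, source, user, groups):
            # Expand the %user and %primary_group placeholders in the target.
            return target.replace("%user", user).replace("%primary_group", primary_group)
    # No mapping matched, so the application stays in the default queue.
    return "default"


print(resolve_queue("u:hadoop:alpha", "hadoop", ["hadoop"]))  # prints alpha
```

With the value from the example, any user other than hadoop finds no matching entry and stays in the default queue.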

Verify your modifications

Open the YARN ResourceManager web UI (by default, port 8088 on the primary node) to verify that the alpha and beta queues appear with the capacities that you configured.

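The following example shows a Spark job that the hadoop user submits on an Amazon EMR 6.4.0 cluster with the preceding reconfiguration applied. The command doesn't specify a queue, so the u:hadoop:alpha mapping places the job in the alpha queue: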

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=512 --conf spark.executor.memoryOverhead=512 /usr/lib/spark/examples/jars/spark-examples.jar 100

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
...
...
...
22/11/29 07:58:07 INFO Client: Application report for application_1669707794547_0001 (state: ACCEPTED)
22/11/29 07:58:08 INFO Client: Application report for application_1669707794547_0001 (state: RUNNING)

Application application_1669707794547_0001 is submitted to the queue "alpha".
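Besides the web UI, you can check the same information through the ResourceManager REST API. The following Python sketch, which uses only the standard library, reads the configured capacity of each child queue of root from the /ws/v1/cluster/scheduler endpoint and the queue of one application from /ws/v1/cluster/apps/<application ID>. It assumes that it runs on the primary node with the ResourceManager on its default port 8088, and the response field names it reads (queueName, capacity, queue) follow the Hadoop ResourceManager REST API documentation; verify them against your Hadoop version.

```python
import json
import urllib.request

# Assumption: this runs on the primary node, where the ResourceManager serves
# its REST API on the default web port 8088. Change RM_URL for other setups.
RM_URL = "http://localhost:8088"


def get_json(path, rm_url=RM_URL):
    """GET a ResourceManager REST endpoint and decode the JSON response."""
    request = urllib.request.Request(rm_url + path, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def child_queue_capacities(scheduler):
    """Map each child queue of root to its configured capacity (% of root)."""
    queues = scheduler["scheduler"]["schedulerInfo"]["queues"]["queue"]
    return {q["queueName"]: q["capacity"] for q in queues}


def app_queue(app_report):
    """Return the queue recorded in an application report."""
    return app_report["app"]["queue"]


def main(app_id):
    """Print the queue capacities and the queue that app_id was placed in."""
    print(child_queue_capacities(get_json("/ws/v1/cluster/scheduler")))
    print(app_queue(get_json(f"/ws/v1/cluster/apps/{app_id}")))
```

For the job above, call main("application_1669707794547_0001") on the primary node. Keeping the parsing in separate functions lets you test it against saved responses without a running cluster.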

Related information

Hadoop: Capacity Scheduler on the Apache Hadoop website

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
