AWS Open Source Blog

Using multiple queues and instance types in AWS ParallelCluster 2.9

Since its release as an officially supported AWS tool and open source project in November 2018, AWS ParallelCluster has made it simple for high performance computing (HPC) customers to set up easy-to-use environments with compute, storage, job scheduling, and networking in the cloud in one cohesive package. These clusters can cater to a wide variety of needs and workloads including computational fluid dynamics (CFD), molecular dynamics, machine learning/artificial intelligence, seismic imaging, genomics, weather research and forecasting (WRF), and financial risk modeling.

In many cases, these workloads can have multiple stages or different requirements, depending on the particulars of the case on which they are working. As a result, customers have asked for ways to make their HPC clusters more flexible so that they can run a larger variety of workloads from within a single cluster. Since the AWS ParallelCluster 2.9.0 release, a new functionality makes that very thing possible. In this blog post, we walk through possible use cases for when to consider using these new functionalities, as well as how to take advantage of these features. For those who are new to AWS ParallelCluster or who have not set up a cluster before, get started by completing one of our online workshops for HPC.

What’s new in AWS ParallelCluster 2.9

There are two new features available in AWS ParallelCluster 2.9 that we focus on in this post: support for multiple job submission queues and support for multiple instance types per queue, both of which are available when using the Slurm job scheduler.

Although these features introduce new settings, this version of AWS ParallelCluster remains backward compatible with configuration files from earlier versions of the product. The new pcluster-config convert command can assist in converting a configuration file with a single queue to the multi-queue format. Find additional instructions in the product documentation.
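
For example, a single-queue configuration file can be converted into a new multi-queue file along the lines of the following sketch. The paths and the -c/-o flags shown here are illustrative assumptions; run pcluster-config convert -h to confirm the exact options in your installed version.

pcluster-config convert -c ~/.parallelcluster/config -o ~/.parallelcluster/config.multi-queue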

Multiple job queues

A common setup inside many traditional HPC environments is to use multiple queues that have varying levels of priority and different compute resources associated with them. This helps ensure that the most urgent workloads have prioritized access to any resources they require, whereas other workloads can wait longer or run on lower-cost resources. Although AWS offers scale beyond what traditional on-premises environments can provide, so you often don't have to queue up or prioritize your jobs at all, a similar framework is possible using Amazon Elastic Compute Cloud (Amazon EC2) On-Demand Instances and Amazon EC2 Spot Instances.

For urgent or uninterruptible workloads, we can set up an AWS ParallelCluster job queue that uses Amazon EC2 On-Demand Instances. For fault-tolerant workloads, we can set up a queue that uses Amazon EC2 Spot Instances to save up to 90% compared to On-Demand pricing.

Configuring a cluster with this dual-queue setup works similarly to earlier versions of AWS ParallelCluster, but uses a new setting (queue_settings) and section label (queue) inside the configuration file to specify more than one queue. Each queue section also includes a new compute_resource_settings parameter that points to the new compute_resource section where we define the instance type(s) we would like to use within each queue.

The following is an example configuration file that illustrates how we could define a cluster with an EC2 Spot queue and an EC2 On-Demand queue, as described earlier. The settings that are new in 2.9.0 are queue_settings, the queue sections, and the compute_resource sections. Note that this feature is only available when using the Slurm job scheduler.

In this example, we set up two queues, one using EC2 On-Demand Instances and one using EC2 Spot Instances, both based on the c5n.18xlarge instance type. We also disable hyperthreading and enable EFA for MPI jobs.

[global]
cluster_template = default
update_check = true
sanity_check = true

[aws]
aws_region_name = us-east-1

[cluster default]
base_os = alinux2
scheduler = slurm
key_name = <your_key>
vpc_settings = public
queue_settings = od-queue, spot-queue

[queue od-queue]
compute_resource_settings = c5n-od
compute_type = ondemand
disable_hyperthreading = true
enable_efa = true
placement_group = DYNAMIC

[queue spot-queue]
compute_resource_settings = c5n-spot
compute_type = spot
disable_hyperthreading = true
enable_efa = true
placement_group = DYNAMIC

[compute_resource c5n-od]
instance_type = c5n.18xlarge
min_count = 0
max_count = 10
initial_count = 2

[compute_resource c5n-spot]
instance_type = c5n.18xlarge
min_count = 0
max_count = 10 

[vpc public]
vpc_id = <your_vpc_id>
master_subnet_id = <your_subnet_id>

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

Current AWS ParallelCluster users will notice that most of these settings are familiar. Notice, however, that some options that previously appeared in the cluster section have migrated to the queue section; in this example, enable_efa, placement_group, and disable_hyperthreading are the options that have migrated. Additionally, the settings for the type and number of instances have migrated to the compute_resource section. This ensures that each queue can be configured independently when using AWS ParallelCluster with more than one queue at a time.

After creating our cluster (pcluster create -c <your_config_file> <your_cluster_name>), we can test our setup by logging in to our cluster’s head node and submitting a job to each of the two queues:

pcluster ssh <your_cluster_name> -i <your_keyfile>

cat > hello_world.sh << 'EOF'
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
EOF

sbatch --partition=od-queue hello_world.sh
sbatch --partition=spot-queue hello_world.sh

From there, we can verify that the jobs are running using the squeue command, which shows the state of all submitted jobs, and use the sinfo command to see information about all of our partitions and the nodes that are available in an active or inactive state.
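
For example, running the two standard Slurm commands from the head node produces a view like the screenshot below:

squeue    # show queued and running jobs, along with the partition each was submitted to
sinfo     # show the od-queue and spot-queue partitions and the state of their nodes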

Screenshot of output showing all partitions and nodes that are available in an active or inactive state.

Multiple instance types

In addition to using multiple queues to distinguish between use cases such as Amazon EC2 Spot versus Amazon EC2 On-Demand Instances, the latest version of AWS ParallelCluster makes it possible to specify multiple instance types within a single cluster. These different instance types can be used to distinguish between tasks that run best on one instance type versus another. For example, a multi-stage computational fluid dynamics workflow can include a first stage in which the grid or mesh is generated, followed by a stage in which a computational solver runs the simulation and produces the simulation output.

With AWS ParallelCluster, separating these multi-step workflows into either multiple queues (as shown previously) or multiple instance types within the same queue is now possible. In both cases, we can select distinct compute resources optimized for the price/performance or memory/CPU of each step in our workflow or for different workload types. Note that this feature is only available when using the Slurm job scheduler.

Here is one use case of a cluster configuration for a tightly coupled workload, such as the computational fluid dynamics workflow described earlier. We handle the two preceding steps (meshing and simulation) by using multiple instance types within a single job submission queue:

[global]
cluster_template = default
update_check = true
sanity_check = false

[aws]
aws_region_name = us-east-1

[cluster default]
base_os = alinux2
scheduler = slurm
key_name = <your_key>
vpc_settings = public
efs_settings = customfs
queue_settings = cfd

[queue cfd]
compute_resource_settings = mesh-compute, sim-compute
compute_type = ondemand
disable_hyperthreading = true
enable_efa = true 
placement_group = DYNAMIC

[compute_resource mesh-compute]
instance_type = c5.2xlarge
min_count = 0
max_count = 6
initial_count = 2

[compute_resource sim-compute]
instance_type = c5n.18xlarge
min_count = 0
max_count = 16  
initial_count = 2

[efs customfs]
shared_dir = efs
encrypted = false
performance_mode = generalPurpose

[vpc public]
vpc_id = <your_vpc_id>
master_subnet_id = <your_subnet_id>

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
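
With this configuration, one way to direct each stage of the workflow to the right instance type is to combine the partition name with a --constraint on the instance type when submitting jobs, as shown later in this post. The mesh_job.sh and sim_job.sh script names below are hypothetical:

sbatch --partition=cfd --constraint=c5.2xlarge mesh_job.sh
sbatch --partition=cfd --constraint=c5n.18xlarge sim_job.sh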

Another example, useful for loosely coupled workloads, comes from financial services. Financial services customers need to run portfolio risk simulations constantly. A risk simulation usually starts with a portfolio stratification analysis, a step that requires a large number of CPU cores but does not need much memory per core. The next step is a portfolio risk analysis of large amounts of data with complex risk models; this step is both memory and compute intensive.

With AWS ParallelCluster 2.9, customers can mix and match instance types with different memory-per-core ratios on the same cluster to adapt to their varied risk simulation workloads.

For example:

  • A portfolio stratification step can use a number of compute-optimized c5.2xlarge instances.
  • A portfolio risk simulation step can use a number of memory-optimized r5.8xlarge instances.

As in the previous CFD example, we use the new queue and compute_resource settings; in this setting, we also mix EC2 Spot and EC2 On-Demand Instances depending on the priority of the jobs. For portfolio stratification, we use EC2 Spot Instances, and for portfolio risk simulation, we use EC2 On-Demand Instances to avoid any interruption. The relevant sections look like the following (with queue_settings = portfolio-stratification, portfolio-risk added to the cluster section):

[queue portfolio-stratification]
compute_resource_settings = portfolio-stratification
compute_type = spot

[compute_resource portfolio-stratification]
instance_type = c5.2xlarge
min_count = 0
max_count = 20
initial_count = 2

[queue portfolio-risk]
compute_resource_settings = portfolio-risk
compute_type = ondemand

[compute_resource portfolio-risk]
instance_type = r5.8xlarge
min_count = 0
max_count = 10
initial_count = 2
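
Jobs for each step can then be submitted to the matching queue by partition name, just as in the first example. The script names below are hypothetical:

sbatch --partition=portfolio-stratification stratification_job.sh
sbatch --partition=portfolio-risk risk_simulation_job.sh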

The last example is a loosely coupled genomic analysis in which each analysis job can run on any of several instance types with a similar memory-per-vCPU ratio (for example, 4 GB of memory per vCPU). By using multiple instance types such as m5.2xlarge, m5.4xlarge, and m5.8xlarge, all of which offer this ratio, we can take advantage of a larger pool of instances for our analysis.

Let’s say we want to run 1,000 analyses, and each analysis requires one vCPU with 4 GB of memory. A config file can look like the following, allowing up to 45 m5.2xlarge (8 vCPUs each), 20 m5.4xlarge (16 vCPUs each), and 10 m5.8xlarge (32 vCPUs each) instances, for a combined 360 + 320 + 320 = 1,000 vCPUs.

[queue analysis]
compute_resource_settings = m5.2xlarge, m5.4xlarge, m5.8xlarge
compute_type = ondemand

[compute_resource m5.2xlarge]
instance_type = m5.2xlarge
min_count = 0
max_count = 45

[compute_resource m5.4xlarge]
instance_type = m5.4xlarge
min_count = 0
max_count = 20  

[compute_resource m5.8xlarge]
instance_type = m5.8xlarge
min_count = 0
max_count = 10  
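
With this queue in place, one way to run the 1,000 analyses is as a Slurm job array, letting the scheduler pack single-vCPU tasks onto whichever of the three instance types are available. The analyze.sh per-analysis script below is hypothetical, and memory-based scheduling is not assumed here; each of these instance types simply offers roughly 4 GB per vCPU:

sbatch --partition=analysis --array=1-1000 --cpus-per-task=1 analyze.sh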

After creating a new cluster with one of these configurations, and logging in to the cluster as before, we can target a specific instance type within a queue by using the --constraint flag along with the sbatch command (AWS ParallelCluster registers each compute node's instance type as a Slurm node feature):

sbatch --constraint=c5.2xlarge <workflow 1 script>
sbatch --constraint=r5.8xlarge <workflow 2 script>

Updating an existing cluster

Although these new features can add significant flexibility to our cluster, we may find that we want to create additional queues, add instance types, or modify existing queues and instance types after the cluster has been created. As with previous versions of AWS ParallelCluster, we can update an existing, running cluster by modifying our configuration file and using the pcluster update command:

pcluster update -c <new_config_file> <cluster_name>

Using this command, we could, for example, update a cluster created from the first sample configuration file to match the second one. With this additional functionality, we can add flexibility to our HPC clusters and switch between the over 300 instance types available on AWS.
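
Depending on which settings change (for example, when queues are added or removed), the compute fleet may need to be stopped before the update can be applied. A typical sequence, assuming no jobs are running, looks like the following:

pcluster stop <cluster_name>
pcluster update -c <new_config_file> <cluster_name>
pcluster start <cluster_name>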

Conclusion

In this post, we have demonstrated how to use some of the newest features available inside AWS ParallelCluster’s latest 2.9 release to enable more complex and varied HPC workflows. We highlighted new settings that make it possible for users to specify multiple job queues in addition to multiple instance types from within a single cluster. We also demonstrated how this can make running Amazon EC2 Spot Instances for interruptible workloads alongside Amazon EC2 On-Demand Instances possible. Finally, we highlighted how to specify more than one instance type in cases in which there may be multiple types of workloads running or multiple stages in the HPC workflows.

With this additional functionality, orchestrating and running more complex HPC workflows on AWS using AWS ParallelCluster and the breadth and depth of instance types available with Amazon EC2 is easier than ever.

Nathan Stornetta

Nathan Stornetta is a Senior Product Manager for High Performance Computing (HPC) at AWS. He works on products, like AWS ParallelCluster, that make it easier and more cost-effective for customers to run their HPC workloads on AWS. In his spare time, he enjoys baseball, reading Spanish-language science fiction, and playing board games with his family.

Anh Tran

Anh Tran is a Principal HPC Solutions Architect at Amazon Web Services. He works with AWS customers worldwide, including those in the genomics, CAE, and financial services domains, to design and implement a variety of HPC solutions on AWS.