AWS HPC Blog

Deep dive into the AWS ParallelCluster 3 configuration file

In September, we announced the release of AWS ParallelCluster 3, a major release with lots of changes and new features. To help get you started migrating your clusters, we provided the Moving from AWS ParallelCluster 2.x to 3.x guide. We know moving versions can be quite an undertaking, so we’re augmenting that official documentation with additional color and context on a few key areas. In this blog post, we’ll focus on the configuration file format changes in ParallelCluster 3, and how they map back to the equivalent configuration sections in ParallelCluster 2.

The AWS ParallelCluster 3 configuration file

The first major change to discuss is that AWS ParallelCluster 3 restricts a configuration file to defining a single cluster resource. Previously, you could define multiple cluster configurations within the same configuration file, and then provide an option to the command line interface (CLI) to specify which cluster you were operating on. The ParallelCluster 3 CLI instead asks you to provide the configuration file for the specific cluster resource you want to operate on.
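For example, with the ParallelCluster 3 CLI you pass the configuration file directly to the command that needs it; the cluster and file names in this sketch are placeholders:

```bash
# Create a cluster from a configuration file that defines a single cluster
pcluster create-cluster \
    --cluster-name mycluster \
    --cluster-configuration cluster.yaml

# Check on its provisioning status
pcluster describe-cluster --cluster-name mycluster
```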

We believe that associating a configuration file with a single cluster (along with some other changes we’ll discuss later), will make each file more readable and maintainable in the long run.

With this in mind, when you migrate a ParallelCluster 2 configuration file that defines multiple clusters to version 3, you’ll need to create an individual configuration file for each cluster. Any resource setting that is referenced from more than one cluster definition will need to be repeated in each destination configuration file.

Introducing the ParallelCluster Configuration Converter

To help you transform your ParallelCluster configuration file from the version 2 to the version 3 specification, we have introduced a configuration converter tool, which is available starting in ParallelCluster 3.0.1. This tool takes a ParallelCluster 2 configuration file as input and outputs a ParallelCluster 3 configuration file. It manages the transformation of various parameter specifications while accounting for functional differences between ParallelCluster 2 and ParallelCluster 3, and it provides verbose messages to highlight those differences with additional information, warnings, or error messages. There’s more on the tool in the online documentation. We’ll discuss the specifics of the configuration file changes later in this post, but this tool will help you when you are ready to migrate. In line with ParallelCluster 3’s approach of one cluster per configuration file, the config converter will migrate one cluster section at a time for you, as specified (by you) on the command line using the ‘--cluster-template’ option.
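For instance, assuming a ParallelCluster 2 file named config.ini containing a [cluster mycluster] section, an invocation might look like the following sketch (the file and section names are illustrative):

```bash
# The converter ships with ParallelCluster 3.0.1 and later
pip3 install "aws-parallelcluster>=3.0.1"

# Convert one cluster section from the version 2 file into a version 3 file
pcluster3-config-converter \
    --config-file config.ini \
    --cluster-template mycluster \
    --output-file cluster.yaml
```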

Syntax changes

The next major thing you will notice is that the configuration file now uses YAML instead of INI syntax. We think this improves readability and maintainability by collecting resource types under a tree structure.

To better understand the differences between ParallelCluster version 2 and 3, we will break down the analysis into the following high-level components of a cluster: Head Node, Scheduler and Compute Nodes, Storage, and Networking. Note that while these examples are not exhaustive, they cover the most important options and changes to give you a good sense of what to look for when you migrate your own configuration files.

A note on inclusive language

You’ll also have noticed that we have started using the term “head node” in lieu of “master node”. The language we use and what we choose to name things reflect our core values. For the past couple of years, it’s been a goal of ours to change some problematic language for cluster resources. The scope of what we wanted to accomplish for version 3 presented us with a golden opportunity to finally make changes that break from such traditional non-inclusive naming.

Across the entire product, we no longer refer to a ‘master node’, but instead to a ‘head node’ (and that extends to names for environment variables like MASTER_IP, which is now PCLUSTER_HEAD_NODE_IP).

Configuration file sections

The HeadNode section

The following table lists the configuration options for a cluster head node, and contrasts the two configuration file formats with ParallelCluster 2 and ParallelCluster 3 side-by-side.

AWS ParallelCluster version 2:

[vpc public]
vpc_id = vpc-2f09a348
master_subnet_id = subnet-b46032ec
ssh_from = 0.0.0.0/0

[cluster mycluster]
key_name = My_PC3_KeyPair
base_os = alinux2
scheduler = slurm
master_instance_type = c5n.18xlarge
vpc_settings = public
queue_settings = multi-queue,spot,ondemand

AWS ParallelCluster version 3:

HeadNode:
  InstanceType: c5n.18xlarge
  Networking:
    SubnetId: subnet-b46032ec
  Ssh:
    KeyName: My_PC3_KeyPair
    AllowedIps: 0.0.0.0/0

Notice that ParallelCluster 2’s [cluster] section contains configuration settings for the head node, compute nodes, and scheduler within the same section, yet it splits the SSH ingress rule and key pair name across the [vpc] and [cluster] sections, respectively. In contrast, ParallelCluster 3 has a distinct HeadNode section which only contains settings that relate to the head node and does not contain any information about the compute nodes or scheduler. Also note that the ParallelCluster 3 version only asks for the subnet to deploy to, since the VPC can be inferred from that.

Another practice we’re leaving behind is ParallelCluster 2’s use of ad hoc pointers in configuration files. Sections that needed to refer to a resource defined in another section of the file had an attribute whose name was prefixed with the type of resource (“vpc” or “queue”) and ended with a “_settings” suffix. The value was a “pointer” to another section in the configuration. In our example, the vpc_settings = public attribute pointed to the [vpc public] section. When the concept is simple, it’s a workable methodology, and it’s a common pattern for INI files. But maintenance and understanding became more difficult once there were many sections being referenced, each of which had pointers of their own to other sections. While ParallelCluster itself didn’t lose track of all these pointer references, humans did. This was the case for defining scheduler queues, which we’ll talk about in the next section.

There are many more configuration options in the HeadNode section, some of which are similar to their ParallelCluster 2 counterparts. You’ll find more on this in the HeadNode section of the documentation. One new capability not shown in the previous example is the ability to set IAM permissions specific to the head node, separate from the compute nodes.
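As a sketch, granting the head node an extra managed policy (without touching the compute fleet) looks something like the following; the policy ARN here is just an example:

```yaml
HeadNode:
  # ... other HeadNode settings (InstanceType, Networking, Ssh) ...
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```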

Scheduling and ComputeResources sections

A common pattern for cluster configuration files (and a great way to use the cloud) is to define multiple queues with different underlying compute resources. In ParallelCluster 2, you defined a [cluster] section with pointers to one or more [queue] sections. Each [queue] section had more pointers to [compute_resource] sections, which could be shared with other queue sections! If you made a change to a [compute_resource], you might introduce unwanted changes to another [queue] section.

ParallelCluster 3 configuration files avoid this problem by providing a hierarchy of resources. A Scheduling section contains a set of queues, where each queue contains the ComputeResource definitions. The following table shows an example of a Slurm cluster that defines multiple queues, and again shows the version 2 and 3 definitions side by side:

AWS ParallelCluster version 2:

[cluster multi-queue]
scheduler = slurm
queue_settings = q1_ondemand, q2_spot

[queue q1_ondemand]
compute_resource_settings = ondemand_i1

[queue q2_spot]
compute_resource_settings = spot_i1, spot_i2
compute_type = spot

[compute_resource ondemand_i1]
instance_type = c5.2xlarge

[compute_resource spot_i1]
instance_type = c5.xlarge
min_count = 0
max_count = 10

[compute_resource spot_i2]
instance_type = t2.micro
min_count = 1

AWS ParallelCluster version 3:

Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: 10
    Dns:
      DisableManagedDns: true
  SlurmQueues:
    - Name: q1_ondemand
      CapacityType: ONDEMAND
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 100
      Networking:
        SubnetIds:
          - subnet-a12321bc
      ComputeResources:
        - Name: compute-resource-1
          InstanceType: c5.2xlarge
          MinCount: 0
          MaxCount: 64
    - Name: q2_spot
      CapacityType: SPOT
      Networking:
        SubnetIds:
          - subnet-a12321bc
      ComputeResources:
        - Name: spot_i1
          InstanceType: c5.xlarge
          MinCount: 0
          MaxCount: 10
        - Name: spot_i2
          InstanceType: t2.micro
          MinCount: 1

The new format defines a Scheduling section which allows one or more queues (in this case, SlurmQueues) to be defined within it. Each queue defines its ComputeResources in a self-contained child structure. This may cause a little redundancy (like repeating the subnet information) if compute resources have the same structure across queues, but it makes each resource definition internally consistent, and thus easier to maintain over the long term.

Some notes on network configuration

It’s worth explaining a couple of things about networking in a little more detail.

First, in the previous example you saw that each queue required a Networking section with SubnetIds specified. This raises the question of whether you can define different subnets for different compute resources. The answer is “no” – in ParallelCluster you still need to maintain the same subnet specification across all compute resources. You can’t provide different subnets for different queues (yet).

Next, there’s some detail to understand about enabling the Elastic Fabric Adapter (EFA). You’ll recall that EFA is a network interface for Amazon EC2 instances that supports applications requiring high levels of inter-node communication at scale. In ParallelCluster 3, EFA is specified within a ComputeResources/Efa subsection for each queue that needs it, by setting ‘Enabled: true’. However, there’s one more step: you also need to specify whether you want to use a placement group for EFA, or not. It would be unusual not to use a cluster placement group with EFA, but we didn’t want the configuration syntax to exclude this choice. If you have a specific placement group you want to use, you can specify it, or ParallelCluster will create one for you.

AWS ParallelCluster version 2:

[cluster default]
# … other settings …
queue_settings = q1_mpi
enable_efa = compute
placement_group = DYNAMIC

AWS ParallelCluster version 3:

Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q1_mpi
      Networking:
        SubnetIds:
          - subnet-a12321bc
        PlacementGroup:
          Enabled: true
      ComputeResources:
        - Name: c5n18xlarge
          InstanceType: c5n.18xlarge
          Efa:
            Enabled: true

Separating the head node from the compute nodes

There are some other sections that relate to compute resources, like specifying local instance storage for nodes, or defining a custom AMI for the compute nodes that is separate from the head node’s AMI. One notable difference from ParallelCluster 2 is that in ParallelCluster 3 you can define separate IAM permissions and custom bootstrap actions for the head node versus the compute nodes (see the Iam and CustomActions settings under each of the HeadNode and Scheduling sections).

SharedStorage section

In ParallelCluster 3, we aligned the storage options for instances (the ephemeral and root volumes) to the sections where the resource is defined (like the HeadNode, or the ComputeSettings for a specific queue).

Shared storage configuration settings are separated from these, under the SharedStorage section of the configuration file. There, you can define up to five Amazon Elastic Block Store (Amazon EBS) volumes, one Amazon Elastic File System (Amazon EFS) file system, and one Amazon FSx for Lustre file system, which will be shared across all cluster resources.

In contrast to ParallelCluster 2, where a default EBS volume (mounted at /shared) was always created if no other shared volumes were specified, ParallelCluster 3 doesn’t define a default shared volume – you need to explicitly define one. The Configuration Converter tool does explicitly define a shared volume when you use it to migrate your configuration file. If you don’t need it, you’re free to remove this, or alter it, before you launch your new cluster.

AWS ParallelCluster version 2:

[ebs myebs]
shared_dir = /shared
volume_type = gp3
volume_size = 100

[fsx myfsx]
shared_dir = /lustre
storage_capacity = 1200
import_path = s3://myhpcbucket
deployment_type = SCRATCH_2

AWS ParallelCluster version 3:

SharedStorage:
  - MountDir: /shared
    Name: myebs
    StorageType: Ebs
    EbsSettings:
      VolumeType: gp3
      Size: 100
  - MountDir: /lustre
    Name: myfsx
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      DeploymentType: SCRATCH_2
      ImportPath: s3://myhpcbucket

Finally, in ParallelCluster 2 you defined IAM permissions for access to Amazon Simple Storage Service (Amazon S3) buckets that applied to both your head node and compute nodes. In ParallelCluster 3, you’re free to define separate access rules for each resource type. Refer to the S3Access documentation in the HeadNode and Scheduling sections (for the compute fleets) for more on this.
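As an illustration, the following sketch gives the head node read-write access to a bucket while a compute queue gets read-only access; the bucket and queue names are placeholders:

```yaml
HeadNode:
  Iam:
    S3Access:
      - BucketName: myhpcbucket
        EnableWriteAccess: true
Scheduling:
  SlurmQueues:
    - Name: q1
      Iam:
        S3Access:
          - BucketName: myhpcbucket  # read-only by default
```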

Conclusion

In this post, we took a deep dive into parts of the ParallelCluster 3 configuration file, and how it differs from the previous version. We explained the hierarchical arrangement of resources, as well as key parts of the configured resources (head node, queues, compute resources, storage, etc.) and how they all fit together in ParallelCluster 3 configurations.

Migrating a ParallelCluster 2 cluster definition to ParallelCluster 3 can be a relatively straightforward process: the conceptual components remain the same, with some changes to how those components are organized in the configuration file. The YAML formatting is simple, and the hierarchical structure is more intuitive, making configuration files easier to read and maintain.

To help reduce the burden of manually translating parts of a ParallelCluster 2 configuration to ParallelCluster 3, we’ve developed a tool to transform your configuration files from version 2 to version 3, which is available starting in ParallelCluster 3.0.1. You can find more details about the tool in the online documentation.

To get started with AWS ParallelCluster 3, you can follow one of our step-by-step workshops, or watch an HPC Tech Short.

Austin Cherian

Austin is a Senior Developer Advocate for High Performance Computing (HPC) and Batch at AWS. Previously, Austin was the Technical Lead for the HPC practice in Asia Pacific & Japan at AWS, responsible for supporting wider adoption of HPC on AWS and helping customers on their journey to HPC in the cloud. Prior to AWS, Austin was the Head of Intel’s HPC & AI business for India, where he led the team that helped customers adopt High Performance Computing on Intel architectures across a diverse set of HPC workloads, as well as the Intel software development tools. While at Intel, Austin was also part of Intel’s global HPC applications engineering team, responsible for optimizing scientific computing codes for next-generation Intel architectures. Austin holds a bachelor’s degree in Computer Science, a master’s in Software Systems, and an Executive MBA specializing in strategy and finance, in addition to various AWS certifications.

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.