Easing your migration from SGE to Slurm in AWS ParallelCluster 3
By Nick Ihli, Director of Cloud at SchedMD, and Austin Cherian, Senior Product Manager-Technical for HPC at AWS.
Selecting a job scheduler is an important decision because of the investment you make in time and effort to use it effectively to run your HPC jobs. Key to any software selection, though, is knowing there’s a robust support framework and a track record of innovation.
In June 2020, we announced that we would stop using the Son of Grid Engine (SGE) and Torque job schedulers. We took that decision because the open-source software (OSS) repositories for these two projects had seen no community updates for many years. That makes them higher risk as vectors for attack because “no updates” also means “no patches” for vulnerabilities that are discovered. With every ParallelCluster 2.x release, we worked harder (and harder) to tighten the protective net around these packages to ensure we meet your expectations of AWS in the shared responsibility model. But with ParallelCluster 3, we shifted to only directly supporting schedulers with viable support models. We’ve worked closely with SchedMD, the maintainers and developers of Slurm, to enhance Slurm so it works even better with ParallelCluster.
December 31, 2021 was the last date of support for SGE and Torque in ParallelCluster. The clusters created with these won’t stop working, of course, but to keep your operations going knowing that AWS’s service and support teams are there to help you, it’s time to start your migration to Slurm or AWS Batch. This blog post will help you do that for Slurm.
To help you understand the details of moving from SGE to Slurm, we’ll present two perspectives: the user and the administrator. Both require gaining familiarity and comfort using the scheduler’s client and job submission commands so you can effectively manage users and cluster resources and run interactive or scripted job submissions. Figure 1 breaks down how the user and administrator roles generally make use of the client and job submission commands.
Migrating from a legacy scheduler like SGE to Slurm is like driving a new car. You may know in theory how it operates, but you’ll need some time to find and understand how to operate all the right controls, figure out the radio, and build some muscle memory. To facilitate a hands-on approach to this migration, it’s useful to have side-by-side commands for important functions of the scheduler in each of the areas we outlined in Figure 1.
In this blog, we’ll detail these aspects so you can run your jobs using Slurm. We’ll also discuss methods and show you some tools that make it easy for SGE users to get comfortable with Slurm quickly, including some specialized wrapper commands that will close this gap rapidly.
Since Slurm’s command-line syntax is different, we’ll use figures that show a side-by-side comparison of the SGE and equivalent Slurm commands. Besides learning new commands, you will learn how to find the specific information you need from the command’s output. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation.
System Makeup and Info
The first command, sinfo, is one of Slurm’s major commands that gives insight into the node and partition information. The sinfo command output in Figure 2 lists partitions, nodes in each partition, and the state those nodes are in. Partitions are equivalent to queues in SGE. Nodes can exist in multiple partitions. When a node is allocated to a job, its state changes and another line will be displayed, showing the node(s) in the specific state associated with the partition. To see a more specific view on each node, we can run
Here are some examples of commands and their outputs:
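As a minimal sketch of the kind of commands this section describes, the helper below wraps two common sinfo invocations. The function name cluster_overview is ours and purely illustrative; -N (node-oriented) and -l (long format) are standard sinfo options. Output depends entirely on your cluster, so none is shown here.

```shell
# Hypothetical helper (not part of Slurm): print a partition summary,
# then a per-node listing.
cluster_overview() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on a node with the Slurm client tools" >&2
        return 127
    fi
    # Default view: one line per partition and node state.
    sinfo || return
    # Node-oriented (-N), long format (-l): one line per node per partition.
    sinfo -N -l
}
```

Call cluster_overview from the head node of your cluster. For full detail on a single node, scontrol show node <nodename> is another option.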
You’ll frequently need to get information about jobs. You can do this using Slurm’s squeue command (like SGE’s qstat). In a later section, we’ll show you some wrapper scripts for squeue that can be used to obtain job information in SGE’s own output format, which provides compatibility with any other utilities you might have created locally. Figure 3 shows a side-by-side comparison of the two commands and their expected output. For more information on the
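To make the squeue usage concrete, here is a small sketch. The helper name my_jobs is hypothetical; -u (filter by user) and -j (filter by job ID) are standard squeue options, roughly analogous to qstat -u and qstat -j in SGE.

```shell
# Hypothetical helper: list the calling user's jobs, or a single job by ID.
my_jobs() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found; run this on a node with the Slurm client tools" >&2
        return 127
    fi
    if [ -n "${1:-}" ]; then
        # One specific job.
        squeue -j "$1"
    else
        # All jobs owned by the current user.
        squeue -u "${USER:-$(id -un)}"
    fi
}
```

Running my_jobs with no argument lists your own jobs; my_jobs 1234 narrows the view to job 1234 (an example ID).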
Job Submission and Control
In both SGE and Slurm, you can submit jobs interactively or via a job script. In this section, we’ll compare both methods, including how to translate a job script from SGE to Slurm and how to submit it. We’ll also compare using job arrays in Slurm and SGE. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation.
Job Submission using a job script
We can submit job scripts in Slurm using the sbatch command. In a job script, we provide arguments to the scheduler before specifying commands to execute on the cluster. In SGE, these arguments are specified with #$. Slurm uses #SBATCH instead for job requirements. Figure 4 illustrates two job scripts that show the Slurm equivalent
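The directive translation can be sketched as a small sed filter. This is an illustration under stated assumptions, not an official tool: the function name sge_to_sbatch_directives is ours, it covers only a handful of common options, and the mapping of -pe slots to --ntasks is one reasonable choice (a shared-memory parallel environment may map better to --cpus-per-task). Anything it does not recognize passes through unchanged for manual review.

```shell
# Hypothetical filter: rewrite a few common "#$" directives as "#SBATCH".
# Reads a job script on stdin and writes the translated script to stdout.
# Only h_rt values in HH:MM:SS form are converted, because a bare number
# means seconds in SGE but minutes in Slurm.
sge_to_sbatch_directives() {
    sed -E \
        -e 's/^#\$[[:space:]]+-N[[:space:]]+([^[:space:]]+)/#SBATCH --job-name=\1/' \
        -e 's/^#\$[[:space:]]+-q[[:space:]]+([^[:space:]]+)/#SBATCH --partition=\1/' \
        -e 's/^#\$[[:space:]]+-o[[:space:]]+([^[:space:]]+)/#SBATCH --output=\1/' \
        -e 's/^#\$[[:space:]]+-e[[:space:]]+([^[:space:]]+)/#SBATCH --error=\1/' \
        -e 's/^#\$[[:space:]]+-l[[:space:]]+h_rt=([0-9]+:[0-9]+:[0-9]+)/#SBATCH --time=\1/' \
        -e 's/^#\$[[:space:]]+-pe[[:space:]]+[^[:space:]]+[[:space:]]+([0-9]+)/#SBATCH --ntasks=\1/'
}
```

For example, sge_to_sbatch_directives < job_sge.sh > job_slurm.sh (both file names are placeholders) gives you a starting point that you should still review by hand before submitting.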
After creating a job script, submitting a job is easy: we use qsub <script name> in SGE and sbatch <script name> in Slurm. You can then use squeue, which we described in the previous section, to verify your job’s status and monitor its progress.
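The submit-then-check workflow might look like the sketch below. The helper name submit_and_watch is hypothetical. It relies on sbatch’s --parsable option, which prints only the job ID (followed by a semicolon and the cluster name in multi-cluster setups).

```shell
# Hypothetical helper: submit a job script, then show its queue entry.
submit_and_watch() {
    script="$1"
    # --parsable prints "jobid" or "jobid;cluster" instead of a sentence.
    job_id="$(sbatch --parsable "$script")" || return
    # Drop the optional ";cluster" suffix.
    job_id="${job_id%%;*}"
    echo "Submitted job ${job_id}"
    squeue -j "$job_id"
}
```

Usage would be submit_and_watch myjob.sh, where myjob.sh is a placeholder for your own script.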
Environment variables control different aspects of submitted jobs and can be used in job scripts. Most of Slurm’s environment variables are self-explanatory. Figure 5 shows a side-by-side comparison of SGE and Slurm environment variables.
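If existing scripts read SGE-style variables, one transitional approach is to populate them from their Slurm counterparts at the top of the job script. The sketch below is an assumption about how you might bridge the two, and the function name export_sge_style_vars is ours. The variable pairs themselves (for example JOB_ID and SLURM_JOB_ID, or SGE_TASK_ID and SLURM_ARRAY_TASK_ID) are the standard names in each scheduler.

```shell
# Hypothetical shim: expose SGE-style variable names inside a Slurm job.
# Each assignment is skipped when the Slurm variable is unset or empty.
export_sge_style_vars() {
    [ -n "${SLURM_JOB_ID:-}" ]        && export JOB_ID="$SLURM_JOB_ID"
    [ -n "${SLURM_JOB_NAME:-}" ]      && export JOB_NAME="$SLURM_JOB_NAME"
    [ -n "${SLURM_SUBMIT_DIR:-}" ]    && export SGE_O_WORKDIR="$SLURM_SUBMIT_DIR"
    [ -n "${SLURM_NTASKS:-}" ]        && export NSLOTS="$SLURM_NTASKS"
    [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && export SGE_TASK_ID="$SLURM_ARRAY_TASK_ID"
    return 0
}
```

Treat this as a stopgap: over time it is cleaner to reference the SLURM_* variables directly.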
Another common type of job that you will migrate to Slurm is a job array. Job arrays help to increase throughput massively by using the parallelism of your cluster efficiently. For example, job arrays with millions of tasks can be submitted in milliseconds. Slurm handles an array initially as a single job record. Additional job records are created only as needed, typically when a task of a job array starts. This drastically increases the scalability of a Slurm environment when managing large job counts.
Slurm optimizes backfill scheduling when using arrays. Backfill scheduling is typically a heavier operation, as Slurm is looking for those jobs that fit within the backfill window. For job arrays, once an element of a job array is discovered to not be runnable, or affects the scheduling of higher priority pending jobs, the remaining elements of that job array will be quickly skipped. You can find more details on Slurm’s backfill scheduling in the “scheduling configuration guide” for Slurm.
The #SBATCH option for job arrays is ‘-a’ or ‘--array=’. This is like SGE, with range options and the option to limit how many tasks in an array can run at once. Figure 6 shows examples comparing job array scripts between SGE and Slurm.
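Here is a minimal sketch of a Slurm array job script. The job name, the input-file naming scheme, and the input_for_task helper are all illustrative assumptions. The %2 suffix caps concurrently running tasks at two, which plays the role of SGE’s -tc option alongside -t, and SLURM_ARRAY_TASK_ID takes the place of SGE_TASK_ID.

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=1-10%2
#SBATCH --output=array-demo_%A_%a.out
# --array=1-10%2 runs task indices 1..10, at most 2 at a time.
# In --output, %A expands to the array job ID and %a to the task index.

# Hypothetical naming scheme: task N processes input_N.dat.
input_for_task() {
    echo "input_${1}.dat"
}

# Outside Slurm the variable is unset, so fall back to 1 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
echo "Task ${task_id} would process $(input_for_task "$task_id")"
```

Because the directives are ordinary comments to the shell, you can run the script directly with bash to check its logic before submitting it with sbatch.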
As an SGE user, you might run interactive jobs sometimes. Slurm also supports interactive job submission. Where SGE provides commands such as qrsh for interactive jobs, Slurm has salloc. The recommended method is to set the parameter LaunchParameters=use_interactive_step in your slurm.conf file, and use salloc to submit the interactive job. salloc will then grant an allocation and place the user on the allocated node, ready for interactive commands.
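Put together, the interactive setup could be sketched as follows. The slurm.conf line comes from the text above. The helper name interactive_session and the one-node, one-hour request are illustrative choices, and any extra arguments are passed straight to salloc.

```shell
# slurm.conf (edited by the administrator; configuration, not a command):
#   LaunchParameters=use_interactive_step

# Hypothetical helper: request one node for an hour and open a shell on it.
interactive_session() {
    if ! command -v salloc >/dev/null 2>&1; then
        echo "salloc not found; run this on a node with the Slurm client tools" >&2
        return 127
    fi
    # Extra arguments (for example a partition) are forwarded unchanged.
    salloc --nodes=1 --time=01:00:00 "$@"
}
```

For example, interactive_session --partition=debug would request the session in a partition named debug, which is a placeholder for one of your own partitions.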
Command Wrappers for Migrating to Slurm
From the administrator’s perspective, helping users learn new commands can be a challenge, leading to longer migration timeframes. To help ease the migration process, SchedMD developed some wrapper translation scripts. The wrappers are not meant as a replacement for the Slurm commands, but as temporary helper scripts.
There are many command wrappers to try, including
Figure 8 walks through an example comparison between Slurm’s
squeue and using the
As you can see, the output format more closely resembles SGE’s
There are some caveats to note, however. The qsub wrapper won’t read #$ directives from the job script; it looks for #SBATCH options instead. That’s because it pipes the job script’s content through to sbatch, interpreting the qsub command-line options where necessary, rather than parsing and re-interpreting SGE’s job scripts. The job script will need to be in Slurm format to use this qsub wrapper for Slurm.
A convenient option for the qsub wrapper is --sbatchline. This outputs the translated sbatch command but does not actually submit the job. This is helpful to users who want to understand the Slurm equivalent of an SGE submission string. Figure 9 shows an example of how this works:
The ‘--sbatchline’ option helps you find the corresponding Slurm syntax for your SGE qsub incantation without digging through documentation.
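A dry-run helper around the wrapper might look like this sketch. It assumes that the qsub found on your PATH is the Slurm wrapper script rather than a real SGE installation; the function name preview_translation is ours. We do not show output, because the exact format depends on the wrapper version you have installed.

```shell
# Hypothetical helper: show how the qsub wrapper would translate a
# submission, without submitting anything.
# Assumes "qsub" on PATH is the Slurm wrapper, not SGE's own qsub.
preview_translation() {
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub wrapper not found on PATH" >&2
        return 127
    fi
    # --sbatchline prints the equivalent sbatch command instead of submitting.
    qsub --sbatchline "$@"
}
```

For example, preview_translation -N myjob job.sh prints the sbatch equivalent of that submission, where myjob and job.sh are placeholders.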
Slurm provides other methods to make it easy for users to submit jobs successfully. For example, a job submit plugin or cli_filter allows administrators to take users’ submissions, run validation checks, and enforce job requirements based on those checks. These are powerful, and it’s best to check out Slurm’s documentation directly for job submit plugins and CLI filter plugins.
If you’re an SGE user, migrating to ParallelCluster 3 means change. This is brought about by the end of support for SGE, which follows from the lack of community interest in maintaining SGE’s open-source codebase. We recommend migrating to ParallelCluster 3 now, and making the switch to either Slurm or AWS Batch.
In this blog we’ve covered the things you need to know to make a move to Slurm, which is the obvious HPC successor to SGE, and the most friction-free path. As we illustrated in this post, the learning curve for switching to Slurm isn’t steep but requires getting used to the new syntax and some nuance.
In this post, we described a detailed migration from SGE to Slurm and provided side-by-side commands to make the migration easy. We recommend a hands-on approach: try each Slurm feature as you migrate. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation, or watch our 5-part series of HPC Tech Shorts to see the teams from AWS and SchedMD showing them in action.
If you need additional support, SchedMD (the official maintainers of Slurm) offer commercial packages to help you with migrations to Slurm. SchedMD also offer professional services in AWS Marketplace to support your development.