Deploying an HPC cluster and remote visualization in a single step using AWS ParallelCluster
Since its initial release in November 2018, AWS ParallelCluster (an AWS-supported open source tool) has made it easier and more cost-effective for users to deploy and manage HPC clusters in the cloud. The team has continued to improve the product with more configuration flexibility, built-in support for the Elastic Fabric Adapter network interface, and integration with the Amazon Elastic File System and Amazon FSx for Lustre shared filesystems. With the release of version 2.5.0, ParallelCluster further simplifies cluster deployments by adding native support for NICE DCV on the master node.
NICE DCV is an AWS high-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming from any cloud or data center to any device, over varying network conditions. Using NICE DCV, customers can run graphics-intensive HPC applications remotely on EC2 instances and stream the results to their local machine at no additional charge, eliminating the need for expensive dedicated workstations.
Prior to the version 2.5.0 release, HPC users requiring remote visualization would typically have to deploy another EC2 instance and then install and configure a DCV server on it. This entailed installing prerequisites such as a window manager, configuring display drivers, installing the DCV server software, configuring secure authentication, and configuring desktop sessions. With the release of ParallelCluster 2.5.0, all you need to do is enable DCV in the ParallelCluster config file, and all of this is done for you. The remainder of this post walks through the process of provisioning an HPC cluster to run a simulation using NAMD, and then visualizing the results in DCV.
For anyone unfamiliar with ParallelCluster or looking to deploy a cluster for the first time, additional background and introductory concepts are detailed in the ParallelCluster launch post. The ParallelCluster User Guide is also useful as a reference.
The first step after installing ParallelCluster is to create or update the ParallelCluster configuration file. By default, ParallelCluster obtains its settings from the .parallelcluster/config file located in the user’s home directory. A comprehensive list of all configuration parameters can be found in the configuration section of the ParallelCluster documentation.
ParallelCluster 2.5.0 greatly simplifies the pcluster configuration process. Previously, when you launched pcluster configure, you were prompted for information that you had to track down and copy and paste into the terminal. In version 2.5.0, the configuration program queries your VPC and presents you with a numbered list of options to choose from.
To create an initial configuration file, simply type pcluster configure. The configuration wizard will prompt you for required inputs and provide a list of menu options. You can choose to deploy clusters in an existing VPC, or the program can create one for you. This post deploys a cluster in an existing VPC, so that is the scenario we will walk through.
You can edit your configuration file or simply run pcluster create [cluster-name] to create your cluster.
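As a minimal sketch of that last step, the wrapper below refuses to call pcluster create until a config file exists at the default location. The name create_if_configured is a hypothetical helper, not part of ParallelCluster; pcluster create is the only real CLI call in it.

```shell
# Hypothetical helper: create a cluster only once a config file is in place.
create_if_configured() {
    cluster_name="$1"
    # Default location that ParallelCluster reads its settings from.
    config_file="$HOME/.parallelcluster/config"
    if [ ! -f "$config_file" ]; then
        echo "No config at $config_file; run 'pcluster configure' first." >&2
        return 1
    fi
    pcluster create "$cluster_name"
}
```

Calling create_if_configured hpcblog would then create a cluster named hpcblog, provided the configuration wizard has already written the file.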
Once the configuration file is written, you’ll need to make a few modifications to enable NICE DCV on the Master node. This is enabled in two steps:
- Add a line dcv_settings = default to the cluster section of the config file.
- Create a dcv section that references the value you chose for dcv_settings, and include the line enable = master.
In the example config file below, I’ve highlighted the DCV-related items in bold. I’ve also chosen to deploy the cluster with a GPU-enabled Master instance. A Master instance typically does not need a lot of memory and CPU because it does not participate in any job runs; a c5.large instance would typically suffice. In this case, the master node is hosting NICE DCV remote desktop sessions, so I chose to deploy a g4dn.xlarge because it provides a cost-effective option for graphics-intensive applications.
Example pcluster config file (DCV-related lines only)
[cluster default]
dcv_settings = default

[dcv default]
enable = master
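One way to sanity-check those two settings before creating the cluster is sketched below. The name check_dcv is a hypothetical helper, not part of the pcluster CLI, and it assumes the plain key = value layout used in this post, with section headers of the form [dcv name] and no inline comments.

```shell
# Hypothetical helper: confirm that dcv_settings points at a [dcv ...] section
# that contains "enable = master".
check_dcv() {
    config_file="$1"
    # Read the label that dcv_settings refers to, e.g. "default".
    label=$(sed -n 's/^dcv_settings[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$config_file" | head -n 1)
    if [ -z "$label" ]; then
        echo "no dcv_settings line found"
        return 1
    fi
    # Scan only the matching [dcv <label>] section for enable = master.
    if awk -v section="[dcv $label]" '
        /^\[/ { in_section = ($0 == section); next }
        in_section && /^enable[ \t]*=[ \t]*master[ \t]*$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$config_file"; then
        echo "DCV enabled on the master node via [dcv $label]"
    else
        echo "[dcv $label] is missing enable = master"
        return 1
    fi
}
```

Running check_dcv ~/.parallelcluster/config after editing the file gives a quick confirmation that the cluster section and the dcv section agree on the same label.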
Create a cluster
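To create a cluster, issue the pcluster create command followed by a cluster name, for example pcluster create hpcblog (hpcblog is the cluster name used in the connection example below).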
You should have a cluster and remote desktop available within minutes. Once your create command completes, you can connect to your desktop session and run a simulation job.
Connect to a NICE DCV session
To connect to your NICE DCV session, simply run pcluster dcv connect [cluster] -k [keyname], where [cluster] specifies the name you gave your cluster and -k specifies the location of your private key. Once you are successfully authenticated, your NICE DCV session will launch in a browser and your credentials will be securely passed to the NICE DCV server using an authorization token.
bk:~ bkknorr$ pcluster dcv connect hpcblog -k hpc-key.pem
Job submission using NAMD
The following demonstration uses NAMD and VMD to run a molecular dynamics job and then visualize the output. This demonstration uses the first part of Unit 3 of the Membrane Protein Tutorial, starting on page 34. The requisite software and tutorial files need to be uploaded to a shared filesystem.
Upon connecting to your NICE DCV session, launch a terminal and navigate to your working directory containing the tutorial files. The job submission example below uses Sun Grid Engine (sge), the default ParallelCluster job scheduler.
qsub -cwd -pe mpi 4 -N kcsa_popcwimineq-01 -o kcsa_popcwimineq-01.log kcsa_popcwimineq-01.job
Breaking down the above command:
- qsub: submit a job to Sun Grid Engine (sge).
- -cwd: execute the job from the current working directory.
- -pe mpi 4: request four slots from the parallel environment named mpi. The number of slots requested determines how many cores are used to run the job. The p3.2xlarge compute instance I am using has four cores and eight threads. I’ve disabled hyperthreading, so I’m requesting only four slots rather than eight.
- -N kcsa_popcwimineq-01: [optional] name of the job I am running.
- -o kcsa_popcwimineq-01.log: [optional] name of the output logfile.
- kcsa_popcwimineq-01.job: name of the input file.
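The slot arithmetic and the flags above can be combined in a short sketch. Both function names are hypothetical, and the script only prints the qsub line rather than submitting anything, so it is safe to run away from the cluster.

```shell
# Hypothetical helper: one SGE slot per physical core.
physical_cores() {
    vcpus="$1"
    threads_per_core="$2"
    echo $((vcpus / threads_per_core))
}

# Hypothetical helper: print (not run) the qsub line for a job name and slot count.
qsub_line() {
    job="$1"
    slots="$2"
    echo "qsub -cwd -pe mpi $slots -N $job -o $job.log $job.job"
}

# p3.2xlarge: eight threads, two per core, so four slots.
slots=$(physical_cores 8 2)
qsub_line kcsa_popcwimineq-01 "$slots"
# prints: qsub -cwd -pe mpi 4 -N kcsa_popcwimineq-01 -o kcsa_popcwimineq-01.log kcsa_popcwimineq-01.job
```

Keeping the slot count derived from the instance's core count, rather than hard-coded, makes it harder to oversubscribe cores if the compute instance type changes.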
Visualize output in NICE DCV
Once your simulation job completes, you can visualize your results using VMD. The first step is to download and install VMD.
Once VMD is installed, you can launch vmd from your terminal to view the output of your job.
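If the job wrote a DCD trajectory alongside a PSF structure file (an assumption about the tutorial's outputs, not something stated in this post), launching VMD on both might look like the sketch below. The name view_results is a hypothetical helper; -psf and -dcd are VMD startup options, and the file names are whatever the caller passes in.

```shell
# Hypothetical helper: open a structure file and its trajectory in VMD.
view_results() {
    psf="$1"
    dcd="$2"
    # Fail early with a clear message instead of opening an empty VMD session.
    for f in "$psf" "$dcd"; do
        if [ ! -f "$f" ]; then
            echo "Missing file: $f" >&2
            return 1
        fi
    done
    vmd -psf "$psf" -dcd "$dcd"
}
```

From a terminal inside the NICE DCV session, view_results followed by the structure and trajectory paths opens VMD's windows on the remote desktop.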
As demonstrated in this post, AWS ParallelCluster 2.5.0 further simplifies HPC cluster deployment and management. With the ability to provision compute resources and remote visualization with a single command, HPC practitioners can iterate more quickly and reduce time to insight.
Learn more about ParallelCluster on GitHub.