AWS HPC Blog

Integrating Research and Engineering Studio with AWS ParallelCluster

Research and Engineering Studio on AWS (RES) is an easy-to-use, self-service portal where researchers and engineers can access and manage their cloud-based workspaces, with persistent storage and secure virtual desktops for accessing their data and running interactive applications.

The people who use RES are also frequently users of HPC clusters. However, RES today doesn’t ship with direct integration with other AWS solutions for HPC, like AWS ParallelCluster.

So today, we’re happy to announce a new HPC recipe which creates a RES-compatible ParallelCluster login node software stack. This solution uses new features added in RES 2024.04, including RES-ready AMIs and project launch templates. A RES-ready AMI lets you pre-install RES dependencies for your virtual desktop instances (VDIs), baking software and configuration into your images and improving boot times for your end users.

These AMIs can be registered as a RES software stack, which your end users can use to create a VDI that joins a ParallelCluster and becomes their own personal login node. Using this recipe, you’ll see how to create customized software stacks that can be used as conduits to access other native AWS cloud services.

ParallelCluster login node for RES

Starting from version 3.7.0, ParallelCluster provides login nodes for users to access the Slurm-based environment in the cloud. Both RES and ParallelCluster support multi-user access using Active Directory (AD) integration. With that mechanism in place, shared storage and user authentication become easier to manage.

We start with an existing ParallelCluster that’s integrated with the same AD and using the same ldap_id_mapping setting. We take a snapshot of a login node of the cluster, turn it into an Amazon Machine Image (AMI), and use it as the base image within an EC2 Image Builder recipe to create a RES-ready AMI. The ldap_id_mapping setting is an installation-level setting used by the System Security Services Daemon (SSSD). When it’s enabled, SSSD generates a UID and GID for each user algorithmically, instead of reading POSIX attributes from the directory.

This setting is used in both RES and ParallelCluster. In RES, it’s enabled through the EnableLdapIDMapping parameter of the AWS CloudFormation template you use to launch the product.
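If you launch RES from the command line rather than the console, that parameter is passed like any other CloudFormation stack parameter. Here’s a minimal sketch: the stack name and template location are placeholders, and the real RES template takes other required parameters that we’ve omitted here:

aws cloudformation create-stack \
    --stack-name my-res-environment \
    --template-url https://<location-of-the-res-template> \
    --parameters ParameterKey=EnableLdapIDMapping,ParameterValue=True \
    --capabilities CAPABILITY_NAMED_IAM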

In ParallelCluster, it’s configured in the DirectoryService section of the configuration file, as part of AdditionalSssdConfigs (more on this in our docs), like this:

DirectoryService:
  …
  AdditionalSssdConfigs:
    ldap_id_mapping: true

The RES-compatible login node AMI is created through an AWS Systems Manager (SSM) automation document. SSM allows you to gather insights and automate tasks across your AWS accounts.
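If you’re curious what running that looks like, an automation execution can be started from the AWS CLI roughly like this (the document name and parameter below are placeholders; the recipe defines the real ones):

aws ssm start-automation-execution \
    --document-name "<name-of-the-login-node-automation-document>" \
    --parameters "ClusterName=<your-parallelcluster-name>"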

This automation process performs two tasks:

  1. It creates an Amazon Machine Image (AMI) from the login node.
  2. It updates the ParallelCluster head node security group with ingress rules that allow connections from RES virtual desktops. Slurm (ports 6819-6829) and NFS (port 2049) ingress is allowed from the RESPCLoginNodeSG security group, which will later be associated with RES project(s) when creating login node VDI instances (see the sketch after this list).
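To make that second task concrete, here’s roughly what those ingress rules would look like if you added them by hand with the AWS CLI (the security group IDs are placeholders; the automation handles this for you):

# Allow Slurm traffic from RES virtual desktops in RESPCLoginNodeSG
aws ec2 authorize-security-group-ingress \
    --group-id <head-node-security-group-id> \
    --protocol tcp --port 6819-6829 \
    --source-group <RESPCLoginNodeSG-id>

# Allow NFS so the virtual desktop can mount the cluster’s shared storage
aws ec2 authorize-security-group-ingress \
    --group-id <head-node-security-group-id> \
    --protocol tcp --port 2049 \
    --source-group <RESPCLoginNodeSG-id>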

After the RES-compatible login node AMI has been created, we can create the RES-ready AMI using EC2 Image Builder. EC2 Image Builder simplifies building and maintaining “golden images”. Our SSM automation process creates an Image Builder recipe that includes the base AMI, plus an additional component for the RES login node that configures the image to work as a RES VDI. The resulting AMI from the Image Builder pipeline will be used to create a RES software stack.
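For a feel of what the automation assembles, a stripped-down version of that recipe created with the AWS CLI might look like the following (the AMI ID, component ARN, and version are placeholders):

aws imagebuilder create-image-recipe \
    --name res-pcluster-login-node \
    --semantic-version 1.0.0 \
    --parent-image <login-node-base-ami-id> \
    --components componentArn=<arn-of-the-res-login-node-component>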

Once the RES-ready AMI has been created by Image Builder, a RES administrator can log in and create the Project and Software Stack in two steps:

  1. Update a RES Project to modify the Launch template (or create a project if it’s their first time using RES). The RESPCLoginNodeSG must be added to any project that requires access to the ParallelCluster. This can be added in the RES project Resource Configurations -> Advanced Options -> Add Security Groups configuration section.
  2. Create a new Software Stack using the RES-ready AMI created from the Image Builder pipeline.

Once the software stack has been created, end users that have access to it as part of a Project can create their own dedicated login node virtual desktops.

Virtual desktop login node in action

End-users will access the login node virtual desktop in the same way they access other virtual desktops.

Figure 1 – The Virtual Desktops page contains a virtual desktop (PC-LoginNode372) which is based on a LoginNode instance compatible with ParallelCluster.

Once end users have access to the login node VDI, they can interact with ParallelCluster in the same way they’re used to. They’ll have access to the same shared storage in the ParallelCluster Login Node and in the RES VDI. The next couple of screenshots show examples of shared storage accessed from both the Login Node and the RES VDI.

Figure 2 – A terminal session on a ParallelCluster Login Node showing the user shared storage directory listing

Figure 3 – A RES virtual desktop session showing the same shared storage directory accessible from the ParallelCluster Login Node

End users can now interact with a ParallelCluster (PC) from a RES VDI. This VDI acts similarly to a ParallelCluster login node with the added benefit that end-users in the RES environment can launch their own login node, with VDI, any time they need one.
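Figure 4 shows this in action. As a quick reference, a session from the login node VDI looks just like one on any Slurm login node (the job script and options here are only illustrative):

# List the cluster’s partitions and nodes
sinfo

# Submit a batch job to the cluster
sbatch my-job.sh

# Check on your jobs in the queue
squeue -u $USER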

Figure 4 – A virtual desktop session showing examples of Slurm commands demonstrating the integration with ParallelCluster.

Getting started with RES-compatible ParallelCluster login nodes

You can follow the steps to create a RES-compatible login node for your ParallelCluster by heading to the Login Node for Research and Engineering Studio recipe in the HPC Recipes Library (a great resource, if you’re not already familiar with it).

Conclusion

Integrating AWS ParallelCluster and Research and Engineering Studio lets end users working in interactive desktops tap into HPC when they need to process large amounts of data. It’s a great experience, because not only does this put large-scale computational power in the hands of scientists, it does so in a way that’s friendly to use and accessible any time they need it.

Doug Morand

Doug Morand is a solutions architect on the Worldwide Public Sector team with Amazon Web Services (AWS). He primarily works with higher education and research institutions to help architect and design cloud computing solutions.

Jianjun Xu

Dr. Jianjun Xu is a Principal Solutions Architect in the AWS Higher Education Research team. He was an astrophysicist and a senior software development executive in EdTech before he joined AWS. He focuses on research computing and specializes in AI/ML and HPC.