AWS HPC Blog
Integrating Research and Engineering Studio with AWS ParallelCluster
Research and Engineering Studio on AWS (RES) is an easy-to-use self-service portal that lets researchers and engineers access and manage cloud-based workspaces, with persistent storage and secure virtual desktops for working with their data and running interactive applications.
The people who use RES are also frequently users of HPC clusters, but RES does not currently ship with direct integration with other AWS solutions for HPC, such as AWS ParallelCluster.
So today, we’re happy to announce a new HPC recipe that creates a RES-compatible ParallelCluster login node software stack. This solution uses features introduced in RES 2024.04, including RES-ready AMIs and project launch templates. A RES-ready AMI has the RES dependencies for your virtual desktop instances (VDIs) pre-installed, so you can pre-bake software and configuration into your images and improve boot times for your end users.
These AMIs can be registered as a RES software stack, which end users can use to create a VDI that joins an existing ParallelCluster cluster and becomes their own personal login node. Using this recipe, you’ll also see how to create customized software stacks that serve as conduits to other native AWS cloud services.
ParallelCluster login node for RES
Starting with version 3.7.0, ParallelCluster provides login nodes that give users access to the Slurm-based environment in the cloud. Both RES and ParallelCluster support multi-user access through Active Directory (AD) integration. With that mechanism in place, shared storage and user authentication become easier to manage.
We start with an existing ParallelCluster cluster that’s integrated with the same AD as RES and uses the same ldap_id_mapping setting. We take a snapshot of one of the cluster’s login nodes, turn it into an Amazon Machine Image (AMI), and use that AMI as the base image in an EC2 Image Builder recipe to create a RES-ready AMI. The ldap_id_mapping setting is an installation-level setting used by the System Security Services Daemon (SSSD). When enabled, it lets SSSD generate UIDs and GIDs from Active Directory security identifiers (SIDs) instead of reading them from POSIX attributes stored in AD.
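To illustrate why both environments must agree on this setting, here is a minimal sketch of SSSD-style ID mapping, assuming SSSD’s default ldap_idmap_range_min and ldap_idmap_range_size of 200000. Real SSSD chooses the slice index by hashing the domain SID; this sketch takes it as a parameter instead. The function name sid_to_posix_id and the sample SID are hypothetical.

```python
def sid_to_posix_id(sid: str, slice_index: int,
                    range_min: int = 200_000,
                    range_size: int = 200_000) -> int:
    """Simplified SSSD-style SID -> POSIX ID mapping (hypothetical sketch).

    Assumes SSSD's default range settings. Real SSSD derives slice_index by
    hashing the domain SID; here it is passed in explicitly.
    """
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    rid = int(parts[-1])  # relative identifier: the last SID component
    if rid >= range_size:
        # SSSD would continue in a secondary slice; this sketch does not.
        raise ValueError("RID does not fit in the primary slice")
    # Each domain owns one slice of the ID space; the RID is an offset into it.
    return range_min + slice_index * range_size + rid


if __name__ == "__main__":
    sid = "S-1-5-21-1111111111-2222222222-3333333333-1105"
    print(sid_to_posix_id(sid, slice_index=3))  # 801105
```

Because the result depends only on the SID and the mapping settings, a RES VDI and a cluster node configured identically resolve the same user to the same UID, which keeps file ownership on shared storage consistent.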
This setting is used in both RES and ParallelCluster. In RES, it’s enabled through the EnableLdapIDMapping AWS CloudFormation parameter that you set when launching the product.
In ParallelCluster, it’s configured in the DirectoryService section of the cluster configuration file, under AdditionalSssdConfigs (more on this in our docs), like this:
```yaml
DirectoryService:
  …
  AdditionalSssdConfigs:
    ldap_id_mapping: true
```
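Since a mismatch between the two settings shows up as different UIDs for the same user, it can be worth checking both before building the AMI. A minimal sketch, assuming the cluster configuration has already been parsed (for example from YAML) into a Python dict, and that the RES EnableLdapIDMapping parameter value is available as a string such as "True". The function names are hypothetical, and treating a missing key as disabled is an assumption of this sketch.

```python
def cluster_uses_id_mapping(cluster_config: dict) -> bool:
    """True if AdditionalSssdConfigs enables ldap_id_mapping (hypothetical helper)."""
    directory = cluster_config.get("DirectoryService") or {}
    sssd = directory.get("AdditionalSssdConfigs") or {}
    # Assumption: a missing key is treated as disabled.
    value = sssd.get("ldap_id_mapping", False)
    # YAML may yield a bool or a string such as "True"; normalise both.
    return str(value).strip().lower() == "true"


def id_mapping_consistent(cluster_config: dict, res_enable_ldap_id_mapping: str) -> bool:
    """Compare the cluster setting with the RES EnableLdapIDMapping parameter value."""
    res_enabled = res_enable_ldap_id_mapping.strip().lower() == "true"
    return cluster_uses_id_mapping(cluster_config) == res_enabled


if __name__ == "__main__":
    cfg = {"DirectoryService": {"AdditionalSssdConfigs": {"ldap_id_mapping": True}}}
    print(id_mapping_consistent(cfg, "True"))
```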
The RES-compatible login node AMI is created through an AWS Systems Manager (SSM) automation document. SSM allows you to gather insights and automate tasks across your AWS accounts.
This automation process performs two tasks:
- It creates an Amazon Machine Image (AMI) from the login node.
- It updates the ParallelCluster head node security group with ingress rules that allow connections from RES virtual desktops. Slurm (ports 6819-6829) and NFS (port 2049) ingress connections are allowed via the RESPCLoginNodeSG security group. This group is used later, when it’s associated with the RES project(s) whose users create login node VDI instances.
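The ingress rules from the second task can be expressed as EC2 IpPermissions entries. This is a minimal sketch, assuming both services use TCP and that the rules reference RESPCLoginNodeSG as the source security group. The function name and the sample group ID are hypothetical, and the actual automation document may build its rules differently.

```python
SLURM_PORT_RANGE = (6819, 6829)  # Slurm ports named in the post
NFS_PORT = 2049                  # NFS port named in the post


def login_node_ingress_permissions(res_login_sg_id: str) -> list:
    """Build IpPermissions entries admitting Slurm and NFS traffic from
    instances in the given security group (hypothetical helper)."""

    def rule(from_port: int, to_port: int, description: str) -> dict:
        # Assumption: TCP for both services. The source is a security group
        # rather than a CIDR range, so only members of that group are admitted.
        return {
            "IpProtocol": "tcp",
            "FromPort": from_port,
            "ToPort": to_port,
            "UserIdGroupPairs": [
                {"GroupId": res_login_sg_id, "Description": description}
            ],
        }

    return [
        rule(*SLURM_PORT_RANGE, "Slurm from RES login node VDIs"),
        rule(NFS_PORT, NFS_PORT, "NFS from RES login node VDIs"),
    ]


if __name__ == "__main__":
    for perm in login_node_ingress_permissions("sg-0123456789abcdef0"):
        print(perm["FromPort"], perm["ToPort"])
```

With boto3 and credentials configured, a list like this could be passed as `IpPermissions` to `boto3.client("ec2").authorize_security_group_ingress(GroupId=...)` for the head node security group. Using a source security group keeps the rules valid as VDIs come and go, since new instances simply join the group.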
After the RES-compatible login node AMI has been created, we can create the RES-ready AMI using EC2 Image Builder, which simplifies building and maintaining “golden images”. Our SSM automation process creates an Image Builder recipe that includes the base AMI plus an additional component for the RES login node, which configures the image to work as a RES VDI. The AMI produced by the Image Builder pipeline is then used to create a RES software stack.
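An Image Builder recipe pairs a parent image with one or more components. A minimal sketch of such a request, assuming the keyword arguments of the boto3 Image Builder `create_image_recipe` call (`name`, `semanticVersion`, `parentImage`, `components`, `clientToken`). The helper name, recipe name, AMI ID, and component ARN are hypothetical placeholders, not the values the recipe’s automation actually uses.

```python
import uuid


def image_recipe_request(name: str, version: str, base_ami_id: str,
                         component_arns: list) -> dict:
    """Keyword arguments for an Image Builder CreateImageRecipe call.

    Hypothetical helper: base_ami_id is the AMI taken from the login node,
    and component_arns would include the RES login node component.
    """
    if not base_ami_id.startswith("ami-"):
        raise ValueError("expected an AMI ID such as 'ami-...'")
    if not component_arns:
        raise ValueError("the recipe needs at least one component")
    return {
        "name": name,
        "semanticVersion": version,  # e.g. "1.0.0"
        "parentImage": base_ami_id,
        "components": [{"componentArn": arn} for arn in component_arns],
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
    }


if __name__ == "__main__":
    request = image_recipe_request(
        name="res-pc-login-node",
        version="1.0.0",
        base_ami_id="ami-0123456789abcdef0",
        component_arns=[
            "arn:aws:imagebuilder:us-east-1:111122223333:component/res-login-node/1.0.0/1"
        ],
    )
    print(request["parentImage"], len(request["components"]))
```

With credentials configured, `boto3.client("imagebuilder").create_image_recipe(**request)` would submit it; the returned recipe ARN is what an Image Builder pipeline then references.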
Once Image Builder has created the RES-ready AMI, a RES administrator can log in and set up the project and software stack in two steps:
- Update a RES project to modify its launch template (or create a project if it’s their first time using RES). The RESPCLoginNodeSG security group must be added to any project that requires access to the ParallelCluster cluster. You can add it in the project’s Resource Configurations -> Advanced Options -> Add Security Groups section.
- Create a new software stack using the RES-ready AMI produced by the Image Builder pipeline.
Once the software stack has been created, end users who have access to it through a project can create their own dedicated login node virtual desktops.
Virtual desktop login node in action
End users access the login node virtual desktop the same way they access other virtual desktops.
Once end users have access to the login node VDI, they can interact with ParallelCluster just as they’re accustomed to. The same shared storage is available on the ParallelCluster login node and in the RES VDI. The next couple of screenshots show examples of shared storage accessed from both the login node and the RES VDI.
End users can now interact with a ParallelCluster cluster from a RES VDI. This VDI acts much like a ParallelCluster login node, with the added benefit that end users in the RES environment can launch their own login node VDI whenever they need one.
Getting started with RES-compatible ParallelCluster login nodes
To create a RES-compatible login node for your ParallelCluster cluster, follow the steps in the Login Node for Research and Engineering Studio recipe, which is part of the HPC Recipes Library (a great resource if you haven’t come across it before).
Conclusion
Integrating AWS ParallelCluster with Research and Engineering Studio lets end users working on interactive desktops process large amounts of data when they need HPC. It’s a great experience: not only does it put large-scale computational power in the hands of scientists, it does so in a way that’s friendly to use and accessible whenever they need it.