AWS Open Source Blog
Managing AWS ParallelCluster SSH users with AWS OpsWorks
In a previous article, we highlighted the potential for deploying a local LDAP server to provide a mechanism for managing a multi-user AWS ParallelCluster deployment with low administrator overhead. If we want our cluster users to access or manage other AWS resources, it’s preferable to control their access via AWS Identity and Access Management (IAM). Federation with a centralized directory service and the use of third-party tools can help. However, the complexity of these solutions often presents a significant barrier, particularly in cases where HPC users manage their research computing environments themselves.
In this article, we provide an additional stepping stone for users who need to progress beyond local cluster user management toward a more complete integration with IAM. AWS OpsWorks is primarily a configuration management service integrated with Chef and Puppet. For user management, we don’t need to make use of these configuration management tools directly. Instead, we leverage an AWS OpsWorks Stacks capability that allows IAM users to be assigned an SSH key and provisioned as POSIX user accounts on a registered instance.
Account setup
In summary, the manual preparation steps needed to enable OpsWorks-managed users within AWS ParallelCluster are:
- Create an Amazon Virtual Private Cloud (Amazon VPC) and subnets to host the cluster.
- Add cluster users to IAM.
- Create a minimal OpsWorks stack targeting the cluster VPC.
- Import IAM users into OpsWorks.
Additionally, you should ensure that account quotas for both Amazon Elastic Compute Cloud (Amazon EC2) and OpsWorks suit your needs, requesting quota increases where necessary.
This guide assumes you have full administrator access to IAM within the AWS account, as well as permissions to work with Amazon EC2 and OpsWorks services. If this is not the case, consult with your account administrator to determine which steps they need to undertake on your behalf.
To begin, we create an Amazon VPC and subnets for our cluster. To see how this can be achieved using the pcluster CLI tool, check out the AWS ParallelCluster documentation. Alternatively, you can create your own networking resources using your preferred method.
Next, we need to add the desired user accounts to IAM. Note that we need to grant users either programmatic access or console access during creation. These options relate to general IAM usage; neither is required in order for the integration between OpsWorks and AWS ParallelCluster to function.
Select programmatic access if you do not intend to grant your users any access to the AWS account other than SSH access to the deployed cluster. User accounts do not need any specific access permissions granted. Although we can arrange IAM user accounts into groups if desired, these groups are not reflected in the POSIX groups within the cluster.
Once we create the user, we can optionally disable the automatically created access key. Navigate to the user via the IAM dashboard and open the security credentials tab. Delete or deactivate the access key.
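The same cleanup could also be scripted. The following is a minimal sketch, not part of the original walkthrough, that assumes a boto3-style IAM client is passed in; the function name deactivate_access_keys is our own.

```python
def deactivate_access_keys(iam, user_name):
    """Mark every active access key of `user_name` as Inactive.

    `iam` is assumed to behave like a boto3 IAM client. Only the
    list_access_keys and update_access_key calls are used.
    Returns the IDs of the keys that were changed.
    """
    changed = []
    response = iam.list_access_keys(UserName=user_name)
    for key in response["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam.update_access_key(
                UserName=user_name,
                AccessKeyId=key["AccessKeyId"],
                Status="Inactive",
            )
            changed.append(key["AccessKeyId"])
    return changed
```

With boto3 installed, this might be called as deactivate_access_keys(boto3.client("iam"), "alice"), where alice is a placeholder user name. Keys are deactivated rather than deleted so that the change can be reversed.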
Once IAM users are in place, we create a new OpsWorks stack in the same VPC we plan to use for the AWS ParallelCluster deployment. Most settings retain their default values when using the default Chef 12 stack. We will not use OpsWorks to provision any instances or apply any Chef configuration management recipes.
Once we create the stack, we are ready to import IAM users. On the OpsWorks console, navigate to Users (under OpsWorks Stacks) and then choose Import IAM users to <your region>. Select the IAM users to import, then click Import to OpsWorks. Now we are able to integrate those user accounts with any OpsWorks stacks created within that Region. (Note: If multiple Regions are used, we must import users to each Region.)
For each imported user, we next need to grant access to resources within a stack. In the table of imported users, choose the edit link for the first user. From the user dashboard, enter the SSH public key for the user. Additionally, use the check boxes under Instance access to control their access permissions and sudo privileges on a per-stack basis. The Permission level section grants the user additional IAM permissions for resources in the stack. This is not required, so leave the selection as IAM Policies Only. With this configuration, the user only has the IAM permissions already associated with their user account.
Repeat the user configuration steps for each IAM account we imported into OpsWorks. Once this setup is complete, we are ready to deploy AWS ParallelCluster.
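For more than a handful of users, the console steps above could be scripted instead. The sketch below assumes a boto3-style OpsWorks client is passed in and, as far as we understand the API, uses create_user_profile as the counterpart of the console import step and set_permission for per-stack instance access. The function name and arguments are our own.

```python
def import_user_with_ssh(opsworks, stack_id, iam_user_arn, ssh_public_key,
                         allow_sudo=False):
    """Import an IAM user into OpsWorks and grant SSH access to one stack.

    `opsworks` is assumed to behave like a boto3 OpsWorks client created
    for the Region that hosts the stack.
    """
    # Create the regional user profile and attach the SSH public key.
    # No SshUsername is passed, so OpsWorks chooses the user name itself.
    opsworks.create_user_profile(
        IamUserArn=iam_user_arn,
        SshPublicKey=ssh_public_key,
    )
    # Instance access is set per stack. Level "iam_only" corresponds to the
    # "IAM Policies Only" permission level, so no extra IAM permissions
    # are granted through the stack.
    opsworks.set_permission(
        StackId=stack_id,
        IamUserArn=iam_user_arn,
        AllowSsh=True,
        AllowSudo=allow_sudo,
        Level="iam_only",
    )
```

Calling this once per user and stack mirrors repeating the configuration steps in the console.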
Cluster deployment
To allow instances to register themselves with OpsWorks during the post-install phase, apply the AWS-managed IAM policy AWSOpsWorksInstanceRegistration to the controller and all compute instances. For deregistration, we need permissions provided by the AWSOpsWorksRegisterCLI_EC2 policy. Add these policies to the ParallelCluster configuration file via the additional_iam_policies parameter, as shown in the following example:
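As a rough sketch of what the relevant settings might look like, the following Python snippet renders a configuration fragment in the INI style used by ParallelCluster 2.x. The bucket, script name, stack ID, and the section name cluster default are placeholders we chose for illustration; a real configuration file needs further settings (networking, scheduler, and so on) that are omitted here.

```python
import configparser
import io

# Hypothetical placeholders: replace with your own script location and stack ID.
POST_INSTALL_SCRIPT = "s3://example-bucket/opsworks-post-install.sh"
OPSWORKS_STACK_ID = "00000000-0000-0000-0000-000000000000"

# AWS-managed policies named in the text above.
POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AWSOpsWorksInstanceRegistration",
    "arn:aws:iam::aws:policy/AWSOpsWorksRegisterCLI_EC2",
]


def render_cluster_section(script_url, stack_id):
    """Return an INI fragment with the OpsWorks-related cluster settings."""
    config = configparser.ConfigParser()
    config["cluster default"] = {
        "additional_iam_policies": ", ".join(POLICY_ARNS),
        "post_install": script_url,
        # The stack ID is handed to the post-install script as its argument.
        "post_install_args": f'"{stack_id}"',
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_cluster_section(POST_INSTALL_SCRIPT, OPSWORKS_STACK_ID))
```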
Note the inclusion of a post_install parameter and corresponding post_install_args. The former should be a reference to the following script (uploaded to an Amazon Simple Storage Service [Amazon S3] bucket). The latter should include the ID of the intended OpsWorks stack. Obtain this ID from the OpsWorks dashboard by clicking on the stack name and copying the OpsWorks ID. Note that this is not the same as the stack name.
Registration of instances with OpsWorks is achieved using an AWS CLI command executed within the post-install script. Instances must also be deregistered from OpsWorks when no longer needed; to accomplish this, we insert an additional script onto the controller instance to perform periodic deregistration of terminated instances via a cron job.
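To illustrate the registration step, here is a minimal sketch of how the AWS CLI call might be assembled, written in Python rather than as the shell post-install script. It assumes the aws opsworks register subcommand with the --use-instance-profile and --local options, so that the instance registers itself using the permissions of its instance profile. The function names are our own.

```python
import subprocess


def build_register_command(stack_id, region):
    """Build the argument list for registering the local instance."""
    return [
        "aws", "opsworks", "register",
        "--use-instance-profile",          # rely on the instance's IAM role
        "--infrastructure-class", "ec2",
        "--region", region,
        "--stack-id", stack_id,
        "--local",                         # register the machine we run on
    ]


def register_this_instance(stack_id, region):
    """Run the registration command and fail loudly if it does not succeed."""
    subprocess.run(build_register_command(stack_id, region), check=True)
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact call before running it on an instance.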
The deregistration script (inserted via the configure_opsworks_deregistration function) obtains the EC2 instance ID and corresponding OpsWorks ID for all instances registered with the current stack. It then checks their state: if an instance is in the terminated state, it can be safely deregistered. The cron job created by the preceding example post-install script runs every minute. We can reduce this frequency depending on the typical runtime of the cluster jobs and the value of scaledown_idletime set within the cluster configuration. A log of deregistration activity can be found in /var/log/opsworks-deregister.log.
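The deregistration logic described above can be sketched as follows. This is an illustration in Python under our own assumptions, not the script used in the post: it expects boto3-style OpsWorks and EC2 clients to be passed in, ignores pagination, and leaves alone any instance that EC2 no longer reports at all.

```python
def find_terminated(opsworks_instances, ec2_states):
    """Return OpsWorks IDs of instances whose EC2 state is 'terminated'.

    `opsworks_instances` is the "Instances" list from OpsWorks and
    `ec2_states` maps EC2 instance IDs to state names.
    """
    return [
        inst["InstanceId"]
        for inst in opsworks_instances
        if ec2_states.get(inst.get("Ec2InstanceId")) == "terminated"
    ]


def deregister_terminated(opsworks, ec2, stack_id):
    """Deregister terminated instances from the stack; return their OpsWorks IDs."""
    registered = opsworks.describe_instances(StackId=stack_id)["Instances"]
    ec2_ids = [i["Ec2InstanceId"] for i in registered if "Ec2InstanceId" in i]
    if not ec2_ids:
        return []
    # A filter is used instead of InstanceIds so that IDs unknown to EC2
    # are skipped rather than causing an error.
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-id", "Values": ec2_ids}]
    )
    states = {
        inst["InstanceId"]: inst["State"]["Name"]
        for reservation in response["Reservations"]
        for inst in reservation["Instances"]
    }
    removed = find_terminated(registered, states)
    for opsworks_id in removed:
        opsworks.deregister_instance(InstanceId=opsworks_id)
    return removed
```

Separating the selection step (find_terminated) from the API calls keeps the decision about which instances to remove easy to check in isolation.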
Once the cluster is deployed using the pcluster CLI tool, both the default system user (centos, in this example) and any users configured via OpsWorks can access the cluster via SSH and submit jobs to the batch scheduler. The SSH user name for imported user accounts is visible on the user configuration page within OpsWorks. Each imported user is a member of the opsworks POSIX group.
Managing users after deployment
When instances register with an OpsWorks stack, an opsworks-agent service is installed on them. This agent keeps the list of users and their access/sudo permissions in sync with the stack configuration. To add users, we need to return to the OpsWorks console and import additional users from IAM to the stack (creating the IAM users first where necessary).
Warning: If we remove access to resources for a particular user, their user account is deleted from all instances within the stack, and their home directory is deleted as well. We can block SSH access for a user without deleting their account and home directory by replacing their SSH key with an invalid input (e.g., a random string) in the OpsWorks user management console. Note that the SSH key must not be empty; an empty key is equivalent to denying access via the stack configuration options, and results in the user account being deleted.
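The key replacement described in this warning could also be done through the API. The sketch below assumes a boto3-style OpsWorks client and uses update_user_profile; the function name and the placeholder string are our own. It refuses an empty key, since that would lead to the account being deleted.

```python
def block_ssh_access(opsworks, iam_user_arn, placeholder_key="ssh-access-revoked"):
    """Replace a user's SSH public key with an invalid, non-empty string.

    `opsworks` is assumed to behave like a boto3 OpsWorks client.
    """
    if not placeholder_key.strip():
        # An empty key would remove the account and home directory.
        raise ValueError("placeholder_key must not be empty")
    opsworks.update_user_profile(
        IamUserArn=iam_user_arn,
        SshPublicKey=placeholder_key,
    )
```

Restoring access is then a matter of calling update_user_profile again with the user's real public key.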
Remember that management of OpsWorks users is independent of stacks. If desired, multiple clusters can be registered with the same stack (for example, to give a single pool of users access to multiple clusters with different configurations). Alternatively, create a separate stack for each cluster to provide granular access control.
Conclusion
In this post, we walked through the process of creating an OpsWorks Stack for the purposes of managing SSH user access to AWS ParallelCluster. By using OpsWorks, account administrators have a mechanism to associate IAM accounts with POSIX users on EC2 instances and to automate both key rotation and access control changes.