Using AWS ParallelCluster with a serverless API
Update – February 22, 2022: We have released AWS ParallelCluster version 3. It brings the new ParallelCluster API and a number of improvements and changes to functionality. For more information, see the Changelog, the Instructions for Moving from 2.x to 3.x, or the AWS ParallelCluster documentation.
This post is contributed by Dario La Porta, AWS Senior Consultant – HPC
AWS ParallelCluster simplifies the creation and the deployment of HPC clusters. Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. AWS Lambda automatically runs your code without requiring you to provision or manage servers.
In this post, I create a serverless API of the AWS ParallelCluster command line interface using these services. With this API, you can create, monitor, and destroy your clusters. This makes it possible to integrate AWS ParallelCluster programmatically with other applications you may have running on-premises or in the AWS Cloud.
The serverless integration of AWS ParallelCluster can bring a cleaner, more reproducible infrastructure-as-code paradigm to legacy HPC environments.
Taking this serverless, infrastructure as code approach enables several new types of functionality for HPC environments. For example, you can build on-demand clusters from an API when on-premises resources cannot handle the workload. AWS ParallelCluster can extend on-premises resources for running elastic and large-scale HPC on AWS’ virtually unlimited infrastructure.
You can also create an event-driven workflow in which new clusters are created when new data is stored in an S3 bucket. Event-driven workflows give you room to find new ways to build HPC infrastructure easily, and they can save researchers time.
Security is paramount in HPC environments because customers perform scientific analyses that are central to their businesses. By using a serverless API, this solution can improve security by removing the need to run the AWS ParallelCluster CLI in a user environment. This helps keep customer environments secure and makes it easier to control the IAM roles and security groups that researchers have access to.
Additionally, the Amazon API Gateway for HPC job submission post explains how to submit a job in the cluster using the API. You can use this instead of connecting to the master node via SSH.
This diagram shows the components required to create the cluster and interact with the solution.
Cost of the solution
You can deploy the solution in this blog post within the AWS Free Tier. Make sure that your AWS ParallelCluster configuration uses the t2.micro instance type for the cluster’s master and compute instances. This is the default instance type for AWS ParallelCluster configuration.
For real-world HPC use cases, you most likely want to use a different instance type, such as C5 or C5n. C5n in particular can work well for HPC workloads because it includes the option to use the Elastic Fabric Adapter (EFA) network interface. This makes it possible to scale tightly coupled workloads to more compute instances and reduce communication latency when using protocols such as MPI.
To stay within the AWS Free Tier allowance, be sure to destroy the created resources as described in the teardown section of this post.
A CloudFormation stack creates the VPC, the public subnets, and the private subnet that the cluster requires in the eu-west-1 Region.
You can also use an existing VPC that complies with the AWS ParallelCluster network requirements.
Deploy the API with AWS SAM
The AWS Serverless Application Model (AWS SAM) is an open-source framework that you can use to build serverless applications on AWS. Here, AWS SAM simplifies the setup of the serverless architecture.
The framework automates what would otherwise be manual configuration of API Gateway and the Lambda function, so you can focus on how the API works with AWS ParallelCluster. The resulting API improves security and provides a simple, alternative method for cluster lifecycle management.
- The sam-app folder in the aws-samples repository contains the code required to build the AWS ParallelCluster serverless API.
- sam-app/template.yml contains the IAM policy that the Lambda function needs to create the cluster. Be sure to replace <AWS ACCOUNT ID> with your AWS account ID.
The AWS Identity and Access Management Roles in AWS ParallelCluster document contains the latest version of the policy, in the ParallelClusterUserPolicy section.
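Instead of editing template.yml by hand, you can substitute the placeholder from the command line. The following is a minimal sketch, assuming the placeholder appears literally as <AWS ACCOUNT ID> in the template and that the AWS CLI is configured for the target account. The set_account_id function is hypothetical and not part of the sample repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the sample repository): replace the literal
# "<AWS ACCOUNT ID>" placeholder in a SAM template with a real account ID.
set_account_id() {
  template="$1"
  account_id="$2"
  # AWS account IDs are exactly 12 digits; reject anything else.
  if ! printf '%s' "$account_id" | grep -Eq '^[0-9]{12}$'; then
    echo "invalid AWS account ID: $account_id" >&2
    return 1
  fi
  # Write to a temporary file and move it into place, which behaves the
  # same with GNU and BSD sed (their "sed -i" syntax differs).
  sed "s/<AWS ACCOUNT ID>/${account_id}/g" "$template" > "${template}.tmp" &&
    mv "${template}.tmp" "$template"
}

# Example, using the account of the current AWS CLI credentials:
#   set_account_id sam-app/template.yml \
#     "$(aws sts get-caller-identity --query Account --output text)"
```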
To deploy the application, run the following commands:
cd sam-app
sam build
sam deploy --guided
From here, provide parameter values for the SAM deployment wizard that are appropriate for your Region and AWS account. After the deployment, note the values in the stack Outputs.
The API Gateway endpoint URL is used to interact with the API and has the following format:
https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster
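Instead of copying the URL from the Outputs, you can rebuild it from the deployed stack. This sketch assumes that the API is SAM's implicit API, whose logical ID is ServerlessRestApi, that it is deployed to SAM's default Prod stage, and that the stack is named sam-app. Use the stack name you entered in the wizard. The api_url function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the pcluster endpoint URL from a REST API ID
# and a Region, following the URL format used in this post.
api_url() {
  api_id="$1"
  region="$2"
  printf 'https://%s.execute-api.%s.amazonaws.com/Prod/pcluster\n' "$api_id" "$region"
}

# Example (requires the AWS CLI; "sam-app" is an assumed stack name):
#   API_ID="$(aws cloudformation describe-stack-resource \
#       --stack-name sam-app --logical-resource-id ServerlessRestApi \
#       --query StackResourceDetail.PhysicalResourceId --output text)"
#   API_URL="$(api_url "$API_ID" eu-west-1)"
```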
AWS ParallelCluster configuration file
AWS ParallelCluster is an open source cluster management tool to deploy and manage HPC clusters in the AWS Cloud. It uses a configuration file to build the cluster, and the file syntax is described in the AWS ParallelCluster documentation. Create the pcluster.conf configuration file in a directory on your local file system. The examples in this post use /tmp/pcluster.conf.
The configuration file has been tested with AWS ParallelCluster v2.6.0. The master_subnet_id parameter contains the ID of the created public subnet, and compute_subnet_id contains the ID of the private subnet.
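As an illustration, the following sketch writes a minimal AWS ParallelCluster 2.x configuration to /tmp/pcluster.conf, which is the path used by the API calls in this post. Treat every value as an assumption. The key pair name, VPC ID, and subnet IDs are placeholders to replace with your own, and the section labels default and public_private are arbitrary names.

```shell
#!/usr/bin/env bash
# Write an illustrative ParallelCluster 2.x configuration file.
# All IDs and names below are placeholders; replace them with your own values.
CONFIG_PATH="${CONFIG_PATH:-/tmp/pcluster.conf}"

cat > "$CONFIG_PATH" <<'EOF'
[aws]
aws_region_name = eu-west-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = my-key-pair
base_os = alinux2
scheduler = slurm
# t2.micro keeps the example within the AWS Free Tier.
master_instance_type = t2.micro
compute_instance_type = t2.micro
initial_queue_size = 0
max_queue_size = 2
vpc_settings = public_private

[vpc public_private]
vpc_id = vpc-xxxxxxxx
# Public subnet for the master node.
master_subnet_id = subnet-xxxxxxxx
# Private subnet for the compute nodes.
compute_subnet_id = subnet-yyyyyyyy
EOF
```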
Deploy the cluster with the pcluster API
The pcluster API created in the previous steps takes the following parameters:
- command – the pcluster command to execute. A detailed list of available commands is on the AWS ParallelCluster CLI commands page.
- cluster_name – the name of the cluster.
- --data-binary "$(base64 /path/to/pcluster/config)" – parameter used to pass the local AWS ParallelCluster configuration file to the API.
- -H "additional_parameters: <param1> <param2> <…>" – used to pass additional parameters to the pcluster CLI.
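Every call passes the same pieces of information, so a small shell wrapper can reduce repetition. This is a minimal sketch that sends the same request shape as the calls in this post. The function names and the API_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pcluster API.
# API_URL is an assumed variable holding the endpoint from the SAM outputs,
# e.g. https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster

# Print the request URL for a pcluster command and cluster name.
pcluster_request_url() {
  printf '%s?command=%s&cluster_name=%s\n' "$API_URL" "$1" "$2"
}

# Usage: pcluster_api <command> <cluster_name> <config_file> [pcluster args...]
pcluster_api() {
  cmd="$1"
  cluster_name="$2"
  config_file="$3"
  shift 3
  # "$*" joins the remaining arguments with spaces for the header value.
  curl --silent --show-error --request POST \
    -H "additional_parameters: $*" \
    --data-binary "$(base64 "$config_file")" \
    "$(pcluster_request_url "$cmd" "$cluster_name")"
}

# Example:
#   pcluster_api create cluster1 /tmp/pcluster.conf --nowait
```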
The following command creates a cluster named “cluster1”:
$ curl --request POST -H "additional_parameters: --nowait" --data-binary "$(base64 /tmp/pcluster.conf)" "https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster?command=create&cluster_name=cluster1"
Beginning cluster creation for cluster: cluster1
Creating stack named: parallelcluster-cluster1
Status: CREATE_IN_PROGRESS
The cluster creation status can be queried with the following:
$ curl --request POST -H "additional_parameters: --nowait" --data-binary "$(base64 /tmp/pcluster.conf)" "https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster?command=status&cluster_name=cluster1"
Status: CREATE_IN_PROGRESS
When the cluster reaches the “CREATE_COMPLETE” state, run the status call again without --nowait to retrieve the master node IP addresses:
$ curl --request POST -H "additional_parameters: --nowait" --data-binary "$(base64 /tmp/pcluster.conf)" "https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster?command=status&cluster_name=cluster1"
Status: CREATE_COMPLETE
$ curl --request POST -H "additional_parameters: " --data-binary "$(base64 /tmp/pcluster.conf)" "https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster?command=status&cluster_name=cluster1"
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: 184.108.40.206
ClusterUser: ec2-user
MasterPrivateIP: 10.0.0.134
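Instead of re-running the status call by hand, you can poll it until the cluster is ready. The following is a minimal sketch, not part of the sample, and makes several assumptions. wait_for_status and master_public_ip are hypothetical names, and the status command to run is passed in as arguments. Any output that contains FAILED or ROLLBACK is treated as an error. The IP address is parsed from a MasterPublicIP line like the one in the output shown earlier. The attempt count and delay in the example are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical poller: run a status command until its output reports the
# target status. Returns 0 on success, 2 on a failed or rolled-back stack,
# and 1 if the target is not reached within the given number of attempts.
# Usage: wait_for_status <target> <attempts> <delay_seconds> <command...>
wait_for_status() {
  target="$1"
  attempts="$2"
  delay="$3"
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    output="$("$@")"
    case "$output" in
      *"Status: ${target}"*)
        printf '%s\n' "$output"
        return 0 ;;
      *FAILED* | *ROLLBACK*)
        printf '%s\n' "$output" >&2
        return 2 ;;
    esac
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Print the value of the MasterPublicIP line from pcluster status output.
master_public_ip() {
  sed -n 's/^MasterPublicIP:[[:space:]]*//p'
}

# Example, assuming API_URL holds your endpoint (hypothetical variable):
#   status_cmd() {
#     curl --silent --request POST -H "additional_parameters: $1" \
#       --data-binary "$(base64 /tmp/pcluster.conf)" \
#       "${API_URL}?command=status&cluster_name=cluster1"
#   }
#   wait_for_status CREATE_COMPLETE 60 30 status_cmd --nowait > /dev/null &&
#     status_cmd "" | master_public_ip
```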
When the cluster is not needed anymore, destroy it with the following API call:
$ curl --request POST -H "additional_parameters: --nowait" --data-binary "$(base64 /tmp/pcluster.conf)" "https://<ServerlessRestApi>.execute-api.eu-west-1.amazonaws.com/Prod/pcluster?command=delete&cluster_name=cluster1"
Deleting: cluster1
Passing --nowait in the additional_parameters header prevents pcluster from waiting for stack events after it runs a stack command, which avoids triggering the Lambda function timeout.
To manage authentication to the API, follow Controlling and Managing Access to a REST API in API Gateway in the API Gateway documentation.
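Suppose you configure the API to use IAM authorization. That is an assumption, because the SAM template described in this post does not necessarily enable it. In that case, requests must be signed with AWS Signature Version 4, and curl 7.75.0 or later can do the signing with its --aws-sigv4 option. The following sketch shows the status call signed this way. The sigv4_provider and signed_status functions and the API_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical example: call the status endpoint when the API uses IAM
# authorization, letting curl (7.75.0 or later) sign the request with SigV4.
# Credentials are read from the standard AWS_* environment variables, and
# API_URL is an assumed variable holding the endpoint URL.

# curl's --aws-sigv4 value: aws:amz:<region>:<service>.
# API Gateway endpoints use the service name "execute-api".
sigv4_provider() {
  printf 'aws:amz:%s:execute-api\n' "$1"
}

# Usage: signed_status <cluster_name> <region> <config_file>
signed_status() {
  cluster_name="$1"
  region="$2"
  config_file="$3"
  # Reuse the positional parameters as a portable list of curl options.
  set -- --silent --show-error --request POST \
    --aws-sigv4 "$(sigv4_provider "$region")" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}"
  # Temporary credentials must also send their session token.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    set -- "$@" -H "x-amz-security-token: ${AWS_SESSION_TOKEN}"
  fi
  curl "$@" -H "additional_parameters: --nowait" \
    --data-binary "$(base64 "$config_file")" \
    "${API_URL}?command=status&cluster_name=${cluster_name}"
}
```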
Teardown
You can destroy the resources by deleting the CloudFormation stacks created during installation. Deleting a Stack on the AWS CloudFormation Console explains the required steps.
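After the cluster itself has been deleted through the API, you can also script the stack deletion with the AWS CLI. This is a minimal sketch. delete_stacks and run are hypothetical helpers, and sam-app and pcluster-vpc are placeholder names for the SAM stack and the VPC stack, so substitute the names you chose. The stacks are deleted in the order given, and the script waits for each deletion to finish before moving on. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical teardown helpers. Delete the cluster through the API first,
# then remove the remaining CloudFormation stacks with the AWS CLI.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Delete each stack in the order given and wait for it to finish
# before moving on to the next one.
delete_stacks() {
  for stack in "$@"; do
    run aws cloudformation delete-stack --stack-name "$stack" || return 1
    run aws cloudformation wait stack-delete-complete --stack-name "$stack" || return 1
  done
}

# Example with placeholder stack names (use the names you chose):
#   DRY_RUN=1 delete_stacks sam-app pcluster-vpc   # preview only
#   delete_stacks sam-app pcluster-vpc
```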
In this post, I show how to integrate AWS ParallelCluster with Amazon API Gateway and manage the lifecycle of an HPC cluster using this API. Using Amazon API Gateway and AWS Lambda, you can run a serverless implementation of the AWS ParallelCluster CLI. This makes it possible to integrate AWS ParallelCluster programmatically with other applications you run on-premises or in the AWS Cloud.
This solution can help you improve the security of your HPC environment by simplifying the IAM roles and security groups that must be granted to individual users to create HPC clusters. With this implementation, researchers no longer need to run the AWS ParallelCluster CLI in their own user environment. By simplifying security management across the cluster lifecycle, you can better ensure that important research stays safe and secure.
To learn more, read about how to use AWS ParallelCluster.