AWS Compute Blog

Using cost allocation tags with AWS ParallelCluster

This post is courtesy of Dario La Porta, Senior Consultant, HPC.

With high performance computing (HPC) workloads running in the AWS Cloud, customers can scale workloads easily and select from a variety of instance types.

With this additional flexibility, elasticity, and scale, it’s important to track your costs and resource utilization for specific projects or users in HPC environments. You can do this using orchestration tools such as AWS ParallelCluster with AWS Cost Explorer and AWS Budgets. These allow you to manage cost allocation, forecast spending, and set up billing alarms that trigger on defined budget thresholds. You can also analyze usage to reduce cost or optimize price and performance.

In this post, I show how to do this with tags, both at the AWS account level and for specific projects or users. You can download the code referenced in this post from this repo.

AWS ParallelCluster deployment

AWS ParallelCluster is an open source cluster management tool to deploy and manage HPC clusters in the AWS Cloud.

Copy the following files to an Amazon S3 bucket before the deployment of the cluster:

  • The post_install.sh script configures the Slurm cluster after the deployment. Replace <bucket> with your bucket name.
  • The projects_list.conf file contains the list of the projects assigned to each user.
  • The sbatch script, which is used as a wrapper around the Slurm sbatch command. Replace <account_id> with your account ID.
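
The placeholder replacement and upload can be scripted. The following is a minimal sketch, not part of the original repo: fill_placeholder is a hypothetical helper, my-hpc-bucket is a placeholder bucket name, and the files are assumed to be in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: replace every occurrence of a placeholder such as
# <bucket> in a file and write the result to standard output.
# Assumes the value contains no '|', '&', or backslash characters,
# because those are special in a sed replacement.
fill_placeholder() {
    # $1 = file, $2 = placeholder (for example "<bucket>"), $3 = value
    sed "s|$2|$3|g" "$1"
}

# Example usage (commented out: needs the files and AWS credentials).
# fill_placeholder post_install.sh '<bucket>' my-hpc-bucket > post_install.filled.sh
# aws s3 cp post_install.filled.sh s3://my-hpc-bucket/post_install.sh
# aws s3 cp projects_list.conf s3://my-hpc-bucket/projects_list.conf
```

The helper writes to standard output rather than editing in place, so the original files keep their placeholders and can be reused for another bucket or account.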

This AWS CloudFormation template deploys an Amazon Virtual Private Cloud (Amazon VPC) with public subnets (for the cluster’s head node) and private subnets (for the cluster’s compute nodes). You can also specify the CIDR range for the subnets and VPC when you launch this stack. In addition, it creates the policies required for the AWS ParallelCluster AdditionalIamPolicies configuration. These policies are used to apply the tags that track per-project and per-user costs.

The stack creates the VPC, the public subnets, the private subnet, and the additional policies required for the cluster.

Stack outputs

After the stack is deployed, you can use the provided AWS ParallelCluster config file to build the cluster.

The template uses the t2.micro instance type for the cluster’s head node and compute instances.

For real-world HPC use cases, you most likely want to use a different instance type, such as C5 or C5n.

You must replace <public-subnet-id> with the ID of the created public subnet, <private-subnet-id> with the ID of the created private subnet, <account_id> with your account ID, <key> with your key pair, and <bucket> with the name of the bucket that contains the post_install.sh, projects_list.conf, and sbatch scripts.

This tutorial assumes you know how to set up an HPC cluster in AWS ParallelCluster. To learn how to do this, refer to the AWS ParallelCluster documentation.

Review your HPC cost

After the cluster deployment, the following tags are created in the environment when you submit a job:

  • aws-parallelcluster-jobid – the job ID assigned to the compute instance.
  • aws-parallelcluster-username – the owner of the submitted job.
  • aws-parallelcluster-project – the project assigned to the job.

A tag is a label that you or AWS assigns to an AWS resource. Each tag consists of a key and a value. For each resource, each tag key must be unique, and each tag key can have only one value.

After the tags are applied to the environment, you can activate the user-defined cost allocation tags for your billing reports.

You can specify the project of a job using the Slurm --comment parameter:

$ sbatch --comment ProjectA script.sh

Here, sbatch is the wrapper script around the Slurm sbatch command. It extends the standard sbatch functionality with project tag management.

Usually, a cluster is used for multiple purposes and projects. When a user specifies the project related to a job, the underlying system adds the correct aws-parallelcluster-project tag to the instances created to run the computation.

In addition, the aws-parallelcluster-username tag links the instances to the user. This assignment can be used to bill the correct user for the resources used.
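
The post does not show how the tags are attached to the instances. One plausible mechanism is the Amazon EC2 create-tags call, sketched below. The build_job_tags helper and the instance ID are hypothetical, and the actual wrapper in the repo may work differently.

```shell
#!/bin/sh
# Hypothetical helper: print the --tags arguments for `aws ec2 create-tags`
# from a job ID, a user name, and a project name.
# Assumes none of the values contain spaces or commas.
build_job_tags() {
    # $1 = job ID, $2 = user name, $3 = project
    printf 'Key=aws-parallelcluster-jobid,Value=%s ' "$1"
    printf 'Key=aws-parallelcluster-username,Value=%s ' "$2"
    printf 'Key=aws-parallelcluster-project,Value=%s\n' "$3"
}

# Example usage (commented out: needs AWS credentials and a real instance ID).
# The unquoted substitution is intentional so that each tag becomes its own argument.
# aws ec2 create-tags --resources i-0123456789abcdef0 \
#     --tags $(build_job_tags 42 ec2-user ProjectA)
```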

The AWS Cost Explorer service is used to visualize, understand, and manage the HPC expenses related to your projects. Note that your account’s spending in Cost Explorer may take up to 24 hours to propagate. The following graphs are examples showing how you can group your costs by Projects and Job IDs.

Costs by Project ID

Costs by Job ID

When the solution is deployed in a multiuser environment, you can also track the expenses of each user by grouping the report by aws-parallelcluster-username.
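
The same grouping is available from the command line through the Cost Explorer get-cost-and-usage operation. The sketch below only prints the command so that it can be reviewed first. The cost_by_tag_cmd helper and the dates are illustrative assumptions, and the tag must already be activated as a cost allocation tag.

```shell
#!/bin/sh
# Hypothetical helper: print an `aws ce get-cost-and-usage` command that
# groups unblended cost by one cost allocation tag.
cost_by_tag_cmd() {
    # $1 = start date (YYYY-MM-DD, inclusive)
    # $2 = end date (YYYY-MM-DD, exclusive)
    # $3 = tag key to group by
    printf '%s ' aws ce get-cost-and-usage \
        --time-period "Start=$1,End=$2" \
        --granularity MONTHLY \
        --metrics UnblendedCost \
        --group-by
    printf '%s\n' "Type=TAG,Key=$3"
}

# Example usage: print the per-user query for one month.
# cost_by_tag_cmd 2020-05-01 2020-06-01 aws-parallelcluster-username
# To run it instead of printing it (needs AWS credentials):
# cost_by_tag_cmd 2020-05-01 2020-06-01 aws-parallelcluster-username | sh
```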

You can build a multiuser environment by reading the AWS ParallelCluster multiple user access to clusters documentation.

Creating budgets for HPC spending

Sometimes tracking expenses is not sufficient, because you may also need to set budget limits for users or for specific projects.

To set a custom budget that alerts you if costs or usage exceed a budgeted amount, use the AWS Budgets service. The documentation for creating a budget explains how to create a cost budget. Under Budget parameters, you can choose the AWS ParallelCluster tags that you want to use for the budget creation.

Budget parameters

Budget history

You can also use the create-budget API to create project and user-level budgets.

aws budgets create-budget \
    --account-id 111122223333 \
    --budget file://budget.json \
    --notifications-with-subscribers file://notifications-with-subscribers.json

The budget.json file contains the budget object that you want to create. You can replace <amount> with the budget allocated for the project and <project_name> with the name of the project.

The notifications-with-subscribers.json file contains the notification associated with the budget. The <email> string must be replaced with the email address where you want to receive the budget notifications. You can review the syntax of both files in the create-budget API documentation.
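
For illustration, the two files could be generated as follows. This is a sketch under assumptions rather than a copy of the files in the repo: the field names follow the AWS Budgets API, while the helper names, the monthly period, the 80 percent threshold, and the email address are hypothetical choices. The budget is filtered on the aws-parallelcluster-project tag using the user:<key>$<value> filter format.

```shell
#!/bin/sh
# Hypothetical helper: write a monthly cost budget for one project.
# The budget name equals the project name.
write_budget_json() {
    # $1 = project name, $2 = amount in USD, $3 = output file
    cat > "$3" <<EOF
{
    "BudgetName": "$1",
    "BudgetLimit": { "Amount": "$2", "Unit": "USD" },
    "BudgetType": "COST",
    "TimeUnit": "MONTHLY",
    "CostFilters": {
        "TagKeyValue": [ "user:aws-parallelcluster-project\$$1" ]
    }
}
EOF
}

# Hypothetical helper: write one email notification that fires when
# actual spend exceeds a percentage of the budget.
write_notifications_json() {
    # $1 = email address, $2 = threshold in percent, $3 = output file
    cat > "$3" <<EOF
[
    {
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": $2,
            "ThresholdType": "PERCENTAGE"
        },
        "Subscribers": [
            { "SubscriptionType": "EMAIL", "Address": "$1" }
        ]
    }
]
EOF
}

# Example usage:
# write_budget_json ProjectA 100 budget.json
# write_notifications_json hpc-admin@example.com 80 notifications-with-subscribers.json
```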

You can limit which projects are assigned to a user by using the /opt/slurm/etc/projects_list.conf configuration file. This restricts users to the projects allocated to them.

For example:

ec2-user=ProjectA, ProjectB

The ec2-user is assigned to ProjectA and ProjectB.

If the user tries to specify a different project or omits the project on the submission line, Slurm does not allow the execution of the job.

$ sbatch --comment ProjectC script.sh 
You are not allowed to use the project ProjectC

$ sbatch script.sh
You need to specify a project. "--comment ProjectName"

Using this approach, you can require cluster users to assign every job to a specific project. This allows expense tracking for each user.
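
The check behind these messages can be sketched in a few lines of shell. This is an illustrative reimplementation, not the code from the repo: project_allowed and check_project are hypothetical names, and the file is assumed to contain one user=ProjectA, ProjectB line per user.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 may use project $2 according to
# the projects file $3 (lines of the form "user=ProjectA, ProjectB").
project_allowed() {
    awk -F= -v user="$1" -v project="$2" '
        $1 == user {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", names[i])  # trim surrounding blanks
                if (names[i] == project) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$3"
}

# Hypothetical check that prints the same messages as the wrapper
# and returns non-zero when the job must be rejected.
check_project() {
    # $1 = user, $2 = project (may be empty), $3 = projects file
    if [ -z "$2" ]; then
        echo 'You need to specify a project. "--comment ProjectName"'
        return 1
    fi
    if ! project_allowed "$1" "$2" "$3"; then
        echo "You are not allowed to use the project $2"
        return 1
    fi
}

# Example usage inside a wrapper, before calling the real sbatch:
# check_project "$USER" "$project" /opt/slurm/etc/projects_list.conf || exit 1
```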

If your budget reaches the limit for a project, you can prevent the user from exceeding the allocated budget. To do so, set the budget variable to yes in the /opt/slurm/bin/sbatch script.

Ensure that the project name defined in the /opt/slurm/etc/projects_list.conf file matches the name of the budget defined in AWS Budgets.

$ sbatch --comment ProjectB script.sh 
The Project ProjectB doens not have any associated budget. Please ask the administrator to create it.

$ sbatch --comment ProjectA script.sh 
The Project ProjectA doens not have more budget allocated for this month.
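
A budget check of this kind can be built on the describe-budget operation, which returns both the budget limit and the actual spend. The sketch below shows only the comparison step. The budget_exhausted helper is hypothetical, and the commented query is one possible way to fetch the two amounts, not necessarily what the wrapper does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the actual spend has reached the limit.
# awk is used because the amounts are decimal strings such as "42.17".
budget_exhausted() {
    # $1 = actual spend, $2 = budget limit (same currency)
    awk -v spent="$1" -v limit="$2" \
        'BEGIN { exit (spent + 0 >= limit + 0) ? 0 : 1 }'
}

# Example usage (commented out: needs AWS credentials and an existing budget
# named after the project).
# set -- $(aws budgets describe-budget \
#     --account-id 111122223333 --budget-name ProjectA \
#     --query 'Budget.[CalculatedSpend.ActualSpend.Amount, BudgetLimit.Amount]' \
#     --output text)
# if budget_exhausted "$1" "$2"; then
#     echo "No budget left for ProjectA this month"
# fi
```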

Conclusion

In this post, I showed how to create an AWS ParallelCluster Slurm cluster that lets you track your expenses with AWS Cost Explorer and AWS Budgets.

Often, it’s challenging to assign specific costs to the relevant HPC project or user. This approach can help you apply cost optimization techniques and analyze the data to find savings.

You can now assign a project to each job and monitor it from AWS Cost Explorer. You can also ensure that users stay within a budget that you have allocated in AWS Budgets.

To learn more, read about how to use AWS ParallelCluster, AWS Cost Explorer, and AWS Budgets.