AWS HPC Blog
Running Windows HPC Workloads using HPC Pack in AWS
Customers have been running Microsoft workloads on Amazon Web Services (AWS) for over 12 years, longer than on any other cloud provider. For select high performance computing (HPC) use cases, customers have told us they’d like to run HPC simulations that require the Windows operating system on scalable AWS infrastructure. If you have ever tried to stand up an HPC cluster manually, you might have come across challenges such as unclear documentation, Active Directory domain-joining issues, networking topology challenges, and several tedious steps during the deployment process. In addition, HPC in the cloud is typically used differently than traditional HPC infrastructure. Instead of long-lived, static compute clusters, clusters on AWS are elastic and are often spun up to run a simulation and then torn down once the work is done.
To simplify the setup of an HPC cluster for Windows HPC workloads, we provide an AWS CloudFormation template that automates the deployment of an HPC Pack 2019 Windows cluster. This helps you get started quickly running Windows-based HPC workloads, while leveraging highly scalable, resilient, and secure AWS infrastructure.
In order to demonstrate running a Windows HPC workload on AWS, we will use EnergyPlus (https://energyplus.net/). EnergyPlus is an open-source energy simulation tool maintained by the U.S. Department of Energy’s (DOE) Building Technology Office (BTO) that is used for modeling energy consumption in buildings.
This blog post will outline the solution, how to deploy it, and how to run a sample parametric sweep job using EnergyPlus.
Solution overview
This solution allows you to install HPC Pack 2019 to manage a cluster of Windows Servers for High Performance Computing. The AWS CloudFormation template will launch a Windows-based HPC cluster running Windows Server 2016 and supporting core infrastructure including Amazon Virtual Private Cloud (VPC) and Active Directory Domain Controllers using AWS Managed Microsoft AD. From a security perspective, the head node is publicly facing, and the compute nodes remain private via private subnets. All input files are pulled from EnergyPlus and output files are uploaded to Amazon Simple Storage Service (S3). Compute nodes interact with Amazon S3 through an S3 Gateway Endpoint, allowing for performance gains and cost efficiency through private connectivity to S3 versus routing traffic through a NAT Gateway.
AWS Secrets Manager is used to store the configuration information for your HPC Pack cluster. This includes information that is needed to join the compute nodes to Active Directory, as well as the password to leverage the certificate that’s created. Using Secrets Manager to store sensitive information allows the compute nodes to dynamically retrieve this information when an instance is added to the cluster but doesn’t persist that information on the instance.
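To see what a node retrieves, you can read the same secret yourself with the AWS CLI. This is a minimal sketch, not part of the solution’s scripts; the secret ID shown is a placeholder for the full name of the ClusterSecrets secret in your account:

# Minimal sketch: read the cluster secret the way a joining compute node would.
# "ClusterSecrets-EXAMPLE" is a placeholder; use your secret's full name or ARN.
aws secretsmanager get-secret-value \
  --secret-id ClusterSecrets-EXAMPLE \
  --query SecretString \
  --output text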
The cluster that is created by the CloudFormation template works better with loosely coupled workloads. For more tightly coupled workloads, the solution can be modified to use a shared file system such as Amazon FSx for Windows File Server, and Amazon EC2 placement groups — a configuration that can be set to pack instances closer together inside an Availability Zone to help achieve the low-latency network performance necessary for tightly-coupled node-to-node communications.
The example parametric sweep job will run a simulation across the worker nodes in the cluster. In order to take advantage of the scalability of the cloud, the first step in the parametric sweep job will scale out the number of nodes in the cluster, and the last step will scale in the number of nodes in the cluster.
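As a rough illustration of the scale-out and scale-in idea (not necessarily the template’s actual mechanism, which the job’s first and last steps handle for you), resizing a compute fleet that is managed by an EC2 Auto Scaling group can be as simple as changing its desired capacity; the group name and counts below are placeholders:

# Hypothetical illustration: grow the compute fleet before the sweep runs...
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-hpc-compute-nodes \
  --desired-capacity 4

# ...and shrink it back once the work is done.
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-hpc-compute-nodes \
  --desired-capacity 2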
Walkthrough
The walkthrough is broken into four sections: prerequisites, deploying artifacts, deploying the AWS CloudFormation template, and running a building energy HPC simulation.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account
- Beginner knowledge of core AWS services – compute, network, and storage
- Beginner to intermediate knowledge of HPC
Clone GitHub repository
The solution is hosted in GitHub at the following URL: https://github.com/aws-samples/aws-cfn-windows-hpc-template.
1. Clone the repository to your local machine using the command:
git clone https://github.com/aws-samples/aws-cfn-windows-hpc-template
Create S3 buckets
We need to create two S3 buckets: one for deployment artifacts and one for simulation results. (If you prefer the AWS CLI, a sketch of equivalent commands follows these steps.)
- Open the Amazon S3 console at https://console.aws.amazon.com/s3.
- Choose Create bucket.
- On the Create bucket page:
  - In the Bucket name area, enter a unique bucket name for deployment artifacts. A name such as hpcpack-2019-[AWS ACCOUNT ID] should ensure a globally unique bucket name.
  - Ensure that Block all public access is selected in the Block Public Access settings for this bucket section.
  - In the Default encryption section, select Enable and choose Amazon S3 key (SSE-S3) as the Encryption key type.
  - Choose Create bucket at the bottom of the page.
- Once the first bucket has been created successfully, choose Create bucket again.
- On the Create bucket page:
  - In the Bucket name area, enter a unique bucket name for simulation results. A name such as hpcpack-output-2019-[AWS ACCOUNT ID] should ensure a globally unique bucket name.
  - Ensure that Block all public access is selected in the Block Public Access settings for this bucket section.
  - In the Default encryption section, select Enable and choose Amazon S3 key (SSE-S3) as the Encryption key type.
  - Choose Create bucket at the bottom of the page.
Deploy artifacts
- Download HPC Pack from Microsoft’s website (https://www.microsoft.com/en-us/download/confirmation.aspx?id=101360).
- Rename the downloaded HPC Pack file to HPCPack.zip.
- Upload artifacts to the bucket (Console).
- Open the Amazon S3 Console at https://console.aws.amazon.com/s3.
- In the Buckets section, choose the deployment bucket you created.
- Choose Upload.
- Choose Add Files.
- Choose the HPCPack.zip file, as well as ScriptsForComputeNode2019.zip and ScriptsForHeadNode2019.zip from the cloned solution.
- Choose Upload and wait for the Upload succeeded status to appear.
- Upload the artifacts to the bucket (optional method via the CLI):
  - Execute the following CLI commands, replacing the S3 bucket name with the deployment bucket name you chose earlier:

aws s3 cp HPCPack.zip s3://hpcpack-2019-[AWS ACCOUNT ID]
aws s3 cp ScriptsForComputeNode2019.zip s3://hpcpack-2019-[AWS ACCOUNT ID]
aws s3 cp ScriptsForHeadNode2019.zip s3://hpcpack-2019-[AWS ACCOUNT ID]
Deploy CloudFormation templates
After we’ve deployed our artifacts, we need to deploy the CloudFormation template. (If you prefer the AWS CLI, a sketch of the equivalent command follows these steps.)
- Navigate to the CloudFormation console at https://console.aws.amazon.com/cloudformation/home.
- Choose Create stack.
- Choose the With new resources (standard) option.
- In the Create stack page, do the following:
- In the Specify template section, choose the Upload a template file option.
- Select the Choose file button.
- Navigate to the cloned GitHub solution.
- Select the HPCLab2019.yml file in the path HPCLab/CloudFormation.
- Choose Open.
- Choose Next.
- On the Specify stack details page, do the following:
  - In the Stack name text area, enter a name for the CloudFormation stack, for example HPC-Pack-Cluster.
  - In the Parameters section, fill in the following:
    - The Installation Bucket Name parameter is the first bucket you created, to which you uploaded the three zip files.
    - For the Installation Bucket Region parameter, select the Region in which you created the installation bucket.
    - The Remote Desktop Location should be your own public IP address in CIDR notation, for example 192.168.1.1/32.
    - In the Output Bucket Name text area, enter the bucket where the simulation results should go. This should be the second bucket you created.
    - For the Scaling Notification Email, enter your email address to receive notifications when compute nodes are added.
  - Select Next.
- On the Configure Stack Options page, leave all options as default.
- Choose Next.
- On the review page, scroll to the bottom and select the checkbox that says “I acknowledge that AWS CloudFormation might create IAM resources.”
- Choose Create stack. The stack takes approximately 50 minutes to complete.
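If you would rather deploy the stack from the command line, a sketch like the following should work. The parameter keys are illustrative placeholders based on the console labels above; confirm them (and any additional parameters, such as the installation bucket Region and notification email) against the Parameters section of HPCLab/CloudFormation/HPCLab2019.yml before running:

# Parameter keys below are placeholders; check them against the template.
# Use CAPABILITY_NAMED_IAM instead if the template names its IAM resources.
aws cloudformation create-stack \
  --stack-name HPC-Pack-Cluster \
  --template-body file://HPCLab/CloudFormation/HPCLab2019.yml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=InstallBucketName,ParameterValue=hpcpack-2019-[AWS ACCOUNT ID] \
    ParameterKey=OutputBucketName,ParameterValue=hpcpack-output-2019-[AWS ACCOUNT ID] \
    ParameterKey=RemoteDesktopLocation,ParameterValue=192.168.1.1/32

# Wait for the stack to finish; this takes roughly 50 minutes.
aws cloudformation wait stack-create-complete --stack-name HPC-Pack-Cluster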
Run an HPC Job using EnergyPlus to analyze building energy efficiency simulations
The steps below will allow you to run a parametric sweep job using the open-source EnergyPlus simulation program. EnergyPlus allows engineers, architects, and researchers to model energy consumption, as well as water use, in buildings.
EnergyPlus will be downloaded on the head node from a public GitHub repository (https://github.com/NREL/EnergyPlus) and then uploaded to your S3 bucket as part of bootstrapping the head node. When a compute node is created, it downloads the artifact from your S3 bucket so it can run simulations. We copy EnergyPlus from GitHub to S3 because this limits the amount of data transferred to and from the internet for each compute node.
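In CLI terms, the staging pattern looks roughly like the following. This is an illustration only: the archive name and destination path are placeholders, the deployment bucket is an assumed staging location, and the template’s bootstrap scripts perform the equivalent steps for you:

# One-time, on the head node: stage the EnergyPlus release archive in S3.
aws s3 cp EnergyPlus-x.y.z-Windows-x86_64.zip s3://hpcpack-2019-[AWS ACCOUNT ID]/

# On each new compute node: pull the archive from S3 over the S3 Gateway Endpoint
# instead of downloading it from the internet again.
aws s3 cp s3://hpcpack-2019-[AWS ACCOUNT ID]/EnergyPlus-x.y.z-Windows-x86_64.zip C:\install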
Login to Head Node
- Navigate to the CloudFormation Console at https://console.aws.amazon.com/cloudformation/.
- Choose the HPC Pack CloudFormation stack you deployed in the previous section.
- Select the Outputs tab.
- Locate the PublicIP field and save the value. You’ll need this to log in to the head node. (You can also retrieve it with the AWS CLI, as sketched after these steps.)
- Navigate to Secrets Manager at https://console.aws.amazon.com/secretsmanager/.
- Locate the secret with the name that begins with “ClusterSecrets” and select it.
- Select Retrieve secret value.
- Identify the HPCUserName and HPCUserPassword values which you will use to log into the head node.
- Open your preferred Windows Remote Desktop Protocol (RDP) client.
- Log in to the head node using the PublicIP retrieved earlier, with the credentials from Secrets Manager.
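If you’d rather not click through the console, the head node’s public IP can also be read from the stack outputs with the AWS CLI. A minimal sketch, assuming the example stack name HPC-Pack-Cluster used earlier:

# Look up the head node's public IP from the CloudFormation stack outputs.
aws cloudformation describe-stacks \
  --stack-name HPC-Pack-Cluster \
  --query "Stacks[0].Outputs[?OutputKey=='PublicIP'].OutputValue" \
  --output text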
Run the simulation
The following steps are to be performed on the head node.
- Select the Start Menu.
- Locate the Microsoft (R) HPC Pack 2019 folder and choose HPC Cluster Manager. You should see the head node and worker nodes under Resource Management. You can also RDP into the compute nodes from the head node if you’d like. Below is a screenshot of how the HPC Cluster Manager should look.
- Within HPC Cluster Manager, select Job Management.
- Next, select Add job from XML under the Actions section.
- Add the job definition file from the following location:
C:\cfn\install\Parametric-Job-Prod.xml
- Leave all settings as default and select Submit.
- You may be prompted for the Windows Admin password. This is the password you used to RDP into the head node.
- Wait for the job to finish. Below are examples of the finished parametric sweep job.
- Navigate to S3 in your browser at https://console.aws.amazon.com/s3.
- Select the S3 output bucket you specified in the CloudFormation deployment parameters. Your output files should be in the bucket.
- (Optional) Download the eplustbl.htm file and open it in a web browser to view the results. You can also fetch the results with the CLI, as sketched below.
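A minimal CLI sketch to pull the results locally; the bucket name is the example output bucket from earlier:

# Copy all simulation results from the output bucket to a local folder.
aws s3 sync s3://hpcpack-output-2019-[AWS ACCOUNT ID] ./energyplus-results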
You’re finished with processing! You can also continue to analyze the data in S3 using AWS Glue, Amazon Athena, and Amazon SageMaker for analytics or machine learning.
Moving forward, you can also right-click the finished job and choose View Job to see the tasks performed on the compute nodes. HPC Pack allows you to copy and edit the job, adjusting HPC job settings such as core and node allocation, to create additional jobs.
Cleaning up
In the Ohio Region (us-east-2), the default configuration of 1 head node and 2 compute nodes that are m5.xlarge Windows EC2 instances will cost $1.128 per hour in EC2 costs. The AWS Directory Service for Microsoft Active Directory (Standard Edition) will cost $0.12 per hour. The two NAT Gateways will cost $0.09 per hour.
To avoid incurring future charges, delete the CloudFormation stack and the S3 buckets that were created.
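A minimal CLI sketch for the cleanup, assuming the example stack and bucket names used in this post:

# Delete the CloudFormation stack (removes the EC2 instances, Managed AD, and networking it created).
aws cloudformation delete-stack --stack-name HPC-Pack-Cluster
aws cloudformation wait stack-delete-complete --stack-name HPC-Pack-Cluster

# Empty and remove the two S3 buckets (buckets must be empty before they can be deleted).
aws s3 rb s3://hpcpack-2019-[AWS ACCOUNT ID] --force
aws s3 rb s3://hpcpack-output-2019-[AWS ACCOUNT ID] --force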
Conclusion
Deploying Windows-based HPC clusters on AWS can be simplified by leveraging repeatable infrastructure as code. This post showed you how to automate the creation of an HPC cluster, while allowing you to leverage scalable AWS infrastructure to run Windows-based HPC simulations.
Get started with HPC on AWS today using the AWS HPC Getting Started page. Non-profit and research institutions can apply for AWS credits for their HPC use cases using the AWS Cloud Credit for Research Program.