Running Simcenter STAR-CCM+ on AWS with AWS ParallelCluster, Elastic Fabric Adapter and Amazon FSx for Lustre

Update June 27, 2022: The latest version of the scripts referenced in this blog can be found at https://cfd-on-pcluster.workshop.aws/starccm.html.

This post is contributed by Anh Tran – Sr. HPC Specialized Solutions Architect

Introduction

AWS recently introduced many HPC services that boost the performance and scalability of Computational Fluid Dynamics (CFD) workloads on AWS. These services include: Amazon FSx for Lustre, Elastic Fabric Adapter (EFA), and AWS ParallelCluster 2.5.1. In this technical post, I walk through these three services. Additionally, I outline an example of using AWS ParallelCluster to set up an HPC system with EFA and Amazon FSx Lustre to run a CFD workload. The CFD application that you will set up during this blog post is Simcenter STAR-CCM+ – the predominant CFD application from Siemens.

Service and solution overview

Services

This blog primarily uses three services – Amazon FSx for Lustre, EFA, and AWS ParallelCluster. Let’s dig into each of these services before reviewing the solution.

Amazon FSx for Lustre

In December 2018, AWS released Amazon FSx for Lustre. This is a fully managed, high-performance file system, optimized for fast processing workloads, like HPC. Amazon FSx for Lustre allows users to access and alter data from either Amazon S3 or on-premises seamlessly and exceptionally fast. For example, you can launch and run a file system that provides low latency access to your data. Additionally, you can read and write data at speeds of up to hundreds of gigabytes per second of throughput, and millions of IOPS. This speed and low-latency unleashes innovation at an unparalleled pace. This blog post uses the latest version of Amazon FSx for Lustre which recently added a new API for moving data in and out of Amazon S3. This API also includes POSIX support, which allows files to mount with the same user id. Additionally, the latest version also includes a new backup feature that allows you to back up your files to an S3 bucket. I go into more detail of how to take advantage of this at the end of the blog.

Elastic Fabric Adapter

In April of 2019, AWS released EFA. This enables you to run applications requiring high levels of inter-node communications at scale on AWS.

AWS ParallelCluster 2.5.1.

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy-to-use even for if you’re new to the cloud. AWS recently released AWS ParallelCluster 2.5.1 – which is the version we will use for this blog.

These three AWS HPC components are optimal for CFD applications. Together, they provide simple deployment of HPC systems on AWS, low latency network communication for MPI workloads, and a fast, parallel filesystem. Now, let’s take a look at how these services come together and seamlessly run a real CFD application: Simcenter STAR-CCM+.

Solution

AWS has a long-standing collaboration with Siemens. AWS and Siemens are dedicated to enhancing Siemens’ customer experiences when they run Simcenter STAR-CCM+ apps on AWS. I am excited to walk you through the steps and the best practices for running Simcenter STAR-CCM+, the predominant CFD application from Siemens.

The Simcenter STAR-CCM+ application runs on an HPC system. This system is optimized with EFA and Amazon FSx for Lustre — all of which is managed by AWS ParallelCluster. AWS ParallelCluster simplifies the deployment process to such an extent that you can set up your HPC cluster with a high throughput parallel file system (Amazon FSx for Lustre), a high-throughput and low-latency network interface (EFA), and high-bandwidth network interconnects (100 Gbps using C5n instances) in less than 15 minutes.

Now that you have the services and solution overviews, we can get started. This blog post includes the following steps:

Creating an HPC infrastructure stack on AWS, which will include:
- How to set up AWS ParallelCluster for best performance
- How to enable EFA
- How to enable the Amazon FSx Lustre file system and how to use some basic Amazon FSx for Lustre for STAR-CCM+
- How to connect to a remote desktop session using NICE DCV
Installing the Simcenter STAR-CCM+ application and how to submit a Simcenter STAR-CCM+ job to HPC cluster.

The following diagram outlines the steps

Steps

Setting up the HPC infrastructure stack

Before you can run Simcenter STAR-CCM+, you need to build an HPC cluster first. Some best practices that you should consider when setting up a cluster on AWS include:

Turn off Hyper-Threading (HT): AWS instances have HT turned on by default.
Use EFA and a cluster placement group for your compute fleet to minimize the latency between nodes.
Select the right instance type for your compute fleet. Here, I use C5n.18xlarge because of its high-performance CPU, high-bandwidth bandwidth networking, and the EFA network interface capabilities.

With HPC best practices in mind, you can set up your AWS ParallelCluster

Set up AWS ParallelCluster:

Install AWS CLI or turn on AWS Cloud9 before installing AWS Parallel Cluster. I recommend using AWS Cloud9 because of its simplicity. Follow this tutorial to set up Cloud9. To install the AWS CLI, follow the steps shown here.
Verify that AWS Cloud9 or AWS CLI is working:
1. Create an S3 bucket

$aws s3 mb s3://benchmark-starccm
make_bucket: benchmark-starccm

*Note: As an example, I create an S3 bucket benchmark-starccm, however you should create an S3 bucket with a different name of your choice because the S3 bucket name must be globally unique.

Let’s download the STAR-CCM+ installation file and a case file, then upload them to the S3 bucket that we just created.

Download latest Simcenter STAR-CCM+ package from the Siemens portal. It will look something like this: STAR-CCM+15.02.003_01_linux-x86_64-2.12_gnu7.1.zip
Download the Le Mans case file. The Le Mans case is 104 Million cells, which even today is considered large for a CFD case. The file will look something like this: LeMans_104M.sim

After downloading the STAR-CCM+ software and LeMans case file, upload them to the S3 bucket created above

aws s3 cp STAR-CCM+15.02.003_01_linux-x86_64-2.12_gnu7.1.zip s3://benchmark-starccm
aws s3 cp LeMans_104M.sim s3://benchmark-starccm/

We will use this same S3 bucket to install the Simcenter STAR-CCM+ application later in this tutorial

With all the ground work done, we can now build our HPC cluster. For more detailed instructions, you can consult Getting Started with AWS ParallelCluster.

Install AWS ParallelCluster

pip install aws-parallelcluster

Configure AWS ParallelCluster with some basic network information such as AWS Region ID, VPC ID, Subnet ID

pcluster configure

Modify your ~/.parallelcluster/config file to include a cluster section that minimally includes the following:

[aws]
aws_region_name = us-east-2

[cluster default]
vpc_settings = public
key_name = <Key-Name>
initial_queue_size = 2
max_queue_size = 100
maintain_initial_size = true
placement_group = DYNAMIC
placement = cluster
master_instance_type = c5.xlarge
compute_instance_type = c5n.18xlarge
cluster_type = ondemand

base_os = centos7
tags = {"Name" : "STARCCM"}

enable_efa = compute
fsx_settings = fsxshared
disable_hyperthreading = true

dcv_settings = hpc-dcv

[vpc public]
master_subnet_id = subnet-<Subnet-ID>
vpc_id = vpc-<VPC-ID>

[global]
update_check = true
sanity_check = true
cluster_template = default

[dcv hpc-dcv]

enable = master

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
imported_file_chunk_size = 1024
import_path = s3://benchmark-starccm

export_path = s3://benchmark-starccm

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

Now, create your first HPC cluster with the name starccmby running

pcluster create starccm

Your HPC cluster should be ready in about 15 minutes.

While you’re waiting for your cluster to be ready, let’s take a deeper look at what some of the different parameters we used mean for our HPC cluster:

initial_queue_size: We will start with two compute instances after the HPC cluster is up.

max_queue_size: We will limit the maximum compute fleet to 100 instances. This allows us room to scale our jobs up to a large number of cores while putting a limit on the number of compute nodes to help control costs.

base_os: For this blog, we will select centos 7 as a base os. Currently we support Amazon Linux (alinux), Centos 7 (centos7), Ubuntu 16.04 (ubuntu1604), and Ubuntu 18.04 (ubuntu1804) with EFA.

master_instance_type: This can be any instance type. Here we choose c5.xlarge because it is inexpensive and relatively fast for the head node.

compute_instance_type: We select C5n.18xlarge because it is optimized for compute-intensive workloads and supports EFA for better scaling of HPC. Note that EFA is currently only available on c5n.18xlarge, c5n.metal, i3en.24xlarge, p3dn.24xlarge, inf1.24xlarge, m5dn.24xlarge, m5n.24xlarge, r5dn.24xlarge, r5n.24xlarge, and p3dn.24xlarge. See the docs for Currently supported instances.

placement_group: We use placement_group to ensure our instances are located as physically close to one another as possible to minimize the latency between compute nodes and take advantage of EFA’s low latency networking.

enable_efa: with just one configuration line, we can easily turn on EFA support for our HPC cluster.

dcv_settings = hpc-dcv: With AWS ParallelCluster 2.5.1 you can use NICE DCV to support your remote visualization needs.

disable_hyperthreading: This setting turns off hyper-threading on the cluster

[fsx fsxshared]: This section contains the settings to define your FSx for Lustre parallel file system, including the location where the shared directory will be mounted, the storage capacity for the filesystem, the chunk size for files to be imported, and the location from which the data will be imported. You can read more about FSx for Lustre here.

[dcv hpc-dcv]: This section contains the settings to define your remote visualization setup. You can read more about DCV with AWS ParallelCluster here.

After you set up your config file for AWS ParallelCluster, log in and verify that you can access the cluster’s head node

pcluster ssh starccm -i ~/path/to/ssh_key

Verify the compute nodes are up. We should see two c5n.18xlarge nodes.

$ qhost
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
global - - - - - - - - - -
ip-172-31-14-220 lx-amd64 36 2 36 36 0.49 184.6G 11.7G 0.0 0.0
ip-172-31-2-137 lx-amd64 36 2 36 36 0.45 184.6G 11.7G 0.0 0.0

Verify the EFA driver has been loaded successfully.

In order to verify if EFA is installed correctly you will need to ssh into one of compute nodes and run :

[centos@ip-172-31-14-220 ~]$ which mpirun
/opt/amazon/openmpi/bin/mpirun

[centos@ip-172-31-14-220 ~]$ fi_info -p efa

provider: efa
fabric: EFA-fe80::97:9fff:fe1e:4e78
domain: efa_0-rdm
version: 2.0
type: FI_EP_RDM
protocol: FI_PROTO_EFA

provider: efa
fabric: EFA-fe80::97:9fff:fe1e:4e78
domain: efa_0-dgrm
version: 2.0
type: FI_EP_DGRAM
protocol: FI_PROTO_EFA
provider: efa;ofi_rxd
fabric: EFA-fe80::97:9fff:fe1e:4e78
domain: efa_0-dgrm
version: 1.0
type: FI_EP_RDM
protocol: FI_PROTO_RXD

At this point, EFA is verified.

Install Simcenter STAR-CCM+ application

Now that the HPC cluster using AWS ParallelCluster is set up, it’s time to install the Simcenter STAR-CCM+ application. In the prior steps, you uploaded a Simcenter STAR-CCM+ application and a case file to S3 bucket and used that S3 bucket as a source for the Amazon FSx for Lustre /fsx storage. As soon as the cluster created, the installation file and the case file will be available in /fsx

[centos@ip-172-31-0-70 fsx]$ cd /fsx
[centos@ip-172-31-0-70 fsx]$ ls
LeMans_104M.sim  STAR-CCM-14.06.013_01_linux-x86_64.tar.gz  STAR-CCM+14.06.013_01_linux-x86_64.tar.gz

As you can see, Amazon FSx for Lustre has already downloaded the case file from the S3 bucket to the /fsx partition, so now you can start install Simcenter STAR-CCM+ software using the following steps.

Install Simcenter STAR-CCM+: install Simcenter STAR-CCM+ on /fsx – a 1.2TB Lustre filesystem that you configured in a previous step.

cd /fsx
sudo unzip STAR-CCM+15.02.003_01_linux-x86_64-2.12_gnu7.1.zip
cd STAR-CCM+15.02.003_01_linux-x86_64-2.12_gnu7.1
./STAR-CCM+15.02.003_01_linux-x86_64-2.12_gnu7.1.sh
Select Installation Location : /fsx/Siemens

After following all the standard installation steps from Simcenter STAR-CCM+, the application should be installed at the following location:

/fsx/Siemens/15.02.003/STAR-CCM+15.02.003/

Test the installation

[centos@ip-172-31-0-185 fsx]$ /fsx/Siemens/15.02.003/STAR-CCM+15.02.003/star/bin/starccm+ -version
Simcenter STAR-CCM+ 2020.1 Build 15.02.003 (linux-x86_64-2.12/gnu7.1)

When the above code shows up, you correctly installed Simcenter STAR-CCM+, so now you can run the application.

Running Simcenter STAR-CCM+ on AWS ParallelCluster

Before I move on, let’s recap what you’ve have done so far.

You set up an HPC cluster using AWS ParallelCluster with compute-optimized C5n instances, an Amazon FSx for Lustre filesystem, and EFA-enabled networking.
You installed Simcenter STAR-CCM+ application

Now, let us create an SGE submission script and submit a job to the HPC cluster.

Create an SGE job submission script :

cd /fsx/

vi star-ccm.qsub

#!/bin/bash
#$ -N check
#$ -cwd
#$ -j Y
#$ -pe mpi 252

date

your_pod_key="your license key"
ccmp=" /fsx/Siemens/15.02.003/STAR-CCM+15.02.003/star/bin/starccm+”
case="/fsx/LeMans_104M.sim"

fabric="OFI"
tag="fsx-${fabric}"

${ccmp} \
-power \
-podkey $your_pod_key \
-licpath 1999@flex.cd-adapco.com \
-mpi openmpi \
-bs sge \
-benchmark:"-preits 40 -nits 20 -nps ${NSLOTS} -tag ${tag}" \
${case}

date

mkdir ${NSLOTS}new
mv *xml ${NSLOTS}new

Submit your Simcenter STAR-CCM+ job to your HPC cluster

qsub -V star-ccm.qsub

Now you have submitted an HPC job that requests 252 cores of c5n.18xlarge.

Check the status of your jobs by running

qstat -f

Simcenter STAR-CCM+ result analysis

Here are sample scaling results for the LeMans 104M cell benchmark case. As you can see, the Simcenter STAR-CCM+ result shows exciting performance and scaling results running on AWS. The performance is fast and the simulation scales very well with EFA-based networking and great CPU performance.

Connect to a NICE DCV session

AWS ParallelCluster is now natively integrated with NICE DCV. You can configure a NICE DCV session to visualize your Simcenter STAR-CCM+ result or connect to the application remotely.

As you will recall from when we configured our AWS ParallelCluster in the previous section, I named the DCV server as hpc-dcv. To create and connect to a NICE DCV session, just run:

pcluster dcv connect hpc-dcv -k <Key-Name>

After you connect to NICE DCV session, you will be able to access to a Linux desktop to work with STAR-CCM+ application.

Backup SIMCENTER STAR-CCM+ result to S3 bucket

After you finish your Simcenter STAR-CCM+ simulation, you can backup data in /fsx to your S3 bucket that you created from running your application. You can now use Data Repository Tasks. Data Repository Tasks represent bulk operations between your Amazon FSx for Lustre file system and your S3 bucket. One of the jobs is to export your changed file system contents back to its linked S3 bucket.

*Note: in order to use new Amazon FSx for Lustre feature, you will need to have AWS CLI version 1.16.309 or above.

In this case, I select to export the STAR-CCM+ application directory Siemens as an example.

Exit the HPC head node, and go back to your laptop or Cloud9 environment where you have configured your AWS CLI. Find out your Amazon FSx Lustre ID by running:

aws fsx describe-file-systems

After you find the Amazon FSx for Lustre ID, which looks simiar to fs-0d72d520f620d765a, create a backup of the data by running:

create-data-repository-task

aws fsx create-data-repository-task --file-system-id fs-0d72d520f620d765a --type EXPORT_TO_REPOSITORY --paths Siemens,testfsx --report Enabled=true,Scope=FAILED_FILES_ONLY,Format=REPORT_CSV_20191124,Path=s3://benchmark-starccm/

Explanation:

–file-system-id: your file system ID

–type EXPORT_TO_REPOSITORY: we will export the data back to the S3 bucket

–paths Siemens,testfsx: the directories you want to export to S3 bucket

Format=REPORT_CSV_20191124: note this is only name the Amazon FSx Lustre supports. Please keep it the same.

Check the status of the backup by running describe-create-data-repository-task

aws fsx describe-data-repository-tasks

As you can see, I can now use FSx for Lustre to install the Simcenter STAR-CCM+ application, and seamlessly move case data between on-premise to Amazon S3 and AWS HPC system. I can create a file system linked to a S3 bucket, create a FSx for Lustre filesystem, and export data back to their S3 bucket after running the CFD app.

Conclusion

Give it a try, set up an HPC environment, and let us know how it goes! If you need more information about running CFD and HPC cases on AWS you can find it on our HPC home page. Please feel free to contact us with questions that you might have.

AWS Compute Blog