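AWS Official Blog

Integrating ANSYS CFD Computing with AWS ParallelCluster 3

Introduction

  • AWS ParallelCluster

AWS ParallelCluster is an open-source cluster management tool supported by AWS that helps you deploy and manage high-performance computing (HPC) clusters. Built on the open-source CfnCluster project, it quickly stands up an HPC environment by automatically provisioning the required compute resources and shared file systems. ParallelCluster 3 works with the AWS Batch and Slurm schedulers; earlier versions also supported Torque (PBS) and SGE. AWS ParallelCluster suits both proof-of-concept and production deployments, and you can build higher-level workflows on top of it, such as CFD high-performance computing.

AWS ParallelCluster can use several AWS HPC services, such as NICE DCV for remote visualization and FSx for Lustre as a high-performance file system. DCV is useful for CFD pre- and post-processing: in a typical scenario, an engineer opens the final computed model in CFD-Post through DCV to inspect and validate it, or performs pre-processing in ICEM. FSx for Lustre provides the bandwidth and latency that HPC workloads need.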

  • NICE DCV

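NICE DCV is a high-performance remote display protocol that gives customers a secure way to stream remote desktops and applications from any cloud or data center to any device, over varying network conditions. With NICE DCV and Amazon EC2, customers can run graphics-intensive applications remotely on EC2 instances and stream the results to a client machine, which removes the need for expensive dedicated workstations. Customers across many kinds of HPC workloads use NICE DCV for their remote visualization needs. Using NICE DCV on Amazon EC2 incurs no additional charge; you pay only for the EC2 resources used to run and store your workloads.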

  • FSx for Lustre

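FSx for Lustre makes it easy and cost-effective to launch and run the popular, high-performance Lustre file system. You can use Lustre for workloads where speed matters, such as machine learning, high-performance computing (HPC), video processing, and financial modeling.

The open-source Lustre file system is designed for applications that require fast storage. Lustre was built to solve the problem of processing the world's ever-growing datasets quickly and cheaply. It is a widely used file system designed for the fastest computers in the world. It provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and up to millions of IOPS.

As a fully managed service, Amazon FSx makes it quick to use Lustre for workloads where storage speed is critical. FSx for Lustre removes the traditional complexity of setting up and managing a Lustre file system, so you can launch a high-performance file system in minutes. It also offers several deployment options, so you can optimize cost for your needs.

FSx for Lustre is POSIX-compliant, so you can use your current Linux-based applications without any changes. It behaves like any other file system on a Linux operating system. It also provides read-after-write consistency and supports file locking.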

  • ANSYS Fluent

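ANSYS Fluent is one of the most widely used commercial CFD packages internationally, reportedly holding about 60% of the US market. It can be applied in any industry that deals with fluids, heat transfer, chemical reactions, and related phenomena. It offers a rich set of physical models, advanced numerical methods, and powerful pre- and post-processing capabilities, and is widely used in aerospace, automotive design, oil and gas, and turbomachinery design.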

  • Slurm

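ParallelCluster 3 integrates the Slurm and AWS Batch job schedulers; Slurm is the one suited to CFD job scheduling. Slurm (Simple Linux Utility for Resource Management, http://slurm.schedmd.com/ ) is an open-source, fault-tolerant, and highly scalable resource management and job scheduling system for Linux clusters and supercomputers. A supercomputing system can use Slurm to manage resources and jobs so that they do not interfere with each other and the system runs more efficiently. All jobs, whether for program debugging or production computation, are submitted with commands such as srun (interactive parallel), sbatch (batch), or salloc (allocation). After submission, related commands can be used to query job status.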
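
As a rough illustration of the batch mode described above, the following sketch writes a tiny job script and submits it with sbatch when Slurm is available. The file name hello_job.sh, the job name, and the task count are illustrative choices, not values from this walkthrough. srun and salloc appear only in comments because they run interactively.

```shell
#!/bin/bash
# Write a minimal Slurm batch script (file name and sizes are illustrative).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=4
#SBATCH --output=hello_%j.out
# srun launches one copy of the command per task in the allocation.
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh      # batch mode: queue the script and return
    squeue -u "$USER"        # query the state of your jobs
else
    echo "Slurm not found; run this on the cluster head node."
fi

# Interactive alternatives (run manually on the head node):
#   srun -n4 -l hostname     # run a parallel command directly
#   salloc -N1               # get an allocation, then run commands inside it
```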

Solution Deployment

Installing ParallelCluster

Prerequisites

AWS ParallelCluster requires Python 3.6 or later. If it is not installed yet, download a compatible version from https://www.python.org/downloads/ and install it.

$ python3

Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Upgrade pip and install virtualenv

$ python3 -m pip install --upgrade pip

Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-22.2.1-py3-none-any.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 44.7 MB/s 
Installing collected packages: pip
Successfully installed pip-22.2.1

$ python3 -m pip install --user --upgrade virtualenv

Collecting virtualenv
  Downloading virtualenv-20.16.2-py2.py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 89.1 MB/s eta 0:00:00
Collecting distlib<1,>=0.3.1
  Downloading distlib-0.3.5-py2.py3-none-any.whl (466 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 467.0/467.0 kB 71.2 MB/s eta 0:00:00
Collecting importlib-metadata>=0.12
  Downloading importlib_metadata-4.12.0-py3-none-any.whl (21 kB)
Collecting platformdirs<3,>=2
  Downloading platformdirs-2.5.2-py3-none-any.whl (14 kB)
Collecting filelock<4,>=3.2
  Downloading filelock-3.7.1-py3-none-any.whl (10 kB)
Collecting typing-extensions>=3.6.4
  Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Collecting zipp>=0.5
  Downloading zipp-3.8.1-py3-none-any.whl (5.6 kB)
Installing collected packages: distlib, zipp, typing-extensions, platformdirs, filelock, importlib-metadata, virtualenv
Successfully installed distlib-0.3.5 filelock-3.7.1 importlib-metadata-4.12.0 platformdirs-2.5.2 typing-extensions-4.3.0 virtualenv-20.16.2 zipp-3.8.1

Create a virtualenv and give it a name

$ python3 -m virtualenv ~/apc-ve

created virtual environment CPython3.7.10.final.0-64 in 850ms
  creator CPython3Posix(dest=/home/ec2-user/apc-ve, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/ec2-user/.local/share/virtualenv)
    added seed packages: pip==22.2.1, setuptools==63.2.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

This creates the apc-ve directory under your home directory.

Activate the new virtualenv

$ source ~/apc-ve/bin/activate

Install AWS ParallelCluster inside the virtual environment

$ python3 -m pip install --upgrade "aws-parallelcluster"

Collecting aws-parallelcluster
  Downloading aws_parallelcluster-3.2.0-py3-none-any.whl (424 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 425.0/425.0 kB 37.8 MB/s eta 0:00:00
Collecting aws-cdk.aws-batch!=1.153.0,~=1.137
  Downloading aws_cdk.aws_batch-1.167.0-py3-none-any.whl (333 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 333.6/333.6 kB 52.3 MB/s eta 0:00:00
Collecting jmespath~=0.10
  Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting aws-cdk.aws-cloudwatch!=1.153.0,~=1.137
  Downloading aws_cdk.aws_cloudwatch-1.167.0-py3-none-any.whl (379 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 379.1/379.1 kB 44.9 MB/s eta 0:00:00
Collecting aws-cdk.core!=1.153.0,~=1.137
  Downloading aws_cdk.core-1.167.0-py3-none-any.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 95.1 MB/s eta 0:00:00

……

Collecting certifi>=2017.4.17
  Downloading certifi-2022.6.15-py3-none-any.whl (160 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 160.2/160.2 kB 41.5 MB/s eta 0:00:00
Collecting exceptiongroup
  Downloading exceptiongroup-1.0.0rc8-py3-none-any.whl (11 kB)
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: publication, zipp, urllib3, typing-extensions, typeguard, tabulate, six, PyYAML, pyrsistent, pyparsing, pkgutil-resolve-name, MarkupSafe, jmespath, itsdangerous, inflection, idna, exceptiongroup, charset-normalizer, certifi, attrs, werkzeug, requests, python-dateutil, packaging, jinja2, importlib-resources, importlib-metadata, cattrs, marshmallow, jsonschema, jsii, click, botocore, s3transfer, flask, constructs, clickclick, aws-cdk.region-info, aws-cdk.cloud-assembly-schema, connexion, boto3, aws-cdk.cx-api, aws-cdk.core, aws-cdk.aws-signer, aws-cdk.aws-sam, aws-cdk.aws-imagebuilder, aws-cdk.aws-iam, aws-cdk.aws-codestarnotifications, aws-cdk.aws-acmpca, aws-cdk.assets, aws-cdk.aws-kms, aws-cdk.aws-events, aws-cdk.aws-codeguruprofiler, aws-cdk.aws-cloudwatch, aws-cdk.aws-autoscaling-common, aws-cdk.aws-ssm, aws-cdk.aws-sqs, aws-cdk.aws-s3, aws-cdk.aws-ecr, aws-cdk.aws-applicationautoscaling, aws-cdk.aws-sns, aws-cdk.aws-s3-assets, aws-cdk.aws-ecr-assets, aws-cdk.aws-logs, aws-cdk.aws-codecommit, aws-cdk.aws-stepfunctions, aws-cdk.aws-kinesis, aws-cdk.aws-ec2, aws-cdk.aws-fsx, aws-cdk.aws-elasticloadbalancing, aws-cdk.aws-efs, aws-cdk.aws-lambda, aws-cdk.aws-sns-subscriptions, aws-cdk.aws-secretsmanager, aws-cdk.aws-cloudformation, aws-cdk.custom-resources, aws-cdk.aws-codebuild, aws-cdk.aws-route53, aws-cdk.aws-globalaccelerator, aws-cdk.aws-dynamodb, aws-cdk.aws-certificatemanager, aws-cdk.aws-elasticloadbalancingv2, aws-cdk.aws-cognito, aws-cdk.aws-cloudfront, aws-cdk.aws-servicediscovery, aws-cdk.aws-autoscaling, aws-cdk.aws-apigateway, aws-cdk.aws-route53-targets, aws-cdk.aws-autoscaling-hooktargets, aws-cdk.aws-ecs, aws-cdk.aws-batch, aws-parallelcluster
Successfully installed MarkupSafe-2.1.1 PyYAML-5.4.1 attrs-21.4.0 aws-cdk.assets-1.167.0 aws-cdk.aws-acmpca-1.167.0 aws-cdk.aws-apigateway-1.167.0 aws-cdk.aws-applicationautoscaling-1.167.0 aws-cdk.aws-autoscaling-1.167.0 aws-cdk.aws-autoscaling-common-1.167.0 aws-cdk.aws-autoscaling-hooktargets-1.167.0 aws-cdk.aws-batch-1.167.0 aws-cdk.aws-certificatemanager-1.167.0 aws-cdk.aws-cloudformation-1.167.0 aws-cdk.aws-cloudfront-1.167.0 aws-cdk.aws-cloudwatch-1.167.0 aws-cdk.aws-codebuild-1.167.0 aws-cdk.aws-codecommit-1.167.0 aws-cdk.aws-codeguruprofiler-1.167.0 aws-cdk.aws-codestarnotifications-1.167.0 aws-cdk.aws-cognito-1.167.0 aws-cdk.aws-dynamodb-1.167.0 aws-cdk.aws-ec2-1.167.0 aws-cdk.aws-ecr-1.167.0 aws-cdk.aws-ecr-assets-1.167.0 aws-cdk.aws-ecs-1.167.0 aws-cdk.aws-efs-1.167.0 aws-cdk.aws-elasticloadbalancing-1.167.0 aws-cdk.aws-elasticloadbalancingv2-1.167.0 aws-cdk.aws-events-1.167.0 aws-cdk.aws-fsx-1.167.0 aws-cdk.aws-globalaccelerator-1.167.0 aws-cdk.aws-iam-1.167.0 aws-cdk.aws-imagebuilder-1.167.0 aws-cdk.aws-kinesis-1.167.0 aws-cdk.aws-kms-1.167.0 aws-cdk.aws-lambda-1.167.0 aws-cdk.aws-logs-1.167.0 aws-cdk.aws-route53-1.167.0 aws-cdk.aws-route53-targets-1.167.0 aws-cdk.aws-s3-1.167.0 aws-cdk.aws-s3-assets-1.167.0 aws-cdk.aws-sam-1.167.0 aws-cdk.aws-secretsmanager-1.167.0 aws-cdk.aws-servicediscovery-1.167.0 aws-cdk.aws-signer-1.167.0 aws-cdk.aws-sns-1.167.0 aws-cdk.aws-sns-subscriptions-1.167.0 aws-cdk.aws-sqs-1.167.0 aws-cdk.aws-ssm-1.167.0 aws-cdk.aws-stepfunctions-1.167.0 aws-cdk.cloud-assembly-schema-1.167.0 aws-cdk.core-1.167.0 aws-cdk.custom-resources-1.167.0 aws-cdk.cx-api-1.167.0 aws-cdk.region-info-1.167.0 aws-parallelcluster-3.2.0 boto3-1.24.44 botocore-1.27.44 cattrs-22.1.0 certifi-2022.6.15 charset-normalizer-2.1.0 click-8.1.3 clickclick-20.10.2 connexion-2.13.1 constructs-3.4.58 exceptiongroup-1.0.0rc8 flask-2.2.0 idna-3.3 importlib-metadata-4.12.0 importlib-resources-5.9.0 inflection-0.5.1 itsdangerous-2.1.2 jinja2-3.1.2 jmespath-0.10.0 
jsii-1.63.2 jsonschema-4.9.0 marshmallow-3.17.0 packaging-21.3 pkgutil-resolve-name-1.3.10 publication-0.0.3 pyparsing-3.0.9 pyrsistent-0.18.1 python-dateutil-2.8.2 requests-2.28.1 s3transfer-0.6.0 six-1.16.0 tabulate-0.8.10 typeguard-2.13.3 typing-extensions-4.3.0 urllib3-1.26.11 werkzeug-2.2.1 zipp-3.8.1

Install Node Version Manager and Node.js

The AWS Cloud Development Kit (AWS CDK) uses Node.js for template generation; Node Version Manager (nvm) is used here to install it.

$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14926  100 14926    0     0   469k      0 --:--:-- --:--:-- --:--:--  485k
=> Downloading nvm as script to '/home/ec2-user/.nvm'

=> Appending nvm source string to /home/ec2-user/.bashrc
=> Appending bash_completion source string to /home/ec2-user/.bashrc
=> Close and reopen your terminal to start using nvm or run the following to use it now:

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion


$ chmod ug+x ~/.nvm/nvm.sh

$ source ~/.nvm/nvm.sh

$ nvm install --lts

Installing latest LTS version.
Downloading and installing node v16.16.0...
Downloading https://nodejs.org/dist/v16.16.0/node-v16.16.0-linux-x64.tar.xz...
################################################################################################################################################################################## 100.0%
Computing checksum with sha256sum
Checksums matched!
Now using node v16.16.0 (npm v8.11.0)
Creating default alias: default -> lts/* (-> v16.16.0)

$ node --version

Verify that AWS ParallelCluster is installed correctly

Activate the virtualenv

$ source ~/apc-ve/bin/activate

$ pcluster version

{
  "version": "3.2.0"
}

Configuring AWS ParallelCluster

$ aws configure

AWS Access Key ID [None]: AKIA5OZOUQ4F2T4IMAOS
AWS Secret Access Key [None]: XXX
Default region name [None]: cn-northwest-1 
Default output format [None]: 

$ pcluster configure --config cluster-config.yaml

INFO: Configuration file cluster-config.yaml will be written.
Press CTRL-C to interrupt the procedure.


Allowed values for AWS Region ID:
1. cn-north-1
2. cn-northwest-1
AWS Region ID [cn-northwest-1]: 
Allowed values for EC2 Key Pair Name:
1. LL-K2
EC2 Key Pair Name [LL-K2]: 
Allowed values for Scheduler:
1. slurm
2. awsbatch
Scheduler [slurm]: 
Allowed values for Operating System:
1. alinux2
2. centos7
3. ubuntu1804
4. ubuntu2004
Operating System [alinux2]: alinux2
Head node instance type [t2.micro]: c5.large
Number of queues [1]: 
Name of queue 1 [queue1]: 
Number of compute resources for queue1 [1]: 
Compute instance type for compute resource 1 in queue1 [t2.micro]: c5.xlarge
Maximum instance count [10]: 
Automate VPC creation? (y/n) [n]: 
Allowed values for VPC ID:
  #  id                     name                 number_of_subnets
---  ---------------------  -----------------  -------------------
  1  vpc-003630feddf7d2417  EKS                                  2
  2  vpc-013d1e62cfa405b8e  ECS                                  2
  3  vpc-0252e11202ae27e51                                       2
  4  vpc-9b64d8f2           HPC                                  3
VPC ID [vpc-003630feddf7d2417]: vpc-9b64d8f2 
Automate Subnet creation? (y/n) [y]: 
Allowed values for Availability Zone:
1. cn-northwest-1a
2. cn-northwest-1b
3. cn-northwest-1c
Availability Zone [cn-northwest-1a]: 
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]: 
Creating CloudFormation stack...
Do not leave the terminal until the process has finished.
Stack Name: parallelclusternetworking-pubpriv-20220729030718 (id: arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/parallelclusternetworking-pubpriv-20220729030718/b846e230-0eeb-11ed-979c-0a9d1a8a4fe6)
Status: parallelclusternetworking-pubpriv-20220729030718 - CREATE_COMPLETE      
The stack has been created.
Configuration file written to cluster-config.yaml
You can edit your configuration file or simply run 'pcluster create-cluster --cluster-configuration cluster-config.yaml --cluster-name cluster-name --region cn-northwest-1' to create your cluster.

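Creating the CFD Cluster

Configuration file

Edit cluster-config.yaml to match your HPC/CFD requirements: add NICE DCV remote visualization for pre- and post-processing, and the FSx for Lustre high-performance file system that the flow solver needs. The create command later in this section reads cfd-cluster-config.yaml, so save the edited file under that name.

1. NICE DCV (under the HeadNode section)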

Dcv:
    Enabled: true

2. FSx for Lustre

SharedStorage:
  - MountDir: /fsx
    Name: ParallelFileSystem
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      DeploymentType: PERSISTENT_1
      ImportedFileChunkSize: 1024
      ExportPath: s3://plljdi-fs1/export
      ImportPath: s3://plljdi-fs1
      PerUnitStorageThroughput: 200

ANSYS Fluent currently supports CentOS 7; Amazon Linux 2 is not on ANSYS's list of officially certified operating systems.
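
Before creating anything, it can help to validate the edited configuration. The sketch below wraps the `--dryrun true` option of `pcluster create-cluster`, which performs request validation without creating resources. The function name validate_cluster_config is hypothetical, and the cluster and file names in the usage comment are the ones used in this walkthrough.

```shell
#!/bin/bash
# Validate a ParallelCluster 3 configuration without creating any resources.
# validate_cluster_config is a hypothetical helper name.
validate_cluster_config() {
    local name="$1" config="$2"
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi
    # --dryrun true asks pcluster to validate the request only.
    pcluster create-cluster \
        --cluster-name "$name" \
        --cluster-configuration "$config" \
        --dryrun true
}

# Example (run inside the activated virtualenv):
#   validate_cluster_config cfd-cluster cfd-cluster-config.yaml
```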

Create the cluster

$ pcluster create-cluster --cluster-name cfd-cluster --cluster-configuration cfd-cluster-config.yaml

{
  "cluster": {
    "clusterName": "cfd-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/test-cluster/348e1c40-0eed-11ed-b3f5-0a96b85a5424",
    "region": "cn-northwest-1",
    "version": "3.1.4",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}

Query cluster information

$ pcluster describe-cluster --cluster-name cfd-cluster

{
  "creationTime": "2022-07-29T10:31:33.608Z",
  "headNode": {
    "launchTime": "2022-07-29T10:40:14.000Z",
    "instanceId": "i-0e3c4967953c806a7",
    "publicIpAddress": "52.83.49.88",
    "instanceType": "c5.large",
    "state": "running",
    "privateIpAddress": "172.31.48.96"
  },
  "version": "3.1.4",
  "clusterConfiguration": {
    "url": "https://parallelcluster-02fb13f6f8ec970c-v1-do-not-delete.s3.cn-northwest-1.amazonaws.com.cn/parallelcluster/3.1.4/clusters/cfd-cluster-7p51jnbemquummo3/configs/cluster-config.yaml?versionId=sf6OxDbpIGYPjmrRfSSArCU5YRUHzCqo&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5OZOUQ4F2T4IMAOS%2F20220805%2Fcn-northwest-1%2Fs3%2Faws4_request&X-Amz-Date=20220805T021305Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=f7bd1e1e31bdcc9d3bf7b260d68f418e39a7239fdf4baf0983cb1e399cdea35e"
  },
  "tags": [
    {
      "value": "3.1.4",
      "key": "parallelcluster:version"
    }
  ],
  "cloudFormationStackStatus": "CREATE_COMPLETE",
  "clusterName": "cfd-cluster",
  "computeFleetStatus": "RUNNING",
  "cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/cfd-cluster/9dc591c0-0f29-11ed-a5cd-02357b891a1c",
  "lastUpdatedTime": "2022-07-29T10:31:33.608Z",
  "region": "cn-northwest-1",
  "clusterStatus": "CREATE_COMPLETE"
}

$ pcluster list-clusters --query 'clusters[?clusterName==`cfd-cluster`]'

[
  {
    "clusterName": "cfd-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/cfd-cluster/f7316cd0-1464-11ed-8f62-0aa55a928096",
    "region": "cn-northwest-1",
    "version": "3.1.4",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
]
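
Cluster creation takes several minutes, so rather than re-running the query by hand you may prefer to poll. The following is a minimal sketch, assuming that `--query clusterStatus` returns the status as a quoted JSON string, as the `--query` usage above suggests. The function name wait_for_cluster and the 30-second default interval are illustrative.

```shell
#!/bin/bash
# Poll the cluster status until creation finishes or fails.
# wait_for_cluster is a hypothetical helper name.
wait_for_cluster() {
    local name="$1" interval="${2:-30}" status
    while true; do
        # Strip the JSON quotes from the returned status string.
        status=$(pcluster describe-cluster --cluster-name "$name" \
                     --query clusterStatus | tr -d '"')
        echo "clusterStatus: $status"
        case "$status" in
            CREATE_COMPLETE)    return 0 ;;
            CREATE_IN_PROGRESS) sleep "$interval" ;;
            *)                  return 1 ;;   # any other state is treated as failure
        esac
    done
}

# Example:
#   wait_for_cluster cfd-cluster && echo "cluster is ready"
```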

Log in to the cluster

$ pcluster ssh --cluster-name cfd-cluster -i ~/LL-K2.pem

Check the Slurm cluster status

$ sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite     10  idle~ queue1-dy-c5xlarge-[1-10]

$ sinfo -l

Fri Aug 05 02:56:34 2022
PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
queue1*      up   infinite 1-infinite   no       NO        all     10       idle~ queue1-dy-c5xlarge-[1-10]

$ squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

$ srun -n4 -l hostname

0: queue1-dy-c5xlarge-1
2: queue1-dy-c5xlarge-1
3: queue1-dy-c5xlarge-1
1: queue1-dy-c5xlarge-1

Logging in with DCV

pcluster dcv-connect options:

pcluster dcv-connect [-h]
                     --cluster-name CLUSTER_NAME
                     [--debug]
                     [--key-path KEY_PATH]
                     [--region REGION]
                     [--show-url]

$ pcluster dcv-connect --cluster-name cfd-cluster --key-path ~/LL-K2.pem --show-url

Please use the following one-time URL in your browser within 30 seconds:

https://52.83.49.88:8443?authToken=Xh92zh9pJ3bWK1Sn_2gzdUEnf4GwjYWYyMmh2bWSq4n8Pm4jUWWbqCOuBG6CdWBLFpPwZLmi7WC8PM7t44DWwL9Lr85Cu_QWTaEg-A9tywg3TjA2waXRzQhhI8-URnDWfTpC8l6Od5IkaUyiAjqybRfK2a41yYHNYSYUc3uWL_UNKYgjjoqCjvwFyBpKa0WGo88mODGpLkyWNhU6dqiWTK-BMqbSXl3SttPQOgge6YIwvSyKB28rmP0JoyC4SkvN#8DWPj4h0HXiKPbh1yZ69

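Open a browser and use the link to log in to the cluster head node. CFD pre- and post-processing can be done on the head node through DCV; depending on the resources these stages need, you can choose a GPU instance type for the head node.

Installing Fluent

Obtain the installation media and license file from ANSYS, log in to the head node through DCV, and install the software under the FSx for Lustre shared directory so that every compute node can run the Fluent components. Follow the installer prompts.

After installation, configure the license server ports: edit the ansyslmd.ini file and add the following two entries.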

SERVER=1055@licenseServer
ANSYSLI_SERVERS=2325@licenseServer
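
If you script the installation, the two entries can be added idempotently. This is only a sketch: the helper name add_license_servers is hypothetical, licenseServer stands for your own license host, and the ansyslmd.ini location in the usage comment is an assumption based on a typical ANSYS layout under the install prefix used in this post, so check where your installation keeps the file.

```shell
#!/bin/bash
# Append the ANSYS license server entries to ansyslmd.ini if they are missing.
# add_license_servers is a hypothetical helper name.
add_license_servers() {
    local ini="$1" host="$2" line
    touch "$ini"
    for line in "SERVER=1055@${host}" "ANSYSLI_SERVERS=2325@${host}"; do
        # -x matches the whole line, -F treats the entry as a literal string.
        grep -qxF "$line" "$ini" || echo "$line" >> "$ini"
    done
}

# Example (the path is an assumption, adjust it to your installation):
#   add_license_servers /fsx/apps/ansys_inc/shared_files/licensing/ansyslmd.ini licenseServer
```

Running the helper twice leaves the file unchanged the second time, which makes it safe to call from a bootstrap script.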

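Running Fluent and CFD-Post

Running Fluent

Log in through NICE DCV, then run /fsx/apps/ansys_inc/v195/fluent/bin/fluent

Users can run CFD simulations with Fluent. Because the Fluent GUI does not yet support Slurm scheduling, Fluent jobs can be submitted to Slurm with sbatch through a wrapper script.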
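
One way such a wrapper script might look is sketched below. It is not the author's script. The partition queue1 and the Fluent path come from this walkthrough; the journal file path, node count, and tasks per node are hypothetical. The Fluent options shown (3ddp, -g, -t, -cnf, -ssh, -i) are commonly used command-line flags, but verify them against the Fluent documentation for your version.

```shell
#!/bin/bash
# Generate a Slurm batch script that runs Fluent without the GUI.
cat > fluent_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=fluent-cfd
#SBATCH --partition=queue1
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --output=fluent_%j.out

FLUENT=/fsx/apps/ansys_inc/v195/fluent/bin/fluent
JOURNAL=/fsx/cases/run.jou   # hypothetical journal file on the shared file system

# Write one hostname per line for Fluent's -cnf option.
scontrol show hostnames "$SLURM_JOB_NODELIST" > "hosts.$SLURM_JOB_ID"

# 3ddp: 3D double precision; -g: no GUI; -t: number of processes;
# -ssh: start remote processes over ssh; -i: journal file to execute.
"$FLUENT" 3ddp -g -t"$SLURM_NTASKS" -cnf="hosts.$SLURM_JOB_ID" -ssh -i "$JOURNAL"
EOF

# Submit from the head node:
#   sbatch fluent_job.sh
```

Keeping the case files and the journal on /fsx means every compute node sees the same inputs and writes results to the same place.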

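Running CFD-Post

On Amazon Linux 2, the LD_LIBRARY_PATH environment variable must be set correctly, because some shared libraries are not found at run time unless their location is specified explicitly.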

export LD_LIBRARY_PATH=/fsx/apps/ansys_inc/v195/commonfiles/CFX/support/fluentio/lib/linx64/:$LD_LIBRARY_PATH

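Run /fsx/apps/ansys_inc/v195/CFD-Post/bin/cfdpost and use CFD-Post to view the simulation results, for example the perf_IndyCar.res result file.

Cleaning Up Resources

When the compute environment is no longer needed, delete the CFD cluster.

$ pcluster delete-cluster --region cn-northwest-1 --cluster-name cfd-cluster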

In the AWS Console, delete the CloudFormation networking stack.

If the VPC was newly created for this cluster, delete it as well.
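
The same cleanup can be scripted with the AWS CLI instead of the console. The sketch below assumes the cluster's CloudFormation stack has the same name as the cluster, as the stack ARN in the describe-cluster output suggests, and waits for it to disappear before removing the networking stack. The function name cleanup_cfd_env is hypothetical.

```shell
#!/bin/bash
# Delete the cluster, wait for its stack to go away, then delete the networking stack.
# cleanup_cfd_env is a hypothetical helper name.
cleanup_cfd_env() {
    local cluster="$1" net_stack="$2" region="$3"
    pcluster delete-cluster --region "$region" --cluster-name "$cluster" || return 1
    # Cluster deletion is asynchronous; block until its stack is gone.
    aws cloudformation wait stack-delete-complete \
        --region "$region" --stack-name "$cluster" || return 1
    aws cloudformation delete-stack \
        --region "$region" --stack-name "$net_stack" || return 1
    aws cloudformation wait stack-delete-complete \
        --region "$region" --stack-name "$net_stack"
}

# Example, using the stack name printed by pcluster configure earlier:
#   cleanup_cfd_env cfd-cluster parallelclusternetworking-pubpriv-20220729030718 cn-northwest-1
```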

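About the Author

Lin Lei (林磊)

Senior expert in the high-performance computing and SaaS industries. He graduated from the University of Science and Technology of China and the Institute of Software, Chinese Academy of Sciences. Before joining AWS, he worked at IBM and ANSYS China, where he led the construction of multiple supercomputing, EDA, and CAE high-performance systems. As a product manager, he took part in developing the CAE Workspace platform (scheduling system). As a graduate student, he participated in a distributed cryptographic computing project funded by the National Natural Science Foundation of China.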