Cloud HPC Architecture and Hands-On Simulation for Open-Source Implementations of 3D Seismic Wave Modeling

1. Introduction

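High-performance computing (HPC) covers many kinds of workloads. By scaling model and degree of coupling, they fall into two main classes. The first is scale-up, or tightly coupled, workloads such as fluid dynamics, weather modeling, and reservoir simulation. The second is scale-out, or loosely coupled, workloads such as financial risk modeling, genomics, seismic processing, and drug discovery. Both classes need large amounts of compute, application orchestration, and high-performance storage; tightly coupled workloads also need high network performance. Visualization capability is a further requirement.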

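This article focuses on seismic processing. It walks through deploying two open-source seismic wave modeling codes, SPECFEM3D and EFISPEC3D, on the Amazon Web Services HPC architecture. Along the way it shows the cluster management capabilities of Amazon ParallelCluster and uses placement groups, Elastic Fabric Adapter (EFA), and Amazon FSx for Lustre to meet the requirements of high-performance computing.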

2. Seismic Processing and 3D Seismic Wave Modeling

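As the world's reliance on fossil fuels continues to grow, seismic processing has become a key area of technology investment for some oil and gas companies. Software developers can play an important role here, because they can build the tools these companies need to make sense of the data they collect.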

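What is seismic processing? The term is easily confused with earthquakes and seismology, as if it meant response work after an earthquake. It does not. Seismic processing uses reflected waves produced by exploration geophysics techniques to measure the physical properties of the Earth's subsurface. Its main purpose is to help detect the presence and location of useful geological deposits, which makes it a good tool for locating valuable deposits underground.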

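Oil and gas companies explore for hydrocarbons. Seismic imaging works like an MRI or CT scan of the Earth: numerical simulation predicts how seismic waves propagate, which helps find and locate new wells. Reservoir simulation then analyzes how oil, gas, and water flow underground once wells are in place, so that a strategy for extracting the hydrocarbons can be drawn up.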

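Recent advances in high-performance computing and numerical techniques have enabled 3D simulation of seismic wave propagation at unprecedented resolution and accuracy. The spectral element method (SEM) has been used in computational fluid dynamics for more than two decades, but became popular in seismology only more recently. It is a high-order finite element approximation, first introduced into seismology by Faccioli et al., and has since become the most widely used method for 3D numerical simulation of seismic waves.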
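As a brief sketch of what SEM discretizes, the equations below use standard textbook notation that is an assumption here and is not taken from either code base: displacement $\mathbf{u}$, density $\rho$, stress tensor $\boldsymbol{\sigma}$, body force $\mathbf{f}$, and test function $\mathbf{w}$ on a domain $\Omega$ with a traction-free surface (absorbing boundaries are left out for brevity).

```latex
% Strong form: elastodynamic equation of motion
\rho\,\ddot{\mathbf{u}} = \nabla\cdot\boldsymbol{\sigma} + \mathbf{f}
\quad \text{in } \Omega

% Weak form: multiply by a test function w and integrate by parts;
% the boundary integral vanishes on a traction-free surface
\int_\Omega \rho\,\mathbf{w}\cdot\ddot{\mathbf{u}}\,\mathrm{d}\Omega
  = -\int_\Omega \nabla\mathbf{w} : \boldsymbol{\sigma}\,\mathrm{d}\Omega
  + \int_\Omega \mathbf{w}\cdot\mathbf{f}\,\mathrm{d}\Omega

% Semi-discrete system after expanding u and w on high-order
% Lagrange polynomials in each hexahedral element
M\,\ddot{U} + K\,U = F
```

When the Lagrange interpolation nodes coincide with the Gauss-Lobatto-Legendre quadrature points, the mass matrix $M$ is diagonal, so an explicit time scheme needs no linear solve per step. This property is a large part of why the method parallelizes well with MPI.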

Two main open-source implementations currently use SEM for 3D seismic wave modeling: SPECFEM3D and EFISPEC3D.

2.1 SPECFEM3D

Source code: https://github.com/geodynamics/specfem3d

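SPECFEM3D is a leading software package dedicated to seismic wave modeling. Its main author, D. Komatitsch, received the 2003 Gordon Bell supercomputing award in recognition of breakthroughs in high-performance computing on multi-petabyte systems. The code is written in Fortran 95 and depends on an MPI library. The SEM computation is expensive and accounts for 85% of the application's run time. The code can simulate how seismic waves propagate in sedimentary basins or other regional geological models, and can also be used for non-destructive testing.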

2.2 EFISPEC3D

Source code: http://efispec.free.fr/download/EFISPEC3D.tgz

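EFISPEC3D has been developed since 2009 by the French geological survey (BRGM) in collaboration with Intel and other companies and research institutions. It simulates seismic waves by solving the three-dimensional equations of motion with a continuous Galerkin SEM, and is parallelized with MPI. EFISPEC3D is written mainly in Fortran 95 and is compiled with the Intel Fortran compiler.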

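Built around these open-source 3D seismic wave modeling codes, an on-demand HPC platform, high-performance storage, and supporting workflows can greatly simplify existing seismic processing solutions. Oil and gas companies can therefore make well-planning decisions faster, lower well development risk, raise staff productivity, and cut IT spending.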

The sections below show how services from Amazon Web Services can support 3D seismic wave modeling and seismic processing solutions.

3. Amazon Web Services HPC Services for 3D Seismic Wave Modeling

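As noted in the introduction, HPC workloads need large amounts of compute, application orchestration, and high-performance storage, as well as high network performance and visualization capability. The table below lists the HPC services that Amazon Web Services offers in China to support seismic processing solutions.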

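HPC requirement	HPC services from Amazon Web Services
Compute	Amazon EC2 (CPU, GPU, FPGA), Amazon EC2 Spot
Application orchestration	Amazon Batch, Amazon ParallelCluster, NICE EnginFrame, Amazon Auto Scaling
High-performance storage	Amazon EBS (PIOPS), Amazon FSx for Lustre, Amazon EFS, Amazon S3
Network performance	Placement groups, enhanced networking, Elastic Fabric Adapter (EFA)
Visualization	NICE DCV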

Using these HPC services, the following architecture is proposed to meet the requirements of 3D seismic wave modeling:

The architecture uses Amazon ParallelCluster for cluster management and is optimized with placement groups, Elastic Fabric Adapter (EFA), and Amazon FSx for Lustre:

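Instances in the same cluster placement group benefit from a higher throughput limit for TCP/IP traffic;
Amazon FSx for Lustre provides a fully managed high-performance file system that lets users access and modify data from Amazon S3 or on premises with low latency, at throughput of up to hundreds of GB per second and millions of IOPS;
EFA's operating-system bypass networking mechanism provides a low-latency, low-jitter channel for inter-instance communication. This allows tightly coupled HPC or distributed machine learning applications to scale to thousands of cores and so run faster;
Amazon ParallelCluster simplifies the deployment and orchestration of HPC clusters that use a high-throughput parallel file system, high-throughput low-latency network interfaces, and high-bandwidth network interconnects;
With NICE DCV and Amazon EC2, graphics-intensive applications run remotely and stream their user interface to the client desktop, which removes the need for expensive dedicated workstations and meets remote visualization needs.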

Using the cloud HPC architecture above, the following sections deploy each of the two open-source 3D seismic wave modeling codes (SPECFEM3D and EFISPEC3D).

3.1 Deploying and Enabling Amazon ParallelCluster

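Amazon ParallelCluster is an open-source cluster management tool supported by Amazon Web Services that lets scientists, researchers, and IT administrators easily deploy and manage HPC clusters on Amazon Web Services. It is published on the Python Package Index (PyPI) and can be installed with pip; you pay only for the Amazon Web Services resources needed to run your applications. ParallelCluster uses CloudFormation to build the HPC cluster environment. This walkthrough uses Amazon ParallelCluster 3.1.4, the latest version at the time of writing.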

# install parallel cluster
cat > ${HOME}/pcluster-install.sh << EOF
#!/bin/bash

if ! command -v pcluster &> /dev/null
then
  echo ">> pcluster is missing, reinstalling it"
  # quote the requirement so the shell does not treat ">=3" as an output redirection
  sudo pip3 install "aws-parallelcluster>=3" -i https://pypi.tuna.tsinghua.edu.cn/simple
else
  echo ">> Pcluster \$(pcluster version |jq .version) found, nothing to install"
fi
EOF
chmod +x ${HOME}/pcluster-install.sh
echo "bash ${HOME}/pcluster-install.sh" >> ~/.bashrc
bash ${HOME}/pcluster-install.sh

# install parallel cluster dependency *node*
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
chmod ug+x ~/.nvm/nvm.sh
source ~/.nvm/nvm.sh
nvm install --lts
node --version

# install other utilities
cat > ${HOME}/prereq-install.sh << EOF
if ! command -v sponge &> /dev/null
then
  echo ">> sponge is missing, reinstalling prerequisites"
  sudo amazon-linux-extras install epel -y
  sudo yum --enablerepo epel install -y moreutils
  sudo yum install -y jq
  sudo pip3 install wildq -i https://pypi.tuna.tsinghua.edu.cn/simple
else
  echo ">> prerequisites are installed, nothing to do"
fi
EOF
chmod +x ${HOME}/prereq-install.sh
echo "bash ${HOME}/prereq-install.sh" >> ~/.bashrc
bash ${HOME}/prereq-install.sh

# configure AWS region
export AWS_REGION=cn-northwest-1
echo "export AWS_REGION=${AWS_REGION}" |tee -a ~/.bashrc

# generate a new key-pair
export SSH_KEY=pc-key-$(uuidgen --random | cut -d'-' -f1)-$(date +%F)
echo "export SSH_KEY=${SSH_KEY}" |tee -a ~/.bashrc
mkdir -p ~/.ssh
aws ec2 create-key-pair --key-name ${SSH_KEY} --query KeyMaterial --output text --region=${AWS_REGION} > ~/.ssh/${SSH_KEY}
chmod 600 ~/.ssh/${SSH_KEY}

# use the default VPC and pick one of its first two subnets at random (assumes at least two subnets)
VPC_ID=$(aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query "Vpcs[].VpcId" --region ${AWS_REGION} | jq -r '.[0]')
SUBNET_ID=$(aws ec2 describe-subnets --region=${AWS_REGION} --filters "Name=vpc-id,Values=$VPC_ID" --query 'Subnets[*].SubnetId' \
  | jq -r --arg i $(($RANDOM % 2)) '.[$i|tonumber]')
echo "export VPC_ID=${VPC_ID}"|tee -a ~/.bashrc
echo "export SUBNET_ID=${SUBNET_ID}"|tee -a ~/.bashrc

# initialize cluster-config.yaml
echo "Region: ${AWS_REGION}" \
    > cluster-config.yaml 

# configure operating system
cat cluster-config.yaml \
    |wildq -i yaml -M '.Image.Os = "alinux2"' \
    |sponge cluster-config.yaml

# configure Amazon FSx for Lustre
cat cluster-config.yaml \
      |wildq -i yaml -M '.SharedStorage += [{"MountDir": "/fsx", "Name": "fsx", "StorageType": "FsxLustre"}]' \
      |wildq -i yaml -M '(.SharedStorage[]| select(.Name=="fsx").FsxLustreSettings.StorageCapacity) = 1200' \
      |wildq -i yaml -M '(.SharedStorage[]| select(.Name=="fsx").FsxLustreSettings.DeploymentType) = "SCRATCH_2"' \
      |sponge cluster-config.yaml

# configure scheduler
cat cluster-config.yaml \
      |wildq -i yaml -M '.Scheduling.Scheduler = "slurm"' \
      |wildq -i yaml -M '.Scheduling.SlurmSettings.ScaledownIdletime = 10' \
      |wildq -i yaml -M '.Scheduling.SlurmSettings.Dns.DisableManagedDns = false' \
      |sponge cluster-config.yaml

# configure networking
cat cluster-config.yaml \
      |wildq -i yaml -M '.HeadNode.Networking.SubnetId = "SUBNET_ID"' \
      |wildq -i yaml -M '.HeadNode.Ssh.KeyName = "SSH_KEY"' \
      |wildq -i yaml -M '.HeadNode.Imds.Secured = true' \
      |sed -e "s/SSH_KEY/${SSH_KEY}/" \
      |sed -e "s/SUBNET_ID/${SUBNET_ID}/" \
      |sponge cluster-config.yaml

# configure local storage
cat cluster-config.yaml \
      |wildq -i yaml -M '.HeadNode.LocalStorage.EphemeralVolume.MountDir = "/local/ephemeral"' \
      |sponge cluster-config.yaml
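The template steps above rely on `sed` replacing the `SUBNET_ID` and `SSH_KEY` placeholders, which fails silently if a variable is unset in a new shell. The following is a minimal sketch of a sanity check, assuming the file layout produced above; `check_cluster_config` is a hypothetical helper name and not part of the `pcluster` CLI.

```shell
#!/bin/bash
# Sketch: sanity-check a generated ParallelCluster config before using it.
# check_cluster_config is a hypothetical helper, not a pcluster command.
check_cluster_config() {
  local cfg="$1" key status=0
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }

  # Placeholders that the sed steps should have replaced with real values.
  if grep -Ewq 'SUBNET_ID|SSH_KEY' "$cfg"; then
    echo "unreplaced placeholder in $cfg" >&2
    status=1
  fi

  # Top-level sections written by the template steps.
  for key in Region Image HeadNode Scheduling SharedStorage; do
    if ! grep -q "^${key}:" "$cfg"; then
      echo "missing section: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (commented out so the block only defines the helper):
# check_cluster_config cluster-config.yaml && echo "template looks complete"
```

Once the queues have been added to a per-cluster copy of the template, ParallelCluster 3 can also validate the request without creating resources by passing `--dryrun true` to `pcluster create-cluster`.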

3.2 Running a SPECFEM3D Simulation on Amazon EC2 (Graviton2 CPU)

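The Amazon Graviton2 processor is custom-built by Amazon Web Services on 64-bit Arm Neoverse cores to give the best price performance for workloads running on Amazon EC2. Compared with comparable x86-based instances, it offers up to 40% better price performance across a range of workloads. In the China regions, EC2 instances with Graviton2 processors cannot currently use EFA, although this is already available in the global regions. The ParallelCluster configuration therefore needs a few changes.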

# input cluster name (for example, enter SPECFEM3D-ARM at the prompt)
read -p "Name of the cluster you want to create: " ARM_CLUSTER_NAME

# copy cluster configuration template
echo "export ARM_CLUSTER_NAME=${ARM_CLUSTER_NAME}"| tee -a ~/.bashrc
/bin/cp -fv cluster-config.yaml ${ARM_CLUSTER_NAME}.yaml

# configure headnode instance type and root volume size
cat ${ARM_CLUSTER_NAME}.yaml \
      |wildq -i yaml -M '.HeadNode.InstanceType = "c6g.xlarge"' \
      |wildq -i yaml -M '.HeadNode.LocalStorage.RootVolume.Size = 50' \
      |sponge ${ARM_CLUSTER_NAME}.yaml

# configure slurm queues - basic
cat ${ARM_CLUSTER_NAME}.yaml \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].Name = "cpu"' \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].CapacityType = "ONDEMAND"' \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].ComputeSettings.LocalStorage.RootVolume.Size = 50' \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].ComputeSettings.LocalStorage.EphemeralVolume.MountDir = "/local/ephemeral"' \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].Networking.SubnetIds = ["SUBNET_ID"]' \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].Networking.PlacementGroup.Enabled = true' \
      |sed -e "s/SUBNET_ID/${SUBNET_ID}/" \
      |sponge ${ARM_CLUSTER_NAME}.yaml

# configure slurm queues - advanced
cat ${ARM_CLUSTER_NAME}.yaml \
      |wildq -i yaml -M '.Scheduling.SlurmQueues[0].ComputeResources += [{"Name": "c6g-xl", "InstanceType": "c6g.xlarge"}]' \
      |wildq -i yaml -M '(.Scheduling.SlurmQueues[0].ComputeResources[] | select(.Name=="c6g-xl").MinCount) = 0' \
      |wildq -i yaml -M '(.Scheduling.SlurmQueues[0].ComputeResources[] | select(.Name=="c6g-xl").MaxCount) = 3' \
      |wildq -i yaml -M '(.Scheduling.SlurmQueues[0].ComputeResources[] | select(.Name=="c6g-xl").DisableSimultaneousMultithreading) = false' \
      |wildq -i yaml -M '(.Scheduling.SlurmQueues[0].ComputeResources[] | select(.Name=="c6g-xl").Efa.Enabled) = false' \
      |wildq -i yaml -M '(.Scheduling.SlurmQueues[0].ComputeResources[] | select(.Name=="c6g-xl").Efa.GdrSupport) = false' \
      |sponge ${ARM_CLUSTER_NAME}.yaml

Next, build the ParallelCluster cluster.

pcluster create-cluster --cluster-name ${ARM_CLUSTER_NAME} --cluster-configuration ${ARM_CLUSTER_NAME}.yaml
watch -n 5 pcluster list-clusters --region ${AWS_REGION}
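Watching `list-clusters` works interactively; in a script it can be more convenient to block until creation finishes. The sketch below polls `pcluster describe-cluster` and assumes its JSON output carries a top-level `clusterStatus` field, read with the `jq` utility installed earlier. `get_cluster_status` and `wait_for_cluster` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: poll the cluster status until creation finishes.
# get_cluster_status and wait_for_cluster are hypothetical helper names.
get_cluster_status() {
  # Assumes pcluster 3 JSON output with a top-level "clusterStatus" field.
  pcluster describe-cluster --cluster-name "$1" --region "$2" | jq -r '.clusterStatus'
}

wait_for_cluster() {
  local name="$1" region="$2" interval="${3:-30}" status
  while true; do
    status=$(get_cluster_status "$name" "$region")
    echo "$(date +%T) ${name}: ${status}"
    case "$status" in
      CREATE_COMPLETE)    return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *)                  return 1 ;;  # CREATE_FAILED or anything unexpected
    esac
  done
}

# Usage (commented out so the block only defines the helpers):
# wait_for_cluster "${ARM_CLUSTER_NAME}" "${AWS_REGION}" 30 && echo "cluster is ready"
```

Returning non-zero for any status other than the two expected ones keeps the loop from spinning forever if creation rolls back.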

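This work builds on the latest code in the master branch of SPECFEM3D Cartesian. The application compiles on EC2 instances with Amazon Graviton2 processors without any changes to the source code.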

# connect to parallel cluster via ssh
pcluster ssh --cluster-name ${ARM_CLUSTER_NAME} -i ~/.ssh/${SSH_KEY} --region ${AWS_REGION}

# pull specfem3d code from master branch
git clone --branch master https://github.com/geodynamics/specfem3d.git

# compile specfem3d code (git clone creates the directory "specfem3d")
cd specfem3d/
./configure FC=gfortran CC=gcc MPIFC=mpif90 --with-mpi
make

# run sample simulation (run_this_example.sh ships inside each example directory under EXAMPLES/; run it from there)
./run_this_example.sh
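Each SPECFEM3D example sets its MPI process count through the `NPROC` entry of its `Par_file`, so it is worth checking that the queue can supply that many ranks. With the c6g.xlarge queue above (4 vCPUs per instance, `MaxCount` 3) the ceiling is 3 × 4 = 12. The following is a minimal sketch; `get_nproc` and `fits_queue` are hypothetical helper names, and the `Par_file` line format is assumed to be `NPROC = <integer>` with an optional trailing comment.

```shell
#!/bin/bash
# Sketch: read NPROC from a SPECFEM3D Par_file and compare it with the
# MPI slots the Slurm queue can provide. Helper names are hypothetical.
get_nproc() {
  # Matches lines such as "NPROC   = 4   # comment"; NPROC_XI and similar
  # keys do not match because "=" must follow NPROC directly.
  sed -n 's/^[[:space:]]*NPROC[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

fits_queue() {
  # Succeeds when nproc <= max_nodes * cores_per_node.
  local nproc="$1" max_nodes="$2" cores_per_node="$3"
  [ "$nproc" -le $(( max_nodes * cores_per_node )) ]
}

# Usage from inside an example directory (commented out):
# nproc=$(get_nproc DATA/Par_file)
# fits_queue "$nproc" 3 4 || echo "NPROC=${nproc} exceeds the queue capacity"
```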

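If you see the sample simulation results, congratulations: the SPECFEM3D sample simulation for 3D seismic wave modeling has completed successfully.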

3.3 Running an EFISPEC3D Simulation on Amazon EC2 (Intel CPU)

In the China regions, certain EC2 instance types with Intel processors support EFA. The sample ParallelCluster configuration generated for this case is:

HeadNode:
  Imds:
    Secured: true
  InstanceType: g4dn.8xlarge
  LocalStorage:
    EphemeralVolume:
      MountDir: /local/ephemeral
    RootVolume:
      Size: 50
  Networking:
    SubnetId: subnet-2dd03844
  Ssh:
    KeyName: pc-key-c3d67903-2022-07-12
Image:
  Os: alinux2
Region: cn-northwest-1
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - CapacityType: ONDEMAND
      ComputeResources:
        - DisableSimultaneousMultithreading: false
          Efa:
            Enabled: true
            GdrSupport: false
          InstanceType: g4dn.8xlarge
          MaxCount: 3
          MinCount: 0
          Name: g4dn-8xl
      ComputeSettings:
        LocalStorage:
          EphemeralVolume:
            MountDir: /local/ephemeral
          RootVolume:
            Size: 50
      Name: cpu
      Networking:
        PlacementGroup:
          Enabled: true
        SubnetIds:
          - subnet-2dd03844
  SlurmSettings:
    Dns:
      DisableManagedDns: false
    ScaledownIdletime: 10
SharedStorage:
  - FsxLustreSettings:
      DeploymentType: SCRATCH_2
      StorageCapacity: 1200
    MountDir: /fsx
    Name: fsx
    StorageType: FsxLustre
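Because EFA availability depends on the instance type, it can help to confirm support before setting `Efa.Enabled: true`. The sketch below queries `aws ec2 describe-instance-types` for the `NetworkInfo.EfaSupported` field. `efa_supported` is a hypothetical helper name, and the result reflects what the EC2 API reports for the instance type, which is an assumption worth verifying against the regional documentation.

```shell
#!/bin/bash
# Sketch: ask EC2 whether an instance type reports EFA support.
# efa_supported is a hypothetical helper name.
efa_supported() {
  local itype="$1" region="$2" out
  out=$(aws ec2 describe-instance-types \
          --instance-types "$itype" \
          --query 'InstanceTypes[0].NetworkInfo.EfaSupported' \
          --output text --region "$region") || return 2
  # Text output renders the boolean as True/False; compare case-insensitively.
  [ "$(printf '%s' "$out" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

# Usage (commented out so the block only defines the helper):
# efa_supported g4dn.8xlarge "${AWS_REGION}" && echo "EFA can be enabled"
```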

Next, build the ParallelCluster cluster.

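# x86_CLUSTER_NAME is assumed to have been set, and ${x86_CLUSTER_NAME}.yaml generated, in the same way as in section 3.2
pcluster create-cluster --cluster-name ${x86_CLUSTER_NAME} --cluster-configuration ${x86_CLUSTER_NAME}.yaml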
watch -n 5 pcluster list-clusters --region ${AWS_REGION}

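This work builds on the EFISPEC3D code. On the selected EC2 instances with Intel processors, the application is compiled with the Intel oneAPI compilers, again without any changes to the source code.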

# install intel oneapi compiler
sudo yum update -y
sudo -E yum autoremove -y intel-hpckit intel-basekit
tee > /tmp/oneAPI.repo << EOF
[oneAPI]
name=Intel(R) oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
sudo mv /tmp/oneAPI.repo /etc/yum.repos.d
sudo yum install -y intel-oneapi-compiler-dpcpp-cpp-and-cpp-classic.x86_64 intel-oneapi-mpi-devel.x86_64 intel-oneapi-mpi.x86_64 intel-oneapi-runtime-compilers.x86_64 intel-oneapi-compiler-fortran.x86_64
. /opt/intel/oneapi/setvars.sh

# download EFISPEC3D code and compile
cd /fsx
wget http://efispec.free.fr/download/EFISPEC3D.tgz
tar -xvf EFISPEC3D.tgz
. /opt/intel/oneapi/setvars.sh
module load intelmpi
cd /fsx/EFISPEC3D
make efispec version=1.0

# prepare to run sample simulation
cp -r /fsx/EFISPEC3D/docs/tutorials/t01 /fsx/t01
cd /fsx/t01
cat > t01.cfg << EOF
!*******************************************
!time information
!*******************************************
duration of simulation         = 10.0
time step                      = 3.0e-3

!*******************************************
!receivers' output
!*******************************************
receiver saving increment      = 10

!*******************************************
!snapshot output
!*******************************************
snapshot saving  increment     = 50
snapshot space   increment     = 250.0
snapshot displacement          = .false.
snapshot velocity              = .true.
snapshot acceleration          = .false.

!*******************************************
!boundary absorption information
!*******************************************
boundary absorption            = .true.

!*******************************************
!medium information
!*******************************************
number of material             = 2

material                       = 1
vs                             = 2000.0
vp                             = 4000.0
rho                            = 2600.0
Qs                             =   10.0
Qp                             =  100.0
Qf                             =    1.0

material                       = 2
vs                             = 3464.0
vp                             = 6000.0
rho                            = 2700.0
Qs                             =   10.0
Qp                             =  100.0
Qf                             =    1.0

EOF

# configure simulation job
cd /fsx/t01
cat > efispec-test.slurm << EOF
#!/bin/bash
#SBATCH --output=%x_%j.out
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=2
#SBATCH --job-name=efispec3d-test

. /opt/intel/oneapi/setvars.sh
module load intelmpi
export I_MPI_DEBUG=5

mpirun /fsx/EFISPEC3D/bin/efispec3d_1.0_sse.exe
EOF

# submit simulation job
sbatch efispec-test.slurm
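The job script above leaves the choice of network fabric to Intel MPI. On EFA-enabled nodes, one option is to request the EFA libfabric provider explicitly. The sketch below generates such a variant of the script; `make_efispec_job` is a hypothetical helper name, and the settings `I_MPI_FABRICS=shm:ofi` and `FI_PROVIDER=efa` are assumptions about a suitable setup that should be checked against the `I_MPI_DEBUG` startup log.

```shell
#!/bin/bash
# Sketch: print a Slurm script for EFISPEC3D that asks for the EFA provider.
# make_efispec_job is a hypothetical helper name.
make_efispec_job() {
  local nodes="$1" tasks_per_node="$2" exe="$3"
  cat << EOF
#!/bin/bash
#SBATCH --job-name=efispec3d-efa
#SBATCH --output=%x_%j.out
#SBATCH --nodes=${nodes}
#SBATCH --ntasks-per-node=${tasks_per_node}

. /opt/intel/oneapi/setvars.sh
module load intelmpi
export I_MPI_FABRICS=shm:ofi   # shared memory inside a node, libfabric between nodes
export FI_PROVIDER=efa         # ask libfabric for the EFA provider
export I_MPI_DEBUG=5           # startup log reports the provider actually used

mpirun ${exe}
EOF
}

# Usage (commented out so the block only defines the helper):
# make_efispec_job 2 2 /fsx/EFISPEC3D/bin/efispec3d_1.0_sse.exe > efispec-efa.slurm
# sbatch efispec-efa.slurm
```

Generating the script from a function keeps node and task counts in one place when scaling the same test case across different cluster sizes.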

# check the progress and result
cd /fsx/t01
tail -f t01.lst

starting time loop

 -->time of simulation
        0.0000000E+00
        0.3000000E+01
        0.6000000E+01
        0.9000000E+01

writing snapshots of peak ground motion...
done

elapsed time for initialization        =   0.7980447E+01 s
elapsed time for time loop computation =   0.2723921E+03 s
total elapsed time for computation     =   0.2803726E+03 s

average time per time step and per hexa (order 4) for the simulation
 -->cpu        0   0.1289477E-04 s
 -->cpu        1   0.1289442E-04 s
 -->cpu        2   0.1289443E-04 s
 -->cpu        3   0.1289450E-04 s
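The listing reports wall time in Fortran exponent notation, and the configuration file fixes the number of time steps: 10.0 s of simulated time at a 3.0e-3 s step gives 3333 whole steps, so the 272.39 s time loop works out to roughly 0.08 s of wall time per step. The following is a minimal sketch that extracts these figures, assuming the file formats shown above; `cfg_value`, `lst_seconds`, and `steps` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: derive step count and wall time from EFISPEC3D input and output.
# Helper names are hypothetical; file formats follow the listings above.
cfg_value() {
  # Print the value to the right of "=" for an exact key, e.g. "time step".
  awk -F'=' -v key="$1" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
    k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v; exit }' "$2"
}

lst_seconds() {
  # Print the seconds from a line such as
  # "elapsed time for time loop computation =   0.2723921E+03 s".
  awk -F'=' -v label="$1" '
    index($0, label) > 0 { split($2, f, " "); printf "%.4f\n", f[1] + 0; exit }' "$2"
}

steps() {
  # Whole time steps in the run: duration / time step, truncated.
  awk -v d="$1" -v dt="$2" 'BEGIN { printf "%d\n", d / dt }'
}

# Usage in /fsx/t01 (commented out so the block only defines the helpers):
# n=$(steps "$(cfg_value 'duration of simulation' t01.cfg)" "$(cfg_value 'time step' t01.cfg)")
# t=$(lst_seconds 'elapsed time for time loop computation' t01.lst)
# awk -v t="$t" -v n="$n" 'BEGIN { printf "%.4f s per step\n", t / n }'
```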

4. Summary

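Starting from the basic concept of seismic processing, this article introduced two open-source implementations of 3D seismic wave modeling and deployed them on the Amazon Web Services HPC architecture. Amazon ParallelCluster handled cluster management, and placement groups, Elastic Fabric Adapter (EFA), and Amazon FSx for Lustre were used for optimization, meeting the requirements for high concurrency and low latency. In future, NICE DCV and Amazon EC2 can be used to meet remote visualization needs.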

Amazon AWS Official Blog