AWS HPC Blog

Price-Performance Analysis of Amazon EC2 GPU Instance Types using NVIDIA’s GPU optimized seismic code

This post was contributed by Jyothi Venkatesh, HPC Solutions Architect at AWS, Karthik Raman, Sr. HPC Solutions Architect at AWS, Maxime Hugues, Principal HPC Solutions Architect at AWS, and Fatmir Hoxha, Sr. Solutions Architect at NVIDIA.

Kirchhoff Time and Depth Migration

Seismic imaging is the process of positioning the Earth’s subsurface reflectors. It transforms seismic data recorded in time at the Earth’s surface into an image of the Earth’s subsurface, by back-propagating the data from time to space through a given velocity model. Kirchhoff depth migration is a well-known technique used in geophysics for seismic imaging. The fundamental difference between time and depth migration lies in the velocity model: at every output location, time migration uses a velocity that varies along depth only, whereas depth migration also honors lateral velocity variations.
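
For reference (in generic textbook notation rather than the formulation used in NVIDIA’s code), Kirchhoff time migration typically computes its travel times analytically from the double-square-root equation under a straight-ray, RMS-velocity assumption:

t(x_s, x_r) = \sqrt{\left(\tfrac{t_0}{2}\right)^2 + \frac{x_s^2}{v_{\mathrm{rms}}^2}} + \sqrt{\left(\tfrac{t_0}{2}\right)^2 + \frac{x_r^2}{v_{\mathrm{rms}}^2}}

Here t_0 is the two-way zero-offset time at the image location, x_s and x_r are the horizontal distances from the image point to the source and receiver, and v_rms is the RMS velocity at that output location. Because only this single, vertically varying velocity function enters the expression, time migration needs no ray tracing, whereas depth migration computes travel times through the full velocity model, for example by ray tracing or by solving the eikonal equation.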

Once the travel times are computed, both Kirchhoff time and depth migration sum the recorded data along those travel times to produce a high-resolution image of the subsurface, and can also generate partial images for individual subsets (classes) of the data. This process provides valuable information about the petrophysical properties of the rocks and helps determine how accurate the velocity model is.

Computing Kirchhoff time and depth migration is memory-bandwidth sensitive, and throughput also depends on how fast the input data can be copied into accelerator memory. The problem is pleasingly parallel: individual processes do not need to communicate with each other, so all computations can proceed independently and concurrently.
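
To illustrate why this workload maps well to GPUs, here is a heavily simplified Kirchhoff summation kernel. This is our own minimal sketch, not NVIDIA’s code; the data layout and precomputed travel-time tables are hypothetical. Each thread owns one output image sample and independently sums contributions from the input traces, so runtime is dominated by how fast trace data and travel-time tables can be streamed from GPU memory.

// Minimal Kirchhoff summation sketch (hypothetical data layout, not NVIDIA's code).
// One thread per output image sample; threads never communicate with each other.
__global__ void kirchhoff_sum(const float *traces,  // [n_traces][n_t] input trace samples
                              const float *t_src,   // [n_traces][n_image] source-side travel times (s)
                              const float *t_rcv,   // [n_traces][n_image] receiver-side travel times (s)
                              float *image,         // [n_image] output image
                              int n_traces, int n_t, int n_image, float dt)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;   // one output sample per thread
    if (ix >= n_image) return;

    float sum = 0.0f;
    for (int itr = 0; itr < n_traces; ++itr) {
        // Two-way travel time: source -> image point -> receiver for this trace.
        float t  = t_src[(size_t)itr * n_image + ix] + t_rcv[(size_t)itr * n_image + ix];
        int   it = (int)(t / dt);
        if (it >= 0 && it < n_t)
            sum += traces[(size_t)itr * n_t + it];    // bandwidth-bound gather from the input data
    }
    image[ix] = sum;
}

A production kernel would add amplitude weighting, anti-aliasing, and interpolation between time samples, but even this skeleton shows why memory bandwidth and host-to-device transfer speed govern throughput.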

With increasing demand for seismic data processing, running Kirchhoff migration methods on GPUs decreases both computing cost and time to results.

What is NVIDIA’s GPU-optimized seismic code?

NVIDIA developed a collection of seismic algorithms meticulously optimized for GPUs. It includes Forward Propagators for Reverse Time Migration (1-, 2-, and 3-pass ISO, VTI, TTI), Kirchhoff Depth Migration, Kirchhoff Time Migration, Reverse Time Migration with Compression, Compression with DCT and OpenVDS Samples, and more. The objective is to provide seismic algorithms that apply all available GPU best practices and optimizations to maximize hardware price/performance. They are written in C++/CUDA and are very close to a production version. They are self-contained: they read seismic input data and velocity models and produce a depth- or time-imaged output. NVIDIA does not open-source these algorithms but encourages companies to use them as the foundation for derivative works or to benchmark them against their own algorithms. Of these applications, this post focuses on evaluating Kirchhoff Depth and Time Migration on AWS.

Benefits of running NVIDIA’s GPU optimized seismic code on AWS

Running seismic imaging on AWS provides geophysicists the ability to balance time and cost using different GPU-based Amazon EC2 instance types.

In addition, these compute resources can be dynamically allocated when needed and deprovisioned when processing is finished. You can then switch to a different Amazon EC2 instance type, like a compute-optimized instance type for the post-processing, giving you flexibility in hardware choice and reducing cost.

Along with compute, you can also provision a high-performance, managed file system with Amazon FSx for Lustre in one click. Amazon FSx for Lustre can also be linked to an Amazon S3 bucket to seamlessly move data back and forth between Lustre and S3 while providing a familiar and performant file system. You can use Amazon S3 to store or archive input and output data, with tiering via Amazon S3 Storage Classes for long-term storage. It also enables you to share data around the globe quickly and easily over the AWS network backbone.

Configuration/Solution

AWS offers multiple GPU-accelerated instances for high performance computing, machine learning, and graphics-intensive workloads. For example, Amazon EC2 P4d instances provide 8 NVIDIA A100 Tensor Core GPUs with 40 GB of memory each, interconnected by NVLink, and 400 Gbps of networking throughput per instance. Amazon EC2 P3 instances provide up to 8 NVIDIA V100 Tensor Core GPUs with up to 32 GB of memory each and up to 100 Gbps of networking throughput, and support NVLink for GPU peer-to-peer communication. Amazon EC2 G4dn instances offer a cost-effective option with up to 4 NVIDIA T4 Tensor Core GPUs with 16 GB of memory each and up to 50 Gbps of networking throughput; G4dn metal instances support 8 NVIDIA T4 Tensor Core GPUs and 100 Gbps of networking throughput. For more details on GPU-based EC2 instances, see the Amazon EC2 Instance Types page under Accelerated Computing.

The Kirchhoff Time and Depth Migration applications from this collection were evaluated on three Amazon EC2 instance types: g4dn.12xlarge, p3dn.24xlarge, and p4d.24xlarge. All three are equipped with local NVMe storage, and the tests reported in this post were run against that local storage. The configuration details are listed in the following table.

Instance | GPUs | vCPUs (Cores) | Memory (GiB) | Processor | Network (Gbps) | Instance Storage
g4dn.12xlarge | 4 x NVIDIA T4 | 48 (24) | 192 | Intel Xeon Cascade Lake | 50 | 1 x 900 GB NVMe SSD
p3dn.24xlarge | 8 x NVIDIA V100 | 96 (48) | 768 | Intel Xeon Skylake | 100 | 2 x 900 GB NVMe SSD
p4d.24xlarge | 8 x NVIDIA A100 | 96 (48) | 1152 | Intel Xeon Cascade Lake | 400 | 8 x 1 TB NVMe SSD

In addition to the tests on these local NVMe drives, each instance was also tested with different Amazon Elastic Block Store (EBS) volume types, such as gp2 and io1, for performance comparison. The tested storage options showed near-identical performance, with a delta of less than 0.3%. Customers whose data storage requirements exceed the local NVMe capacity of an EC2 instance can attach EBS volumes and expand capacity as needed. Figure 1 shows an AWS architecture diagram of the test environment.

Figure 1. AWS Architecture used for the price-performance benchmark study.

Customers can also take advantage of AWS ParallelCluster to run their seismic workloads with NVIDIA’s GPU optimized seismic code. AWS ParallelCluster is an AWS-supported, open-source cluster management tool that helps you deploy and manage high performance computing (HPC) clusters in the AWS Cloud. It automatically sets up the required compute resources and shared filesystem, and can be used with batch schedulers such as AWS Batch and Slurm. With ParallelCluster, customers can run these applications on different instance types and at scale. In the same environment, you can also configure a high-performance parallel file system such as Amazon FSx for Lustre (which can be used as scratch space instead of local NVMe for larger datasets). You can also use the multi-queue feature supported by AWS ParallelCluster to configure separate job queues for compute and visualization with different Amazon EC2 instance types.

How to run NVIDIA’s GPU optimized seismic code on AWS

For access to NVIDIA’s GPU optimized seismic code, an NDA must first be in place. Please reach out to Reynaldo Gomez at reynaldog@NVIDIA.com to get started. After that, you can refer to the instructions on GitHub for how to run the code on AWS.

Performance and Price Comparison

Below we discuss the performance of Kirchhoff Time and Depth Migration, part of NVIDIA’s GPU optimized seismic code, on AWS. The code also includes a sample data reader script as well as instructions to download and configure public datasets (for example, the BP dataset). Customers’ seismic data comes in many different formats; to provide a generic platform, the Depth and Time Migration algorithms expect the input data in binary format. The included sample data reader utility converts the SEG-Y data format to binary, and it needs to be modified to convert other data formats.
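
As a rough illustration of this conversion step, here is a minimal, self-contained sketch of a SEG-Y-to-binary converter. It is our own example, not the bundled data reader, and it assumes fixed-length traces (sample count passed on the command line), 4-byte big-endian IEEE samples, and a little-endian host; production SEG-Y files often use IBM floating point and variable trace lengths, which the real utility has to handle.

// Hypothetical SEG-Y -> flat binary converter sketch (not the bundled data reader).
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cstring>
#include <vector>

// SEG-Y stores samples big-endian; swap to host order (little-endian host assumed).
static float be32_to_float(uint32_t be)
{
    uint32_t le = (be >> 24) | ((be >> 8) & 0x0000ff00u) |
                  ((be << 8) & 0x00ff0000u) | (be << 24);
    float f;
    std::memcpy(&f, &le, sizeof f);
    return f;
}

int main(int argc, char **argv)
{
    if (argc != 4) {
        std::fprintf(stderr, "usage: %s in.segy out.bin samples_per_trace\n", argv[0]);
        return 1;
    }
    const int ns = std::atoi(argv[3]);               // samples per trace (assumed fixed)
    std::FILE *in  = std::fopen(argv[1], "rb");
    std::FILE *out = std::fopen(argv[2], "wb");
    if (!in || !out) { std::perror("fopen"); return 1; }

    std::fseek(in, 3600, SEEK_SET);                  // skip 3200-byte textual + 400-byte binary file headers
    std::vector<uint32_t> raw(ns);
    std::vector<float> samples(ns);
    char trace_header[240];

    while (std::fread(trace_header, 1, 240, in) == 240) {      // 240-byte header precedes each trace
        if (std::fread(raw.data(), 4, ns, in) != (size_t)ns) break;
        for (int i = 0; i < ns; ++i) samples[i] = be32_to_float(raw[i]);
        std::fwrite(samples.data(), sizeof(float), ns, out);    // headerless samples only
    }
    std::fclose(in);
    std::fclose(out);
    return 0;
}

Running it as ./segy2bin input.segy output.bin <samples_per_trace> strips the file and trace headers and leaves only the raw samples that the migration code reads.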

We used the 2007 BP Anisotropic Velocity Benchmark dataset to evaluate Kirchhoff Depth Migration and the 2004 BP Velocity Estimation Benchmark dataset to evaluate Kirchhoff Time Migration. A sample parameter file is included in the data readers package and can be customized to best suit your workload; it defines the key parameters used for running the model, such as the number of GPUs and the number of traces.

Figures 2 and 3 show the time to migrate 1.3M traces for depth migration and 1.6M traces for time migration, respectively. Local NVMe storage was used for both applications on all the instances tested, as discussed in the previous section.
The p4d.24xlarge instance, based on NVIDIA A100 Tensor Core GPUs, shows up to a 4.6x improvement for depth migration and up to a 7x improvement for time migration compared to the g4dn.12xlarge instance, based on NVIDIA T4 Tensor Core GPUs, with the same number of GPUs.

Figure 2. Kirchhoff Depth Migration Time

Figure 3. Kirchhoff Time Migration Time

Cost Comparisons

Figure 4 shows cost comparisons of running Kirchhoff Depth Migration on the Amazon EC2 instances tested. The cost of migrating 130M traces is estimated by extrapolating the migration time measured for 1.3M traces (Figure 2).

Figure 5 shows cost comparisons of running Kirchhoff Time Migration on the Amazon EC2 instances tested. The cost of migrating 160M traces is estimated by extrapolating the migration time measured for 1.6M traces (Figure 3).

All pricing considered here is the public On-Demand pricing for these instances in the AWS us-east-1 Region.
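
The extrapolation itself is a simple back-of-the-envelope calculation, assuming the migration time scales linearly with the number of traces (each trace is migrated independently):

\text{Estimated cost} \approx \frac{100 \times T_{\text{measured}}}{3600} \times P_{\text{On-Demand}}

where T_measured is the measured runtime in seconds for the 1.3M-trace (or 1.6M-trace) run, the factor of 100 scales the workload to 130M (or 160M) traces, 3600 converts seconds to hours, and P_On-Demand is the instance’s hourly On-Demand price.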

From Figure 4, the p4d.24xlarge with A100 GPUs shows a price-performance benefit of up to 11% compared to g4dn.12xlarge and up to 34% compared to p3dn.24xlarge for depth migration.

From Figure 5, the p4d.24xlarge shows a price-performance benefit of up to 37% compared to g4dn.12xlarge and up to 43% compared to p3dn.24xlarge for time migration.

Figure 4. Kirchhoff Depth Migration Cost

Figure 5. Kirchhoff Time Migration Cost

Conclusion

This post showcases the performance of NVIDIA’s GPU optimized seismic code on AWS. The data presented here shows that Amazon EC2 p4d.24xlarge instances, based on NVIDIA A100 Tensor Core GPUs, deliver a performance improvement of up to 4.6x for Kirchhoff Depth Migration and up to 7x for Kirchhoff Time Migration compared to g4dn.12xlarge instances based on NVIDIA T4 Tensor Core GPUs.

There is potential to further improve data transfer throughput by 10x by combining GPUDirect Storage and GPU data compression; an effort to implement this in NVIDIA’s GPU optimized seismic code is ongoing. Future work also includes evaluating the performance of other samples, such as Reverse Time Migration (RTM) with compression and Forward Propagators for RTM, on AWS GPU-based instances.

Special Acknowledgement: Special thanks to Fatmir Hoxha, Sr. Solutions Architect at NVIDIA, for co-authoring this post. Fatmir holds a PhD in Geoscience from the University of Pau (France). In 1996, he started his career with CGG France as a Research Geophysicist. In 2007, as VP of R&D at SeismicCity, he wrote RTM and acoustic wave propagation modeling code for GPU cards. He worked for 5 years at the BP HPC center and for 1 year helped PGS, as a principal software developer, with GPU development. He now works for NVIDIA as a Senior Solutions Architect, drawing on his 25+ years of experience in depth imaging to write new codes and optimize existing ones for the energy business worldwide.

Jyothi Venkatesh

Jyothi Venkatesh is an HPC Solutions Architect at AWS focused on building optimized solutions for HPC customers in different industry verticals, including Health Care and Life Sciences and Oil & Gas. Prior to joining AWS, she spent close to 10 years in HPC, both as a software engineer working on parallel I/O and contributing to OpenMPI, and as a systems engineer at Dell leading the engineering development of the Lustre storage solution for the HPC storage portfolio. She holds an M.S. in Computer Science from the University of Houston.

Karthik Raman

Karthik Raman is a Principal Application Engineer in the high-performance computing team at AWS. He has over 10 years of experience analyzing HPC workloads over multiple generations of CPU and GPU architectures. Karthik holds a Master’s degree in Electrical Engineering from the University of Southern California.

Maxime Hugues

Dr. Maxime Hugues is a Principal HPC Solutions Architect at AWS, which he joined in 2020. He holds an M.E. from the French National Engineer School “ISEN-Toulon”, an M.S. degree from the University of Science, and a Ph.D. degree in Computer Science (2011) from the University of Lille 1. His research mainly focused on programming paradigms and innovative hardware for extreme-scale computers. Prior to joining AWS, he worked as an HPC Research Scientist and an HPC Tech Lead at TOTAL E&P Research & Technology.