AWS HPC Blog
Building deep learning models for geoscience using MATLAB and NVIDIA GPUs on Amazon EC2 (Part 2 of 2)
This is a two-part blog post. In part 1, we discussed the workflow for developing AI models for seismic interpretation using MATLAB. In part 2, we discuss the AWS and NVIDIA compute resources we used to develop those models.
These posts were contributed by Akhilesh Mishra, Medical Devices Industry Manager at MathWorks.
Accelerating the workflow using NVIDIA GPUs on Amazon Elastic Compute Cloud (Amazon EC2)
The entire workflow, from computing the five-channel wavelet-based features to training the deep learning models, posed two computational challenges: limited physical memory and long training times. Arriving at an algorithm with reasonable accuracy required many iterations of trying different signal preprocessing techniques combined with different deep learning architectures. For example, determining which discrete wavelet decomposition (e.g., Daubechies, Fejér-Korovkin, Coiflets, or Symlets) worked best with which deep learning layers (LSTMs, GRUs, or BiLSTMs) required us to set up multiple experiments running simultaneously. Hyperparameter tuning, needed to optimize each candidate network architecture, added another layer of complexity. Because the full training dataset was large (782 × 590 decomposed seismic traces, each of size 1006 × 3 × 3 × 5 samples) and multiple experiments ran in parallel, it was essential for us to use powerful compute resources with large memory and GPU capacity.
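To put the memory challenge in numbers, the short Python sketch below estimates the size of the decomposed training set if it is held in memory all at once. It assumes 4-byte single-precision samples (an assumption; the post does not state the data type) and treats 782 × 590 as the total number of traces; the constant names are illustrative.

```python
# Back-of-the-envelope size of the decomposed seismic training set.
# Assumption: every sample is a 4-byte single-precision float
# (double precision would double the result).

TRACES_DIM1 = 782                 # first grid dimension of traces (from the post)
TRACES_DIM2 = 590                 # second grid dimension of traces (from the post)
SAMPLE_SHAPE = (1006, 3, 3, 5)    # samples per decomposed trace (from the post)
BYTES_PER_SAMPLE = 4              # assumed: single precision


def dataset_bytes(n_traces: int, sample_shape: tuple, bytes_per_sample: int) -> int:
    """Return the number of bytes needed to hold every trace in memory."""
    samples_per_trace = 1
    for dim in sample_shape:
        samples_per_trace *= dim
    return n_traces * samples_per_trace * bytes_per_sample


n_traces = TRACES_DIM1 * TRACES_DIM2
total = dataset_bytes(n_traces, SAMPLE_SHAPE, BYTES_PER_SAMPLE)
print(f"{n_traces:,} traces, {total / 1e9:.1f} GB")
```

Under that assumption the script prints `461,380 traces, 83.5 GB`, and that is before counting mini-batch buffers or the extra copies made when several experiments run side by side, which helps explain why a single local machine quickly runs out of memory.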
MathWorks offers several cloud services that speed up this kind of development by providing on-demand access to enhanced compute resources, software tools, and reliable data storage. Today, we can easily scale long-running computations and simulations to CPUs, GPUs, or compute clusters in the cloud. The sections below describe the three cloud offerings we used for this work:
- MATLAB Cloud Center
- MATLAB AWS Reference Architecture
- MATLAB Container on NVIDIA NGC Catalog
MATLAB Cloud Center
MATLAB Cloud Center enables you to create, access, and manage public cloud resources. It lets you easily launch a remotely accessible Windows or Linux machine with MATLAB installed.
To get started, you need an Amazon Web Services (AWS) account and a MathWorks account. Cloud Center provides a guided process for authorizing one or more AWS accounts, adding their credentials, and allowing Cloud Center to manage cloud resources on your behalf. You can use an AWS Identity and Access Management (IAM) role to establish a trust relationship between your AWS account and the account belonging to MathWorks Cloud Center. After completing authorization, you can create an Amazon EC2 instance with MATLAB installed. Note that this blog describes running MATLAB directly on Amazon EC2 instances, which is the workflow we used to train the seismic facies classification models. A separate workflow, connecting the desktop/laptop version of MATLAB to Amazon EC2 clusters using MATLAB Parallel Server, is covered in detail in this blog article.
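Cloud Center's guided process sets up the IAM role for you, but it can help to see what a cross-account trust relationship looks like. The Python sketch below builds the generic shape of an IAM role trust policy that lets another AWS account assume a role only when it presents an agreed external ID. This is the standard IAM cross-account pattern, not the exact policy Cloud Center creates: the account ID and external ID are hypothetical placeholders, and the separate permissions policy attached to the role is not shown.

```python
import json

# Hypothetical placeholders: NOT the real MathWorks Cloud Center values.
TRUSTED_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "example-external-id"


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build a generic IAM role trust policy allowing principals in another
    AWS account to assume this role, but only when they pass the agreed
    external ID (a common guard against the "confused deputy" problem)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = cross_account_trust_policy(TRUSTED_ACCOUNT_ID, EXTERNAL_ID)
print(json.dumps(policy, indent=2))
```

The trust policy only controls who may assume the role; what the trusted account can then do in your account is governed by the permissions attached to the role.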
Figure 2 shows the first step in creating the Amazon EC2 instance after MATLAB Cloud Center has been associated with the AWS account. The EC2 instance is created and configured directly in MATLAB Cloud Center rather than in the AWS console.
In the second step, we configure the instance. This step offers several options, including the EC2 instance type. AWS offers various instance families, such as memory optimized, storage optimized, and accelerated computing. The full list of AWS instance types supported by MathWorks Cloud Center is available at https://www.mathworks.com/help/cloudcenter/ug/choose-supported-ec2-instance-machine-types.html. For our problem, we selected the p3.2xlarge instance, which has one NVIDIA V100 GPU (5,120 CUDA cores and 640 Tensor Cores) and 8 vCPUs, making it well suited to our deep learning training. Figure 3 shows the second step of the configuration. Note that there are two options for accessing the instance: opening MATLAB directly in a browser using NICE DCV, or launching MATLAB through a remote desktop connection. NICE DCV is a high-performance remote display protocol provided by AWS that streams remote desktop sessions from any cloud over varying network conditions.
In the next step, the instance is created and configured, which takes 5 to 15 minutes. Once it is ready, we can access it directly by clicking the access option under the Actions tab, as shown in Figure 4.
Once the instance is running, you can log in to DCV from the “Access Your Cloud Resource” console in Cloud Center, using the credentials set up in step 2. Clicking “Copy and Connect” redirects you to the DCV login page, where you can access MATLAB in the browser itself. The MATLAB R2022a release selected in the second step is already linked to our MathWorks account, and it automatically checks out all the toolboxes included in our license. Figure 5 shows MATLAB launched through DCV, along with the GPU information for the instance: the Tesla V100 of the selected p3.2xlarge instance.
MATLAB Cloud Center allows us to configure the instance to automatically shut down when not in use, and we can also enable it to automatically resize based on our needs.
Once inside MATLAB, we used the Experiment Manager app to run multiple training sessions, configuring the hyperparameters of the various deep learning network architectures while iterating through the different wavelet-based training features. We used Bayesian optimization to tune the hyperparameters of the networks. MATLAB has built-in support for the NVIDIA CUDA Deep Neural Network library (cuDNN) for accelerated training and prediction, which let us seamlessly use NVIDIA GPUs in the cloud for deep learning training. Figure 6 shows the Experiment Manager app running multiple deep learning experiments on the cloud GPU resources.
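The post does not show how the Bayesian optimization step works internally. As a language-neutral illustration of the general technique, here is a minimal sketch in plain Python: a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition function, tuning a single hyperparameter (log10 of the learning rate) over a fixed candidate grid. The `validation_loss` objective is a hypothetical stand-in for "train a network and return its validation loss", and the kernel length scale and jitter are assumed values; MATLAB's own implementation differs in its kernel, acquisition function, and many other details.

```python
import math

# Minimal Bayesian optimization: GP surrogate + expected improvement (EI),
# minimizing an objective over a fixed grid of candidate values.
LENGTH_SCALE = 0.5   # assumed RBF kernel length scale
NOISE = 1e-4         # assumed jitter added to the kernel diagonal


def rbf(a, b):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)


def cholesky(K):
    """Return lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, x_new):
    """GP posterior mean and std at x_new (prior mean = sample mean of ys)."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - y_mean for y in ys]))
    k_star = [rbf(x_new, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for minimization: expected amount by which a point beats `best`."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def validation_loss(log10_lr):
    """Hypothetical stand-in for 'train a network, return validation loss'."""
    return 0.1 * (log10_lr + 3.0) ** 2 + 0.2


def bayes_opt(objective, candidates, initial, n_iter):
    """Evaluate `initial`, then n_iter points chosen by maximizing EI."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = min(ys)
        untried = [c for c in candidates if c not in xs]
        x_next = max(untried,
                     key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], len(xs)


candidates = [-5.0 + 0.1 * i for i in range(41)]   # log10(learning rate) grid
best_x, best_y, n_evals = bayes_opt(validation_loss, candidates, [-5.0, -1.0], n_iter=6)
print(f"best log10(lr) = {best_x:.1f}, loss = {best_y:.3f}, evaluations = {n_evals}")
```

The appeal of this approach for expensive training runs is that each new evaluation is chosen by trading off a low predicted loss (the surrogate mean) against high uncertainty (the surrogate standard deviation), so far fewer full training runs are needed than with an exhaustive grid.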
MATLAB AWS Reference Architecture
A MATLAB AWS reference architecture is a collection of pre-built virtual machine images with MathWorks software preinstalled, together with the AWS CloudFormation templates that define how the infrastructure is deployed (for example, load balancers and storage systems). Because the CloudFormation template describes every resource, the user has full control over the resources being created.
We also used this alternative cloud resource to train deep learning networks on the seismic data. The reference architecture is available on the MathWorks GitHub page.
The MATLAB instance deployed with the reference architecture can be accessed remotely using a web browser or a VNC connection. The advantage of a reference architecture is that it can be adapted, extended, and customized to meet specific IT requirements. Figure 7 shows a snippet of the MathWorks AWS reference architecture page with the different MATLAB versions available to customers.
We used the Amazon EC2 p4d.24xlarge instance with the reference architecture; it is more powerful than the p3 instance used in the Cloud Center workflow. With eight NVIDIA A100 GPUs on a single instance and more total memory (1,152 GB of instance memory and 320 GB of GPU memory), we could launch multiple experiments in parallel, iterate over different combinations of wavelet features and deep learning architectures, and get results very quickly. Because eight GPUs were available, we launched eight separate MATLAB sessions and ran eight deep learning training experiments simultaneously. Using the MATLAB AWS reference architecture, the overall speedup on the EC2 P4d instance compared with running the experiments on a laptop with an NVIDIA Titan V GPU was 48X: a 6X speedup for each experiment, with eight experiments running simultaneously on the eight NVIDIA A100 GPUs.
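The 48X figure is the product of the 6X per-experiment speedup and eight experiments running at once. The Python sketch below makes that arithmetic explicit and shows a simple round-robin placement of experiments onto GPUs, one MATLAB session per GPU. The wavelet and layer labels come from the families named earlier in the post, but the exact experiment grid used in the original study is not given, so the 4 × 3 grid here is illustrative. The sketch also assumes that every experiment takes the same time, that experiments are independent, and that the baseline runs them one after another on the laptop GPU.

```python
import itertools
import math

# Labels for the wavelet families and recurrent layers named in the post.
# The exact grid used in the original study is not given; this is illustrative.
WAVELETS = ["db", "fk", "coif", "sym"]   # Daubechies, Fejér-Korovkin, Coiflets, Symlets
LAYERS = ["lstm", "gru", "bilstm"]

NUM_GPUS = 8            # p4d.24xlarge: eight NVIDIA A100 GPUs
PER_RUN_SPEEDUP = 6.0   # per-experiment speedup vs. the Titan V laptop (from the post)


def assign_to_gpus(experiments, num_gpus):
    """Round-robin mapping of each experiment to a GPU index
    (one MATLAB session pinned to each GPU)."""
    return {exp: i % num_gpus for i, exp in enumerate(experiments)}


def aggregate_speedup(n_experiments, num_gpus, per_run_speedup):
    """Wall-clock speedup of a sweep vs. running every experiment one after
    another on the baseline GPU, assuming equal-length, independent runs.

    Baseline time: n_experiments runs x 1 time unit each.
    Cloud time:    ceil(n_experiments / num_gpus) waves x (1 / per_run_speedup).
    """
    waves = math.ceil(n_experiments / num_gpus)
    return per_run_speedup * n_experiments / waves


experiments = list(itertools.product(WAVELETS, LAYERS))  # 4 x 3 = 12 combinations
placement = assign_to_gpus(experiments, NUM_GPUS)
for exp, gpu in placement.items():
    print(f"GPU {gpu}: wavelet={exp[0]}, layer={exp[1]}")

print(aggregate_speedup(8, NUM_GPUS, PER_RUN_SPEEDUP))                 # one full wave
print(aggregate_speedup(len(experiments), NUM_GPUS, PER_RUN_SPEEDUP))  # two waves
```

A batch of exactly eight experiments gives 48.0, matching the figure in the post, while the 12-experiment grid gives 36.0 because four GPUs sit idle during the second wave. Under these assumptions, 48X is the best case for a full batch of eight equal-length runs.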
MATLAB Container on NVIDIA NGC Catalog
An alternative approach to accelerating end-to-end AI workloads with MATLAB on NVIDIA GPUs in the AWS cloud is to use the NVIDIA NGC Catalog, a catalog of GPU-optimized AI and HPC software containers that can be accessed through AWS Marketplace. The MATLAB Deep Learning Container in the NGC Catalog provides algorithms, pretrained models, and apps to create, train, visualize, and optimize deep neural networks. Figure 8 shows the NGC Catalog page for accessing MATLAB.
We can use this MATLAB deep learning container to automatically generate C and CUDA code with GPU Coder for optimized deployment on NVIDIA GPUs. Both the wavelet feature extraction functions and the deep learning inference functions support CUDA code generation, and the final algorithm running on a single GPU instance was 7X faster than running it on the CPU alone.
Most oil and gas companies need to process hundreds of these datasets, which makes CPU-only processing impractical. Access to GPU acceleration at scale on AWS simplifies and accelerates the development and deployment of AI-enabled applications such as the seismic facies classification we demonstrated in this blog series. By fine-tuning pretrained models with custom data through a UI-based, guided workflow, enterprises can produce highly accurate models in hours rather than months, improving total cost of ownership and reducing the need for deep AI expertise. The combination of data, computational power, and the flexibility to scale on demand is critical to advanced seismic imaging techniques and to unlocking the full value of digitalization in the oil and gas industry.
In this blog post, we covered how you can easily develop and deploy AI-based seismic facies classification by leveraging the computational power and scale of NVIDIA GPUs on Amazon EC2. MathWorks offers various options for scaling algorithms on cloud-based high-performance computing resources for faster prototyping and development of AI algorithms with large 3-D seismic datasets. In part 1 of the blog, we discussed the detailed workflow for developing AI models with MATLAB.
The entire MATLAB code for training and inference of the above work can be downloaded from the following git repository:
For more information, see the following resources:
- MATLAB Tech Talks: Understanding Wavelets
- Get Started with Wavelet Toolbox
- MATLAB for Deep Learning
- Classify Time-Series using Wavelet Analysis and Deep Learning
- MATLAB Parallel Cloud Computing on AWS
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.