Integrating an Inferencing Pipeline with NVIDIA DeepStream and the G4 Instance Family
Contributed by: Amr Ragab, Business Development Manager, Accelerated Computing, AWS and Kong Zhao, Solution Architect, NVIDIA Corporation
AWS continually evolves its GPU offerings, showcasing how technical improvements from AWS partners raise the platform’s performance.
One result of AWS’s collaboration with NVIDIA is the recent release of the G4 instance type, a technology update from the G2 and G3. G4 instances feature NVIDIA Turing T4 GPUs, each with 16 GB of GPU memory, run on the Nitro hypervisor, and offer one to four GPUs per node. A bare metal option will be released in the coming months. G4 instances also include up to 1.8 TB of local Non-Volatile Memory Express (NVMe) storage and up to 100 Gbps of network bandwidth.
The Turing T4 is the latest offering from NVIDIA, accelerating machine learning (ML) training and inferencing, video transcoding, and other compute-intensive workloads. Because one GPU covers this whole range, you can now run all of these accelerated workloads on a single instance family.
NVIDIA has also taken the lead in providing a robust, high-performance software layer in the form of SDKs and container solutions through the NVIDIA GPU Cloud (NGC) container registry. Combined with AWS elasticity and scale, these accelerated components make it possible to build high-performance pipelines on AWS.
NVIDIA DeepStream SDK
This post focuses on one such NVIDIA SDK: DeepStream.
The DeepStream SDK is built to provide an end-to-end video processing and ML inferencing analytics solution. It uses the Video Codec API and TensorRT as key components.
DeepStream also supports an edge-cloud strategy: perception results from the edge, along with other sensor metadata, can be streamed into AWS for further processing. For example, streams and metadata from many cameras spread over a wide area can be ingested through the Amazon Kinesis platform.
Another classic workload that can take advantage of DeepStream is compiling the model artifacts produced by distributed training on AWS with Amazon SageMaker Neo, then using the compiled model on the edge or against a video data lake in Amazon S3.
If you are interested in exploring these solutions, contact your AWS account team.
Set up programmatic access to AWS, then launch a g4dn.2xlarge instance running Ubuntu 18.04 in a subnet that allows SSH access. For reference, the instance needs the following software stack to run DeepStream SDK workflows:
- Ubuntu 18.04 with:
  - NVIDIA driver for the Turing T4 (418.67 or later)
  - CUDA 10.1
  - Docker with NVIDIA GPU support (for example, nvidia-docker2), to run the DeepStream container
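Before pulling any containers, it can help to confirm the driver requirement. The following is a minimal Python sketch (written for the Python 3.6 that ships with Ubuntu 18.04) that reads the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares it numerically against 418.67. The helper names are illustrative, not part of any NVIDIA tool.

```python
import subprocess

# Minimum driver version from the prerequisites list.
MIN_DRIVER = "418.67"


def parse_version(version: str) -> tuple:
    """Split a dotted version such as '418.67' into integers: (418, 67)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """Compare numerically, so '418.100' counts as newer than '418.67'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version() -> str:
    """Return the driver version that nvidia-smi reports for the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 spelling of text=True
        check=True,
    )
    return result.stdout.splitlines()[0].strip()


def check_driver() -> str:
    """Summarize whether this instance meets the driver requirement."""
    version = installed_driver_version()
    verdict = "OK" if driver_meets_minimum(version) else "too old"
    return f"NVIDIA driver {version}: {verdict} (need {MIN_DRIVER} or later)"
```

On the instance, `print(check_driver())` prints a one-line verdict. The numeric comparison matters because a plain string comparison would rank "418.100" below "418.67".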
When the instance is up, connect with SSH and pull the latest DeepStream SDK Docker image from the NGC container registry.
If your instance is running a full X environment, you can pass the X authentication and display through to the container to view the results in real time. For the purposes of this post, however, run the workload from the shell.
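The exact Docker commands are not reproduced here, so the following is a hedged Python sketch that builds them as argument lists. The repository `nvcr.io/nvidia/deepstream` is NGC's DeepStream image, but the tag is deliberately left to you (pick it from the NGC catalog). `--gpus all` assumes Docker 19.03 or later with the NVIDIA container toolkit, and the X11 flags are the common socket-sharing approach, not something prescribed by this post.

```python
import shlex
import subprocess
from typing import List, Optional


def pull_command(image: str) -> List[str]:
    """`docker pull` for a DeepStream image hosted on NGC (nvcr.io)."""
    return ["docker", "pull", image]


def run_command(image: str, display: Optional[str] = None) -> List[str]:
    """`docker run` with GPU access, optionally forwarding an X display."""
    cmd = [
        "docker", "run", "--rm", "-it",
        # --gpus needs Docker 19.03+; nvidia-docker2 setups use --runtime=nvidia.
        "--gpus", "all",
    ]
    if display:
        # Share the host's X socket and DISPLAY so windows can open on the host.
        # Host-side X access control (for example, xhost) must still allow it.
        cmd += ["-e", "DISPLAY=" + display, "-v", "/tmp/.X11-unix:/tmp/.X11-unix"]
    cmd.append(image)
    return cmd


def execute(cmd: List[str]) -> None:
    """Echo a command in shell-quoted form, then run it, raising on failure."""
    print("+", " ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)
```

For example, with `IMAGE = "nvcr.io/nvidia/deepstream:<tag>"`, call `execute(pull_command(IMAGE))` and then `execute(run_command(IMAGE))`. Pass `display=os.environ["DISPLAY"]` only when you want on-screen output; for the shell-only run described here, leave it out.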
Go to the
The following configuration files are included in the package:
- source30_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates 30 stream decodes with primary inferencing.
- source4_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers.
- source4_1080p_resnet_dec_infer_tracker_sgie_tiled_display_int8_gpu1.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers on GPU 1 (the second GPU, so it requires a multi-GPU instance).
- config_infer_primary.txt: This configuration file configures an nvinfer element as the primary detector.
- config_infer_secondary_carcolor.txt, config_infer_secondary_carmake.txt, config_infer_secondary_vehicletypes.txt: These configuration files configure an nvinfer element as the secondary classifier.
- iou_config.txt: This configuration file configures a low-level Intersection over Union (IOU) tracker.
- source1_usb_dec_infer_resnet_int8.txt: This configuration file demonstrates one USB camera as input.
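iou_config.txt tunes DeepStream's own IOU tracker, whose implementation is not covered here. To make the idea concrete, here is a simplified greedy variant in Python: an object keeps its ID from one frame to the next when its new bounding box overlaps its previous box enough. The class name and the 0.3 threshold are illustrative assumptions, not values taken from iou_config.txt.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union: overlap area divided by combined area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Matches each new box to the most-overlapping box from the previous frame.

    Unmatched detections start new IDs; tracks with no match in a frame end.
    """

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou  # illustrative threshold, not a DeepStream default
        self.tracks: Dict[int, Box] = {}
        self._next_id = 0

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        unmatched = dict(self.tracks)
        updated: Dict[int, Box] = {}
        for det in detections:
            best_id: Optional[int] = None
            best_iou = self.min_iou
            for track_id, box in unmatched.items():
                overlap = iou(box, det)
                if overlap >= best_iou:
                    best_id, best_iou = track_id, overlap
            if best_id is None:
                best_id = self._next_id  # new object: IDs are never reused
                self._next_id += 1
            else:
                del unmatched[best_id]  # each track matches at most one detection
            updated[best_id] = det
        self.tracks = updated
        return updated
```

Matching greedily in detection order keeps the sketch short; a production tracker might instead solve a global assignment or keep unmatched tracks alive for a few frames.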
The following sample models are provided with the SDK.
|Model|Model type|Number of classes|Resolution|
|---|---|---|---|
|Primary Detector|Resnet10|4|640 x 368|
|Secondary Car Color Classifier|Resnet18|12|224 x 224|
|Secondary Car Make Classifier|Resnet18|6|224 x 224|
|Secondary Vehicle Type Classifier|Resnet18|20|224 x 224|
Edit the configuration file source30_1080p_dec_infer-resnet_tiled_display_int8.txt to disable [sink0] and enable [sink1] for file output. Save the file, then run the DeepStream reference application, deepstream-app, with this configuration file.
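Editing the file by hand works fine. To script the change instead, a line-based edit keeps the file's comments intact, which Python's configparser would drop on write. This sketch assumes the usual deepstream-app layout, in which each `[sinkN]` group carries an `enable=` key; the function names are hypothetical. After saving, the reference application is started with `deepstream-app -c <config file>`.

```python
import re
from pathlib import Path

GROUP_HEADER = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
ENABLE_KEY = re.compile(r"^(?P<indent>\s*)enable\s*=")  # not enable-perf-measurement


def set_group_enabled(text: str, group: str, enabled: bool) -> str:
    """Set `enable=` inside [group] to 1 or 0, leaving every other line untouched."""
    lines = text.splitlines(keepends=True)
    current = None
    found = False
    for i, line in enumerate(lines):
        header = GROUP_HEADER.match(line)
        if header:
            current = header.group("name").strip()
            continue
        key = ENABLE_KEY.match(line)
        if current == group and key:
            ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            lines[i] = f"{key.group('indent')}enable={int(enabled)}{ending}"
            found = True
    if not found:
        raise KeyError(f"no enable= key in [{group}]")
    return "".join(lines)


def switch_to_file_output(config: Path) -> None:
    """Disable [sink0] and enable [sink1] for file output, then save in place."""
    text = config.read_text()
    text = set_group_enabled(text, "sink0", enabled=False)
    text = set_group_enabled(text, "sink1", enabled=True)
    config.write_text(text)
```

Raising `KeyError` when a group has no `enable=` line makes a typo in the group name fail loudly instead of silently leaving the on-screen sink active.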
The application reports performance data for the inferencing workflow, such as the frame rate of each stream.
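With 30 input streams, a single total throughput figure is often easier to compare across runs than 30 separate numbers. The exact layout of deepstream-app's performance lines may vary by release, so this Python sketch assumes lines of the form `**PERF: 30.02 (29.95)` followed by one tab-separated "current (average)" FPS pair per stream. Treat both the format and the function names as assumptions.

```python
import re
from typing import Iterable, List, Tuple

# One "current (average)" FPS pair per stream, e.g. "30.02 (29.95)".
FPS_PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\)")


def parse_perf_line(line: str) -> List[Tuple[float, float]]:
    """Return (current_fps, average_fps) per stream, or [] for non-data lines."""
    if not line.startswith("**PERF:"):
        return []
    return [(float(cur), float(avg)) for cur, avg in FPS_PAIR.findall(line)]


def total_average_fps(log_lines: Iterable[str]) -> float:
    """Sum the per-stream average FPS from the last **PERF: line with numbers."""
    latest: List[Tuple[float, float]] = []
    for line in log_lines:
        pairs = parse_perf_line(line)
        if pairs:
            latest = pairs
    return sum(avg for _, avg in latest)
```

For example, capture a run with `deepstream-app -c <config file> | tee run.log`, then call `total_average_fps(open("run.log"))`. Header lines such as `**PERF: FPS 0 (Avg)` contain no numeric pairs and are skipped.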
The output video file, out.mp4, is written to the current directory and can be played after you download it.
To extend the architecture further, you can use AWS Batch to run an event-driven pipeline.
In this design, uploading an input file to S3 triggers an Amazon CloudWatch event, which stands up a G4 instance running a DeepStream Docker image stored in Amazon ECR to process the pipeline. The video and ML analytics results can be pushed back to S3 for further processing.
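A CloudWatch Events rule can target an AWS Batch job queue directly. For clarity, the following Python sketch spells out the two pieces separately: the event pattern such a rule might use to match uploads, and the equivalent SubmitJob request that a handler could send (using a handler, for example an AWS Lambda function, is an assumption beyond the architecture described here). The pattern assumes CloudTrail is logging S3 data events for the bucket. Queue and job definition names, the `deepstream-` job-name prefix, and the `INPUT_URI` variable are all hypothetical.

```python
import re
from typing import Any, Dict


def s3_put_event_pattern(bucket: str) -> Dict[str, Any]:
    """CloudWatch Events pattern matching PutObject calls on one bucket.

    Assumes CloudTrail logs S3 data events for the bucket, which is how
    S3 API calls reach CloudWatch Events.
    """
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {"bucketName": [bucket]},
        },
    }


def batch_job_request(event: Dict[str, Any], job_queue: str,
                      job_definition: str) -> Dict[str, Any]:
    """Build AWS Batch SubmitJob arguments for the uploaded video in `event`."""
    params = event["detail"]["requestParameters"]
    bucket, key = params["bucketName"], params["key"]
    # Job names allow letters, digits, hyphens, and underscores (max 128 chars).
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]
    return {
        "jobName": "deepstream-" + safe_key,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # INPUT_URI is a hypothetical variable the container entrypoint would read.
            "environment": [{"name": "INPUT_URI", "value": f"s3://{bucket}/{key}"}],
        },
    }


def submit(event: Dict[str, Any], job_queue: str, job_definition: str) -> str:
    """Submit the job and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    response = boto3.client("batch").submit_job(
        **batch_job_request(event, job_queue, job_definition))
    return response["jobId"]
```

Keeping the request construction separate from the boto3 call makes it easy to check what a given upload would launch without touching AWS.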
With this basic architecture in place, you can run a video analytics and ML inferencing pipeline. Future work could include integration with Kinesis and cataloging DeepStream results. Let us know how your work with DeepStream and the NVIDIA container stack on AWS goes.