AWS Marketplace

Building machine learning solutions faster with the NVIDIA NGC catalog in AWS Marketplace

Machine learning (ML) has transformed many industries as organizations adopt Artificial Intelligence (AI) to improve their operational efficiencies, increase customer satisfaction, and gain a competitive edge. However, the process of training, optimizing, and running ML models to build AI-powered applications is complex and requires expertise.

The NVIDIA NGC catalog provides graphics processing unit (GPU)-optimized AI software, including frameworks, pre-trained models, and industry-specific software development kits (SDKs) that accelerate workflows. This software allows data engineers, data scientists, developers, and DevOps teams to focus on building and deploying their AI solutions faster.

The NVIDIA NGC catalog is the inaugural partner software storefront in AWS Marketplace. This milestone brings more than 20 containers, including AI frameworks and SDKs that support computer vision, natural language processing, recommender systems, and medical imaging.

In this blog post, Chris, Ryan, Sarah, and I demonstrate how to discover and deploy NVIDIA’s Triton Inference Server to run an object detection service. Object detection is a computer vision technique that allows you to identify objects in images or videos. It can be used to count, locate, and label objects in a scene for complex image classification. In this walkthrough, we show how to set up, run, and test the object detection service with a sample image of a coffee mug.

The NGC catalog in AWS Marketplace includes listings such as NVIDIA Triton Inference Server, which can be launched directly on various AWS services.

Prerequisites

This walkthrough assumes you have launched a GPU-backed Amazon EC2 instance with Docker and the NVIDIA driver installed, and that you have connected to it over SSH. In the instance's terminal, install and configure the AWS Command Line Interface (AWS CLI) by running the following commands:

sudo apt-get install python-pip

sudo pip install awscli

aws configure

    • Keep your SSH terminal window open for Step 1.
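
Optionally, before moving on, you can confirm that the instance sees its GPU and that Docker is available. This is a minimal check and assumes the NVIDIA driver and Docker are already installed on the instance:

nvidia-smi

docker --version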

Solution overview

The solution has three steps: pull the Triton Inference Server container from the NVIDIA NGC catalog in AWS Marketplace, download a pretrained model and deploy it as an object detection service, and send a test inference request to the service.

Step 1: Pull the Triton Inference Server container from the NVIDIA NGC catalog in AWS Marketplace.

To pull the Triton Inference Server container, do the following:

A.  Subscribe to the software

  1. Navigate to the NVIDIA NGC catalog in AWS Marketplace.
  2. Choose Triton Inference Server.
  3. In the upper right of the product page, choose Continue to Subscribe, and then choose Continue to Configuration.
  4. On the configuration page, for Delivery Method, choose Triton Inference Server and, for Software Version, choose the most recent version.
  5. Choose Continue to Launch. This takes you to the launch screen.

B. Pull the container into your launched EC2 instance

  1. On the launch screen from step 1.A.5, choose View Container Image Details. This opens a popup with pull command instructions. In this popup, step 1 is a command to authenticate your Docker client to the Amazon Elastic Container Registry (Amazon ECR) registry that hosts the container, and step 2 lists the Docker container URI. You use both of these in the next steps. Refer to the following screenshot.

View Container Image Details popup with two commands
  2. If you closed your EC2 terminal window from the Prerequisites section, open a new one.
  3. Authenticate your Docker client to the Amazon ECR registry. To do this, in the EC2 instance terminal window you opened in step 1.B.2, copy and run the first command from the View Container Image Details popup from step 1.B.1 (the general form of this command is sketched after this step's output below).
  4. From the popup in step 1.B.1, copy the Docker container URI.
  5. To pull the container into your EC2 instance, run the following command in the terminal window, replacing [container URI from Step 1.B.4] with the Docker container URI you copied in step 1.B.4:

docker pull [container URI from Step 1.B.4]

If successful, it returns a message similar to this one:

20.11-py3: Pulling from nvidia/containers/nvidia/tritonserver

Digest: sha256:2e7e43190b375031ce804228fd1a1544aa8c48a1db8ffb82b21fa33051cdfdbe

Status: Downloaded newer image for 970825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tritonserver:20.11-py3
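
For reference, the authentication command from step 1.B.3 generally takes the following form; this is a sketch rather than the exact command, and the Region and registry host shown in your View Container Image Details popup may differ:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <registry host from your container URI>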

Step 2: Download a pretrained model and create an object detection service

To create the object detection inference service, you need a pretrained model for object detection. We download the Dense Convolutional Network (DenseNet) model in ONNX format, which Triton serves through its ONNX Runtime backend.

To set up your object detection service, do the following:

A.  Create a repository structure compatible with the Triton container you subscribed to in Step 1. To do this, in the EC2 instance’s terminal window, run the following command:

mkdir -p model_repository/densenet_onnx/1

B.  To download the DenseNet model, run the following command:

wget -O model_repository/densenet_onnx/1/model.onnx https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx

If successful, it returns a message similar to this one:

2020-12-18 20:25:46 (14.1 MB/s) - ‘model_repository/densenet_onnx/1/model.onnx’ saved [32719461/32719461]

C.  To download the associated Triton configuration files for this particular model, run the following command:

wget -O model_repository/densenet_onnx/config.pbtxt https://raw.githubusercontent.com/triton-inference-server/server/master/docs/examples/model_repository/densenet_onnx/config.pbtxt

If successful, it returns a message similar to this one:

2020-12-18 20:30:49 (35.5 MB/s) - ‘model_repository/densenet_onnx/config.pbtxt’ saved [387/387]
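
The configuration file is short; it tells Triton which backend runs the model and describes the model's input and output tensors. You can view it with cat model_repository/densenet_onnx/config.pbtxt. Abbreviated, it looks roughly like the following; your downloaded file is authoritative and may differ slightly:

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    ...
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "densenet_labels.txt"
    ...
  }
]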

D.  To download the list of labels for the object classes that the DenseNet model is trained to recognize, run the following command:

wget -O model_repository/densenet_onnx/densenet_labels.txt https://raw.githubusercontent.com/triton-inference-server/server/master/docs/examples/model_repository/densenet_onnx/densenet_labels.txt

If successful, it returns a message similar to this one:

2020-12-18 20:33:10 (78.0 MB/s) - ‘model_repository/densenet_onnx/densenet_labels.txt’ saved [10311/10311]
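
At this point, the repository has the layout Triton expects, with the model binary stored under a numbered version subdirectory. Running find model_repository -type f in the terminal should list these three files (in any order):

model_repository/densenet_onnx/config.pbtxt

model_repository/densenet_onnx/densenet_labels.txt

model_repository/densenet_onnx/1/model.onnx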

E.  To deploy the DenseNet model to serve object detection requests using the Triton Inference Server container, run the following command:

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/model_repository:/models <container URI from Step 1.B.4> tritonserver --model-repository=/models

If successful, it returns a message similar to this one:

I1218 20:14:59.358545 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001

I1218 20:14:59.361457 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000

I1218 20:14:59.403923 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

Congratulations! Your object detection service is set up. The log output lists the three services Triton has started: the gRPC inference endpoint on port 8001, the HTTP/REST inference endpoint on port 8000, and the metrics endpoint on port 8002.
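
Optionally, from a separate terminal window on the same instance, you can confirm that the server and model are ready by querying Triton's HTTP health endpoint. This quick check assumes curl is installed on the instance; a ready server responds with HTTP 200:

curl -v localhost:8000/v2/health/ready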

Keep the Triton container running in this terminal window. While you will take no further action in this window, do not close it. In the next step, you will send real-time object detection inference requests to the Triton Inference Server running in this terminal window.

Step 3: See your model in action

To send inference requests to the object detection model, you need the Triton Inference Server – Client SDK container. Triton Inference Server provides a data center inference solution optimized for NVIDIA GPUs: it maximizes GPU utilization and inference performance, lets remote clients request inference over HTTP or gRPC for any model the server manages, and reports real-time metrics on latency and request counts. The Triton Inference Server – Client SDK can be used to build end-user client applications that make inference requests to the server.
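
For example, while the server from Step 2 is running, you can view these metrics in Prometheus text format by querying the metrics port started earlier. This assumes curl is installed on the instance:

curl localhost:8002/metrics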

To get this container, follow these steps.

A.  Find the Triton Inference Server – Client SDK in AWS Marketplace

  1. Navigate to the NVIDIA NGC catalog in AWS Marketplace.
  2. In the search bar at the center of the page, enter Triton Inference Server, and then choose Triton Inference Server in the results.
  3. In the upper right of the product page, choose Continue to Subscribe, and then choose Continue to Configuration.
  4. On the configuration page, for Delivery Method, choose the Triton Inference Server – Client SDK variant of the product, and for Software Version, choose the most recent version.
  5. Choose Continue to Launch.
  6. On the launch screen, in the middle of the page, choose View Container Image Details. Copy the container URI listed in step 2 of the popup.

B.  Pull the Triton Inference Server – Client SDK into your launched EC2 instance

  1. With the terminal window from step 2 still open, open an additional terminal window. All your actions in this step take place in this new window, but the step 2 terminal window must remain open to keep the Triton server application running.
  2. In this new terminal window, connect via SSH to the same EC2 instance that you set up in the Prerequisites section.
  3. In your new terminal window, pull the Triton Inference Server – Client SDK by running the following command:

docker pull <container URI from Step 3.A.6>

If successful, it returns a message similar to this one:

20.11-py3-clientsdk: Pulling from nvidia/containers/nvidia/tritonserver

Digest: sha256:2e7e43190b375031ce804228fd1a1544aa8c48a1db8ffb82b21fa33051cdfdbe

Status: Downloaded newer image for 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tritonserver:20.11-py3-clientsdk

  4. In the same terminal window, start the Triton Inference Server – Client SDK container by running the following command:

docker run -it --rm --net=host <container URI from Step 3.A.6>

If successful, you see the # prompt.
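
Optionally, because the container shares the host network (--net=host), you can confirm from this prompt that the server has loaded the model by querying its metadata endpoint. This check assumes curl is available inside the SDK container; Triton responds with JSON describing the model's inputs and outputs:

curl localhost:8000/v2/models/densenet_onnx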

  5. To test your object detection service, send it an inference request by running the following command:

./install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

The Triton Inference Server – Client SDK container includes the following example image of a coffee mug, preloaded to test inference. In the command, -m names the model, -c 3 requests the top three classifications, and -s INCEPTION selects the image preprocessing (scaling) mode. When you successfully run the command in step 3.B.5, the object detection service accurately tags the image with the COFFEE MUG, CUP, and COFFEEPOT labels. Refer to the following image of a black coffee mug with the NVIDIA logo on it.

If successful, it returns a message similar to this one:

Request 0, batch size 1

Image '/workspace/images/mug.jpg':

15.346228 (504) = COFFEE MUG

13.224319 (968) = CUP

10.422960 (505) = COFFEEPOT

Each output line shows the model's confidence score, the class index in parentheses, and the predicted label. You have now confirmed that your newly installed object detection service is working properly.

Cleanup

After completing this walkthrough, to avoid additional usage charges, stop any Amazon EC2 instances you started.
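
If you prefer the AWS CLI, you can stop the instance from any terminal configured with your credentials; replace the placeholder with your instance ID:

aws ec2 stop-instances --instance-ids <your instance ID>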

Conclusion

In this walkthrough, we showed how to set up, run, and test an object detection service with a sample image. We demonstrated how to build and deploy an AI-powered solution with the NVIDIA NGC catalog in AWS Marketplace. Deploying an object detection service with Triton Inference Server is just one example, and you can follow similar steps to discover, access, and deploy other NVIDIA AI software.

Explore performance-optimized software from the NVIDIA NGC catalog in AWS Marketplace today.

About the authors

Abhilash Somasamudramath is a Product Manager at NVIDIA focused on AI, ML, Deep Learning, and High Performance Computing (HPC) software.

Chris Popp is a Senior Partner Solutions Architect with AWS Marketplace. In his role, he works with customers to understand their goals and challenges and gives prescriptive guidance to achieve their objectives with AWS services.

Ryan Vanderwerf is a Partner Solutions Architect focusing on Internet of Things (IoT), AI, ML, and Edge Computing.

Sarah Jack is a Category Manager for HPC at AWS. She enjoys working with vendors in the HPC, AI, ML, and storage space to build new and innovative solutions on AWS.