
Triton Inference Server

By: NVIDIA
Latest Version: 22.04

This version has been removed and is no longer available to new customers.

Product Overview

Triton Inference Server is open source inference serving software that lets teams deploy trained AI models from any framework on GPU or CPU infrastructure. It is designed to simplify and scale inference serving.

Triton Inference Server supports all major frameworks, including TensorFlow, TensorRT, PyTorch, and ONNX Runtime, as well as custom backends. It gives AI researchers and data scientists the freedom to choose the right framework for their models.
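For illustration, a request against a running Triton server over HTTP might look like the sketch below, using the open source `tritonclient` Python package. The model name (`resnet50`) and tensor names (`input__0`, `output__0`) are placeholders; the real names depend on each model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP port (assumption: localhost:8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical model and tensor names; check the model's configuration for real ones.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output__0")

result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0").shape)
```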

Also available as a Docker container, Triton integrates with Kubernetes for orchestration and scaling, and it exports Prometheus metrics for monitoring, helping IT/DevOps teams streamline model deployment in production.
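As a rough sketch, the snippet below checks server health and pulls the Prometheus-format metrics. It assumes Triton's default ports (HTTP on 8000, metrics on 8002) and a server running on localhost.

```python
import requests
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
if client.is_server_ready():
    # Triton serves Prometheus-format metrics on its metrics port (default 8002).
    metrics = requests.get("http://localhost:8002/metrics").text
    print("\n".join(metrics.splitlines()[:10]))  # first few metric lines
```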
NVIDIA Triton Inference Server can load models from local storage or Amazon S3. As models are continuously retrained with new data, developers can update them without restarting the inference server and without disrupting the application.
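A minimal sketch of picking up an updated model at runtime via Triton's model-control API is shown below. It assumes the server was started with explicit model control enabled (`--model-control-mode=explicit`), and the model name is hypothetical.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# After new model files are written to the repository (local or S3),
# (re)load the model without restarting the server.
client.load_model("my_model")             # hypothetical model name
print(client.is_model_ready("my_model"))  # True once the new version is serving
```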

Triton Inference Server runs multiple models, from the same or different frameworks, concurrently on a single GPU using CUDA streams. On a multi-GPU server, it automatically creates an instance of each model on each GPU. These features increase GPU utilization without any extra coding from the user.
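Per-model concurrency is controlled in the model's `config.pbtxt`. The snippet below is a minimal sketch in Triton's model-configuration format; the instance count is an arbitrary example value.

```
# Run two execution instances of this model on every available GPU.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```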

The inference server supports low-latency real-time inferencing as well as batch inferencing to maximize GPU/CPU utilization. It has built-in support for streaming audio input for streaming inference, and it supports model ensembles (pipelines of models).
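Batching behavior is likewise configured per model. The sketch below enables Triton's dynamic batcher in `config.pbtxt`; the batch sizes and queue delay are arbitrary example values.

```
max_batch_size: 16
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```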

The 21.06 release of Triton was built against the wrong commit of the FIL (Forest Inference Library) backend code, causing an incompatible version of RAPIDS to be used instead of the intended RAPIDS 21.06 stable release. This issue is fixed in the new 21.06.1 container released on NGC. Although the Triton server itself and other integrated backends will work, the FIL backend will not work in the 21.06 Triton container. To use the FIL backend in Triton, please use the 21.06.1 container.

Delivery Methods

  • Container
