
Triton Inference Server

By: NVIDIA Latest Version: 22.04
Linux/Unix

This version has been removed and is no longer available to new customers.

Product Overview

Triton Inference Server is open-source inference serving software that lets teams deploy trained AI models from any framework on GPU or CPU infrastructure. It is designed to simplify and scale inference serving.
Triton Inference Server supports all major frameworks, including TensorFlow, TensorRT, PyTorch, and ONNX Runtime, as well as custom framework backends. This gives AI researchers and data scientists the freedom to choose the right framework for their models.

Also available as a Docker container, it integrates with Kubernetes for orchestration and scaling, and it exports Prometheus metrics for monitoring, helping IT and DevOps teams streamline model deployment in production.
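By default the server exposes these metrics in Prometheus text format on port 8002. Below is a minimal sketch of reading them from Python, assuming a server running locally with default ports (metric names such as nv_gpu_utilization may vary by Triton version):

# Minimal sketch: read Triton's Prometheus metrics endpoint.
# Assumes a Triton server running locally with the default metrics port (8002).
import urllib.request

with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
    metrics_text = resp.read().decode("utf-8")

# Print the GPU utilization gauges the server reports, if present.
for line in metrics_text.splitlines():
    if line.startswith("nv_gpu_utilization"):
        print(line)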
NVIDIA Triton Inference Server can load models from local storage or Amazon S3. As models are retrained with new data, developers can update them without restarting the inference server and without any disruption to the application.
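For example, the server can be pointed at an S3 model repository at startup, and individual models can be reloaded at runtime through the model control API. A minimal sketch using the tritonclient Python package; the bucket path and model name are hypothetical placeholders:

# Minimal sketch: pick up a retrained model without restarting the server.
# Assumes the server was started in explicit model-control mode, e.g.:
#   tritonserver --model-repository=s3://my-bucket/models --model-control-mode=explicit
# "my-bucket" and "resnet50" are hypothetical placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Load (or reload) the latest version of the model from the repository.
client.load_model("resnet50")
print(client.is_model_ready("resnet50"))  # True once the new version is serving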

Triton Inference Server runs multiple models from the same or different frameworks concurrently on a single GPU using CUDA streams. In a multi-GPU server, it automatically creates an instance of each model on each GPU. Together, these features increase GPU utilization without any extra coding from the user.
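Concurrency is configured per model. For instance, a model's config.pbtxt can request multiple execution instances on specific GPUs; a minimal sketch with illustrative values:

# Excerpt from a model's config.pbtxt (illustrative values):
# run two instances of this model on each of GPUs 0 and 1.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]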

The inference server supports low-latency real-time inferencing as well as batch inferencing to maximize GPU/CPU utilization. It also has built-in support for streaming audio input, and for model ensembles (pipelines of models).
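As an illustration of a simple batched request, inputs can be sent over HTTP with the tritonclient Python package; the model name, tensor names, and shape below are hypothetical and must match the model's configuration in a real deployment:

# Minimal sketch: send a batched inference request over HTTP.
# "my_model", "INPUT0", "OUTPUT0", and the shape are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(8, 3, 224, 224).astype(np.float32)  # batch of 8 inputs
infer_input = httpclient.InferInput("INPUT0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer("my_model", inputs=[infer_input])
print(result.as_numpy("OUTPUT0").shape)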

The 21.06 release of Triton was built against the wrong commit of the FIL (Forest Inference Library) backend code, causing an incompatible version of RAPIDS to be used instead of the intended RAPIDS 21.06 stable release. This issue is fixed in the new 21.06.1 container released on NGC. Although the Triton server itself and other integrated backends will work, the FIL backend will not work in the 21.06 Triton container. To use the FIL backend in Triton, please use the 21.06.1 container.

Version: 22.04
By: NVIDIA
Operating System: Linux
Delivery Methods: Container
