Posted On: Nov 12, 2021
Today, we are excited to announce NVIDIA Triton™ Inference Server on Amazon SageMaker, enabling customers who choose NVIDIA Triton as their model server to bring their containers and deploy them at scale in SageMaker.
NVIDIA Triton is an open-source model server that runs trained ML models from multiple ML frameworks, including PyTorch, TensorFlow, XGBoost, and ONNX. Triton is extensible: developers can add new frontends, which receive requests in specific formats, and new backends, which handle additional model execution runtimes. AWS worked closely with NVIDIA to add a new Triton frontend that is compatible with SageMaker hosted containers and a new backend that is compatible with SageMaker Neo-compiled models. As a result, customers can build a custom container that includes their model with Triton and bring it to SageMaker. SageMaker Inference handles the requests and automatically scales the container as usage increases, making model deployment with Triton on AWS easier.
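As a rough illustration of what bringing a Triton container to SageMaker might look like, the sketch below assembles the request payloads for the three SageMaker API calls typically used to host a custom container: `CreateModel`, `CreateEndpointConfig`, and `CreateEndpoint`. The helper name `build_triton_deployment`, the resource naming scheme, the default instance type, and all ARNs, image URIs, and S3 paths are assumptions made for illustration; they are not taken from this announcement.

```python
def build_triton_deployment(name, image_uri, model_data_url, role_arn,
                            instance_type="ml.g4dn.xlarge", instance_count=1):
    """Build request payloads for hosting a custom Triton container.

    Hypothetical helper: it only assembles dictionaries and makes no AWS calls.
    The instance type default is an assumption; pick one that suits the model.
    """
    config_name = f"{name}-config"
    endpoint_name = f"{name}-endpoint"

    # Payload for CreateModel: points SageMaker at the container image
    # (built with Triton) and the packaged model artifacts in S3.
    create_model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
        },
    }

    # Payload for CreateEndpointConfig: one production variant that
    # serves all traffic from the model defined above.
    create_endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }

    # Payload for CreateEndpoint: ties the endpoint to its configuration.
    create_endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }

    return create_model, create_endpoint_config, create_endpoint


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    model_req, config_req, endpoint_req = build_triton_deployment(
        name="triton-demo",
        image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/<triton-image>:<tag>",
        model_data_url="s3://<bucket>/<prefix>/model.tar.gz",
        role_arn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
    )
    print(model_req["ModelName"], endpoint_req["EndpointName"])
```

With valid AWS credentials and real resource identifiers, these payloads could be passed to a `boto3` SageMaker client, for example `sm = boto3.client("sagemaker")` followed by `sm.create_model(**model_req)`, `sm.create_endpoint_config(**config_req)`, and `sm.create_endpoint(**endpoint_req)`. Scaling policies are configured separately and are not covered by this sketch.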