Introducing TorchServe: a PyTorch model serving framework

Posted on: Apr 21, 2020

Starting today, PyTorch customers can use TorchServe, a new model serving framework, to deploy trained PyTorch models at scale without writing custom code.

PyTorch is an open-source machine learning framework, originally created by Facebook, that has become popular among ML researchers and data scientists for its ease of use and “Pythonic” interface. However, deploying and managing models in production is often the most difficult part of the machine learning process, requiring customers to write custom prediction APIs and scale them.

TorchServe makes it easy to deploy PyTorch models at scale in production environments. It delivers lightweight, low-latency serving, so you can deploy your models for high-performance inference. It provides default handlers for the most common applications, such as object detection and text classification, so you don’t have to write custom code to deploy your models. With powerful TorchServe features, including multi-model serving, model versioning for A/B testing, metrics for monitoring, and RESTful endpoints for application integration, you can take your models from research to production quickly. TorchServe supports any machine learning environment, including Amazon SageMaker, Kubernetes, Amazon EKS, and Amazon EC2.
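
As a rough sketch of what the RESTful endpoints look like in practice, the Python snippet below sends an image to a locally running TorchServe instance and then lists the registered models. It assumes the server has already been started (for example, with the `torchserve` command) with a model registered under the name `densenet161`, that the default ports are in use (8080 for inference, 8081 for management), and that `kitten.jpg` is a placeholder input file.

```python
import requests

# Inference API (default port 8080): POST the raw image bytes to
# /predictions/<model_name> and receive the prediction as JSON.
# "densenet161" and "kitten.jpg" are placeholders for this sketch.
with open("kitten.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8080/predictions/densenet161",
        data=f,
    )
print(response.json())

# Management API (default port 8081): list the models currently being
# served, which is one way to inspect multi-model serving and versioning.
print(requests.get("http://localhost:8081/models").json())
```

Before a model can be registered with the server, it is first packaged into a model archive using the `torch-model-archiver` tool, where one of the default handlers (such as an image classifier or text classifier) can be selected instead of writing custom handler code.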

TorchServe is built and maintained by AWS in collaboration with Facebook and is available as part of the PyTorch open-source project. To get started, see the TorchServe GitHub repository and the documentation.