AWS Open Source Blog
Deploy fast.ai-trained PyTorch model in TorchServe and host in Amazon SageMaker inference endpoint
Over the past few years, fast.ai has become one of the most cutting-edge open source deep learning frameworks and the go-to choice for many machine learning use cases based on PyTorch. It has not only democratized deep learning and made it approachable to a general audience, but it has also become a role model for how scientific software should be engineered, especially in Python. However, deploying a fast.ai model to a production environment often involves setting up and self-maintaining a customized inference solution (for example, with Flask), which is time-consuming and distracting, leaving you to manage concerns such as security, load balancing, and services orchestration.
Recently, in partnership with Facebook, Amazon Web Services (AWS) developed TorchServe, a flexible and easy-to-use open source tool for serving PyTorch models. TorchServe removes the heavy lifting of deploying and serving PyTorch models with Kubernetes, and AWS and Facebook will maintain and continue contributing to TorchServe along with the broader PyTorch community. With TorchServe, many features work out of the box, and it provides full flexibility for deploying trained PyTorch models at scale, so a trained model can go into production with only a few extra lines of code.
Meanwhile, an Amazon SageMaker endpoint is a fully managed service that allows users to make real-time inferences via a REST API, which saves data scientists and machine learning engineers from managing their own server instances, load balancing, fault tolerance, auto scaling, model monitoring, and more. Amazon SageMaker endpoints also provide different instance types suited to different tasks, including GPU-backed instances that support industry-level machine learning inference and graphics-intensive applications while remaining cost-effective.
In this article, we demonstrate how to deploy a fast.ai-trained PyTorch model in TorchServe eager mode and host it in Amazon SageMaker inference endpoint.
Getting started with a fast.ai model
In this section, we train a fast.ai model that can solve a real-world problem with performance meeting the use-case specification. As an example, we focus on a “Scene Segmentation” use case from a self-driving car.
Installation
The first step is to install the fast.ai package, which is covered in the GitHub repository, as follows:
Depending on whether you use Anaconda or Miniconda, run the corresponding conda command, as shown in the example below.
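As a hedged example, based on the fastai README at the time of writing (check the repository for the current instructions):

```bash
# Anaconda
conda install -c fastai -c pytorch -c anaconda fastai gh anaconda

# Miniconda
conda install -c fastai -c pytorch fastai
```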
For other installation options, please refer to the fast.ai documentation.
Modeling
The following materials are based on the fast.ai course called Practical Deep Learning for Coders.
First, import the `fastai.vision` modules and download the sample data `CAMVID_TINY` by doing:
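A minimal sketch of these two steps, using the standard fastai APIs:

```python
from fastai.vision.all import *

# Download and extract the small CamVid sample dataset that ships with fastai
path = untar_data(URLs.CAMVID_TINY)
```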
Second, define helper functions to calculate segmentation performance and to read in the segmentation mask for each training image, as sketched below.

Note: Defining one-line Python `lambda` functions to pass to fastai is tempting; however, doing so introduces serialization issues when we want to export a fast.ai model. Therefore, we avoid anonymous Python functions during the fast.ai modeling steps.
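A hedged sketch of such helpers, along the lines of the course materials (the void code, metric, and mask-file naming assume the CAMVID_TINY layout and continue from the `path` defined above):

```python
import numpy as np

def acc_camvid(inp, targ, void_code=0):
    "Pixel accuracy that ignores the void class."
    targ = targ.squeeze(1)
    mask = targ != void_code
    return (inp.argmax(dim=1)[mask] == targ[mask]).float().mean()

def get_msk(fn):
    "Named (not lambda) function mapping an image file to its mask file."
    return path/"labels"/f"{fn.stem}_P{fn.suffix}"
```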
Third, we set up the `DataLoaders`, which define the modeling path, training image path, batch size, mask path, mask codes, and so on. In this example, we also record the image size and number of classes from the data. In a real-world problem these values may be known beforehand and would be defined when constructing the dataset.
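Continuing from the objects above, a hedged sketch of this setup (the batch size and validation split are illustrative):

```python
codes = np.loadtxt(path/"codes.txt", dtype=str)

dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/"images"),
    label_func=get_msk,   # the named helper defined above
    codes=codes,
)

# Record values that the pure-PyTorch model definition will need later
image_size = dls.one_batch()[0].shape[-2:]   # (96, 128) for CAMVID_TINY
number_of_classes = len(codes)
```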
Next, we set up a U-Net learner with a Residual Neural Network (ResNet) backbone and then trigger the fast.ai training process.
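For example (the backbone choice and number of epochs are illustrative):

```python
# resnet34 is re-exported by fastai.vision.all (it comes from torchvision)
learn = unet_learner(dls, resnet34, metrics=acc_camvid)
learn.fine_tune(10)   # train the new head first, then unfreeze and train the full model
```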
Finally, we export the fast.ai model for use in the following sections of this tutorial.
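For example, using the file name referenced later in this post:

```python
learn.export("fastai_unet.pkl")   # writes the pickled Learner under learn.path
```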
For more details about the modeling process, refer to the following AWS sample: notebook/01_U-net_Modelling.ipynb.
PyTorch transfer modeling from fast.ai
In this section we build a pure PyTorch model and transfer the model weights from fast.ai. The following materials are inspired by Practical-Deep-Learning-for-Coders-2.0 by Zachary Mueller et al.
Export model weights from fast.ai
First, we restore the fast.ai learner from the export “pickle” in the last section and save its model weights with PyTorch.
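A hedged sketch of these two steps (the paths assume the files sit in the working directory):

```python
import torch
from fastai.vision.all import load_learner

learn = load_learner("fastai_unet.pkl", cpu=True)
torch.save(learn.model.state_dict(), "fasti_unet_weights.pth")
```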
Obtaining the fast.ai prediction on a sample image is also straightforward.
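For example (the image path is a placeholder for the street-scene photo shown below):

```python
pred_fastai = learn.predict("street_view_of_a_small_neighborhood.png")
pred_fastai[0].show()   # the first element of the returned tuple is the predicted mask
```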
“2013.04 – ‘Streetview of a small neighborhood’, with residential buildings, Amsterdam city photo by Fons Heijnsbroek, The Netherlands” by Amsterdam free photos & pictures of the Dutch city is marked under CC0 1.0. To view the terms, visit https://creativecommons.org/licenses/cc0/1.0/
PyTorch model from fast.ai source code
Next, we need to define the model in pure PyTorch. In a Jupyter notebook, you can investigate the fast.ai source code by adding `??` in front of a function name. Here we look into `unet_learner` and `DynamicUnet` by doing:
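For example, in a notebook cell:

```python
from fastai.vision.all import unet_learner
from fastai.vision.models.unet import DynamicUnet

# In Jupyter/IPython, prefixing a name with ?? opens its source code
??unet_learner
??DynamicUnet
```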
Each of these commands will pop up a window at the bottom of the browser showing the corresponding source code.
After investigating, the PyTorch model can be defined as shown in the sketch below.
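The original post defines the full class by copying the relevant fastai source. As a condensed, hedged sketch of the same idea, reusing fastai's building blocks instead of copying them (the defaults of resnet34, 32 classes, and a 96x128 input are illustrative and must match your training run, and the `create_body` signature may vary slightly across fastai versions):

```python
from fastai.vision.all import create_body
from fastai.vision.models.unet import DynamicUnet
from torchvision.models import resnet34

class DynamicUnetDIY(DynamicUnet):
    "U-Net over a ResNet encoder, mirroring what unet_learner builds internally."
    def __init__(self, arch=resnet34, n_classes=32, img_size=(96, 128)):
        # pretrained=False: the weights are transferred from fast.ai afterwards,
        # so there is no need to download ImageNet weights here.
        encoder = create_body(arch, pretrained=False)
        super().__init__(encoder, n_classes, img_size)
```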
Also, we check the inheritance hierarchy of the fast.ai-defined class `SequentialEx` by:
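For example:

```python
from fastai.layers import SequentialEx

SequentialEx.mro()
# [..., fastai.layers.SequentialEx, ..., torch.nn.modules.module.Module, object]
```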
Here we can see that `SequentialEx` stems from the PyTorch `torch.nn.modules`; therefore, `DynamicUnetDIY` is a PyTorch model.
Note: Parameters such as `arch`, `n_classes`, and `img_size` must be consistent with the training process. If other parameters were customized during training, they must be reflected here as well. Also, in `create_body` we set `pretrained=False` because we are transferring the weights from fast.ai, so there is no need to download weights from PyTorch again.
Weights transfer
Now we can initialize the PyTorch model, load the saved model weights, and transfer the weights to the PyTorch model.
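A hedged sketch, using the class and weights file from the previous steps:

```python
import torch

model_torch_rep = DynamicUnetDIY()
state = torch.load("fasti_unet_weights.pth", map_location=torch.device("cpu"))
model_torch_rep.load_state_dict(state)
model_torch_rep.eval()
```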
If we take one sample image, transform it, and pass it to `model_torch_rep`, we will get a prediction result identical to fast.ai's.
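A hedged sketch of that round trip (the image path is a placeholder, and `image_tfm` is discussed in the note below):

```python
import torch
from PIL import Image
from torchvision import transforms

# Same image size and normalization statistics as the fast.ai training pipeline
image_tfm = transforms.Compose([
    transforms.Resize((96, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("street_view_of_a_small_neighborhood.png").convert("RGB")
x = image_tfm(img).unsqueeze(0)                 # add a batch dimension

with torch.no_grad():
    raw_out = model_torch_rep(x)
pred_mask = raw_out.argmax(dim=1).squeeze(0)    # class index per pixel
```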
Here we can see the difference: the fast.ai model `fastai_unet.pkl` packages all the steps, including the data transformation, image dimension alignment, and so on. In contrast, `fasti_unet_weights.pth` holds only the pure weights, so we have to manually redefine the data transformation procedures, among others, and make sure they are consistent with the training step.
Note: In `image_tfm`, we need to make sure the image size and normalization statistics are consistent with the training step. In our example, the size is 96x128 and the normalization statistics are the ImageNet defaults used by fast.ai. If other transformations were applied during training, they may need to be added here as well.
For more details about the PyTorch weights transferring process, please refer to this AWS sample: notebook/02_Inference_in_pytorch.ipynb.
Deployment to TorchServe
In this section, we deploy the PyTorch model to TorchServe. For installation, please refer to the TorchServe GitHub repository.
Overall, there are three main steps to use TorchServe:
- Archive the model into a `*.mar` file.
- Start `torchserve`.
- Call the API and get the response.
To archive the model, at least three files are needed in our case:
- The PyTorch model weights `fasti_unet_weights.pth`.
- The PyTorch model definition `model.py`, which is identical to the `DynamicUnetDIY` definition described in the last section.
- The TorchServe custom handler.
Custom handler
As shown in /deployment/handler.py, the TorchServe handler accepts `data` and `context`. In our example, we define another helper Python class with four instance methods to implement: `initialize`, `preprocess`, `inference`, and `postprocess`.
initialize
Here we work out whether a GPU is available, then identify the serialized model weights file path, and finally instantiate the PyTorch model and put it in evaluation mode.
preprocess
As described in the previous section, we redefine the image transform steps and apply them to the inference data.
inference
Now we convert the image into a PyTorch tensor, load it onto the GPU if available, and pass it through the model.
postprocess
Here the raw inference output is moved off the GPU if one was used and is Base64-encoded to be returned to the API caller.
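Putting the four methods together, a condensed, hedged sketch of such a handler could look like the following (the actual deployment/handler.py differs in its details; `DynamicUnetDIY` is imported from the model definition described earlier):

```python
import base64
import io
import os

import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from model import DynamicUnetDIY  # the pure-PyTorch model definition


class FastaiUnetHandler:
    """Custom TorchServe handler: initialize -> preprocess -> inference -> postprocess."""

    def __init__(self):
        self.initialized = False
        self.image_tfm = transforms.Compose([
            transforms.Resize((96, 128)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

    def initialize(self, context):
        # Pick a GPU if available, locate the serialized weights, and build the model.
        properties = context.system_properties
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available() else "cpu"
        )
        weights_path = os.path.join(
            properties.get("model_dir"), context.manifest["model"]["serializedFile"]
        )
        self.model = DynamicUnetDIY()
        self.model.load_state_dict(torch.load(weights_path, map_location=self.device))
        self.model.to(self.device).eval()
        self.initialized = True

    def preprocess(self, data):
        # The request body arrives as raw bytes; decode it and apply the training transforms.
        image = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(image)).convert("RGB")
        return self.image_tfm(image)

    def inference(self, img_tensor):
        # Add a batch dimension, move to the active device, and run a forward pass.
        with torch.no_grad():
            output = self.model(img_tensor.unsqueeze(0).to(self.device))
        return output.argmax(dim=1).squeeze(0)

    def postprocess(self, inference_output):
        # Bring the mask back to the CPU and Base64-encode it so the response is JSON-safe.
        mask = inference_output.cpu().numpy().astype("uint8")
        buffer = io.BytesIO()
        np.save(buffer, mask)
        return [base64.b64encode(buffer.getvalue()).decode("utf-8")]


_service = FastaiUnetHandler()


def handle(data, context):
    # Module-level entry point called by TorchServe for every request.
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service.postprocess(_service.inference(_service.preprocess(data)))
```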
Now we’re ready to set up and launch TorchServe.
TorchServe in action
Step 1: Archive the model with torch-model-archiver.
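As a hedged example (the model name, file names, and model store directory are illustrative):

```bash
mkdir -p model_store

torch-model-archiver \
  --model-name fastunet \
  --version 1.0 \
  --model-file model.py \
  --serialized-file fasti_unet_weights.pth \
  --handler handler.py \
  --export-path model_store
```

Step 2: Start `torchserve` and point it at the model store created above. For example:

```bash
torchserve --start --ncs \
  --model-store model_store \
  --models fastunet.mar
```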
Step 3: Call the API and get the response. (Here we use HTTPie.) For a complete response, see sample/sample_output.txt.
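For example, sending the sample image as the raw request body (the file path is a placeholder, and `fastunet` matches the model name used at archiving time):

```bash
time http POST http://127.0.0.1:8080/predictions/fastunet @street_view_of_a_small_neighborhood.png
```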
The first call will have a longer latency due to the model-weights loading defined in `initialize`, but this is mitigated from the second call onward. For more details about TorchServe setup and usage, please refer to notebook/03_TorchServe.ipynb.
Deployment to Amazon SageMaker inference endpoint
In this section, we deploy the fast.ai-trained Scene Segmentation PyTorch model with TorchServe in an Amazon SageMaker endpoint using a customized Docker image, and we will be using an `ml.g4dn.xlarge` instance. Refer to Amazon Elastic Compute Cloud (Amazon EC2) G4 Instances for more details.
Getting started with Amazon SageMaker endpoint
There are four steps to set up an Amazon SageMaker endpoint with TorchServe:
- Build a customized Docker image and push it to Amazon Elastic Container Registry (Amazon ECR). The Dockerfile is provided in the root of this code repository and helps set up the fast.ai and TorchServe dependencies.
- Compress `*.mar` into `*.tar.gz` and upload it to Amazon Simple Storage Service (Amazon S3).
- Create a SageMaker model using the Docker image from step 1 and the compressed model weights from step 2.
- Create the SageMaker endpoint using the model from step 3.
The details of these steps are described in notebook/04_SageMaker.ipynb. Once ready, we can invoke the SageMaker endpoint with an image in real time.
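As a hedged sketch of steps 3 and 4 with the SageMaker Python SDK (the ECR image URI, S3 path, role, and resource names are placeholders; the notebook uses its own variants):

```python
import sagemaker
from sagemaker.model import Model

sess = sagemaker.Session()
role = sagemaker.get_execution_role()   # assumes this runs inside SageMaker

model = Model(
    image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/torchserve-fastai:latest",
    model_data="s3://<your-bucket>/models/fastunet.tar.gz",
    role=role,
    name="fastai-unet-torchserve",
    sagemaker_session=sess,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name="fastai-unet-torchserve",
)
```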
Real-time inference with Python SDK
Read a sample image.
Invoke the SageMaker endpoint with the image and obtain the response from the API.
Decode the response and visualize the predicted Scene Segmentation mask.
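A hedged end-to-end sketch of these three steps with boto3 (the endpoint name and file path are placeholders, and the decoding mirrors the `postprocess` sketch shown earlier):

```python
import base64
import io

import boto3
import numpy as np

client = boto3.client("sagemaker-runtime")

# Read a sample image as raw bytes
with open("street_view_of_a_small_neighborhood.png", "rb") as f:
    payload = f.read()

# Invoke the SageMaker endpoint with the image
response = client.invoke_endpoint(
    EndpointName="fastai-unet-torchserve",
    ContentType="application/x-image",
    Body=payload,
)

# Decode the Base64 response back into the predicted segmentation mask
mask = np.load(io.BytesIO(base64.b64decode(response["Body"].read())))
print(mask.shape)   # for example (96, 128); visualize with matplotlib's imshow if desired
```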
What’s next?
With an inference endpoint up and running, one could leverage its full power by exploring other features that are important for a machine learning product, including AutoScaling, model monitoring with Human-in-the-loop (HITL) using Amazon Augmented AI (A2I), and incremental modeling iteration.
Clean up
Make sure that you delete the following resources to prevent any additional charges:
- Amazon SageMaker endpoint.
- Amazon SageMaker endpoint configuration.
- Amazon SageMaker model.
- Amazon ECR repository.
- Amazon S3 buckets.
Conclusion
This article presented an end-to-end demonstration of deploying a fast.ai-trained PyTorch model in TorchServe eager mode and hosting it in an Amazon SageMaker endpoint. You can use this repository as a template to deploy your own fast.ai models. This approach eliminates the self-maintenance effort of building and managing a customized inference server, which helps you speed up the journey from training a cutting-edge deep learning model to its real-world online application at scale.
If you have questions, please create an issue or submit a pull request on the GitHub repository.
References
- fast.ai: Making neural nets uncool again
- TorchServe
- Deploying PyTorch models for inference at scale using TorchServe
- Serving PyTorch models in production with the Amazon SageMaker native TorchServe integration
- Building, training, and deploying fast.ai models with Amazon SageMaker
- Running TorchServe on Amazon Elastic Kubernetes Service