AWS Open Source Blog
Running TorchServe on Amazon Elastic Kubernetes Service
This article was contributed by Josiah Davis, Charles Frenzel, and Chen Wu.
TorchServe is a model serving library that makes it easy to deploy and manage PyTorch models at scale in production environments. It removes the heavy lifting of deploying and serving PyTorch models with Kubernetes. Built and maintained by AWS in collaboration with Facebook, TorchServe is available as part of the PyTorch open source project. It delivers lightweight model serving with low latency, so you can deploy your models for high-performance inference. TorchServe runs in any machine learning environment, including Amazon Elastic Kubernetes Service (Amazon EKS), Amazon's managed Kubernetes service. TorchServe also provides a Management API that lets you register new model versions, which are then immediately available for predictions through the Inference API. For more details, see the TorchServe GitHub repository and the documentation.
In this post, we will demonstrate how to deploy TorchServe on an Amazon EKS cluster for inference. This allows you to quickly deploy a pre-trained machine learning model as a scalable, fault-tolerant web service for low latency inference.
Getting started
To get started, you will first need to install the required packages on your local machine or on an Amazon Elastic Compute Cloud (Amazon EC2) instance. To learn more, see Getting Started with Amazon EC2. If you are using your local machine, you must first configure your AWS credentials, as explained in the AWS CLI documentation. If you are using an Amazon EC2 instance, you must attach an AWS Identity and Access Management (IAM) role with the permissions described later in this post; to learn more about IAM roles, see the IAM documentation. This post was tested using an Amazon EC2 G4 instance.
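Before going further, you may want to sanity-check that your credentials or instance role are being picked up; one quick check (assuming the AWS CLI is already installed) is:
aws sts get-caller-identity
The returned account ID should match the account you plan to deploy into.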
Before beginning to set up the Amazon EKS cluster, you must first install the required command-line tools. To follow the steps in this post, you will need Docker, the AWS Command Line Interface (AWS CLI), kubectl, eksctl, and the AWS IAM Authenticator installed to deploy TorchServe to Amazon EKS. Refer to the GitHub repo for installation instructions.
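As a quick sanity check that each tool is on your PATH, you can print its version; the exact versions will differ from the ones we tested with:
docker --version
aws --version
kubectl version --client
eksctl version
aws-iam-authenticator version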
Set up environment variables
To configure the set-up process, first set the following global environment variables. These variables pre-populate the Amazon EKS templates so that everything can be set up automatically from manifest files.
export AWS_ACCOUNT=<ACCOUNT ID>
export AWS_REGION=<AWS REGION>
export K8S_MANIFESTS_DIR=<Absolute path to store manifests>
export AWS_CLUSTER_NAME=<Name for the AWS EKS cluster>
export PT_SERVE_NAME=<Name of TorchServe in the EKS>
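For illustration only, a filled-in set of values might look like the following (substitute your own account ID, Region, paths, and names):
export AWS_ACCOUNT=111122223333
export AWS_REGION=us-west-2
export K8S_MANIFESTS_DIR=/home/ec2-user/eks-manifests
export AWS_CLUSTER_NAME=torchserve-eks
export PT_SERVE_NAME=torchserve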
Set up Git repository
First, git clone the GitHub code repository.
git clone https://github.com/aws-samples/torchserve-eks
cd torchserve-eks
The directory structure of the Git repository is illustrated below.
Creating EKS manifest files
Once you have all dependencies installed, you will produce manifest files for Amazon EKS to create the cluster and for Kubernetes to deploy your TorchServe service. These files are in YAML format, and the GitHub code repository provides example YAML templates and a bash script to generate them automatically. Run the pt_serve_util.sh bash script to auto-generate the manifest files in the directory specified by the environment variable $K8S_MANIFESTS_DIR.
./pt_serve_util.sh
This bash script will generate the manifest files based on the environment variables entered in the previous step. The generated files include an IAM policy, an EKS cluster configuration for the underlying infrastructure, and a TorchServe manifest for the Kubernetes Service and Deployment.
The pt_serve_util.sh bash script accomplishes the following tasks (a quick check of the generated files is shown after this list):
- Checks that the command-line tools, such as the AWS CLI, kubectl, eksctl, and aws-iam-authenticator, are installed properly
- Checks that all of the environment variables listed above are set properly
- Generates cluster.yaml and pt_inference.yaml in the directory $K8S_MANIFESTS_DIR
- Updates the eks_ami_policy.json IAM policy file with the environment variables AWS_ACCOUNT and AWS_REGION
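After the script finishes, you can list the output directory to confirm that the files named in the list above were generated:
ls ${K8S_MANIFESTS_DIR}
# Expected to show cluster.yaml and pt_inference.yaml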
Set up IAM roles and policies
You'll need a permissive IAM user policy to create the underlying infrastructure for TorchServe's EKS Service and Deployment. This policy should include the eksctl minimum IAM policies plus permission to retrieve the Amazon EKS-optimized AMI ID.
For more information, refer to the Adding and Removing IAM Identity Permissions documentation.
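As one hedged example of how you might attach such a policy to an IAM user with the AWS CLI (the user name and policy name below are placeholders, and you may prefer to manage permissions in the console instead):
aws iam put-user-policy \
  --user-name <YOUR_IAM_USER> \
  --policy-name torchserve-eks-policy \
  --policy-document file://eks_ami_policy.json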
A single-node EKS cluster on GPU
eksctl is a CLI tool that creates clusters on Amazon EKS. Under the hood, it drives AWS CloudFormation, with options passed in through cluster.yaml. In this post, we make use of eksctl's GPU support, using a single G4 instance; the cluster name and Region come from the global variables you set earlier. You can find additional configuration options available with eksctl in the documentation.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${AWS_CLUSTER_NAME}
  region: ${AWS_REGION}
nodeGroups:
  - name: ng-1
    instanceType: g4dn.xlarge
    desiredCapacity: 1
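If you later want more than one node, a minimal sketch of how the nodeGroups section could be adjusted is shown below; desiredCapacity, minSize, and maxSize are standard eksctl fields, and the values here are only examples:
nodeGroups:
  - name: ng-1
    instanceType: g4dn.xlarge
    desiredCapacity: 2
    minSize: 1
    maxSize: 4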
Use kubectl to create a Service and Deployment
The TorchServe manifest file is made to be run for GPU inference with kubectl, a CLI for controlling Kubernetes clusters. The manifest file, named pt_inference.yaml, consists of definitions for the Service and the Deployment. The Service section of the file opens port 8080 for the Inference API and port 8081 for the Model Management API, and specifies that we would like a Service of type LoadBalancer. Note that if you do not specify the Service type, it defaults to ClusterIP, in which case the service is only accessible from within the same VPC. The Deployment portion of the file sets the replica count, specifies the container image from which to build the deployment, sets the same ports, and applies resource limits.
After you run pt_serve_util.sh, the Kubernetes application names are populated in pt_inference.yaml. Inside the YAML file, under the Deployment section, image points directly to the TorchServe image published on Docker Hub. When the script runs, the template variable for the Kubernetes Service name, your_service_name, is set to the environment variable ${PT_SERVE_NAME}.
---
kind: Service
apiVersion: v1
metadata:
  name: your_service_name
  labels:
    app: your_service_name
spec:
  ports:
    - name: preds
      port: 8080
      targetPort: ts
    - name: mdl
      port: 8081
      targetPort: ts-management
  type: LoadBalancer
  selector:
    app: your_service_name
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: your_service_name
  labels:
    app: your_service_name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your_service_name
  template:
    metadata:
      labels:
        app: your_service_name
    spec:
      containers:
        - name: your_service_name
          image: "pytorch/torchserve:latest-gpu"
          ports:
            - name: ts
              containerPort: 8080
            - name: ts-management
              containerPort: 8081
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 4
              memory: 4Gi
              nvidia.com/gpu: 1
            requests:
              cpu: "1"
              memory: 1Gi
Subscribe to EKS-optimized AMI with GPU support in the AWS Marketplace
To run Amazon EKS with a GPU, you must first subscribe to the Amazon EKS-optimized AMI with GPU support from the AWS Marketplace using your AWS account. The Amazon EKS-optimized AMI with GPU support builds on top of the standard Amazon EKS-optimized AMI, and is configured to serve as the base image for Amazon EC2 P2, P3, and G4 instances in Amazon EKS clusters. Following the link and choosing Subscribe ensures that the EKS node creation step succeeds.
Creating an EKS cluster
Now that the required command-line tools and a permissive policy are set up, you can begin creating your cluster. For this post, we will use eksctl to launch an automation script that drives AWS CloudFormation based on a pre-configured YAML file, cluster.yaml, to stand up the underlying infrastructure on which TorchServe will run. The YAML file contains the cluster name, Region, and instance type. In this tutorial, you will run only a single node, but you can edit the file further based on your needs, as described in the eksctl documentation. To do so, run the command below to build an EKS cluster with a single EC2 node:
eksctl create cluster -f ${K8S_MANIFESTS_DIR}/cluster.yaml
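Cluster creation takes several minutes. When it completes, eksctl updates your kubeconfig by default, and a quick way to confirm the node is up and Ready is:
kubectl get nodes -o wide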
Running the TorchServe container on EKS
Install NVIDIA device plugin for Kubernetes
Because the pre-trained PyTorch model will be making use of a GPU, you will need to install the NVIDIA device plugin.
With kubectl set up, enter the following command:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/deployments/static/nvidia-device-plugin.yml
You can then verify that the plugin successfully installed by running the following command:
kubectl get daemonset -n kube-system
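To confirm that the GPU is exposed as an allocatable resource on the node (a common check from the NVIDIA device plugin documentation), you can also run:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"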
Deploy pods to EKS cluster
Next, you will set up a namespace for the Kubernetes cluster and then apply a Kubernetes manifest file to the cluster. The manifest file, pt_inference.yaml, creates the Kubernetes Deployment for the pod, the Service, and the ingresses. In particular, it points to the TorchServe container image registered on Docker Hub and to ports 8080 and 8081, from which the service can be queried once it is live.
NAMESPACE=pt-inference; kubectl create namespace ${NAMESPACE}
kubectl -n ${NAMESPACE} apply -f ${K8S_MANIFESTS_DIR}/pt_inference.yaml
After this is complete, you can confirm that the deployment is set up and in service by running the following command:
kubectl get pods -n ${NAMESPACE}
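You can also confirm that the Service has received an external hostname from the load balancer; the EXTERNAL-IP column may show <pending> for a minute or two while the load balancer is provisioned:
kubectl get svc -n ${NAMESPACE}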
Set up logging on Amazon CloudWatch
Run the following script to enable Amazon CloudWatch log groups:
./cloud_watch_util.sh
The cloud_watch_util.sh bash script accomplishes the following tasks:
- Uses eksctl to obtain the IAM role of the Amazon EKS cluster, and saves it in the environment variable NODE_INSTANCE_ROLE_NAME
- Updates the cloud_watch_policy.json IAM policy file with the environment variables AWS_ACCOUNT and AWS_REGION
- Attaches the inline policies defined in cloud_watch_policy.json to the EKS cluster role $NODE_INSTANCE_ROLE_NAME
- Deploys Container Insights on EKS by setting up the CloudWatch agent and the Fluentd DaemonSet
Once the bash script executes successfully, complete the Inference on the endpoint step below. Then, in the AWS Management Console, navigate to CloudWatch, Logs, Log groups.
There, you can check the TorchServe logs at:
/aws/containerinsights/${AWS_CLUSTER_NAME}/application/${PT_SERVE_NAME}*
These log entries are either performance-related (e.g., CPU utilization) or access-related, such as inference requests.
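If you prefer the command line, and assuming you are using AWS CLI v2, you can tail the application log group directly:
aws logs tail "/aws/containerinsights/${AWS_CLUSTER_NAME}/application" --follow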
Register models with TorchServe
Get the external IP for the service and store it in a variable:
EXTERNAL_IP=`kubectl get svc -n ${NAMESPACE} -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'`
Here, we register a publicly available model. For more details on the required contents of the model file, read the docs for the model-archiver utility, which is provided with TorchServe.
response=$(curl --write-out %{http_code} --silent --output /dev/null --retry 5 -X POST "http://${EXTERNAL_IP}:8081/models?url=https://torchserve.s3.amazonaws.com/mar_files/resnet-18.mar&initial_workers=1&synchronous=true")
if [ ! "$response" == 200 ]
then
echo "failed to register model with torchserve"
else
echo "successfully registered model with torchserve"
fi
Note: If you do not specify LoadBalancer as the Service type, the default type will be ClusterIP and the endpoint will only be accessible within the internal VPC. In that case, you can use port forwarding as follows:
kubectl port-forward -n ${NAMESPACE} `kubectl get pods -n ${NAMESPACE} --selector=app=${PT_SERVE_NAME} -o jsonpath='{.items[0].metadata.name}'` 8080:8080 8081:8081 &
Inference on the endpoint
There are multiple ways to invoke inference on the cluster. In this post, we will query it directly with curl, as demonstrated in the TorchServe model serving example.
# Save the image locally
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
# Send the image for inference
curl -X POST http://${EXTERNAL_IP}:8080/predictions/resnet-18 -T kitten_small.jpg
# List out models currently registered
curl -X GET http://${EXTERNAL_IP}:8081/models/
Running the above should return ImageNet classes in a JSON format.
[
  {
    "tiger_cat": 0.46933549642562866
  },
  {
    "tabby": 0.4633878469467163
  },
  {
    "Egyptian_cat": 0.06456148624420166
  },
  {
    "lynx": 0.0012828214094042778
  },
  {
    "plastic_bag": 0.00023323034110944718
  }
]
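A couple of other Management API calls on port 8081 that you may find useful (see the TorchServe Management API documentation for the full set of options):
# Scale the number of workers serving the registered model
curl -X PUT "http://${EXTERNAL_IP}:8081/models/resnet-18?min_worker=2"
# Unregister the model when it is no longer needed
# (on some TorchServe versions you may need to append the model version to the path)
curl -X DELETE "http://${EXTERNAL_IP}:8081/models/resnet-18"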
Cleaning up
To remove the cluster completely and tear down the associated infrastructure, run the following command:
./delete_cluster.sh
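If you prefer to call eksctl directly instead of the helper script (an assumption about what delete_cluster.sh wraps), the equivalent tear-down is:
eksctl delete cluster -f ${K8S_MANIFESTS_DIR}/cluster.yaml
Note that the CloudWatch log groups created by Container Insights may not be removed automatically; delete them separately if you no longer need them.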
Conclusion
This post showed how to set up TorchServe on Amazon EKS using a variety of related command-line tools, such as kubectl and eksctl. Although we demonstrated it with a single model on a single-node cluster, this type of deployment scales to multiple nodes and extends to more advanced deployments. For example, you can stack multiple models on a single node with TorchServe to reduce cost and increase resource utilization. Moreover, you can subdivide the GPU across multiple containers using bin packing to distribute the workload and schedule it across the namespace. TorchServe makes it easier to deploy and manage these types of workloads, and much more.