AWS Machine Learning Blog
Scale YOLOv5 inference with Amazon SageMaker endpoints and AWS Lambda
After data scientists carefully develop a satisfactory machine learning (ML) model, the model must be deployed so that other members of the organization can easily access it for inference. However, deploying models at scale with optimized cost and compute efficiency can be a daunting and cumbersome task. Amazon SageMaker endpoints provide an easily scalable and cost-optimized solution for model deployment. The YOLOv5 model, distributed under the GPLv3 license, is a popular object detection model known for its runtime efficiency as well as its detection accuracy. In this post, we demonstrate how to host a pre-trained YOLOv5 model on SageMaker endpoints and use AWS Lambda functions to invoke those endpoints.
Solution overview
The following image outlines the AWS services used to host the YOLOv5 model using a SageMaker endpoint and invoke the endpoint using Lambda. The SageMaker notebook accesses a YOLOv5 PyTorch model from an Amazon Simple Storage Service (Amazon S3) bucket, converts it to YOLOv5 TensorFlow SavedModel format, and stores it back to the S3 bucket. This model is then used when hosting the endpoint. When an image is uploaded to Amazon S3, it acts as a trigger to run the Lambda function. The function utilizes OpenCV Lambda layers to read the uploaded image and run inference using the endpoint. After the inference is run, you can use the results as needed.
In this post, we walk through the process of utilizing a default YOLOv5 model in PyTorch and converting it to a TensorFlow SavedModel. This model is hosted using a SageMaker endpoint. Then we create and publish a Lambda function that invokes the endpoint to run inference. Pre-trained YOLOv5 models are available on GitHub. For the purpose of this post, we use the yolov5l model.
Prerequisites
As a prerequisite, we need to set up the following AWS Identity and Access Management (IAM) roles with appropriate access policies for SageMaker, Lambda, and Amazon S3:
- SageMaker IAM role – This requires the AmazonS3FullAccess policy attached for storing and accessing the model in the S3 bucket.
- Lambda IAM role – This role needs multiple policies:
  - To access images stored in Amazon S3, we require the following IAM policies:
    - s3:GetObject
    - s3:ListBucket
  - To run the SageMaker endpoint, we need access to the following IAM policies:
    - sagemaker:ListEndpoints
    - sagemaker:DescribeEndpoint
    - sagemaker:InvokeEndpoint
    - sagemaker:InvokeEndpointAsync
You also need the following resources and services:
- The AWS Command Line Interface (AWS CLI), which we use to create and configure Lambda.
- A SageMaker notebook instance. These come with Docker pre-installed, and we use this to create the Lambda layers. To set up the notebook instance, complete the following steps:
- On the SageMaker console, create a notebook instance and provide the notebook name, instance type (for this post, we use ml.c5.large), IAM role, and other parameters.
- Clone the public repository and add the YOLOv5 repository provided by Ultralytics, for example:
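A sketch of the clone step from a terminal on the notebook instance; the Ultralytics URL is public, and the post's companion repository (linked in the conclusion) can be cloned the same way:

```bash
# Work inside the notebook instance's persistent SageMaker directory
cd ~/SageMaker

# Clone the YOLOv5 repository provided by Ultralytics
git clone https://github.com/ultralytics/yolov5
```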
Host YOLOv5 on a SageMaker endpoint
Before we can host the pre-trained YOLOv5 model on SageMaker, we must export and package it in the correct directory structure inside model.tar.gz. For this post, we demonstrate how to host YOLOv5 in the saved_model format. The YOLOv5 repo provides an export.py file that can export the model in many different ways. After you clone the YOLOv5 repo and enter the yolov5 directory from the command line, you can export the model with the following command:
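A minimal sketch, assuming a recent YOLOv5 release (the --include saved_model flag selects the TensorFlow SavedModel exporter, and --nms embeds non-maximum suppression in the exported graph):

```bash
# Export the pre-trained yolov5l weights to TensorFlow SavedModel format
python export.py --weights yolov5l.pt --include saved_model --nms
```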
This command creates a new directory called yolov5l_saved_model inside the yolov5 directory. Inside the yolov5l_saved_model directory, we should see the following items:
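A typical TensorFlow SavedModel layout looks like the following (exact file names can vary with the TensorFlow version):

```
yolov5l_saved_model
├── assets
├── variables
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── saved_model.pb
```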
To create a model.tar.gz file, move the contents of yolov5l_saved_model to export/Servo/1. From the command line, we can compress the export directory by running the following command and upload the model to the S3 bucket:
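A sketch of the packaging and upload steps; the bucket name is a placeholder:

```bash
# Arrange the SavedModel in the directory layout TensorFlow Serving expects
mkdir -p export/Servo
mv yolov5l_saved_model export/Servo/1

# Compress the export directory into model.tar.gz
tar -czvf model.tar.gz export/

# Upload the archive to your S3 bucket (placeholder name)
aws s3 cp model.tar.gz s3://<your-bucket>/yolov5/model.tar.gz
```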
Then, we can deploy a SageMaker endpoint from a SageMaker notebook by running the following code:
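A minimal sketch using the SageMaker Python SDK; the S3 path, role ARN, framework version, instance type, and endpoint name are placeholders and assumptions to adapt to your environment:

```python
from sagemaker.tensorflow import TensorFlowModel

# Placeholders: the packaged model in S3 and an IAM role with SageMaker access
model_data = 's3://<your-bucket>/yolov5/model.tar.gz'
role = '<your-sagemaker-execution-role-arn>'

# Wrap the SavedModel for TensorFlow Serving (framework version is an assumption)
model = TensorFlowModel(
    model_data=model_data,
    framework_version='2.8',
    role=role,
)

# Deploy a real-time endpoint; instance type and name are illustrative
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name='yolov5l-demo',
)
```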
The preceding script takes approximately 2–3 minutes to fully deploy the model to the SageMaker endpoint. You can monitor the status of the deployment on the SageMaker console. After the model is hosted successfully, the model is ready for inference.
Test the SageMaker endpoint
After the model is successfully hosted on a SageMaker endpoint, we can test it using a blank image. The testing code is as follows:
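A minimal sketch, assuming the illustrative endpoint name from the deployment sketch and a 640x640 model input:

```python
import json

import boto3
import numpy as np

ENDPOINT_NAME = 'yolov5l-demo'  # illustrative name from the deployment sketch
runtime = boto3.client('sagemaker-runtime')

# YOLOv5 is commonly exported at 640x640; create a blank RGB test image
model_height, model_width = 640, 640
blank_image = np.zeros((model_height, model_width, 3), dtype=np.uint8)

# Normalize to [0, 1] and wrap in a batch, as TensorFlow Serving expects
data = blank_image.astype(np.float32) / 255.0
payload = json.dumps([data.tolist()])

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/json',
    Body=payload,
)
result = json.loads(response['Body'].read().decode('utf-8'))
print(result)
```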
Set up Lambda with layers and triggers
We use OpenCV to demonstrate the model by passing an image and getting the inference results. Lambda doesn't come with external libraries like OpenCV pre-installed, so we need to build them before we can invoke the Lambda code. Furthermore, we don't want to rebuild external libraries like OpenCV every time Lambda is invoked. For this purpose, Lambda provides the ability to create Lambda layers. We can define what goes in these layers, and they can be consumed by the Lambda code every time it's invoked. We also demonstrate how to create the Lambda layers for OpenCV. For this post, we use the SageMaker notebook instance, which comes with Docker pre-installed, to create the layers.
After we have the layers in place, we create the app.py script, which is the Lambda code that uses the layers, runs the inference, and gets results. The following diagram illustrates this workflow.
Create Lambda layers for OpenCV using Docker
Use the following Dockerfile to create the Docker image using Python 3.7:
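A minimal sketch of such a Dockerfile; the opencv-python-headless package and the layer directory layout are assumptions rather than the post's exact recipe:

```dockerfile
# Start from the public AWS Lambda Python 3.7 base image
FROM public.ecr.aws/lambda/python:3.7

# zip is needed to package the layer contents
RUN yum install -y zip

# Install OpenCV (headless build avoids GUI dependencies) into the
# directory layout Lambda layers expect for Python packages
RUN pip install opencv-python-headless numpy \
    --target /packages/cv2-layer/python/lib/python3.7/site-packages/

# Zip the layer contents so the ZIP root contains the python/ directory
WORKDIR /packages/cv2-layer/
RUN zip -r9 /packages/cv2-python37.zip .
```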
Build and run Docker and store the output ZIP file in the current directory under layers:
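A sketch assuming the Dockerfile above; the image name and paths are illustrative:

```bash
# Build the image from the Dockerfile in the current directory
docker build -t lambda-layer-opencv .

# Copy the layer ZIP out of a container into ./layers
mkdir -p layers
docker run --rm -v "$(pwd)/layers:/out" --entrypoint /bin/sh \
    lambda-layer-opencv -c "cp /packages/cv2-python37.zip /out/"
```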
Now we can upload the OpenCV layer artifacts to Amazon S3 and create the Lambda layer:
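A sketch using the AWS CLI; the bucket and layer names are placeholders:

```bash
# Upload the layer artifact to S3
aws s3 cp layers/cv2-python37.zip s3://<your-bucket>/layers/

# Publish a Lambda layer version from the S3 object
aws lambda publish-layer-version \
    --layer-name cv2 \
    --content S3Bucket=<your-bucket>,S3Key=layers/cv2-python37.zip \
    --compatible-runtimes python3.7
```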
After the preceding commands run successfully, you have an OpenCV layer in Lambda, which you can review on the Lambda console.
Create the Lambda function
We utilize the app.py script to create the Lambda function and use OpenCV. In the following code, change the values for BUCKET_NAME and IMAGE_LOCATION to the location for accessing the image:
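A minimal sketch of app.py, assuming the illustrative endpoint name from the deployment step; the bucket and key values are placeholders to change as described above:

```python
# app.py -- minimal Lambda handler sketch
import json

import boto3
import cv2
import numpy as np

BUCKET_NAME = '<your-bucket>'                # change to your bucket
IMAGE_LOCATION = '<path/to>/test_image.png'  # change to your image key

ENDPOINT_NAME = 'yolov5l-demo'  # illustrative endpoint name used earlier
MODEL_HEIGHT, MODEL_WIDTH = 640, 640

s3 = boto3.client('s3')
runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # Download the uploaded image from S3 and decode it with OpenCV
    obj = s3.get_object(Bucket=BUCKET_NAME, Key=IMAGE_LOCATION)
    image = cv2.imdecode(np.frombuffer(obj['Body'].read(), np.uint8),
                         cv2.IMREAD_COLOR)

    # Resize and normalize to match the model's expected input
    resized = cv2.resize(image, (MODEL_WIDTH, MODEL_HEIGHT))
    payload = json.dumps([(resized.astype(np.float32) / 255.0).tolist()])

    # Invoke the SageMaker endpoint and return the JSON result
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=payload,
    )
    result = json.loads(response['Body'].read().decode('utf-8'))
    return {'statusCode': 200, 'body': json.dumps(result)}
```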
Deploy the Lambda function with the following code:
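A sketch using the AWS CLI; the function name, role ARN, and resource settings are placeholders:

```bash
# Package the handler and create the function
zip app.zip app.py

aws lambda create-function \
    --function-name yolov5-inference \
    --runtime python3.7 \
    --handler app.lambda_handler \
    --role <your-lambda-role-arn> \
    --zip-file fileb://app.zip \
    --timeout 60 \
    --memory-size 1024
```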
Attach the OpenCV layer to the Lambda function
After we have the Lambda function and layer in place, we can connect the layer to the function as follows:
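A sketch using the AWS CLI; copy the layer version ARN from the publish-layer-version output:

```bash
aws lambda update-function-configuration \
    --function-name yolov5-inference \
    --layers <layer-version-arn>
```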
We can review the layer settings via the Lambda console.
Trigger Lambda when an image is uploaded to Amazon S3
We use an image upload to Amazon S3 as a trigger to run the Lambda function. For instructions, refer to Tutorial: Using an Amazon S3 trigger to invoke a Lambda function.
You should see the following function details on the Lambda console.
Run inference
After you set up Lambda and the SageMaker endpoint, you can test the output by invoking the Lambda function. We use an image upload to Amazon S3 as a trigger to invoke Lambda, which in turn invokes the endpoint for inference. As an example, we upload the following image to the Amazon S3 location <S3 PATH TO IMAGE>/test_image.png configured in the previous section.
After the image is uploaded, the Lambda function is triggered to download and read the image data and send it to the SageMaker endpoint for inference. The output result from the SageMaker endpoint is obtained and returned by the function in JSON format, which we can use in different ways. The following image shows example output overlaid on the image.
Clean up
Depending on the instance type, SageMaker notebooks can require significant compute usage and cost. To avoid unnecessary costs, we advise stopping the notebook instance when it's not in use. Additionally, Lambda functions incur charges only when they're invoked, so no cleanup is necessary there. However, SageMaker endpoints incur charges while they're in service, so they should be deleted to avoid additional costs.
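As an illustration, a sketch that removes the endpoint created earlier; the name is the illustrative one used in the deployment sketch, and the endpoint configuration is assumed to share that name:

```python
import boto3

# If the predictor from the deployment step is still in scope, this is simplest:
# predictor.delete_endpoint()

# Otherwise, delete the endpoint and its configuration by name
sm = boto3.client('sagemaker')
sm.delete_endpoint(EndpointName='yolov5l-demo')
sm.delete_endpoint_config(EndpointConfigName='yolov5l-demo')
```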
Conclusion
In this post, we demonstrated how to host a pre-trained YOLOv5 model on a SageMaker endpoint and use Lambda to invoke inference and process the output. The detailed code is available on GitHub.
To learn more about SageMaker endpoints, check out Create your endpoint and deploy your model and Build, test, and deploy your Amazon SageMaker inference models to AWS Lambda, which highlights how you can automate the process of deploying YOLOv5 models.
About the authors
Kevin Song is an IoT Edge Data Scientist at AWS Professional Services. Kevin holds a PhD in Biophysics from The University of Chicago. He has over 4 years of industry experience in Computer Vision and Machine Learning. He helps customers in the sports and life sciences industries deploy Machine Learning models.
Romil Shah is an IoT Edge Data Scientist at AWS Professional Services. Romil has over 6 years of industry experience in Computer Vision, Machine Learning, and IoT edge devices. He helps customers optimize and deploy their Machine Learning models on edge devices in industrial setups.