AWS Compute Blog
Deploying machine learning models with serverless templates
This post was written by Sean Wilkinson, Machine Learning Specialist Solutions Architect, and Newton Jain, Senior Product Manager for Lambda
After designing and training machine learning models, data scientists deploy the models so applications can use them. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda’s pay-per-request billing, automatic scaling, and ease of use make it a popular deployment choice for data science teams.
With minimal code, data scientists can turn a model into a cost-effective and scalable API endpoint backed by Lambda. Lambda supports container images, Advanced Vector Extensions 2 (AVX2), and functions with up to 10 GB of memory. Using these capabilities, data science teams can deploy larger, more powerful models with improved performance.
To deploy Lambda-based applications, serverless developers can use the AWS Serverless Application Model (AWS SAM). AWS SAM creates and manages serverless applications based on templates. It supports local testing, encourages best practices, and integrates with popular developer tools. It allows data scientists to define serverless applications, security permissions, and advanced configuration using YAML.
AWS SAM contains pre-built templates that allow developers to get started quickly. This blog shows how to use machine learning templates to deploy a Scikit-Learn based model that classifies images of handwritten digits from zero to nine. Once deployed to Lambda, you can access the model via a REST API.
This walkthrough creates resources that incur costs in an AWS account. To minimize cost, follow the Cleaning up section to remove resources after completing the walkthrough.
Overview
The AWS SAM machine learning templates are available for the Scikit-Learn, PyTorch, TensorFlow, and XGBoost frameworks. Each template deploys a Lambda function to host the model behind an Amazon API Gateway, which serves as the front end and handles authentication. The following diagram shows the architecture of the solution:
Creating the containerized Lambda function
This section uses AWS SAM to build, test, and deploy a Docker image containing a pre-trained digit classifier model on Lambda:
- Update or install AWS SAM. AWS SAM CLI v1.24.1 or later is required to use the machine learning templates.
- In a terminal, create a new serverless application in AWS SAM using the command:
sam init
- Follow the on-screen prompts and select AWS Quick Start Templates as the template source.
- Choose Image as the package type.
- Select amazon/python3.8-base as the base image.
- When prompted, enter an application name. AWS SAM uses this to group and label resources it creates.
- Select the desired ML framework from the template list. The walkthrough uses the Scikit-Learn template.
- AWS SAM creates a directory with the name of your application. Change to the new directory and run the AWS SAM build command:
sam build
Files generated by AWS SAM
After selecting the template, AWS SAM generates the following files in the application directory:
- Dockerfile: The application uses the Lambda-provided Python 3.8 base image. It installs the required dependencies and sets the CMD instruction so that the Lambda execution environment initializes the handler.
FROM public.ecr.aws/lambda/python:3.8

COPY app.py requirements.txt ./
COPY digit_classifier.joblib /opt/ml/model/1

RUN python3.8 -m pip install -r requirements.txt -t .

CMD ["app.lambda_handler"]
- app.py: This Python code runs when the Lambda function is invoked and generates predictions from the Scikit-Learn model. Because the model is loaded outside the lambda_handler, it is reused across multiple invocations.
import joblib
import base64
import numpy as np
import json
from io import BytesIO
from PIL import Image
from scipy.ndimage import interpolation

# Load the model once, outside the handler, so it is reused across invocations
model_file = '/opt/ml/model/1'  # matches the COPY destination in the Dockerfile
model = joblib.load(model_file)

# Functions to pre-process images (we used the same preprocessing when training)
def moments(image):
    c0, c1 = np.mgrid[:image.shape[0], :image.shape[1]]
    img_sum = np.sum(image)
    m0 = np.sum(c0 * image) / img_sum
    m1 = np.sum(c1 * image) / img_sum
    m00 = np.sum((c0 - m0)**2 * image) / img_sum
    m11 = np.sum((c1 - m1)**2 * image) / img_sum
    m01 = np.sum((c0 - m0) * (c1 - m1) * image) / img_sum
    mu_vector = np.array([m0, m1])
    covariance_matrix = np.array([[m00, m01], [m01, m11]])
    return mu_vector, covariance_matrix

def deskew(image):
    c, v = moments(image)
    alpha = v[0, 1] / v[0, 0]
    affine = np.array([[1, 0], [alpha, 1]])
    ocenter = np.array(image.shape) / 2.0
    offset = c - np.dot(affine, ocenter)
    return interpolation.affine_transform(image, affine, offset=offset)

def get_np_image(image_bytes):
    image = Image.open(BytesIO(base64.b64decode(image_bytes))).convert(mode='L')
    image = image.resize((28, 28))
    return np.array(image)

# Lambda handler code
def lambda_handler(event, context):
    image_bytes = event['body'].encode('utf-8')
    x = deskew(get_np_image(image_bytes))
    prediction = int(model.predict(x.reshape(1, -1))[0])
    return {
        'statusCode': 200,
        'body': json.dumps(
            {
                "predicted_label": prediction,
            }
        )
    }
After completing these steps, the application directory contains the Dockerfile, app.py, and requirements.txt generated from the template, along with the pre-trained digit_classifier.joblib model artifact, the events/event.json test event, and the template.yaml AWS SAM template.
Testing the AWS SAM templates
For container image-based Lambda functions, sam build creates and updates a container image in the local Docker repo. It copies the template to the output directory and updates the location for the newly built image.
AWS SAM writes this build output under the .aws-sam directory, including the updated template at .aws-sam/build/template.yaml.
After building the Docker image, use AWS SAM’s local test functionality to test the endpoint. There are two ways to test the application locally:
- Local invoke with the --event flag uses the mock data in events/event.json to invoke the function and generate a prediction. An image of a handwritten digit is encoded as a base64 string in the body attribute of the event.json file (a sketch for producing this payload follows these testing steps). Test using the mock event.json:
sam local invoke InferenceFunction --event events/event.json
- The start-api command starts up a local endpoint that emulates a REST API endpoint. It downloads an execution container that runs API Gateway and the Lambda function locally. Invoke using the API Gateway emulator:
sam local start-api
To test the local endpoint, use a REST client, such as Postman, to send a POST request to the /classify_digit endpoint.
While testing locally, use images smaller than 100 KB. If the file is larger, the request fails with status code 502 and the error “argument list too long”. After deploying to Lambda, you can use larger images.
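As a reference, the following sketch shows one way to produce that base64 body and write it to events/event.json. The image file name digit.png is a placeholder for your own handwritten-digit image, and the size check mirrors the local testing limit described above.

import base64
import json

# Placeholder paths: substitute your own handwritten-digit image and event file
IMAGE_PATH = "digit.png"
EVENT_PATH = "events/event.json"

# Encode the image as a base64 string, which the Lambda handler reads from event["body"]
with open(IMAGE_PATH, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Local testing fails with "argument list too long" for payloads over roughly 100 KB
if len(encoded) > 100 * 1024:
    raise ValueError("Encoded image exceeds 100 KB; use a smaller image for local testing")

with open(EVENT_PATH, "w") as f:
    json.dump({"body": encoded}, f)

print(f"Wrote {len(encoded)} characters of base64 data to {EVENT_PATH}")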
Deploying the application to Lambda
After testing the model locally, use the AWS SAM guided deployment process to package and deploy the application:
- To deploy a Lambda function based on a container image, the container image must be pushed to Amazon Elastic Container Registry (ECR). Run the following command to retrieve an authentication token and authenticate the Docker client with the ECR registry. Replace the region and accountID placeholders with your Region and AWS account ID:
aws --region <region> ecr get-login-password | docker login --username AWS --password-stdin <accountID>.dkr.ecr.<region>.amazonaws.com
- Use the AWS CLI to create an ECR repository called classifier-demo:
aws ecr create-repository \
    --repository-name classifier-demo \
    --image-tag-mutability MUTABLE \
    --image-scanning-configuration scanOnPush=true
- Copy the repositoryUri from the output. This is needed in the next step. Initiate the AWS SAM guided deployment using the deploy command:
sam deploy --guided
- Follow the on-screen prompts. To accept the default options provided in the interactive experience, press Enter. When prompted for an ECR repository, use the Amazon ECR repository created in the previous step.
- AWS SAM packages and deploys the application as a versioned entity. After deployment, the production API endpoint is ready to use. The template produces multiple outputs. Find the unique URL of the endpoint in the “HelloWorldAPI” key in the “Outputs” section.
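If you prefer to retrieve the URL programmatically, the following sketch reads it from the stack outputs with boto3. It assumes the default stack name sam-app from the guided deployment; adjust the name if you chose a different one.

import boto3

# Default stack name created by "sam deploy --guided"; change it if you used another name
STACK_NAME = "sam-app"

cloudformation = boto3.client("cloudformation")
stack = cloudformation.describe_stacks(StackName=STACK_NAME)["Stacks"][0]

# Map output keys to values and pick the endpoint URL
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}
print(outputs["HelloWorldAPI"])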
After retrieving the URL, test the live endpoint using a REST client:
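For example, this sketch sends a handwritten-digit image to the deployed endpoint using the requests library. The endpoint URL and the digit.png file name are placeholders, and the /classify_digit path matches the route used during local testing.

import base64
import requests

# Placeholders: the deployed API URL from the stack outputs and a sample digit image
ENDPOINT_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/Prod/classify_digit"
IMAGE_PATH = "digit.png"

# Encode the image exactly as the handler expects to find it in the request body
with open(IMAGE_PATH, "rb") as f:
    payload = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(ENDPOINT_URL, data=payload)
print(response.status_code, response.json())  # for example: {"predicted_label": 3}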
Optimizing performance
After the Lambda function is deployed, you can optimize for latency and cost. To do this, adjust the memory allocation setting for the function, which also linearly changes the allocated vCPU (to learn more, read the AWS News Blog).
The digit classifier model is optimized with 5 GB memory (~3 vCPUs). Any gains beyond 5 GB are relatively minor. Each model responds differently to changes in vCPU and memory, so it is best practice to determine this experimentally. There are open-source tools available to automate performance tuning.
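As a starting point for that experiment, the sketch below uses boto3 to step through several memory settings and time a test invocation at each one. The function name and the path to the test event are placeholders for your deployed function and the event.json used earlier.

import time
import boto3

# Placeholders: your deployed function name and the test event used earlier
FUNCTION_NAME = "my-inference-function"
EVENT_PATH = "events/event.json"

lambda_client = boto3.client("lambda")

with open(EVENT_PATH) as f:
    payload = f.read()

for memory_mb in (1024, 2048, 3072, 5120, 10240):
    # Changing the memory allocation also changes the vCPU allocated to the function
    lambda_client.update_function_configuration(
        FunctionName=FUNCTION_NAME, MemorySize=memory_mb
    )
    # Wait for the configuration update to complete before invoking
    lambda_client.get_waiter("function_updated").wait(FunctionName=FUNCTION_NAME)

    start = time.time()
    response = lambda_client.invoke(FunctionName=FUNCTION_NAME, Payload=payload)
    elapsed_ms = (time.time() - start) * 1000
    print(f"{memory_mb} MB: {elapsed_ms:.0f} ms (status {response['StatusCode']})")

A single timed call after a configuration change includes a cold start, so invoke each setting several times, or use one of the open-source tuning tools mentioned above, before drawing conclusions.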
Further optimizations can be made by compiling the source code to take advantage of AVX2 instructions. AVX2 allows Lambda to run more operations per clock cycle, reducing the time it takes a model to generate predictions.
Cleaning up
This walkthrough creates a Lambda function, an API Gateway endpoint, and an ECR repository. These resources incur charges, so it is recommended that you delete them after completing the walkthrough. To delete the ECR repository, run:
aws ecr delete-repository --registry-id <account-id> --repository-name classifier-demo --force
To delete the remaining resources, navigate to AWS CloudFormation in the AWS Management Console and select the Region used for the walkthrough. Select the stack created by AWS SAM (the default is “sam-app”) and choose Delete.
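If you prefer to script this step as well, the following sketch deletes the same stack with boto3, assuming the default stack name sam-app.

import boto3

# Default stack name created by the guided deployment; change it if you used another name
STACK_NAME = "sam-app"

cloudformation = boto3.client("cloudformation")
cloudformation.delete_stack(StackName=STACK_NAME)

# Block until the stack and the resources it manages are removed
cloudformation.get_waiter("stack_delete_complete").wait(StackName=STACK_NAME)
print(f"Deleted stack {STACK_NAME}")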
Conclusion
Lambda is a cost-effective, scalable, and reliable way for data scientists to deploy CPU-based machine learning models for inference. With support for larger function sizes, AVX2 instruction sets, and container images, Lambda can now deploy more complex models while maintaining low latency.
Use the new machine learning templates within AWS SAM today to deploy your first serverless machine learning application in minutes. We look forward to seeing the exciting machine learning applications that you build on Lambda.
For more serverless learning resources, visit Serverless Land.