AWS Compute Blog

Building deep learning inference with AWS Lambda and Amazon EFS

This post is courtesy of Giuseppe Angelo Porcelli, Principal ML Specialist SA, and Diego Natali, Solutions Architect.

Amazon EFS for AWS Lambda makes it easier for serverless applications requiring persistent file storage or access to large amounts of reference data. Previously, applications had to download data from an object store or database to local ephemeral storage in 512-MB chunks for processing. This creates more code, causes slower startup behavior, and slower data processing. Customers also faced challenges when loading large code packages and models for ML inference.

Recently, AWS announced Amazon EFS support for AWS Lambda. It enables customers to easily share data across function invocations. It also allows you to read large reference data files, and write function output to a persistent and shared data store. Customers can now use Lambda to build data-intensive applications, and load larger libraries and models. They can process larger amounts of data in a highly distributed manner, and share data across functions, containers, and instances.

In this blog post, we show how you can use EFS to store deep learning (DL) framework libraries and models to load from Lambda to execute inferences. We provide a code example on executing serverless inferences with TensorFlow 2.

Using EFS and Lambda for deep learning inference requires to execute two steps:

  1. Storing the deep learning libraries and model on EFS
  2. Creating a Lambda function for inference, which loads the libraries and model from the EFS file system

In the next sections, we share some best practices to implement these steps, and then discuss a full, working example.

Prerequisites

This post assumes experience with Lambda, EFS, plus general knowledge of Python programming, DL, and DL frameworks. To help you get started, read the blog post and documentation.

1. Storing the deep learning libraries and model on Amazon EFS

To populate EFS with DL framework Python libraries and the DL model, there are different options. You can use EC2 instances, third-party tools like cmda or AWS CodeBuild. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages for deployment.

This blog post uses an AWS CodeBuild project, configured as follows:

  • The build environment is a Docker container replicating the Lambda runtime environment. To make sure that the packages work in Lambda, it uses the lambci/lambda build container images on Docker Hub.
  • The EFS file system is mounted to the CodeBuild environment.
  • Build commands are used to install the DL framework and download the model to specific paths of the file system.

After the build completes, the EFS file system contains the Python libraries and the model in specific paths. It is attached to the Lambda function for loading those libraries at runtime and execute inference.

For this example, these are the CodeBuild commands to install the TensorFlow 2 framework and an SSD (Single Shot MultiBox Detector) pre-trained object detection model from TensorFlow Hub:

'echo "Downloading and copying model..."',
'mkdir -p $CODEBUILD_EFS1/lambda/model',
'curl https://storage.googleapis.com/tfhub-modules/google/openimages_v4/ssd/mobilenet_v2/1.tar.gz --output /tmp/1.tar.gz',
'tar zxf /tmp/1.tar.gz -C $CODEBUILD_EFS1/lambda/model',

'echo "Installing virtual environment..."',
'mkdir -p $CODEBUILD_EFS1/lambda',
'python3 -m venv $CODEBUILD_EFS1/lambda/tensorflow',
'echo "Installing Tensorflow..."',
'source $CODEBUILD_EFS1/lambda/tensorflow/bin/activate && pip3 install ' +
              (props.installPackages ? props.installPackages : "tensorflow"),

'echo "Changing folder permissions..."',
'chown -R 1000:1000 $CODEBUILD_EFS1/lambda/'

Considerations

  • The approach described can also work for other ML/DL frameworks
  • The EFS file system can be attached to multiple Lambda functions. This means it can share the DL framework libraries with multiple inference functions (up to 25000 connections for each file system).
  • There are alternatives to using EFS for model storage. If the model size fits in the Lambda package deployment, then you could optimize the first invocation since it doesn’t need to download the model. You can also use the function’s initializer to load the model since the first mount to EFS only takes a few hundred milliseconds.

2. Creating a Lambda function for inference

After attaching the EFS file system, you may structure the Lambda code as follows:

Lambda code structure

The code outside the handler method first adds the local mount path to the Python path. It then imports the frameworks, and loads the model into memory. Executing those operations outside of the function’s handler ensures that those objects remain initialized and reused in subsequent invocations of the same Lambda function instance. The code inside the handler runs the inference flow by reading inputs, executing the actual inference, and returning the results to the caller.

For hosting the TensorFlow 2 object detection model in the example, this is the function code:

import sys
import os

# Setting library paths.
efs_path = "/mnt/python"
python_pkg_path = os.path.join(efs_path, "tensorflow/lib/python3.8/site-packages")
sys.path.append(python_pkg_path)

import json
import string
import time
import io
import requests

# Importing TensorFlow
import tensorflow as tf

# Loading model
model_path = os.path.join(efs_path, 'model/')
loaded_model = tf.saved_model.load(model_path)
detector = loaded_model.signatures['default']

def lambda_handler(event, context):
    r = requests.get(event['url'])
    img = tf.image.decode_jpeg(r.content, channels=3)

    # Executing inference.
    converted_img  = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
    start_time = time.time()
    result = detector(converted_img)
    end_time = time.time()

    obj = {
        'detection_boxes' : result['detection_boxes'].numpy().tolist(),
        'detection_scores': result['detection_scores'].numpy().tolist(),
        'detection_class_entities': [el.decode('UTF-8') for el in result['detection_class_entities'].numpy()] 
    }    

    return {
        'statusCode': 200,
        'body': json.dumps(obj)
    }

When invoked, the response is like:

{
    "statusCode": 200,
    "body": "{
    \"detection_boxes\": This field contains the relative position of the bounding boxes,
    \"detection_class_entities\": This field returns the class labels,
    \"detection_scores\": This field returns the detection confidences
    }"
}

Running the example

This working example is provided to set up and run ML/AI inference on Lambda using EFS. To run it, you must have the AWS CDK installed. Execute the following commands:

# clone repository
$ git clone https://github.com/aws-samples/lambda-efs-deep-learning-inference.git
$ cd lambda-efs-ml-demo

# Install the CDK and bootstrap the target account (if this was never done before)
$ npm install -g aws-cdk
$ cdk bootstrap aws://{account_id}/{region}

# Install packages for the project, build and deploy
$ cd cdk/
$ npm install
$ npm run build
$ cdk deploy

After deployment, note the output:

Outputs:
LambdaEFSMLDemo.LambdaFunctionName = LambdaEFSMLDemo-LambdaEFSMLExecuteInference17332C2-0546aa45dfXXXXXX

It takes a few minutes for AWS CodeBuild to deploy the libraries and framework to EFS. To test the Lambda function, run this command, replacing the function name:

$ aws lambda invoke \
    --function-name LambdaEFSMLDemo-LambdaEFSMLExecuteInference17332C2-0546aa45dfXXXXXX \
    --region us-east-1 \
    --cli-binary-format raw-in-base64-out \
    --payload '{"url": "https://images.pexels.com/photos/310983/pexels-photo-310983.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940"}' \
    --region us-east-1 \
    /tmp/return.json    

This is the output:

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

Here you can check the inference’s result:
$ tail /tmp/return.json

The following image shows the bounding boxes created from the inference output.

Inference result

The following image shows the bounding boxes created from the inference output.

Image with bounding boxes

To generate this image with the bounding boxes, use the Jupyter notebook from the repository. We reduce the number of bounding boxes to the most relevant classes:

  • Bicycle: 91%
  • Wheel: 48%
  • Person: 45%
  • Wheel: 44%
  • Man: 40%
  • Bicycle wheel: 37%
  • Bicycle wheel: 30%

To clean up the deployment, run:

$ cdk destroy

Performance considerations

When planning for ML inference, you must keep three main aspects in mind: the type of compute resources required for inference, model size and memory footprint, function initialization and cold start.

Lambda is best suited for CPU-based inferencing, which meets the needs for most ML/DL inference use cases. Lambda’s memory can be set between 128 MB and 3008 MB. This means that large models (for example, FasterRCNN models) that may require more memory or dedicated GPUs are not a good fit.

It’s important to understand how Lambda invokes affect performance. The first request to a function instance is called a “cold-start”. This is where the function is provisioned, code downloaded, and the initializer is executed to download the code and load libraries. In this example, it takes about 40 seconds to load the full TensorFlow 2 libraries from EFS, and another 8 seconds to load the model into memory.

Subsequent calls to the same Lambda function instance don’t incur cold start latency if the request is handled by an existing execution environment. Customers who want to reduce this one-time cold start can use Provisioned Concurrency. This feature provides customers with greater control over performance of their serverless applications at any scale.

The EFS mount operation only takes a few hundred milliseconds and only happens once during the function provisioning. EFS supports up to 25,000 connections so is ideal for functions that scale up. We recommend you use EFS provisioned throughput with Provisioned Concurrency for better performance. To learn more, read the documentation about Amazon EFS performance and monitoring Amazon EFS.

Conclusion

This post shows how you can use EFS for Lambda to deploy large DL libraries and models into a function for synchronous invocations. The same approach can be applied to asynchronous invokes. For example, you could perform object detection on images stored in Amazon S3, or streaming invokes on data in Amazon Kinesis and Amazon DynamoDB.

EFS for Lambda enables many new use cases. To learn more about how to use EFS for Lambda, see the AWS News Blog post and read the documentation.