AWS Machine Learning Blog

Deploy trained Keras or TensorFlow models using Amazon SageMaker

Note: This post was updated March 2021 to include SageMaker Neo compilation and to reflect compatibility for models trained using Keras 2.2.x with h5py 2.10.0 and TensorFlow 1.15.3.

Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to alleviate the undifferentiated heavy lifting across the full lifecycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another. Models trained in SageMaker can be optimized and deployed outside of SageMaker, including on edge devices (mobile or IoT). Conversely, SageMaker can deploy and host pre-trained models, such as models from model zoos or models trained locally by your team.

In this post, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using SageMaker. We’ll take advantage of SageMaker deployment features such as selecting the type and number of instances, model compilation to improve inference latency, and automatic scaling.

Your trained model must be saved in either the Keras format (JSON architecture and HDF5 weights) or the TensorFlow ProtoBuf format. If you’d like to begin from a sample notebook that supports this blog post, download it here.
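
As a minimal sketch, assuming a placeholder two-layer model, saving a Keras model in the JSON-plus-HDF5-weights format looks like the following; the file names match those used by the sample model later in this post.

from keras.models import Sequential
from keras.layers import Dense

# Placeholder model with a 50-feature input, matching the sample model in this post
model = Sequential([
    Dense(64, activation='relu', input_shape=(50,)),
    Dense(1, activation='sigmoid')
])

# Save the architecture as JSON and the trained weights as HDF5
with open('model.json', 'w') as fp:
    fp.write(model.to_json())
model.save_weights('model-weights.h5')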

Step 1. Set up

In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to conda_tensorflow_p36.

The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role that you created when you set up your notebook instance.

from sagemaker import get_execution_role
from sagemaker import Session
role = get_execution_role()
sess = Session()
bucket = sess.default_bucket()

If you are running this locally, check your version of TensorFlow to prevent downstream framework errors.

import tensorflow as tf
print(tf.__version__)  # This notebook runs on TensorFlow 1.15.x or earlier
tf_framework_version = tf.__version__

Install and import the necessary Python packages.

# reference: https://github.com/keras-team/keras/issues/14265
!pip install "h5py==2.10.0"
import h5py
import numpy as np

Step 2. Load the Keras model using the JSON and weights file

If you saved your model in the TensorFlow ProtoBuf format, skip to “Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format.”

Create a directory called keras_model, download the hosted Keras model, and unzip the model.json and model-weights.h5 files into keras_model/.

!mkdir keras_model
!wget https://s3.amazonaws.com/aws-ml-blog/artifacts/keras-tensorflow-model-deployment/model.zip
!unzip model.zip -d keras_model

Load the model from the directory.

import os
import tensorflow as tf
from keras.models import model_from_json

# Read the model architecture (JSON) and load the trained weights (HDF5)
with open(os.path.join('keras_model', 'model.json'), 'r') as fp:
    loaded_model_json = fp.read()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights('keras_model/model-weights.h5')

Step 3. Export the Keras model to the TensorFlow ProtoBuf format

from keras import backend as K
from tensorflow.python.saved_model import builder
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
from tensorflow.python.saved_model import tag_constants

# Note: SageMaker expects this directory structure (export/Servo/<model_version>)
model_version = '1'
export_dir = 'export/Servo/' + model_version
# Build the Protocol Buffer SavedModel at 'export_dir'
builder = builder.SavedModelBuilder(export_dir)

# Create the prediction signature to be used by the TensorFlow Serving Predict API
signature = predict_signature_def(
    inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

# Use the Keras backend session, which already holds the loaded weights;
# creating a new session and re-initializing variables would discard them
session = K.get_session()

# Save the meta graph and variables
builder.add_meta_graph_and_variables(
    sess=session, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
builder.save()

Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format

Amazon SageMaker will recognize this as a loadable TensorFlow model. Your directory and file structure should look like this:
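
The export directory from Step 3 should contain the SavedModel protobuf and its variables, along the lines of the listing below (the exact variables file names can vary):

export/
└── Servo/
    └── 1/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index

You can confirm the serving signature with the saved_model_cli tool: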

model_path = 'export/Servo/1/'
!saved_model_cli show --all --dir {model_path}

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 50)
        name: dense_1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['score'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: dense_7/Sigmoid:0
  Method name is: tensorflow/serving/predict

Tar the entire directory and upload it to Amazon S3.

import tarfile
model_archive = 'model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('export', recursive=True) 

# upload model artifacts to S3
model_data = sess.upload_data(path=model_archive, key_prefix='model')

Step 5. Deploy the trained model

from sagemaker.tensorflow.serving import Model

instance_type = 'ml.c5.xlarge'

sm_model = Model(model_data=model_data,
                 framework_version=tf_framework_version,
                 role=role)

%%time
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type) 
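
The introduction mentions automatic scaling. It isn’t required for the rest of this walkthrough, but the following is a minimal sketch of enabling target-tracking auto scaling for the endpoint through the Application Auto Scaling API; the policy name, capacity limits, and target value are placeholder choices, and 'AllTraffic' is the default variant name when deploying with the SDK.

import boto3

# Endpoint name created by the deploy() call above (SageMaker Python SDK v1 attribute)
endpoint_name = uncompiled_predictor.endpoint
resource_id = 'endpoint/{}/variant/AllTraffic'.format(endpoint_name)

autoscaling = boto3.client('application-autoscaling')

# Register the endpoint variant's instance count as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2)

# Scale on invocations per instance per minute (placeholder target value)
autoscaling.put_scaling_policy(
    PolicyName='keras-endpoint-scaling',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 1000.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'}})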

Step 6. Invoke the SageMaker endpoint

# The sample model expects an input of shape [1,50]
data = np.random.randn(1, 50)
uncompiled_predictor.predict(data)
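
The predictor returns the parsed JSON response from TensorFlow Serving. As a rough sketch, assuming the typical single-output response structure (verify the keys against your own output), you can pull out the score like this:

result = uncompiled_predictor.predict(data)
# A single-output model typically comes back as {'predictions': [[score]]}
score = result['predictions'][0][0]
print(score)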

Step 7. Compile model using SageMaker Neo

SageMaker Neo makes it easy to compile pre-trained TensorFlow models and build an inference optimized container without the need for any custom model serving or inference code.

instance_family = 'ml_c5'
framework = 'tensorflow'
compilation_job_name = 'keras-compile'
# output path for the compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)
# Neo needs the input name and exact shape; the sample model expects [1, 50]
data_shape = {'inputs': list(data.shape)}

optimized_estimator = sm_model.compile(target_instance_family=instance_family,
                                       input_shape=data_shape,
                                       job_name=compilation_job_name,
                                       role=role,
                                       framework=framework,
                                       framework_version=tf_framework_version,
                                       output_path=compiled_model_path)

optimized_predictor = optimized_estimator.deploy(initial_instance_count=1, instance_type=instance_type)

Invoke the optimized SageMaker endpoint.

optimized_predictor.predict(data)
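
To see the effect of compilation, a simple (and unscientific) latency comparison of the two endpoints might look like the following; time_predict is a helper defined here, not part of the SageMaker SDK:

import time

def time_predict(predictor, payload, n=100):
    # Warm up once, then average the round-trip latency over n invocations
    predictor.predict(payload)
    start = time.time()
    for _ in range(n):
        predictor.predict(payload)
    return (time.time() - start) / n * 1000.0  # milliseconds per request

print('uncompiled: {:.1f} ms'.format(time_predict(uncompiled_predictor, data)))
print('compiled:   {:.1f} ms'.format(time_predict(optimized_predictor, data)))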

Step 8. Clean up

To avoid incurring charges to your AWS account for the resources used in this tutorial, delete both SageMaker endpoints.

uncompiled_predictor.delete_endpoint()
optimized_predictor.delete_endpoint()

Conclusion

In this blog post, we demonstrated how to convert a Keras model to the TensorFlow SavedModel format, deploy a trained model to a SageMaker endpoint, and compile the same trained model with SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and, in a few lines of code, have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.


About the Author

Priya Ponnapalli is a Principal Data Scientist at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.