AWS Machine Learning Blog

Deploy trained Keras or TensorFlow models using Amazon SageMaker

This post was reviewed and updated in May 2022 to enforce reproducibility of model results, add reproducibility checks, and add a batch transform example for model predictions.

Previously, this post was updated in March 2021 to include SageMaker Neo compilation and to update compatibility for models trained using Keras 2.2.x with h5py 2.10.0 and TensorFlow 1.15.3.

Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to alleviate the undifferentiated heavy lifting from the full lifecycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another. Models trained in SageMaker can be optimized and deployed outside of SageMaker, including at the edge (on mobile or IoT devices). Conversely, SageMaker can deploy and host pre-trained models, such as models from model zoos or models trained locally by your team.

In this notebook, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using SageMaker. We’ll take advantage of SageMaker deployment features, such as selecting the type and number of instances, model compilation to improve inference latency, and automatic scaling.

Your trained model must be saved in either the Keras (JSON and weights hdf5) format or the TensorFlow Protobuf format. If you’d like to begin from a sample notebook that supports this blog post, download it here.

Step 1. Set up

In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to conda_tensorflow_p36.

The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role that you created when you set up your notebook instance.

from sagemaker import get_execution_role
from sagemaker import Session
role = get_execution_role()
sess = Session()
bucket = sess.default_bucket()

If you are running this locally, check your version of TensorFlow to prevent downstream framework errors.

import tensorflow as tf
print(tf.__version__)  # This notebook runs on TensorFlow 1.15.x or earlier
tf_framework_version = tf.__version__
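Because the export steps later in this post rely on TensorFlow 1.x APIs, it can help to fail fast when an unexpected version is installed. The following is a minimal sketch of such a check; the helper name is_supported_tf_version and its max_supported default are hypothetical and not part of TensorFlow or SageMaker.

```python
def is_supported_tf_version(version_string, max_supported=(1, 15)):
    """Return True if the major.minor version is at or below max_supported."""
    # Keep only the numeric major and minor components, e.g. "1.15.3" -> (1, 15)
    parts = version_string.split(".")
    major_minor = tuple(int(p) for p in parts[:2])
    return major_minor <= max_supported


# Example with literal version strings (no TensorFlow import needed)
print(is_supported_tf_version("1.15.3"))  # True
print(is_supported_tf_version("2.4.1"))   # False
```

In the notebook, you could pass tf.__version__ to this helper and raise an error before running the later steps.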

Import and install the necessary Python packages.

# reference:
!pip install "h5py==2.10.0"
import h5py
import numpy as np

Step 2. Load the Keras model using the JSON and weights file

If you saved your model in the TensorFlow ProtoBuf format, skip to “Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format.”

Create a directory called keras_model, download the hosted Keras model, and unzip the model.json and model-weights.h5 files to keras_model/.

!mkdir keras_model
!unzip -d keras_model

Load model from directory.

import os
import tensorflow as tf
import tensorflow.keras as keras
from keras.models import model_from_json
from keras import backend as K

We set the learning phase for the Keras model to test (= 0). This is particularly useful for an error-free protobuf conversion if your model contains batchnorm layers.

K.set_learning_phase(0)

Now we load in the model architecture and the trained model’s weights.

with open(os.path.join('keras_model', 'model.json'), 'r') as fp:
    loaded_model_json = fp.read()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights(os.path.join('keras_model', 'model-weights.h5'))
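Before loading an architecture file, it can be useful to confirm that it describes the model you expect. The sketch below reads a Keras architecture JSON string with the standard library only and lists the layer types. The helper name summarize_keras_json is hypothetical, and the handling of the "config" field is an assumption, because its layout differs between Keras versions.

```python
import json


def summarize_keras_json(model_json):
    """Return (model class name, list of layer class names) from a Keras architecture JSON string."""
    spec = json.loads(model_json)
    config = spec.get("config", [])
    # Assumption: depending on the Keras version, "config" is either a dict
    # with a "layers" list or the list of layers itself.
    layers = config.get("layers", []) if isinstance(config, dict) else config
    return spec.get("class_name"), [layer.get("class_name") for layer in layers]


# Tiny hand-written example, not the hosted sample model
example_json = json.dumps({
    "class_name": "Sequential",
    "config": {"layers": [{"class_name": "Dense", "config": {"units": 1}}]},
})
print(summarize_keras_json(example_json))  # ('Sequential', ['Dense'])
```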

Step 3. Export the Keras model to the TensorFlow ProtoBuf format

from tensorflow.python.saved_model import builder
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
from tensorflow.python.saved_model import tag_constants

# Note: This directory structure will need to be followed 
model_version = '1'
export_dir = 'export/Servo/' + model_version

# Build the Protocol Buffer SavedModel at path defined by export_dir variable
builder = builder.SavedModelBuilder(export_dir)

# Create prediction signature to be used by TensorFlow Serving Predict API
signature = predict_signature_def(
    inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

# Save the meta graph and variables
builder.add_meta_graph_and_variables(
    sess=K.get_session(), tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
builder.save()

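Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format

Amazon SageMaker will recognize the export/Servo/ directory as a loadable TensorFlow model. The versioned subdirectory (export/Servo/1/) should contain a saved_model.pb file and a variables/ directory. You can inspect the exported signature as follows: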

model_path = 'export/Servo/1/'
!saved_model_cli show --all --dir {model_path}

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 50)
        name: dense_1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['score'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: dense_7/Sigmoid:0
  Method name is: tensorflow/serving/predict
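The expected directory layout can also be checked programmatically before you build the archive. The following is a minimal sketch that assumes the standard SavedModel layout (a saved_model.pb file next to a variables directory); the helper name missing_saved_model_entries is hypothetical.

```python
import os
import tempfile


def missing_saved_model_entries(export_dir):
    """Return the expected SavedModel entries that are absent from export_dir."""
    expected = ["saved_model.pb", "variables"]
    return [name for name in expected if not os.path.exists(os.path.join(export_dir, name))]


# Demonstration on a throwaway directory that mimics export/Servo/1
with tempfile.TemporaryDirectory() as tmp:
    version_dir = os.path.join(tmp, "export", "Servo", "1")
    os.makedirs(os.path.join(version_dir, "variables"))
    print(missing_saved_model_entries(version_dir))  # ['saved_model.pb']
```

In the notebook, calling the helper with model_path should return an empty list after a successful export.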

Tar the entire directory and upload to Amazon S3.

import tarfile
model_archive = 'model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('export', recursive=True) 

# upload model artifacts to S3
model_data = sess.upload_data(path=model_archive, key_prefix='model')
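A wrong top-level folder inside the archive is an easy mistake to make, so it can be worth listing the archive contents before deploying. The sketch below uses only the standard tarfile module; the helper name saved_model_members is hypothetical, and the export/Servo/ prefix is taken from the directory structure used in this post.

```python
import os
import tarfile
import tempfile


def saved_model_members(archive_path, prefix="export/Servo/"):
    """Return archive members under prefix whose file name is saved_model.pb."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
    return [n for n in names if n.startswith(prefix) and n.endswith("/saved_model.pb")]


# Demonstration with a throwaway archive that holds a placeholder file
with tempfile.TemporaryDirectory() as tmp:
    model_file = os.path.join(tmp, "saved_model.pb")
    with open(model_file, "wb") as f:
        f.write(b"placeholder")
    archive_path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as archive:
        archive.add(model_file, arcname="export/Servo/1/saved_model.pb")
    print(saved_model_members(archive_path))  # ['export/Servo/1/saved_model.pb']
```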

Step 5. Deploy the trained model

We will now create a SageMaker model from the model archive that we uploaded to Amazon S3, and deploy it to a SageMaker endpoint.

from sagemaker.tensorflow.serving import Model

# Select which type of SageMaker EC2 instance to deploy the model on  
instance_type = 'ml.c5.xlarge' 

# Instantiate the SageMaker TensorFlow serving model  
sm_model = Model(model_data=model_data,
                 framework_version=tf_framework_version,
                 role=role)

uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)   

Invoking the SageMaker endpoint

We will now query the SageMaker endpoint with some random data.

We will also use this opportunity to double check that the predictions match between the deployed model and the original model.

# The sample model expects an input of shape [1,50]
data = np.random.randn(1, 50)

deployed_model_preds = uncompiled_predictor.predict(data)

# Also get original model predictions for the same data 
original_model_preds = loaded_model.predict(data)

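# Assumption: the TensorFlow Serving response is a dict with a 'predictions' key
try:
    assert np.allclose(np.array(deployed_model_preds['predictions']), original_model_preds, atol=1e-5)
    print("Deployed model predictions match the original model's predictions.")
except AssertionError:
    print("Looks like your deployed model doesn't work exactly like your original model!")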
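Exact equality is a fragile test for floating-point predictions, because values returned by an endpoint pass through JSON serialization and may differ from the local values in the last digits. The following sketch compares nested prediction lists within a tolerance using only the standard library. The helper names and the tolerance values are illustrative assumptions, not part of the SageMaker SDK.

```python
import math


def flatten(values):
    """Flatten arbitrarily nested lists/tuples of numbers into a flat list."""
    if isinstance(values, (list, tuple)):
        return [x for item in values for x in flatten(item)]
    return [values]


def predictions_match(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Compare two nested prediction lists element-wise within a tolerance."""
    flat_a, flat_b = flatten(a), flatten(b)
    return len(flat_a) == len(flat_b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(flat_a, flat_b)
    )


print(predictions_match([[0.7310586]], [[0.73105858]]))  # True
print(predictions_match([[0.7310586]], [[0.75]]))        # False
```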

Step 6. Add batch transform for predictions

As an alternative to using a SageMaker endpoint, you may want to run your predictions as a batch job. Follow these steps to configure and launch your batch job. Note that a SageMaker endpoint natively supports input data only in the JSON, JSON Lines, and CSV formats.

For custom data formats, you will need to create and add a custom script in the model archive, before proceeding to the model deployment. Refer to the SageMaker documentation and SageMaker examples for more details.
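To make the three supported formats concrete, the sketch below serializes the same two input rows in each of them using only the standard library. The function names are hypothetical, and the "instances" key is an assumption based on the TensorFlow Serving REST request format; check the serving container documentation for the exact request shapes it accepts.

```python
import csv
import io
import json


def to_tfs_json(rows):
    """Serialize rows as a TensorFlow Serving style JSON request body."""
    return json.dumps({"instances": rows})


def to_json_lines(rows):
    """Serialize rows as JSON Lines, one JSON array per line."""
    return "\n".join(json.dumps(row) for row in rows)


def to_csv(rows):
    """Serialize rows as CSV without a header or index column."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = [[0.1, 0.2], [0.3, 0.4]]
print(to_tfs_json(rows))    # {"instances": [[0.1, 0.2], [0.3, 0.4]]}
print(to_json_lines(rows))
print(to_csv(rows))
```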

s3_model_path = 's3://{}/model/{}'.format(bucket, model_archive)

# Instantiate the SageMaker TensorFlow serving model  
tensorflow_serving_model = Model(model_data=model_data,
                                 framework_version=tf_framework_version,
                                 role=role)

Create some random data first:

# The sample model expects an input of shape [1,50]
data = np.random.randn(5, 50) # 5 samples  

# Save the random data locally first (make sure your file doesn't have a header or index!)
with open('batch_inputs.csv', 'w') as f:
    np.savetxt(f, data, delimiter=',', header='')

Then upload the resulting input data to a file on S3:

# Then upload the resulting datafile to an S3 path.
# Here the path is s3://{sess.default_bucket()}/inputs/batch_inputs.csv
# We assigned sess.default_bucket() to the variable 'bucket' at the start of the notebook.
s3_input_path = sess.upload_data(path='batch_inputs.csv', bucket=bucket, key_prefix='inputs')

# Also specify the S3 paths for the batch outputs    
s3_output_path = 's3://{}/outputs/'.format(bucket)   

Configure and instantiate your batch job:

# Configure your batch job
batch_instance_count = 1
batch_instance_type = 'ml.c5.xlarge'
concurrency = 1
max_payload_in_mb = 10

# Instantiate batch job
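transformer = tensorflow_serving_model.transformer(
    instance_count=batch_instance_count,
    instance_type=batch_instance_type,
    max_concurrent_transforms=concurrency,
    max_payload=max_payload_in_mb,
    output_path=s3_output_path)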

Start your batch job – this might take a while. Note that you have to specify the type of content (text/csv) and the data type (S3Prefix) you are passing in.

transformer.transform(data=s3_input_path, content_type='text/csv', data_type='S3Prefix')
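When an input file is sent without being split, the whole file has to fit within the configured maximum payload size. The sketch below checks a local file against that limit before upload. The helper name fits_in_payload is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os
import tempfile


def fits_in_payload(file_path, max_payload_in_mb):
    """Return True if the file size does not exceed the payload limit."""
    # Assumption: 1 MB is counted as 1024 * 1024 bytes
    return os.path.getsize(file_path) <= max_payload_in_mb * 1024 * 1024


# Demonstration with a small throwaway CSV file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batch_inputs.csv")
    with open(path, "w") as f:
        f.write("0.1,0.2\n" * 100)
    print(fits_in_payload(path, max_payload_in_mb=10))  # True
```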

Check the job output:

# Check out job output at your defined location on s3
s3_list_outputs = sess.list_s3_files(bucket, 'outputs')

# Read in each of the batch outputs
# Note - the outputs are in json format.
import pandas as pd
batch_outputs = []
for m in s3_list_outputs:
    temp_file = pd.read_json('s3://{}/{}'.format(bucket, m))
    batch_outputs.append(temp_file)
batch_outputs = pd.concat(batch_outputs)

# Print the outputs
print(batch_outputs)
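If you prefer to work with plain Python lists, each output document can also be parsed with the standard json module. The following is a minimal sketch; the helper name parse_batch_output is hypothetical, and the {"predictions": [[score], ...]} layout is an assumption based on the TensorFlow Serving response format.

```python
import json


def parse_batch_output(text):
    """Extract the flat list of scores from one batch transform output document."""
    payload = json.loads(text)
    # Assumption: the output follows the layout {"predictions": [[score], ...]}
    return [row[0] if isinstance(row, list) else row for row in payload["predictions"]]


# Hand-written example document, not real job output
example_output = '{"predictions": [[0.12], [0.87]]}'
print(parse_batch_output(example_output))  # [0.12, 0.87]
```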

Step 7. Compile model using SageMaker Neo

SageMaker Neo makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container without the need for any custom model serving or inference code.

Let us compile and deploy an optimized version of our model using SageMaker Neo. We will also double check that the predictions match between the deployed optimized model and the original model.

Compiling and deploying the optimized model may take a little longer than deploying the uncompiled model to a SageMaker endpoint.

import random, string

instance_family = 'ml_c5'
framework = 'tensorflow'
# We add a unique random identifier to the output  
random_output_identifier = ''.join(random.choices(string.ascii_letters + string.digits, k=6))  
compilation_job_name = 'keras-compile-' + random_output_identifier  

# output path for compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)
print('Saving output at {0}'.format(compiled_model_path))
data_shape = {'inputs':[1, data.shape[0], data.shape[1]]}

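optimized_estimator = sm_model.compile(target_instance_family=instance_family,
                                       input_shape=data_shape,
                                       job_name=compilation_job_name,
                                       role=role,
                                       framework=framework,
                                       framework_version=tf_framework_version,
                                       output_path=compiled_model_path)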

optimized_predictor = optimized_estimator.deploy(initial_instance_count = 1, instance_type = instance_type)
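The random suffix on the compilation job name avoids collisions when the notebook is run more than once. The sketch below wraps that idea in a helper that also validates the result. The helper name make_job_name is hypothetical, and the naming rule (at most 63 characters, alphanumerics and hyphens, starting and ending with an alphanumeric) is an assumption that you should confirm against the SageMaker API reference.

```python
import random
import re
import string

# Assumption: job names allow alphanumeric characters and hyphens,
# and must start and end with an alphanumeric character.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def make_job_name(prefix, suffix_length=6):
    """Build 'prefix-<random suffix>' and raise ValueError if the name looks invalid."""
    suffix = "".join(random.choices(string.ascii_letters + string.digits, k=suffix_length))
    name = "{}-{}".format(prefix, suffix)
    if len(name) > 63 or not JOB_NAME_PATTERN.match(name):
        raise ValueError("Invalid job name: {}".format(name))
    return name


print(make_job_name("keras-compile"))
```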

Invoke the optimized SageMaker endpoint.

optimized_model_preds = optimized_predictor.predict(data)

Also get original model predictions for the same data.

original_model_preds = loaded_model.predict(data)

Check that the SageMaker endpoint predictions match the predictions from the original model, for this input data.

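# Assumption: the response is either a dict with a 'predictions' key or array-like
try:
    preds = optimized_model_preds['predictions'] if isinstance(optimized_model_preds, dict) else optimized_model_preds
    assert np.allclose(np.array(preds), original_model_preds, atol=1e-5)
    print("Optimized model predictions match the original model's predictions.")
except AssertionError:
    print("Looks like your deployed optimized model doesn't work exactly like your original model!")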

Step 8. Clean up

To avoid incurring charges to your AWS account for the resources used in this tutorial, delete the SageMaker endpoints that you created.

uncompiled_predictor.delete_endpoint()
optimized_predictor.delete_endpoint()
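This post creates more than one endpoint (an uncompiled one and an optimized one), so it can be convenient to delete them in a single pass and keep going if one deletion fails. The following is a minimal sketch; delete_endpoints and FakePredictor are hypothetical names, and the only assumption about a predictor is that it exposes a delete_endpoint() method.

```python
def delete_endpoints(predictors):
    """Call delete_endpoint() on each predictor and return the errors encountered, if any."""
    errors = []
    for predictor in predictors:
        try:
            predictor.delete_endpoint()
        except Exception as exc:  # keep going so one failure does not block the rest
            errors.append(exc)
    return errors


class FakePredictor:
    """Stand-in used only to demonstrate the helper without calling AWS."""

    def __init__(self):
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
print(delete_endpoints([fake]), fake.deleted)  # [] True
```

In the notebook, you could pass the list of predictors created in the earlier steps and inspect the returned list for failures.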



In this blog post, we demonstrated converting a Keras model to TensorFlow SavedModel format, deploying a trained model to a SageMaker Endpoint, and compiling the same trained model using SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and in a few lines of code have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.

About the Authors

Priya Ponnapalli is a Principal Data Scientist at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Jasleen Grewal is an Applied Scientist at Amazon Web Services, where she works with AWS customers to solve real world problems using machine learning, with special focus on precision medicine and genomics. She has a strong background in bioinformatics, oncology, and clinical genomics. She is passionate about using AI/ML and cloud services to improve patient care.

Federico Piccinini is a Deep Learning Architect for the Amazon Machine Learning Solutions Lab. He is passionate about machine learning, explainable AI, and MLOps. He focuses on designing ML pipelines for AWS customers. Outside of work, he enjoys sports and pizza.

Selvan Senthivel is a Senior ML Engineer with the Amazon ML Solutions Lab at AWS, focusing on helping customers on machine learning, deep learning problems, and end-to-end ML solutions. He was a founding engineering lead of Amazon Comprehend Medical and contributed to the design and architecture of multiple AWS AI services.