Deploy trained Keras or TensorFlow models using Amazon SageMaker
Note: This post was updated in March 2021 to include SageMaker Neo compilation and to update compatibility for models trained using Keras 2.2.x with h5py 2.10.0 and TensorFlow 1.15.3.
Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to remove the undifferentiated heavy lifting from the full lifecycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another. Models trained in SageMaker can be optimized and deployed outside of SageMaker, including on edge devices (mobile or IoT). Conversely, SageMaker can deploy and host pre-trained models, such as models from model zoos or models trained locally by your team.
In this notebook, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using SageMaker. We’ll take advantage of SageMaker deployment features, such as selecting the type and number of instances, model compilation to improve inference latency, and automatic scaling.
Your trained model must be saved in either the Keras (JSON and weights hdf5) format or the TensorFlow Protobuf format. If you’d like to begin from a sample notebook that supports this blog post, download it here.
Step 1. Set up
In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to
The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role you created when you created your notebook instance.
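A minimal sketch of this setup step, assuming the SageMaker Python SDK is available in the notebook kernel. The helper name get_role_and_session is illustrative and not part of the original notebook.

```python
def get_role_and_session():
    """Return the notebook's IAM role ARN and a SageMaker session."""
    # Imported lazily so this sketch can be loaded where the SDK is absent.
    import sagemaker
    from sagemaker import get_execution_role

    role = get_execution_role()    # IAM role attached to the notebook instance
    session = sagemaker.Session()  # wraps the AWS clients and default S3 bucket
    return role, session
```

The later steps need both values: the role is passed to the model and compilation calls, and the session is used to upload the model artifact to Amazon S3.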
Step 2. Load the Keras model using the JSON and weights file
If you saved your model in the TensorFlow ProtoBuf format, skip to “Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format.”
Create a directory called keras_model, download the hosted Keras model, and unzip the model.json and model-weights.h5 files to keras_model/.
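The unpacking part of this step could be sketched as follows. The download location is not given here, so the sketch assumes the archive has already been downloaded to a local path; the function name unpack_keras_model and the archive_path parameter are hypothetical.

```python
import os
import shutil


def unpack_keras_model(archive_path, target_dir="keras_model"):
    """Extract a downloaded model archive into target_dir and list its files."""
    os.makedirs(target_dir, exist_ok=True)
    # unpack_archive picks the format (.zip, .tar.gz, ...) from the extension.
    shutil.unpack_archive(archive_path, target_dir)
    return sorted(os.listdir(target_dir))
```

After a successful run, the returned list should contain model.json and model-weights.h5.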
Load the model from the directory.
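One way to load the architecture and weights, sketched under the assumption that the model was built with the Keras API bundled in TensorFlow (tensorflow.keras). If you trained with standalone Keras, the equivalent import would come from the keras package instead. The function name load_keras_model is illustrative.

```python
import os


def load_keras_model(model_dir="keras_model"):
    """Rebuild a Keras model from model.json and load model-weights.h5."""
    json_path = os.path.join(model_dir, "model.json")
    weights_path = os.path.join(model_dir, "model-weights.h5")
    # Fail early with a clear error if either file is missing.
    for path in (json_path, weights_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)

    # Imported lazily; assumes TensorFlow is installed in the kernel.
    from tensorflow.keras.models import model_from_json

    with open(json_path) as json_file:
        model = model_from_json(json_file.read())  # architecture only
    model.load_weights(weights_path)               # trained parameters
    return model
```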
Step 3. Export the Keras model to the TensorFlow ProtoBuf format
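The export can be sketched with TensorFlow's SavedModel utilities. This sketch assumes TensorFlow 1.15 running in graph mode with a tensorflow.keras model, and it is not necessarily the exact code from the original notebook. The signature key names "inputs" and "score" and the helper names are assumptions chosen for illustration.

```python
def export_dir_for(version="1", base="export/Servo"):
    """Build the versioned export directory, for example export/Servo/1."""
    return f"{base}/{version}"


def export_saved_model(model, version="1"):
    """Write the loaded Keras model as a TensorFlow SavedModel (ProtoBuf)."""
    # Imported lazily; assumes TensorFlow 1.15 is installed in the kernel.
    import tensorflow as tf

    export_dir = export_dir_for(version)  # must not exist yet
    session = tf.compat.v1.keras.backend.get_session()
    # simple_save tags the graph for serving and registers one default
    # prediction signature built from the given input and output tensors.
    tf.compat.v1.saved_model.simple_save(
        session,
        export_dir,
        inputs={"inputs": model.input},   # assumed input name
        outputs={"score": model.output},  # assumed output name
    )
    return export_dir
```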
Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format
Amazon SageMaker will recognize the exported model as a loadable TensorFlow model once it is in the expected directory layout. Your directory and file structure should look like this:
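The layout is not reproduced here, so the listing in the comments below is an assumption based on the export/Servo/<version> convention used by the SageMaker TensorFlow containers. The helper list_exported_files is hypothetical and simply prints what is on disk so you can compare.

```python
import os

# Assumed layout for model version 1:
#   export/Servo/1/saved_model.pb
#   export/Servo/1/variables/variables.data-00000-of-00001
#   export/Servo/1/variables/variables.index


def list_exported_files(root="export"):
    """Return every file under root as a sorted list of slash-separated paths."""
    root = os.path.normpath(root)
    parent = os.path.dirname(root) or "."
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Report paths relative to the parent so they start with "export/".
            relative = os.path.relpath(os.path.join(dirpath, name), parent)
            found.append(relative.replace(os.sep, "/"))
    return sorted(found)
```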
Tar the entire directory and upload to Amazon S3.
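A sketch of the packaging and upload, assuming the exported model lives under export/ and that session is a sagemaker.Session. The archive name model.tar.gz, the key prefix model, and both helper names are assumptions.

```python
import tarfile


def make_model_tarball(export_root="export", output_path="model.tar.gz"):
    """Compress the whole export directory into a gzip-compressed tar archive."""
    with tarfile.open(output_path, "w:gz") as archive:
        # arcname keeps "export/" as the top-level folder inside the archive.
        archive.add(export_root, arcname="export")
    return output_path


def upload_model(session, tarball="model.tar.gz", key_prefix="model"):
    """Upload the archive to the session's default bucket; return its S3 URI."""
    return session.upload_data(path=tarball, key_prefix=key_prefix)
```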
Step 5. Deploy the trained model
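Deployment could look like the sketch below, assuming version 2 of the SageMaker Python SDK and the TensorFlow 1.15.3 serving container mentioned in the note at the top of this post. The instance type ml.m5.xlarge, the instance count, and the helper names are illustrative choices, not requirements.

```python
def model_s3_uri(bucket, key_prefix="model", filename="model.tar.gz"):
    """Build the S3 URI of the uploaded model artifact."""
    return f"s3://{bucket}/{key_prefix}/{filename}"


def deploy_model(model_data, role, instance_type="ml.m5.xlarge", instance_count=1):
    """Create a SageMaker model from the artifact and host it on an endpoint."""
    # Imported lazily; assumes SageMaker Python SDK v2.
    from sagemaker.tensorflow.model import TensorFlowModel

    model = TensorFlowModel(
        model_data=model_data,
        role=role,
        framework_version="1.15.3",
    )
    # deploy() blocks until the endpoint is in service, then returns a predictor.
    predictor = model.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
    )
    return model, predictor
```

The instance type and count are where you choose the hosting capacity for the endpoint. The model object is returned alongside the predictor because the compilation step reuses it.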
Step 6. Invoke the SageMaker endpoint
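A sketch of an invocation, assuming the predictor returned by the deployment step and the TensorFlow Serving REST request format, in which a batch is sent under the "instances" key. The shape of each instance depends on your model, so the batch argument is a placeholder you would fill with real input data.

```python
def build_payload(batch):
    """Wrap a batch of inputs in the TensorFlow Serving row format."""
    return {"instances": batch}


def invoke(predictor, batch):
    """Send one batch to the endpoint and return the parsed response."""
    return predictor.predict(build_payload(batch))
```

For example, invoke(predictor, data.tolist()) would send a NumPy array named data after converting it to nested lists.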
Step 7. Compile model using SageMaker Neo
SageMaker Neo makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container without the need for any custom model serving or inference code.
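A compilation job could be started as sketched below, assuming model is the TensorFlowModel created during deployment and SDK v2's Model.compile method. The job name keras-compile, the target family ml_c5, the output location, and the helper names are assumptions. The input_shape dictionary must name your model's actual input tensor and its shape, with the batch dimension set to a concrete value.

```python
def compiled_output_path(bucket, job_name):
    """Build the S3 location where Neo writes the compiled artifact."""
    return f"s3://{bucket}/{job_name}/output"


def compile_with_neo(model, role, bucket, input_shape,
                     job_name="keras-compile", target_family="ml_c5"):
    """Run a SageMaker Neo compilation job and return the compiled model."""
    return model.compile(
        target_instance_family=target_family,  # hardware family to optimize for
        input_shape=input_shape,               # e.g. {"<input name>": [1, ...]}
        output_path=compiled_output_path(bucket, job_name),
        role=role,
        job_name=job_name,
        framework="tensorflow",
        framework_version="1.15.3",
    )
```

The returned model can then be deployed with deploy(), on an instance type from the same family that was targeted during compilation.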
Invoke the optimized SageMaker endpoint.
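To invoke the optimized endpoint and see whether compilation helped, you could time repeated calls against both endpoints. The helper below is a hypothetical sketch that works with any callable, so it makes no assumption about the payload format; pass whatever input each endpoint accepts.

```python
import time


def mean_latency_ms(predict_fn, payload, runs=10):
    """Call predict_fn(payload) runs times; return the mean latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(payload)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs
```

With a predictor for the compiled model named optimized_predictor (a hypothetical name), mean_latency_ms(optimized_predictor.predict, payload) can be compared with the same measurement on the original predictor. The measurement includes network round-trip time, so run it from the same notebook instance for both endpoints.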
Step 8. Clean up
To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint.
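A sketch of the cleanup, assuming the predictor objects returned by the deployment calls. Each SageMaker predictor exposes delete_endpoint(); the helper name delete_endpoints is illustrative.

```python
def delete_endpoints(predictors):
    """Delete the endpoint behind each predictor; return how many were deleted."""
    deleted = 0
    for predictor in predictors:
        predictor.delete_endpoint()
        deleted += 1
    return deleted
```

If you deployed both the original and the Neo-compiled model, pass both predictors so neither endpoint keeps running.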
In this blog post, we demonstrated converting a Keras model to TensorFlow SavedModel format, deploying a trained model to a SageMaker Endpoint, and compiling the same trained model using SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and in a few lines of code have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.
About the Author
Priya Ponnapalli is a Principal Data Scientist at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.