Deploy trained Keras or TensorFlow models using Amazon SageMaker
This post was reviewed and updated in May 2022 to enforce reproducibility of model results, add reproducibility checks, and add a batch transform example for model predictions.
Previously, this post was updated in March 2021 to include SageMaker Neo compilation and to update compatibility for models trained using Keras 2.2.x with h5py 2.10.0 and TensorFlow 1.15.3.
Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to alleviate the undifferentiated heavy lifting across the full lifecycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another. Models trained in SageMaker can be optimized and deployed outside of SageMaker, including at the edge (on mobile or IoT devices). Conversely, SageMaker can deploy and host pre-trained models, such as models from model zoos or models trained locally by your team.
In this notebook, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using SageMaker. We’ll take advantage of SageMaker deployment features, such as selecting the type and number of instances, model compilation to improve inference latency, and automatic scaling.
Your trained model must be saved in either the Keras format (JSON architecture and HDF5 weights) or the TensorFlow ProtoBuf format. If you’d like to begin from a sample notebook that supports this blog post, download it here.
Step 1. Set up
In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to conda_tensorflow_p36.
The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role that you created when you created your notebook instance.
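A minimal setup cell might look like the following sketch, assuming the SageMaker Python SDK is available in the conda_tensorflow_p36 kernel:

```python
import sagemaker
from sagemaker import get_execution_role

# Session object used later for S3 uploads and default-bucket lookups.
sagemaker_session = sagemaker.Session()

# IAM role attached to this notebook instance.
role = get_execution_role()
```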
Step 2. Load the Keras model using the JSON and weights file
If you saved your model in the TensorFlow ProtoBuf format, skip to “Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format.”
Create a directory called keras_model, download the hosted Keras model, and unzip the model.json and model-weights.h5 files into keras_model/.
Load the model from the directory.
We set the learning phase for the Keras model to test mode (0). This helps ensure an error-free ProtoBuf conversion if your model contains batch normalization layers, which behave differently at training and inference time.
Now we load the model architecture and the trained model’s weights.
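The following is a sketch of these cells, assuming Keras 2.2.x with the TensorFlow backend and the file names used above:

```python
from keras import backend as K
from keras.models import model_from_json

# Put Keras in test mode (learning phase = 0) before building the graph,
# so layers such as batch normalization use their inference behavior.
K.set_learning_phase(0)

# Load the architecture from JSON, then the trained weights from HDF5.
with open('keras_model/model.json', 'r') as json_file:
    loaded_model = model_from_json(json_file.read())
loaded_model.load_weights('keras_model/model-weights.h5')
```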
Step 3. Export the Keras model to the TensorFlow ProtoBuf format
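A sketch of the export, assuming TensorFlow 1.x; the export/Servo/<version> directory layout is the structure that TensorFlow Serving (and therefore SageMaker) expects:

```python
from tensorflow.python.saved_model import builder
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def

# TensorFlow Serving loads models from export/Servo/<version>/.
export_dir = 'export/Servo/1'
build = builder.SavedModelBuilder(export_dir)

# Create a prediction signature for the TensorFlow Serving Predict API.
signature = predict_signature_def(inputs={'inputs': loaded_model.input},
                                  outputs={'score': loaded_model.output})

# Save the meta graph and variables from the current Keras session.
with K.get_session() as sess:
    build.add_meta_graph_and_variables(
        sess=sess,
        tags=[tag_constants.SERVING],
        signature_def_map={'serving_default': signature})
    build.save()
```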
Step 4. Convert the TensorFlow model to an Amazon SageMaker-readable format
Amazon SageMaker will recognize this as a loadable TensorFlow model. Your directory and file structure should look like this:
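```
export/
└── Servo/
    └── 1/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```

(The exact filenames under variables/ depend on how the graph was saved.)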
Tar the entire directory and upload to Amazon S3.
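A sketch of packaging and uploading, reusing the sagemaker_session created during setup:

```python
import tarfile

# SageMaker expects a model.tar.gz whose root contains the export/ directory.
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

# Upload the archive to the session's default S3 bucket.
model_data = sagemaker_session.upload_data(path='model.tar.gz',
                                           key_prefix='model')
```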
Step 5. Deploy the trained model
We will now create a SageMaker model from the uploaded model artifact and deploy it to an endpoint.
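A sketch using the SageMaker Python SDK’s TensorFlowModel; the framework version and instance type are assumptions you should match to your own setup:

```python
from sagemaker.tensorflow import TensorFlowModel

# Create a SageMaker model from the artifact uploaded in the previous step.
sm_model = TensorFlowModel(model_data=model_data,
                           framework_version='1.15.3',
                           role=role)

# Deploy a real-time endpoint backed by a single instance.
predictor = sm_model.deploy(initial_instance_count=1,
                            instance_type='ml.c5.xlarge')
```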
Invoking the SageMaker endpoint
We will now query the SageMaker endpoint with some random data.
We will also use this opportunity to double-check that the predictions from the deployed model match those from the original model.
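A sketch of the check; the input shape is a hypothetical placeholder, and the response format assumes the TensorFlow Serving container’s JSON output:

```python
import numpy as np

# Hypothetical input shape; replace with your model's actual input shape.
data = np.random.randn(1, 28, 28, 1)

# Prediction from the deployed endpoint (returned as a JSON-like dict).
endpoint_prediction = np.array(predictor.predict(data)['predictions'])

# Prediction from the original in-memory Keras model.
local_prediction = loaded_model.predict(data)

# The two should agree up to numerical tolerance.
assert np.allclose(endpoint_prediction, local_prediction, atol=1e-5)
```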
Step 6. Add batch transform for predictions
As an alternative to a real-time SageMaker endpoint, you may want to run your predictions as batch jobs. Follow these steps to configure and launch a batch transform job. Note that, out of the box, SageMaker endpoints support input data only in the JSON, JSON Lines, and CSV formats.
For custom data formats, you will need to create a custom inference.py script and add it to the model archive before deploying the model. Refer to the SageMaker documentation and SageMaker examples for more details.
Create some random data first:
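For example (the batch size and input shape here are hypothetical):

```python
import numpy as np

# Hypothetical batch of inputs; use your model's input shape.
batch_input = np.random.randn(10, 28, 28, 1)
```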
Then upload the resulting input data to a file on S3:
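One way to do this is to flatten each sample into a CSV row and reuse the sagemaker_session helper:

```python
import numpy as np

# Flatten each sample into one CSV row and save locally.
np.savetxt('batch_input.csv',
           batch_input.reshape(batch_input.shape[0], -1),
           delimiter=',')

# Upload to S3; the transform job will read its input from here.
batch_input_s3 = sagemaker_session.upload_data(path='batch_input.csv',
                                               key_prefix='batch-input')
```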
Configure and instantiate your batch job:
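A sketch using the transformer helper on the model created in Step 5; the instance type and output path are assumptions:

```python
# Create a batch transformer from the SageMaker model defined earlier.
transformer = sm_model.transformer(
    instance_count=1,
    instance_type='ml.c5.xlarge',
    output_path='s3://{}/batch-output'.format(sagemaker_session.default_bucket()),
)
```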
Start your batch job – this might take a while. Note that you have to specify the type of content (text/csv) and the data type (S3Prefix) you are passing in.
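For example:

```python
# Launch the batch transform job and block until it completes.
transformer.transform(batch_input_s3,
                      content_type='text/csv',
                      data_type='S3Prefix',
                      split_type='Line')
transformer.wait()
```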
Check the job output:
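Batch transform writes one output object per input file, named after the input with an .out suffix; a sketch of reading it back:

```python
import boto3

# The output object is the input file name with '.out' appended.
bucket = sagemaker_session.default_bucket()
output_key = 'batch-output/batch_input.csv.out'

body = boto3.resource('s3').Object(bucket, output_key).get()['Body']
print(body.read().decode('utf-8'))
```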
Step 7. Compile model using SageMaker Neo
SageMaker Neo makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container, without the need for any custom model serving or inference code.
Let us compile and deploy an optimized version of our model using SageMaker Neo. We will also double-check that the predictions match between the deployed optimized model and the original model.
Deployment of the optimized model may take a little longer than deploying to a SageMaker endpoint.
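A sketch of the compile-and-deploy step using the SDK’s Model.compile; the input name and shape, target instance family, and output path are assumptions to adapt to your model:

```python
# Compile the model with SageMaker Neo for the target instance family.
compiled_model = sm_model.compile(
    target_instance_family='ml_c5',
    input_shape={'inputs': [1, 28, 28, 1]},  # hypothetical input name/shape
    output_path='s3://{}/neo-output'.format(sagemaker_session.default_bucket()),
    role=role,
    framework='tensorflow',
    framework_version='1.15.3',
)

# Deploy the Neo-optimized model; this can take longer than a regular deploy.
optimized_predictor = compiled_model.deploy(initial_instance_count=1,
                                            instance_type='ml.c5.xlarge')
```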
Invoke the optimized SageMaker endpoint.
Also get original model predictions for the same data.
Check that the SageMaker endpoint predictions match the predictions from the original model, for this input data.
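A sketch covering these three steps, reusing the random input from earlier; the response format is an assumption and may differ for Neo-served models:

```python
# Query the optimized endpoint with the same input used earlier.
optimized_prediction = np.array(
    optimized_predictor.predict(data)['predictions'])

# Prediction from the original Keras model for the same data.
original_prediction = loaded_model.predict(data)

# The optimized endpoint should agree with the original model
# up to numerical tolerance.
assert np.allclose(optimized_prediction, original_prediction, atol=1e-4)
```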
Step 8. Clean up
To avoid incurring charges to your AWS account for the resources used in this tutorial, delete the SageMaker endpoints you created.
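For example:

```python
# Delete both real-time endpoints created in this walkthrough.
predictor.delete_endpoint()
optimized_predictor.delete_endpoint()
```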
Conclusion
In this blog post, we demonstrated converting a Keras model to TensorFlow SavedModel format, deploying a trained model to a SageMaker Endpoint, and compiling the same trained model using SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and in a few lines of code have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.
About the Authors
Priya Ponnapalli is a Principal Data Scientist at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.
Jasleen Grewal is an Applied Scientist at Amazon Web Services, where she works with AWS customers to solve real-world problems using machine learning, with a special focus on precision medicine and genomics. She has a strong background in bioinformatics, oncology, and clinical genomics. She is passionate about using AI/ML and cloud services to improve patient care.
Federico Piccinini is a Deep Learning Architect for the Amazon Machine Learning Solutions Lab. He is passionate about machine learning, explainable AI, and MLOps. He focuses on designing ML pipelines for AWS customers. Outside of work, he enjoys sports and pizza.
Selvan Senthivel is a Senior ML Engineer with the Amazon ML Solutions Lab at AWS, focusing on helping customers on machine learning, deep learning problems, and end-to-end ML solutions. He was a founding engineering lead of Amazon Comprehend Medical and contributed to the design and architecture of multiple AWS AI services.