Deploy with Amazon SageMaker

Amazon SageMaker makes it easy to generate predictions by providing everything you need to deploy machine learning models in production and monitor model quality.


Model monitoring

Machine learning models are typically trained and evaluated on historical data, but their quality can degrade after they are deployed in production. This happens because the distribution of the data sent to a model for predictions can drift away from the distribution of the data used during training. The validity of prediction results can also change over time, and errors introduced upstream can affect model quality. To catch this, you need to monitor the quality of the models in production, identify issues quickly, and take corrective actions such as auditing or retraining models. Doing this yourself means building tooling to store prediction-related data securely, implementing statistical techniques to analyze that data and evaluate model quality, and detecting deviations so that you can take the right corrective action. The alternative to building this tooling is to retrain models at a regular frequency, which can be expensive.
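As a rough illustration of the kind of statistical check such tooling performs, the sketch below flags a feature whose live mean has moved far from its training-time mean. It is a deliberately simple stand-in, not the method SageMaker Model Monitor uses; the function names, the sample values, and the threshold of three standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def mean_shift_in_stdevs(baseline, live):
    """Distance between the live mean and the baseline mean,
    measured in baseline (sample) standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no variance; the shift is undefined")
    return abs(mean(live) - mean(baseline)) / spread


def has_drifted(baseline, live, threshold=3.0):
    """True when the live mean is more than `threshold` baseline
    standard deviations away from the baseline mean."""
    return mean_shift_in_stdevs(baseline, live) > threshold


# Hypothetical values of one numeric feature.
baseline_ages = [31, 35, 29, 40, 33, 37, 30, 36]  # seen during training
live_ages_similar = [32, 34, 36, 30]               # recent traffic, same population
live_ages_shifted = [58, 61, 64, 59]               # recent traffic, older population

print(has_drifted(baseline_ages, live_ages_similar))
print(has_drifted(baseline_ages, live_ages_shifted))
```

A production check would normally compare whole distributions rather than means alone, and would run per feature on a schedule.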

Amazon SageMaker Model Monitor eliminates the need to build any tooling to monitor models in production and detect when corrective actions need to be taken. SageMaker Model Monitor continuously monitors the quality of machine learning models in production, and alerts you when there are deviations in model quality.

SageMaker Model Monitor analyzes the collected data at a regular frequency, using built-in rules or rules that you provide, to determine whether there are any rule violations. The built-in rules can be used to analyze tabular data and detect common issues such as outliers in prediction data, drift in data distributions compared to training datasets, and changes in prediction accuracy based on real-world observations. SageMaker Model Monitor can be used within SageMaker Studio, and metrics can be emitted to Amazon CloudWatch so that you can set up alarms to audit and retrain models. You can view the alerts to understand what data is causing the drift and adjust your training data accordingly. Prediction requests and responses are stored securely for all models deployed in SageMaker without requiring any code changes.
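The storage of prediction requests and responses can be sketched with the AWS SDK for Python (boto3). The example assumes that capture is switched on through the `DataCaptureConfig` section of an endpoint configuration; the variant name, instance type, and helper function names are placeholders chosen for illustration, and the bucket and model names are supplied by the caller.

```python
def build_data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build the DataCaptureConfig section of an endpoint configuration.

    Captures both the request ("Input") and the response ("Output").
    """
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


def create_monitored_endpoint_config(config_name, model_name, capture_s3_uri):
    """Create an endpoint configuration with data capture enabled.

    Requires boto3 and AWS credentials; imported lazily so the builder
    above can be used and tested without them.
    """
    import boto3

    client = boto3.client("sagemaker")
    return client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",     # placeholder variant name
                "ModelName": model_name,
                "InstanceType": "ml.m5.large",   # placeholder instance type
                "InitialInstanceCount": 1,
            }
        ],
        DataCaptureConfig=build_data_capture_config(capture_s3_uri),
    )
```

Lowering `sampling_percentage` reduces storage cost at the price of a smaller sample for the monitoring rules to analyze.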

One-click deployment

Amazon SageMaker makes it easy to deploy your trained model into production with a single click so that you can start generating predictions for real-time or batch data. You can one-click deploy your model onto auto-scaling Amazon ML instances across multiple Availability Zones for high redundancy. Just specify the instance type and the minimum and maximum number of instances you want, and SageMaker takes care of the rest. SageMaker launches the instances, deploys your model, and sets up the secure HTTPS endpoint for your application. Your application simply needs to include an API call to this endpoint to achieve low-latency, high-throughput inference. This architecture allows you to integrate your new models into your application in minutes because model changes no longer require application code changes.
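For readers who script deployments rather than click through the console, the two moving parts described above (instance bounds and the endpoint call) might look like this with boto3. This is a sketch under assumptions: the endpoint is taken to exist already, the payload is assumed to be JSON, and the helper function names are invented for the example.

```python
import json


def build_scaling_target(endpoint_name, variant_name, min_instances, max_instances):
    """Arguments for Application Auto Scaling's register_scalable_target,
    bounding the instance count of one endpoint variant."""
    if min_instances < 1 or max_instances < min_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def register_scaling(endpoint_name, variant_name, min_instances, max_instances):
    """Register the instance bounds (requires boto3 and AWS credentials)."""
    import boto3

    autoscaling = boto3.client("application-autoscaling")
    return autoscaling.register_scalable_target(
        **build_scaling_target(endpoint_name, variant_name, min_instances, max_instances)
    )


def predict(endpoint_name, features):
    """Send one JSON request to the HTTPS endpoint and decode the JSON reply."""
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(features),
    )
    return json.loads(response["Body"].read())
```

Registering a target only sets the bounds; a scaling policy attached to the same resource decides when instances are added or removed. Because the application addresses the endpoint by name, swapping the model behind it does not change the `predict` call.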

Batch Transform

Most of the time, processing batches of data for non-real-time predictions is done by splitting large datasets into smaller chunks and managing real-time endpoints. This can be expensive and error-prone. With the Batch Transform feature of Amazon SageMaker, there is no need to break the dataset into multiple chunks or manage real-time endpoints. Batch Transform allows you to run predictions on large or small batch datasets. Using a simple API, you can request predictions for a large number of data records and transform the data quickly and easily. Data can be as large as petabytes, as small as a few bytes for exploration, or anywhere in between.
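A Batch Transform request could be assembled along the following lines with boto3. The sketch assumes CSV input with one record per line; the content type, instance type, and function names are illustrative defaults, not recommendations.

```python
def build_transform_job(job_name, model_name, input_s3_uri, output_s3_uri,
                        instance_type="ml.m5.xlarge", instance_count=1):
    """Arguments for the SageMaker create_transform_job call.

    Reads every object under input_s3_uri, treats each line as one
    record, and writes predictions under output_s3_uri.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_transform_job(**kwargs):
    """Submit the job (requires boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("sagemaker")
    return client.create_transform_job(**build_transform_job(**kwargs))
```

No endpoint is created or left running: the instances listed under `TransformResources` exist only for the duration of the job.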

Train once, deploy anywhere

To achieve high inference performance across a range of edge devices, you typically need to spend weeks or months hand-tuning a model for each target device because every hardware configuration has a unique set of capabilities and restrictions. With Amazon SageMaker Neo, you can train your machine learning models once and deploy them anywhere in the cloud and at the edge.

SageMaker Neo uses machine learning to automatically optimize a trained model so that it runs up to twice as fast and consumes less than a tenth of the memory footprint in your target deployment environment, with no loss in accuracy. You start with a machine learning model built using MXNet, TensorFlow, PyTorch, or XGBoost and trained using SageMaker. Then you choose the target hardware platform where you want the model to be deployed. With a single click, SageMaker Neo compiles the trained model into an executable. The compiler uses a neural network to discover and apply the specific performance optimizations that make your model run most efficiently on the target hardware platform. The model can then be deployed to start making predictions in the cloud or at the edge. Local compute and ML inference capabilities can be brought to the edge with AWS IoT Greengrass.
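The same compile step can be requested programmatically. The sketch below builds the arguments for boto3's `create_compilation_job`; the framework list is limited to the four frameworks named above, and the input name, input shape, target device, and time limit in the usage example are assumptions.

```python
import json

# Frameworks named in the text, in the upper-case form the API expects.
SUPPORTED_FRAMEWORKS = {"MXNET", "TENSORFLOW", "PYTORCH", "XGBOOST"}


def build_compilation_job(job_name, role_arn, model_s3_uri, framework,
                          input_shape, target_device, output_s3_uri,
                          max_runtime_seconds=900):
    """Arguments for the SageMaker create_compilation_job call.

    input_shape maps input names to shapes, e.g. {"data": [1, 3, 224, 224]};
    the API expects it serialized as a JSON string.
    """
    framework = framework.upper()
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"unsupported framework: {framework}")
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def compile_model(**kwargs):
    """Submit the compilation job (requires boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("sagemaker")
    return client.create_compilation_job(**build_compilation_job(**kwargs))
```

Only `target_device` changes between a cloud target such as `ml_c5` and an edge target such as `jetson_nano`; the trained model artifact stays the same.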

To help make edge deployments easy, Greengrass supports Neo-optimized models so that you can deploy your models directly to the edge with over-the-air updates. Neo is also available as open source code, as the Neo-AI project under the Apache Software License, enabling developers to customize the software for different devices and applications.

Integration with Kubernetes

Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker addresses this requirement by letting Kubernetes users train and deploy models in SageMaker using SageMaker-Kubeflow operators and pipelines. With operators and pipelines, Kubernetes users can access fully managed SageMaker ML tools and engines natively from Kubeflow. This eliminates the need to manually manage and optimize ML infrastructure in Kubernetes while preserving control of overall orchestration through Kubernetes. Using SageMaker operators and pipelines for Kubernetes, you get the benefits of a fully managed machine learning service in Kubernetes without migrating workloads.

Data processing beyond training

Real-life ML workloads typically require more than training and prediction. Data needs to be pre-processed and post-processed, sometimes in multiple steps. You might have to train and deploy a sequence of algorithms that work together to deliver predictions from raw data. SageMaker enables you to deploy Inference Pipelines so that you can pass raw input data and execute pre-processing, predictions, and post-processing on real-time and batch inference requests. Inference Pipelines can be composed of any machine learning framework, built-in algorithm, or custom container usable on Amazon SageMaker. You can build feature data processing and feature engineering pipelines with a suite of feature transformers available in the SparkML and Scikit-learn framework containers in Amazon SageMaker, and deploy these as part of the Inference Pipelines to reuse data processing code and simplify the management of machine learning processes.
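One way to picture an inference pipeline is as a single SageMaker model whose containers run in order, each receiving the previous container's output. The sketch below builds such a definition for boto3's `create_model`; the image URIs and artifact locations are supplied by the caller, and the helper function names are made up for the example.

```python
def build_pipeline_model(model_name, role_arn, stages):
    """Arguments for create_model describing an inference pipeline.

    stages is an ordered list of (image_uri, model_data_url) pairs;
    model_data_url may be None for a container that needs no artifact.
    """
    if len(stages) < 2:
        raise ValueError("a pipeline needs at least two containers")
    containers = []
    for image_uri, model_data_url in stages:
        container = {"Image": image_uri}
        if model_data_url:
            container["ModelDataUrl"] = model_data_url
        containers.append(container)
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": containers,  # executed in list order
    }


def create_pipeline_model(model_name, role_arn, stages):
    """Register the pipeline model (requires boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("sagemaker")
    return client.create_model(**build_pipeline_model(model_name, role_arn, stages))
```

The resulting model can back either a real-time endpoint or a batch transform job, so the same pre-processing code serves both paths.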

Multi-Model Endpoints

Increasingly, companies are training machine learning models based on individual user data. For example, a music streaming service will train custom models based on each listener’s music history to personalize music recommendations, or a taxi service will train custom models based on each city’s traffic patterns to predict rider wait times. Building custom ML models for each use case leads to higher inference accuracy, but the cost of deploying and managing models increases significantly. These challenges become more pronounced when not all models are accessed at the same rate but still need to be available at all times.

Amazon SageMaker Multi-Model Endpoints provide a scalable and cost-effective way to deploy large numbers of custom machine learning models. SageMaker Multi-Model Endpoints enable you to deploy multiple models with a single click on a single endpoint and serve them using a single serving container. You specify the instance type and the minimum and maximum number of instances you want, and SageMaker takes care of the rest. SageMaker launches the instances, deploys your models, and sets up the secure HTTPS endpoint for your application. Your application simply needs to include an API call to this endpoint that names the target model to achieve low-latency, high-throughput inference.
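In boto3 terms, a multi-model setup differs from a single-model one in two places: the model definition points at an S3 prefix holding many artifacts, and each request names the artifact it wants. The sketch below shows both; the CSV content type, the per-city artifact name, and the function names are assumptions for illustration.

```python
def build_multi_model(model_name, role_arn, image_uri, models_s3_prefix):
    """Arguments for create_model with one container serving every
    model artifact stored under models_s3_prefix."""
    if not models_s3_prefix.endswith("/"):
        models_s3_prefix += "/"
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": models_s3_prefix,
            "Mode": "MultiModel",
        },
    }


def build_invocation(endpoint_name, target_model, payload):
    """Arguments for invoke_endpoint; target_model is the artifact's
    path relative to the S3 prefix, e.g. "seattle.tar.gz"."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,
        "ContentType": "text/csv",  # assumed request format
        "Body": payload,
    }


def predict_with(endpoint_name, target_model, payload):
    """Call the endpoint for one specific model (requires boto3 and credentials)."""
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invocation(endpoint_name, target_model, payload)
    )
    return response["Body"].read()
```

In the taxi example above, each city's model would be one artifact under the prefix, and the application would pass that city's file name as `target_model`.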

Get high performance and low cost inference in the cloud

Using Amazon SageMaker, you can deploy your trained machine learning models to Amazon Inf1 instances, built using the AWS Inferentia chip, to provide high-performance, low-cost inference. Using Inf1 instances, you can run large-scale machine learning inference applications such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. With Amazon SageMaker Neo, you can compile your trained machine learning models to run optimally on Inf1 instances and easily deploy the compiled models on Inf1 instances for real-time inference.

Tell me more

Get your ML models from experimentation to production

Learn how to deploy ML models to production using SageMaker

Deploy trained Keras or TensorFlow models using Amazon SageMaker

Learn how to deploy a trained Keras or TensorFlow model using Amazon SageMaker

Automate Amazon SageMaker custom ML models

Follow these examples on GitHub and automate the building, training, and deploying of custom ML models.