Amazon SageMaker Neo

Train models once, run anywhere with up to 2x performance improvement

Amazon SageMaker Neo enables developers to train machine learning models once and run them anywhere in the cloud and at the edge. It optimizes models to run up to twice as fast, in less than a tenth of the memory footprint, with no loss in accuracy.

Developers spend a lot of time and effort delivering accurate machine learning models that can make fast, low-latency predictions in real time. This is particularly important for edge devices, where memory and processing power tend to be highly constrained but latency matters a great deal. For example, sensors in autonomous vehicles typically need to process data in a thousandth of a second to be useful, so a round trip to the cloud and back isn’t possible. Edge devices also span a wide array of hardware platforms and processor architectures, and to achieve high performance, developers need to spend weeks or months hand-tuning their model for each one. Because the tuning process is so complex, models are rarely updated after they are deployed to the edge, and developers miss out on the opportunity to retrain and improve models based on the data the edge devices collect.

Amazon SageMaker Neo automatically optimizes machine learning models to perform at up to twice the speed with no loss in accuracy. You start with a machine learning model built using MXNet, TensorFlow, PyTorch, or XGBoost and trained using Amazon SageMaker. Then you choose your target hardware platform from Intel, NVIDIA, or ARM. With a single click, SageMaker Neo then compiles the trained model into an executable. The compiler uses a neural network to discover and apply the specific performance optimizations that make your model run most efficiently on the target hardware platform. The model can then be deployed to start making predictions in the cloud or at the edge. Local compute and ML inference capabilities can be brought to the edge with AWS IoT Greengrass. To make edge deployments easier, AWS IoT Greengrass supports Neo-optimized models, so you can deploy your models directly to the edge with over-the-air updates.
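The same steps (pick a trained model, name its framework, choose a target, compile) can also be driven through the SageMaker API rather than the console. The following is a minimal sketch, not the definitive workflow: it assembles the arguments for the `CreateCompilationJob` operation as a plain dictionary and only submits them if you call `submit`. The job name, role ARN, S3 locations, input tensor name, and input shape are all hypothetical placeholders, and the helper names `build_compilation_request` and `submit` are illustrative rather than part of any AWS library.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              framework, input_shape, target_device,
                              max_runtime_seconds=900):
    """Assemble keyword arguments for SageMaker's CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # The API expects the input shapes as a JSON-encoded string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework.upper(),
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def submit(request):
    """Send the request to SageMaker. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request can be built offline
    return boto3.client("sagemaker").create_compilation_job(**request)


# Every value below is a hypothetical placeholder, not a real resource.
request = build_compilation_request(
    job_name="example-neo-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleNeoRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    framework="mxnet",
    input_shape={"data": [1, 3, 224, 224]},  # assumed input name and shape
    target_device="ml_c5",                   # an Intel-based cloud instance family
)
print(json.dumps(request, indent=2))
```

Building the request separately from submitting it keeps the example inspectable without an AWS account; calling `submit(request)` would start an actual compilation job and may incur charges.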

Neo uses Apache TVM and other partner-provided compilers and kernel libraries. Neo is available as open source code as the Neo-AI project under the Apache Software License, enabling developers to customize the software for different devices and applications.


Run ML models with up to 2x better performance

Amazon SageMaker Neo automatically optimizes TensorFlow, MXNet, PyTorch, and XGBoost machine learning models to perform at up to twice the speed with no loss in accuracy. Using deep learning, SageMaker Neo discovers and applies code optimizations for your specific model and the hardware you intend to deploy the model on. You get the performance benefits of manual tuning without the weeks of effort.

Reduce framework size by 10x

Amazon SageMaker Neo reduces the set of software operations in your model’s framework to only those required for it to make predictions. Typically, this provides a 10x reduction in the amount of memory required by the framework. The model and framework are then compiled into a single executable that can be deployed in production to make fast, low-latency predictions. 

Run the same ML model on multiple hardware platforms

Amazon SageMaker Neo allows you to train your model once and run it virtually anywhere with a single executable. Neo understands how to optimize your model for Intel, NVIDIA, ARM, Cadence, Qualcomm, and Xilinx processor architectures automatically, which makes preparing your model for multiple platforms as easy as a few clicks in the Amazon SageMaker console. 
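As a rough illustration of targeting several platforms from one trained model, the sketch below generates one `OutputConfig` per target device, each of which could be paired with the same model input in a separate compilation job. The device identifiers `ml_c5`, `jetson_nano`, and `rasp3b` are values accepted by the `TargetDevice` field of the compilation API; the labels, the S3 prefix, and the helper name `output_configs` are hypothetical and chosen only for this example.

```python
# Illustrative labels mapped to TargetDevice identifiers.
TARGETS = {
    "intel_cloud": "ml_c5",        # Intel-based cloud instance family
    "nvidia_edge": "jetson_nano",  # NVIDIA Jetson Nano
    "arm_edge": "rasp3b",          # ARM-based Raspberry Pi 3 Model B
}


def output_configs(output_prefix, targets):
    """Return one OutputConfig per target, each writing to its own S3 folder."""
    prefix = output_prefix.rstrip("/")
    return {
        label: {
            "S3OutputLocation": f"{prefix}/{device}/",
            "TargetDevice": device,
        }
        for label, device in targets.items()
    }


# The bucket name is a hypothetical placeholder.
configs = output_configs("s3://example-bucket/compiled/", TARGETS)
for label, config in configs.items():
    print(label, config)
```

Keeping the compiled artifacts in per-device folders is a design choice for this sketch; it simply avoids one job overwriting another's output.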

Key Features

Use the deep learning framework you prefer

Amazon SageMaker Neo converts the framework-specific functions and operations for TensorFlow, MXNet, and PyTorch into a single compiled executable that can be run anywhere. Neo compiles and generates the required software code automatically.

Easy and Efficient Software Operations

Amazon SageMaker Neo outputs an executable that is deployed on cloud instances and edge devices. The Neo runtime reduces the usage of resources such as storage on the deployment platforms by 10x and eliminates the dependence on frameworks. As an example, the Neo runtime occupies 2.5 MB of storage, compared to framework-dependent deployments that can occupy up to 1 GB of storage.

Open Source Software

Neo is available as open source code as the Neo-AI project under the Apache Software License. This enables developers and hardware vendors to customize applications and hardware platforms, and take advantage of Neo’s optimization and reduced resource usage techniques. 

High performance and low cost inference with Inf1

With Amazon SageMaker Neo, you can compile your trained machine learning models to run optimally on Inf1 instances and easily deploy the compiled models on Inf1 instances for real-time inference. Amazon EC2 Inf1 instances, based on the AWS Inferentia chip, deliver high-performance, low-cost machine learning inference in the cloud.

Check out Amazon SageMaker Neo features

Refer to the documentation for instructions on how to use Amazon SageMaker Neo.
