AWS for Industries

Future-proof Your AI at the Edge with AWS

In the rapidly evolving field of IoT in the manufacturing and transportation domains, machine learning features create significant value. By running machine learning at the edge, manufacturers can gather insights faster, identify trends and patterns, and detect anomalies, resulting in enhanced security and safety as well as cost savings.

Amazon SageMaker is a fully managed service that lets developers and data scientists build, train, and deploy ML models for any use case, quickly and at scale, with fully managed infrastructure, tools, and workflows. In this blog, written with our Premier AWS Partner, Ness Digital Engineering, we present the various options to compile, package, deploy, and run machine learning models across a fleet of devices at the edge.

The value of Amazon SageMaker

Within Amazon SageMaker, Amazon SageMaker Edge Manager is a preferred way to manage models on an edge device, manage resources, and optimize inference performance when deploying ML models, while ONNX Runtime can be used to optimize models for inference. Amazon SageMaker Neo, a capability of Amazon SageMaker that lets ML models be trained once and run in the cloud or at the edge, can also be employed to optimize ML models for inference on SageMaker in the cloud and on supported devices at the edge.

Amazon SageMaker Edge Manager lets you prepare custom models for edge devices, includes a runtime for running ML inference efficiently, and enables devices to securely send samples of data for relabeling and retraining. Alternative approaches include DLR (Deep Learning Runtime), which can be used to run models compiled by SageMaker Neo. For a cross-platform edge runtime, use ONNX Runtime, an open-source ML model accelerator that integrates into your SageMaker workflows as an automated step for your edge deployments.

Option 1. Amazon SageMaker Edge Manager Agent Service

With the availability of low-power edge hardware for ML, many ML use cases are built to run models on edge devices, which allows predictions in real time, reduces costs, and preserves end-user privacy. However, this is still challenging because edge devices have limited compute, memory, and connectivity. Therefore, there is a need for an edge manager to optimize, run, and update ML models across the fleet of devices at the edge.

Amazon SageMaker Edge Manager is designed to facilitate model management on edge devices, enabling optimization, security, monitoring, and maintenance of machine learning models across fleets of devices. The Edge Manager Agent is an inference engine for edge devices to make predictions with models loaded onto the edge device.

The Edge Manager Agent operates within the edge device and offers the following functionalities:

  • Loading Neo Compiled Models: It allows the deployment of Neo compiled models onto the edge device, ensuring compatibility and efficient execution.
  • Inference Execution: The Edge Manager Agent utilizes the loaded models to perform inference on the edge device, generating results based on the provided inputs.
  • Data Collection: It automatically captures the inputs and outputs of inference and periodically compiles them into a ZIP package. This package is then sent to an Amazon Simple Storage Service (Amazon S3) bucket, facilitating further model retraining and analysis.

Figure 1 – Reference architecture for AI at the Edge with Amazon SageMaker Edge Manager Agent

The reference architecture above shows the complete end-to-end implementation of the solution. A step-by-step view of the reference architecture is presented below.

  • Step 1: The Data Processing component collects telemetry data from sensors. From this data, it creates the ML inference model input and sends it to the SageMaker Edge Manager Agent component with a request for inference on a specific model and version.
  • Step 2: The SageMaker Edge Manager Agent loads the model from the local model folder. The model is then used to create the inference result (output) from the input. The input and output are stored locally, and the inference result is returned to the Data Processing component.
  • Step 3: The Data Processing component sends small, aggregated telemetry data and inference results to AWS IoT Core for further processing.
  • Step 4: At a configured interval, the SageMaker Edge Manager Agent component sends the stored inputs and outputs to the Amazon S3 bucket, so the data can be used later.
  • Step 5: In the AWS Cloud, SageMaker uses the inference inputs and outputs from the Amazon S3 bucket to retrain the model under configured conditions. The trained and approved model is compiled with Amazon SageMaker Neo for a specific edge device and saved to the Amazon S3 model bucket.
  • Step 6: With the new model version, an AWS IoT Greengrass deployment can be created, and the new model version is deployed to the edge device.
  • Step 7: The SageMaker Edge Manager Agent uses the new version of the model to serve the Data Processing component with inference results.
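
Steps 1 and 3 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the reference architecture: the function names, the payload fields, the device ID, and the 128-reading window (chosen to match the [1, 128, 1] model input shape used later in this post) are all assumptions.

```python
import json
import statistics


def build_model_input(readings, window=128):
    """Shape the latest `window` readings as nested lists of shape [1, window, 1] (hypothetical helper)."""
    if len(readings) < window:
        raise ValueError(f'need at least {window} readings, got {len(readings)}')
    latest = readings[-window:]
    return [[[float(value)] for value in latest]]


def build_telemetry_message(device_id, readings, prediction):
    """Aggregate raw readings so that only a small JSON payload leaves the device."""
    return json.dumps({
        'deviceId': device_id,          # hypothetical field names
        'count': len(readings),
        'min': min(readings),
        'max': max(readings),
        'mean': statistics.fmean(readings),
        'prediction': prediction,
    })


# Simulated sensor readings; a real component would read them from the device
readings = [float(i) for i in range(200)]
model_input = build_model_input(readings)
# The prediction value here is a stand-in for the result returned by the inference step
message = build_telemetry_message('device-001', readings, prediction=0.5)
print(message)
```

Publishing `message` to AWS IoT Core is left out on purpose, since the transport (for example, AWS IoT Greengrass IPC or an MQTT client) depends on how the device is set up.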

Option 2. Amazon SageMaker Neo Job Compilation

Amazon SageMaker Neo Job Compilation is a built-in feature of Amazon SageMaker that enables the compilation of models from various machine learning frameworks to support different types of devices. With SageMaker Neo, we can compile models developed using TensorFlow, Keras, MXNet, ONNX, and more, targeting different processor architectures.

a. Using Amazon SageMaker Neo with DLR

Figure 2 – Reference architecture for AI at the Edge with Amazon SageMaker Neo and DLR

The reference architecture above shows the solution implemented using Amazon SageMaker Neo with DLR, a compact runtime for deep learning models and decision tree models compiled by Amazon SageMaker Neo, Apache TVM, and similar compilers. A step-by-step view of the reference architecture is presented below:

  • Step 1: Using a Jupyter notebook, you create a SageMaker Neo compilation job for an existing model, which compiles it for a specific device.
  • Step 2: After the compiled model is created, you create a model component in AWS IoT Core.
  • Step 3: To deploy the model component to the edge device, you create an AWS IoT Greengrass deployment, also from the Jupyter notebook.
  • Step 4: The final part is an example of how to use the DLR runtime to load the SageMaker Neo compiled model and use it for inference in a Python application.

The following example demonstrates how you can create a SageMaker Neo compilation job with Python, specifically within a Jupyter notebook. When you use a Jupyter notebook to create resources in AWS, the code is easy to modify and to reuse in different DevOps pipelines to automate the process.

Get additional specifications and documentation here.

import boto3
import time

# Initialize the SageMaker client
sagemaker_client = boto3.client('sagemaker')

# Name of the compilation job
compilation_job_name = 'Sagemaker-Edge-Rpi-64-Comp-Job-123456'

job_response = sagemaker_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn='<ARN of role for compilation job>',
    InputConfig={
        'S3Uri': 's3://<S3 model artifact path>',
        'DataInputConfig': '{"input_1" : [1, 128, 1]}',
        'Framework': 'TENSORFLOW',
        'FrameworkVersion': '1.15'
    },
    OutputConfig={
        'S3OutputLocation': 's3://<S3 bucket folder to output the compiled model>',
        'TargetPlatform': { 'Os': 'LINUX', 'Arch': 'ARM64' }
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 900
    }
)

# Poll every 30 sec until the job completes or fails
while True:
    job_status_response = sagemaker_client.describe_compilation_job(CompilationJobName=compilation_job_name)
    if job_status_response['CompilationJobStatus'] == 'COMPLETED':
        break
    elif job_status_response['CompilationJobStatus'] == 'FAILED':
        raise RuntimeError('Compilation failed')
    print('Compiling ...')
    time.sleep(30)

The output generated by our compilation job is a file called “model-LINUX_ARM64.tar.gz.” The file structure consists of the following:

└── compiler.tar
└── <model files>

To make these files accessible from an AWS IoT Greengrass edge device, the recommended approach is to create an AWS IoT Greengrass component. Deploying the model as a component takes advantage of AWS IoT Greengrass component versioning, which handles the deployment of new model versions in the future. We will use an AWS IoT Greengrass deployment, the standard method for deploying components in AWS IoT Greengrass, to deploy the component to the AWS IoT Greengrass edge device.

import json
import boto3

# Variables to use
component_name = 'ml-model-component'
component_version = '1.0.0'
target_group_arn = '<ARN of Greengrass group to deploy model to>'
greengrass_component_bucket_name = '<S3 bucket for Greengrass component artifacts>'

# Initialize clients
s3_client = boto3.client('s3')
gg_client = boto3.client('greengrassv2')

# Upload the compiled model to Greengrass component S3 folder
s3_client.copy_object(Bucket = greengrass_component_bucket_name,
                      CopySource = f"{greengrass_component_bucket_name}/models_compiled/model-LINUX_ARM64.tar.gz",
                      Key = f'{component_name}/{component_version}.zip')

# Load the Greengrass component recipe
recipe = '<recipe file content>'

# Create the component from the recipe
gg_client.create_component_version(inlineRecipe = recipe.encode('utf-8'))

# Load the deployment definition
deployment_json = '<deployment definition file content>'
deployment = json.loads(deployment_json)

# Create the deployment
gg_client.create_deployment(targetArn = target_group_arn,
                            deploymentName = 'ml-model-deployment-1',
                            components = deployment['components'],
                            deploymentPolicies = deployment['deploymentPolicies'],
                            iotJobConfiguration = deployment['iotJobConfiguration'])
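
The recipe and the deployment definition are placeholders in the listing above. As a rough sketch of what they might contain, the following builds both as Python dictionaries and serializes them to JSON. The keys follow the AWS IoT Greengrass V2 recipe and deployment formats as we understand them. The helper names, description, publisher, bucket name, and policy values are illustrative assumptions, not the values used by the authors.

```python
import json


def build_recipe(component_name, component_version, bucket_name):
    """Return a minimal component recipe that only ships the model artifact (hypothetical helper)."""
    artifact_uri = f's3://{bucket_name}/{component_name}/{component_version}.zip'
    return {
        'RecipeFormatVersion': '2020-01-25',
        'ComponentName': component_name,
        'ComponentVersion': component_version,
        'ComponentDescription': 'Compiled ML model (illustrative)',
        'ComponentPublisher': '<publisher>',
        'Manifests': [{
            'Platform': {'os': 'linux'},
            'Artifacts': [{'URI': artifact_uri}],
        }],
    }


def build_deployment(component_name, component_version):
    """Return a deployment definition with the three keys read by the deployment code."""
    return {
        'components': {component_name: {'componentVersion': component_version}},
        'deploymentPolicies': {
            'failureHandlingPolicy': 'ROLLBACK',   # assumed policy choice
            'componentUpdatePolicy': {'timeoutInSeconds': 60, 'action': 'NOTIFY_COMPONENTS'},
        },
        'iotJobConfiguration': {},
    }


# 'example-bucket' is a placeholder bucket name
recipe = json.dumps(build_recipe('ml-model-component', '1.0.0', 'example-bucket'), indent=2)
deployment_json = json.dumps(build_deployment('ml-model-component', '1.0.0'), indent=2)
print(recipe)
```

The artifact URI mirrors the `Key` used in the `copy_object` call, so the recipe points at the object that was just copied.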

Once we have the compiled model ready, the next step is to utilize it on the Edge Device to perform machine learning inference. Since the Amazon SageMaker Edge Manager component is no longer supported, an alternative option is to leverage the open-source runtime called DLR, developed by the NEO-AI team. DLR can be used in Python or C++ code, allowing us to implement similar functionality to Edge Manager.

The following code uses the DLR module to run inference inside a Python AWS IoT Greengrass component on the edge device:

import dlr
import numpy as np

# Model name and version
component_name = 'ml-model-component'
component_version = '1.0.0'

# Input sample for the inference model: 128 values, one feature each,
# so that the final shape is [1, 128, 1]
data = [[float(i)] for i in range(1, 129)]

# Load model
model = dlr.DLRModel(f'../../../../artifacts/{component_name}/{component_version}', 'cpu')

# Convert data to numpy array (float32 is an assumption about the model's input type)
np_data = np.array(data, dtype=np.float32)

# Expand dimensions for data to have correct shape
inference_input = np.expand_dims(np_data, axis=0).copy()

# Invoke the model
inference_output = model.run(inference_input)

# Get array from resulting shape
prediction = inference_output[0].copy()
prediction = prediction[0].copy()

# Result of inference
print(f'Result of inference: {prediction}')

The usage of DLR is straightforward and user-friendly. However, to fully replace the functionality of SageMaker Edge Manager, we need to implement additional functionalities tailored to our specific use case:

  • Collecting and sending inference data: We need to develop code for collecting inputs and outputs of inference and sending them to an S3 bucket for further analysis or model retraining.
  • Separate inference logic: It is beneficial to create a separate AWS IoT Greengrass component specifically for handling the inference logic. This helps to keep the business logic of the application separate from the inference-related functionality and creates better separation of component failures.
  • AWS IoT Greengrass IPC integration: To consume and produce inference inputs and outputs efficiently, we can leverage AWS IoT Greengrass IPC (Inter-Process Communication) capabilities.
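
The first item in the list above, collecting and sending inference data, could look roughly like the sketch below. It mimics the ZIP packaging described for the Edge Manager Agent, but it is only an assumption about one possible design: the class name, file naming, and record layout are hypothetical. The upload helper expects a client created with `boto3.client('s3')`.

```python
import io
import json
import time
import zipfile


class InferenceCapture:
    """Buffers inference inputs and outputs and packages them as a ZIP (hypothetical helper)."""

    def __init__(self):
        self._records = []

    def record(self, model_input, model_output):
        # Inputs and outputs must be JSON-serializable, e.g. nested lists
        self._records.append({
            'timestamp': time.time(),
            'input': model_input,
            'output': model_output,
        })

    def package(self):
        """Return the buffered records as ZIP bytes and clear the buffer."""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
            for index, rec in enumerate(self._records):
                archive.writestr(f'record-{index:06d}.json', json.dumps(rec))
        self._records = []
        return buffer.getvalue()


def upload_package(s3_client, bucket, key, payload):
    """Send one package to S3 using the client's put_object call."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)


capture = InferenceCapture()
capture.record([[1.0], [2.0]], [0.25])
capture.record([[3.0], [4.0]], [0.75])
payload = capture.package()
print(f'Packaged {len(payload)} bytes')
```

When used with DLR, NumPy arrays would need to be converted with `.tolist()` before calling `record`. A timer or record-count threshold could decide when `package` and `upload_package` run.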

b. Using ONNX to export models and run inference

Figure 3 – Reference architecture for AI at the Edge with ONNX runtime

Alternatively, ONNX Runtime can be considered for inference. ONNX Runtime is a versatile machine learning model accelerator that supports models from various frameworks such as PyTorch, TensorFlow/Keras, TFLite, and scikit-learn. To use ONNX Runtime, follow these steps:

  • Step 1: Model Creation – Develop your model using either TensorFlow or PyTorch.
  • Step 2: Export as ONNX Model – Export the trained model into an ONNX model format.
  • Step 3: Build AWS IoT Greengrass Components – Follow a similar process as DLR to build AWS IoT Greengrass components for incorporating ONNX runtime.
  • Step 4: Inference using ONNX Runtime – Use ONNX Runtime with the exported ONNX model to perform inference.

An excellent resource that showcases a comprehensive example application incorporating these steps is the AWS-created sample, which provides a detailed description and implementation guidelines.
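
As a small complement to that sample, Step 4 might be sketched as follows. This is not taken from the AWS sample. The helper names are hypothetical, the model path in the usage comment is a placeholder, and the `onnxruntime` package is imported lazily so that it is only needed when a session is actually created.

```python
def load_session(model_path):
    """Create an ONNX Runtime session on the CPU (requires the onnxruntime package)."""
    import onnxruntime as ort  # lazy import: only needed when a real session is created
    return ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])


def predict(session, batch):
    """Run one inference call and return the first model output."""
    # Look up the name of the model's first input instead of hard-coding it
    input_name = session.get_inputs()[0].name
    # Passing None as the output list asks the session for all outputs
    outputs = session.run(None, {input_name: batch})
    return outputs[0]


# Example usage on the device (assumes numpy and a float32 model input):
#   import numpy as np
#   session = load_session('<path to exported model>.onnx')
#   prediction = predict(session, np.zeros((1, 128, 1), dtype=np.float32))
```

Keeping `predict` independent of how the session is created makes it easy to exercise the surrounding component logic with a stub session on a development machine.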


As IoT in manufacturing and transportation evolves, integrating machine learning into IoT is valuable. In this blog, we presented the various options to compile, package, deploy, and run machine learning models across a fleet of devices at the edge.

SageMaker Edge Manager is a preferred way to manage models on edge devices, while ONNX runtime can be used to optimize models for inference.

With Amazon SageMaker Edge Manager reaching end of life (EOL), the DLR runtime can also be used to run models compiled by SageMaker Neo. For a cross-platform edge runtime, ONNX Runtime can be integrated into your SageMaker workflows as an automated step for your edge deployments.

Embracing these Amazon SageMaker options supports ongoing edge AI advancement and offers the opportunity to overcome the limitations of edge devices.

Krishna Doddapaneni

Krishna is an IoT Specialist Partner Solutions Architect with AWS, essentially helping partners and customers build crazy and innovative IoT products and solutions on AWS. Krishna has a Ph.D. in Wireless Sensor Networks and a Postdoc in Robotic Sensor Networks. He is passionate about ‘connected’ solutions, technologies, security and their services.

Gabriel Novak

Gabriel Novak is Principal Software Engineer/Architect and IoT specialist at Manufacturing and Transportation Centre of Excellence at Ness Digital Engineering. As a member of CoE, he uses his IoT expertise to help Ness teams in designing IoT solutions in initial stages of the project implementation. Part of this desire to help teams in implementation of IoT solutions is also producing Accelerators, that can be reused for more complex IoT systems incorporating Machine Learning and MLOps capabilities. In this role, he worked for Vix Technology and CAT Marine teams.

Maysara Hamdan

Maysara Hamdan is a Partner Solutions Architect based in Atlanta, Georgia. Maysara has over 15 years of experience in building and architecting Software Applications and IoT Connected Products in Telecom and Automotive Industries. In AWS, Maysara helps partners in building their cloud practices and growing their businesses. Maysara is passionate about new technologies and is always looking for ways to help partners innovate and grow.