The Internet of Things on AWS – Official Blog

Monitor edge application performance using AWS IoT Greengrass and AWS Distro for OpenTelemetry

Introduction

As edge computing continues to gain traction across industries, organizations are looking for ways to efficiently analyze, optimize, and troubleshoot edge applications for better performance and reliability. In this blog post, we demonstrate how to use AWS IoT Greengrass and AWS Distro for OpenTelemetry to implement tracing and observability methods that provide granular visibility into edge application health and performance and support root cause analysis. By extending the observability strategy used for cloud-based applications to the applications running at the edge, organizations gain end-to-end visibility and can analyze performance insights in Amazon Managed Grafana.

Machine Learning (ML) at the edge is one of the key application domains that requires exceptional visibility and control to ensure optimal performance and reliability of applications in often resource-constrained environments. AWS and NXP® Semiconductors, a leading provider of edge computing solutions, have joined forces to offer a comprehensive cloud-to-edge machine learning offering that combines the power of AWS IoT Greengrass, Amazon SageMaker, and NXP applications processors. This collaboration is aimed at addressing multiple challenges, such as edge application performance monitoring and analysis.

Ali Osman Ors, Global Director of AI/ML Strategy and Technologies at NXP, highlights the importance of this engagement:

“By working together with AWS, we are able to provide our customers with a holistic solution for edge computing and machine learning that complements the device level enablement offered with NXP’s eIQ® ML SW development environment and empowers them to monitor and optimize their edge applications more effectively than ever before.”

Through this joint effort, organizations can better understand their edge applications, identify bottlenecks, and streamline performance management.

Monitoring and observability solution on the edge

AWS makes observability at the edge easy to configure and well integrated with the workloads running in the cloud. As part of AWS's observability offerings, AWS Distro for OpenTelemetry seamlessly collects and exports metrics and traces to AWS monitoring services. The AWS Distro for OpenTelemetry Collector is an agent that runs in your application environment. When it's integrated with AWS IoT Greengrass, this combination extends your observability capabilities to both edge and cloud applications at scale, providing consistent and seamless tracing across your application infrastructure. This integrated approach delivers real-time visibility into application performance, enabling your team to make swift, data-driven decisions to enhance customer experience and drive business growth.

The following diagram illustrates the solution architecture.

Figure 1. Monitoring and observability solution on the edge architecture

This blog post walks you through implementing an edge observability solution from scratch on edge devices running AWS IoT Greengrass. You will also explore additional use cases in the follow-up sections.

Below is an overview of the implemented architecture:

  1. AWS IoT Greengrass acts as the edge software runtime that supports local execution of your applications, enabling them to interact with cloud-based services.
  2. AWS Distro for OpenTelemetry is integrated with AWS IoT Greengrass to collect and export telemetry data from your edge applications, along with the applications running in the cloud, creating a unified view across your infrastructure.
  3. AWS X-Ray offers insights into application behavior, while Amazon CloudWatch offers reliable, scalable, and flexible monitoring for your application stack.
  4. Amazon Managed Grafana makes it easy to understand the performance of your application by visualizing the trace and performance data.

Prerequisites

  1. AWS account.
  2. A development environment/computer with AWS CLI and git installed.
  3. A computer with the latest browser.
  4. Ability to create a new IAM user or role with the AWS IoT Greengrass minimal IAM policy.
  5. A running AWS IoT Greengrass device or instance. If you don’t have one, you can run it in your development environment. Follow Tutorial: Getting started with AWS IoT Greengrass V2 for step-by-step instructions.

Walkthrough

1. Prepare and publish AWS IoT Greengrass components

In this step, use the AWS IoT Greengrass Development Kit Command-Line Interface (GDK CLI) to create, build, and publish custom components. Refer to the documentation for GDK CLI installation. Clone the git repository that contains custom Greengrass component recipes and a sample application to your development environment:

git clone https://github.com/aws-samples/aws-iot-greengrass-observability-at-edge

a. Create AWS Distro for OpenTelemetry Collector Greengrass component

Go to the com.custom.aws_otel_collector folder and check the component recipe file, recipe.yaml. For the sake of simplicity, we build the component recipe for the AMD64 Amazon Linux 2 platform only. You can check other pre-built platforms on the AWS Distro for OpenTelemetry Collector releases page. For platforms that aren’t available as pre-built binaries, such as Armv7/Armv8 for NXP edge targets, you need to compile the collector for the target using the instructions provided in the git repository.

Run the following CLI command to publish the custom Greengrass component. Ensure you update the region parameter to match your Greengrass device’s region.

$ cd com.custom.aws_otel_collector
$ gdk component build && gdk component publish --region us-east-1

...

[2023-08-29 20:11:38] INFO - Created private version '1.0.0' of the component 'com.custom.aws_otel_collector' in the account.

Congrats! Now that you have the AWS Distro for OpenTelemetry Collector component published in your AWS account, you can use it on your Greengrass devices with the defined platforms.

b. Create Application Greengrass component

This component represents your custom application logic running on the edge. It can be an ML inference component that runs models with data read from a sensor or a camera, or an analytics application that processes data gathered at the edge. It is the component that you want to monitor, troubleshoot, and root-cause when any issue occurs at the edge. For this demo, the sample application divides 5 by a random number between 0 and 5 to occasionally trigger a divide-by-zero error, representing an issue in your application logic, as shown in the sketch below.
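
The following is a minimal, illustrative sketch of that error-injection logic. The function name run_inference_step and its structure are assumptions for illustration; the actual main.py in the repository may differ.

import random
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_inference_step():
    # Divide 5 by a random integer in [0, 5]; roughly one in six calls
    # raises ZeroDivisionError, which surfaces as a failed segment in AWS X-Ray.
    with tracer.start_as_current_span("inference"):
        return 5 / random.randint(0, 5)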

  1. Go to the com.custom.edge_application folder and check the main.py file. This is the main application code with the AWS Distro for OpenTelemetry instrumentation integrated. As you may notice, functions are wrapped in tracer.start_as_current_span() context managers. This records traces and sends them to the AWS Distro for OpenTelemetry Collector running on the edge device, which eventually forwards them to the AWS X-Ray service using AWS APIs. A sketch of the tracer setup that enables this export appears at the end of this step.
  2. Check the component recipe file recipe.yaml in the com.custom.edge_application folder. For simplicity in the example, we install SDKs and dependency packages in the install phase of the component. However, it is recommended to keep dependencies in a separate component and reuse it as a component dependency across multiple components as needed.
  3. Run the following CLI command to publish the custom Greengrass component. Ensure you update the region parameter to match your Greengrass device’s region.
    $ cd com.custom.edge_application
    $ gdk component build && gdk component publish --region us-east-1
    
    ...
    
    [2023-08-29 20:11:38] INFO - Created private version '1.0.0' of the component 'com.custom.edge_application' in the account.

Congrats, now your application component is published! You can check both components on AWS IoT > Greengrass > Components.
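
For reference, the following is a minimal bootstrap sketch (assumed, not the repository’s exact code) showing how a Python component can export spans to the AWS Distro for OpenTelemetry Collector running locally on the same device. It assumes the opentelemetry-sdk, opentelemetry-exporter-otlp, and opentelemetry-sdk-extension-aws packages are installed, and that the collector listens on the default OTLP gRPC endpoint, localhost:4317.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator

# Name the service so it appears as a distinct node in the X-Ray service map.
resource = Resource.create({"service.name": "com.custom.edge_application"})

# Use X-Ray compatible trace IDs and batch-export spans to the local collector.
provider = TracerProvider(resource=resource, id_generator=AwsXRayIdGenerator())
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)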

2. Run components on AWS IoT Greengrass and collect trace data

Your Greengrass components are ready to deploy now. Before the deployment, set the required permissions for your Greengrass device to download component artifacts from Amazon S3 and to allow the AWS Distro for OpenTelemetry Collector to interact with the AWS X-Ray service.

  1. Follow the steps described in the Allow access to S3 buckets for component artifacts section of the official documentation to create an AWS Identity and Access Management (IAM) policy.
  2. Follow the steps described in the AWS Distro for OpenTelemetry Collector documentation > Configuring Permissions to create an IAM policy.
  3. Navigate to AWS IAM and attach the created policies to your Greengrass device’s IAM role, which is linked to an AWS IoT role alias. Check the official documentation for more info.
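
If you prefer to script these permission steps, the following boto3 sketch attaches the AWS managed AWSXRayDaemonWriteAccess policy and a customer-managed artifact-access policy to the device’s token exchange role. The role name and the artifact policy ARN are assumptions; substitute the values from your environment.

import boto3

iam = boto3.client("iam")

# Assumed default role name from the Greengrass getting-started tutorial.
role_name = "GreengrassV2TokenExchangeRole"

# AWS managed policy that allows the collector to send trace data to X-Ray.
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn="arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess",
)

# Hypothetical customer-managed policy granting read access to the S3 bucket
# that stores the component artifacts published in step 1.
iam.attach_role_policy(
    RoleName=role_name,
    PolicyArn="arn:aws:iam::123456789012:policy/MyGreengrassV2ComponentArtifactPolicy",
)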

Next, navigate to AWS IoT > Greengrass > Deployments to revise your Greengrass device’s deployment so that it includes the two newly created components.
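
If you would rather script the deployment, the following boto3 sketch creates a deployment revision that targets a thing group and includes the two components. The target ARN and component versions are placeholders; note that a new deployment for a target replaces its previous one, so include any other components your device needs.

import boto3

greengrass = boto3.client("greengrassv2", region_name="us-east-1")

greengrass.create_deployment(
    targetArn="arn:aws:iot:us-east-1:123456789012:thinggroup/MyGreengrassCoreGroup",
    deploymentName="edge-observability-demo",
    components={
        "com.custom.aws_otel_collector": {"componentVersion": "1.0.0"},
        "com.custom.edge_application": {"componentVersion": "1.0.0"},
        # Keep the nucleus and any other components from your previous deployment here.
    },
)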

3. Evaluate results and monitor applications on cloud

Now, you will use an Amazon Managed Grafana workspace to analyze the observability data pushed by your Greengrass device. Follow the documentation Getting started with Amazon Managed Grafana, and enable the AWS X-Ray data source during setup.

Once your Amazon Managed Grafana instance with the AWS X-Ray data source is ready, navigate to your Amazon Managed Grafana workspace URL and create a new dashboard. You can use the X-Ray data source with the “Service Map”, “Trace List”, and “Trace Statistics” query types, and the “Node Graph” and “Traces” visualization types, to build an observability dashboard. Check the Creating dashboards page in the Amazon Managed Grafana documentation for more info.

Figure 2. AWS X-Ray data source options on an Amazon Managed Grafana dashboard

A dashboard for this demo application can show a service map to understand the error rates in application nodes, overall error rates, and a list of traces for investigating each individual measurement. Constructing a dashboard for your solution is an iterative and exploratory process. You can discover the optimal configuration for your application by exploring various alternatives in Amazon Managed Grafana.

Figure 3. Amazon Managed Grafana dashboard showing edge application metrics

After creating a “Traces List” panel, select one of the traces using its “id” and analyze the details of the trace as follows.

Figure 4. AWS X-Ray trace view on Amazon Managed Grafana dashboard

You can see that this trace failed because the “inference” operation failed. The trace captured the duration and related context, along with the stack trace data, which enables you to troubleshoot issues that happened in the edge device’s runtime.

Computer vision (CV) at the edge use case

Computer vision at the edge is a good example of integrating in-depth observability. AWS, NXP, and Toradex, an AWS APN partner, collaboratively built a CV at the edge demonstrator for the Embedded World exhibition and are continuously improving it.

In the demonstrator, the Au-Zone Technologies MAIVIN camera runs AWS IoT Greengrass. It is based on the NXP i.MX 8M Plus applications processor SOM from Toradex. The embedded Linux operating system, bootloader, and drivers are provided by the Toradex Torizon platform, including frequent OS updates. The MAIVIN camera performs ML inference on a video stream to generate insights. The Greengrass components enable two data paths. The first data path transmits the ML-enabled edge application outputs to AWS IoT Core for downstream applications. The second data path transmits observability metrics to AWS X-Ray. These metrics help embedded developers and ML data scientists evaluate the ML model’s performance and behavior at the edge.
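
To make the two paths concrete, here is a hedged sketch (topic name, payload shape, and span name are assumptions) of how a Greengrass component can publish inference results to AWS IoT Core over the Greengrass IPC interface while recording a span that the local AWS Distro for OpenTelemetry Collector forwards to AWS X-Ray. It assumes the awsiotsdk package is installed and an OpenTelemetry tracer is configured as shown earlier.

import json
from awsiot.greengrasscoreipc.clientv2 import GreengrassCoreIPCClientV2
from awsiot.greengrasscoreipc.model import QOS
from opentelemetry import trace

tracer = trace.get_tracer(__name__)
ipc_client = GreengrassCoreIPCClientV2()  # Uses the component's Greengrass IPC credentials

def publish_detections(detections: dict) -> None:
    # Data path 2: the span is exported to the local collector and on to AWS X-Ray.
    with tracer.start_as_current_span("iot_core_publish"):
        # Data path 1: application output goes to AWS IoT Core for downstream consumers.
        ipc_client.publish_to_iot_core(
            topic_name="maivin/detections",
            qos=QOS.AT_LEAST_ONCE,
            payload=json.dumps(detections).encode(),
        )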

Below is a diagram illustrating the demonstrator’s dual data paths from a fleet of MAIVIN devices to the AWS Cloud.

Figure 5. Data paths from MAIVIN cameras to AWS cloud.

This demo showcases how AWS and AWS partners help organizations build comprehensive ML-enabled solutions at the edge with observability support. Watch the demonstration video where Toradex showcases the solution.

The following dashboard is built on Amazon Managed Grafana and provides a central view of the CV at the edge application’s performance along with each edge component’s health.

Figure 6. Computer vision at the edge application metrics on Amazon Managed Grafana dashboard

  • ML Inference Time: Provides the time the edge device’s Neural Processing Unit takes for ML inference.
  • Inference stats, Video Source Stats, and AWS IoT Core Publish Stats: Display error rates for each Greengrass component, helping you pinpoint component-level performance.
  • Service Map: Gives an integrated perspective of the edge application, simplifying error correlation and troubleshooting.

In addition to the main dashboard, analyzing a single trace helps teams to pinpoint issues on specific edge devices, facilitating root cause identification for performance issues or errors.

Figure 7. AWS X-Ray trace view showing computer vision at the edge application metrics on Amazon Managed Grafana dashboard

Cleaning up

To avoid incurring future charges, delete the resources you created for this walkthrough, including the Greengrass deployment and custom components, the Amazon Managed Grafana workspace, and the IAM policies you attached to your Greengrass device’s role.

Conclusion

In this post, we demonstrated how to monitor and optimize edge application performance by combining AWS IoT Greengrass, AWS Distro for OpenTelemetry, and Amazon Managed Grafana. This blog post focused on the distributed tracing aspect; you can check the Monitoring your IoT fleet using CloudWatch blog post to learn about other monitoring aspects for IoT. You can also visit the repost.aws Internet of Things channel to discuss your observability implementation with the AWS IoT community and share ideas. To learn more about edge computing and observability, visit the AWS IoT Greengrass developer guide and the One Observability Workshop.


About the authors

Emir Ayar is a Senior Tech Lead Solutions Architect with the AWS Prototyping team. He specializes in assisting customers with building IoT, ML at the Edge, and Industry 4.0 solutions, and in implementing architectural best practices. He supports customers in experimenting with solution architectures to achieve their business objectives, emphasizing agile innovation and prototyping. He lives in Luxembourg and enjoys playing synthesizers.
David Walters is a Senior Partner Solutions Architect at Amazon Web Services. For the past 4 years, David’s focus has been on building innovative IoT solutions with partners in the OEM space that can scale to millions of devices. He is also an Arm Ambassador and evangelizes how cloud and embedded computing can solve problems better together on Arm silicon.
Paul Devillers is a Prototype Architect on the AWS Prototyping team. He focuses on building innovative AI/ML, IoT, and data analytics solutions for French customers.
Imaya Kumar Jagannathan is a Principal Solutions Architect focused on AWS observability tools, including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C#, working with containers and serverless technologies.