AWS Machine Learning Blog
Monitor and Manage Anomaly Detection Models on a fleet of Wind Turbines with Amazon SageMaker Edge Manager
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
In industrial IoT, running machine learning (ML) models on edge devices is necessary for many use cases, such as predictive maintenance, quality improvement, real-time monitoring, process optimization, and security. The energy industry, for instance, invests heavily in ML to automate power delivery, monitor consumption, optimize efficiency, and extend the lifetime of their equipment.
Wind energy is one of the most popular renewable energy sources. According to the Global Wind Energy Council, 22,893 wind turbines were installed globally in 2019, produced by 33 suppliers and accounting for over 63 GW of wind power capacity. At that scale, energy companies need an efficient platform to manage and maintain their wind turbine fleets and the ML models running on the devices. A commercial wind turbine costs around $3–4 million. If a turbine is out of service, it costs $800–1,600 per day and results in a total loss of 7.5 megawatts, which is enough energy to power approximately 2,500 homes.
A wind turbine is a complex piece of engineering and consists of many sensors that can be used by a monitoring mechanism to capture data such as vibration, temperature, wind speed, and air humidity. You could train an ML model with this data, deploy it to an edge device connected to the turbine’s sensors, and predict anomalies in real time at the edge. It would reduce the operational cost of your fleet of turbines. But imagine the effort to maintain this solution on a fleet of thousands or millions of devices. How do you operate, secure, deploy, run, and monitor ML models on a fleet of devices at the edge?
Amazon SageMaker Edge Manager can help you to answer this question. The service allows you to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, industrial equipment, mobile devices, and more. With Edge Manager, you can manage the lifecycle of each ML model on each device in your device fleets for up to thousands or millions of devices. The service provides a software agent that runs on edge devices and a management interface on the AWS Management Console.
In this post, we show how to use Edge Manager to create a robust end-to-end solution that manages the lifecycle of ML models deployed to a wind turbine fleet. But instead of using real wind turbines, you learn how to build your own fleet of mini 3D printed wind turbines. This is a DIY open-source, open-hardware project created to demonstrate how to build an ML-at-the-edge solution with Amazon SageMaker. You can use it as a platform to learn, experiment, and get inspired.
The next sections cover the following topics:
- The specifications of the wind turbine farm
- How to configure each Jetson Nano
- How to build an anomaly detection model using SageMaker
- How to run your own mini wind turbine farm
The wind turbine farm
The wind turbine farm created for this project has five mini 3D printed wind turbines connected to five distinct Jetson Nanos via USB. The Jetson Nanos are connected to the internet through Ethernet cables plugged into a cable modem. A fan, positioned in front of the farm, produces the wind to simulate outdoor conditions. The following image shows how the wind farm is organized.
The mini wind turbine
The mini wind turbine of this project is a mechanical device integrated with a microcontroller (Arduino) and some sensors. It was modeled using FreeCAD, an open-source tool for designing industrial parts. These parts were then 3D printed using PETG (a plastic filament type) and assembled with the electronic components. Its base is static, which means that the turbine doesn’t align with the wind direction by itself. This restriction was important to simplify the project.
Each turbine has one voltage generator (small motor) and seven different sensors:
- Vibration (MPU6050: 6 axis accelerometer/gyroscope)
- Infrared rotation encoder (rotations per second)
- Gearbox temperature (MPU6050)
- Ambient temperature (BME680)
- Atmospheric pressure (BME680)
- Air humidity (BME680)
- Air quality (BME680)
An Arduino Mini Pro is responsible for interfacing with these sensors and collecting data from them. This data is streamed through the serial pins (TX, RX). An FTDI device that converts this serial signal to USB is the bridge between the Arduino and the Jetson Nano. A Python application that runs on Jetson Nano receives the raw data from the sensors through this bridge.
A micro servo was modified and transformed into a voltage generator. Its internal gearbox increases the generator (motor) speed by five times to produce a (low) voltage between 0–3.3 V. This generator is also connected to the Arduino through an analog input pin, and its voltage is sent along with the sensor readings.
The frequency at which the data is collected depends on the sensor. All the signals from the BME680 are collected every 150 milliseconds, the rotation encoder every 1 second, and the voltage generator and the vibration sensor every 50 milliseconds.
If you want to know more about these technical details and learn how to build your own mini wind turbine, see the GitHub repository.
The edge device
Each Jetson Nano has a built-in GPU with 128-core NVIDIA Maxwell™ and a Quad-core ARM® A57 CPU running at 1.43 GHz. This hardware is enough to run a Python application that collects and formats the data from the sensors of the turbine and then calls the Edge Manager agent API to get the predictions. This application compares the prediction with a threshold to check for anomalies in the data. The model is invoked in real time.
When SageMaker Neo compiles the ML model for Jetson Nano, a runtime (DLR) optimized for this target device is included in the deployment package. This runtime detects automatically that it’s running on a Jetson Nano and loads the model directly into the device’s GPU for maximum performance.
The Edge Manager agent is also distributed as a Linux (arm64) application that can be run as a background process (daemon) on your Jetson Nano. It uses the runtime that SageMaker Neo includes in the compilation package to interface with the optimized model and expose it as a well-defined API. This API is integrated with the local application through a low-latency protocol (gRPC over a Unix socket).
The cloud services
Now that you know some details about the physical hardware used to develop the wind turbine farm, it’s time to see which AWS services support the solution on the cloud side. A minimal, standalone setup to get a model deployed and running on the Edge Manager agent requires only SageMaker. However, other services were used in this project to add two important features: a mechanism for over-the-air (OTA) deployment and a dashboard for monitoring the anomalies in near-real time.
In summary, the components required for this project are:
- A device fleet (Edge Manager), which organizes and controls one or more registered devices through the agent (running on each device)
- One IoT thing per device and IoT thing group, which is used by the OTA mechanism to communicate with the devices via MQTT
- AWS IoT rules, and an AWS Lambda function to get and filter application logs and ingest them into Amazon OpenSearch Service
- A Lambda function to parse the model metrics captured by the agent and ingest them into Amazon OpenSearch Service
- An OpenSearch server with Kibana, which has dashboards for monitoring the anomalies (optional)
- SageMaker to build, compile, and package the ML model
The following diagram illustrates this architecture.
Putting everything together
Now that we have all the components of our wind turbine farm, it’s time to understand the steps we need to take to integrate all these moving parts, deploy a model to our edge devices, and keep an application running and predicting anomalies in real time.
The following diagram shows all the steps involved in the process.
The solution consists of the following steps:
- The data scientist explores the dataset and designs an anomaly detection model (autoencoder) with PyTorch, using SageMaker Studio.
- The model is trained with a SageMaker training job.
- With Neo, the model is optimized (compiled) to Jetson Nano.
- Edge Manager creates a deployment package with the compiled model.
- The data scientist creates an IoT job that sends a notification of the new model available to the edge devices.
- The application running on Jetson Nano performs the following:
  - Receives this notification and downloads the model package from the Amazon Simple Storage Service (Amazon S3) bucket.
  - Unpacks the model and loads it using the Edge Manager agent API (LoadModel).
  - Reads the sensors from the wind turbine, prepares the data, invokes the ML model, and captures some model metrics using the Edge Manager agent API.
  - Compares the prediction with a baseline to detect potential anomalies.
  - Sends the raw sensor data to an AWS IoT topic.
- Through a rule, AWS IoT reads the app logs topic and exports the data to Amazon ES.
- A Lambda function captures the model metrics (mean absolute error) exported by the agent and ingests the data into Amazon ES.
- The operator uses a Kibana dashboard to check for any anomalies.
Configure your edge device
The Edge Manager agent uses certificates provided by AWS IoT Core to authenticate and call other AWS services. Therefore, you need to create an IoT thing first and then an edge device fleet. But first, you need to prepare some basic resources to support your solution.
Create prerequisite resources
Before getting started, configure the AWS Command Line Interface on your workstation (if necessary), and then create the following resources:
- An S3 bucket to store the captured data
- An AWS Identity and Access Management (IAM) role for your devices
- An IoT thing to map to your Edge Manager device
- An IoT policy to control the permissions of the temporary credentials of the edge device
- Create a new bucket for the solution.
Each time you call CaptureData in the agent API, it uploads the tensors (input and predictions) into this bucket.
Next, you create your IAM role.
- On the IAM console, create a role named WindTurbineFarm so the devices can access resources in your account.
- Add permissions to this role to upload files to the S3 bucket you created.
- Add the following trusted entities to the role:
iot.amazonaws.com
credentials.iot.amazonaws.com
sagemaker.amazonaws.com
Use the following code (provide the name for the S3 bucket, your AWS account, and Region):
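As a minimal sketch of what such role documents can look like, the following Python builds a trust policy for the three trusted entities listed above and a permissions policy that allows uploads to the bucket. The function name, the example bucket name, and the choice of `s3:PutObject` as the only action are assumptions made for illustration; the role used in the project may grant more.

```python
import json


def build_role_policies(bucket_name):
    """Return (trust_policy, permissions_policy) as plain dicts.

    Assumption: the devices only need to upload objects to one bucket.
    """
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "iot.amazonaws.com",
                        "credentials.iot.amazonaws.com",
                        "sagemaker.amazonaws.com",
                    ]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return trust_policy, permissions_policy


# "my-wind-turbine-bucket" is a placeholder, not a name from this post.
trust, permissions = build_role_policies("my-wind-turbine-bucket")
print(json.dumps(trust, indent=2))
print(json.dumps(permissions, indent=2))
```

You can paste the printed JSON into the IAM console when you edit the trust relationship and the inline policy of the WindTurbineFarm role.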
You’re now ready to create your IoT thing, which you later map to your Edge Manager device.
- On the AWS IoT Core console, under Manage, choose Things
- Choose Create.
- Name your device (for this post, edge-device-0).
- Create a new group or choose an existing group (for this post, WindTurbineFarm).
- Create a certificate.
- Download the certificates, including the root CA.
- Activate the certificate.
You now create your policy, which controls the permissions of the temporary credentials of the edge device.
- On the AWS IoT Core console, under Secure, choose Policies.
- Choose Create.
- Name the policy (for this post, WindTurbine).
- Choose Advanced Mode.
- Enter the following policy, providing your AWS account and Region:
- Choose Create.
Lastly, you attach the policy to the certificate.
- On the AWS IoT Core console, under Secure, choose Certificates.
- Select the certificate you created.
- On the Actions menu, choose Attach policy.
- Select the policy WindTurbine.
- Choose Attach.
Now your IoT thing is ready to be linked to an edge device. Repeat these steps (except for creating the policy) for each additional device in your device fleet. For a production environment with hundreds or thousands of devices, you must apply a different approach, using automated scripts and parameter files to provision all the IoT things.
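The console steps above could be scripted along the following lines. This is a hedged sketch, not the project's provisioning tooling: the function names are hypothetical, `iot_client` is assumed to behave like `boto3.client("iot")`, and the thing group and policy are assumed to exist already.

```python
def provision_thing(iot_client, thing_name, group_name, policy_name):
    """Create one IoT thing with an active certificate and the shared policy.

    iot_client is assumed to expose the same methods as boto3.client("iot").
    """
    iot_client.create_thing(thingName=thing_name)
    iot_client.add_thing_to_thing_group(
        thingGroupName=group_name, thingName=thing_name
    )
    # Creates a key pair and a certificate, and activates the certificate.
    cert = iot_client.create_keys_and_certificate(setAsActive=True)
    cert_arn = cert["certificateArn"]
    iot_client.attach_policy(policyName=policy_name, target=cert_arn)
    iot_client.attach_thing_principal(thingName=thing_name, principal=cert_arn)
    # The caller is responsible for copying these files to the device.
    return {
        "thing_name": thing_name,
        "certificate_pem": cert["certificatePem"],
        "private_key": cert["keyPair"]["PrivateKey"],
    }


def provision_fleet(iot_client, thing_names,
                    group_name="WindTurbineFarm", policy_name="WindTurbine"):
    """Provision every thing in thing_names and return their credentials."""
    return [
        provision_thing(iot_client, name, group_name, policy_name)
        for name in thing_names
    ]


# Example call (requires boto3 and AWS credentials, so it is left commented):
# import boto3
# provision_fleet(boto3.client("iot"), [f"edge-device-{i}" for i in range(5)])
```

Passing the client in as an argument keeps the functions easy to dry-run with a stub before touching a real account.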
Create the edge fleet
To create your edge fleet, complete the following steps:
- On the SageMaker console, under Edge Inference, choose Edge device fleets.
- Choose Create device fleet.
- Enter a name for the device fleet (for this post, WindTurbineFarm).
- Enter the ARN of the IAM role you used in the previous steps (arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/WindTurbineFarm).
- Enter the output S3 bucket URI (s3://<<NAME_OF_YOUR_BUCKET>>/wind_turbine_data/).
- Choose Submit.
Now you need to add a new device to the fleet.
- On the SageMaker console, under Edge Inference, choose Edge devices.
- Choose Register devices.
- For Device Properties, enter the name of the device fleet you created (WindTurbineFarm).
- Choose Next.
- For Device name, enter any unique name for your device (for this post, we use the same name as our IoT thing, edge-device-wind-turbine-00000000000).
- For IoT name, enter the name of the thing you created earlier (edge-device-0).
- Choose Submit.
Repeat the registering process for all your other devices. Now you can SSH to your Jetson Nano and complete the configuration of your device.
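If you prefer to script the registration, one possible sketch is to build the request for the SageMaker `register_devices` API, which takes a fleet name and a list of device name and IoT thing name pairs. The helper name and the zero-padded device naming scheme below are assumptions that simply mirror the example names used in this post.

```python
def build_register_request(fleet_name, thing_names,
                           device_prefix="edge-device-wind-turbine-"):
    """Build the keyword arguments for SageMaker's register_devices call.

    Assumption: device names are the prefix plus an 11-digit counter,
    matching the example name edge-device-wind-turbine-00000000000.
    """
    devices = [
        {"DeviceName": f"{device_prefix}{index:011d}", "IotThingName": thing}
        for index, thing in enumerate(thing_names)
    ]
    return {"DeviceFleetName": fleet_name, "Devices": devices}


request = build_register_request(
    "WindTurbineFarm", [f"edge-device-{i}" for i in range(5)]
)

# Sending the request requires boto3 and AWS credentials:
# import boto3
# boto3.client("sagemaker").register_devices(**request)
```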
Prepare the edge device
Before you start configuring your Jetson Nano, you need to install JetPack 4.4.1 in your Nano. This is the version you use to build, run, and test this demo.
The model preparation process for your target device is very sensitive to the versions of the libraries installed on your device. For instance, because the target device is a Jetson Nano, Neo optimizes the model and runtime for a given version of TensorRT and CUDA. The runtime (libdlr.so) is physically linked to the versions you specify in the compilation job. This means that if you compile your model using Neo for JetPack 4.4.1, it doesn’t work with JetPack 3.x, and vice versa.
- With JetPack 4.4.1 running on your Jetson Nano, you can start configuring your device with the following commands:
- Download the Linux ARMv8 version of the Edge Manager agent.
- Copy the package to your Jetson Nano (scp). Create a folder for the agent and unpack the package in your home directory:
- Copy the AWS IoT Core certificates you provisioned for your thing in the previous section to the directory ~/agent/certificates/iot in your Jetson Nano.
You should see the following files in this directory:
- pem – CA root
- <<CERT_PREFIX>>-public.pem.key – Public key
- <<CERT_PREFIX>>-private.pem.key – Private key
- <<CERT_PREFIX>>-certificate.pem.crt – Certificate
- Get the root certificate used to sign the deployment package created by Edge Manager. The agent uses this to validate the model.
- Copy this certificate to the directory ~/agent/certificates/root in your Jetson Nano.
Next, you create the Edge Manager agent configuration file.
- Open an empty file named ~/agent/sagemaker_edge_config.json and enter the following code:
Provide the information for the following resources:
- SAGEMAKER_EDGE_DEVICE_NAME – The unique name of your device you defined previously.
- AWS_REGION – The Region where you created your edge device.
- LINUX_USER – The Linux user name you’re using in Jetson Nano.
- CERT_PREFIX – The prefix of the certificate files you created when you provisioned your IoT thing in the previous section.
- CREDENTIALS_ENDPOINT_HOST – Your endpoint host. You can get this endpoint through the AWS Command Line Interface (AWS CLI). (Install the AWS CLI if you don’t have it already). Use credentials of the same account and the same Region you used in the previous sections (this isn’t the IoT thing shadow URL). Then run the following command to retrieve the endpoint host:
- S3_BUCKET – The name of the S3 bucket you used to configure your edge device fleet in the previous section.
- Save the file with all these modifications.
Now you’re ready to run the Edge Manager agent in your Jetson Nano.
- To test the agent, run the following commands:
The following screenshot shows your output.
The agent is now running. After a few minutes, you can see the heartbeat of the device, reported on the console. To see it on the SageMaker console, under Edge Inference, choose Edge Devices and choose your device.
Configure the application
Now it’s time to set up the application that runs on the edge device. This application is responsible for the following:
- Get the temporary credentials using the certificate
- Listen to the OTA update topics to see whether a new model package is ready to deploy
- Deploy the available model package to the edge device
- Load the model to the agent if necessary
- Perform an infinite loop:
  - Read the sensor data
  - Format the input data
  - Invoke the ML model and capture some metrics of the prediction
  - Compare the prediction’s MAE (mean absolute error) to the baseline
  - Publish raw data to an IoT topic (MQTT)
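The inner loop of these responsibilities can be sketched as follows. This is not the application's actual code: every callable passed in is a hypothetical stand-in (for the serial reader, the preprocessing, the Edge Manager agent call, and the MQTT client), and `max_iterations` exists only so the sketch can be dry-run without looping forever.

```python
def run_inference_loop(read_sensors, format_input, score, publish,
                       threshold, max_iterations=None):
    """Run the read -> format -> score -> compare -> publish cycle.

    Hypothetical callables:
      read_sensors()    -> raw sensor readings
      format_input(raw) -> model input built from the readings
      score(x)          -> MAE between x and the model's reconstruction
      publish(message)  -> sends a dict to an MQTT topic
    With max_iterations=None the loop never ends.
    Returns the number of anomalies seen (only reachable with a limit).
    """
    anomalies = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        raw = read_sensors()
        model_input = format_input(raw)
        mae = score(model_input)
        is_anomaly = mae > threshold  # compare against the baseline
        if is_anomaly:
            anomalies += 1
        publish({"raw": raw, "mae": mae, "anomaly": is_anomaly})
        iteration += 1
    return anomalies


# Dry run with fake callables instead of real hardware and the agent.
readings = iter([[0.1, 0.2], [0.1, 0.3], [5.0, 9.0]])
published = []
anomaly_count = run_inference_loop(
    read_sensors=lambda: next(readings),
    format_input=lambda raw: raw,
    score=lambda x: sum(abs(v) for v in x) / len(x),  # placeholder score
    publish=published.append,
    threshold=1.0,
    max_iterations=3,
)
print(anomaly_count, len(published))
```

Keeping the hardware, the model call, and the messaging behind callables makes it possible to exercise the control flow on a workstation before running it on the Jetson Nano.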
To install the application, first get the custom AWS IoT endpoint. On the AWS IoT Core console, choose Settings. Copy the endpoint and use it in the following code:
The application outputs something like the following screenshot.
Optional: run this application with the parameter --test-mode if you just want to run a test with no wind turbine connected to the edge device.
If everything went fine, the application keeps waiting for a new model. It’s time to train a new model and deploy it to the Jetson Nano.
Train and deploy the ML model
This post demonstrates how to detect anomalies in the components of a wind turbine. There are many ways of doing this with the data collected by its sensors. To keep this example as simple as possible, you prepare a model that analyzes vibration, wind speed, rotation (per second), and the produced voltage to determine whether an anomaly exists or not. For that purpose, we train an autoencoder using PyTorch on SageMaker and prepare it for deployment on your Jetson Nano.
This model architecture has two advantages: it’s unsupervised, so you don’t need to label your data, and you can train it with data collected from wind turbines that are working perfectly. Therefore, your model is trained to detect what you consider normal behavior of your wind turbines. When a defect appears in any part of the turbine, a drift occurs in the sensor data, which the model interprets as abnormal behavior (an anomaly).
The following screenshot is a sample of the raw data captured by the turbine sensors.
The data has the following features:
- nanoId – ID of the edge device that collected the data
- turbineId – ID of the turbine that produced this data
- arduino_timestamp – Timestamp of the Arduino that was operating this turbine
- nanoFreemem – Amount of free memory in bytes
- eventTime – Timestamp of the row
- rps – Rotation of the rotor in rotations per second
- voltage – Voltage produced by the generator in millivolts
- qw, qx, qy, qz – Quaternion angular acceleration
- gx, gy, gz – Gravity acceleration
- ax, ay, az – Linear acceleration
- gearboxtemp – Internal temperature
- ambtemp – External temperature
- humidity – Air humidity
- pressure – Air pressure
- gas – Air quality
- wind_speed_rps – Wind speed in rotations per second
The selected features based on our goals are qw, qx, qy, qz (angular acceleration), wind_speed_rps, rps, and voltage. The following image is a sample of the feature qx. The data produced by the accelerometer is too noisy, so we need to clean it first.
The angular velocity (quaternion) is first converted to Euler angles (roll, pitch, yaw). Then we denoise all the features with wavelets (PyWavelets) and normalize them. The following screenshot shows the signals after these transformations.
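To make the first and last of these transformations concrete, here is a minimal sketch using only the Python standard library. It uses the common aerospace (Z-Y-X) convention for the quaternion conversion and min-max scaling for the normalization; both choices are assumptions, because the post doesn't state which convention or scaling the notebooks use. The wavelet denoising step is not shown.

```python
import math


def quaternion_to_euler(qw, qx, qy, qz):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Assumption: Z-Y-X (yaw-pitch-roll) rotation order.
    """
    roll = math.atan2(2.0 * (qw * qx + qy * qz),
                      1.0 - 2.0 * (qx * qx + qy * qy))
    # Clamp to [-1, 1] so rounding errors cannot break asin.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (qw * qy - qz * qx)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (qw * qz + qx * qy),
                     1.0 - 2.0 * (qy * qy + qz * qz))
    return roll, pitch, yaw


def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1].

    A constant signal maps to all zeros to avoid dividing by zero.
    """
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# A rotation of 90 degrees around the z axis should give yaw close to pi/2.
half = math.sqrt(0.5)
print(quaternion_to_euler(half, 0.0, 0.0, half))
print(min_max_normalize([1.0, 2.0, 3.0]))
```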
Finally, we apply a sliding window to this resulting dataset (six features) to capture the temporal relationship between neighbor readings and create the input tensor of our ML model. The average interval between two sequential samples is approximately 50 milliseconds. Each time window (of our sliding window) is then converted into a tensor, using the following structure:
- Tensor – 6 features x 10 steps (100 samples) = 6×100
- Step – Group of time steps
- Time step – Group of intervals (time_step=20 = ~5 seconds)
- Interval – Group of samples (interval=5 = ~250 milliseconds)
- Reshaped tensor – 6x10x10
Interval, time step, and step are hyperparameters that you can adjust during training. The final result is a stream of data, encoded as a multidimensional tensor (representing a few seconds in the past). The trained autoencoder tries to recreate the input tensor as the output (prediction). By measuring the MAE between the input and output and comparing it with a pre-defined threshold, you can identify potential anomalies.
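The windowing and the MAE check can be illustrated with NumPy as follows. This is a simplified sketch: it slides a plain window of 100 consecutive samples over the dataset and reshapes each window to 6x10x10, and it does not reproduce how the notebooks map the interval, time step, and step hyperparameters onto the window. The function names and the stride parameter are assumptions.

```python
import numpy as np


def make_windows(data, steps=10, samples_per_step=10, stride=1):
    """Slice (n_samples, n_features) data into (n_windows, n_features, steps, samples_per_step).

    Each window covers steps * samples_per_step consecutive samples.
    """
    n_samples, n_features = data.shape
    window_size = steps * samples_per_step
    windows = []
    for start in range(0, n_samples - window_size + 1, stride):
        window = data[start:start + window_size]  # (100, 6)
        # One row per feature, then split the time axis into steps.
        windows.append(window.T.reshape(n_features, steps, samples_per_step))
    if not windows:
        return np.empty((0, n_features, steps, samples_per_step))
    return np.stack(windows)


def mean_absolute_error(inputs, reconstruction):
    """MAE between a tensor and the autoencoder's reconstruction of it."""
    return float(np.mean(np.abs(inputs - reconstruction)))


def is_anomaly(inputs, reconstruction, threshold):
    """Flag the window when the reconstruction error exceeds the threshold."""
    return mean_absolute_error(inputs, reconstruction) > threshold


# 120 samples of 6 features give 21 overlapping windows of 100 samples.
data = np.arange(120 * 6, dtype=float).reshape(120, 6)
windows = make_windows(data)
print(windows.shape)
```

With a perfect reconstruction the MAE is 0, so a healthy turbine should stay well below the threshold, while drifting signals push the error up.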
One important aspect of this approach is that it extracts the linear and non-linear correlations between the features, to better understand the impact of one feature on another, such as wind speed on the rotation or produced voltage.
Now it’s time to run this experiment.
- First, you need to set up your Studio environment if you don’t have one yet.
- Clone the GitHub repo https://github.com/aws-samples/amazon-sagemaker-edge-manager-demo inside a Studio terminal.
The repository contains a folder named 03_Notebooks with two Jupyter notebooks.
- Follow the instructions in the first notebook to prepare the dataset – Because the accelerometer data is a signal, it contains noise, so you run a denoise mechanism to clean the data.
The final dataset has only six features: roll, pitch, yaw (converted from a Quaternion to Euler angles), wind_speed_rps, rps (rotations per second), voltage (produced by the generator).
- Follow the instructions in the second notebook to train, package, and deploy the model:
  - Use SageMaker to train your PyTorch autoencoder (CNN based).
  - Run a batch prediction to compute MAE and threshold used by the app to determine whether the prediction is an anomaly or not.
  - Compile the model to Jetson Nano using Neo.
  - Create a deployment package with Edge Manager.
  - Create an IoT job that publishes a JSON document to a topic listened to by the application that is running on your Jetson Nano.
The application gets the package, unpacks it, loads the model in the Edge Manager agent, and unblocks the application run.
Both notebooks are very detailed, so follow the steps carefully, after which you’ll have an anomaly detection model to deploy in your Jetson Nano.
Compilation job and model optimization
One of the most important steps of the whole process is the model optimization step in the second notebook. When you compile a model with SageMaker Neo, it not only optimizes the model to improve the prediction performance on the target device, it also converts the original model into an intermediate representation. After this conversion, you don’t need the original framework anymore (PyTorch, TensorFlow, MXNet). This representation is then interpreted by a light runtime (DLR), which is packaged with the model by Neo. Both the runtime and optimized model are libraries, compiled as native programs for a specific operating system and architecture. In the case of Jetson Nano, the OS is a Linux distro and the architecture is 64-bit ARMv8. The runtime in this case uses TensorRT for maximum performance on the Jetson’s GPU.
When you launch a compilation job on Neo, you need to specify some parameters related to the setup of your target device, for instance:
- trt-ver – 7.1.3
- cuda-ver – 10.2
- gpu-code – sm_53
The Jetson Nano’s GPU is an NVIDIA Maxwell, architecture version 53, so the parameter gpu-code is the same for all compilation jobs. However, trt-ver and cuda-ver depend on the versions of TensorRT and CUDA installed on your Nano. When you were preparing your edge device, you set up your Jetson Nano with JetPack 4.4.1. This makes sure that the model you optimize using Neo is compatible with your Jetson Nano.
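As a rough illustration of where these parameters go, the following sketch builds the request for the SageMaker `create_compilation_job` API, with the three values passed as a JSON string in `CompilerOptions`. The helper name, the job name, the S3 URIs, the role ARN, the input name `input0`, the input shape, and the 900-second limit are placeholders or assumptions, not values taken from the notebook.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              input_shape=(1, 6, 10, 10),
                              trt_ver="7.1.3", cuda_ver="10.2",
                              gpu_code="sm_53"):
    """Build keyword arguments for SageMaker's create_compilation_job.

    Assumptions: a PyTorch model with a single input named "input0".
    """
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": json.dumps({"input0": list(input_shape)}),
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetPlatform": {
                "Os": "LINUX",
                "Arch": "ARM64",
                "Accelerator": "NVIDIA",
            },
            # Must match the TensorRT and CUDA versions on the device.
            "CompilerOptions": json.dumps(
                {"trt-ver": trt_ver, "cuda-ver": cuda_ver, "gpu-code": gpu_code}
            ),
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


# All names and URIs below are placeholders.
request = build_compilation_request(
    job_name="wind-turbine-anomaly-example",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
)
print(json.dumps(request, indent=2))

# Submitting the job requires boto3 and AWS credentials:
# import boto3
# boto3.client("sagemaker").create_compilation_job(**request)
```

If you later move to a different JetPack release, only trt-ver and cuda-ver should need to change; gpu-code stays sm_53 for the Jetson Nano.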
Visualize the results
The dashboard setup is out of scope for this post. For more information, see Analyze device-generated data with AWS IoT and Amazon Elasticsearch Service.
Now that you have your model deployed and running on your Jetson Nano, it’s time to look at the behavior of your wind turbines through a dashboard. The application you deployed to the Jetson Nano collects some logs and sends them to two different places:
- The IoT MQTT topic wind-turbine/logs/<<iot_thing_name>> contains the app logs and raw data collected from the wind turbine sensors
- The S3 bucket s3://<<S3_BUCKET>>/wind_turbine_data contains the metrics of the ML model
You can get this data and ingest it into Amazon ES or another database. Then you can use your preferred reporting tool to prepare dashboards.
The following visualization shows three different but correlated things for each one of the five turbines: the rotation speed (in RPS), the produced voltage, and the detected anomalies for voltage, rotation, and vibration.
Some noise was injected in the raw data from the turbines to simulate failures.
The following visualization shows an aggregation of the turbines’ speed and produced voltage anomalies over time.
Conclusion
Securely and reliably maintaining the lifecycle of an ML model deployed across a fleet of devices isn’t an easy task. However, with Edge Manager, you can reduce the implementation effort and operational cost of such a solution. Also, with a demo like the mini wind turbine farm, you can experiment, optimize, and automate your ML pipeline with the services and expertise provided by AWS.
To build a solution for your own needs, get the code and artifacts used in this project from the GitHub repo. If you want more practice using Edge Manager, check out the end-to-end workshop for Edge Manager on Studio.
About the Author
Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has been working on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.