The Internet of Things on AWS – Official Blog

Asset Maintenance with AWS IoT services – Predict and respond to potential failures before they impact your business

AWS IoT customers manage a large number of industrial assets that send sensor data to the cloud. The smooth operation of these assets is critical to plant productivity, because any equipment breakdown can cause unexpected downtime and require expensive recovery maintenance. The ability to predict such failures and respond to them in a timely manner can help industrial users improve operational efficiency and total uptime. In this post, we describe how AWS IoT users can use AWS IoT Events to deploy an integrated asset maintenance solution that combines sensor data with machine learning-based predictive models to respond to potential failures.

AWS IoT Events for asset maintenance

AWS IoT Events is a fully managed service that makes it easy to detect and respond to events from IoT sensors or applications. Once an event has been detected, AWS IoT Events can trigger actions in other AWS services. For example, you can use AWS IoT Events to combine multiple sources of telemetry data, inferences, and external data to determine whether an asset requires maintenance. Then, you can automatically trigger a notification through Amazon Simple Notification Service (SNS) to inform an operator on the plant floor about the required actions.

Overview of asset maintenance strategies

An effective maintenance program allows industrial companies to reduce the likelihood of equipment failure. To achieve this, a typical solution combines multiple approaches to monitoring assets. These approaches differ in the number and variety of parameters they analyze, and each corresponds to a “maintenance maturity level”. Here are three common maturity levels that our customers consider:

  • Schedule-based maintenance refers to maintenance performed at periodic intervals, typically defined by the original equipment manufacturer (OEM). For example: a pump has exceeded 100 hours of operation.
  • Condition-based maintenance refers to maintenance performed when predefined metric thresholds are breached. These thresholds can be defined by the operator or by the OEM. For example: the pumping temperature of the fluid exceeds 200 °C.
  • Predictive (AI-based) maintenance refers to maintenance activities that are triggered when there is a high likelihood of failure. The likelihood of failure can be determined by applying a machine learning algorithm to data from the sensors and from other third-party systems, such as ERP systems or weather data. For example: the probability that the pump fails in the next 24 hours exceeds 50%.
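To make the three levels concrete, the following Python sketch checks a single pump snapshot against each trigger, using the example thresholds from the list above (100 operating hours, 200 °C, 50% failure probability). The `PumpSnapshot` type and `maintenance_triggers` function are hypothetical names for illustration; in practice each threshold would come from the OEM, the operator, or your own failure model.

```python
from dataclasses import dataclass
from typing import List

# Example thresholds from the list above; real values come from the OEM or operator.
SERVICE_INTERVAL_HOURS = 100.0
MAX_FLUID_TEMP_C = 200.0
FAILURE_PROBABILITY_THRESHOLD = 0.5


@dataclass
class PumpSnapshot:
    """Hypothetical snapshot of one pump's condition."""
    hours_since_service: float
    fluid_temp_c: float
    failure_probability_24h: float  # ML model output between 0 and 1


def maintenance_triggers(s: PumpSnapshot) -> List[str]:
    """Return the maintenance maturity levels whose trigger fires for this snapshot."""
    triggers = []
    if s.hours_since_service > SERVICE_INTERVAL_HOURS:
        triggers.append("schedule-based")
    if s.fluid_temp_c > MAX_FLUID_TEMP_C:
        triggers.append("condition-based")
    if s.failure_probability_24h > FAILURE_PROBABILITY_THRESHOLD:
        triggers.append("predictive")
    return triggers
```

For example, `maintenance_triggers(PumpSnapshot(120, 180, 0.7))` returns `["schedule-based", "predictive"]`: the pump is past its service interval and the model predicts a likely failure, but the temperature is still within limits.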

The maintenance maturity level for an asset is based on 3 factors – data availability, accuracy of failure models, and cost of developing and maintaining an asset maintenance solution. A predictive maintenance solution, which requires continuous data gathering and intelligent insights, is ideal for critical assets where timely actions have a significant business impact.


Let us walk through how you can build a unified asset maintenance solution using AWS IoT services for all the maintenance maturity levels described above.

Consider a scenario where a motor is pumping coolant fluid for a critical process. The motor and pump housing can have several sensors that capture metrics such as temperature and pressure at various points, as well as flow rate. If you can model the likelihood of failure from the behavior observed before past failures, these inputs can be used to predict failure and take action. The solution here is designed to send a push notification to the operator when the pump requires maintenance. This could be:

  • a scheduled maintenance required when the pump has been running for a fixed number of hours
  • an unscheduled maintenance triggered by an abnormally high temperature for a defined period of time
  • an unscheduled maintenance triggered when the pump has a high likelihood of failure, determined by the ML model

There are four steps in building a cloud-based solution for this pump.

  1. Collect and ingest data into the AWS Cloud
  2. Build a predictive machine learning model using historical data (cold path data flow)
  3. Predict failure with real time sensor data (hot path data flow)
  4. Identify state of asset and take action

Reference architecture for Predictive Maintenance using AWS IoT services

Step 1: Collect and ingest data into AWS cloud

Data is collected from industrial equipment (in this example, coolant pumps) through sensors. Sensor data can be transmitted from equipment and ingested into your AWS cloud in a number of ways. Some common choices include: (1) Direct ingest to AWS IoT Core, (2) Ingest to AWS IoT Core via AWS IoT Greengrass core running on a qualified gateway device, (3) Ingest via AWS IoT SiteWise Gateway. For this solution, we assume that the data has been ingested via AWS IoT Core.

Note: For more information on ingesting data via AWS IoT Greengrass, read our blog post on Asset Condition Monitoring.
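As a minimal sketch of option (1), a test script could publish a JSON reading to AWS IoT Core with the `publish` operation of the boto3 `iot-data` client. The topic layout `pumps/<motorID>/telemetry` and the payload fields (`motorID`, `speed`, `temperature`, `timestamp`) are assumptions for illustration; production devices usually publish over MQTT with an AWS IoT Device SDK rather than through boto3.

```python
import json
import time
from typing import Any, Dict


def build_reading(motor_id: str, speed_rpm: float, temperature_c: float) -> Dict[str, Any]:
    """Build a hypothetical telemetry message for one coolant pump."""
    return {
        "motorID": motor_id,
        "speed": speed_rpm,
        "temperature": temperature_c,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    }


def publish_reading(iot_data_client, reading: Dict[str, Any]) -> str:
    """Publish a reading to AWS IoT Core and return the topic used.

    iot_data_client is expected to be boto3.client("iot-data"); the topic
    layout pumps/<motorID>/telemetry is an assumption, not an AWS convention.
    """
    topic = f"pumps/{reading['motorID']}/telemetry"
    iot_data_client.publish(topic=topic, qos=1, payload=json.dumps(reading))
    return topic
```

Usage might look like `publish_reading(boto3.client("iot-data"), build_reading("pump-01", 80, 65))`. Depending on your account and Region, you may need to create the client with `endpoint_url` set to your account's data endpoint, which `boto3.client("iot").describe_endpoint(endpointType="iot:Data-ATS")` returns.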

Step 2: Build a machine learning model

After you ingest data, you can send aggregated historical data into a separate pipeline to build and tune your machine learning model using Amazon SageMaker. We assume you have already set up Amazon SageMaker to tune your model, and we will not cover it here beyond showing where it sits in the reference architecture (cold path data flow). The output of this step is a machine learning model that predicts, for a given set of sensor readings, whether the coolant pump will fail in the next 24 hours.

Note: To learn more about how to build and deploy a machine learning model, the Predictive Maintenance using Machine Learning solution on the AWS solutions page demonstrates how to use SageMaker with an example dataset.
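Training in SageMaker is out of scope here, but one detail worth showing is how historical readings can be labeled for a "fails within the next 24 hours" target. The plain-Python sketch below marks each reading 1 if a recorded failure occurs within the following 24 hours and 0 otherwise. The data layout (timestamps in hours, with failure times kept in a separate sorted list) is an assumption; your actual labeling rules depend on how failures are recorded in your maintenance history.

```python
from bisect import bisect_right
from typing import List

HORIZON_HOURS = 24.0


def label_readings(reading_times: List[float], failure_times: List[float],
                   horizon: float = HORIZON_HOURS) -> List[int]:
    """Label each reading 1 if a failure happens in (t, t + horizon], else 0.

    reading_times and failure_times are timestamps in hours; failure_times
    must be sorted in ascending order.
    """
    labels = []
    for t in reading_times:
        # Index of the first recorded failure strictly after time t.
        i = bisect_right(failure_times, t)
        next_failure = failure_times[i] if i < len(failure_times) else None
        labels.append(1 if next_failure is not None and next_failure - t <= horizon else 0)
    return labels
```

With a single failure at hour 50, readings at hours 26, 30, and 49 are labeled 1, while readings at hours 0, 10, and 50 are labeled 0. The binary search keeps labeling at O(log n) per reading, which matters when the history covers many failures.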

Step 3: Predict failure in real time

To predict failure on real time data, you need to call the machine learning model that you have built in Step 2. To do this:

a. Through the AWS Lambda console, create an AWS Lambda function which can call the SageMaker endpoint for each message received from the motor (for more details on the Lambda function, refer to the AWS Machine Learning blog post: Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda)
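A minimal sketch of such a Lambda handler follows. It uses the `invoke_endpoint` operation of the boto3 `sagemaker-runtime` client. The endpoint name, the `ENDPOINT_NAME` environment variable, the CSV feature order (`speed`, `temperature`, `pressure`, `flowRate`), and the assumption that the model returns a single score as text are all hypothetical, and they must match how your model was trained and deployed.

```python
import os

# Hypothetical configuration: match these to your own endpoint and training data.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "coolant-pump-failure-model")
FEATURES = ["speed", "temperature", "pressure", "flowRate"]

_runtime = None  # boto3 client, created lazily and reused across warm invocations


def to_csv_row(message: dict) -> str:
    """Serialize sensor readings in the (assumed) feature order used for training."""
    return ",".join(str(float(message[name])) for name in FEATURES)


def predict(message: dict, runtime_client) -> float:
    """Call the SageMaker endpoint and return the model's failure score."""
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=to_csv_row(message),
    )
    # Assumes the model responds with a single number, e.g. b"0.83".
    return float(response["Body"].read().decode("utf-8").strip())


def lambda_handler(event, context):
    """Entry point: `event` is the pump message forwarded by the AWS IoT rule."""
    global _runtime
    if _runtime is None:
        import boto3
        _runtime = boto3.client("sagemaker-runtime")
    return {"motorID": event["motorID"], "prediction": predict(event, _runtime)}
```

Creating the client once and reusing it avoids paying the client setup cost on every invocation of a warm Lambda container.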

b. Create two AWS IoT Events inputs (each input is a stream of data), which we will use later to define an event detector model. Input 1 (InputRawSensorData) will be used for the raw sensor data, whereas Input 2 (InputMLInference) will be used to handle machine learning inferences.

Creating an input in AWS IoT Events
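If you prefer to script this step, the boto3 `iotevents` client provides a `create_input` operation that takes an input name and a list of JSON attribute paths. The attribute names below (`motorID`, `speed`, `temperature`, `prediction`) are assumptions chosen to match the messages used later in this post. The sketch builds the request parameters separately so you can inspect them before sending.

```python
from typing import Dict, List


def input_request(name: str, attributes: List[str], description: str) -> Dict:
    """Build keyword arguments for iotevents.create_input."""
    return {
        "inputName": name,
        "inputDescription": description,
        "inputDefinition": {"attributes": [{"jsonPath": a} for a in attributes]},
    }


# Hypothetical attribute lists for the two inputs described above.
REQUESTS = [
    input_request("InputRawSensorData", ["motorID", "speed", "temperature"],
                  "Raw sensor readings from the coolant pump"),
    input_request("InputMLInference", ["motorID", "prediction"],
                  "Failure predictions returned by the SageMaker model"),
]


def create_inputs(iotevents_client) -> None:
    """Create both inputs; iotevents_client is expected to be boto3.client("iotevents")."""
    for request in REQUESTS:
        iotevents_client.create_input(**request)
```

Including `motorID` in both inputs lets the detector model use it as a key, so raw readings and inferences for the same pump reach the same detector instance.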

c. Create a rule in AWS IoT Core to trigger 2 actions (to AWS Lambda and AWS IoT Events) when it receives real time messages from the motor. The first action triggers a Lambda function to compute the machine learning inference and pass the inference to AWS IoT Events for predictive maintenance. The second action sends the raw sensor data directly to AWS IoT Events for schedule and condition based maintenance.

You can do this from the AWS IoT Core console by clicking on Act in the left navigation. Then click on Create a rule.

Go to AWS IoT and click on Create a rule (under Act > Rules in the left navigation)

Click on Add action to configure the rule actions in AWS IoT Core to send raw sensor data to the AWS IoT Events detector model
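The same rule can also be expressed as the `topicRulePayload` accepted by the `create_topic_rule` operation of the boto3 `iot` client. In this sketch, the topic filter, the Lambda function ARN, and the IAM role ARN are placeholders. The role must allow AWS IoT Core to call `iotevents:BatchPutMessage`, and the Lambda function needs a resource-based permission that lets AWS IoT invoke it.

```python
from typing import Dict

# Placeholder ARNs: replace with the ARNs from your own account.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:pump-inference"
IOT_EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/iot-to-iotevents"


def pump_rule_payload(topic_filter: str = "pumps/+/telemetry") -> Dict:
    """Build a topicRulePayload that fans each pump message out to two actions."""
    return {
        "sql": f"SELECT * FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            # Action 1: the Lambda function calls SageMaker and forwards the inference.
            {"lambda": {"functionArn": LAMBDA_ARN}},
            # Action 2: raw readings go straight to AWS IoT Events.
            {"iotEvents": {"inputName": "InputRawSensorData",
                           "roleArn": IOT_EVENTS_ROLE_ARN}},
        ],
    }
```

You would then call something like `boto3.client("iot").create_topic_rule(ruleName="pump_telemetry", topicRulePayload=pump_rule_payload())`. The rule name here is hypothetical and may contain only letters, digits, and underscores.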

d. Go back to the Lambda function that you created earlier, and use the BatchPutMessage API to send machine learning inferences to the AWS IoT Events detector model (with the inputName defined in AWS IoT Events). The detector model uses this input, along with the raw sensor data (the input from step c), to determine the state of the coolant pump.

import json, uuid
import boto3
iotevents_data = boto3.client("iotevents-data")
iotevents_data.batch_put_message(messages=[{
        "inputName": "InputMLInference",
        "messageId": str(uuid.uuid4()),  # must be unique within each batch
        "payload": json.dumps({'motorID': motorID, 'prediction': value})  # motorID, value from the model call
}])

Now you’re ready to ingest a test message from the coolant pump and run your machine learning model against it to compute the likelihood for pump failure!

Step 4: Identify state of the asset and take action

Now you can use the two inputs from Step 3 in AWS IoT Events to determine the state of your coolant pump. AWS IoT Events provides a drag-and-drop interface for creating a detector model. The detector model defines the states to be monitored, the conditions to be evaluated, and the actions to be triggered for each event.

Create a detector model using the drag and drop canvas on the AWS IoT Events console

You can create “Transition events” to define how your asset moves from one state to another. For example, when the sensors detect that the speed of the pump is greater than 10 rpm (input from sensor TT01), the pump automatically moves from “motor_off” to “motor_on”. Just drag the cursor from one state to another to create a transition, and define the logic on the right pane.

Define a transition event in the AWS IoT Events canvas with simple conditional expressions
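In the JSON form of a detector model, which is what the console exports, this transition is a `transitionEvents` entry inside the `onInput` block of the source state. The fragment below builds it as a Python dict. The attribute name `speed` is an assumption about how the raw sensor input was defined, and the reverse `motor_on` to `motor_off` transition is added here for illustration only. `$input.<InputName>.<attribute>` is the expression syntax AWS IoT Events uses to reference input attributes.

```python
import json

# Hypothetical fragment of a detector model definition (two states only).
motor_off_state = {
    "stateName": "motor_off",
    "onInput": {
        "events": [],
        "transitionEvents": [
            {
                "eventName": "motor_started",
                # Fires when a raw reading reports a speed above 10 rpm.
                "condition": "$input.InputRawSensorData.speed > 10",
                "actions": [],
                "nextState": "motor_on",
            }
        ],
    },
}

motor_on_state = {
    "stateName": "motor_on",
    "onInput": {
        "events": [],
        "transitionEvents": [
            {
                "eventName": "motor_stopped",
                "condition": "$input.InputRawSensorData.speed <= 10",
                "actions": [],
                "nextState": "motor_off",
            }
        ],
    },
}

definition = {"states": [motor_off_state, motor_on_state], "initialStateName": "motor_off"}
print(json.dumps(definition, indent=2))
```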

When the detector model evaluates a new message from the coolant pump, event actions specify what should happen in each state. Within each state, you can define event actions for when the pump enters the state (OnEnter), when it receives an input and remains in the same state (OnInput), and when it exits the state (OnExit). In this example, AWS IoT Events sends a message to Amazon Simple Notification Service (SNS) when the pump enters a maintenance state such as “scheduled_service”. Users who need to be notified about the state change can subscribe to this SNS topic.

Define an action to notify an SNS topic when an event is detected
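In the exported JSON, this notification is an `onEnter` event whose action list contains an `sns` action with a `targetArn`. In the sketch below, the topic ARN is a placeholder, and sending a fixed string as the optional custom `payload` is an assumption about what an operator would want to read. The `notify_on_enter` helper is a hypothetical name for illustration.

```python
# Placeholder topic ARN; operators subscribe to this topic to receive alerts.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pump-maintenance"


def notify_on_enter(state_name: str, message: str) -> dict:
    """Build a detector-model state whose onEnter event publishes `message` to SNS."""
    if "'" in message:
        # The message is wrapped in single quotes as an expression string literal.
        raise ValueError("message must not contain single quotes")
    return {
        "stateName": state_name,
        "onEnter": {
            "events": [
                {
                    "eventName": f"notify_{state_name}",
                    "condition": "true",  # run every time the state is entered
                    "actions": [
                        {
                            "sns": {
                                "targetArn": SNS_TOPIC_ARN,
                                "payload": {"contentExpression": f"'{message}'", "type": "STRING"},
                            }
                        }
                    ],
                }
            ]
        },
    }


scheduled_service = notify_on_enter("scheduled_service", "Coolant pump requires scheduled service")
```

Putting the action in `onEnter` rather than `onInput` sends one notification per visit to the maintenance state instead of one per incoming message.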

The screenshot below is a sample detector model for the pump maintenance use case. You can download the detector model as a JSON object and import it into your own AWS account using the import detector model option in the console. For scheduled maintenance, we have defined a variable to track the total runtime of the pump. When the runtime exceeds 100 hours, the pump automatically moves to the “scheduled_service” state, and a push notification is sent to a technician or a ticketing system via Amazon SNS. For condition-based thresholds, you can use simple expressions on sensor inputs to trigger a state transition. In this example, the pump moves to the “high_temperature” state when the temperature threshold is breached continuously for 5 minutes, or for 3 consecutive readings (alarm on-delay). You can also define a third maintenance state, “likely_failure_24hrs”, which the pump enters when the predicted probability of failure in the next 24 hours exceeds your decision threshold. When the operator completes the service, you can send another input to the detector model to change the state.
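Before building these rules on the canvas, it can help to simulate them locally. The sketch below is a simplified plain-Python model of the logic described above, not AWS IoT Events itself. It accumulates runtime while the motor is on, applies a 3-consecutive-readings on-delay to temperature, and compares predictions with a decision threshold. The 80 °C temperature limit, the 0.5 decision threshold, the order in which the rules are checked, and the caller-supplied `hours_elapsed` per reading are all assumptions.

```python
from dataclasses import dataclass

RUNTIME_LIMIT_HOURS = 100.0  # scheduled service interval from the text
TEMP_LIMIT_C = 80.0          # hypothetical temperature threshold
ON_DELAY_READINGS = 3        # consecutive high readings before alarming
DECISION_THRESHOLD = 0.5     # hypothetical threshold for the ML prediction
MAINTENANCE_STATES = ("scheduled_service", "high_temperature", "likely_failure_24hrs")


@dataclass
class PumpDetector:
    """Simplified local model of the pump detector's state and variables."""
    state: str = "motor_off"
    runtime_hours: float = 0.0
    high_temp_count: int = 0

    def on_sensor_reading(self, speed_rpm: float, temperature_c: float, hours_elapsed: float) -> str:
        if self.state in MAINTENANCE_STATES:
            return self.state  # stays put until the operator reports the service
        running = speed_rpm > 10
        self.state = "motor_on" if running else "motor_off"
        if running:
            self.runtime_hours += hours_elapsed
        # Alarm on-delay: count consecutive readings above the limit.
        self.high_temp_count = self.high_temp_count + 1 if temperature_c > TEMP_LIMIT_C else 0
        if self.runtime_hours > RUNTIME_LIMIT_HOURS:
            self.state = "scheduled_service"
        elif self.high_temp_count >= ON_DELAY_READINGS:
            self.state = "high_temperature"
        return self.state

    def on_prediction(self, prediction: float) -> str:
        if prediction > DECISION_THRESHOLD and self.state not in MAINTENANCE_STATES:
            self.state = "likely_failure_24hrs"
        return self.state

    def service_completed(self) -> str:
        """Operator input after maintenance: reset variables and return to motor_off."""
        self.state, self.runtime_hours, self.high_temp_count = "motor_off", 0.0, 0
        return self.state
```

Feeding three consecutive readings above 80 °C moves the simulated pump to `high_temperature`, while a single hot reading followed by a normal one resets the counter. This is the behavior the on-delay is meant to provide.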

Testing your detector model in AWS IoT Events

Once you have created all inputs and published your detector model, you can send sample messages through the AWS IoT Events console to test the detector model and trigger state transitions and actions. Click on Send data and select Send sample data. Then, choose the input name and enter the attributes you want to send. The first two messages will move the pump to the “motor_off” state. Now you can send sensor data and see the state of the coolant pump change!

Here is a preview of a sample message from the pump indicating that the speed is 80 rpm and the temperature is 65 degrees Celsius.

Open the detector model in the AWS IoT Events console and click on Send data to send sample data

The current state of all detectors is displayed in the console. You can test sending another inference input (prediction = 1) to see the state of the pump change to “likely_failure_24hrs”.

Prediction inputs can change the current state of the detector to “likely_failure_24hrs”
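Besides the console, you can read a detector's current state with the `describe_detector` operation of the boto3 `iotevents-data` client. Messages sent with BatchPutMessage are processed asynchronously, so a test script may need to poll briefly. The sketch below assumes a detector model named `PumpMaintenance` that is keyed on `motorID`; both names, and the timeout and polling interval, are hypothetical.

```python
import time


def wait_for_state(client, motor_id: str, expected: str,
                   model_name: str = "PumpMaintenance",
                   timeout_s: float = 30.0, poll_s: float = 2.0) -> bool:
    """Poll describe_detector until the pump's detector reaches `expected`.

    client is expected to be boto3.client("iotevents-data"). Returns False if
    the state has not been reached when the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = client.describe_detector(detectorModelName=model_name, keyValue=motor_id)
        if response["detector"]["state"]["stateName"] == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

After sending an inference with `prediction = 1` for `pump-01`, a call such as `wait_for_state(boto3.client("iotevents-data"), "pump-01", "likely_failure_24hrs")` would confirm the transition without opening the console.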

AWS IoT Events is ideal for industrial use cases because it scales instantly when you expand this solution to other similar assets. By specifying a different key value in the input message, you can create any number of detector instances (or detectors) to track the state of every asset that follows the same transition path.
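The routing idea behind detector instances can be sketched in a few lines of plain Python. The detector model is created once with a key attribute (the `key` parameter of `create_detector_model`, assumed here to be `motorID`), and each distinct key value in incoming messages gets its own detector instance with independent state. The `KeyedDetectors` class is a hypothetical local illustration of that behavior, not how AWS IoT Events is implemented.

```python
from typing import Callable, Dict, Generic, TypeVar

D = TypeVar("D")


class KeyedDetectors(Generic[D]):
    """Route each message to a per-asset detector instance, created on first use."""

    def __init__(self, key: str, factory: Callable[[], D]):
        self.key = key            # e.g. "motorID", the attribute that identifies an asset
        self.factory = factory    # builds a fresh detector for a newly seen asset
        self.instances: Dict[str, D] = {}

    def route(self, message: dict) -> D:
        key_value = str(message[self.key])
        if key_value not in self.instances:
            self.instances[key_value] = self.factory()
        return self.instances[key_value]
```

Because a new instance is created only the first time a key value appears, adding a pump to the plant requires no change to the detector model. Messages simply start arriving with a new `motorID`.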