AWS Machine Learning Blog
Detect abnormal equipment behavior and review predictions using Amazon Lookout for Equipment and Amazon A2I
Companies that operate and maintain a broad range of industrial machinery such as generators, compressors, and turbines are constantly working to improve operational efficiency and avoid unplanned downtime due to component failure. They invest heavily in physical sensors (tags), data connectivity, data storage, and data visualization to monitor the condition of their equipment and get real-time alerts for predictive maintenance.
With machine learning (ML), more powerful technologies have become available that can provide data-driven models that learn from an equipment’s historical data. However, implementing such ML solutions is time-consuming and expensive because it involves managing and setting up complex infrastructure and having the right ML skills. Furthermore, ML applications need human oversight to ensure accuracy with sensitive data, help provide continuous improvements, and retrain models with updated predictions. However, you’re often forced to choose between an ML-only or human-only system. Companies are looking for the best of both worlds—integrating ML systems into your workflow while keeping a human eye on the results to achieve higher precision.
In this post, we show you how you can set up Amazon Lookout for Equipment to train an abnormal behavior detection model using a wind turbine dataset for predictive maintenance, use a human in the loop workflow to review the predictions using Amazon Augmented AI (Amazon A2I), and augment the dataset and retrain the model.
Amazon Lookout for Equipment analyzes the data from your sensors, such as pressure, flow rate, RPMs, temperature, and power, to automatically train a specific ML model based on your data, for your equipment, with no ML expertise required. Amazon Lookout for Equipment uses your unique ML model to analyze incoming sensor data in near-real time and accurately identify early warning signs that could lead to machine failures. This means you can detect equipment abnormalities with speed and precision, quickly diagnose issues, take action to reduce expensive downtime, and reduce false alerts.
Amazon A2I is an ML service that makes it easy to build the workflows required for human review. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers, whether running on AWS or not.
To get started with Amazon Lookout for Equipment, we create a dataset, ingest data, train a model, and run inference by setting up a scheduler. After going through these steps, we show you how you can quickly set up a human review process using Amazon A2I and retrain your model with augmented or human reviewed datasets.
In the accompanying Jupyter notebook, we walk you through the following steps:
- Create a dataset in Amazon Lookout for Equipment.
- Ingest data into the Amazon Lookout for Equipment dataset.
- Train a model in Amazon Lookout for Equipment.
- Run diagnostics on the trained model.
- Create an inference scheduler in Amazon Lookout for Equipment to send a simulated stream of real-time requests.
- Set up an Amazon A2I private human loop and review the predictions from Amazon Lookout for Equipment.
- Retrain your model based on augmented datasets from Amazon A2I.
The following diagram illustrates our solution architecture.
The workflow contains the following steps:
- The architecture assumes that the inference pipeline is built and sensor data is periodically stored in the S3 path for inference inputs. These inputs are stored in CSV format with corresponding timestamps in the file name.
- Amazon Lookout for Equipment wakes up at a prescribed frequency and processes the most recent file from the inference inputs Amazon Simple Storage Service (Amazon S3) path.
- Inference results are stored in the inference outputs S3 path in JSON lines file format. The outputs also contain event diagnostics, which are used for root cause analysis.
- When Amazon Lookout for Equipment detects an anomaly, the inference input and outputs are presented to the private workforce for validation via Amazon A2I.
- A private workforce investigates and validates the detected anomalies and provides new anomaly labels. These labels are stored in a new S3 path.
- Training data is also updated, along with the corresponding new labels, and is staged for subsequent model retraining.
- After enough new labels are collected, a new Amazon Lookout for Equipment model is created, trained, and deployed. The retraining cycle can be repeated for continuous model retraining.
Before you get started, complete the following steps to set up the Jupyter notebook:
Make sure your SageMaker notebook has the necessary AWS Identity and Access Management (IAM) roles and permissions mentioned in the prerequisite section of the notebook.
- When the notebook is active, choose Open Jupyter.
- On the Jupyter dashboard, choose New, and choose Terminal.
- In the terminal, enter the following code:
- First run the data preparation notebook –
- Then open the notebook for this blog –
You’re now ready to run the following steps through the notebook cells. Run the setup environment step to set up the necessary Python SDKs and libraries that we use throughout the notebook.
- Provide an AWS Region, create an S3 bucket, and provide details of the bucket in the following code cell:
Analyze the dataset and create component metadata
In this section, we walk you through how you can preprocess the existing wind turbine data and ingest it for Amazon Lookout for Equipment. Please make sure to run the data preparation notebook prior to running the accompanying notebook for the blog to follow through all the steps in this post. You need a data schema for using your existing historical data with Amazon Lookout for Equipment. The data schema tells Amazon Lookout for Equipment what the data means. Because a data schema describes the data, its structure mirrors that of the data files of the components it describes.
All components must be described in the data schema. The data for each component is contained in a separate CSV file structured as shown in the data schema.
You store the data for each asset’s component in a separate CSV file using the following folder structure:
Go to the notebook section Pre-process and Load Datasets and run the following cell to inspect the data:
The following screenshot shows our output.
Now we create components map to create a dataset expected by Amazon Lookout for Equipment for ingest. Run the notebook cells under the section Create the Dataset Component Map to create a component map and generate a CSV file for ingest.
Create the Amazon Lookout for Equipment dataset
We use Amazon Lookout for Equipment Create Dataset APIs to create a dataset and provide the component map we created in the previous step as an input. Run the following notebook cell to create a dataset:
You get the following output:
Alternatively, you can go to the Amazon Lookout for Equipment console to view the dataset.
You can choose View under Data schema to view the schema of the dataset. You can choose Ingest new data to start ingesting data through the console, or you can use the APIs shown in the notebook to do the same using Python Boto3 APIs.
Run the notebook cells to ingest the data. When ingestion is complete, you get the following response:
Now that we have preprocessed the data and ingested the data into Amazon Lookout for Equipment, we move on to the training steps. Alternatively, you can choose Ingest data.
Label your dataset using the SageMaker labeling workforce
If you don’t have an existing labeled dataset available to directly use with Amazon Lookout for Equipment, create a custom labeling workflow. This may be relevant in a use case in which, for example, a company wants to build a remote operating facility where alerts from various operations are sent to the central facility for the SMEs to review and update. For a sample crowd HTML template for your labeling UI, refer to our GitHub repository.
The following screenshot shows an example of what the sample labeling UI looks like.
For this post, we use the labels that came with the dataset for training. If you want to use the label file you created for your actual training in the next step, you need to copy the label file to an S3 bucket and provide the location in the training configuration.
Create a model in Amazon Lookout for Equipment
We walk you through the following steps in this section:
- Prepare the model parameters and split data into test and train sets
- Train the model using Amazon Lookout for Equipment APIs
- Get diagnostics for the trained model
Prepare the model parameters and split the data
In this step, we split the datasets into test and train, prepare labels, and start the training using the notebook. Run the notebook code Split train and test to split the dataset into an 80/20 split for training and testing, respectively. Then run the prepare labels code and move on to setting up training config, as shown in the following code:
In the preceding code, we set up model training parameters such as time periods, label data, and target sampling rate for our model. For more information about these parameters, see CreateModel.
After setting these model parameters, you need to run the following train model API to start training your model with your dataset and the training parameters:
You get the following response:
Alternatively, you can go to the Amazon Lookout for Equipment console and monitor the training after you create the model.
The sample turbine dataset we provide in our example has millions of data points. Training takes approximately 2.5 hours.
Evaluate the trained model
After a model is trained, Amazon Lookout for Equipment evaluates its performance and displays the results. It provides an overview of the performance and detailed information about the abnormal equipment behavior events and how well the model performed when detecting those. With the data and failure labels that you provided for training and evaluating the model, Amazon Lookout for Equipment reports how many times the model’s predictions were true positives. It also reports the average forewarning time across all true positives. Additionally, it reports the false positive results generated by the model, along with the duration of the non-event.
For more information about performance evaluation, see Evaluating the output.
Review training diagnostics
Run the following code to generate the training diagnostics. Refer to the accompanying notebook for the complete code block to run for this step.
The results returned show the percentage contribution of each feature towards the abnormal equipment prediction for the corresponding date range.
Create an inference scheduler in Amazon Lookout for Equipment
In this step, we show you how the CreateInferenceScheduler API creates a scheduler and starts it—this starts costing you right away. Scheduling an inference is setting up a continuous real-time inference plan to analyze new measurement data. When setting up the scheduler, you provide an S3 bucket location for the input data, assign it a delimiter between separate entries in the data, set an offset delay if desired, and set the frequency of inference. You must also provide an S3 bucket location for the output data. Run the following notebook section to run inference on the model to create an inference scheduler:
After you create an inference scheduler, the next step is to create some sample datasets for inference.
Prepare the inference data
Run through the notebook steps to prepare the inference data. Let’s load the tags description; this dataset comes with a data description file. From here, we can collect the list of components (subsystem column) if required. We use the tag metadata from the data descriptions as a point of reference for our interpretation. We use the tag names to construct a list that Amazon A2I uses. For more details, refer to the section Set up Amazon A2I to review predictions from Amazon Lookout for Equipment in this post.
To build our sample inference dataset, we extract the last 2 hours of data from the evaluation period of the original time series. Specifically, we create three CSV files containing simulated real-time tags for our turbine 10 minutes apart. These are all stored in Amazon S3 in the
inference-a2i folder. Now that we’ve prepared the data, create the scheduler by running the following code:
You get the following response:
Alternatively, on the Amazon Lookout for Equipment console, go to the Inference schedule settings section of your trained model and set up a scheduler by providing the necessary parameters.
Get inference results
Run through the notebook steps List inference executions to get the run details from the schedule you created in the previous step. Wait 5–15 minutes for the scheduler to run its first inference. When it’s complete, we can use the ListInferenceExecution API for our current inference scheduler. The only mandatory parameter is the scheduler name.
You can also choose a time period for which you want to query inference runs. If you don’t specify it, all runs for an inference scheduler are listed. If you want to specify the time range, you can use the following code:
This code means that the runs after
2010-01-03 00:00:00 and before
2010-01-05 00:00:00 are listed.
You can also choose to query for runs in a particular status, such as
You get the following response:
Get actual prediction results
After each successful inference, a JSON file is created in the output location of your bucket. Each inference creates a new folder with a single
results.jsonl file in it. You can run through this section in the notebook to read these files and display their content.
The following screenshot shows the results.
Stop the inference scheduler
Make sure to stop the inference scheduler; we don’t need it for the rest of the steps in this post. However, as part of your solution, the inference scheduler should be running to ensure real-time inference for your equipment continues. Run through this notebook section to stop the inference scheduler.
Set up Amazon A2I to review predictions from Amazon Lookout for Equipment
Now that inference is complete, let’s understand how to set up a UI to review the inference results and update it, so we can send it back to Amazon Lookout for Equipment for retraining the model. In this section, we show how to use the Amazon A2I custom task type to integrate with Amazon Lookout for Equipment through the walkthrough notebook to set up a human in the loop process. It includes the following steps:
- Create a human task UI
- Create a workflow definition
- Send predictions to Amazon A2I human loops
- Sign in to the worker portal and annotate Amazon Lookout for Equipment inference predictions
Follow the steps provided in the notebook to initialize Amazon A2I APIs. Make sure to set up the bucket name in the initialization block where you want your Amazon A2I output:
You also need to create a private workforce and provide a work team ARN in the initialize step.
On the SageMaker console, create a private workforce. After you create the private workforce, find the workforce ARN and enter the ARN in the notebook:
Create the human task UI
You now create a human task UI resource, giving a UI template in liquid HTML. You can download the provided template and customize it. This template is rendered to the human workers whenever a human loop is required. For over 70 pre-built UIs, see the amazon-a2i-sample-task-uis GitHub repo. We also provide this template in our GitHub repo.
You can use this template to create a task UI either via the console or by running the following code in the notebook:
Create a human review workflow definition
Workflow definitions allow you to specify the following:
- The worker template or human task UI you created in the previous step.
- The workforce that your tasks are sent to. For this post, it’s the private workforce you created in the prerequisite steps.
- The instructions that your workforce receives.
This post uses the
Create Flow Definition API to create a workflow definition. Run the following cell in the notebook:
Send predictions to Amazon A2I human loops
We create an item list from the Pandas DataFrame where we have the Amazon Lookout for Equipement output saved. Run the following notebook cell to create a list of items to send for review:
Run the following code to create a JSON input for the Amazon A2I loop. This contains the lists that are sent as input to the Amazon A2I UI displayed to the human reviewers.
Run the following notebook cell to call the Amazon A2I API to start the human loop:
You can check the status of human loop by running the next cell in the notebook.
Annotate the results via the worker portal
Run the following notebook cell to get a login link to navigate to the private workforce portal:
You’re redirected to the Amazon A2I console. Select the human review job and choose Start working. After you review the changes and make corrections, choose Submit.
You can evaluate the results store in Amazon S3.
Evaluate the results
When the labeling work is complete, your results should be available in the S3 output path specified in the human review workflow definition. The human answers are returned and saved in the JSON file. Run the notebook cell to get the results from Amazon S3:
You get a response with human reviewed answers and flow-definition. Refer to the notebook to get the complete response.
Model retraining based on augmented datasets from Amazon A2I
Now we take the Amazon A2I output, process it, and send it back to Amazon Lookout for Equipment to retrain our model based on the human corrections. Refer to the accompanying notebook for all the steps to complete in this section. Let’s look at the last few entries of our original label file:
The following screenshot shows the labels file.
Update labels with new date ranges
Now let’s update our existing labels dataset with the new labels we received from the Amazon A2I human review process:
You get the following response:
The following screenshot shows the updated labels file.
Let’s upload the updated labels data to a new augmented labels file:
Update the training dataset with new measurements
We now update our original training dataset with the new measurement range based on what we got back from Amazon A2I. Run the following code to load the original dataset to a new DataFrame that we use to append our augmented data. Refer to the accompanying notebook for all the steps required.
The following screenshot shows our original training dataset snapshot.
Now we use the updated training dataset with the simulated inference data we created earlier, in which the human reviewers indicated that they found faulty equipment when running the inference. Run the following code to modify the index of the simulated inference dataset to reflect a 10-minute duration for each reading:
Run the following code to append the simulated inference dataset to the original training dataset:
The simulated inference data with the recent timestamp is appended to the end of the training dataset. Now let’s create a CSV file and copy the data to the training channel in Amazon S3:
Now we update the components map with this augmented dataset, reload the data into Amazon Lookout for Equipment, and retrain this training model with this dataset. Refer to the accompanying notebook for the detailed steps to retrain the model.
In this post, we walked you through how to use Amazon Lookout for Equipment to train a model to detect abnormal equipment behavior with a wind turbine dataset, review diagnostics from the trained model, review the predictions from the model with a human in the loop using Amazon A2I, augment our original training dataset, and retrain our model with the feedback from the human reviews.
With Amazon Lookout for Equipment and Amazon A2I, you can set up a continuous prediction, review, train, and feedback loop to audit predictions and improve the accuracy of your models.
Please let us know what you think of this solution and how it applies to your industrial use case. Check out the GitHub repo for full resources to this post. Visit the webpages to learn more about Amazon Lookout for Equipment and Amazon Augmented AI. We look forward to hearing from you. Happy experimentation!
About the Authors
Dastan Aitzhanov is a Solutions Architect in Applied AI with Amazon Web Services. He specializes in architecting and building scalable cloud-based platforms with an emphasis on machine learning, internet of things, and big data-driven applications. When not working, he enjoys going camping, skiing, and spending time in the great outdoors with his family
Prem Ranga is an Enterprise Solutions Architect based out of Atlanta, GA. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an Autonomous Vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.
Mona Mona is a Senior AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with public sector customers, and helps them adopt machine learning on a large scale. She is passionate about NLP and ML explainability areas in AI/ML.
Baris Yasin is a Solutions Architect at AWS. He’s passionate about AI/ML & Analytics technologies and helping startup customers solve challenging business and technical problems with AWS.