AWS Architecture Blog

Field Notes: Building an Industrial Data Platform to Gather Insights from Operational Data

Co-authored with Russell de Pina, former Sr. Partner Solutions Architect at AWS

Manufacturers looking to achieve greater operational efficiency need actionable insights from their operational data. Traditionally, operational technology (OT) and enterprise information technology (IT) have existed in silos, with various manual processes to clean and process data. Leveraging insights at the process level requires converging data across these silos and abstracting the right data for the right use case in a scalable fashion.

The AWS Industrial Data Platform (IDP) facilitates the convergence of enterprise and operational data for gathering insights that lead to overall operational efficiency of the plant. In this blog, we demonstrate a sample AWS IDP architecture along with the steps to gather data across multiple data sources on the shop floor.

Overview of solution

With the IDP Reference Architecture as the backdrop, let us walk through the steps to implement this solution. Let us say we have gas compressors being used in our factory. The Operations Team has identified an anomaly in the wear of the bearings used in some of these compressors. Let us assume that the bearings are supplied by two fictitious manufacturers, ABC and XYZ (all company and vendor references in this post are fictitious and do not represent any real company).

The graphs in Figures 1 and 2 show the expected and actual vibration of the bearings over time. Each point represents a cluster of measurements plotted as a single dot. The Mean Time Between Maintenance (MTBM) specified by the manufacturer for these bearings is five years. The factory has detected that for ABC bearings, the actual vibration at half-life is much higher than expected, as shown. This clearly requires further analysis to identify the root cause of the discrepancy and prevent compressor breakdowns, unplanned downtime, and cost.

Figure 1 – ABC Bearings Anomaly

Figure 2 – XYZ Bearings Vibration

Although the deviation from expected is much smaller for XYZ bearings, it is still a deviation that needs to be understood. This blog provides an overview of how the various AWS services in the IDP architecture help with this analysis. Figure 3 shows the AWS architecture used for this specific use case.

Figure 3 – A simple AWS Industrial Data Platform (IDP) architecture to solve for the bearings anomaly

Tracking Anomaly Data

The sensors on the compressor bearings send the vibration (chatter) data to AWS through AWS IoT Core. Amazon Lookout for Equipment is configured as detailed in the user guide, following the required data formatting guidelines.

The raw sensor data from AWS IoT Core, which arrives in JSON format, is converted by AWS Lambda into CSV with the headers required to match the schema (one schema for each sensor) shown in Figure 7. This conversion is needed because Amazon Lookout for Equipment ingests CSV data. Sample sensor data from the ABC sensor, the AWS Lambda code snippet that converts this JSON to CSV, and the output CSV to be ingested into Lookout for Equipment are shown in Figures 4, 5, and 6, respectively. For detailed steps on how a dataset is created for ingestion into Lookout for Equipment, refer to the user guide.

Figure 4 – Sample Sensor Data from ABC Bearings’ Sensor

import boto3
import botocore
import csv
import json

def lambda_handler(event, context):
    BUCKET_NAME = 'L4EData'
    OUTPUT_KEY = 'csv/ABCSensor.csv'  # OUTPUT FILE
    INPUT_KEY = 'ABCSensor.json'      # INPUT FILE
    s3client = boto3.client('s3')
    s3 = boto3.resource('s3')

    try:
        # Read the raw JSON sensor data from S3
        obj = s3.Object(BUCKET_NAME, INPUT_KEY)
        data = obj.get()['Body'].read().decode('utf-8')
        json_data = json.loads(data)

        # Write one row per sensor reading; the 'timestamp' and
        # 'vibration' field names are assumed to match the payload
        # shown in Figure 4
        output_file_path = "/tmp/data.csv"
        with open(output_file_path, "w", newline="") as file:
            csv_file = csv.writer(file)
            csv_file.writerow(['TIMESTAMP', 'Sensor'])
            for item in json_data:
                csv_file.writerow([item['timestamp'], item['vibration']])

        # Upload the CSV back to S3 for ingestion into
        # Amazon Lookout for Equipment
        csv_binary = open(output_file_path, 'rb').read()
        obj = s3.Object(BUCKET_NAME, OUTPUT_KEY)
        obj.put(Body=csv_binary)

        # Return a pre-signed URL for the generated CSV
        download_url = s3client.generate_presigned_url(
            'get_object',
            Params={
                'Bucket': BUCKET_NAME,
                'Key': OUTPUT_KEY
            })
        return json.dumps(download_url)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        raise

Figure 5 – AWS Lambda Python code snippet to convert JSON to CSV

Figure 6 – Output CSV from AWS Lambda with Timestamp and Vibration values from Sensor

Figure 7 – Data Schema used for Dataset Ingestion in Amazon Lookout for Equipment for ABC Vibration Sensor
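Figure 7 is an image, so for reference, a Lookout for Equipment inline data schema can be sketched as a small helper. This is a hypothetical illustration: the component name, dataset name, and column names here are assumptions, and the `create_dataset` call requires AWS credentials, so it is shown for orientation only, not as the exact configuration used in this post.

```python
import json

def build_l4e_schema(component_name, sensor_columns):
    """Build an inline data schema for a Lookout for Equipment dataset:
    one component per sensor, with a DATETIME timestamp column followed
    by DOUBLE columns for the sensor readings."""
    return {
        "Components": [
            {
                "ComponentName": component_name,
                "Columns": [{"Name": "TIMESTAMP", "Type": "DATETIME"}]
                           + [{"Name": c, "Type": "DOUBLE"} for c in sensor_columns],
            }
        ]
    }

def create_dataset(dataset_name, schema):
    # Requires AWS credentials; illustration only.
    import boto3
    client = boto3.client("lookoutequipment")
    return client.create_dataset(
        DatasetName=dataset_name,
        DatasetSchema={"InlineDataSchema": json.dumps(schema)},
    )
```

With a schema built this way, the CSV headers produced by the Lambda function (`TIMESTAMP`, `Sensor`) line up with the column names the ingestion job expects.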

Once the model is trained, the anomaly is detected on the ABC bearings as already stated.

Analyze Factory Environment Data

For our analysis, it is important to factor in the factory environment conditions. Variables like factory humidity, machine temperature, and bearing lubrication levels could contribute to increased vibration. All the environment data gathered from factory sensors is made available through AWS IoT Core, as shown in Figure 3. Since the number of parameters measured through the factory sensors can change over time, the architecture uses a NoSQL database, Amazon DynamoDB, to persist this data for downstream analysis.

Since we only need sensor events that exceed a particular threshold, we set rules under AWS IoT Core to capture the events that could potentially cause increased vibration on the bearings. For example, we want only the events where the temperature exceeds a threshold, say 75 degrees Fahrenheit, stored along with the timestamp in Amazon DynamoDB. This is beyond the normal operating temperature for the bearings and certainly demands our attention.

So, we set a rule trigger in the Rule query statement while defining an action under IoT Core, as shown in Figure 8. The flow of events from an IoT sensor to Amazon DynamoDB is illustrated in Figure 9.

Figure 8 – Define a Rule in IoT Core

The events are received once a minute. So, if there are 3 records inserted into Amazon DynamoDB in a day, it means the temperature threshold was breached for at most 3 minutes.

We set up similar rules and actions to DynamoDB for humidity and compressor lubrication levels.
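Since Figure 8 is an image, a rule of this kind can be sketched with the IoT SQL syntax and the `dynamoDBv2` action. This is a minimal sketch under assumptions: the topic name `factory/environment`, the table name `FactoryEnvEvents`, and the IAM role ARN are placeholders, not names from the actual deployment.

```python
def build_temperature_rule(role_arn, table_name="FactoryEnvEvents",
                           topic="factory/environment", threshold=75):
    """Build an IoT Core topic rule payload: the SQL selects only the
    events above the temperature threshold, and the dynamoDBv2 action
    writes each matching event (with its timestamp) to DynamoDB."""
    sql = ("SELECT temperature, timestamp() AS ts "
           f"FROM '{topic}' WHERE temperature > {threshold}")
    return {
        "sql": sql,
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "dynamoDBv2": {
                "roleArn": role_arn,
                "putItem": {"tableName": table_name},
            }
        }],
    }

def create_temperature_rule(role_arn):
    # Requires AWS credentials; illustration only.
    import boto3
    iot = boto3.client("iot")
    iot.create_topic_rule(
        ruleName="TempThresholdToDynamoDB",
        topicRulePayload=build_temperature_rule(role_arn),
    )
```

The humidity and lubrication rules mentioned above would differ only in the topic, the selected attribute, and the threshold.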

Figure 9 – Event from IoT factory sensor to Amazon DynamoDB

Factory Data Storage

For this analysis, we need to factor in both the factory data (machine ID, shift number, shop floor) and the operator data (operator name, machine-operator relationships, operator's shift, and so on), since we are drilling down to which operators were working on the machines that have ABC bearings. Since this data is relational, it is stored in Amazon Aurora. We opt for the serverless option of the Aurora database since the operations team requires a fully managed database.

The data of the vendors who supply ABC and XYZ bearings, along with their contracts, is stored in Amazon S3. We also have the operators' performance scores pre-calculated and stored in Amazon S3.

Querying the Data Lake

Now that we have data ingested and stored across Amazon S3, AWS IoT Core, Amazon DynamoDB, and Amazon Aurora, we need to collate this data for further analysis and querying. We use Athena Federated Query in Amazon Athena to query across these multiple data sources and store the results in Amazon S3, as shown in Figure 10. This requires creating a workgroup and configuring connectors to set up Amazon S3, Amazon DynamoDB, and Amazon Aurora as data sources under Amazon Athena. The detailed steps to create and connect the data sources are provided in the user guide.

Figure 10 – Athena Federated Query across Data Sources
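Once the connectors and workgroup are in place, queries can be submitted programmatically. The sketch below shows the general shape of running an Athena query from Python and polling for completion; the workgroup name and results bucket are placeholder assumptions, and the `run_federated_query` function requires AWS credentials and configured connectors to actually execute.

```python
def build_query_request(query, workgroup="idp-workgroup",
                        output_location="s3://idp-athena-results/"):
    """Assemble the arguments for Athena's start_query_execution call.
    The workgroup and results bucket names are placeholders."""
    return {
        "QueryString": query,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
    }

def run_federated_query(query):
    # Requires AWS credentials and configured Athena connectors.
    import time
    import boto3
    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        **build_query_request(query))["QueryExecutionId"]
    # Poll until the query reaches a terminal state; results land
    # in the configured S3 output location.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return qid, state
        time.sleep(2)
```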

We are now interested in gathering insights across the three different data sources. Let us find all the machines on factory shop floor 1 with the lubrication levels of interest, along with their operators and performance scores, as demonstrated in Figure 11.

Figure 11 – Query to Determine Operators and their Performance
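Figure 11 is an image, so a hypothetical federated query along these lines may help orient the reader. Every catalog, table, and column name below (`aurora`, `s3_scores`, `ddb`, `machines`, `operator_scores`, and so on) is an assumption for illustration, not the actual schema used in this post; the point is only that a single Athena SQL statement can join the Aurora, DynamoDB, and S3 data sources.

```python
# Hypothetical federated query approximating Figure 11: it joins the
# Aurora factory tables, the S3-based performance scores, and the
# DynamoDB environment events, all registered as Athena data sources.
OPERATOR_QUERY = """
SELECT m.machine_id, o.operator_name, p.performance_score
FROM aurora.factory.machines m
JOIN aurora.factory.operators o ON o.machine_id = m.machine_id
JOIN s3_scores.default.operator_scores p ON p.operator_id = o.operator_id
JOIN ddb.default.factory_env_events e ON e.machine_id = m.machine_id
WHERE m.shop_floor = 1
  AND e.lubrication_level < 0.5
""".strip()
```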

From the output in Table 1, we see that the operators of the machines whose temperature exceeded the threshold have good scores (above 5, which is the average) based on past performance. Hence, we can conclude for now that they have been operating these machines under the normal expected conditions with the right controls and settings.

Table 1 – Operators and their Scores

We would then want to find all the machines that have ABC bearings and the vendors who supplied them, as demonstrated in Figure 12.

Figure 12 – Vendors Supplying ABC Bearings

Table 2 – Names of Vendors Who Supply ABC Bearings

As a next step, we would want to reach out to the vendors, Jane Doe Enterprises and Scott Inc. (see Table 2), to report the issue with ABC Bearings (vendor names used here are fictitious).

Conclusion

In this blog, we covered a sample use case to demonstrate how the AWS Industrial Data Platform can help gather actionable insights from factory floor data to gain operational efficiencies. We also touched on how the various AWS services can be used to build a scalable data platform for seamless querying and processing of data coming in with various formats.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
Shailaja Suresh

Shailaja Suresh is a Senior Solutions Architect at Amazon Web Services and provides prescriptive technical guidance to her customers as part of their AWS cloud journey. Her key competencies are product architecture, strategy and delivery. Her engineering skills cover a wide range of technologies and she has played the roles of being a software engineer, architect, lead and project manager. Shailaja strongly believes in empowering teams through mentorship and coaching.