AWS Architecture Blog
Field Notes: Building an Industrial Data Platform to Gather Insights from Operational Data
Co-authored with Russell de Pina, former Sr. Partner Solutions Architect at AWS
Manufacturers looking to achieve greater operational efficiency need actionable insights from their operational data. Traditionally, operational technology (OT) and enterprise information technology (IT) existed in silos with various manual processes to clean and process data. Leveraging insights at a process level requires converging data across silos and abstracting the right data for the right use case in a scalable fashion.
The AWS Industrial Data Platform (IDP) facilitates the convergence of enterprise and operational data to gather insights that improve the overall operational efficiency of the plant. In this blog, we demonstrate a sample AWS IDP architecture, along with the steps to gather data across multiple data sources on the shop floor.
Overview of solution
With the IDP Reference Architecture as the backdrop, let us walk through the steps to implement this solution. Let us say we have gas compressors being used in our factory. The Operations Team has identified an anomaly in the wear of the bearings used in some of these compressors. Let us assume that the bearings are supplied by two fictitious manufacturers, ABC and XYZ (all company and vendor references in this post are fictitious and do not represent any real company).
The graphs in Figures 1 and 2 show the expected and actual performance of the bearings, plotting vibration (actual and expected) against time; each dot represents a cluster of data points. The Mean Time Between Maintenance (MTBM) quoted by the manufacturer for these bearings is five years. The factory has detected that for the ABC bearings, the actual vibration at half-life is much higher than expected, as shown. This discrepancy clearly requires further analysis to identify the root cause and prevent compressor breakdowns, unplanned downtime, and cost.
Although the deviation from expected is much smaller for the XYZ bearings, it still needs to be understood. This blog provides an overview of how the various AWS services in the IDP architecture help with this analysis. Figure 3 shows the AWS architecture used for this specific use case.
Figure 3 – A simple AWS Industrial Data Platform (IDP) architecture for analyzing the bearing anomaly
Tracking Anomaly Data
The sensors on the compressor bearings send the vibration/chatter data to AWS through AWS IoT Core. Amazon Lookout for Equipment is configured as detailed in the user guide, following the necessary data formatting guidelines.
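As a minimal sketch of this configuration step (the dataset name here is hypothetical, and the two-column schema is assumed to match the CSV headers produced by the Lambda function below), a Lookout for Equipment dataset can be created with boto3:

import json
import boto3

lookout = boto3.client('lookoutequipment')

# One component per sensor; the columns are assumed to match
# the TIMESTAMP and Sensor headers of the generated CSV.
schema = {
    'Components': [
        {
            'ComponentName': 'ABCSensor',
            'Columns': [
                {'Name': 'TIMESTAMP', 'Type': 'DATETIME'},
                {'Name': 'Sensor', 'Type': 'DOUBLE'},
            ],
        }
    ]
}

lookout.create_dataset(
    DatasetName='abc-bearing-vibration',  # hypothetical name
    DatasetSchema={'InlineDataSchema': json.dumps(schema)},
)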
The raw sensor data from IoT Core, which is in JSON format, is converted with AWS Lambda into a CSV format with the necessary headers to match the schema (one schema for each sensor) shown in Figure 7. This CSV conversion is needed for the data to be processed by Amazon Lookout for Equipment. Sample sensor data from the ABC sensor, the AWS Lambda code snippet that converts this JSON to CSV, and the output CSV that is ingested into Lookout for Equipment are shown in Figures 4, 5, and 6, respectively. For detailed steps on how a dataset is created and ingested into Lookout for Equipment, refer to the user guide.
import boto3
import botocore
import csv
import json


def lambda_handler(event, context):
    BUCKET_NAME = 'l4e-data'  # S3 bucket names must be lowercase
    OUTPUT_KEY = 'csv/ABCSensor.csv'  # OUTPUT FILE
    INPUT_KEY = 'ABCSensor.json'  # INPUT FILE

    s3client = boto3.client('s3')
    s3 = boto3.resource('s3')

    # Read the raw JSON sensor data from S3
    obj = s3.Object(BUCKET_NAME, INPUT_KEY)
    data = obj.get()['Body'].read().decode('utf-8')
    json_data = json.loads(data)
    print(json_data)

    # Write the records as a CSV with headers matching the
    # Lookout for Equipment schema
    output_file_path = '/tmp/data.csv'
    with open(output_file_path, 'w') as file:
        csv_file = csv.writer(file)
        csv_file.writerow(['TIMESTAMP', 'Sensor'])
        for item in json_data:
            csv_file.writerow([item.get('TIMESTAMP'), item.get('Sensor')])

    # Upload the generated CSV back to S3
    with open(output_file_path, 'rb') as f:
        csv_binary = f.read()
    try:
        obj = s3.Object(BUCKET_NAME, OUTPUT_KEY)
        obj.put(Body=csv_binary)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == '404':
            print('The object does not exist.')
        else:
            raise

    # Return a pre-signed URL for downloading the generated CSV
    try:
        download_url = s3client.generate_presigned_url(
            'get_object',
            Params={
                'Bucket': BUCKET_NAME,
                'Key': OUTPUT_KEY
            },
            ExpiresIn=3600
        )
        return json.dumps(download_url)
    except Exception as e:
        print(f'Failed to generate pre-signed URL: {e}')
        raise
Figure 5 – AWS Lambda Python code snippet to convert JSON to CSV
Once the model is trained, the anomaly on the ABC bearings is detected, as stated earlier.
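As a minimal sketch of the ingestion and training steps (the role ARN, bucket, and model name are hypothetical, and the role is assumed to have read access to the bucket), the CSV data can be ingested and a model trained with boto3:

import boto3

lookout = boto3.client('lookoutequipment')

# Hypothetical IAM role with read access to the data bucket
ROLE_ARN = 'arn:aws:iam::123456789012:role/L4ERole'

# Ingest the CSV files written by the Lambda function
lookout.start_data_ingestion_job(
    DatasetName='abc-bearing-vibration',
    RoleArn=ROLE_ARN,
    IngestionInputConfiguration={
        'S3InputConfiguration': {'Bucket': 'l4e-data', 'Prefix': 'csv/'}
    },
)

# Train an anomaly detection model on the ingested data
lookout.create_model(
    ModelName='abc-bearing-anomaly-model',  # hypothetical name
    DatasetName='abc-bearing-vibration',
    RoleArn=ROLE_ARN,
)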
Analyze Factory Environment Data
For our analysis, it is important to factor in the factory environment conditions. Variables like factory humidity, machine temperature, and bearing lubrication levels could contribute to increased vibration. All the environment data gathered from factory sensors is made available through AWS IoT Core, as shown in Figure 3. Since the number of parameters measured by the factory sensors can change over time, the architecture uses a NoSQL database, Amazon DynamoDB, to persist this data for downstream analysis.
Since we only need sensor events that exceed a particular threshold, we set rules under AWS IoT Core to capture those events that could potentially cause increased vibration on the bearings. For example, we only want events that exceed a temperature threshold, say anything above 75 degrees Fahrenheit, stored along with the timestamp in Amazon DynamoDB. This is beyond the normal operating temperature for the bearings and certainly demands our attention.
So, we set a rule trigger in the rule query statement while defining an action under IoT Core, as shown in Figure 8. The flow of events from an IoT sensor to Amazon DynamoDB is illustrated in Figure 9.
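As a minimal sketch (the topic, table, and role names here are hypothetical stand-ins), such a rule and its DynamoDB action can also be created programmatically with boto3:

import boto3

iot = boto3.client('iot')

iot.create_topic_rule(
    ruleName='TemperatureThresholdRule',  # hypothetical name
    topicRulePayload={
        # Forward only readings above 75 degrees Fahrenheit
        'sql': "SELECT temperature, timestamp() AS ts "
               "FROM 'factory/sensors/temperature' "
               "WHERE temperature > 75",
        'awsIotSqlVersion': '2016-03-23',
        'actions': [
            {
                'dynamoDBv2': {
                    'roleArn': 'arn:aws:iam::123456789012:role/IoTDynamoDBRole',
                    'putItem': {'tableName': 'FactoryEnvEvents'},
                }
            }
        ],
    },
)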
The events are received once per minute. So, if there are three records inserted into Amazon DynamoDB on a given day, the temperature threshold was breached for a maximum of three minutes that day.
We set up similar rules and actions to DynamoDB for humidity and compressor lubrication levels.
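To illustrate the downstream analysis, here is a short sketch (assuming a hypothetical FactoryEnvEvents table keyed on a sensor ID partition key and an ISO 8601 timestamp sort key) that counts one day's threshold breaches:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('FactoryEnvEvents')  # hypothetical table name

# Count the temperature breaches recorded for one sensor on one day
response = table.query(
    KeyConditionExpression=(
        Key('sensor_id').eq('compressor-temp-01')
        & Key('ts').begins_with('2021-06-01')
    )
)
print(f"Minutes above threshold: {response['Count']}")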
Factory Data Storage
For this analysis, we need to factor in both the factory data (machine ID, shift number, shop floor) and the operator data (operator name, machine-operator relationships, operator's shift, and so on), since we are drilling down to the granular detail of which operators were working on the machines that have ABC bearings. Since this data is relational, it is stored in Amazon Aurora. We opt for the serverless option of the Aurora database since the operations team requires a database that is fully managed.
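With Aurora Serverless, the Data API lets us query this relational data without managing database connections. A minimal sketch follows; the cluster ARN, secret ARN, and table and column names are hypothetical:

import boto3

rds_data = boto3.client('rds-data')

# Hypothetical cluster, secret, database, and table names
response = rds_data.execute_statement(
    resourceArn='arn:aws:rds:us-east-1:123456789012:cluster:factory-cluster',
    secretArn='arn:aws:secretsmanager:us-east-1:123456789012:secret:factory-db',
    database='factory',
    sql="""
        SELECT m.machine_id, o.operator_name, o.shift
        FROM machines m
        JOIN operators o ON o.machine_id = m.machine_id
        WHERE m.shop_floor = 1
    """,
)
for record in response['records']:
    print(record)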
The data of the vendors who supply ABC and XYZ bearings and their contracts are stored in Amazon S3. We also have the operators’ performance scores pre-calculated and stored in Amazon S3.
Querying the Data Lake
Now that we have data ingested and stored in various channels (Amazon S3, AWS IoT Core, Amazon DynamoDB, and Amazon Aurora), we need to collate this data for further analysis and querying. We use Athena Federated Query under Amazon Athena to query across these multiple data sources, and store the results in Amazon S3, as shown in Figure 10. This requires creating a workgroup and configuring connectors to set up Amazon S3, Amazon DynamoDB, and Amazon Aurora as data sources under Amazon Athena. The detailed steps to create and connect the data sources are provided in the user guide.
We are now interested in gathering insights across the three different data sources. Let us find all those machines in factory shop floor 1, along with their corresponding operators and performance scores, that have the observed lubrication levels, as demonstrated in Figure 11. A sketch of such a federated query follows.
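As a minimal sketch of running such a federated query with boto3 (the catalog, database, table, and column names are hypothetical stand-ins for the connectors configured above, as is the workgroup name):

import boto3

athena = boto3.client('athena')

# Federated tables are referenced as "catalog"."database"."table"
query = """
SELECT m.machine_id, o.operator_name, s.performance_score, e.ts
FROM "aurora_catalog"."factory"."machines" m
JOIN "aurora_catalog"."factory"."operators" o
  ON o.machine_id = m.machine_id
JOIN "s3_catalog"."default"."operator_scores" s
  ON s.operator_name = o.operator_name
JOIN "dynamodb_catalog"."default"."FactoryEnvEvents" e
  ON e.machine_id = m.machine_id
WHERE m.shop_floor = 1
"""

athena.start_query_execution(
    QueryString=query,
    WorkGroup='idp-federated',  # hypothetical workgroup
    ResultConfiguration={'OutputLocation': 's3://l4e-data/athena-results/'},
)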
From the output in Table 1, we see that the operators on the machines with above-threshold temperature levels have good scores (above 5, the average) based on past performance. Hence, we can conclude for now that they have been operating these machines under the normal expected conditions with the right controls and settings.
We would then want to find all the machines that have ABC bearings, and the vendors who supplied them, as demonstrated in Figure 12.
As a next step, we would want to reach out to the vendors, Jane Doe Enterprises and Scott Inc. (see Table 2), to report the issue with ABC bearings (vendor names used here are fictitious).
Conclusion
In this blog, we covered a sample use case to demonstrate how the AWS Industrial Data Platform can help gather actionable insights from factory floor data to gain operational efficiencies. We also touched on how the various AWS services can be used to build a scalable data platform for seamless querying and processing of data arriving in various formats.