AWS Solutions Library

Guidance for Aircraft Predictive Maintenance on AWS

Overview

This Guidance shows how machine learning (ML) models can be applied to Internet of Things (IoT) sensor data to predict component or system failures before they happen and to recommend appropriate maintenance steps. Aerospace manufacturers, aircraft operators, and other manufacturing and industrial organizations use IoT devices to identify patterns in sensor output that indicate which preventative maintenance is needed to avoid system failures and downtime. This Guidance helps you use that data to reduce unplanned downtime of manufacturing lines, aircraft, and other systems.

How it works

This architecture diagram shows how to use ML and data from IoT devices to reduce unscheduled maintenance delays and flight cancellations.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

Amazon CloudWatch maintains telemetry on the running system and alerts on conditions such as failed AWS Glue extract, transform, and load (ETL) jobs (which can indicate formatting errors in aircraft data) or error codes returned by Amazon API Gateway (which can indicate configuration problems with the maintenance application or website). Amazon Aurora is configured to generate automatic backups of aircraft and prediction data and can rapidly restore those backups.
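
A minimal sketch of how these two alerts could be defined, assuming a REST API in API Gateway, an existing Amazon SNS topic for notifications, and the hypothetical names `maintenance-api` and `aircraft-data-etl`. The functions only build request parameters. Glue job failures are matched with an Amazon EventBridge (formerly CloudWatch Events) rule, because AWS Glue publishes job state changes as events.

```python
import json

# Hypothetical resource names: replace with your own API, Glue job, and SNS topic.
API_NAME = "maintenance-api"
GLUE_JOB = "aircraft-data-etl"
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:maintenance-alerts"


def api_5xx_alarm(api_name: str, topic_arn: str, threshold: float = 5.0) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on a REST API's 5XX errors."""
    return {
        "AlarmName": f"{api_name}-5xx-errors",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "5XXError",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "Statistic": "Sum",
        "Period": 300,  # 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not treated as an error
        "AlarmActions": [topic_arn],
    }


def glue_failure_pattern(job_name: str) -> str:
    """EventBridge event pattern matching failed or timed-out runs of one Glue job."""
    pattern = {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": ["FAILED", "TIMEOUT"]},
    }
    return json.dumps(pattern)
```

With credentials configured, the parameters could be applied with `boto3.client("cloudwatch").put_metric_alarm(**api_5xx_alarm(API_NAME, ALERT_TOPIC_ARN))` and `boto3.client("events").put_rule(Name="glue-etl-failures", EventPattern=glue_failure_pattern(GLUE_JOB))`, followed by `put_targets` to route matched events to the SNS topic. Keeping the definitions as plain data makes them easy to review and to move into infrastructure-as-code templates later.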

Automated telemetry and alarms help identify when the system is not meeting desired business outcomes and can help you quickly identify underlying issues before customers detect or report them. Errors can be detected and reported both in external customer systems (such as the Aircraft Communications Addressing and Reporting System (ACARS) or quick access recorder (QAR) processing) and in services in the AWS Cloud. Automated database backup and restoration allows for quicker recovery to normal operation in the event of a failure or disruption.

Read the Operational Excellence whitepaper

Security

Amazon S3, AWS Glue, and Amazon Kinesis Data Streams enforce TLS encryption in transit for all customer data (such as aircraft, flight ops, and maintenance data) ingested to the cloud, and Amazon S3 and Aurora, where all customer data is stored, enforce encryption at rest. Customer data is therefore encrypted at all times, whether in transit or at rest. This helps ensure that sensitive flight operations and aircraft repair data is visible only to authorized users.
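
One common way to enforce encryption in transit and at rest for an S3 bucket is a bucket policy that denies any request made without TLS, combined with default SSE-KMS encryption. The sketch below only builds the two documents; the bucket name and KMS key ARN used with it are hypothetical placeholders, and it does not cover AWS Glue, Kinesis Data Streams, or Aurora settings.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> str:
    """Bucket policy that rejects any request to the bucket not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" when the request used plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


def default_kms_encryption(kms_key_arn: str) -> dict:
    """ServerSideEncryptionConfiguration that encrypts new objects with a KMS key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,  # reduces KMS request volume
            }
        ]
    }
```

The documents could be applied with `put_bucket_policy(Bucket=..., Policy=...)` and `put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=...)` on a boto3 S3 client.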

AWS Glue is configured to remove privacy-regulated data from the dataset upon ingestion. API Gateway enforces user access control by requiring an authentication token provided by AWS IAM Identity Center, which manages user credentials and roles. User authentication helps ensure that credentials are securely managed and rotated, and users are allocated to groups with specific access rights according to job role (such as mechanic, supervisor, data scientist, or admin), following the principle of least privilege. Group- and role-based access management helps ensure that user access rights are managed securely and consistently at scale across all organizations.
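
In this Guidance the access check happens at API Gateway using tokens from IAM Identity Center. The sketch below shows only the authorization decision that a hypothetical Lambda authorizer might make once the caller's group memberships are known. The group names follow the job roles above, while the routes and the `ROLE_PERMISSIONS` table are illustrative assumptions. Anything not explicitly granted is denied.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Permission = Tuple[str, str]  # (HTTP method, resource path)

# Hypothetical mapping from job-role group to the API operations it may call.
# Anything not listed here is denied (least privilege, default deny).
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "mechanic": frozenset({("GET", "/predictions"), ("POST", "/work-orders")}),
    "supervisor": frozenset(
        {("GET", "/predictions"), ("GET", "/work-orders"), ("POST", "/work-orders/approve")}
    ),
    "data-scientist": frozenset({("GET", "/predictions"), ("GET", "/training-data")}),
    "admin": frozenset(
        {
            ("GET", "/predictions"),
            ("GET", "/work-orders"),
            ("POST", "/work-orders"),
            ("POST", "/work-orders/approve"),
            ("GET", "/training-data"),
        }
    ),
}


def is_allowed(groups: Iterable[str], method: str, path: str) -> bool:
    """True if any of the caller's groups grants (method, path); unknown groups grant nothing."""
    request = (method.upper(), path)
    return any(request in ROLE_PERMISSIONS.get(g, frozenset()) for g in groups)
```

For example, `is_allowed(["mechanic"], "post", "/work-orders")` is `True`, while `is_allowed(["mechanic"], "GET", "/training-data")` is `False`. A user in several groups receives the union of those groups' permissions.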

Read the Security whitepaper

Reliability

Amazon S3 and Aurora provide a high degree of data durability through data replication across multiple Availability Zones, along with automated backup and restoration. Data durability helps ensure that all data required to make maintenance predictions is available and can be restored in the event of a failure.
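
Automated backup retention for an Aurora cluster is set through the RDS `ModifyDBCluster` API. This sketch validates and builds those parameters, assuming the documented limits of a 1 to 35 day retention period and a daily backup window of at least 30 minutes written as `hh24:mi-hh24:mi` in UTC. The cluster name used with it is hypothetical, and the check that the window does not overlap the maintenance window is left out.

```python
import re

WINDOW = re.compile(r"(\d{2}):(\d{2})-(\d{2}):(\d{2})")


def aurora_backup_settings(cluster_id: str, retention_days: int, window_utc: str) -> dict:
    """Keyword arguments for RDS ModifyDBCluster that configure automated backups."""
    # Aurora retains automated backups for 1 to 35 days.
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    match = WINDOW.fullmatch(window_utc)
    if match is None:
        raise ValueError("window_utc must look like '03:00-03:30'")
    h1, m1, h2, m2 = map(int, match.groups())
    if max(h1, h2) > 23 or max(m1, m2) > 59:
        raise ValueError("invalid time in backup window")
    # Window length in minutes; the modulo handles windows that cross midnight.
    length = (h2 * 60 + m2 - (h1 * 60 + m1)) % (24 * 60)
    if length < 30:
        raise ValueError("backup window must be at least 30 minutes")
    return {
        "DBClusterIdentifier": cluster_id,
        "BackupRetentionPeriod": retention_days,
        "PreferredBackupWindow": window_utc,
        "ApplyImmediately": True,
    }
```

`boto3.client("rds").modify_db_cluster(**aurora_backup_settings("aircraft-db", 14, "03:00-03:30"))` would apply the settings. Restoring to a point within the retention period uses `restore_db_cluster_to_point_in_time`, which creates a new cluster rather than overwriting the existing one.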

Lambda, AWS Glue, SageMaker, and API Gateway are fully managed services with automated scaling of resources. Loss of an Availability Zone or database replica will not take down the preventative maintenance system; these services automatically divert requests from failed resources to healthy ones. The managed services provide automated failover without user intervention and without additional cost.

Kinesis Data Streams automatically scales data ingestion capacity (in on-demand capacity mode) and retains records in the stream so that downstream consumers can process them at their own rate. The autoscaling of compute resources and the buffering of data streams help ensure that the system adapts reliably to traffic bursts, such as those caused by higher flight volume or uploads of large maintenance record batches.
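
When a producer writes faster than a stream's capacity allows, Kinesis Data Streams rejects the write with `ProvisionedThroughputExceededException`, and the producer is expected to retry. The AWS SDKs already retry throttled calls; this sketch makes the pattern explicit with capped exponential backoff and full jitter. `ThrottledError` is a hypothetical stand-in for the real exception, and `put` is any callable that sends one record, so the logic can be exercised without AWS access.

```python
import random
import time
from typing import Callable


class ThrottledError(Exception):
    """Stand-in for Kinesis ProvisionedThroughputExceededException."""


def put_with_backoff(
    put: Callable[[bytes], None],
    record: bytes,
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one record, retrying throttled attempts with full-jitter exponential backoff.

    Returns the number of attempts used; re-raises after max_attempts throttles.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            put(record)
            return attempt
        except ThrottledError:
            if attempt == max_attempts:
                raise
            # Full jitter: random delay in [0, min(cap, base * 2^(attempt - 1))].
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable, and the jitter spreads retries from many producers so they do not all hit the stream at the same moment after a burst.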

Read the Reliability whitepaper

Performance Efficiency

SageMaker and Aurora report utilization metrics to CloudWatch, so you can monitor historical utilization of computing resources. CloudWatch alarms can be configured to invoke scale-in or scale-out operations in Aurora and SageMaker to match changing demand. For example, if an alarm signals low utilization of database instances, Aurora could automatically remove a database replica, or an operator could select a smaller database instance type.
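
Aurora replica scaling is typically configured through Application Auto Scaling with a target tracking policy, which creates and manages the underlying CloudWatch alarms for you. This sketch builds the two request bodies; the cluster name, replica limits, and 60% reader CPU target are illustrative assumptions, not recommended values.

```python
from typing import Dict, Tuple


def aurora_replica_autoscaling(
    cluster_id: str,
    min_replicas: int = 1,
    max_replicas: int = 4,
    target_cpu_percent: float = 60.0,
) -> Tuple[Dict, Dict]:
    """Request bodies for Application Auto Scaling RegisterScalableTarget and PutScalingPolicy."""
    # An Aurora cluster supports at most 15 Aurora Replicas.
    if not 1 <= min_replicas <= max_replicas <= 15:
        raise ValueError("require 1 <= min_replicas <= max_replicas <= 15")
    common = {
        "ServiceNamespace": "rds",
        "ResourceId": f"cluster:{cluster_id}",
        "ScalableDimension": "rds:cluster:ReadReplicaCount",
    }
    target = {**common, "MinCapacity": min_replicas, "MaxCapacity": max_replicas}
    policy = {
        **common,
        "PolicyName": f"{cluster_id}-reader-cpu",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing another replica
            "ScaleOutCooldown": 300,  # seconds to wait before adding another replica
        },
    }
    return target, policy
```

The bodies could be submitted with `register_scalable_target(**target)` and then `put_scaling_policy(**policy)` on a boto3 `application-autoscaling` client. SageMaker endpoints use the same service with the `sagemaker` namespace and the `sagemaker:variant:DesiredInstanceCount` scalable dimension.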

CloudWatch instrumentation provides near real-time visibility into changes in system utilization, giving insight into whether computing resources are right-sized for the predictive maintenance application. Based on this information, you can adapt computing resources, for example by choosing larger or smaller instance types for the SageMaker inference endpoint that serves predictions or for the nodes of an Amazon Redshift data warehouse used for maintenance analytics.
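
A simple way to turn utilization history into a sizing decision is to compare a high percentile against thresholds. This sketch uses the nearest-rank 95th percentile of CPU samples (for example, values exported from CloudWatch, which can also report percentile statistics such as `p95` directly). The 30% and 80% thresholds are assumptions to adjust for your workload.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer arithmetic
    return ordered[rank - 1]


def rightsizing_hint(cpu_percent: Sequence[float], low: float = 30.0, high: float = 80.0) -> str:
    """Suggest 'downsize', 'upsize', or 'keep' from CPU utilization samples."""
    peak = p95(cpu_percent)
    if peak < low:
        return "downsize"
    if peak > high:
        return "upsize"
    return "keep"
```

Using p95 rather than the maximum keeps a single short spike from forcing a larger instance type: 19 samples at 10% plus one at 90% still yield `downsize`, because the nearest-rank p95 of those 20 samples is 10%.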

Read the Performance Efficiency whitepaper

Cost Optimization

Amazon S3 provides automated lifecycle management of data, moving older, infrequently accessed data to lower-cost Amazon S3 Glacier storage classes. This can significantly reduce the cost of retaining legacy flight and component records that may be outdated but are still relevant for infrequent reports or model training. Automatically tiering or expiring older data reduces storage costs while maintaining the long service history needed for accurate maintenance predictions.
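
A minimal S3 lifecycle configuration that archives older records by age, assuming records are stored under a prefix such as the hypothetical `flight-records/`. The 90- and 730-day transition points are illustrative, and expiration is off by default to preserve service history. Lifecycle rules act on object age; S3 Intelligent-Tiering is the option that moves objects based on observed access patterns.

```python
from typing import Optional


def lifecycle_rules(
    prefix: str,
    glacier_ir_days: int = 90,
    deep_archive_days: int = 730,
    expire_days: Optional[int] = None,
) -> dict:
    """S3 lifecycle configuration that archives objects under `prefix` by age."""
    if not 0 < glacier_ir_days < deep_archive_days:
        raise ValueError("transition days must be positive and increasing")
    rule = {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": glacier_ir_days, "StorageClass": "GLACIER_IR"},
            {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
    if expire_days is not None:
        # Optional retirement of very old records.
        if expire_days <= deep_archive_days:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}
```

With a boto3 S3 client, `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_rules("flight-records/"))` would apply the rule.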

Additionally, Lambda and AWS Glue provide serverless computing and data transformation that automatically scale resources up or down to match real-time demand, so you pay only for the compute time actually used for maintenance predictions. This avoids paying for idle capacity, which matters because system utilization is inherently cyclic: data from flight ops, ACARS, and QAR systems peaks during the daytime and peak travel seasons and wanes at night and during off-peak seasons.
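
Lambda compute is billed by duration multiplied by configured memory, expressed in GB-seconds. This sketch compares a hypothetical cyclic day (busy daytime hours, quiet night hours) with the same capacity held at its peak level around the clock. Request charges, the free tier, and per-Region prices are deliberately left out, so multiply the results by your Region's GB-second price.

```python
from typing import Sequence


def gb_seconds(invocations_per_hour: Sequence[int], avg_duration_s: float, memory_mb: int) -> float:
    """Billed Lambda compute for a day: invocations x duration (s) x memory (GB)."""
    memory_gb = memory_mb / 1024
    return sum(n * avg_duration_s * memory_gb for n in invocations_per_hour)


# Hypothetical traffic profile: 12 busy daytime hours, 12 quiet night hours.
profile = [2000] * 12 + [200] * 12

pay_per_use = gb_seconds(profile, avg_duration_s=2.0, memory_mb=1024)
# Same capacity if it were provisioned for the peak hour all day long.
peak_sized = gb_seconds([max(profile)] * len(profile), avg_duration_s=2.0, memory_mb=1024)

print(f"pay-per-use: {pay_per_use:,.0f} GB-s, peak-sized: {peak_sized:,.0f} GB-s")
```

For this profile, paying per use consumes 52,800 GB-seconds against 96,000 for peak-sized capacity, or 55%; the gap grows as the difference between daytime and nighttime traffic widens.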

Read the Cost Optimization whitepaper

Sustainability

Aurora and Athena both support compression of underlying data sources. Compression of system data (such as maintenance logs or flight records) significantly reduces the data storage requirements of the predictive maintenance system, reducing the system’s environmental impact.
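
The effect of compression on repetitive log data is easy to see with Python's standard `gzip` module. The records below are synthetic and deliberately repetitive, with hypothetical field names and values, so the ratio is not representative of real maintenance logs. Athena can query GZIP-compressed text files in Amazon S3 directly, and columnar formats such as Apache Parquet typically reduce storage and scanned data further.

```python
import csv
import gzip
import io


def synthetic_log_csv(rows: int) -> bytes:
    """Build a synthetic CSV maintenance log (hypothetical fields and values)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["tail_number", "ata_chapter", "event", "status"])
    for i in range(rows):
        writer.writerow([f"N{100 + i % 20}AB", "32", "BRAKE_TEMP_HIGH", "OPEN" if i % 3 else "CLOSED"])
    return buf.getvalue().encode("utf-8")


raw = synthetic_log_csv(10_000)
packed = gzip.compress(raw)
assert gzip.decompress(packed) == raw  # compression is lossless
print(f"raw {len(raw):,} bytes -> gzip {len(packed):,} bytes ({len(packed) / len(raw):.1%})")
```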

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
