Guidance for Aircraft Predictive Maintenance on AWS
Overview
How it works
This architecture diagram shows how to reduce unscheduled maintenance delays and flight cancellations by applying machine learning (ML) to data from IoT devices.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon CloudWatch maintains telemetry on the running system to alert on conditions such as failed AWS Glue extract, transform, and load (ETL) jobs (indicating formatting errors in aircraft data) or error codes returned by API Gateway (indicating configuration problems with the maintenance application or website). Aurora is configured to generate automatic backups of aircraft and prediction data and can rapidly restore those backups.
Automated telemetry and alarms help identify when the system is not meeting desired business outcomes and help surface underlying issues before the customer detects or reports them. Errors can be detected and reported in external customer systems (such as the ACARS system or QAR processing) as well as in services in the AWS Cloud. Automated database backup and restoration allows for quicker recovery to normal status in the event of a failure or disruption.
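As a rough illustration of the alarming described above, the following sketch builds the parameters for a CloudWatch alarm on API Gateway server errors. The alarm name, threshold, period, API name, and SNS topic ARN are all assumptions made for illustration; the metric namespace, metric name, and dimension are the standard ones API Gateway publishes. The sketch only constructs the request and does not call AWS.

```python
def build_api_error_alarm(api_name, sns_topic_arn, threshold=5):
    """Return keyword arguments shaped for CloudWatch's PutMetricAlarm.

    api_name, sns_topic_arn, and threshold are illustrative values,
    not settings taken from this Guidance.
    """
    return {
        "AlarmName": f"{api_name}-5xx-errors",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "5XXError",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate in five-minute windows (assumed)
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an error
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical API name and topic ARN.
params = build_api_error_alarm(
    "maintenance-api",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With boto3 installed and credentials configured, these could be sent with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

A similar alarm could be defined for failed AWS Glue jobs; the exact metric to watch depends on how job monitoring is set up in your account.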
Security
Amazon S3, AWS Glue, and Kinesis Data Streams enforce mutual TLS for encryption of all customer data (such as aircraft, flight ops, and maintenance data) ingested to the cloud. Amazon S3 and Aurora, where all customer data is stored, enforce encryption on all data in storage. Customer data is therefore encrypted at all times, whether in transit or at rest, so that sensitive flight operations and aircraft repair data is visible only to authorized users.
AWS Glue is configured to remove privacy-regulated data from the dataset upon ingestion. API Gateway enforces user access control by requiring an authentication token provided by AWS IAM Identity Center, which manages user credentials and roles. User authentication functions help ensure that user credentials are securely managed and rotated, with users allocated to groups that carry specific access rights according to job role (such as mechanic, supervisor, data scientist, or admin), following the principle of least privilege. Group- and role-based access management helps ensure that user access rights are securely and consistently managed at scale across all organizations.
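The group-based, least-privilege model described above can be sketched as a deny-by-default lookup. The group names follow the job roles mentioned in the text; the permission strings and the is_allowed helper are hypothetical and only illustrate the idea. In this Guidance the actual enforcement is done by API Gateway and IAM Identity Center.

```python
# Hypothetical mapping of groups to the actions they may perform.
ROLE_PERMISSIONS = {
    "mechanic": {"predictions:read"},
    "supervisor": {"predictions:read", "workorders:approve"},
    "data_scientist": {"predictions:read", "models:train"},
    "admin": {
        "predictions:read",
        "workorders:approve",
        "models:train",
        "users:manage",
    },
}


def is_allowed(groups, action):
    """Deny by default: allow only if some group explicitly grants the action."""
    return any(action in ROLE_PERMISSIONS.get(group, set()) for group in groups)


# A mechanic can read predictions but cannot retrain models.
print(is_allowed(["mechanic"], "predictions:read"))
print(is_allowed(["mechanic"], "models:train"))
```

Because unknown groups map to an empty permission set, a user with no recognized group is denied every action.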
Reliability
Amazon S3 and Aurora provide a high degree of data durability through multi-Availability Zone data replication, in addition to automated backup and restoration of data. Data durability ensures that all data required to make maintenance predictions is available and can be restored in the event of a failure.
Lambda, AWS Glue, SageMaker, and API Gateway are fully managed services with automated scaling of resources. Loss of an Availability Zone or database replica will not take down the predictive maintenance system; these services automatically divert requests from failed resources to healthy ones. The managed services provide automated failover without user intervention and without additional cost.
Kinesis Data Streams automatically scales data ingestion and throttles throughput to match downstream processing rates. The autoscaling of compute resources and auto-throttling of data streams help ensure that the system can adapt reliably to traffic bursts related to events such as higher flight volume or uploads of large maintenance record batches.
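When a stream throttles writes during a traffic burst, a producer typically retries with increasing delays rather than dropping records. The sketch below shows one common pattern, exponential backoff. It is not part of this Guidance: ThrottledError, send_with_backoff, and the delay values are hypothetical names and settings, and the send callable stands in for whatever client call writes a record to the stream.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response from the stream."""


def send_with_backoff(send, record, max_attempts=5, base_delay=0.1,
                      sleep=time.sleep):
    """Call send(record), retrying on throttling with exponential backoff.

    The delay doubles after each throttled attempt. The last failure is
    re-raised so the caller can buffer or log the record.
    """
    for attempt in range(max_attempts):
        try:
            return send(record)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A production producer would usually add random jitter to each delay so that many clients do not retry in lockstep.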
Performance Efficiency
SageMaker and Aurora report utilization metrics to CloudWatch, allowing you to monitor historical utilization of computing resources. CloudWatch alarms can be configured to invoke scale-in or scale-out operations in Aurora and SageMaker to match changing demand. For example, if an alarm signals low utilization of database instances, it could trigger the automatic removal of a database replica, or the operator could select a smaller database instance type.
CloudWatch instrumentation provides real-time visibility into changes in system utilization, giving deeper insight into whether computing resources are right-sized for the predictive maintenance application. Based on this information, you can adapt computing resources, such as allocating larger or smaller instance types for the SageMaker prediction inference endpoint or an Amazon Redshift data warehouse for maintenance analytics.
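The scale-in and scale-out logic described above reduces to comparing observed utilization against thresholds. The following sketch shows that decision in isolation; the function name and the 20% and 70% thresholds are assumptions chosen for illustration, and in practice the comparison would be expressed as CloudWatch alarm thresholds or scaling policies rather than application code.

```python
def scaling_decision(cpu_samples, low=20.0, high=70.0):
    """Return 'scale_in', 'scale_out', or 'hold' from utilization samples (%).

    The low and high thresholds are illustrative, not recommended values.
    """
    if not cpu_samples:
        return "hold"  # no data: make no change
    average = sum(cpu_samples) / len(cpu_samples)
    if average > high:
        return "scale_out"
    if average < low:
        return "scale_in"
    return "hold"


# Sustained low utilization suggests removing a replica or downsizing.
print(scaling_decision([8.0, 12.5, 10.0]))
```

Keeping a gap between the two thresholds avoids flapping between scale-in and scale-out when utilization hovers near a single cutoff.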
Cost Optimization
Amazon S3 provides automated lifecycle management of data, moving infrequently-accessed data to lower-cost Amazon S3 Glacier storage tiers. This can save significant cost in retaining legacy flight and component records that may be outdated but still relevant for infrequent reports or model training. The automated tiering or retiring of older data reduces storage costs while maintaining a long service history for making accurate maintenance predictions.
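A lifecycle rule of the kind described above can be sketched as follows. The helper name, the object prefix, and the day counts are assumptions for illustration; the dictionary layout follows the structure Amazon S3 accepts for a bucket lifecycle configuration. The sketch only builds the configuration and does not call AWS.

```python
def build_lifecycle_rule(prefix, glacier_after_days=90, expire_after_days=None):
    """Build an S3 lifecycle configuration that archives objects under prefix.

    prefix and the day counts are illustrative, not values from this Guidance.
    """
    rule = {
        "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optionally retire very old data altogether.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Hypothetical prefix for legacy flight records.
config = build_lifecycle_rule("flight-records/", glacier_after_days=180)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Leaving expire_after_days unset keeps the full service history available for model training while still moving it to cheaper storage.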
Additionally, Lambda and AWS Glue provide serverless computing and data transformation that automatically scale resources up or down to match real-time demand, so you pay only for the computing time actually used for maintenance predictions and avoid paying for idle capacity. This is important because system utilization is inherently cyclic: data from flight ops, ACARS, and QAR systems will peak during the daytime or peak travel seasons and wane at night or during off-peak seasons.
Sustainability
Aurora and Athena both support compression of underlying data sources. Compression of system data (such as maintenance logs or flight records) significantly reduces the data storage requirements of the predictive maintenance system, reducing the system’s environmental impact.
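To give a feel for why compression matters for repetitive records such as maintenance logs, the sketch below compresses a synthetic log with gzip from the Python standard library and compares sizes. The records are made up for illustration, and gzip is just one example codec; the formats and ratios you see in Aurora or Athena depend on your data and configuration.

```python
import gzip
import json

# Synthetic, highly repetitive maintenance records (illustrative only).
records = [
    {
        "tail_number": "N12345",
        "component": "hydraulic_pump",
        "status": "OK",
        "cycle": i,
    }
    for i in range(1000)
]

raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```

Real flight and maintenance data is less uniform than this sample, so the achievable ratio will differ.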