
Building a Scalable Predictive Modeling Framework in AWS – Part 3

In the first part of this blog series, we introduced the open source aws-do-pm framework for predictive modeling at scale. In the second part, we demonstrated the scalability of the framework for building and deploying battery models for electric vehicles. The electric vehicle demo includes capabilities for data generation and model building.

In this blog, we will use the synthetic dataset (generated earlier) and the models to showcase the model updating and sensitivity analysis capabilities of the aws-do-pm framework.

The code for an example that showcases data generation, model building, model update, and sensitivity analysis can be found at https://github.com/aws-samples/aws-do-pm.

Prerequisites

To run this example, you need an AWS account and Docker installed. For more details on the setup and how to run the example, refer to the documentation here. Further, the models built in the second part of this blog series need to be retained.

Updating Predictive Models

For a model to be useful, its predictions need to be accurate. However, the behavior of the underlying physical assets (like a battery) tends to change over time. Therefore, the model(s) of such assets need to be updated continually for them to remain accurate.

One may consider “rebuilding” the models from scratch whenever there’s new data, but this approach is not meaningful for the following reasons:

  • New data is limited and not sufficient for full-scale model building.
  • Combining older data and new data for model building results in an “average” model that does not reflect the current state of the system.
  • Importantly, physical entities usually have a notion of “state” that needs to be tracked over time. Rebuilding models, instead of updating them, destroys this notion of “state”.
  • Combining data from multiple entities in the fleet is not possible, since not all entities are in the same state at a given time.
  • Rebuilding a model is computationally more expensive and time consuming than updating it.

The general approach is to start with all batteries initially having the same model. Individual batteries, however, are used differently in real-world scenarios (operating environment, different missions, etc.). This leads to divergence in the performance of the batteries in the fleet. Therefore, having a “single fleet model” for all the batteries in the fleet is not useful for making accurate predictions at the individual entity level. Hence, the initial “single model” is updated with data from each individual battery, to create a unique predictive model for each battery.

In the prior blog, a synthetic dataset was generated and a neural network model was trained on the synthetic data using the aws-do-pm framework. This model was trained on the first route for each of the 100 vehicles, in order to capture the behavior of a new, pristine battery. Selected parameters of this model (pertaining to the last hidden layer of the ANN built earlier) will be appropriately modified to reflect observed battery behavior as the vehicles get driven across various routes, but the model structure itself will be retained. The model structure captures the “physics” of the system (which is common to all batteries), while the model parameters represent the “state” of an individual battery (which is unique to each battery). Initially, all batteries in a fleet have the same model. Over time, this model gets updated with data from each individual entity of the fleet. Note that the predictive model of an individual battery is updated with only its own data (and nothing else). A simplified version of the model updating process is shown below:

Figure 1 – Model Update Process Flow
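To make the split between fixed structure and updatable state concrete, here is a minimal, hypothetical sketch in PyTorch; the actual aws-do-pm model artifacts, layer sizes, and input names differ. It freezes the network and exposes only the last hidden layer's parameters as the updatable subset:

import torch.nn as nn

# Hypothetical stand-in for the battery ANN: trip inputs -> trip voltage.
# Layer sizes and inputs are illustrative, not the framework's actual model.
model = nn.Sequential(
    nn.Linear(2, 16), nn.Tanh(),    # input layer (e.g., trip distance, duration)
    nn.Linear(16, 16), nn.Tanh(),   # last hidden layer: holds the battery "state"
    nn.Linear(16, 1),               # output layer (trip voltage)
)

# Freeze the shared "physics" (the structure and all other parameters) ...
for p in model.parameters():
    p.requires_grad = False

# ... and expose only the last hidden layer's weights and biases for updates.
for p in model[2].parameters():
    p.requires_grad = True

updatable = [p for p in model.parameters() if p.requires_grad]

The frozen remainder preserves the physics shared across the fleet, while the exposed subset is what the update step adjusts per battery.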

Running the model update

The current model prediction along with the actual observed field data is shown on the left in the figure below. As the model predictions diverge significantly from the field observations, the model needs to be updated. The aws-do-pm framework ships with the Unscented Kalman Filter as the default model updating technique. The model can be updated using the following aws-do-pm command:

pm model update <model_id> <data_id>
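Under the hood, the UKF treats the updatable parameters as the filter state and the model output as the measurement. The following is a minimal sketch of that idea using the open-source filterpy library and a hypothetical one-output model; it is not the framework's actual implementation:

import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

# Hypothetical stand-in for the battery model: maps the updatable
# parameter vector (the filter state) to a predicted trip voltage.
trip_features = np.array([0.8, 0.3, 1.2])   # hypothetical trip inputs
n = trip_features.size                      # number of updatable parameters

def predict_voltage(params):
    return np.array([trip_features @ params])

sigmas = MerweScaledSigmaPoints(n=n, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(
    dim_x=n, dim_z=1, dt=1.0,
    fx=lambda x, dt: x,          # parameters evolve as a random walk
    hx=predict_voltage,          # measurement = model prediction
    points=sigmas,
)
ukf.x = np.ones(n)               # current parameter estimate
ukf.P *= 0.1                     # parameter uncertainty
ukf.R *= 0.05                    # measurement noise on observed voltage
ukf.Q *= 1e-4                    # process noise on the parameters

ukf.predict()
ukf.update(np.array([0.9]))      # assimilate a new field observation
# ukf.x now holds the updated parameters; ukf.P their uncertainty.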

The predictions of the updated model are shown in the right half of the figure.

The update process significantly improves the model’s accuracy: after the update, the predictions closely match the field observations. The updated model’s predictions also contain an uncertainty estimate. A user may choose to update a model only when the observed data falls beyond the uncertainty bounds.
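A simple gate of this kind might look like the following sketch, assuming, hypothetically, that the framework returns a mean prediction and a standard deviation (names here are illustrative):

import numpy as np

# Trigger an update only when observations fall outside, say, a 2-sigma band.
def needs_update(observed, predicted_mean, predicted_std, k=2.0):
    return bool(np.any(np.abs(observed - predicted_mean) > k * predicted_std))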

Sensitivity Analysis

To understand the effect of the inputs and model parameters on the output, a sensitivity analysis is required. The analysis is performed on the set of updatable model parameters (a subset of all model parameters) to assess whether that subset is sufficient to capture the observed field behavior.

The sensitivity analysis can also be performed over the model’s lifetime to assess a shift in the dominant model parameters. These shifts indicate a change in the underlying deterioration of the physical asset and can provide valuable insights into the operation of the assets.

Variance-based measures are preferred for sensitivity analysis because they measure sensitivity across the entire input space, can deal with non-linearity, and quantify the effect of interactions in non-additive systems. The aws-do-pm framework contains functionality to perform global sensitivity analysis using Sobol sensitivity. An overview of Sobol sensitivity can be found here.

The Sobol indices provide an estimate of the main and 2-way interaction effects. The aws-do-pm command to perform model sensitivity analysis is shown below:

pm model sensitivity <model_id> <data_id>
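The framework packages Sobol sensitivity as a technique; a minimal standalone sketch of the same computation, using the open-source SALib library and a toy stand-in for the battery model, could look like this (variable names borrowed from the example output below):

import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Toy stand-in for the battery model: trip voltage as a function of
# two inputs and one updatable parameter. Bounds are illustrative.
problem = {
    "num_vars": 3,
    "names": ["trip_dist", "trip_duration", "bias_weight_array_2"],
    "bounds": [[0.0, 1.0], [0.0, 1.0], [-1.0, 1.0]],
}

def trip_voltage(x):
    d, t, b = x[:, 0], x[:, 1], x[:, 2]
    return 1.0 - 0.5 * d - 0.1 * t + 0.3 * b + 0.2 * d * b  # d-b interaction

X = saltelli.sample(problem, 1024)   # N * (2D + 2) model evaluations
Y = trip_voltage(X)
Si = sobol.analyze(problem, Y)
print(Si["S1"])   # first-order (main) effects
print(Si["S2"])   # second-order (two-way interaction) effects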

A sample output of the sensitivity analysis is given below:

This plot shows that the output of the model (trip voltage) is most sensitive to changes in one of the model parameters (bias_weight_array_2). Similarly, the input variable trip_dist has the highest impact on the output trip voltage.

To capture the interaction of multiple variables with each other, the aws-do-pm framework provides a chord plot. A sample plot is shown below.

Each color denotes one variable, and the width of the arc connecting two variables denotes the strength of the interaction between them. Understanding these interactions is important because a variable that is not sensitive by itself can still have a large impact on the output through its interaction with a more sensitive variable.

aws-do-pm graph architecture

The aws-do-pm framework is designed as a graph that dynamically grows as the user interacts with the framework. Every operation generates and/or connects entities in the graph. The graph architecture enables auditability, allows for easy visualization of activities, and helps track version changes of datasets, models, and techniques. Knowing who made changes, when, and why is a key part of traceability for predictive models. The graph architecture enables users to easily extract the details of the data, models, and techniques used for past predictions, for forensic analysis.

The aws-do-pm graph is designed with the following primitives (shipped by default and extensible by users); a hypothetical code sketch of such a graph follows the list:

  • Data primitive: Represents the data that can be used for model building, updating, prediction, etc.
  • Model primitive: A collection of files, binaries, and environment settings essential to run the model.
  • Technique primitive: Standardized methods and algorithms that generalize across different data and model artifacts. For example, model building using artificial neural networks, model updating using the unscented Kalman filter, and serving a model with gRPC are general techniques that can be used by a wide variety of applications.
  • Task primitive: Represents an action on other primitives.
  • Trash primitive: All delete actions are recorded in the trash primitive so that the system can track actions through their lifecycle.
  • Error primitive: Any errors or failures during processing create error artifacts for debugging.
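As noted above, here is a hypothetical sketch of how such a provenance graph could be represented with the open-source networkx library; node names, attributes, and edge roles are illustrative and not the framework's actual schema:

import networkx as nx

# Illustrative provenance graph mirroring the aws-do-pm primitives.
g = nx.DiGraph()
g.add_node("data/ev_synthetic", kind="data")
g.add_node("technique/model_build_ann", kind="technique")
g.add_node("task/build_1", kind="task")
g.add_node("model/base", kind="model")

g.add_edge("data/ev_synthetic", "task/build_1", role="input")
g.add_edge("technique/model_build_ann", "task/build_1", role="executes")
g.add_edge("task/build_1", "model/base", role="output")

# Forensic query: which artifacts contributed to this model?
print(nx.ancestors(g, "model/base"))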

A snapshot of the aws-do-pm framework, after the user has performed a few actions, is shown below.

As the user performs more actions on the aws-do-pm framework, the graph expands accordingly to reflect all the actions. The figure above shows a snapshot where the following actions have been performed:

  1. Default techniques have been registered through the technique registration process
  2. Data artifacts have been used to build a model through the model_build_ann technique
  3. New operational data has been registered into the system and used to predict and update the model
  4. In the process of creating an updated model, the model was exposed as a gRPC service and then deleted (moved to trash) once the update process was complete

These fundamental operations are available as part of the aws-do-pm framework and can be used to build an application. For example, the ev-demo script in the GitHub repository (https://github.com/aws-samples/aws-do-pm) shows a typical use of the API for a single electric vehicle use case. Running the ev-demo with default settings will generate a dataset for 100 vehicles, use it to generate the base model, update the base model for one vehicle on one route, and run model sensitivity for one vehicle to showcase all the features of the aws-do-pm framework. After running the demo successfully, the graph of the system will look as shown below:

The aws-do-pm framework scales elastically on Amazon EKS, autoscaling resources to perform large-scale computations.

Running the ev-fleet-sequential-demo performs a continuous update of 100 electric vehicles, driven over 10 routes each. The state of the aws-do-pm graph after running the demo is shown below. Refer to the README file in the GitHub repository (https://github.com/aws-samples/aws-do-pm) for detailed instructions to run the demo.

The graph database records the connectivity and impact of different entities over time. With the evolution of the system, the graph can be used to analyze complex patterns and extract insights across the entire system.

Clean-up

To delete and free up the resources created by this example, follow the instructions provided here.

Summary

In the first part of this blog series, we introduced the aws-do-pm framework for predictive modeling at scale. In the second part, we discussed data generation and model building. In this blog, we illustrated the model updating and sensitivity analysis capabilities of the aws-do-pm framework. The scalability of the aws-do-pm framework was demonstrated by running a simulation consisting of 100 vehicles over 10 routes each, on Amazon EKS. Further, we also showcased a scalable graph architecture that can be used to extract system-level insights. An interested reader can extend the capabilities of the framework by adding more models and techniques, by following the instructions in the technical documentation.

Alex Iankoulski

Alex Iankoulski is a full-stack software and infrastructure architect who likes to do deep, hands-on work. He is currently a Principal Solutions Architect for Self-managed Machine Learning at AWS. In his role he focuses on helping customers with containerization and orchestration of ML and AI workloads on container-powered AWS services. He is also the author of the open source [Do framework](https://bit.ly/do-framework) and a Docker captain who loves applying container technologies to accelerate the pace of innovation while solving the world's biggest challenges. During the past 10 years, Alex has worked on combating climate change, democratizing AI and ML, making travel safer, healthcare better, and energy smarter.

Mahadevan Balasubramaniam

Mahadevan Balasubramaniam has 24 years of experience in the area of physics-infused deep learning and building digital twins at scale for physical assets such as aircraft engines, industrial gas turbines, and industrial process platforms. At AWS, he is a WWSO HPC Principal SA developing solutions for large-scale HPC+AI and ML frameworks. Prior to joining AWS, Mahadevan was first at GE, where he focused on probabilistic modeling, hardware design, anomaly detection, and remaining useful life predictions for a variety of applications across aviation, energy, and oil & gas. Mahadevan then joined a startup as a Senior Principal Data Scientist, where he focused on deep learning-based solar energy forecasting for managing battery discharge in PV-battery installations. Dr. Balasubramaniam obtained his Ph.D. from MIT in 2001, where he studied computational geometry for automated toolpath generation for 5-axis NC machining.

Venkatesh Rajagopalan

Venkatesh Rajagopalan is a Principal Solutions Architect for Autonomous Computing. He has ~13 years of industrial experience in research and product development. In his current role, he develops solutions for problems in large-scale machine learning and autonomous systems. Prior to joining AWS, Venkatesh was the Senior Director of Data Science with GE’s Oil & Gas business, where he led an Industrial AI team responsible for building hybrid analytics (physics + deep learning) products focused on production optimization in large fields, anomaly detection, and remaining useful life estimation for the oil & gas industry. Prior to this, Venkatesh was a Senior Engineer with the Prognostic Systems Lab at the GE Research Center, India. His research focused on developing models and methods for failure prediction in critical industrial assets like gas turbines, electrical motors, and aircraft engines. He was a key contributor to the development of the Digital Twin platform at GE Research. The digital twins that he developed are being used to monitor the health and performance of a fleet of 7,000+ aircraft engines and 300+ gas turbines. Venkatesh has a PhD in Electrical Engineering from the Pennsylvania State University. He has been granted 12 patents and has published 17 papers with more than 600 citations. His expertise includes signal processing, estimation theory, optimization, and machine learning.