AWS HPC Blog
Building a Scalable Predictive Modeling Framework in AWS – Part 2
In the first part of this blog series, we introduced the aws-do-pm framework for building predictive models at scale in AWS. In this blog, we showcase a sample application for predicting the life of batteries in a fleet of electric vehicles, using the aws-do-pm framework.
To run this example, you need an AWS account and Docker installed. For more details on the setup and how to run the example, refer to the documentation here.
Demonstrating a sample application using aws-do-pm
A common use case is managing a fleet of electric vehicles (EVs) for commercial operations. In an EV, the battery is the most critical component, so building models to predict battery performance is of utmost importance. When the batteries are new, the performance of all EVs in the fleet can be assumed to be similar. However, individual batteries degrade differently over time, leading to divergence in EV performance. Thus, each battery's model needs to be continuously updated to track the degradation of that battery over its lifetime. We have provided a complete demo in aws-do-pm to:
- generate synthetic data
- build models
- update models
- perform sensitivity analysis
The code for the demo has been released and documented in https://github.com/aws-samples/aws-do-pm. After completing the initial setup in the documentation, you can run the entire demo using the following command from the base folder in the repository:
In the subsequent sections, we discuss in detail the following aspects of using the aws-do-pm framework:
- data generation and registration
- model building
- model prediction
Data Generation and Registration
To provide the most flexibility, users can choose the number of vehicles they would like to model in the data generation module. The module uses a phenomenological degradation model to generate the data for each vehicle. All vehicles start with the “ideal” battery. Each vehicle is expected to travel ~100 routes, and every route is assigned a specific distance, speed, load, rolling friction, and drag. The damage accumulated by the battery depends on all of these inputs. The battery voltage, as a function of time in each trip, is calculated based on the inputs and the accumulated damage.
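As a rough illustration of this kind of generator, the sketch below produces a voltage trace per trip and carries damage forward from one route to the next. It is not the aws-do-pm data generation module: the voltage and damage formulas, the constants, the normalized input ranges, and the names `simulate_trip` and `generate_fleet` are all assumptions made for illustration.

```python
import random

V_FULL = 4.2  # hypothetical voltage of a pristine, fully charged battery


def simulate_trip(route, damage, n_steps=20):
    """Return (voltage_trace, new_damage) for one trip. Formulas are illustrative only."""
    # Assumed demand index: grows with load, rolling friction, drag (~ speed^2), and distance.
    demand = (route["load"] * route["rolling_friction"]
              + route["drag"] * route["speed"] ** 2) * route["distance"]
    voltages = []
    for step in range(n_steps):
        depth = (step + 1) / n_steps      # fraction of the trip completed
        sag = 0.5 * demand * depth        # voltage drop as the battery discharges
        voltages.append(V_FULL * (1.0 - damage) - sag)
    new_damage = damage + 0.001 * demand  # damage accumulates with every trip
    return voltages, new_damage


def generate_fleet(n_vehicles=3, n_routes=5, seed=0):
    """Generate synthetic trips keyed by (vehicle_id, route_id)."""
    rng = random.Random(seed)
    data = {}
    for vehicle_id in range(n_vehicles):
        damage = 0.0  # every vehicle starts with the "ideal" battery
        for route_id in range(n_routes):
            # Normalized, made-up ranges for the five route inputs.
            route = {
                "distance": rng.uniform(0.2, 1.0),
                "speed": rng.uniform(0.3, 1.0),
                "load": rng.uniform(0.2, 1.0),
                "rolling_friction": rng.uniform(0.1, 0.3),
                "drag": rng.uniform(0.1, 0.4),
            }
            voltages, damage = simulate_trip(route, damage)
            data[(vehicle_id, route_id)] = {
                "inputs": route,
                "voltage": voltages,
                "damage_after": damage,
            }
    return data


fleet = generate_fleet(n_vehicles=3, n_routes=5, seed=0)
```

Because damage is carried across routes, two vehicles that start identical drift apart as their route histories differ, which is the divergence the demo is designed to reproduce.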
The demo generates a simulated dataset that mimics the real-world behavior of 100 vehicles over 100 routes. The data is organized by vehicle ID and route ID. The “train_data” folder, created during the execution of the demo, contains the inputs and outputs required to build a model of pristine battery performance for all the vehicles. The generated data is automatically registered in the aws-do-pm framework. Any external dataset can be registered for use in aws-do-pm using the following command:
pm data register <local_path> <main_file> ['description']
For example:
pm data register /project/data ev_data.json "EV data for model training"
Model Building
In this section, we briefly discuss the model building process. Battery models are usually developed with test data generated in a laboratory. Numerous tests are performed under different loading conditions to generate a multitude of performance profiles (i.e., voltage, current, and power curves). A model is then built from this large dataset. The model can be physics-informed or empirical in nature. Once built, the model has a specific structure (form) and a set of associated parameters, which together represent the physics of the system. This same model is then used for all individual batteries in the fleet.
An alternative approach is to use data from real-world operations for model building. Data collected from the vehicles while their batteries are pristine can be used to model ideal battery behavior. In our example, we use the data generated from the first trip of each vehicle to model the ideal battery with a dense neural network.
Building a Neural Network Model for “ideal” battery performance
A dense neural network is used to model the trip voltage of the battery, with trip load, trip velocity, and trip distance as inputs. The neural network, built using PyTorch, consists of 5 hidden layers and was trained with data from the first trip of the 100 vehicles generated above. It is assumed that all batteries are in pristine condition at the start of operation. Therefore, all individual batteries share the same model at the time of their initial operation.
The fully connected layers in the neural network use ReLU (Rectified Linear Unit) activation functions to enable good regression performance. The dropout layers were included so that model uncertainty calculations are available to downstream processes. 80% of the data was used for training and 20% for testing.
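To make the architecture concrete, here is a minimal, untrained forward pass with three inputs, five hidden layers with ReLU and dropout, and one linear output. It is written in plain Python rather than PyTorch so that it runs without dependencies, and it is not the aws-do-pm model: the hidden width of 16, the dropout rate, the weight initialization, and the function names are assumptions.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights with a He-style scale (an assumed choice for ReLU layers)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_network(n_inputs=3, hidden=16, n_hidden_layers=5, seed=0):
    """Layers for: inputs (load, velocity, distance) -> 5 hidden layers -> 1 output."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden] * n_hidden_layers + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(network, x, dropout_p=0.0, rng=None):
    """One forward pass; dropout is applied after each hidden layer when dropout_p > 0."""
    rng = rng or random
    activations = list(x)
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index == len(network) - 1:
            return z[0]  # linear output: the predicted (normalized) trip voltage
        activations = [max(0.0, v) for v in z]  # ReLU
        if dropout_p > 0.0:
            # Inverted dropout: zero a unit with probability p, rescale the survivors.
            activations = [0.0 if rng.random() < dropout_p else a / (1.0 - dropout_p)
                           for a in activations]


network = init_network()
voltage = forward(network, [0.5, 0.4, 0.3])                  # deterministic pass
noisy = forward(network, [0.5, 0.4, 0.3], dropout_p=0.1,
                rng=random.Random(1))                        # stochastic pass
```

Keeping dropout switchable at prediction time is what lets repeated passes over the same input produce a spread of outputs rather than a single value.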
A schematic of the neural network is shown below:
The model is of the form shown below:
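Based on the inputs and output named above, this form can be written as follows. The symbols are assumed notation: $V_{\text{trip}}$ is the trip voltage, $L_{\text{trip}}$, $v_{\text{trip}}$, and $d_{\text{trip}}$ are the trip load, velocity, and distance, and $\theta$ stands for the trainable weights and biases of the network.

$$
V_{\text{trip}} = f_{\text{NN}}\left(L_{\text{trip}},\, v_{\text{trip}},\, d_{\text{trip}};\ \theta\right)
$$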
Note that even though the data generation used the drag and rolling resistance as inputs, the modeler would not have any information about the specific drag or the rolling resistance for any route to include them in the model. This is similar to most real-world situations, and we have designed this example to mimic the real-world use case as much as possible.
The model was trained on an Amazon EC2 g4dn.2xlarge instance. For larger datasets and more complex models, the same code (aws-do-pm) can be deployed on larger, more powerful instances such as P4d. The command for building an artificial neural network (ANN) model in aws-do-pm is shown below. The model is built using the registered data identified by <data_id>.
pm model build <data_id>
The model build performance plots from aws-do-pm are shown below. As seen in the plots, the model performs equally well on both training and test data. The loss-vs-epochs plot shows that the training error has been reduced to a near minimum while the validation error remains low. This is critical to make sure the model has not been overfit. The actual-vs-predicted plots for both training and test datasets show that, over the entire domain of 100 vehicles, the model performs adequately, with less than 5% maximum error.
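The text does not say how the 5% figure is computed. One plausible reading is the largest absolute error relative to the actual value, expressed as a percentage. The helper below is a sketch under that assumption; the function name and the sample values are hypothetical, and it assumes no actual value is zero.

```python
def max_percent_error(actual, predicted):
    """Largest absolute error relative to the actual value, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return max(abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted))


# Made-up values standing in for actual and predicted trip voltages.
actual = [4.0, 3.8, 3.5]
predicted = [4.1, 3.7, 3.6]
within_tolerance = max_percent_error(actual, predicted) < 5.0
```

Applying the same function to the training set and the test set separately gives a quick check that the two errors are comparable, which is the overfitting check described above.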
A model built in the aws-do-pm framework is automatically registered. However, you can also register an external model in the aws-do-pm framework as follows:
pm model register <folder_path> <rel_model_path> ['description']
<rel_model_path> represents the executable version of the model, while <folder_path> contains all its dependencies.
After a model is built and registered, it can be used to make predictions on a registered dataset using the following command:
pm model predict <model_id> <data_id>
A sample prediction (trip voltage vs. index), in normalized units, for one vehicle traveling over one route, along with the prediction uncertainty, is shown below.
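One common way to obtain such an uncertainty band from a network with dropout layers is Monte Carlo dropout: keep dropout active at prediction time, repeat the prediction many times, and report the mean and spread at each point. Whether aws-do-pm computes its uncertainty exactly this way is not stated here, so treat the sketch below as an assumption. The toy stochastic model stands in for a dropout-enabled network, and all names are hypothetical.

```python
import random
import statistics


def predict_with_uncertainty(stochastic_predict, inputs, n_samples=50):
    """Repeat a stochastic prediction and return per-point (means, standard deviations)."""
    samples = [[stochastic_predict(x) for x in inputs] for _ in range(n_samples)]
    per_point = list(zip(*samples))  # regroup: one tuple of samples per input point
    means = [statistics.mean(p) for p in per_point]
    stds = [statistics.stdev(p) for p in per_point]
    return means, stds


# Toy stand-in for a network evaluated with dropout enabled: a declining
# normalized voltage plus random noise. Purely illustrative.
_rng = random.Random(0)


def toy_dropout_model(x):
    return 1.0 - 0.3 * x + _rng.gauss(0.0, 0.02)


trip_index = [i / 10 for i in range(11)]
mean_voltage, voltage_std = predict_with_uncertainty(toy_dropout_model, trip_index)
# A band such as mean ± 2 * std can then be drawn around the mean prediction.
```

More samples give a smoother estimate of the band at the cost of more forward passes per prediction.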
The examples in the third blog of this series require the models built here. If you plan to run them, defer the clean-up until after you have completed those examples. Otherwise, to free up the resources created by this example, follow the instructions provided here.
In this blog, we have shown how to use the aws-do-pm framework for predictive modeling at scale. We demonstrated the scalability of the framework using a battery discharge model of an electric vehicle. The electric vehicle demo included synthetic data generation, model building, and prediction with the built model. In the next blog, we will use this dataset and these models as the basis to showcase the model updating and sensitivity analysis capabilities of the aws-do-pm framework.