AWS HPC Blog

Building a Scalable Predictive Modeling Framework in AWS – Part 1

Predictive models have powered the design and analysis of real-world systems such as jet engines, automobiles, and power plants for decades. These models provide insights into system performance and make it possible to run simulations at a fraction of the cost of experiments with physical hardware. Our aim with the predictive modeling framework is to enable a broad group of end users to build and deploy predictive models without worrying about the underlying techniques, orchestration mechanisms, or the infrastructure required.

However, the use of such models for making operational decisions has been limited by the lack of scalable tools and by the difficulty of using distributed computing systems. For example, a power plant manager using a physics-based model (with hundreds of parameters) for performance prediction may want to update the model with new observations. For this task, the power plant manager would, at a minimum, need to:

  • understand the model (e.g., which parameters in the model need to be updated),
  • know which techniques may be useful for updating the model (e.g., Kalman filters),
  • deploy the model in a way that the technique can communicate with the model, and
  • deploy the technique at scale to update the model.

The adoption of these techniques has thus been limited to small-scale problems, given the complexity of the process for even a basic task such as “updating models”.
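To make the "technique" in the list above more concrete, here is a minimal sketch of a scalar Kalman-style update applied to a single model parameter. It is an illustration only and is not taken from aws-do-pm: the names `ScalarKalman` and `update_parameter` are hypothetical, the parameter is assumed to be constant and observed directly, and the observation noise is assumed to be Gaussian with a known variance.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Hypothetical one-parameter filter (illustration only)."""
    mean: float      # current estimate of the model parameter
    variance: float  # uncertainty (variance) of that estimate

    def update(self, observation: float, obs_variance: float) -> None:
        # Kalman gain: how much to trust the new observation
        # relative to the current estimate.
        gain = self.variance / (self.variance + obs_variance)
        # Move the estimate toward the observation.
        self.mean += gain * (observation - self.mean)
        # Each observation shrinks the remaining uncertainty.
        self.variance *= 1.0 - gain


def update_parameter(prior_mean, prior_var, observations, obs_var):
    """Fold a sequence of observations into a prior estimate."""
    kf = ScalarKalman(prior_mean, prior_var)
    for z in observations:
        kf.update(z, obs_var)
    return kf.mean, kf.variance


if __name__ == "__main__":
    # Assumed numbers: a prior guess of 0.0 and two noisy readings of 1.0.
    mean, var = update_parameter(0.0, 1.0, [1.0, 1.0], obs_var=1.0)
    print(mean, var)
```

Even in this toy form, the user has to choose a prior, a noise model, and a way to feed observations into the update. A physics-based model with hundreds of parameters multiplies each of those choices, which is the burden described above.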

Introducing the aws-do-pm framework

We have adapted the open source AWS DevOps for Docker (aws-do-docker) for predictive modeling and are now making it available as a separate open source project, the aws-do-pm framework. The framework allows users to deploy predictive models at scale across a distributed computing architecture, with the capability to probabilistically update models using real-world data, while maintaining the full history of user actions.

The aws-do-pm framework caters to both end users and advanced developers. End users can use the framework to accomplish individual tasks, such as updating models or quantifying uncertainty, without the burden of understanding the underlying techniques or infrastructure. For advanced users, the framework is built to be extensible. For example, users can register data, models, and techniques built outside of aws-do-pm in the aws-do-pm framework. Once registered, these can be used along with existing techniques to build new applications. The aws-do-pm framework is organized as shown below.

Figure 1 - A high-level diagram of the aws-do-pm architecture


The architecture of aws-do-pm (Fig. 1) consists of three layers: Services, Entities, and Interfaces. The architecture is containerized and implemented as an AWS DevOps for Docker project. It can run locally or in the cloud, and it supports EKS and Docker/docker-compose as target container orchestrators. Users and applications interact with the aws-do-pm framework through its CLI and SDK interfaces. Additional details can be found here.

The aws-do-pm project is designed to spin up the infrastructure just-in-time (on demand as needed), deploy the services required to run the specific task (such as model building or sensitivity analysis) requested by the user, complete the task, save the state of the entire system, and then gracefully spin down all services and infrastructure before exiting.
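The just-in-time lifecycle described above can be sketched with a Python context manager. This is a simplified illustration under stated assumptions, not the aws-do-pm implementation: the function names `just_in_time` and `run_task`, the service names, and the use of a plain list as a log are all hypothetical, and real infrastructure calls are replaced by log entries.

```python
from contextlib import contextmanager


@contextmanager
def just_in_time(services, log):
    """Start services on entry; save state and stop them on exit."""
    started = []
    try:
        for name in services:
            log.append(f"start {name}")  # stand-in for a real deployment
            started.append(name)
        yield started
    finally:
        # State is saved before teardown, even if the task raised.
        log.append("save state")
        # Stop services in reverse order of startup.
        for name in reversed(started):
            log.append(f"stop {name}")


def run_task(task, services):
    """Run one task inside a spin-up / spin-down cycle."""
    log = []
    with just_in_time(services, log):
        log.append(f"run {task}")  # stand-in for the requested task
    return log


if __name__ == "__main__":
    # Hypothetical task and service names.
    for entry in run_task("sensitivity_analysis", ["database", "model"]):
        print(entry)
```

Placing the save and teardown steps in the `finally` block is one possible design choice: it keeps the footprint minimal after every run, whether the task succeeds or fails.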

At the end of each run, the entire system state can be automatically reduced to compressed storage (if the user so desires), keeping the footprint to a minimum. The framework is designed to run both on AWS and on the user’s local resources (e.g., a laptop). The blocks shown in Fig. 1 (Data, Techniques, Asset, and Model) are extensible by the user, without any need to modify the underlying infrastructure. In subsequent posts, we will demonstrate the aws-do-pm framework’s capabilities using data and models from a fleet of simulated electric vehicles.

Next steps

In this first post of three, we described the motivation and general architecture of the open-source aws-do-pm framework project. In our second post, we will show you how to use the framework to create a sample application for predicting the life of batteries in a fleet of electric vehicles. In the final part of this series, we will use the synthetic dataset and the models generated in the second part to showcase how to update the model and perform a sensitivity analysis using aws-do-pm.

Alex Iankoulski


Alex Iankoulski is a full-stack software and infrastructure architect who likes to do deep, hands-on work. He is currently a Principal Solutions Architect for Self-managed Machine Learning at AWS. In his role he focuses on helping customers with containerization and orchestration of ML and AI workloads on container-powered AWS services. He is also the author of the open source [Do framework](https://bit.ly/do-framework) and a Docker captain who loves applying container technologies to accelerate the pace of innovation while solving the world's biggest challenges. During the past 10 years, Alex has worked on combating climate change, democratizing AI and ML, making travel safer, healthcare better, and energy smarter.

Mahadevan Balasubramaniam


Mahadevan Balasubramaniam has 24 years of experience in the area of physics-infused deep learning and building digital twins at scale for physical assets such as aircraft engines, industrial gas turbines, and industrial process platforms. At AWS, he is a WWSO HPC Principal SA developing solutions for large-scale HPC+AI and ML frameworks. Prior to joining AWS, Mahadevan was first at GE, where he focused on probabilistic modeling, hardware design, anomaly detection, and remaining useful life predictions for a variety of applications across aviation, energy, and oil & gas. Mahadevan then joined a startup as a Senior Principal Data Scientist, where he focused on deep-learning-based solar energy forecasting for managing the battery discharge in PV-battery installations. Dr. Balasubramaniam obtained his Ph.D. from MIT in 2001, where he studied computational geometry for automated toolpath generation for 5-axis NC machining.

Venkatesh Rajagopalan


Venkatesh Rajagopalan is a Principal Solutions Architect for Autonomous Computing. He has ~13 years of industrial experience in research and product development. In his current role, he develops solutions for problems in large-scale machine learning and autonomous systems. Prior to joining AWS, Venkatesh was the Senior Director of Data Science with GE’s Oil & Gas business, where he led an Industrial AI team responsible for building hybrid analytics (physics + deep learning) products focused on production optimization in large fields, anomaly detection, and remaining useful life estimation for the oil & gas industry. Prior to this, Venkatesh was a Senior Engineer with the Prognostic Systems Lab at the GE Research Center, India. His research focused on developing models and methods for failure prediction in critical industrial assets like gas turbines, electrical motors, and aircraft engines. He was a key contributor to the development of the Digital Twin platform at GE Research. The digital twins that he developed are being used to monitor the health and performance of a fleet of aircraft engines (7000+) and gas turbines (300+). Venkatesh has a PhD in Electrical Engineering from the Pennsylvania State University. He has been granted 12 patents and has published 17 papers, with more than 600 citations. His expertise includes signal processing, estimation theory, optimization, and machine learning.