AWS for Industries

Accelerating industrialization of Machine Learning at BMW Group using the Machine Learning Operations (MLOps) solution

The BMW Group and Amazon Web Services (AWS) announced a strategic collaboration in 2020. The goal of that collaboration is to help further accelerate the BMW Group’s pace of innovation by placing data and analytics at the center of its decision-making.

The BMW Group’s Cloud Data Hub (CDH) manages company-wide data and data solutions on AWS. The CDH provides BMW Analysts and Data Scientists with access to data that helps drive business value through Data Analytics and Machine Learning (ML). As part of the BMW Group’s larger strategy to leverage the data available within the CDH and to help accelerate the industrialization of Machine Learning, the BMW Group worked closely with AWS Professional Services to develop its Machine Learning Operations (MLOps) solution.

The BMW Group’s MLOps solution includes (1) a reference architecture, (2) reusable Infrastructure as Code (IaC) modules that use Amazon SageMaker and AWS Analytics services, (3) ML workflows built with AWS Step Functions, and (4) a deployable MLOps template that covers the ML lifecycle from data ingestion to inference.

The MLOps solution supported the BMW Group in accelerating the industrialization of its AI/ML use cases, resulting in significant value generation within the first two years after the solution’s release. The long-term goal of the BMW Group’s MLOps solution team is to help accelerate the industrialization of over 80% of the AI/ML use cases at the BMW Group, enabling continuous innovation and improvement in AI/ML across the company.

Starting in 2022, the MLOps solution has been rolled out to AI/ML use cases at the BMW Group. It has seen widespread adoption and is recognized as the BMW-internal master solution for MLOps.

In this blog, we talk about the BMW Group’s MLOps solution, its reference architecture, high-level technical details, and the benefits to the AI/ML use case teams that develop and productionize ML models using it.

Overview of MLOps Solution

The MLOps solution was developed to address the requirements of AI/ML use cases at the BMW Group. These include integration with the BMW Group data lake (e.g., the CDH), ML workflow orchestration, data and model lineage, and governance requirements such as compliance, networking, and data protection.

AWS Professional Services and the MLOps solution team from the BMW Group collaborated closely with various AI/ML use case teams to discover successful patterns and practices. This enabled AWS and the BMW Group’s MLOps solution team to gain a comprehensive understanding of the technology stack, as well as the complexities involved in productionizing AI/ML use cases.

To meet the BMW Group’s AI/ML use case requirements, the team worked backwards from those requirements and developed the MLOps solution architecture shown in Figure 1 below.

Figure 1: MLOps Solution Architecture

In the sections below, we explain the details of each component of the MLOps solution as represented in the architecture.

1. MLOps Template

The MLOps template is a composition of IaC modules and ML workflows built using AWS managed services with a serverless-first strategy, designed to let the BMW Group take advantage of the scalability, reduced maintenance costs, and agility of ML on AWS. The template is deployed into the AWS account of an AI/ML use case to create an end-to-end, deployable ML and infrastructure pipeline, and is designed to act as a starting point for building AI/ML use cases at the BMW Group.

The MLOps template offers functional capabilities for the BMW Group’s Data Scientists and ML Engineers, ranging from data import and exploration to training and deployment of ML models for inference. It also supports the operation of AI/ML use cases at the BMW Group by offering version control as well as infrastructure and ML monitoring capabilities.

The MLOps solution is designed to offer functional and infrastructure capabilities as independent building blocks. AI/ML use case teams can adopt these capabilities as a whole or choose selected blocks to help meet their business goals.

Below is an overview of the MLOps Template building blocks offered by the BMW Group’s MLOps solution:

  1. The MLOps Modules are reusable IaC definitions that create AWS resources to a specification designed around the BMW Group’s security, compliance, and networking requirements (a code sketch follows Figure 2).
  2. The Data Import and Preparation module enables users to import data from the BMW Group data lake (e.g., the CDH) and to set up pre-processing jobs in preparation for ML training and inference.
  3. The Data and Model Exploration module provides users of the BMW Group’s AI/ML use cases with the means to conduct exploratory analysis of data and models.
  4. The Model Training module provides users with the flexibility to either train ML models via Amazon SageMaker Training Jobs or incorporate their own ML models using containers.
  5. The Model Evaluation module offers functionality to evaluate model performance, such as computing model quality metrics, and then registers models in the Amazon SageMaker Model Registry.
  6. The Pipeline Definition module brings the pipeline orchestration needed to execute the steps that train an ML model or generate inference from it.
  7. The Model Deployment module offers functionality to deploy ML models for batch or real-time inference.
  8. The Infrastructure and Cost Monitoring module offers monitoring of the solution’s infrastructure and tracking of its costs.
  9. The Data & Model Quality Monitoring module offers ML model observability and tracking functionality to users.

Figure 2: MLOps Template building blocks
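To make this concrete, here is a minimal sketch of what such a reusable IaC module could look like, written with the AWS CDK in Python. The construct, its name, and the specific bucket settings are illustrative assumptions; the blog does not prescribe the IaC tooling or the exact security baseline the BMW modules implement.

```python
from aws_cdk import aws_s3 as s3
from constructs import Construct

class SecureDataBucket(Construct):
    """Hypothetical reusable MLOps module: an S3 bucket pre-configured
    to a security baseline (encryption at rest, TLS-only, no public access)."""

    def __init__(self, scope: Construct, construct_id: str) -> None:
        super().__init__(scope, construct_id)
        self.bucket = s3.Bucket(
            self,
            "DataBucket",
            encryption=s3.BucketEncryption.KMS_MANAGED,    # encrypt at rest
            enforce_ssl=True,                              # deny non-TLS requests
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            versioned=True,                                # keep object history
        )
```

A use case team would instantiate such constructs in its own stack and receive compliant resources without re-implementing the security and networking requirements each time.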

2. Notebook Stack

The MLOps solution provides Data Scientists and ML Engineers at the BMW Group with example notebooks that help flatten their learning curve with AWS services. These example notebooks include:

  • Working examples demonstrating the path from data exploration and feature engineering to model registration.
  • Practical examples of using Amazon SageMaker Processing, Tuning, and Training Jobs, as well as of using the Amazon SageMaker Model Registry to register versions of ML models along with their approval statuses (a code sketch follows this list).
  • How to create four types of ML model monitors (Data Quality, Model Quality, Model Bias, and Model Explainability) using Amazon SageMaker Model Monitor and Amazon SageMaker Clarify.
  • How to use the Bring Your Own Container feature to package custom algorithms in Docker containers for training, tuning, and inference in Amazon SageMaker.
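As a hedged illustration of the model-registry example, the following sketch uses the SageMaker Python SDK; the estimator configuration, model package group name, and data location are placeholders rather than the contents of the actual notebooks.

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

role = sagemaker.get_execution_role()  # valid inside SageMaker notebooks

# Train a model; the entry point and data location are placeholders.
estimator = SKLearn(entry_point="train.py", framework_version="1.2-1",
                    py_version="py3", instance_type="ml.m5.xlarge", role=role)
estimator.fit({"train": "s3://my-bucket/train/"})

# Register the trained model as a new version in a model package group.
model = estimator.create_model()
model.register(
    model_package_group_name="demand-forecast",   # illustrative group name
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    approval_status="PendingManualApproval",      # approved later by a reviewer
)
```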

3. Training Pipeline

The MLOps solution’s training pipeline, developed using the AWS Step Functions Data Science Python SDK, consists of the steps required to train ML models, including data loading, feature engineering, model training, evaluation, and model monitoring.

Use case teams at the BMW Group have the flexibility to modify or expand the MLOps solution’s training pipeline as required for their specific projects. Common customizations thus far have included parallel model training, simultaneous experiments, pre-production approval workflows, and monitoring and alert notifications via Amazon SNS integration.

The details of the MLOps solution’s training pipeline steps are shown in Figure 3 below, followed by a code sketch of the first few steps:

  1. data-load-step: Uses an Amazon SageMaker Processing Job to ingest unprocessed data into the pipeline execution.
  2. feature-engineering-step: Uses an Amazon SageMaker Processing Job and custom logic to perform feature engineering on the raw data that the data-load-step placed in Amazon S3.
  3. training-step: Employs an Amazon SageMaker Training Job to train an ML model.
  4. model-evaluation-step: Helps users measure the quality metrics of the trained ML model.
  5. register-model-step: Saves trained models that meet the model quality thresholds in the Amazon SageMaker Model Registry.
  6. model-monitoring-step: Equips the training pipeline with Model Explainability monitoring, giving users at the BMW Group the means to explain the predictions of a deployed model.
  7. store-metadata-step: A custom pipeline stage that stores metadata about each execution in Amazon DynamoDB.

Figure 3: Training Pipeline
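As a rough sketch of how the first steps of such a pipeline can be assembled with the AWS Step Functions Data Science Python SDK, consider the following; the processors, estimator, S3 URIs, and role ARN are illustrative assumptions, and the real pipeline contains all seven steps plus error handling.

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.sklearn.processing import SKLearnProcessor
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import Chain, ProcessingStep, TrainingStep
from stepfunctions.workflow import Workflow

role = sagemaker.get_execution_role()

# Placeholder processing and training jobs; scripts and URIs are illustrative.
load_processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                                  instance_type="ml.m5.xlarge", instance_count=1)
fe_processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                                instance_type="ml.m5.xlarge", instance_count=1)
estimator = SKLearn(entry_point="train.py", framework_version="1.2-1",
                    instance_type="ml.m5.xlarge", role=role)

# Unique job names are supplied for each pipeline execution.
execution_input = ExecutionInput(schema={"LoadJobName": str,
                                         "FeJobName": str,
                                         "TrainJobName": str})

data_load = ProcessingStep("data-load-step", processor=load_processor,
                           job_name=execution_input["LoadJobName"])
feature_engineering = ProcessingStep("feature-engineering-step",
                                     processor=fe_processor,
                                     job_name=execution_input["FeJobName"])
training = TrainingStep("training-step", estimator=estimator,
                        data={"train": "s3://my-bucket/features/"},  # placeholder
                        job_name=execution_input["TrainJobName"])

workflow = Workflow(name="mlops-training-pipeline",   # illustrative name
                    definition=Chain([data_load, feature_engineering, training]),
                    role="arn:aws:iam::123456789012:role/StepFunctionsWorkflowRole")
workflow.create()  # deploys the state machine; workflow.execute() starts a run
```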

4. Continuous Integration and Continuous Delivery (CI/CD)

The MLOps solution employs AWS CodePipeline to facilitate continuous integration and deployment workflows. The AWS CodePipeline source stage allows users at the BMW Group to select their preferred source control system, such as AWS CodeCommit or GitHub Enterprise.

AI/ML use case teams at the BMW Group can use AWS CodePipeline to deploy the ML training pipeline, thereby bootstrapping the AWS infrastructure required to orchestrate it, from reading data from the BMW Group data lake (e.g., the CDH) through model training and evaluation to ML model registration.

When the model training pipeline completes by registering the ML model in the Amazon SageMaker Model Registry, the MLOps solution uses Amazon EventBridge notifications to trigger AWS CodePipeline to deploy the inference module.
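A hedged sketch of wiring such a trigger with boto3 is shown below, filtering (for example) on an approved model version; the rule name, model package group, and ARNs are placeholders, and in the actual solution this wiring is provisioned by the IaC modules. The "SageMaker Model Package State Change" event is emitted by SageMaker whenever a model package is created or its approval status changes.

```python
import json
import boto3

events = boto3.client("events")

# Fire when a new model version in the group is approved.
events.put_rule(
    Name="on-model-approved",  # illustrative rule name
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": ["demand-forecast"],  # placeholder group
            "ModelApprovalStatus": ["Approved"],
        },
    }),
)

# Start the inference-deployment pipeline as the rule's target.
events.put_targets(
    Rule="on-model-approved",
    Targets=[{
        "Id": "deploy-inference-module",
        "Arn": "arn:aws:codepipeline:eu-central-1:123456789012:inference-pipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/EventsToCodePipelineRole",
    }],
)
```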

5. Inference

Around 80% of the AI/ML use cases at the BMW Group served by the MLOps solution require high-performance, high-throughput methods for transforming raw data and generating inferences from it. To meet these needs, the MLOps solution offers a batch inference pipeline with the steps required for users at the BMW Group to load and pre-process raw data, generate predictions, and monitor the predicted results for quality and explainability.

Along with the batch inference pipeline, AI/ML use case teams at the BMW Group are provided with the modules required to help set up real-time inference in case they need low-latency predictions and API integration with external use case applications.
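As a minimal sketch of the real-time path, assuming a model version already exists in the Model Registry, an endpoint could be stood up with the SageMaker Python SDK as follows; the model package ARN and endpoint name are placeholders.

```python
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Wrap a registered model version; the package ARN is a placeholder.
model = ModelPackage(
    role=role,
    model_package_arn=("arn:aws:sagemaker:eu-central-1:123456789012:"
                       "model-package/demand-forecast/3"),
    sagemaker_session=session,
)

# Deploy to a managed HTTPS endpoint for low-latency predictions.
predictor = model.deploy(initial_instance_count=1,
                         instance_type="ml.m5.large",
                         endpoint_name="demand-forecast-endpoint")  # illustrative
```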

The details of the MLOps solution’s batch inference pipeline steps are shown in Figure 4 below:

  1. data-load-step: Uses an Amazon SageMaker Processing Job to ingest unprocessed data into the pipeline execution.
  2. feature-engineering-step: Uses an Amazon SageMaker Processing Job and custom logic to perform feature engineering on the raw data that the data-load-step placed in Amazon S3.
  3. batch-inference-step: Uses Amazon SageMaker Batch Transform to generate predictions (a code sketch follows Figure 4).
  4. model-monitoring-step: Equips the inference pipeline with Model Explainability monitoring, giving users the means to explain the predictions of the deployed model.
  5. post-processing-step: An additional step that lets users apply their own business logic to the predictions from the batch-inference-step.

Figure 4: Inference Pipeline
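A hedged sketch of the batch-inference-step, using the SageMaker Python SDK's Transformer, is shown below; the model name and S3 locations are placeholders for whatever the preceding steps produce.

```python
from sagemaker.transformer import Transformer

# Minimal batch transform; the model name and S3 URIs are placeholders.
transformer = Transformer(
    model_name="demand-forecast-model",   # a model created from the registry
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/predictions/",
)

# Score every line of the prepared feature files.
transformer.transform(
    data="s3://my-bucket/features/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()  # block until the transform job finishes
```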

6. Use Case Application Stack

The MLOps solution allows AI/ML use case teams at the BMW Group to bring their own application stack in addition to the set of modules offered as part of the MLOps solution. This helps AI/ML use cases at the BMW Group make the customizations their business and technical needs require.

Conclusion

The MLOps solution helped the AI/ML use cases of the BMW Group build and deploy production-grade models, thereby reducing overall time to market by approximately 75%. The MLOps solution also offers a broad range of benefits to the BMW Group, including:

  • An AWS-optimized, serverless-first solution
  • Integration of BMW-specific requirements around networking, data handling, compliance, and security
  • State-of-the-art practices that can be iteratively improved and rolled out to new use cases by the BMW Group
  • Continuous addition of new AWS services and capabilities to the MLOps solution to drive innovation for the BMW Group

Learn more about BMW’s Cloud Data Hub (CDH) in this blog post, explore AWS offerings on the AWS for Automotive page, or contact your AWS team today.

Marc Neumann

Marc Neumann is the head of the central AI Platform at BMW Group. He is responsible for developing and implementing strategies to use AI technology for business value creation across the BMW Group. His primary goal is to ensure that the use of AI is sustainable and scalable, meaning it can be consistently applied across the organization to drive long-term growth and innovation. Through his leadership, Neumann aims to position the BMW Group as a leader in AI-driven innovation and value creation in the automotive industry and beyond.

Aubrey Oosthuizen

Aubrey Oosthuizen is a Senior DevOps Architect at AWS Professional Services, where he works across various verticals combining DevOps and Machine Learning Operations (MLOps) to solve customer problems. His passion lies in autonomous driving (AV/ADAS) and decentralized systems.

Jonathan-Edwin Asamoah

Jonathan-Edwin Asamoah is the Lead for MLOps at the BMW Group, where he is responsible for creating and offering central blueprints for AI and ML applications. With his expertise as a Data Scientist, Jonathan is dedicated to enabling data-driven applications and leveraging his in-depth understanding of the challenges faced by AI and ML technologies. He provides tailored MLOps solutions to the BMW Group's business departments, ensuring seamless integration and maximum operational efficiency.

Markus Frank

Markus Frank is an IT and research professional specializing in artificial intelligence and machine learning applications. As an IT Service Lead at BMW Group, he focuses on exploring the use of automated machine learning (AutoML) in manufacturing. He also serves as the IT Lead for predictive AI applications at BMW Group, leveraging his background in computer vision and machine learning to lead the development of impactful AI solutions. Markus holds a master's degree in computer science with a specialization in Computer Vision and Machine Learning.

Mohan Gowda Purushothama

Mohan Gowda Purushothama leads the AWS Professional Services AI/ML team in Switzerland. In this role, he assists global and strategic AWS customers in developing innovative generative AI solutions and machine learning platforms. Prior to joining AWS, Mohan worked with a global management consulting firm, focusing on strategy and analytics. His passion areas include artificial general intelligence and connected vehicles.