BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data
The BMW Group, headquartered in Munich, Germany, is a global manufacturer of premium automobiles and motorcycles, covering the brands BMW, BMW Motorrad, MINI, and Rolls-Royce. It also provides premium financial and mobility services.
For the past several years, the BMW Group has worked to stay at the forefront of the automotive industry’s digital transformation by using data and predictive analytics. According to Kai Demtröder, BMW Group vice president of data transformation, artificial intelligence, data and DevOps platforms, “To stay innovative, we are focusing on creating new digital and connected experiences and driving change in our value chain toward improving both efficiency and effectiveness by enabling data-driven decisions.” To generate these innovations, in 2015 the BMW Group created a centralized, on-premises data lake that collects and combines anonymized data from sensors in vehicles, operational systems, and data warehouses to derive historical, real-time, and predictive insights.
However, the company needed to more easily scale its data lake to support the growing demands of internal and external stakeholders. Because data wasn’t easily accessible—spread across myriad, siloed environments—the BMW Group’s innovation was slowed down by its own IT infrastructure and the long lead times required to support new initiatives. The BMW Group needed to develop a solution agile enough to both support the data needs of all the various internal business units and allow the company to move quickly to address the array of emerging use cases its customers demand.
The BMW Group also sought to give data consumers real-time access to data such as vehicle telemetry, including speed, location, temperature, battery and brake levels, and engine status. In addition, it wanted to integrate analytics and machine learning into the data lake to accelerate the development of new, innovative services. And, as a basic prerequisite, the solution would have to provide the governance required to ensure compliance with privacy and security regulations.
Empowering a Data-Driven Approach
In response to these challenges, the BMW Group decided to re-architect and move its on-premises data lake to the Amazon Web Services (AWS) Cloud. The company’s Cloud Data Hub (CDH) processes and combines anonymized data from vehicle sensors and other sources across the enterprise to make it easily accessible for internal teams creating customer-facing and internal applications. Ultimately, the company found that AWS offered the agility and flexibility it needed, along with the necessary footprint to support users across the globe.
Prior to the migration, the BMW Group’s rigid on-premises data lake failed to meet the ever-increasing needs of data engineers and analysts. Because its workflows were interdependent, the old data lake handled multiple tenants poorly. As a consequence, the BMW Group’s platform, ingestion, and use case teams needed complex coordination to work on projects and ran into organizational bottlenecks that slowed their pace.
The BMW Group turned to a mix of AWS managed services, including Amazon Athena, Amazon Simple Storage Service (Amazon S3), Amazon Kinesis Data Firehose, and AWS Glue, to reduce the setup’s complexity by separating components and to create an environment capable of scaling to meet the needs of data engineers. In addition, the teams could now run their own DevOps process end to end, giving them the autonomy and agility needed to continue to innovate. Moreover, the BMW Group implemented a modern web portal that helps users of the CDH discover trusted datasets using an advanced search algorithm and easily query data to generate new insights.
Democratizing Data Usage at Scale
Using AWS services, the BMW Group ingests a massive amount of data every day. Currently, millions of BMW and MINI vehicles are connected to the CDH via the BMW Group’s highly secure backend, and the CDH processes terabytes of anonymized telemetry data daily. The company uses this data to monitor vehicle health indicators such as check control errors to identify potential issues across vehicle lines. This enables the BMW Group to use fleet data ingested, collected, and refined in the CDH to resolve issues before they impact customers.
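As a rough illustration of this kind of fleet monitoring, the following Python sketch flags vehicle lines in which an unusually large share of vehicles reports a check control error. It is a minimal sketch under stated assumptions, not the BMW Group’s implementation: the event fields (line, vehicle_id), the fleet_sizes mapping, and the 5 percent threshold are all hypothetical.

```python
from collections import Counter


def flag_vehicle_lines(events, fleet_sizes, threshold=0.05):
    """Return {vehicle_line: share_of_vehicles_reporting} for lines above threshold.

    events      -- iterable of dicts with hypothetical keys "line" and "vehicle_id"
                   (vehicle_id stands for an anonymized identifier)
    fleet_sizes -- hypothetical mapping of vehicle line to number of connected vehicles
    """
    # Count each vehicle once per line, even if it reports several errors.
    reporting = {(event["line"], event["vehicle_id"]) for event in events}
    per_line = Counter(line for line, _ in reporting)

    flagged = {}
    for line, count in per_line.items():
        share = count / fleet_sizes[line]
        if share > threshold:
            flagged[line] = share
    return flagged
```

Deduplicating by vehicle before counting keeps a single vehicle that repeats the same error from inflating the share for its line.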
To better manage this data, the BMW Group introduced the notion of “data providers” and “data consumers” to increase both the autonomy and agility of its software engineering teams. Data providers ingest and transform data with AWS services such as Amazon Kinesis Data Firehose, AWS Lambda, AWS Glue, and Amazon EMR. Data consumers can then use services such as Amazon Athena, Amazon SageMaker, AWS Glue, and Amazon EMR to leverage data for their use cases. Both providers and consumers use these services in their own accounts and only share well-defined interfaces that can be controlled by a central API, helping prevent bottlenecks. The individual data layers are stored in Amazon S3 buckets, and their schemas are registered in the AWS Glue Data Catalog.
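The provider and consumer split can be pictured with a small in-memory model. The Python sketch below is only an illustration of the idea: the class names, fields, and access rule are assumptions, and it stands in for, rather than calls, the central API, Amazon S3, or the AWS Glue Data Catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset published by a data provider (all field names are hypothetical)."""
    name: str
    provider_account: str
    s3_location: str
    schema: dict  # column name -> type
    consumers: set = field(default_factory=set)


class CentralCatalog:
    """In-memory stand-in for a central API that mediates access to datasets."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset):
        # Providers publish a well-defined interface: a location plus a schema.
        if dataset.name in self._datasets:
            raise ValueError(f"dataset already registered: {dataset.name}")
        self._datasets[dataset.name] = dataset

    def grant(self, name, consumer_account):
        self._datasets[name].consumers.add(consumer_account)

    def resolve(self, name, consumer_account):
        # Consumers only ever see the shared interface, never the provider's account.
        dataset = self._datasets[name]
        if consumer_account not in dataset.consumers:
            raise PermissionError(f"{consumer_account} has no access to {name}")
        return {"location": dataset.s3_location, "schema": dict(dataset.schema)}
```

In this model a consumer team works entirely in its own account and depends only on the location and schema returned by resolve, which is what lets providers and consumers change their internals independently.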
Besides collecting technical metadata in the AWS Glue Data Catalog, the BMW Group found that building up a human-readable data catalog was essential to democratizing data organization-wide. This effort would ensure a high degree of transparency about which data assets are gathered in the CDH and how. The front-end application Data Portal serves as a data explorer to boost the productivity of data analysts, data scientists, and engineers by clearly displaying data resources and offering a “popularity index” based on data usage patterns for more than 500 users across the organization.
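The case study does not describe how the popularity index is calculated. One simple possibility, shown here purely as an assumption, is to rank datasets by the number of distinct users who have accessed them. The access log format in this Python sketch is hypothetical.

```python
from collections import defaultdict


def popularity_index(access_log):
    """Rank datasets by number of distinct users (an assumed definition).

    access_log -- iterable of (user, dataset) pairs, a hypothetical log format
    Returns a list of (dataset, distinct_user_count), most popular first.
    """
    users_by_dataset = defaultdict(set)
    for user, dataset in access_log:
        users_by_dataset[dataset].add(user)

    ranked = [(dataset, len(users)) for dataset, users in users_by_dataset.items()]
    # Sort by descending user count, then by name so that ties are stable.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Counting distinct users rather than raw queries is a design choice: it keeps one heavily scheduled job from making a dataset look more widely used than it is.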
In addition, the CDH leverages GraphQL via AWS AppSync to build scalable and universal APIs for data providers and consumers alike, increasing development flexibility. Unlike traditional REST APIs, interfaces built on GraphQL are well-suited to support evolutionary requirements such as representing metadata for the data catalog or providing heterogeneous data collected from connected vehicles. Developers have the flexibility to define the payload structure and query parameters to fetch the data they need for a given use case. This helps them build applications significantly faster than before because they no longer have to create a new set of APIs for each project with a different set of data requirements.
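To make the payload flexibility concrete, the Python sketch below builds a GraphQL request in which the caller chooses exactly which fields to fetch. The schema is an assumption made for illustration: the dataset query, its id argument, and field names such as name and owner are hypothetical and are not taken from the CDH API. The request body follows the common GraphQL-over-HTTP convention of a JSON object with query and variables keys.

```python
import json
import re

# GraphQL names start with a letter or underscore, followed by letters, digits, or underscores.
_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def build_dataset_query(fields):
    """Build a query that selects only the requested fields of a hypothetical dataset type."""
    if not fields:
        raise ValueError("select at least one field")
    for name in fields:
        if not _NAME.fullmatch(name):
            raise ValueError(f"not a valid GraphQL field name: {name!r}")
    selection = " ".join(fields)
    return f"query Dataset($id: ID!) {{ dataset(id: $id) {{ {selection} }} }}"


def build_request_body(dataset_id, fields):
    """Serialize the query and its variables as a JSON request body."""
    return json.dumps({
        "query": build_dataset_query(fields),
        "variables": {"id": dataset_id},
    })
```

Two applications with different data needs would call build_request_body with different field lists against the same endpoint, instead of each requiring a new purpose-built API.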
The centralized and AWS-based data lake forms the BMW Group’s foundation to develop data-driven IT solutions and enables the company to automatically and independently scale on a serverless architecture. It can therefore innovate faster than it could with the previous on-premises solution, which required infrastructure management and capacity planning for each new initiative.
The BMW Group will open source key components surrounding the CDH, including its APIs, architecture, and Data Portal. This decision is further motivated by the BMW Group’s role as a day-one member of Gaia-X, the European initiative for establishing sovereign data spaces.
Going forward, the BMW Group will continue to scale out the CDH platform’s capabilities to further accelerate its digital transformation and drive additional value across the business, empowering innovative customer experiences, new mobility services, and internal business insights. Demtröder concludes, “We are just starting our journey with AWS, and we look forward to helping our business fulfill its strategy of driving innovation into the future.”
To learn more, visit aws.amazon.com/automotive.
Figure 1: CDH architecture overview
Figure 2: CDH portal view
About the BMW Group
With its four brands—BMW, MINI, Rolls-Royce and BMW Motorrad—the BMW Group is a leading premium manufacturer of automobiles and motorcycles. The company also provides premium financial and mobility services.
Benefits of AWS
- Democratizes data usage at scale
- Processes terabytes of telemetry data from millions of vehicles daily
- Resolves issues before they impact customers
- Accelerates innovation
AWS Services Used
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk.
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
AWS AppSync is a fully managed service that makes it easy to develop GraphQL APIs by handling the heavy lifting of securely connecting to data sources like Amazon DynamoDB, AWS Lambda, and more. Once deployed, AWS AppSync automatically scales your GraphQL API execution engine up and down to meet API request volumes.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.