AWS for Industries

Digital Twin Data Middleware with AWS and MongoDB

There are an estimated 100 million connected vehicles on the road worldwide today, and that number is expected to reach 400 million by 2025. Customers’ decisions to buy connected vehicles depend, and will increasingly depend, on software functionality and on integration into a person’s digital lifestyle. This gives customers access to a new level of information and control, well beyond the physical driving experience.

Eventually, features such as autonomous driving will be enabled on demand via software applications and trip planning will become fully automated, opening up a wide variety of new pay-per-use business models. Automotive companies are well aware of these trends and customer expectations, but getting there is difficult. Speed is crucial to remain competitive and to ensure revenue generation beyond the point of sale of the vehicle. Automakers want to address this problem, yet they find themselves spending too much time on non-differentiating tasks such as developing bidirectional integration. That time would be better spent on competitive differentiation and innovation than on such “technical plumbing”.

Developer efficiency is at the core of MongoDB’s value proposition, and over the last couple of years the feature set has evolved well beyond the well-known database technology. MongoDB as a company has evolved from one of the most popular database technology providers to one of the leading developer data platforms on AWS. This blog post demonstrates how you can use MongoDB Atlas as your cloud backend on AWS, together with Realm and Atlas Device Sync, to remove the non-differentiating integration or “technical plumbing” work between vehicles, the cloud, and mobile apps, and to accelerate building the next generation of digital twin use cases and applications.

The Digital Twin Challenge

The “Digital Twin” can best be described as the effortless bidirectional integration of data between a physical and a virtual machine. Enabling a Digital Twin environment allows for real-time insights and decision making, and supports analytical use cases across fleets of vehicles.

The focus in this blog is on vehicle Digital Twins. As seen in Figure 1, this means building and using digital data models of vehicles within different environments and synchronizing them bidirectionally between the vehicle, the AWS cloud and mobile devices to help enable a wide range of use cases.

Figure 1- High Level Overview

A simple example of such a use case is leveraging status information with trip and weather data to predict EV range and automatically plan charging stops.

Creating a digital twin requires three core components:

  • Creating digital models of the vehicle
  • Bringing those digital models to life with mobile and cloud applications
  • Ensuring efficient bidirectional data synchronization mechanisms despite unreliable networks

What may sound simple at first glance becomes an incredibly complex and time-consuming task for automakers. Creating the digital models alone requires years of work and collaboration between automakers and their suppliers. To address this, leading automakers, suppliers, and technology partners have begun to define standards such as the Vehicle Signal Specification (VSS). However, when combining these tree-like structures with traditional database technologies, you’ll find your developers spending scarce time merely mapping the structure into relational data models.

The digital model must also exist within applications in different environments, whether in the vehicle, in the cloud, or in mobile and web applications. Each of these environments has its own specific requirements. The result is a plethora of distinct technologies and frameworks.
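To make the mapping problem concrete, here is a minimal sketch of how dotted, tree-like signal paths fold into a single nested document with no table mapping in between. The signal paths are modeled on VSS naming, the values are made up, and `nest_signals` is a hypothetical helper, not part of VSS or of any MongoDB SDK.

```python
from typing import Any


def nest_signals(flat: dict[str, Any]) -> dict[str, Any]:
    """Fold dotted signal paths into one nested, document-style dict.

    Assumes no path is both a leaf and a branch of another path.
    """
    root: dict[str, Any] = {}
    for path, value in flat.items():
        *branches, leaf = path.split(".")
        node = root
        for branch in branches:
            # Walk down the tree, creating branch nodes on first use.
            node = node.setdefault(branch, {})
        node[leaf] = value
    return root


# Illustrative readings only; the values are invented for this sketch.
readings = {
    "Vehicle.Speed": 87.5,
    "Vehicle.Powertrain.TractionBattery.StateOfCharge.Current": 64.0,
    "Vehicle.CurrentLocation.Latitude": 48.137,
    "Vehicle.CurrentLocation.Longitude": 11.575,
}

vehicle_document = nest_signals(readings)
print(vehicle_document["Vehicle"]["CurrentLocation"])
```

The resulting dict has the same shape as the signal tree, which is the property that lets a document model store it as-is, whereas a relational design would need one table or join per branch.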

And if that doesn’t already seem challenging enough, vehicles and mobile applications leveraging these digital models are also expected to work seamlessly when offline and over unstable networks. Offline-first applications in combination with seamless data synchronization are the future. But if automakers continue to spend vast amounts of development time on merely maintaining such integration functionality, they risk being left behind while the competition accelerates innovation on the next generation of digital twin applications.

Let’s look at the three components in a little more detail.

The Vehicle: A resource constrained embedded environment

A car is capable of generating an estimated 25 gigabytes of data per day. To put that into perspective, 25 gigabytes of data is equivalent to streaming 8 hours of Netflix! Resource efficiency remains a strong focus, because every additional megabyte of memory is multiplied across millions of vehicles, so being able to use very low-footprint technologies will be crucial.

MongoDB’s Realm, an embedded database technology already used on millions of mobile phones as part of mobile apps as well as in infotainment-like systems, was designed with these exact requirements in mind. With Realm, you are armed with an idiomatic, lightweight file system API to collect, store, and process data with minimal resource consumption. You don’t have to worry about unstable connectivity, as all the data your application requires can be stored and processed while the vehicle is offline.

Additionally, the object orientation of Realm makes the whole software development experience idiomatic and natural, without additional abstraction layers such as object-relational mapping frameworks.

And since mobile applications share a lot of commonalities with the embedded world, you’ll find that there are also Realm SDKs that support the most popular mobile app frameworks and programming languages, allowing you to leverage the same advantages idiomatically in your preferred environment.

The Cloud: Backend power from AWS

Not only will the amount of data produced per car increase exponentially in the future, the structures and data models will also evolve. MongoDB is a multi-model data platform built on top of the document model, which can process the different evolving data models efficiently through a single query language and API. This allows you to reduce complexity and data duplication, save cost, and accelerate software development.

MongoDB’s distributed and horizontally scalable architecture, in combination with AWS Regions, Availability Zones, and infrastructure services, allows you to fulfill data locality and high availability requirements dynamically and with ease.

The close collaboration between MongoDB and AWS goes beyond merely leveraging the cloud infrastructure. The collaboration provides you with deep technical integrations into native AWS services and allows AWS customers to consume MongoDB Atlas services through their existing AWS contracts.

Curious to learn more about MongoDB Atlas on AWS? Click here.

Bidirectional Data Synchronization

MongoDB provides out-of-the-box bidirectional data synchronization through Atlas Device Sync between Realm databases embedded in client applications and MongoDB Atlas, the backend data platform on AWS.

Atlas Device Sync is unique as it combines data modeling and synchronization of instantiated data objects across multiple development environments seamlessly and in real time. All a software developer has to care about is creating or modifying instances of objects inside the application code.

While data objects are defined as classes/types inside application code, the MongoDB Atlas backend uses JSON schemas to define attributes, data types and relationships. The same schema is also automatically converted into class definitions for the different programming environments and Realm SDKs. Developers simply copy & paste the data model for their programming environment and can use it straight away.
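The idea of deriving class definitions from one shared schema can be sketched as follows. This is only an illustration of the principle, not the generator Atlas itself uses: the `BatteryTelemetry` schema, its field names, the `PYTHON_TYPES` mapping, and `render_dataclass` are all assumptions made for this example, and Python dataclasses stand in for the SDK-specific class definitions.

```python
# Hypothetical schema for a battery telemetry object, written in the
# JSON-schema style with BSON type names. All field names are assumptions.
battery_schema = {
    "title": "BatteryTelemetry",
    "bsonType": "object",
    "required": ["_id", "vin", "stateOfCharge"],
    "properties": {
        "_id": {"bsonType": "objectId"},
        "vin": {"bsonType": "string"},
        "isCharging": {"bsonType": "bool"},
        "stateOfCharge": {"bsonType": "double"},
    },
}

# Assumed mapping from BSON type names to Python type hints.
PYTHON_TYPES = {
    "objectId": "str",
    "string": "str",
    "double": "float",
    "bool": "bool",
    "int": "int",
}


def render_dataclass(schema: dict) -> str:
    """Render Python dataclass source text from a schema dict."""
    required = set(schema.get("required", []))
    props = schema["properties"]
    # Dataclass fields without defaults must come first, so required
    # fields are listed before optional ones (the sort is stable).
    ordered = sorted(props, key=lambda name: name not in required)
    lines = ["@dataclass", f"class {schema['title']}:"]
    for name in ordered:
        hint = PYTHON_TYPES[props[name]["bsonType"]]
        if name in required:
            lines.append(f"    {name}: {hint}")
        else:
            lines.append(f"    {name}: Optional[{hint}] = None")
    return "\n".join(lines)


generated_source = render_dataclass(battery_schema)
print(generated_source)
```

Because the class text is derived from the schema rather than written by hand, a change to the schema only has to be made in one place and the client-side definitions can be regenerated from it.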

The fully managed GraphQL API endpoint also automatically leverages the JSON schema and allows immediate access to fresh data including field level role-based access control, typically through web and mobile applications. No additional schema maintenance is required.

Vehicles don’t always operate in areas with stable networks, and they face sudden drops in bandwidth or interruptions. To accommodate such unpredictable network conditions, Realm follows an offline-first paradigm. This means changes to the data are persisted locally first. As soon as connectivity is established, the modifications are transparently synchronized through the cloud backend with other devices in real time.
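The offline-first pattern described above can be illustrated with a toy store. This is a sketch of the general pattern only, not of Realm’s implementation: `OfflineFirstStore`, its `upload` callback, and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class OfflineFirstStore:
    """Toy store: writes land locally first and are uploaded once online."""

    upload: Callable[[dict], None]  # hypothetical transport callback
    online: bool = False
    local: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        self.local[key] = value            # 1. always persist locally
        self.pending.append({key: value})  # 2. remember the unsent change
        if self.online:
            self.flush()

    def set_online(self, online: bool) -> None:
        self.online = online
        if online:
            self.flush()                   # 3. catch up after reconnecting

    def flush(self) -> None:
        while self.pending:
            # The change stays queued if upload raises, so nothing is lost.
            self.upload(self.pending[0])
            self.pending.pop(0)


cloud: list = []  # stands in for the backend in this sketch
store = OfflineFirstStore(upload=cloud.append)
store.write("stateOfCharge", 64.0)  # offline: stored locally, nothing sent
store.write("stateOfCharge", 63.5)
uploaded_while_offline = len(cloud)
store.set_online(True)              # connectivity returns: queue is drained
print(store.local, cloud)
```

The application code only ever calls `write`; whether the vehicle is online at that moment does not change how the application behaves, which is the point of the paradigm.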

Realm SDKs also handle sudden network interruptions automatically and transparently, while making sure that reconnection attempts under unstable network conditions do not drain the device battery.

To accommodate low bandwidth and expensive data transfer costs, the synchronization mechanism only sends field-level changes (the delta) over the network. These so-called changesets are further compressed before transmission to reduce the transferred data volume to a minimum.
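As a rough illustration of why field-level deltas save bandwidth, the sketch below compares the compressed size of a full object with the compressed size of only its changed fields. The wire format (JSON plus zlib), the `field_delta` and `encode` helpers, and the sample document are assumptions for this example and do not reflect the actual changeset encoding used by Atlas Device Sync.

```python
import json
import zlib


def field_delta(old: dict, new: dict) -> dict:
    """Return only the fields that were set or changed, plus removed names."""
    set_fields = {k: v for k, v in new.items() if k not in old or old[k] != v}
    unset_fields = sorted(k for k in old if k not in new)
    return {"set": set_fields, "unset": unset_fields}


def encode(payload: dict) -> bytes:
    """Serialize compactly, then compress (assumed wire format)."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


# Hypothetical telemetry object before and after one sensor update.
previous = {
    "vin": "EXAMPLEVIN0000001",
    "stateOfCharge": 64.0,
    "cellTemperatureC": 31.5,
    "voltage": 352.0,
    "odometerKm": 48210,
    "isCharging": False,
    "tirePressureKpa": [240, 238, 241, 239],
    "firmware": "2.4.1",
}
current = dict(previous, stateOfCharge=63.5)

delta = field_delta(previous, current)
full_size = len(encode(current))
delta_size = len(encode(delta))
print(delta, full_size, delta_size)
```

Only `stateOfCharge` changed, so only that field travels over the network. Note that compression helps most on larger payloads; for a delta this small, most of the saving comes from not resending the unchanged fields.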

The distributed architecture and offline-first paradigm also mean that changes can be made to the same data objects across multiple applications, which can lead to conflicts. Handling conflicts, such as determining the order of modifications, requires complex application logic. The Atlas Device Sync protocol therefore has built-in deterministic conflict resolution, which removes large amounts of complex, non-differentiating code and frees up developer time to work on features and functionality.
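To show what “deterministic” means here, the following sketch uses a deliberately simplified rule: per field, the latest change wins, and ties are broken by device identifier. The real Atlas Device Sync protocol is more involved than this; the `Change` type, the `merge` function, and the timestamps are assumptions chosen only to show that every participant arrives at the same result regardless of the order in which changes arrive.

```python
from typing import Any, Iterable, NamedTuple


class Change(NamedTuple):
    path: str        # field being modified
    value: Any
    timestamp: int   # time of the edit (assumed comparable across devices)
    device_id: str   # tie-breaker when timestamps are equal


def merge(changes: Iterable[Change]) -> dict:
    """Last writer wins per field; ties are broken by device_id."""
    winners: dict[str, Change] = {}
    for change in changes:
        current = winners.get(change.path)
        rank = (change.timestamp, change.device_id)
        if current is None or rank > (current.timestamp, current.device_id):
            winners[change.path] = change
    return {path: change.value for path, change in winners.items()}


# Edits made independently on two devices while both were offline.
car_edits = [
    Change("doorsLocked", True, 100, "car"),
    Change("stateOfCharge", 63.5, 105, "car"),
    Change("cabinTargetC", 22.0, 105, "car"),
]
phone_edits = [
    Change("doorsLocked", False, 104, "phone"),
    Change("cabinTargetC", 21.0, 105, "phone"),
]

car_then_phone = merge(car_edits + phone_edits)
phone_then_car = merge(phone_edits + car_edits)
print(car_then_phone == phone_then_car, car_then_phone)
```

Because the winner depends only on the changes themselves and not on arrival order, no application code is needed to reconcile the two devices after they reconnect.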

AWS & MongoDB Telemetry Feedback Loop

Now that we’ve covered data management in the vehicle, in the cloud, and how to sync those worlds, let’s bring it all together with an example use case and architecture. It demonstrates how vehicle battery telemetry can be collected within the vehicle and moved to the cloud backend for inference, and how the result is then shared with the driver, the owner, and workshop staff in real time.

Figure 2: High Level Architecture

Looking at the architecture above, a connected vehicle uses a Realm SDK for an onboard application, e.g. within the infotainment system. This application collects vehicle telemetry data, persists it locally in a Realm database file for as long as the vehicle is offline and, once connectivity is established, synchronizes it via WebSocket over HTTPS to the cloud backend. On the backend, the data is stored in a MongoDB database, where a trigger pushes the collected telemetry data via Amazon EventBridge into Amazon SageMaker. SageMaker then analyzes this data. The inference result is written back into MongoDB, where Atlas Device Sync picks it up immediately and synchronizes it with all connected applications, such as mobile phones or infotainment systems. Everyone interested learns about the condition of the vehicle right away, no matter where they are, subject to the permissions they have for that data.
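The backend half of this loop can be sketched as a single function that receives the event forwarded by the trigger, asks a model for a score, and builds the update to write back. Everything specific here is an assumption for illustration: the event shape (a change event with the full document under `detail`), the field names, the 0.5 threshold, and `stub_model`, which stands in for a call to a deployed model endpoint.

```python
from typing import Callable


def handle_telemetry_event(
    event: dict, predict: Callable[[list[float]], float]
) -> dict:
    """Turn one forwarded telemetry event into a write-back instruction."""
    doc = event["detail"]["fullDocument"]
    # Hypothetical feature vector built from the telemetry fields.
    features = [doc["stateOfCharge"], doc["cellTemperatureC"], doc["voltage"]]
    score = predict(features)
    return {
        "filter": {"_id": doc["_id"]},
        "update": {
            "$set": {
                "batteryHealth": {
                    "score": score,
                    "needsService": score < 0.5,  # assumed threshold
                }
            }
        },
    }


def stub_model(features: list[float]) -> float:
    """Stand-in for a deployed model endpoint; returns a fixed score."""
    return 0.42


# Assumed event shape; the values are invented for this sketch.
sample_event = {
    "detail": {
        "operationType": "insert",
        "fullDocument": {
            "_id": "telemetry-001",
            "vin": "EXAMPLEVIN0000001",
            "stateOfCharge": 0.64,
            "cellTemperatureC": 41.0,
            "voltage": 352.0,
        },
    }
}

write_back = handle_telemetry_event(sample_event, stub_model)
print(write_back)
```

Once an update like this is applied to the synced collection, no further code is needed to notify the car or the phone: the changed `batteryHealth` field reaches them through the same synchronization path as any other change.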

This is just a simple example of how such a connected vehicle feedback loop can be built with ease, leveraging MongoDB Atlas and AWS.

Imagine all the great customer experiences your organization will be able to build. Here are just a few ideas:

  • Collect and combine vehicle telemetry with weather data for automated route planning.
  • Bidirectional communication between the driver and technical support for road assistance and remote problem solving.
  • Driver profiling for reduced insurance fees.


Signed, sealed, and delivered. With MongoDB and AWS, you can get a powerful data middleware for your Digital Twins, allowing you to focus on innovation and added-value use cases, generating new revenue streams with ease. Spend more time on differentiation and be at the forefront of creating the next generation of vehicle customer experiences.

If you are curious to learn more, stay tuned for part 2 of this blog series. There we’ll dive deep into the technical considerations (with code samples) not covered in this blog along with a demo presentation showcasing how you can utilize MongoDB and AWS for your digital twin.