AWS Database Blog

Model hierarchical automotive component data using Amazon DynamoDB

In this post, we discuss an automotive manufacturing information management use case where we store information about components within a vehicle as well as the hierarchy between each of the components. For our automotive use case, we use Amazon DynamoDB to deliver transactional queries, such as component attribute lookups. We will also show you how to use DynamoDB for larger responses such as a recursive query for all the components in a vehicle. While recursive object relationships can be represented in graph databases and possibly traditional RDBMS (with complex joins), these deeper queries can also be represented in DynamoDB.

Component data management in the automotive industry is facing new challenges as new automotive data compliance regulations require increased documentation of both core component specifications as well as component lineage to ensure that regulators can track and trace all the subcomponents contained within a vehicle. The regulatory requirements for more detailed information combined with the number of components that are tracked leads manufacturers to consider a move from limited on premises IT infrastructure in favour of cloud technologies to address the resultant growth in component data management. In this post we demonstrate one approach for designing a DynamoDB data model to document the hierarchical relationships between automotive components to an arbitrary depth. We will use the battery components within a modern electric vehicle as an example.

There is a clear lineage for all the battery components of an electric vehicle. Manufacturers assemble battery cells from raw materials such as lithium, nickel, and cobalt. Battery cells are then assembled into bundles, referred to as modules. The modules are then packaged together as batteries by the manufacturer and finally integrated as a core component in a vehicle.

The typical query and access patterns for the component data within a vehicle are related to the details about the immediate sub-components, supplier, production data, and product specifications (80-90% of queries) along with the lineage details of those components (10-20% of queries).

DynamoDB provides an excellent solution for managing hierarchical data and, as a fully managed AWS service, delivers reliability, security, and scalability while providing single-digit millisecond performance at any scale. A relational database would require costly joins across multiple tables and ongoing tuning to maintain its performance, while a graph database supports much more complex object relationships (such as multiple node and edge connections) than our use case requires.

Solution overview

The first step in any solution with DynamoDB is to define our access patterns, design a basic data model, and then transform this into a DynamoDB structure that will support our access patterns. We continue to refer to these related objects as components, following our EV battery automotive use case.

Let’s visualize the various battery components we will discuss throughout this post. The following diagram indicates a vehicle (V) at the top of the hierarchy, followed by batteries(B), modules(M), and cells(C). The diagram illustrates a vehicle that has two batteries, each with its respective modules and cells.

The diagram shows a clear parent-child relationship. The vehicle V1 is the parent of batteries B11 and B12, and so on. It’s also visible that there’s a dependency between the components: C1233 is dependent on M123, which are both dependent on B12, which is dependent on V1.

The hierarchy and dependencies mean that the components of the graph have a clearly defined position. For example, if you were going to give a driver direction to C1233, you’d say, “you start at V1, take a slight right to B12, another slight right to M123, and one more right to C1233.”

Thus, the path to C1233 is known to be V1|B12|M123|C1233.

Create data model

Now that we understand the hierarchy of the data, we can begin to create our DynamoDB data model. The first step is to document the access patterns. Let’s look at some example access patterns for our use case.

Access pattern Example Path Result

Retrieve all the immediate child components

for a specific battery

B12 V1|B12 M121[C1211, C1212, C1213]
M122[C1221, C1222, C1223]
M123[C1231, C1232, C1233]
Lookup a single component B11 V1|B11 B11
Lookup all components of a vehicle V1 V1 EVERYTHING
Lookup all the parent(s) of a given component C1111 V1|B11|M111|C1111 V1, B11, M111

Now that we have clearly defined the access patterns, we can start to think through how to represent data in DynamoDB to efficiently solve those patterns.

The entities in our data model are vehicles, batteries, modules, and cells. While a given entity can contain any number of attributes including make, model and colour, we will focus on the regulatory use case to trace all components and sub-components contained within a vehicle.

To efficiently trace the lineage of a given component we can include an ID for the component, an attribute that describes any parent-child relationship and the recursive path to the component. Thus, the attributes that we will add to a given entity to help trace its lineage are ParentId, GraphId, and Path.

Here is what our table would look like at this point.

Partition Key
(ComponentId)
Location on graph
(GraphId)
Parent-child relationship
(ParentId)
Path
V1 V1#1 V1
B11 V1#1 V1 V1|B11
M111 V1#1 V1 V1|B11|M111
C1111 V1#1 V1 V1|B11|M111|C1111

Using NoSQL Workbench for DynamoDB we can view an example of our design in a clear manner:

Components base table - NoSQL Workbench

Partition Key, Base Table and Global Secondary Index Design

Let’s pause for a moment and talk about why we are using ComponentId as our primary key. DynamoDB organizes data by partition key. Using an identifier that has a large number of distinct values (high cardinality) lets DynamoDB distribute data across a greater number of partitions, which helps with query efficiency and performance. In our case, the ComponentId is unique, which provides a high cardinality identifier to serve as our partition key. Using just the ComponentId, we can efficiently query any information for a given component with low response times.

However, our use case also requires us to query based on attributes other than our partition key. For example, we want to query all the components in a given vehicle, say vehicleId = 1. We could scan our data to find each component where the ParentId = vehicleId = 1, but there is a more efficient method.

DynamoDB Global Secondary Indexes (GSI) can help you efficiently query based on an attribute in your table that is not your partition key. The result is an index that effectively acts as an entirely new partition key and makes queries more efficient. Creating a GSI on ParentId allows us to more efficiently query all the components of a given vehicle. All we need to know is the Id of the vehicle and the query will return all results.

In our model, we are using GSIs to access queries on non-key base table attributes. The GSIs contain specific attributes from the base table which are organized by a partition key that is different from that of the base table.

To facilitate querying the immediate children of a component, we create a GSI that uses item collections. An item collection is a good mechanism to model one-to-many relationships in DynamoDB, which is a good fit for the access pattern that requires us to lookup all children of a given component. To use item collections we specify both a partition key and sort key in the GSI. ParentId is the partition key, and ComponentId is the sort key. Users and applications query this GSI using ParentId as the partition key and the children will be represented as a list of ComponentId elements.

Item collections can be used to find all recursive children of a component, we create a second GSI where GraphId is the partition key, and Path is the sort key. Users and applications query this GSI by using GraphId as the partition key which represents the top-level component, and can trace each ComponentId using the sort key Path.

The following table summarizes our table key configurations.

Table Element Partition Key Sort Key Mapping Attributes
Base table ComponentId ParentId, GraphId, Path
GSI1 ParentId ComponentId
GSI2 GraphId Path ComponentId

This is how our access patterns now look with GSIs. Please note that there could be additional attributes included here that relate to the item itself!

GSI1: Determine immediate child-parent relationships

To identify all of the immediate children of a parent component (such as the cells included within a module), we create a GSI using ParentId as a partition key and ComponentId as a sort key, as demonstrated in the following table. It is possible to find out how many batteries are assembled into V1, or which cells are assembled into M111. By utilizing this GSI, we avoid the alternative which would be to scan the base table and then filter the results which would be a very expensive operation, in comparison.

Hierarchical components - GSI1 - NoSQL Workbench

GSI2: Determine the genealogy of a component

To determine the genealogy of a given component, we introduce the GSI2 which is composed of GraphId as the partition key and Path as a sort key. By using GraphId and the begins_with operation on the sort key (Path), we can interrogate the full lineage of a component in a tree as illustrated in the following table (only partial table included for understanding)

Hierarchical components - GSI2 - NoSQL Workbench

Write-sharding for GSI optimization

The base table demonstrates the hierarchical tree starting from the top-level component (V1, in our example). As the number of components in a vehicle could potentially grow to hundreds or thousands in number (especially when we move beyond only tracking the battery), we need to generate a more scalable tree for enumerating the components across vehicles by using a write-sharding technique. Without the use of write-sharding, DynamoDB could encounter many requests targeting a relatively small number of partitions, resulting in a hot partition. A hot partition can result in throttling if the partition exceeds access limitations of 3000 Read Capacity Units (RCU) and 1000 Write Capacity Units (WCU) per second.

Write-sharding, and the associated partition management is accomplished by appending a suffix such as #1, #2, and #N to the top-level componentID and placing this value into another attribute (GraphID, in our model), which is then used to populate the Global Secondary Index GSI2. In our example, GraphId for V1 is created as V1#N, where N is a number in the 1–N range.

The range of N is determined by the overall tree size for a vehicle, and is used for proper data distribution in the GSI. Determining the number of partitions (value of N) is covered in detail in Example of modeling relational data in DynamoDB. Based on our expected data velocity, we use two partitions (N=2) in our example.

In order to query all of the data for this vehicle, we query the two partitions in parallel, using two client application threads, resulting in a lower response time. The application then needs to merge the parallel responses to get the final summarized result.

Query the component data

We have defined the access patterns for our hierarchical data store, and designed a DynamoDB data model for fulfilling the access patterns. In the next section, we demonstrate the actual DynamoDB queries that need to be run to retrieve the data from in the most efficient manner possible. For faster response times, we design our tables and indexes so that our applications can use Query instead of Scan.

Look up the information of a single component

To look up the details for a single component and using the AWS CLI, we query the base table using ComponentId where, “ComponentId“ = ”B11“:

aws dynamodb query \
--table-name ComponentDB \
--key-condition-expression "ComponentId = :pk" \
--expression-attribute-values '{":pk": {"S": "B11"}}' \
--return-consumed-capacity TOTAL

The query returns the following record.

ComponentId ParentId GraphId Path
B11 V1 V1#1 V1|B11

Retrieve all of the immediate child components for a component

To get all of the immediate children of a given component, we query GSI1 using ParentId = "B11":

aws dynamodb query \
--table-name ComponentDB \
--index-name GSI1 \
--key-condition-expression "ParentId = :pk" \
--expression-attribute-values '{":pk": {"S": "B11"}}' \
--return-consumed-capacity TOTAL

The query returns the following record.

ParentId ComponentId
B11 M111 M112 M113

The query result indicates that the immediate children of B11 are M111, M112, and M113.

Retrieve a recursive list of all downstream child components for a specific component

To recursively query all downstream components for a specific component, such as B11, we perform the following query using GSI2 which stores a hierarchical tree for the top-level component (in this case the vehicle, or V1). This method consists of a two-step approach to get all the recursive components that exist for a specific component in the tree, resulting in minimal DynamoDB API.

  1. Find the top-level element and Path of your component (B11). For this we will query the base table by using ComponentId = "B11" that will return the Path of that component in the hierarchical tree. Select the GraphId and Path attributes.
aws dynamodb query \
--table-name ComponentDB
--key-condition-expression "ComponentId = :pk"
--expression-attribute-values '{":pk": {"S": "B11"}}'
--projection-expression "GraphId, Path" \
--return-consumed-capacity TOTAL

The query returns the following record.

GraphId Path
V1#1 V1|B11

From the GraphId (V1#1), determine all other partitions based on N number we used when we inserted data into the table. In our case, available partitions will be V1#1 and V1#2 because we used N = (1,2).

  1. Query GSI2 using a multi-threaded client, such as Python. Use the query arguments GraphId = "V1#1" and begins_with(Path, “V1|B11|”) in first thread, and GraphId = "V1#2" and begins_with(Path, “V1|B11|”) in second thread. Following the run of both queries, merge the result from both threads to get the final result.

The result returns all the components related to B11.

For the first thread, query GSI2 by using GraphId = "V1#1" AND BEGINS_WITH(Path, "V1|B11|"):

aws dynamodb query \
--table-name ComponentDB \
--index-name GSI2 \
--key-condition-expression "GraphId = :pk AND begins_with(Path, :sk)" \
--expression-attribute-values '{":pk": {"S": "V1#1"}, ":sk": {"S": "V1|B11|"}}' \
--return-consumed-capacity TOTAL

The query returns the following results.

GraphId Path ComponentId
V1#1 V1|B11|M111
V1|B11|M111|C1111
V1|B11|M111|C1112
V1|B11|M111|C1113
V1|B11|M112
V1|B11|M112|C1121
V1|B11|M112|C1122
V1|B11|M112|C1123
V1|B11|M113
V1|B11|M113|C1131
V1|B11|M113|C1132
V1|B11|M113|C1133
M111
C1111
C1112
C1113
M112
C1121
C1122
C1123
M113
C1131
C1132
C1133

For the second thread, query GSI2 by using GraphId = "V1#2" AND BEGINS_WITH(Path, "V1|B11|"):

aws dynamodb query \
--table-name ComponentDB \
--index-name GSI2 \
--key-condition-expression "GraphId = :pk AND begins_with(Path, :sk)" \
--expression-attribute-values '{":pk": {"S": "V1#2"}, ":sk": {"S": "V1|B11|"}}' \
--return-consumed-capacity TOTAL

The query doesn’t return any result in our case.

  1. Select the ComponentId attribute to return all the child components for B11.
  2. Merge the result from both threads.

The downstream components of B11 are as follows.

ComponentId
M111
C1111
C1112
C1113
M112
C1121
C1122
C1123
M113
C1131
C1132
C1133

Determine the lineage of a given component

To find the lineage (parent, grandparent, great-grandparent, and so on) of the C1211 component, query the base table using ComponentId = "C1211":

aws dynamodb query \
--table-name ComponentDB \
--key-condition-expression "ComponentId = :pk" \
--expression-attribute-values '{":pk": {"S": "C1211"}}' \
--return-consumed-capacity TOTAL

The query returns the following record.

ComponentId ParentId GraphId Path
C1211 M121 V1#2 V1|B12|M121|C1211

To reduce the size of the result data, you can use a projection expression to return only the Path attribute:

aws dynamodb query \
--table-name ComponentDB \
--key-condition-expression "ComponentId = :pk" \
--expression-attribute-values '{":pk": {"S": "C1211"}}' \
--projection-expression "Path" \
--return-consumed-capacity TOTAL

The query returns the following record.

Path
V1|B12|M121|C1211

Now, split the path using the pipe (|) and take the first N-1 components to get ancestors.

The query result shows that the ancestors of C1211 are V1, B12, and M121.

Conclusion

In this post we demonstrated an approach for modeling a hierarchical relationship between objects in DynamoDB, using automotive component management as an example use case. You can use this method for other relational data such other multi-component manufacturing goods, asset management, resource management, and organizational modeling. The choice to use DynamoDB as a storage target for this type of hierarchical relationship allows you to take advantage of DynamoDB, a serverless, NoSQL database with single-digit millisecond performance at any scale. This technical choice is also reinforced if DynamoDB is already part of an existing and supported software stack within a project team.

For very complex or deep object relationships, graph databases provide a more appropriate technology option, albeit with the trade-off of introducing new complexity and technical overhead for the development and operational staff supporting the target architecture.

We hope that with the understanding of the hierarchical data model presented here, along with the real-world automotive use case, others will be inspired to use DynamoDB in a similar manner. Please feel free to reach out if you have any feedback or questions regarding the approach presented in this blog.

For more information on getting started with DynamoDB, please see our Developer Guide.


About the Authors

Moinul Al-MamunMoinul Al-Mamun is a Senior Global Big Data Architect at AWS Professional Services, brings 16 years of expertise in Big Data analytics, Data Warehousing, Business Intelligence, and Software Engineering. He excels in crafting scalable and cost-effective data platforms for global clients using AWS Database and Data Analytics services, with a focus on Data Mesh and Data Lakes. Moinul is also a prolific author, contributing valuable insights to AWS Perspective Guidance and the blog.

Michael SteinMichael Stein is a Principal Consultant with AWS Professional Services, working in Germany since 2013. Michael has delivered solutions globally to customers in the industrial/automotive sectors, with a focus on advanced networking, container technologies, and IoT connectivity. Michael’s passion is connecting plants and manufacturing facilities to cloud infrastructure with a mission to drive technical innovation and improve efficiency, while maintaining secure and sustainable best-practices.