Amazon DynamoDB single-table design using DynamoDBMapper and Spring Boot
A common practice when creating a data model design, especially in the relational database management system (RDBMS) world, is to start by creating an entity relationship diagram (ERD). Afterwards, you normalize your data by creating a table for each entity type in your ERD design.
The term normalization refers to the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. The practice of creating ERDs works even with NoSQL database systems such as Amazon DynamoDB.
The patterns provided by modules such as Spring Data, which Spring Boot based applications use for data access, still depend heavily on these patterns from the RDBMS world. However, normalizing your data in this way doesn’t yield optimal results when you’re using a nonrelational database. Relational databases use joins to combine records from two or more tables, but those joins are expensive. DynamoDB doesn’t support joins at all. Instead, data is pre-joined and denormalized into a single table.
This blog post shows how to implement an ERD design by using a single-table design approach instead of using multiple tables. We use the higher-level programming interface for DynamoDB called DynamoDBMapper to demonstrate an example implementation based on Spring Boot.
In this post, we use the Ski Resort Data Model that is provided as an example in NoSQL Workbench for DynamoDB. This example model provides several entities and defines the following access patterns:
- Retrieval of all dynamic and static data for a given ski lift or overall resort, facilitated by the SkiLifts table
- Retrieval of all dynamic data (including unique lift riders, snow coverage, avalanche danger, and lift status) for a ski lift or the overall resort on a specific date, facilitated by the SkiLifts table
- Retrieval of all static data (including whether the lift is for experienced riders only, the vertical feet the lift rises, and lift ride time) for a certain ski lift, facilitated by the SkiLifts table
- Retrieval of the date of data recorded for a certain ski lift or the overall resort sorted by total unique riders, facilitated by the SkiLifts table’s global secondary index
With dynamic and static data in a single table, we can construct queries that return all needed data in a single interaction with the database. This is important for speeding up the performance of the application for these specific access patterns. However, there is a potential downside: the design of your data model is tailored towards supporting these specific access patterns, which could conflict with other access patterns and make those less efficient. Because of this trade-off, it’s important to prioritize your access patterns and optimize for performance as well as cost based on priority.
To apply the single-table design successfully in your application, you need to understand your application’s data access patterns. Access patterns are dictated by your design, and using a single-table design requires a different way of thinking about data modeling. You can learn more about this pattern from the AWS re:Invent 2020 talks from Alex DeBrie (AWS Data Hero), Data modeling with DynamoDB – Part 1 and Data modeling with DynamoDB – Part 2. Additionally, Amazon DynamoDB Office Hours with Rick Houlihan (senior practice manager at AWS) are a great source of information that include examples of modeling real-world applications.
Usually, you don’t know all the access patterns beforehand. Iterate on your design and continue to improve it before actually putting the application into use.
In this blog post’s example application, we use the following stack:
- Amazon Corretto 11, the no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK)
- Spring Boot version 2.4, Spring’s convention-over-configuration solution for creating stand-alone, production-grade Spring-based applications
- Apache Maven, a software project management and comprehension tool
- Amazon DynamoDB Local, the downloadable version of DynamoDB you can use to develop and test applications in your development environment
- AWS SDK for Java v1, specifically for the higher-level programming interface for DynamoDB, which is called DynamoDBMapper
- Project Lombok, a Java library that reduces boilerplate code by using annotations in your classes
- JUnit 5, a unit testing framework for Java-based applications
The first iteration of our data model is shown in the following table.
This table uses the DynamoDB concept called composite primary key. A composite primary key is composed of two attributes. The first attribute is the partition key (PK) and the second attribute is the sort key (SK). DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value. The values for the partition key and sort key in this table start with a prefix like <PREFIX>#, which makes values easier to understand. Such a prefix also allows you to create simple queries on the sort key that filter on items starting with a certain prefix.
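To make the role of the prefix concrete, the following is a minimal in-memory sketch in plain Java. It is not the DynamoDB API: it only models each partition as a map sorted by sort key value, and shows how a "begins with" condition on the sort key selects a contiguous range of items. The class name CompositeKeySketch and the key values LIFT#23, STATIC_DATA, and DATE#2021-03-07 are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a table with a composite primary key (hypothetical, not the DynamoDB API).
final class CompositeKeySketch {

    // Partition key value -> items of that partition, kept sorted by sort key value.
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    void put(String pk, String sk, String payload) {
        partitions.computeIfAbsent(pk, key -> new TreeMap<>()).put(sk, payload);
    }

    // Returns the sort keys within one partition that start with the given prefix, in sorted order.
    List<String> sortKeysBeginningWith(String pk, String prefix) {
        List<String> matches = new ArrayList<>();
        TreeMap<String, String> partition = partitions.getOrDefault(pk, new TreeMap<>());
        // tailMap jumps to the first key >= prefix; stop at the first key that lacks the prefix.
        for (String sk : partition.tailMap(prefix, true).keySet()) {
            if (!sk.startsWith(prefix)) {
                break;
            }
            matches.add(sk);
        }
        return matches;
    }

    // Hypothetical sample data: one static item and two dated items for lift 23, one dated item for lift 16.
    static CompositeKeySketch sample() {
        CompositeKeySketch table = new CompositeKeySketch();
        table.put("LIFT#23", "STATIC_DATA", "vertical feet, ride time");
        table.put("LIFT#23", "DATE#2021-03-08", "lift status, riders");
        table.put("LIFT#23", "DATE#2021-03-07", "lift status, riders");
        table.put("LIFT#16", "DATE#2021-03-07", "lift status, riders");
        return table;
    }
}
```

Calling sample().sortKeysBeginningWith("LIFT#23", "DATE#") in this sketch yields only the two dated items of lift 23, in date order, and skips the static item. Because items in a partition are already sorted by sort key, all items sharing a prefix sit next to each other, which is why such a query can be answered by reading one contiguous range.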
Prerequisites for this solution
For this walkthrough, you should have the following prerequisites:
- Java Development Kit (JDK) version 11 or higher installed, such as Amazon Corretto
- Apache Maven, either installed locally or through the Maven wrapper that is provided with the example project
Implementing the solution
We focus on two access patterns in this post and provide integration tests that demonstrate the functionality by using DynamoDB Local. Integration tests provide examples that can be a good starting point when you plan to implement a similar access pattern in your own application.
We focus on the following two patterns:
- Retrieval of all dynamic and static data for a given ski lift or overall resort.
- Retrieval of the date of dynamic data recorded for a certain ski lift or the overall resort sorted by total unique riders. To make this query efficient, we use a global secondary index on the DynamoDB table.
Follow these steps to create an environment in which to test these access patterns:
- Create the Spring Boot application.
- Add domain classes, providing a mapper between Java POJOs and the DynamoDB model. To reduce the amount of boilerplate code we need to write, we use Project Lombok annotations to generate most of this code.
- Add integration tests to validate the access patterns by using DynamoDB Local.
The example project can be found in this GitHub repo.
Using the combination of Spring Boot with Project Lombok is common practice, because Project Lombok minimizes boilerplate code and thereby improves developer productivity when creating Spring Boot based applications. The Spring Data model is often used for accessing databases. However, implementing the data access layer of your application without Spring Data, using the higher-level programming interface provided by the AWS SDK for Java instead, has some advantages. For example, you can create a dedicated project for data access, allowing you to use this library not only in your Spring Boot applications but also in other plain Java code.
Creating the domain classes that provide the mapping between the application logic and DynamoDB is easier when you combine Project Lombok and the AWS SDK for Java. The following code example demonstrates how to use the Project Lombok annotations and DynamoDBMapper annotations together to create a Java POJO representing the static lift stats domain class. The Project Lombok annotations minimize the boilerplate code, and the DynamoDBMapper annotations map this class and its properties to a table and its attributes in DynamoDB. For example, the @DynamoDBTable annotation links the class to the table, and the @DynamoDBHashKey annotation allows DynamoDBMapper to link the getPK() method to the partition key in the table.
The following code block creates a QueryRequest expressing to DynamoDB that we want all data from the table that share the same partition key represented by the attribute liftPK. The result of this request is retrieved from DynamoDB by performing a query:
The results of this query can contain items of different types: objects holding dynamic lift stats as well as LiftStaticStats objects. The DynamoDBMapper class isn’t suited to implement this query because its typed methods don’t allow for a query result that contains different types of objects. However, for this access pattern it is important to retrieve the data set containing different types of objects with just one query to DynamoDB. Because the QueryRequest and QueryResult classes are able to deal with query results containing different types of data objects, using them is the best alternative for implementing this query.
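One way to handle such a mixed result is to inspect each item’s sort key and build the matching object from it. The following plain Java sketch illustrates that idea under several assumptions: each item is represented as a Map<String, String> standing in for the attribute map that the low-level API returns, and the class names, the attribute names VerticalFeet and TotalUniqueLiftRiders, and the sort key values STATIC_DATA and DATE#... are hypothetical. It is not taken from the example project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: splitting one mixed query result into typed objects based on the sort key value.
final class MixedResultSketch {

    // Hypothetical holder for static lift data.
    static final class StaticStats {
        final int verticalFeet;
        StaticStats(int verticalFeet) { this.verticalFeet = verticalFeet; }
    }

    // Hypothetical holder for dynamic lift data recorded on one date.
    static final class DynamicStats {
        final String date;
        final int totalUniqueLiftRiders;
        DynamicStats(String date, int totalUniqueLiftRiders) {
            this.date = date;
            this.totalUniqueLiftRiders = totalUniqueLiftRiders;
        }
    }

    final List<StaticStats> staticStats = new ArrayList<>();
    final List<DynamicStats> dynamicStats = new ArrayList<>();

    // Walks the items of a single query result and dispatches on the sort key (SK).
    static MixedResultSketch fromItems(List<Map<String, String>> items) {
        MixedResultSketch result = new MixedResultSketch();
        for (Map<String, String> item : items) {
            String sk = item.get("SK");
            if ("STATIC_DATA".equals(sk)) {
                result.staticStats.add(new StaticStats(Integer.parseInt(item.get("VerticalFeet"))));
            } else if (sk != null && sk.startsWith("DATE#")) {
                String date = sk.substring("DATE#".length());
                int riders = Integer.parseInt(item.get("TotalUniqueLiftRiders"));
                result.dynamicStats.add(new DynamicStats(date, riders));
            }
            // Items with an unknown sort key shape are ignored in this sketch.
        }
        return result;
    }

    // Small helper that builds an item from alternating attribute names and values.
    static Map<String, String> item(String... keyValuePairs) {
        Map<String, String> item = new HashMap<>();
        for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
            item.put(keyValuePairs[i], keyValuePairs[i + 1]);
        }
        return item;
    }

    // Hypothetical items as they might come back for one partition key, sorted by sort key.
    static List<Map<String, String>> sampleItems() {
        return Arrays.asList(
            item("PK", "LIFT#23", "SK", "DATE#2021-03-07", "TotalUniqueLiftRiders", "7000"),
            item("PK", "LIFT#23", "SK", "DATE#2021-03-08", "TotalUniqueLiftRiders", "3000"),
            item("PK", "LIFT#23", "SK", "STATIC_DATA", "VerticalFeet", "1230"));
    }
}
```

With the sample items, fromItems returns one static object and two dynamic objects from a single result list. In a real implementation the same dispatch would run over the attribute maps of the query result, so the application still makes only one call to the database.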
Second access pattern
Our second access pattern is the retrieval of the date of dynamic data recorded for a certain ski lift or the overall resort, sorted by total unique riders. We need to sort this data by the number of unique riders, but the table design doesn’t facilitate an easy query for such a use case. For this reason, we introduce a global secondary index to support our access pattern. The partition key (PK) remains the same, but we use the total unique riders property as the sort key (SK). Do we need more data for this access pattern? Yes, the date. Other attributes aren’t relevant, so they are not included in the global secondary index.
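The effect of such an index can be sketched in plain Java as a derived, re-sorted view of the table. The example below is an in-memory illustration only, not the DynamoDB API: it keeps the partition key, sorts by total unique riders, and carries over only the date. All class names, field names, and sample values are hypothetical. Items that lack the index sort key attribute (such as static data) do not appear in a global secondary index, which the sketch models with a null rider count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: a global secondary index modeled as a filtered, re-sorted projection of the table.
final class RidersIndexSketch {

    // Hypothetical table item; totalUniqueLiftRiders is null for static data items.
    static final class Item {
        final String pk;
        final String date;
        final Integer totalUniqueLiftRiders;
        final String liftStatus;
        Item(String pk, String date, Integer totalUniqueLiftRiders, String liftStatus) {
            this.pk = pk;
            this.date = date;
            this.totalUniqueLiftRiders = totalUniqueLiftRiders;
            this.liftStatus = liftStatus;
        }
    }

    // Index entry: only the index sort key and the projected date, no other attributes.
    static final class IndexEntry {
        final String date;
        final int totalUniqueLiftRiders;
        IndexEntry(String date, int totalUniqueLiftRiders) {
            this.date = date;
            this.totalUniqueLiftRiders = totalUniqueLiftRiders;
        }
    }

    // Returns the index entries for one partition key, ascending by total unique riders.
    static List<IndexEntry> queryIndex(List<Item> table, String pk) {
        List<IndexEntry> entries = new ArrayList<>();
        for (Item item : table) {
            // Items without the index sort key attribute are not part of the index.
            if (item.pk.equals(pk) && item.totalUniqueLiftRiders != null) {
                entries.add(new IndexEntry(item.date, item.totalUniqueLiftRiders));
            }
        }
        entries.sort(Comparator.comparingInt((IndexEntry e) -> e.totalUniqueLiftRiders));
        return entries;
    }

    // Hypothetical sample data: two dated items and one static item for lift 23, one dated item for lift 16.
    static List<Item> sampleTable() {
        return Arrays.asList(
            new Item("LIFT#23", "2021-03-07", 7000, "OPEN"),
            new Item("LIFT#23", "2021-03-08", 3000, "OPEN"),
            new Item("LIFT#23", null, null, null),
            new Item("LIFT#16", "2021-03-07", 5000, "OPEN"));
    }
}
```

In this sketch, queryIndex(sampleTable(), "LIFT#23") returns two entries ordered by rider count, each carrying only the date alongside the count. Limiting the projected attributes in this way keeps the index small, at the cost of needing a second read from the table if another attribute is required later.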
The following table provides some example data in which the items are sorted by the total unique lift riders.
With just one query, it’s very easy to get a list for a specific lift sorted by the total unique lift riders. The only additional data retrieved by this query is the date. The integration test in the project called GlobalSecondaryIndexTestIT.testRetrieveDateOfLiftDataSortedByTotalUniqueLift() implements this scenario. See the following code, in which we use the DynamoDBMapper to query the global secondary index using an expression that returns objects of a single type only.
Run tests in the project by using Maven
To run our tests, we run the following command in the root folder of the project:
./mvnw clean verify
The output shows the results of running the tests, including access to DynamoDB Local. The test results themselves are not that important: we used these tests to demonstrate how different access patterns can be implemented, and thereby provide a starting point for integrating the single-table design in Java applications.
You also can find the test results in
This post showed how to complement the functionality provided by the AWS SDK for Java with the functionality provided by Project Lombok. Such an approach allows for an efficient programming model in Spring Boot–based applications as well as any other Java application.
Furthermore, you can extend the same concept in this post to simple functions, including AWS Lambda functions. Within a project, you can use this data access layer in applications based on Spring Boot and deployed on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Similarly, you can use the data access layer within the same project in smaller scoped functions deployed as lightweight Lambda functions. This way, you can avoid the added overhead of Spring Boot. This is one of the main advantages of using the components provided by the AWS SDK for Java instead of implementations based on modules such as Spring Data.
This post’s example project demonstrates functionality by using DynamoDB Local, but also provides a great stepping stone to start developing your own Java-based applications and functions.
About the author
Arjan Schaaf is a cloud infrastructure architect at AWS Professional Services, based in the Netherlands. He helps customers solve complex challenges by providing solutions that use AWS services. When not working, Arjan likes Alpine activities, backyard BBQ, and spending time with family and friends.