Data modeling with NoSQL Workbench for Amazon DynamoDB

When using a NoSQL database such as Amazon DynamoDB, I tend to make different optimization choices than what I am accustomed to with relational databases. At the beginning, it was not easy for me, because my relational database experience was telling me to do things differently.

To help with that, AWS released NoSQL Workbench for Amazon DynamoDB, a client-side application, available for Windows, macOS, and now also on Linux, that you can use to build scalable, high-performance data models, and simplify query development and testing. With NoSQL Workbench, you can:

Define your data model using one or more tables.
Visualize the data model to understand how it works in different scenarios.
Build the data plane operations for multiple programming languages.

In this blog post, I am using NoSQL Workbench, that is now generally available, to design the data model for a common use case: managing URL bookmarks for multiple customers. Even if the use case is relatively simple, there are still many interesting considerations to make.

Data modeler

For this application, I need to store information about bookmarks and customer accounts. I start by writing down the list of attributes I want to store, with a brief description of what I am planning to store:

Customer
- customerId – a universally unique identifier (UUID)
- email
- fullName – for example, “Shirley Rodriguez”
- userPreferences – a JSON object describing user settings for my app
- creationDate
- updateDate
Bookmark
- URL – “https://…”
- customerId – the customer storing this bookmark
- folder – to organize bookmarks in folders, for example “Cloud”
- title – by default the HTML title of the page
- description – optional
- creationDate
- updateDate

A first approach, similar to what you’d do with a relational database, is to create two tables, one for customers, and one for bookmarks. The customerId attribute links information between the two tables. Let’s see this design in practice. Using the Data modeler section in NoSQL Workbench, I create a new data model with two tables.

The following screenshot shows the details for the Customer table.

The following screenshot shows the details for the Bookmark table.

The Customer table has customerId as primary key. For the Bookmark table, I use a composite primary key, where customerId is the partition key, and url is the sort key. In this way, if two customers store the same URL, I can have both in the table.

DynamoDB doesn’t have a fixed schema, only the attributes in the primary key (the partition key and optionally the sort key) must be defined at table creation. Each item stored in the table can then use different attributes on top of those in the primary key.

For date fields, such as createDate and updateDate, I use the ISO 8601 standard to store them as strings, like “20200325T091641.123”. This has the advantage that string comparison preserves the order of dates.

I store userPreferences as JSON strings. For simplicity, I am using an empty JSON object (such as “{}”) in examples hereafter. Depending the actual application, I could use this to store language or sort preferences, for example something like:

{
  “language”: “en”,
  “sort”: “date”,
  “sortDirection”: “ascending”
}

This first approach, using two tables, definitely works. But what happens in my application when customers log in to display their bookmarks? Each time, the application runs two requests to DynamoDB, the first to get the customer data in the Customer table, the second to get the customer bookmarks in the Bookmark table.

This is something quite common when moving a relational data model to NoSQL without changes: table joins are “unrolled” into multiple requests to the database. This is not efficient, especially if you start to have three or more requests instead of one.

When using a NoSQL database, you should design your data model to optimize for the application’s access patterns. You should ask yourself:

How am I inserting data?
How am I going to update that data?
What kind of queries am I going to use to retrieve information? How often?

You shouldn’t start the design of the data model if those questions are not clear, otherwise you risk to update it too often, slowing down development.

Going back to our bookmark application, can I design a better data model? Yes!

In my case, I know that most of the times the application is showing, for a specific customer, a summary of the customer information, and the list of customer bookmarks.

Wouldn’t it be great if I could retrieve all this information with a single query? To do that, I store the customer information and the customer’s bookmarks in the same table as a collection of related items. Let’s see how that works.

The primary key of the CustomerBookmark table is composite: the partition key is always the customerId, while the sort key can hold different values. I use the sort key to store URL for bookmarks, or a concatenated string such as “CUST#” followed by customerId for customer data. In this way, the sort key is an overloaded attribute, and I use a generic name for it: sk.

Adding the customerId as part of the sort key for customer data is redundant, but in this way the sk attribute has a high cardinality (meaning that it contains values that are very uncommon or unique) and can be used later as partition key for a secondary index without creating hot partitions.

When defining the table in NoSQL Workbench, attributes can be assigned to one or more facets. A facet is a virtual construct in NoSQL Workbench, not a functional construct in DynamoDB itself, that allows you to visualize the different kind of data you store in the same table. For each facet, you can configure aliases for the partition key and for the sort key. For overloaded attributes, aliases describe what that attribute is supposed to hold for that facet.

In my case, I have two facets for the CustomerBookmark table, depending on which data is stored in an item: customer data or bookmarks. Each of the two facets has a different alias for the sk sort key: type for items storing customer data, url for items storing bookmark information. We’ll see how to use facets in the next section.

Now that I am storing these related items together in the same index, my application can run a single query by customerId and get all the info it needs: customer data and bookmarks. Let’s make this clear using the Visualizer.

Visualizer

To literally “see” the advantages of this new approach, I move to the Visualizer section of NoSQL Workbench. One of the advantages of using the Visualizer, is that you can add sample data to enrich visualizations.

I can also see the sample data from the perspective of the Global Secondary Indexes (GSI) I created, specifically:

ByEmail index to get customer data by email address. Note that items without an email attribute, such as those hosting bookmark information, are not retrieved by the index.
ByUrl index to get bookmark data by URL. Note that I added customerId as sort key for the index, because the same URL can be bookmarked by multiple customers.
ByCustomerFolder index to get, for each customer, bookmarks by folder.

Selecting the facets for the CustomerBookmark table, I see how data is rendered when items are hosting customer or bookmark data:

The Customer facet shows customer information such as email addresses and full names. The sort key is using the type alias, and begins with “CUST#”.
The Bookmark facet describes bookmark data with values such as folders and titles. The sort key is using the url alias, and does not start with “CUST#”.

Using DynamoDB Local

You can develop and test your DynamoDB applications in your local development environment using DynamoDB Local. In this way, you can work on your application locally, even without an internet connection. I download and set up DynamoDB Local on my laptop following these instructions. If you prefer, you can use Apache Maven or a Docker image.

In the directory where I decompressed the archive, I start DynamoDB Local:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

In another window, I test that the local endpoint works using the AWS Command Line Interface (CLI):

aws dynamodb list-tables --endpoint-url http://localhost:8000

{
    "TableNames": []
}

It works, and there are currently no tables, as expected.

In the NoSQL Workbench Visualizer, I select to commit the data model to DynamoDB, then I add a new DynamoDB Local connection:

After that, I repeat the previous AWS CLI command to list DynamoDB tables available locally. Now I see the CustomerBookmark table:

aws dynamodb list-tables --endpoint-url http://localhost:8000

{
    "TableNames": [
        "CustomerBookmark"
    ]
}

All sample data has been added to the table. I run a table scan for a quick check. For the sake of brevity, I am not showing the results here:

aws dynamodb scan --table-name CustomerBookmark --endpoint-url http://localhost:8000

Operation builder

Moving to the Operation builder section of NoSQL Workbench, I can build operations and execute them on a connection, using DynamoDB Local or a real AWS account. You can build operations to put, update, or delete items in a table, or to query or scan a table or an index. There is also support for transactions.

Using the Operation builder, I can simulate what operations my application would do and generate the code for Python, JavaScript (Node.js), or Java. I can use conditions and parameters to filter data, scan an index forward or backward, or limit the number of items to evaluate.

For example, to simulate my bookmark application, I execute a single query operation on the CustomerBookmark table to get all the info (customer data and bookmarks) for a specific customerId:

When executing the query on my DynamoDB Local connection, I get the info I expect from the sample data:

As an additional tuning, I can change the sort order of the query, using the Scan index format option, to get the customer data as first item. I can accomplish the same result changing the prefix I use in the sk sort key to identify customer data, for example using the lowercase string “cust#”.

Satisfied by the result, I select the option to generate code to kick start my application. For this operation, I select Python as the output programming language. The code you see in this screenshot is just a subset of what is being generated:

Similarly, I can use the operations builder to query secondary indexes. For example, I can see how many customers bookmarked a specific website selecting the ByUrl index:

Again, I can execute the query on the sample data to validate the result:

Finally, I generate the code for the query, using the ByUrl secondary index, this time selecting Java as output programming language:

Conclusions

NoSQL Workbench for DynamoDB helped me refine the design of the data model, based on my access patterns, and generate the code to start the implementation of my application. I could use DynamoDB Local to experiment without the need for an internet connection.

NoSQL Workbench supports also IAM roles and temporary AWS security credentials. For example, if your organization uses AWS Single Sign-On (SSO), your users can sign in to your Active Directory, a built-in AWS SSO directory, or another identity provider connected to AWS SSO, and map that identity to an IAM role that you can use for authentication with NoSQL Workbench.

Now that NoSQL Workbench for DynamoDB is generally available, transforming your ideas in real applications is much easier. To start, download NoSQL Workbench here, and tell me what are you going to use it for!

About the Author

Danilo Poccia works with startups and companies of any size to support their innovation. In his role as Principal Evangelist at Amazon Web Services, he leverages his experience to help people bring their ideas to life, focusing on serverless architectures and event-driven programming, and on the technical and business impact of machine learning and edge computing. He is the author of AWS Lambda in Action from Manning. Follow him on Twitter @danilop.

AWS Database Blog