AWS Database Blog
Should Your DynamoDB Table Be Normalized or Denormalized?
In the context of databases, the term normalization refers to the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. The general recommendation for relational databases is to use a normalized schema. This blog post will cover the right approach for designing tables in Amazon DynamoDB.
Understanding the differences between relational data design and NoSQL databases such as DynamoDB and identifying application use cases are crucial for designing the right schema for DynamoDB tables. As a general rule, DynamoDB tables should be designed with a denormalized schema for a few key reasons:
- DynamoDB is schemaless: When you create a table in DynamoDB, you specify only the primary key attributes, such as partition key or partition key and sort key. You do not define any other attributes in advance.
- The schemaless nature of DynamoDB enables low latency key-value access for internet-scale applications. When you denormalize data, you only need to query sections of your table, which scales to millions of requests unlike a tightly woven relational schema.
- DynamoDB does not support join operations across tables.
The following section explains the benefits of building your schema using a denormalized model with a single DynamoDB table:
-
- With modern application design such as microservice architectures, it is easier to scale and faster to develop applications with a denormalized schema because each service stores and processes data related to a single business function.
- Store and process small items with few attributes. The DynamoDB API for read and write data from a table is measured via read capacity units (RCU) and write capacity units (WCU). For items up to 4 KB in size, one RCU can perform two eventually consistent or 1 strongly consistent read requests per second. For items up to 1 KB in size, one WCU can perform one write request per second. By organizing your data into a single table, you can effectively use the provisioned RCU and WCU.
- Your applications need to read and write data in a high-traffic environment, without being concerned about synchronization of data across multiple tables. For example, consider a table named Annotations. It contains information about customers’ annotations on books they have read across different devices. Customers must be able to update their annotations quickly as they switch from one device to another in near-real time. In this scenario, it makes sense to store all of the annotations in one table.
Partition Key
(Customer_id:bookid:content_type)Annotation_Type Description Cust1:Book1xxx:pdf Lastread { “deviceType”: “iPad”, page=100, Posstart:1,Posend:50} Cust1:Book2xxx:ebook Highlight { “deviceType”: “Android”, page=231, Posstart:1,Posend:50} Cust1:Book3xxx:text Mark { “deviceType”: “Kindle”, page=242, Posstart:1,Posend:50} - Your applications need attributes that must be changed atomically with respect to each other in an item or multiple items by using DynamoDB transactions. In this case, all attributes related to those items must be stored in the same table.
- You need to access data that is logically related and queried together. In a relational database management system (RDBMS), you would usually store this data in separate tables and perform a join operation to retrieve data. For example, consider two tables, Customer and Address, in an RDBMS.
Table: Customer
Custid | Transaction_Id | Price | Order_date | Delivery_code |
---|---|---|---|---|
cust1 | XXX.. | 200 | 2013-10-01 | 98001 |
cust2 | XXX.. | 100 | 2013-09-05 | 98002 |
cust3 | XXX.. | 500 | 2013-10-08 | 98003 |
cust1 | XXX.. | 1000 | 2013-09-12 | 98004 |
Table: Address
Zipcode | City_name | Country |
---|---|---|
98001 | Bellevue | USA |
98002 | Renton | USA |
98003 | Redmond | USA |
98004 | Seattle | USA |
To retrieve the address for a given customer order, you could use the following SQL query:
SQL
SELECT a.custid, a.transaction_id, b.city_name, b.addres
FROM customer a, address b
WHERE a.customer=’cust1’ and a.delivery_code=b.zipcode
In DynamoDB, it does not make sense to use a normalized schema model like this. DynamoDB does not support join operations, but you don’t need joins if you denormalize your data model.
Custid (Partition Key) | Transaction_Id (Sort Key) | Price | Order_date | Zipcode | City_name | Country |
---|---|---|---|---|---|---|
cust1 | XXX.. | 200 | 2013-10-01 | 98001 | Bellevue | USA |
cust2 | XXX.. | 100 | 2013-09-05 | 98002 | Renton | USA |
cust3 | XXX.. | 500 | 2013-10-08 | 98003 | Redmond | USA |
cust1 | XXX.. | 1000 | 2013-09-12 | 98004 | Seattle | USA |
In this simple design, the combined customer and address data consumes only a few hundred bytes each. These small items are ideal for high-speed reads and writes using DynamoDB.
As emphasized earlier, most well-designed applications require only one table, unless there is a specific reason for using multiple tables. The following are a few scenarios in which you might consider using multiple tables.
-
- You need to support a business use case that requires modification of items across tables as a single, all-or-nothing operation. You can achieve this by using DynamoDB ACID transactions. For example, consider a shipment system that needs to map items with containers for tracking purposes. This use case is common in the freight and cargo shipping industry. For this scenario, you have two tables. In the Containers table, each container has an inventory of items and the maximum capacity it can hold. In the Goods table, items that need to be shipped are stored.
-
- First, check that the item is not reserved with any container and the container has the necessary space to accommodate it.
- If those conditions are satisfied, map the item to the container.
- The item is then added to the container’s inventory list.
Table: Containers
container_id
(Partition Key)
shipmentlist freecapacity cxr1 {item001} 5 cx2 {item005,item006} 20 cx3 {item003,item010, itemN} 0 Table: Goods
item_id
(Partition key)
container_id description item001 cx1 {
“Type”: “Book shelf”,
“Vendor”: “XYZ company”
}
item002 {
“Type”: “Computer desk”,
“Vendor”: “IAD company”
}
Item003 cx3 {
“Type”: “Home organizer”,
“Vendor”: “TTT company”
}
-
- You need to support a business use case that requires modification of items across tables as a single, all-or-nothing operation. You can achieve this by using DynamoDB ACID transactions. For example, consider a shipment system that needs to map items with containers for tracking purposes. This use case is common in the freight and cargo shipping industry. For this scenario, you have two tables. In the Containers table, each container has an inventory of items and the maximum capacity it can hold. In the Goods table, items that need to be shipped are stored.
To apply this concept to our use case here, you can bundle the following actions together as a single TransactWriteItems operation and make a call to DynamoDB.
Action Type | Table | Item | Condition Expression |
UPDATE | Goods | Key (id: item001) set container_id=cx1 | attribute_not_exists{container_id} |
UPDATE | Containers | Key {id:cx1) , set freecapacity= freecapacity-1, shipmentlist=list_append (shipmentlist,item001) | {freecapacity >0} |
- Your application might have different access control, access pattern, and RCU/WCU needs for DynamoDB tables. Here are a few examples:
- Let’s consider an example of a smart home application that tracks devices such as a doorbell in an Internet of Things (IoT) domain.The application has two tables: a DeviceStatus table (
partition_key=deviceid
) that stores information such as the health status and firmware of the devices updated periodically via a polling model, and a DeviceActivity table (partition_key=deviceid
andsort_key=timestamp
) that records events such as a button push as it occurs by using a push model. These tables have different operational characteristics. The DeviceStatus table traffic is predictable and consistent traffic, so it uses DynamoDB provisioned capacity mode for managing throughput. On the other hand, the DeviceActivity table uses on-demand capacity mode for managing its unpredictable traffic rate. - Using DynamoDB as a materialized view of data stored from one or more data stores for quick lookup. For example, let’s assume you have a DynamoDB Table A (for an online application) and a legacy application running on SQL systems. Your application can create a precomputed or aggregate dataset in Table B by reacting to the changes from Table A via DynamoDB Streams and pull the data from the SQL data source.
- Support a high volume of time-series data events by creating one table per period with different RCU/WCU settings.
- Let’s consider an example of a smart home application that tracks devices such as a doorbell in an Internet of Things (IoT) domain.The application has two tables: a DeviceStatus table (
Conclusion
Many of the assumptions you make for a relational schema design do not apply to a NoSQL database such as DynamoDB. In the end, the schema you choose depends primarily on the access patterns for your applications and the size of the data items. We encourage you to see the DynamoDB Developer Guide for best practices about schema design and reference design patterns.
About the Author
Gowri Balasubramanian is a senior solutions architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on both relational as well as NoSQL database services, helping them improve the value of their solutions when using AWS.