AWS Database Blog

Should Your DynamoDB Table Be Normalized or Denormalized?

Gowri Balasubramanian is a solutions architect for Amazon Web Services.

In the context of databases, the term normalization refers to the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. The general recommendation for relational databases is to use a normalized schema. This blog post covers how to choose the right approach when designing tables in Amazon DynamoDB.

For a NoSQL database like DynamoDB, the choice of a normalized or denormalized schema depends on your use case. As a guideline, though, DynamoDB tables should be designed with a denormalized schema for two main reasons:

  • DynamoDB is schemaless:  When you create a table in DynamoDB, you specify only the primary key attributes: either a partition key alone, or a partition key together with a sort key.  You do not define any other attributes in advance (see the sketch following this list).
  • DynamoDB does not support join operations across tables.
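
To see what "schemaless" means in practice, here is a minimal sketch using the AWS SDK for Python (boto3). The table name, key attribute name, and capacity settings are illustrative assumptions, not part of the examples in this post:

    import boto3

    client = boto3.client("dynamodb")

    # Only the key attributes are declared at creation time; every
    # non-key attribute can vary freely from item to item.
    client.create_table(
        TableName="Annotations",                                 # hypothetical name
        KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],  # partition key only
        AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )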

Remember, this is a guideline. There are some situations where you should use a normalized schema instead.

Consider a normalized schema when:

  • You need to store items that are larger than 400 KB.  The maximum item size in DynamoDB is 400 KB.  Large attributes (such as blocks of text) can be stored in a separate DynamoDB table or in Amazon S3, with a reference kept in the main item.
  • You expect varied access patterns.  For example, consider a Product_order table that is accessed whenever a customer orders a product.  Now consider a Product_availability table that is accessed only occasionally.  These tables will have different read and write capacity requirements.
  • Your applications will perform many updates. In DynamoDB, a write capacity unit (WCU) is defined as one write per second, for an item up to 1 KB in size.  Frequent writes of items larger than 1 KB increase the number of WCUs consumed per write.  Even if you update a single attribute, the WCU calculation is based on the size of the entire item, as the sketch following this list illustrates.
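
As a rough sketch of that capacity arithmetic (assuming standard, non-transactional operations), DynamoDB rounds the item size up to the next full capacity unit:

    import math

    def consumed_wcus(item_size_bytes):
        # One WCU covers a write of up to 1 KB. Larger items consume one
        # WCU per additional 1 KB, regardless of how many attributes the
        # update actually touches.
        return math.ceil(item_size_bytes / 1024)

    def consumed_rcus(item_size_bytes, eventually_consistent=False):
        # One RCU covers a strongly consistent read of up to 4 KB; an
        # eventually consistent read consumes half as much.
        rcus = math.ceil(item_size_bytes / 4096)
        return rcus / 2 if eventually_consistent else rcus

    print(consumed_wcus(8 * 1024))  # updating one attribute of an 8 KB item: 8 WCUs
    print(consumed_rcus(8 * 1024))  # strongly consistent read of the same item: 2 RCUs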

Consider a denormalized schema when:

  • You need to store small items, with few attributes.  For reads, the item size should not exceed 4 KB.  (A read capacity unit (RCU) is defined as one strongly consistent read per second, for an item up to 4 KB in size.)  For writes, the item size should not exceed 1 KB (the size covered by one WCU).
  • Your applications need to read and write data in a high-traffic environment, without being concerned about the consistency and synchronization of data across multiple tables.
    For example, consider a table named Annotations. It contains information about customers’ annotations on books they have read across different devices. Customers must be able to update their annotations in near real time as they switch from one device to another. In this scenario, it makes sense to store all of the annotations in one table (see the first code sketch following this list):

    Partition Key (Customer_id:bookid:content_type)  Annotation_Type  Description
    Cust1:Book1xxx:pdf                               Lastread         {"deviceType": "iPad", "page": 100, "posStart": 1, "posEnd": 50}
    Cust1:Book2xxx:ebook                             Highlight        {"deviceType": "Android", "page": 231, "posStart": 1, "posEnd": 50}
    Cust1:Book3xxx:text                              Mark             {"deviceType": "Kindle", "page": 242, "posStart": 1, "posEnd": 50}
  • Your applications need attributes that must be changed atomically with respect to each other. Writes in DynamoDB are atomic at the item level, so all of the related attributes must be stored in the same item (and therefore in the same table).
  • You need to access data that is logically related and queried together. In a relational database management system (RDBMS), you would usually store this data in separate tables and perform a join operation to retrieve it. For example, consider two tables, Customer and Address, in an RDBMS:

    Table: Customer

    Custid  Transaction_Id  Price  Order_date  Delivery_code
    cust1   XXX..           200    2013-10-01  98001
    cust2   XXX..           100    2013-09-05  98002
    cust3   XXX..           500    2013-10-08  98003
    cust1   XXX..           1000   2013-09-12  98004

    Table: Address

    Zipcode  City_name  Country
    98001    Bellevue   USA
    98002    Renton     USA
    98003    Redmond    USA
    98004    Seattle    USA

    To retrieve the address for a given customer order, you could use the following SQL query:

    SELECT a.custid, a.transaction_id, b.city_name, b.country
    FROM customer a, address b
    WHERE a.custid = 'cust1' AND a.delivery_code = b.zipcode;

    In DynamoDB, it does not make sense to use a normalized schema model like this. DynamoDB does not support join operations, but you don’t need joins if you denormalize your data model (see the second code sketch following this list):

    Custid (Partition Key)  Transaction_Id (Sort Key)  Price  Order_date  Zipcode  City_name  Country
    cust1                   XXX..                      200    2013-10-01  98001    Bellevue   USA
    cust2                   XXX..                      100    2013-09-05  98002    Renton     USA
    cust3                   XXX..                      500    2013-10-08  98003    Redmond    USA
    cust1                   XXX..                      1000   2013-09-12  98004    Seattle    USA

    In this simple design, each item combines customer and address data and consumes only a few hundred bytes. These small items are ideal for high-speed reads and writes in DynamoDB.
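
Returning to the Annotations example, here is a minimal boto3 sketch of a near-real-time annotation update. It assumes the partition key attribute is literally named PK and holds the Customer_id:bookid:content_type composite value; the attribute values shown are illustrative:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    annotations = dynamodb.Table("Annotations")

    # A single UpdateItem call replaces the customer's last-read position.
    # Because all annotation data lives in one item, there is no
    # cross-table synchronization to worry about.
    annotations.update_item(
        Key={"PK": "Cust1:Book1xxx:pdf"},
        UpdateExpression="SET #d = :d",
        ExpressionAttributeNames={"#d": "Description"},
        ExpressionAttributeValues={
            ":d": {"deviceType": "iPad", "page": 101, "posStart": 1, "posEnd": 50}
        },
    )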
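
And for the denormalized customer-order design, one Query call retrieves a customer's orders with the delivery address already embedded in each item, replacing the SQL join above. The table name is an illustrative assumption:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    orders = dynamodb.Table("CustomerOrders")  # hypothetical name

    # Every order for cust1 comes back in a single call, address included,
    # with no join required.
    response = orders.query(KeyConditionExpression=Key("Custid").eq("cust1"))
    for item in response["Items"]:
        print(item["Order_date"], item["City_name"], item["Country"])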

Conclusion
There is no one-size-fits-all guideline for deciding on a normalization strategy.  This is especially true if you are migrating an application from a relational database.  Many of the assumptions you make for a relational schema design do not apply in a NoSQL database like DynamoDB.

In the end, the schema you choose depends primarily on the access patterns for your applications and the size of the data items.

Reference

For more information, see the Amazon DynamoDB Developer Guide.