Design a Database for a Mobile App with Amazon DynamoDB
Module 5: Partial Normalization
You will learn about partial normalization
Overview
In the previous module, you added an inverted index to your table. The inverted index enables additional query patterns for relational data. However, one problem remained: while you could query all Friendship entities for a given User entity, the Friendship entity doesn’t contain information about the user being followed.
In this module, you will learn about partial normalization.
Time to Complete
20 minutes
Partial Normalization
In the second module, we saw that you should not try to fake a relational pattern when modeling with DynamoDB. One core tenet of the relational model is normalization of data, in which you avoid duplicating data in multiple places. Normalization is a nice pattern in relational databases, but you may require costly joins to reassemble your data when querying.
With DynamoDB, you often want to denormalize your data. Denormalization helps to avoid joins and improve query performance. To do this, you may copy attributes from one item into another item that refers to it in order to avoid fetching both items during a query.
However, there are times when denormalization can complicate your data model. For example, the data model has a Friendship entity that refers to both a user that is being followed and a user that is following. You could copy all of the attributes of each User entity into the Friendship entity when you create the Friendship entity. Then, when you retrieve the Friendship entity, you would also have all of the details about both users.
This could pose a problem whenever a user changes information in their profile. For example, if a user changes their profile picture, the data in each Friendship entity containing that user would be stale. You would need to update each Friendship entity that contained the user whenever there is an update.
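To make the staleness problem concrete, here is a minimal in-memory sketch that uses plain Python dictionaries instead of DynamoDB calls. The picture and followedUserPicture attribute names and both helper functions are hypothetical and exist only for illustration; they are not part of the tutorial's code.

```python
def make_denormalized_friendship(followed, follower):
    """Build a Friendship item that copies the followed user's picture."""
    return {
        "PK": "USER#{}".format(followed["username"]),
        "SK": "#FRIEND#{}".format(follower["username"]),
        "followedUser": followed["username"],
        # Copied attribute (hypothetical name): goes stale on a profile change.
        "followedUserPicture": followed["picture"],
    }


def find_stale_friendships(friendships, user):
    """Return Friendship items whose copied picture no longer matches the user."""
    return [
        f for f in friendships
        if f["followedUser"] == user["username"]
        and f["followedUserPicture"] != user["picture"]
    ]


# One user followed by three others: three Friendship items hold a copy.
harold = {"username": "haroldwatkins", "picture": "v1.jpg"}
followers = [
    {"username": name, "picture": "x.jpg"}
    for name in ("ppierce", "vpadilla", "david25")
]
friendships = [make_denormalized_friendship(harold, f) for f in followers]

# A single profile change leaves every copy out of date.
harold["picture"] = "v2.jpg"
stale = find_stale_friendships(friendships, harold)
print(len(stale))
```

Every item in stale would need its own write to catch up, so the cost of one profile update grows with the number of followers.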
In the steps below, you will see how to use partial normalization and the BatchGetItem API call to handle this situation.
Implementation
Use partial normalization to find followed users
In this step, you will see how to find followed users. These are the users that a particular user is following in your application. You will also see how to retrieve all of the data about the users being followed.
As noted in the introduction to this module, you may want to use a partial normalization technique around friendships and users. Rather than storing the full information about each user in the Friendship entity, you can use the BatchGetItem API to retrieve information about a user in a Friendship entity.
In the code you downloaded, there is a file in the application/ directory called find_and_enrich_following_for_user.py. The contents of this script are shown below.
    import boto3

    from entities import User

    dynamodb = boto3.client('dynamodb')

    USERNAME = "haroldwatkins"


    def find_and_enrich_following_for_user(username):
        # Step 1: query the inverted index for every Friendship item in which
        # the given user is the follower.
        friend_value = "#FRIEND#{}".format(username)
        resp = dynamodb.query(
            TableName='quick-photos',
            IndexName='InvertedIndex',
            KeyConditionExpression="SK = :sk",
            ExpressionAttributeValues={":sk": {"S": friend_value}},
            ScanIndexForward=True
        )

        # Step 2: build the primary key of the User item for each followed user.
        keys = [
            {
                "PK": {"S": "USER#{}".format(item["followedUser"]["S"])},
                "SK": {"S": "#METADATA#{}".format(item["followedUser"]["S"])},
            }
            for item in resp["Items"]
        ]

        # Step 3: fetch the full User items in a single BatchGetItem request.
        friends = dynamodb.batch_get_item(
            RequestItems={
                "quick-photos": {
                    "Keys": keys
                }
            }
        )

        enriched_friends = [User(item) for item in friends['Responses']['quick-photos']]

        return enriched_friends


    follows = find_and_enrich_following_for_user(USERNAME)

    print("Users followed by {}:".format(USERNAME))
    for follow in follows:
        print(follow)
The find_and_enrich_following_for_user function is similar to the find_follower_for_user function you used in the last module. The function accepts a username for whom you want to find the followed users. The function first makes a Query request using the inverted index to find all of the users that the given username is following. It then assembles a BatchGetItem request to fetch the full User entity for each of the followed users and returns those entities.
This results in two requests to DynamoDB, rather than the ideal of one. However, it’s satisfying a fairly complex access pattern, and it avoids the need to constantly update Friendship entities every time a user profile is updated. This partial normalization can be a great tool for your modeling needs.
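One practical caveat: a single BatchGetItem request accepts at most 100 keys, and DynamoDB may hand some keys back in UnprocessedKeys instead of returning their items. The script above sends all keys in one call, which is fine for the small sample data set. For users who follow many accounts, a helper along the following lines could split the keys into batches and resubmit any leftovers. This is only a sketch: chunked and batch_get_all are hypothetical names, and client is assumed to behave like boto3.client('dynamodb').

```python
def chunked(items, size):
    """Yield successive slices of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_get_all(client, table_name, keys, batch_size=100):
    """Fetch the items for `keys`, batching requests and retrying leftovers.

    `client` is assumed to behave like boto3.client('dynamodb').
    """
    items = []
    for batch in chunked(keys, batch_size):
        request = {table_name: {"Keys": batch}}
        while request:
            resp = client.batch_get_item(RequestItems=request)
            items.extend(resp.get("Responses", {}).get(table_name, []))
            # Keys DynamoDB could not process come back in the same shape
            # as RequestItems, so they can be resubmitted directly.
            request = resp.get("UnprocessedKeys") or {}
    return items
```

Inside find_and_enrich_following_for_user, the call would look like batch_get_all(dynamodb, 'quick-photos', keys). An empty key list results in no request at all, and a production version would likely wait with a backoff between retries rather than resubmitting immediately.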
Execute the script by running the following command in your terminal.
python application/find_and_enrich_following_for_user.py
Your console should output a list of users followed by the given username.
    Users followed by haroldwatkins:
    User<ppierce -- Ernest Mccarty>
    User<vpadilla -- Jonathan Scott>
    User<david25 -- Abigail Alvarez>
    User<jacksonjason -- John Perry>
    User<chasevang -- Leah Miller>
    User<frankhall -- Stephanie Fisher>
    User<nmitchell -- Amanda Green>
    User<tmartinez -- Kristin Stevens>
    User<natasha87 -- Walter Carlson>
    User<geoffrey32 -- Mary Martin>
Note that you are now dealing with User entities rather than Friendship entities. The User entity will have the most complete, up-to-date information about your user. While it took two requests to get there, it may still be a better option than full denormalization and the data integrity issues that result from it.
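BatchGetItem does not return items in any particular order, so the User entities may not appear in the same order as the Friendship items from the Query. If your application depends on that order, you could rearrange the returned items to match the list of keys you requested. The helper below is a hypothetical sketch; it assumes each returned item includes its PK and SK attributes as strings, which holds when no projection is applied.

```python
def order_items_by_keys(keys, items):
    """Return `items` arranged in the order of `keys`; missing items are skipped."""
    by_key = {(item["PK"]["S"], item["SK"]["S"]): item for item in items}
    ordered = []
    for key in keys:
        item = by_key.get((key["PK"]["S"], key["SK"]["S"]))
        if item is not None:
            ordered.append(item)
    return ordered
```

In the script, you could apply this to the BatchGetItem response items before converting them into User objects.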
Conclusion
In this module, we saw how partial normalization and the BatchGetItem API call can help maintain data integrity across objects while still keeping the number of requests low.
In the next module, we’ll use DynamoDB transactions as we add a reaction to a photo or follow a user.