Design a Database for a Mobile App with Amazon DynamoDB
Module 3: Core Usage: Users, Photos, Friends, and Reactions
You will design the primary key for the DynamoDB table and enable the core access patterns
Overview
In the previous module, we defined the mobile application’s access patterns. In this module, we will design the primary key for the DynamoDB table and enable the core access patterns.
When designing the primary key for a DynamoDB table, keep the following best practices in mind:
- Start with the different entities in your table. If you are storing multiple different types of data in a single table, such as employees, departments, customers, and orders, be sure your primary key has a way to distinctly identify each entity and enable core actions on individual items.
- Use prefixes to distinguish between entity types. Using prefixes to distinguish between entity types can prevent collisions and assist in querying. If you have both customers and employees in the same table, the primary key for a customer could be CUSTOMER#<CUSTOMERID> while the primary key for an employee could be EMPLOYEE#<EMPLOYEEID>.
- Focus on single-item actions first, and then add multiple-item actions if possible. For a primary key, it’s important that you can satisfy the read and write operations on a single item by using the single-item APIs -- GetItem, PutItem, UpdateItem, and DeleteItem. If you can also satisfy multiple-item read patterns with the primary key by using Query, that’s great. If not, you can always add a secondary index to handle the Query use cases.
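The prefix practice above can be sketched in a few lines of Python. This is only an illustration: `make_key` is a hypothetical helper name, not part of any AWS SDK, and the ID value is made up.

```python
def make_key(entity_type: str, entity_id: str) -> str:
    """Build a prefixed key value such as CUSTOMER#42 (hypothetical helper)."""
    return f"{entity_type.upper()}#{entity_id}"


# The same raw ID produces two distinct keys because of the entity prefix,
# so a customer and an employee can never collide in the same table.
customer_key = make_key("customer", "42")
employee_key = make_key("employee", "42")

print(customer_key)  # CUSTOMER#42
print(employee_key)  # EMPLOYEE#42
```

Because the prefix is part of the key value itself, the single-item APIs can address each entity type without any extra attribute.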
With these best practices in mind, let’s design the primary key and perform some basic actions.
Time to Complete
40 minutes
Implementation
Step 1: Design the primary key
Let’s consider the different entities, as suggested in the preceding introduction. In the mobile application, we have the following entities:
- Users
- Photos
- Reactions
- Friendship
These entities show three different kinds of data relationships.
First, each user on your application will have a single user profile represented by a User entity in your table.
Next, a user will have multiple photos represented in your application, and a photo will have multiple reactions. These are both one-to-many relationships.
Finally, the Friendship entity is a representation of a many-to-many relationship. The Friendship entity represents when one user is following another user in your application. It is a many-to-many relationship as one user may follow multiple other users, and a user may have multiple followers.
Having a many-to-many mapping is usually an indication that you will want to satisfy two Query patterns, and our application is no exception. On the Friendship entity, we have an access pattern that needs to find all users that follow a particular user as well as an access pattern to find all of the users that a given user follows.
Because of this, we’ll use a composite primary key with both a HASH and RANGE value. The composite primary key will give us the Query ability on the HASH key to satisfy one of the query patterns we need. In the DynamoDB API specification, the partition key is called HASH and the sort key is called RANGE. This guide uses the two sets of terms interchangeably, and favors the API terminology when discussing code or the DynamoDB JSON wire format.
Note that the one-to-one entity -- User -- doesn’t have a natural property for the RANGE value. Because it’s a one-to-one mapping, the access patterns will be a basic key-value lookup. Since your table design requires a RANGE property, you can provide a filler value for the RANGE key.
With this in mind, let’s use the following pattern for HASH and RANGE values for each entity type:
| Entity | HASH | RANGE |
|---|---|---|
| User | USER#<USERNAME> | #METADATA#<USERNAME> |
| Photo | USER#<USERNAME> | PHOTO#<USERNAME>#<TIMESTAMP> |
| Reaction | REACTION#<USERNAME>#<TYPE> | PHOTO#<USERNAME>#<TIMESTAMP> |
| Friendship | USER#<USERNAME> | #FRIEND#<FRIEND_USERNAME> |
Let’s walk through the preceding table.
First, for the User entity, the HASH value will be USER#<USERNAME>. Notice that you’re using a prefix to identify the entity and prevent any possible collisions across entity types.
For the RANGE value on the User entity, we’re using a static prefix of #METADATA# followed by the username value. For the RANGE value, it’s important that you have a value that is known, such as the username. This allows for single-item actions such as GetItem, PutItem, and DeleteItem.
However, you also want a RANGE value with different values across different User entities to enable even partitioning if you use this column as a HASH key for an index. For that reason, you append the username to the RANGE key.
Second, the Photo entity is a child entity of a particular User entity. The main access pattern for photos is to retrieve photos for a user ordered by date. Whenever you need something ordered by a particular property, you will need to include that property in your RANGE key to allow for sorting. For the Photo entity, use the same HASH key as the User entity, which will allow you to retrieve both a user profile and the user’s photos in a single request. For the RANGE key, use PHOTO#<USERNAME>#<TIMESTAMP> to uniquely identify a photo in your table.
Third, the Reaction entity is a child entity of a particular Photo entity. A photo has a one-to-many relationship with its reactions, so the same reasoning applies as for the Photo entity. In the next module, you will see how to retrieve a photo and all of its reactions in a single query using a secondary index. For now, note that the RANGE key for a Reaction entity follows the same pattern as the RANGE key for a Photo entity. For the HASH key, we use the username of the user who is creating the reaction as well as the type of reaction applied. Appending the type of reaction allows a user to add multiple reaction types to a single photo.
Finally, the Friendship entity uses the same HASH key as the User entity. This will allow you to fetch both the metadata for a user and all of the user’s followers in a single query. The RANGE key for a Friendship entity is #FRIEND#<FRIEND_USERNAME>. In Step 4 below, you will learn why the Friendship entity’s RANGE key is prefixed with a “#”.
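The key patterns walked through above can be sketched as small Python helpers. The function names are hypothetical and are not part of the downloaded code. The friend username, the reaction type, and the assumption that timestamps are stored as ISO 8601 strings (as in the sample output later in this module) are illustrative only.

```python
def user_key(username):
    """Key for a User entity: USER#<USERNAME> / #METADATA#<USERNAME>."""
    return {"PK": f"USER#{username}", "SK": f"#METADATA#{username}"}


def photo_key(username, timestamp):
    """Key for a Photo entity, stored in the same partition as its owner."""
    return {"PK": f"USER#{username}", "SK": f"PHOTO#{username}#{timestamp}"}


def reaction_key(reacting_username, reaction_type, photo_username, photo_timestamp):
    """Key for a Reaction: the HASH identifies who reacted and how,
    the RANGE repeats the key pattern of the photo being reacted to."""
    return {
        "PK": f"REACTION#{reacting_username}#{reaction_type}",
        "SK": f"PHOTO#{photo_username}#{photo_timestamp}",
    }


def friendship_key(username, friend_username):
    """Key for a Friendship entity, stored in the user's partition."""
    return {"PK": f"USER#{username}", "SK": f"#FRIEND#{friend_username}"}


# Illustrative values only: "alice" and "sunglasses" are made up.
print(user_key("jacksonjason"))
print(photo_key("jacksonjason", "2018-05-30T15:42:38"))
print(reaction_key("alice", "sunglasses", "jacksonjason", "2018-05-30T15:42:38"))
print(friendship_key("jacksonjason", "alice"))
```

Note that the User, Photo, and Friendship keys for one user all share the same PK value, which is what places those items in the same partition.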
In the next step, we create a table with this primary key design.
Step 2: Create a table
Now that we have designed the primary key, let’s create a table.
The code you downloaded in Step 3 of Module 1 includes a Python script in the scripts/ directory named create_table.py. The Python script’s contents are as follows:
import boto3

dynamodb = boto3.client('dynamodb')

try:
    dynamodb.create_table(
        TableName='quick-photos',
        AttributeDefinitions=[
            {
                "AttributeName": "PK",
                "AttributeType": "S"
            },
            {
                "AttributeName": "SK",
                "AttributeType": "S"
            }
        ],
        KeySchema=[
            {
                "AttributeName": "PK",
                "KeyType": "HASH"
            },
            {
                "AttributeName": "SK",
                "KeyType": "RANGE"
            }
        ],
        ProvisionedThroughput={
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5
        }
    )
    print("Table created successfully.")
except Exception as e:
    print("Could not create table. Error:")
    print(e)
The preceding script calls the CreateTable operation using Boto 3, the AWS SDK for Python. The operation declares two attribute definitions, which are typed attributes to be used in the primary key. Though DynamoDB is schemaless, you must declare the names and types of attributes that are used for primary keys. These attributes must be included on every item that is written to the table and thus must be specified when you create the table.
Because you’re storing different entities in a single table, your primary key can’t use attribute names like UserId. The attribute means something different based on the type of entity being stored. For example, the primary key for a user might be its USERNAME, and the primary key for a reaction might be its TYPE. Accordingly, we use generic names for the attributes -- PK (for partition key) and SK (for sort key).
After configuring the attributes in the key schema, we specify the provisioned throughput for the table. DynamoDB has two capacity modes: provisioned and on-demand. In provisioned capacity mode, you specify exactly the amount of read and write throughput you want. You pay for this capacity whether you use it or not.
In DynamoDB on-demand capacity mode, you can pay per request. The cost per request is slightly higher than if you were to use provisioned throughput fully, but you don’t have to spend time doing capacity planning or worrying about getting throttled. On-demand mode works great for spiky or unpredictable workloads. We’re using provisioned capacity mode in this lab because it fits within the DynamoDB free tier.
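For comparison, the two capacity modes differ only in a few CreateTable parameters. The sketch below builds the keyword arguments for either mode. `create_table_params` is a hypothetical helper that is not part of the downloaded scripts, and this lab itself stays on provisioned capacity.

```python
def create_table_params(table_name, on_demand=False):
    """Build CreateTable keyword arguments for either capacity mode."""
    params = {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
    }
    if on_demand:
        # On-demand mode: pay per request, no throughput settings.
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned mode: fixed read and write capacity, as in this lab.
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        }
    return params


# Usage (requires AWS credentials, so it is left commented out here):
# import boto3
# boto3.client("dynamodb").create_table(**create_table_params("quick-photos"))
print(create_table_params("quick-photos", on_demand=True)["BillingMode"])
```

Keeping the key schema in one place and switching only the billing parameters makes it easy to move a table design between the two modes.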
To create the table, run the Python script with the following command.
python scripts/create_table.py
The script should return this message: “Table created successfully.”
In the next step, we bulk-load some example data into the table.
Step 3: Bulk-load data into the table
In this step, we’re going to bulk load some data into the DynamoDB table we created in the preceding step. This means that in succeeding steps, we will have sample data to use.
In the scripts/ directory, there is a file called items.json. This file contains 967 sample items that were randomly generated for our project. These items include User, Photo, Friendship, and Reaction entities. You can open the file if you want to see some of the example data.
The scripts/ directory also has a file called bulk_load_table.py that will read the items in items.json and bulk write them to the DynamoDB table. The contents of that file are as follows:
import json

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('quick-photos')

items = []

with open('scripts/items.json', 'r') as f:
    for row in f:
        items.append(json.loads(row))

with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)
In this script, rather than using the low-level client in Boto 3, we use a higher-level Resource object. Resource objects provide an easier interface for using the AWS APIs. The Resource object is useful in this situation because it batches our requests. The BatchWriteItem API operation accepts up to 25 items in a single request. The Resource object handles that batching for us rather than making us chop up our data into requests of 25 items or fewer.
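To make the batching concrete, here is a rough sketch of the chunking we would otherwise have to do by hand. The `chunk` helper and the placeholder items are hypothetical. The real batch_writer also resends unprocessed items, which this sketch does not attempt.

```python
def chunk(items, size=25):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder items standing in for the 967 rows of items.json.
items = [{"PK": f"USER#user{i}", "SK": f"#METADATA#user{i}"} for i in range(967)]

batches = list(chunk(items))
print(len(batches))      # 39 requests of up to 25 items each
print(len(batches[-1]))  # 17 items in the final, partial batch
```

Each of those batches would become one BatchWriteItem request if we were calling the low-level API directly.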
Run the bulk_load_table.py script and load your table with data by running the following command in the terminal.
python scripts/bulk_load_table.py
You can verify that all of your data was loaded by running a Scan operation and returning the count.
Run the following command to use the AWS CLI to get the count:
aws dynamodb scan \
  --table-name quick-photos \
  --select COUNT
This should display the following results.
{
    "Count": 967,
    "ScannedCount": 967,
    "ConsumedCapacity": null
}
You should see a Count of 967, indicating all of your items were loaded successfully.
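The same check can be done from Python. The sketch below is one possible approach, not part of the downloaded scripts: `count_items` is a hypothetical helper that takes a Boto 3 DynamoDB client. Because a single Scan call returns at most 1 MB of data, the helper follows LastEvaluatedKey and sums the counts across pages, although a table this small will most likely need only one page.

```python
def count_items(client, table_name):
    """Count all items in a table by summing Count over every Scan page."""
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        resp = client.scan(**kwargs)
        total += resp["Count"]
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            # No more pages to read.
            return total
        # Continue the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage (requires AWS credentials, so it is left commented out here):
# import boto3
# print(count_items(boto3.client("dynamodb"), "quick-photos"))
```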
In the next step, we show how to retrieve multiple entity types in a single request, which can reduce the total network requests you make in your application and enhance application performance.
Step 4: Retrieve multiple entity types in a single request
As we said in the previous module, you should optimize DynamoDB tables for the number of requests they receive. We also mentioned that DynamoDB does not have the joins that a relational database has. Instead, you design your table to allow for join-like behavior in your requests.
In this step, we’ll see how to retrieve multiple entity types in a single request. In our application, we may want to fetch information about a user. This would include all of the information in the user’s profile on the User entity as well as all of the photos that have been uploaded by a user.
This request spans two entity types -- the User entity and the Photo entity. However, this doesn’t mean we need to make multiple requests.
In the code you downloaded, there is a file in the application/ directory called fetch_user_and_photos.py. This script shows how you can structure your code to retrieve both a User entity and the Photo entities that were uploaded by the user in a single request.
The fetch_user_and_photos.py script contains the following code:
import boto3

from entities import User, Photo

dynamodb = boto3.client('dynamodb')

USER = "jacksonjason"


def fetch_user_and_photos(username):
    resp = dynamodb.query(
        TableName='quick-photos',
        KeyConditionExpression="PK = :pk AND SK BETWEEN :metadata AND :photos",
        ExpressionAttributeValues={
            ":pk": { "S": "USER#{}".format(username) },
            ":metadata": { "S": "#METADATA#{}".format(username) },
            ":photos": { "S": "PHOTO$" },
        },
        ScanIndexForward=True
    )

    user = User(resp['Items'][0])
    user.photos = [Photo(item) for item in resp['Items'][1:]]

    return user


user = fetch_user_and_photos(USER)

print(user)
for photo in user.photos:
    print(photo)
At the top, we import the Boto 3 library and some simple classes to represent the objects in our application code. You can see the definitions for those entities in the application/entities.py file if you’re interested.
The real work is happening in the fetch_user_and_photos function that’s defined in the module. This is similar to a function you would define in your application to be used by any endpoints that need this data.
In this function, you first make a Query request to DynamoDB. The Query specifies a HASH key of USER#<Username> to isolate the returned items to a particular user.
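Then, the Query specifies a RANGE key condition expression that is between #METADATA#<Username> and PHOTO$. This Query returns the User entity, whose sort key is #METADATA#<Username>, as well as all of the Photo entities for this user, whose sort keys start with PHOTO#. Sort keys of the String type are sorted by ASCII character codes. The dollar sign ($) comes directly after the pound sign (#) in ASCII, so the upper bound PHOTO$ sorts after every key that starts with PHOTO#, which ensures that we get all of the Photo entities. This ordering is also why the Friendship entity’s RANGE key starts with a “#”: #FRIEND# sorts before #METADATA#, so Friendship items fall outside the range, whereas an unprefixed FRIEND# would sort between #METADATA# and PHOTO# and be returned as well.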
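The ordering described above can be checked locally with plain Python string comparison, which matches DynamoDB’s byte ordering for ASCII-only keys like these. This is a simulation of the BETWEEN condition rather than a call to DynamoDB, and the friend username "alice" is made up.

```python
sort_keys = [
    "PHOTO#jacksonjason#2019-01-02T05:09:04",
    "#METADATA#jacksonjason",
    "PHOTO#jacksonjason#2018-05-30T15:42:38",
    "#FRIEND#alice",  # hypothetical friend username
]

# "#" (35) sorts directly before "$" (36) in ASCII.
print(ord("#"), ord("$"))  # 35 36

# Items within a partition are stored in sort key order.
ordered = sorted(sort_keys)

# Simulate: SK BETWEEN "#METADATA#jacksonjason" AND "PHOTO$"
lower, upper = "#METADATA#jacksonjason", "PHOTO$"
in_range = [key for key in ordered if lower <= key <= upper]

for key in in_range:
    print(key)
```

The range keeps the metadata item first, followed by the photos in chronological order, and leaves out the #FRIEND# item because it sorts before the lower bound.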
Once we receive a response, we then assemble our items into objects known by our application. We know that the first item returned will be our User entity, so we create a User object from the item. For the remaining items, we create a Photo object for each one and then attach the array of photos to the User object.
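As a rough illustration of that assembly step, the sketch below uses stand-in classes rather than the real ones in application/entities.py, whose contents are not shown here. To avoid guessing at attribute names, it derives the username and timestamp from the PK and SK values alone. It also returns None when the first item is not a User entity, a guard that the tutorial script does not include.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:  # stand-in for entities.Photo
    username: str
    timestamp: str


@dataclass
class User:  # stand-in for entities.User
    username: str
    photos: list = field(default_factory=list)


def assemble_user(items):
    """Turn Query result items (DynamoDB JSON) into a User with its photos."""
    if not items or not items[0]["SK"]["S"].startswith("#METADATA#"):
        return None  # no User entity came back for this partition
    user = User(username=items[0]["PK"]["S"].split("#", 1)[1])
    for item in items[1:]:
        # SK looks like PHOTO#<USERNAME>#<TIMESTAMP>
        _, owner, timestamp = item["SK"]["S"].split("#", 2)
        user.photos.append(Photo(owner, timestamp))
    return user


sample_items = [
    {"PK": {"S": "USER#jacksonjason"}, "SK": {"S": "#METADATA#jacksonjason"}},
    {"PK": {"S": "USER#jacksonjason"}, "SK": {"S": "PHOTO#jacksonjason#2018-05-30T15:42:38"}},
    {"PK": {"S": "USER#jacksonjason"}, "SK": {"S": "PHOTO#jacksonjason#2018-06-09T13:49:13"}},
]
user = assemble_user(sample_items)
print(user.username, len(user.photos))  # jacksonjason 2
```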
The end of the script shows the usage of the function and prints out the resulting objects. You can run the script in your terminal with the following command.
python application/fetch_user_and_photos.py
It should print the User object and all Photo objects to the console:
User<jacksonjason -- John Perry>
Photo<jacksonjason -- 2018-05-30T15:42:38>
Photo<jacksonjason -- 2018-06-09T13:49:13>
Photo<jacksonjason -- 2018-06-26T03:59:33>
Photo<jacksonjason -- 2018-07-14T10:21:01>
Photo<jacksonjason -- 2018-10-06T22:29:39>
Photo<jacksonjason -- 2018-11-13T08:23:00>
Photo<jacksonjason -- 2018-11-18T15:37:05>
Photo<jacksonjason -- 2018-11-26T22:27:44>
Photo<jacksonjason -- 2019-01-02T05:09:04>
Photo<jacksonjason -- 2019-01-23T12:43:33>
Photo<jacksonjason -- 2019-03-03T02:00:01>
Photo<jacksonjason -- 2019-03-03T18:20:10>
Photo<jacksonjason -- 2019-03-11T15:18:22>
Photo<jacksonjason -- 2019-03-30T02:28:42>
Photo<jacksonjason -- 2019-04-14T21:52:36>
This script shows how you can model your table and write your queries to retrieve multiple entity types in a single DynamoDB request. In a relational database, you use joins to retrieve multiple entity types from different tables in a single request. With DynamoDB, you specifically model your data, so that entities you should access together are located next to each other in a single table. This approach replaces the need for joins in a typical relational database and keeps your application high-performing as you scale up.
Conclusion
In this module, we designed a primary key and created a table. Then, we bulk-loaded data into the table and saw how to query for multiple entity types in a single request.
With our current primary key design, we are able to satisfy the following access patterns:
- Create user profile (Write)
- Update user profile (Write)
- Get user profile (Read)
- Upload photo (Write)
- View photos for User (Read)
- View friends for a user (Read)
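Two of the access patterns above can be sketched as request parameters for the low-level client. The helper names are hypothetical, and using begins_with on the #FRIEND# prefix is one possible way to read the Friendship items, not necessarily the approach taken in the downloaded code.

```python
TABLE_NAME = "quick-photos"


def get_user_profile_params(username):
    """GetItem parameters for the 'Get user profile' pattern."""
    return {
        "TableName": TABLE_NAME,
        "Key": {
            "PK": {"S": f"USER#{username}"},
            "SK": {"S": f"#METADATA#{username}"},
        },
    }


def view_friends_params(username):
    """Query parameters for the 'View friends for a user' pattern."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :friend)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{username}"},
            ":friend": {"S": "#FRIEND#"},
        },
    }


# Usage (requires AWS credentials, so it is left commented out here):
# import boto3
# client = boto3.client("dynamodb")
# profile = client.get_item(**get_user_profile_params("jacksonjason"))
# friends = client.query(**view_friends_params("jacksonjason"))
print(get_user_profile_params("jacksonjason")["Key"])
```

Both requests rely only on values the application already knows (the username), which is what makes the primary key design sufficient for these patterns.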
In the next module, we will add a secondary index and learn about the inverted index technique. Secondary indexes allow you to support additional access patterns on your DynamoDB table.