In the previous module, you set up access to your core entities in DynamoDB. The primary key structure handles several of the main access patterns: reading or writing a single entity, and fetching multiple related entities, such as all photos that belong to a particular user.

In this module, you will learn about using an inverted index, a common design pattern for DynamoDB.

Secondary indexes are crucial data modeling tools in DynamoDB. They let you reshape your data to support alternate query patterns.

An inverted index is a common secondary index design pattern with DynamoDB. With an inverted index, you create a secondary index that is the inverse of the primary key for your table. The HASH key for your table becomes the RANGE key in your index, and the RANGE key for your table becomes the HASH key for your index.
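
As a rough illustration of the key swap, the sketch below uses plain Python (no DynamoDB calls) to group a few items first by the table's key and then by the inverted key. The item values and the group_items helper are made up for illustration; only the PK and SK attribute names and the key prefixes come from this workshop.

```python
from collections import defaultdict

# Hypothetical Friendship-style items; the usernames are invented.
items = [
    {"PK": "USER#alice", "SK": "#FRIEND#bob"},
    {"PK": "USER#alice", "SK": "#FRIEND#carol"},
    {"PK": "USER#dave", "SK": "#FRIEND#bob"},
]


def group_items(items, hash_attr, range_attr):
    """Group items by hash_attr and sort each group by range_attr."""
    groups = defaultdict(list)
    for item in items:
        groups[item[hash_attr]].append(item[range_attr])
    return {key: sorted(values) for key, values in groups.items()}


# Table view: PK is the HASH key, SK is the RANGE key.
table_view = group_items(items, "PK", "SK")

# Inverted index view: SK is the HASH key, PK is the RANGE key.
index_view = group_items(items, "SK", "PK")

print(table_view["USER#alice"])    # followers of alice
print(index_view["#FRIEND#bob"])   # users that bob follows
```

The same items appear in both views; only the attribute used for grouping and the attribute used for ordering trade places.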

An inverted index is helpful in two scenarios. First, an inverted index is useful to query the “other” side of a many-to-many relationship. This is the case for your Friendship entity. With your primary key structure, you can query all followers for a particular user with a query against the table’s primary key. When you add an inverted index, you will be able to find the users that a user is following (the “followed”) by querying the inverted index.

An inverted index is also useful to query a one-to-many relationship for an entity that is itself the subject of a one-to-many relationship. You can see this with the Reaction entity in your table. There can be multiple reactions on a single photo, and each reaction will include the photo to which the reaction applies, the username of the reacting user, and the type of reaction. However, because a user can have many photos, the main identifier for a photo is in its RANGE key -- PHOTO#<USERNAME>#<TIMESTAMP>. Because of that, you cannot use your primary key to tie reactions to photos.

To satisfy the “View photo and reactions” access pattern, the data model places the photo identifier for a Reaction entity in the RANGE key. Now, when querying the inverted index, you can use the photo identifier to select the photo and all of its reactions in a single request. This is shown in Step 2 below.

Time to Complete Module: 40 Minutes


  • Step 1: Create a secondary index

    To create a secondary index, you specify the primary key of the index, just like when you were creating a table. Note that the primary key for a global secondary index does not have to be unique. DynamoDB then copies your items into the index based on the attributes specified, and you can query it just like your table.

    An inverted index is a common pattern in DynamoDB where you create a secondary index that is the inverse of your table’s primary key. The HASH key for your table is specified as the RANGE key in your secondary index, and the RANGE key for your table is specified as the HASH key in your secondary index.

    Creating a secondary index is similar to creating a table. In the code you downloaded, there’s a file in the scripts/ directory named add_inverted_index.py. The contents of that file are shown below.

    import boto3
    
    dynamodb = boto3.client('dynamodb')
    
    try:
        dynamodb.update_table(
            TableName='quick-photos',
            AttributeDefinitions=[
                {
                    "AttributeName": "PK",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "SK",
                    "AttributeType": "S"
                }
            ],
            GlobalSecondaryIndexUpdates=[
                {
                    "Create": {
                        "IndexName": "InvertedIndex",
                        "KeySchema": [
                            {
                                "AttributeName": "SK",
                                "KeyType": "HASH"
                            },
                            {
                                "AttributeName": "PK",
                                "KeyType": "RANGE"
                            }
                        ],
                        "Projection": {
                            "ProjectionType": "ALL"
                        },
                        "ProvisionedThroughput": {
                            "ReadCapacityUnits": 5,
                            "WriteCapacityUnits": 5
                        }
                    }
                }
            ],
        )
        print("Table updated successfully.")
    except Exception as e:
        print("Could not update table. Error:")
        print(e)
    

    Whenever attributes are used in a primary key for the table or secondary index, they must be defined in AttributeDefinitions. Then, we Create a new secondary index in the GlobalSecondaryIndexUpdates property. For this secondary index, we specify the index name, the schema of the primary key, the provisioned throughput, and the attributes we want to project.

    Note that an inverted index is a name of a design pattern rather than an official property in DynamoDB. Creating an inverted index is just like creating any other secondary index.

    Create your inverted index by running the command below.

    python scripts/add_inverted_index.py

    You should see the following message in the console: “Table updated successfully.”

    In the next step, we will show how our inverted index can be used to fetch a photo and its reactions.

  • Step 2: Query the inverted index to find a photo’s reactions

    Now that we have configured the secondary index, let’s use it to satisfy some of the access patterns.

    To read from a secondary index, you have only two API calls available: Query and Scan. With Query, you must specify the HASH key, and it returns a targeted result. With Scan, you don’t specify a HASH key, and the operation reads every item in the table or index you target. Scans are discouraged in DynamoDB except in specific circumstances for exactly that reason: if you have a significant amount of data, a scan can take a very long time.
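
    One practical detail when reading from an index: a single Query call returns at most one page of results, and the response includes LastEvaluatedKey when more items remain. The sketch below is one possible way to collect every page. The query_all helper is hypothetical (it is not part of the downloaded code) and takes the client as an argument so it can be exercised without AWS access.

```python
def query_all(client, **query_kwargs):
    """Run a Query and follow LastEvaluatedKey until every page is read.

    `client` is assumed to be a boto3 DynamoDB client (or any object with a
    compatible query method); `query_kwargs` are passed through unchanged.
    """
    items = []
    while True:
        resp = client.query(**query_kwargs)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next page where the previous one stopped.
        query_kwargs["ExclusiveStartKey"] = last_key


# Example call (requires AWS credentials and the workshop table):
# import boto3
# items = query_all(
#     boto3.client("dynamodb"),
#     TableName="quick-photos",
#     IndexName="InvertedIndex",
#     KeyConditionExpression="SK = :sk",
#     ExpressionAttributeValues={":sk": {"S": "#FRIEND#haroldwatkins"}},
# )
```

    The workshop data set is small enough that the scripts in this module read everything in one call, so treat this as an optional pattern for larger tables.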

    We can use the Query API against our secondary index to find all reactions on a particular photo. Like you saw in the previous module, you can use this query to retrieve two types of entities in a single command. In this query, you can retrieve both a photo and its reactions.

    In the code you downloaded, there is a file in the application/ directory called fetch_photo_and_reactions.py. The contents of this script are shown below.

    import boto3
    
    from entities import Photo, Reaction
    
    dynamodb = boto3.client('dynamodb')
    
    USER = "david25"
    TIMESTAMP = '2019-03-02T09:11:30'
    
    
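    def fetch_photo_and_reactions(username, timestamp):
        try:
            # The photo and its reactions share the same SK, which is the HASH
            # key of the index. The PK range from "REACTION#" through "USER$"
            # covers every reaction plus the photo item, whose PK starts with "USER#".
            resp = dynamodb.query(
                TableName='quick-photos',
                IndexName='InvertedIndex',
                KeyConditionExpression="SK = :sk AND PK BETWEEN :reactions AND :user",
                ExpressionAttributeValues={
                    ":sk": { "S": "PHOTO#{}#{}".format(username, timestamp) },
                    ":user": { "S": "USER$" },
                    ":reactions": { "S": "REACTION#" },
                },
                ScanIndexForward=True
            )
        except Exception as e:
            print("Index is still backfilling. Please try again in a moment.")
            return False
    
        # Ascending order returns the reactions first and the photo last,
        # so reverse the list to put the photo at the front.
        items = resp['Items']
        items.reverse()
    
        photo = Photo(items[0])
        photo.reactions = [Reaction(item) for item in items[1:]]
    
        return photo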
    
    
    photo = fetch_photo_and_reactions(USER, TIMESTAMP)
    
    if photo:
        print(photo)
        for reaction in photo.reactions:
            print(reaction)
    

    The fetch_photo_and_reactions function is similar to a function you would have in your application. The function accepts a username and timestamp and makes a query against the InvertedIndex to find the photo and reactions for the photo. Then it assembles the returned items into a Photo entity and multiple Reaction entities that can be used in your application.
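
    The range condition in that query relies on how string keys sort. The sketch below mimics the comparison in plain Python to show why the bounds "REACTION#" and "USER$" capture both the reactions and the photo, and why reversing the results puts the photo first. The key values are illustrative, the PK formats are assumptions inferred from the query bounds, and the in_between helper is hypothetical.

```python
# Assumed PK formats: "REACTION#..." for reactions and "USER#..." for the
# photo item. The last entry exists only to show a key outside the range.
range_keys = [
    "USER#david25",
    "REACTION#ylee#smiley",
    "REACTION#chasevang#+1",
    "ZZZ#outside-range",
]

LOWER, UPPER = "REACTION#", "USER$"


def utf8(value):
    """String keys are compared by their UTF-8 bytes."""
    return value.encode("utf-8")


def in_between(value, lower=LOWER, upper=UPPER):
    """Mimic BETWEEN on a string key: both bounds are inclusive."""
    return utf8(lower) <= utf8(value) <= utf8(upper)


# Ascending order, as with ScanIndexForward=True.
matched = sorted((key for key in range_keys if in_between(key)), key=utf8)

# Reversing puts the photo's "USER#..." key at the front.
photo_first = list(reversed(matched))

print(matched)
print(photo_first[0])
```

    The upper bound works because "$" is the character immediately after "#" in ASCII, so every key that starts with "USER#" sorts before "USER$".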

    Run the script by running the following command in your terminal.

    python application/fetch_photo_and_reactions.py

    The output should show the photo followed by its five reactions.

    Photo<david25 -- 2019-03-02T09:11:30>
    Reaction<ylee -- PHOTO#david25#2019-03-02T09:11:30 -- smiley>
    Reaction<kennedyheather -- PHOTO#david25#2019-03-02T09:11:30 -- smiley>
    Reaction<jenniferharris -- PHOTO#david25#2019-03-02T09:11:30 -- +1>
    Reaction<geoffrey32 -- PHOTO#david25#2019-03-02T09:11:30 -- +1>
    Reaction<chasevang -- PHOTO#david25#2019-03-02T09:11:30 -- +1>

    Note that the secondary index takes a moment to backfill. You may get an error message indicating that backfilling is in progress. If so, try again in a few minutes.
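
    Rather than retrying by hand, you could poll the table description until the index reports an ACTIVE status. The sketch below is one way to do that, assuming a boto3 DynamoDB client. The index_is_active and wait_for_index helpers and their delay and max_attempts parameters are hypothetical names, not part of the downloaded code.

```python
import time


def index_is_active(client, table_name, index_name):
    """Return True once the named global secondary index reports ACTIVE."""
    table = client.describe_table(TableName=table_name)["Table"]
    for index in table.get("GlobalSecondaryIndexes", []):
        if index["IndexName"] == index_name:
            return index["IndexStatus"] == "ACTIVE"
    # The index is not listed on the table at all.
    return False


def wait_for_index(client, table_name, index_name, delay=10, max_attempts=30):
    """Poll until the index is ACTIVE; return False if attempts run out."""
    for _ in range(max_attempts):
        if index_is_active(client, table_name, index_name):
            return True
        time.sleep(delay)
    return False


# Example call (requires AWS credentials and the workshop table):
# import boto3
# ready = wait_for_index(boto3.client("dynamodb"), "quick-photos", "InvertedIndex")
```

    Passing the client in as an argument keeps the helpers easy to test with a stand-in object.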

    In the next step, we’ll see how to use the inverted index to fetch all the users that a given user is following.

  • Step 3: Find followed users

    In the previous step, you saw how to use an inverted index to fetch a one-to-many relationship for an entity that was itself the subject of a one-to-many relationship. In this step, you will use the inverted index to fetch the “other” side of a many-to-many relationship.

    The primary key of the table allows you to find all of the followers of a particular user, but it won’t let you find all the users that someone is following. With the inverted index, it’s flipped -- you can find all the users followed by a particular user.

    In the code you downloaded, there is a file in the application/ directory called find_following_for_user.py. The contents of this script follow.

    import boto3
    
    from entities import Friendship
    
    dynamodb = boto3.client('dynamodb')
    
    USERNAME = "haroldwatkins"
    
    
    def find_following_for_user(username):
        resp = dynamodb.query(
            TableName='quick-photos',
            IndexName='InvertedIndex',
            KeyConditionExpression="SK = :sk",
            ExpressionAttributeValues={
                ":sk": { "S": "#FRIEND#{}".format(username) }
            },
            ScanIndexForward=True
        )
    
        return [Friendship(item) for item in resp['Items']]
    
    
    
    follows = find_following_for_user(USERNAME)
    
    print("Users followed by {}:".format(USERNAME))
    for follow in follows:
        print(follow)
    

    The find_following_for_user function is similar to a function you would have in your application. The function accepts a username for whom you want to find the followed users. The function then queries the inverted index to find all Friendship entities where the following user is the given username.

    Run the script by running the following command in your terminal.

    python application/find_following_for_user.py

    Your console should output a list of users followed by the given username:

    Users followed by haroldwatkins:
    Friendship<chasevang -- haroldwatkins>
    Friendship<david25 -- haroldwatkins>
    Friendship<frankhall -- haroldwatkins>
    Friendship<geoffrey32 -- haroldwatkins>
    Friendship<jacksonjason -- haroldwatkins>
    Friendship<natasha87 -- haroldwatkins>
    Friendship<nmitchell -- haroldwatkins>
    Friendship<ppierce -- haroldwatkins>
    Friendship<tmartinez -- haroldwatkins>
    Friendship<vpadilla -- haroldwatkins>

    Note that while this returns all the Friendship entities for a user, the information in a Friendship entity is pretty sparse. It only includes the username of the followed user but not the full user profile. In the next module, we will discuss how to use partial normalization to efficiently handle situations like these.

  • Conclusion

    In this module, we added a secondary index to our table using the inverted index pattern. This satisfied two additional access patterns:

    • View photo and reactions (Read)
    • View followed for user (Read)

    When retrieving all followed users for a user, we saw that each Friendship entity lacks details about the followed user. In the next module, we will see how to use partial normalization to help with this access pattern.