One of the biggest adjustments for users who are new to DynamoDB and NoSQL is how to model data to filter across an entire dataset. For example, in our game, we need to find game sessions with open spots so that we can show users which game session they can join.
In a relational database, you would write some SQL to query the data.

SELECT * FROM games
	WHERE status = “OPEN”

DynamoDB can filter results on a Query or Scan operation, but DynamoDB doesn’t work like a relational database. A DynamoDB filter applies after the initial items that match the Query or Scan operation have been retrieved. The filter reduces the size of the payload sent from the DynamoDB service, but the number of items retrieved initially is subject to the DynamoDB size limits.

Fortunately, there are a number of ways you can allow filtered queries against your dataset in DynamoDB. To provide efficient filters on your DynamoDB table, you need to plan the filters into your table’s data model from the beginning. Remember the lesson we learned in the second module of this lab: Consider your access patterns, and then design your table.

In the following steps, we use a global secondary index to find open games. Specifically, we will use the sparse index technique to handle this access pattern.

Time to Complete Module: 40 Minutes


  • Step 1: Model a sparse secondary index

    Secondary indexes are crucial data modeling tools in DynamoDB. They allow you to reshape your data to allow for alternate query patterns. To create a secondary index, you specify the primary key of the index, just like when you previously created a table. Note that the primary key for a global secondary index does not have to be unique for each item. DynamoDB then copies items into the index based on the attributes specified, and you can query it just like you do the table.

    Using sparse secondary indexes is an advanced strategy in DynamoDB. With secondary indexes, DynamoDB copies items from the original table only if they have the elements of the primary key in the secondary index. Items that don’t have the primary key elements are not copied, which is why these secondary indexes are called “sparse.”

    Let’s see how this plays out for us. You might remember that we have two access patterns for finding open games:

    • Find open games (Read)
    • Find open games by map (Read)

    We can create a secondary index using a composite primary key where the HASH key is the map attribute for the game and the RANGE key is the open_timestamp attribute for the game, indicating the time the game was opened.

    The important part for us is that when a game becomes full, the open_timestamp attribute is deleted. When the attribute is deleted, the filled game is removed from the secondary index because it doesn’t have a value for the RANGE key attribute. This is what keeps our index sparse: It includes only open games that have the open_timestamp attribute.

    In the next step, we create the secondary index.

  • Step 2: Create a sparse secondary index

    In this step, we create the sparse secondary index for open games (games that are not full already).

    Creating a secondary index is similar to creating a table. In the code you downloaded, you will find a script file in the scripts/ directory named add_secondary_index.py. The contents of that file are the following.

    import boto3
    
    dynamodb = boto3.client('dynamodb')
    
    try:
        dynamodb.update_table(
            TableName='battle-royale',
            AttributeDefinitions=[
                {
                    "AttributeName": "map",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "open_timestamp",
                    "AttributeType": "S"
                }
            ],
            GlobalSecondaryIndexUpdates=[
                {
                    "Create": {
                        "IndexName": "OpenGamesIndex",
                        "KeySchema": [
                            {
                                "AttributeName": "map",
                                "KeyType": "HASH"
                            },
                            {
                                "AttributeName": "open_timestamp",
                                "KeyType": "RANGE"
                            }
                        ],
                        "Projection": {
                            "ProjectionType": "ALL"
                        },
                        "ProvisionedThroughput": {
                            "ReadCapacityUnits": 1,
                            "WriteCapacityUnits": 1
                        }
                    }
                }
            ],
        )
        print("Table updated successfully.")
    except Exception as e:
        print("Could not update table. Error:")
        print(e)

    Whenever attributes are used in a primary key for a table or secondary index, they must be defined in AttributeDefinitions. Then, we Create a new secondary index in the GlobalSecondaryIndexUpdates property. For this secondary index, we specify the index name, the schema of the primary key, the provisioned throughput, and the attributes we want to project.

    Note that we did not have to specify that our secondary index is intended to be used as a sparse index. That is purely a function of the data you put in. If you write items to your table that do not have the attributes for your secondary indexes, they will not be included in your secondary index.

    Create your secondary index by running the following command.

    python scripts/add_secondary_index.py

    You should see the following message in the console: “Table updated successfully.”

    In the next step, we use the sparse index to find open games by map.

  • Step 3: Query the sparse secondary index

    Now that we have configured the secondary index, let’s use it to satisfy some of the access patterns.

    To use a secondary index, you have two API calls available: Query and Scan. With Query, you must specify the HASH key, and it returns a targeted result. With Scan, you don’t specify a HASH key, and the operation runs across your entire table. Scans are discouraged in DynamoDB except in specific circumstances because they access every item in your database. If you have a significant amount of data in your table, scanning can take a very long time. In the next step, we show you why Scans can be a powerful tool when used with sparse indexes.

    We can use the Query API against the secondary index we created in the previous step to find all open games by map name. The secondary index is partitioned by map name, allowing us to make targeted queries to find open games.

    In the code you downloaded, a find_open_games_by_map.py file is in the application/ directory. The contents of this script follow.

    import boto3
    
    from entities import Game
    
    dynamodb = boto3.client('dynamodb')
    
    def find_open_games_by_map(map_name):
        resp = dynamodb.query(
            TableName='battle-royale',
            IndexName="OpenGamesIndex",
            KeyConditionExpression="#map = :map",
            ExpressionAttributeNames={
                "#map": "map"
            },
            ExpressionAttributeValues={
                ":map": { "S": map_name },
            },
            ScanIndexForward=True
        )
    
        games = [Game(item) for item in resp['Items']]
    
        return games
    
    games = find_open_games_by_map("Green Grasslands")
    for game in games:
        print(game)

    In the preceding script, the find_open_games_by_map function is similar to a function you would have in your application. The function accepts a map name and makes a query against the OpenGamesIndex to find all open games for the map. It then assembles the returned entities into Game objects that can be used in your application.

    Execute this script by running the following command in your terminal.

    python application/find_open_games_by_map.py

    The terminal will show the following output with four open games for the Green Grasslands map.

    Open games for Green Grasslands:
    Game<14c7f97e-8354-4ddf-985f-074970818215 -- Green Grasslands>
    Game<3d4285f0-e52b-401a-a59b-112b38c4a26b -- Green Grasslands>
    Game<683680f0-02b0-4e5e-a36a-be4e00fc93f3 -- Green Grasslands>
    Game<0ab37cf1-fc60-4d93-b72b-89335f759581 -- Green Grasslands>
    sudo cp -r wordpress/* /var/www/html/

    In the next step, we use the Scan API to scan the sparse secondary index.

  • Step 4: Scan the sparse secondary index

    In the previous step, we saw how to find games for a particular map. Some players may prefer to play a specific map, so this is useful. Other players may be willing to play a game at any map. In this section, we show how to find any open game in the application, regardless of the type of map. To do this, we use the Scan API.

    In general, you do not want to design your table to use the DynamoDB Scan operation because DynamoDB is built for surgical queries that grab the exact entities you need. A Scan operation grabs a random collection of entities across your table, so finding the entities you need can require multiple round trips to the database.

    However, sometimes Scan can be useful. In our situation, we have a sparse secondary index, meaning that our index shouldn’t have that many entities in it. In addition, the index includes only those games that are open, and that is exactly what we need.

    For this use case, Scan works great. Let’s see how it works. In the code you downloaded, a find_open_games.py file is in the application/ directory. The contents of the file follow.

    import boto3
    
    from entities import Game
    
    dynamodb = boto3.client('dynamodb')
    
    def find_open_games():
        resp = dynamodb.scan(
            TableName='battle-royale',
            IndexName="OpenGamesIndex",
        )
    
        games = [Game(item) for item in resp['Items']]
    
        return games
    
    games = find_open_games()
    print("Open games:")
    for game in games:
        print(game)

    This code is similar to the code in the previous step. However, rather than using the query() method on the DynamoDB client, we use the scan() method. Because we’re using scan(), we don’t need to specify anything about the key conditions like we did with query(). We’re just having DynamoDB return a bunch of items in no specific order.

    Run the script with the following command in your terminal.

    python application/find_open_games.py

    Your terminal should print a list of nine games that are open across a variety of maps.

    Open games:
    Game<c6f38a6a-d1c5-4bdf-8468-24692ccc4646 -- Urban Underground>
    Game<d06af94a-2363-441d-a69b-49e3f85e748a -- Dirty Desert>
    Game<873aaf13-0847-4661-ba26-21e0c66ebe64 -- Dirty Desert>
    Game<fe89e561-8a93-4e08-84d8-efa88bef383d -- Dirty Desert>
    Game<248dd9ef-6b17-42f0-9567-2cbd3dd63174 -- Juicy Jungle>
    Game<14c7f97e-8354-4ddf-985f-074970818215 -- Green Grasslands>
    Game<3d4285f0-e52b-401a-a59b-112b38c4a26b -- Green Grasslands>
    Game<683680f0-02b0-4e5e-a36a-be4e00fc93f3 -- Green Grasslands>
    Game<0ab37cf1-fc60-4d93-b72b-89335f759581 -- Green Grasslands>
    

    In this step, we saw how using the Scan operation can be the right choice in specific circumstances. We used Scan to grab an assortment of entities from our sparse secondary index to show open games to players.

    In the next steps, we satisfy two access patterns:

    • Join game for a user (Write)
    • Start game (Write)

    To satisfy the “Join game for a user” access pattern in the following steps, we’re going to use DynamoDB transactions. Transactions are popular in relational systems for operations that affect multiple data elements at once. For example, imagine you are running a bank. One customer, Alejandra, transfers $100 to another customer, Ana. When recording this transaction, you would use a transaction to make sure the changes are applied to the balances of both customers rather than just one.

    DynamoDB transactions make it easier to build applications that alter multiple items as part of a single operation. With transactions, you can operate on up to 10 items as part of a single transaction request.

    In a TransactWriteItem API call, you can use the following operations:

    • Put: For inserting or overwriting an item.
    • Update: For updating an existing item.
    • Delete: For removing an item.
    • ConditionCheck: For asserting a condition on an existing item without altering the item.

     

    In the next step, we use a DynamoDB transaction when adding new users to a game while preventing the game from becoming overfilled.

  • Step 5: Add users to a game

    The first access pattern we address in this module is adding new users to a game.

    When adding a new user to a game, we need to:

    • Confirm that there are not already 50 players in the game (each game can have a maximum of 50 players).
    • Confirm that the user is not already in the game.
    • Create a new UserGameMapping entity to add the user to the game.
    • Increment the people attribute on the Game entity to track how many players are in the game.

    Note that accomplishing all of these things requires write actions across the existing Game entity and the new UserGameMapping entity as well as conditional logic for each of the entities. This is the kind of operation that is a perfect fit for DynamoDB transactions because you need to work on multiple entities in the same request, and you want the entire request to succeed or fail together.

    In the code you downloaded, a join_game.py script is in the application/ directory. The function in that script uses a DynamoDB transaction to add a user to a game.

    The contents of the script follow.

    import boto3
    
    from entities import Game, UserGameMapping
    
    dynamodb = boto3.client('dynamodb')
    
    GAME_ID = "c6f38a6a-d1c5-4bdf-8468-24692ccc4646"
    USERNAME = 'vlopez'
    
    
    def join_game_for_user(game_id, username):
        try:
            resp = dynamodb.transact_write_items(
                TransactItems=[
                    {
                        "Put": {
                            "TableName": "battle-royale",
                            "Item": {
                                "PK": {"S": "GAME#{}".format(game_id) },
                                "SK": {"S": "USER#{}".format(username) },
                                "game_id": {"S": game_id },
                                "username": {"S": username }
                            },
                            "ConditionExpression": "attribute_not_exists(SK)",
                            "ReturnValuesOnConditionCheckFailure": "ALL_OLD"
                        },
                    },
                    {
                        "Update": {
                            "TableName": "battle-royale",
                            "Key": {
                                "PK": { "S": "GAME#{}".format(game_id) },
                                "SK": { "S": "#METADATA#{}".format(game_id) },
                            },
                            "UpdateExpression": "SET people = people + :p",
                            "ConditionExpression": "people <= :limit",
                            "ExpressionAttributeValues": {
                                ":p": { "N": "1" },
                                ":limit": { "N": "50" }
                            },
                            "ReturnValuesOnConditionCheckFailure": "ALL_OLD"
                        }
                    }
                ]
            )
            print("Added {} to game {}".format(username, game_id))
            return True
        except Exception as e:
            print("Could not add user to game")
    
    join_game_for_user(GAME_ID, USERNAME)

    In this script’s join_game_for_user function, the transact_write_items() method performs a write transaction. This transaction has two operations.

    In the transaction’s first operation, we use a Put operation to insert a new UserGameMapping entity. As part of that operation, we specify a condition that the SK attribute should not exist for this entity. This ensures that an entity with this PK and SK doesn’t already exist. If such an entity did already exist, that would mean this user already joined the game.

    The second operation is an Update operation on the Game entity to increment the people attribute by one. As part of this operation, we add a conditional check that the current value of people is not greater than 50. As soon as 50 people join a game, the game is full and ready to begin.

    Run this script with the following command in your terminal.

    python application/join_game.py

    The output in your terminal should indicate that the user was added to the game.

    Added vlopez to game c6f38a6a-d1c5-4bdf-8468-24692ccc4646

    Note that if you try to run the script again, the function fails. User vlopez has been added to the game already, so trying to add the user again does not satisfy the conditions we specified.

    The addition of DynamoDB transactions greatly simplifies the workflow around complex operations like these. Without transactions, this would have required multiple API calls with complex conditions and manual rollbacks in the event of conflicts. Now, we can implement such complex operations with fewer than 50 lines of code.

    In the next step, we handle the “Start game (Write)” access pattern.

  • Step 6: Start a game

    As soon as a game has 50 users, the creator of the game can start the game to initiate gameplay. In this step, we show how to handle this access pattern.

    When our application backend receives a request to start the game, we check three things:

    • The game has 50 people signed up.
    • The requesting user is the creator of the game.
    • The game has not already started.

    We can handle each of these checks in a condition expression in a request to update the game. If all of these checks pass, we need to update our entity in the following ways:

    • Remove the open_timestamp attribute so that it does not appear as an open game in the sparse secondary index from the previous module.
    • Add a start_time attribute to indicate when the game started.

    In the code you downloaded, a start_game.py script is in the application/ directory. The contents of the file are as follows.

    import datetime
    
    import boto3
    
    from entities import Game
    
    dynamodb = boto3.client('dynamodb')
    
    GAME_ID = "c6f38a6a-d1c5-4bdf-8468-24692ccc4646"
    CREATOR = "gstanley"
    
    def start_game(game_id, requesting_user, start_time):
        try:
            resp = dynamodb.update_item(
                TableName='battle-royale',
                Key={
                    "PK": { "S": "GAME#{}".format(game_id) },
                    "SK": { "S": "#METADATA#{}".format(game_id) }
                },
                UpdateExpression="REMOVE open_timestamp SET start_time = :time",
                ConditionExpression="people = :limit AND creator = :requesting_user AND attribute_not_exists(start_time)",
                ExpressionAttributeValues={
                    ":time": { "S": start_time.isoformat() },
                    ":limit": { "N": "50" },
                    ":requesting_user": { "S": requesting_user }
                },
                ReturnValues="ALL_NEW"
            )
            return Game(resp['Attributes'])
        except Exception as e:
            print('Could not start game')
            return False
    
    game = start_game(GAME_ID, CREATOR, datetime.datetime(2019, 4, 16, 10, 15, 35))
    
    if game:
        print("Started game: {}".format(game))

    In this script, the start_game function is similar to the function you would have in your application. It takes a game_id, requesting_user, and start_time, and it runs a request to update the Game entity to start the game.

    The ConditionExpression parameter in the update_item() call specifies each of the three checks that we listed earlier in this step—the game must have 50 people, the user requesting that the game start must be the creator of the game, and the game cannot have a start_time attribute, which would indicate it already started.

    In the UpdateExpression parameter, you can see the changes we want to make to our entity. First we remove the open_timestamp attribute from the entity, and then we set the start_time attribute to the game’s start time.

    Run this script in your terminal with the following command.

    python application/start_game.py

    You should see output in your terminal indicating that the game was started successfully.

    Started game: Game<c6f38a6a-d1c5-4bdf-8468-24692ccc4646 -- Urban Underground>

    Try to run the script a second time in your terminal. This time, you should see an error message that indicates you could not start the game. This is because you have already started the game, so the start_time attribute exists. As a result, the request failed the conditional check on the entity.

    You might recall that there is a many-to-many relationship between the Game entity and the associated User entities, and the relationship is represented by a UserGameMapping entity.

    Often, you want to query both sides of a relationship. With our primary key setup, we can find all the User entities in a Game. We can enable querying all Game entities for a User by using an inverted index.

    In DynamoDB, an inverted index is a secondary index that is the inverse of your primary key. The RANGE key becomes your HASH key and vice versa. This pattern flips your table and allows you to query on the other side of your many-to-many relationships.

    In the following steps, we add an inverted index to the table and show how to use it to retrieve all Game entities for a specific User. 

  • Step 7: Add an inverted index

    In this step, we add an inverted index to the table. An inverted index is created like any other secondary index.

    In the code you downloaded, a add_inverted_index.py script is in the scripts/ directory. This Python script adds an inverted index to your table.

    The contents of that file are as follows.

    import boto3
    
    dynamodb = boto3.client('dynamodb')
    
    try:
        dynamodb.update_table(
            TableName='battle-royale',
            AttributeDefinitions=[
                {
                    "AttributeName": "PK",
                    "AttributeType": "S"
                },
                {
                    "AttributeName": "SK",
                    "AttributeType": "S"
                }
            ],
            GlobalSecondaryIndexUpdates=[
                {
                    "Create": {
                        "IndexName": "InvertedIndex",
                        "KeySchema": [
                            {
                                "AttributeName": "SK",
                                "KeyType": "HASH"
                            },
                            {
                                "AttributeName": "PK",
                                "KeyType": "RANGE"
                            }
                        ],
                        "Projection": {
                            "ProjectionType": "ALL"
                        },
                        "ProvisionedThroughput": {
                            "ReadCapacityUnits": 1,
                            "WriteCapacityUnits": 1
                        }
                    }
                }
            ],
        )
        print("Table updated successfully.")
    except Exception as e:
        print("Could not update table. Error:")
        print(e)
    

    In this script, we call an update_table() method on our DynamoDB client. In the method, we pass details about the secondary index we want to create, including the key schema for the index, the provisioned throughput, and the attributes to project into the index.

    Run the script by typing the following command in your terminal.

    python scripts/add_inverted_index.py

    Your terminal will display output that your index was created successfully.

    Table updated successfully.

    In the next step, we use our inverted index to retrieve all Game entities for a particular User.

  • Step 8: Retrieve games for a user

    Now that we’ve created our inverted index, let’s use it to retrieve the Game entities played by a User. To handle this, we need to query the inverted index with the User whose Game entities we want to see.

    In the code you downloaded, a find_games_for_user.py script is in the application/ directory. The contents of the file are as follows.

    import boto3
    
    from entities import UserGameMapping
    
    dynamodb = boto3.client('dynamodb')
    
    USERNAME = "carrpatrick"
    
    
    def find_games_for_user(username):
        try:
            resp = dynamodb.query(
                TableName='battle-royale',
                IndexName='InvertedIndex',
                KeyConditionExpression="SK = :sk",
                ExpressionAttributeValues={
                    ":sk": { "S": "USER#{}".format(username) }
                },
                ScanIndexForward=True
            )
        except Exception as e:
            print('Index is still backfilling. Please try again in a moment.')
            return None
    
        return [UserGameMapping(item) for item in resp['Items']]
    
    games = find_games_for_user(USERNAME)
    
    if games:
        print("Games played by {}:".format(USERNAME))
        for game in games:
            print(game)
    

    In this script, we have a function called find_games_for_user() that is similar to a function you would have in your game. This function takes a user name and returns all the games played by the given user.

    Run the script in your terminal with the following command.

    python application/find_games_for_user.py

    The script should print all of the games played by the user carrpatrick.

    Games played by carrpatrick:
    UserGameMapping<25cec5bf-e498-483e-9a00-a5f93b9ea7c7 -- carrpatrick -- SILVER>
    UserGameMapping<c6f38a6a-d1c5-4bdf-8468-24692ccc4646 -- carrpatrick>
    UserGameMapping<c9c3917e-30f3-4ba4-82c4-2e9a0e4d1cfd -- carrpatrick>
    

    In this module, we added a secondary index to the table. This satisfied two additional access patterns:

    • Find open games by map (Read)
    • Find open games (Read)

    To accomplish this, we used a sparse index that included only the games that were still open for additional players. We then used both the Query and Scan APIs against the index to find open games.

    Also, we saw how to satisfy two advanced write operations in the application. First, we used DynamoDB transactions when a user joined a game. With transactions, we handled a complex conditional write across multiple entities in a single request.

    Second, we implemented the function for a creator of a game to start a game when it’s ready. In this access pattern, we had an update operation that required checking the value of three attributes and updating two attributes. You can express this complex logic in a single request through the power of condition expressions and update expressions.

    Thirds, we satisfied the final access pattern by retrieving all Game entities played by a User. To handle this access pattern, we created a secondary index using the inverted index pattern to allow querying on the other side of the many-to-many relationship between User entities and Game entities.

    In the next module, we clean up the resources we created.