Handle conditional write errors in high concurrency scenarios with Amazon DynamoDB

We are excited to announce a new feature in Amazon DynamoDB that enhances the developer experience by simplifying the handling of ConditionalCheckFailedException. The new ReturnValuesOnConditionCheckFailure parameter for single write operations lets you return a copy of an item as it was during a failed write attempt, reducing the need for a read request if you want to investigate the cause of the failure and retry your request.

Conditional checks have always been a powerful tool in DynamoDB, allowing you to apply conditions when performing item modifications such as put, update, or delete operations. However, until now, receiving a ConditionalCheckFailedException would provide only minimal information, requiring an additional read operation to understand the specific cause of the failure.

Now, with the new ReturnValuesOnConditionCheckFailure parameter, you have the option to return the item, as it was at the time of failure, in the response message. There are several use cases where this parameter can help. In this post we dive deeper into how it can be used with counters to improve efficiency when dealing with concurrent updates.

Overview of condition checks

Environments with a high number of concurrent updates require a mechanism to ensure that a record is not altered with out-of-order updates coming from multiple users. Counters with conditional writes can effectively provide the versioning information to ensure that only valid writes succeed.

We see counters implemented in DynamoDB across customers of all sizes and industries. A common implementation uses optimistic locking with condition checks as a concurrency control mechanism for concurrent updates to items in a table. Optimistic locking helps prevent conflicts and maintains data integrity by allowing only the intended changes to be applied when multiple clients attempt to change the same item. With optimistic locking, each item in DynamoDB has a version attribute, typically a numeric value. Before updating an item, the client retrieves the current version value. When making the update request, the client includes a condition that the version must match the previously retrieved value. If the version matches, showing that the item hasn’t been changed, the update proceeds and the version is incremented. If the version doesn’t match, another client has already changed the item: the update request is rejected with a ConditionalCheckFailedException, and the client must read the item to get the latest version before reattempting the update. The condition acts as a validation check that the item hasn’t been changed by another client since the version was obtained.

Now, to minimize time and conserve Read Capacity Units, developers can use the ReturnValuesOnConditionCheckFailure parameter, which enables DynamoDB to return the item’s state at the moment the condition check failed. This functionality empowers you to promptly assess the cause of the condition failure and make informed decisions regarding whether to proceed with the intended update or change your approach.
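As a minimal sketch of the optimistic locking flow described above, the following helpers build the keyword arguments for an update_item call guarded by a version check, and read the stored version back out of a failed request. The function names, the attribute name version, and the expression placeholders are assumptions made for illustration; they do not come from the game example later in this post.

```python
def build_versioned_update(key, attribute, new_value, expected_version):
    """Return update_item keyword arguments guarded by a version check.

    The attribute name "version" is a hypothetical choice for this sketch.
    """
    return {
        "Key": key,
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #attr = :new_value, #ver = :next_version",
        # Reject the write if another client changed the item since we read it.
        "ConditionExpression": "#ver = :expected_version",
        "ExpressionAttributeNames": {"#attr": attribute, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":new_value": new_value,
            ":expected_version": expected_version,
            ":next_version": expected_version + 1,
        },
        # Ask DynamoDB to return the stored item if the condition fails.
        "ReturnValuesOnConditionCheckFailure": "ALL_OLD",
    }


def version_from_failure(error_response):
    """Extract the stored version from a failed request, or None if absent.

    The item in the error response uses DynamoDB JSON, so numbers arrive
    as strings under the "N" type tag.
    """
    item = error_response.get("Item") or {}
    stored = item.get("version")
    return int(stored["N"]) if stored else None
```

With a Boto3 Table resource, the arguments could be passed as table.update_item(**build_versioned_update(key, "score", 10, 3)). When a ConditionalCheckFailedException is caught, version_from_failure(error.response) would give the version to use for the next attempt without a separate read.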

ReturnValuesOnConditionCheckFailure in action

Imagine we are modeling a video game with two teams of 100 players each, competing side by side. Each team needs to create different sandwiches as a challenge. The more sandwiches you produce, the faster the challenges will appear. Some players will be cooks and most of the others will collect the required ingredients to create the sandwich. For example, we need two slices of bread, two lettuce leaves, two tomato slices, and two bacon strips to produce a BLT.

Because we have several cooks, they will consume the ingredients as they produce sandwiches. When the team doesn’t have enough ingredients to create the sandwich, the game needs to show the team what food they are missing so the collectors can get it. The game calculates a team score, so teams can compete against each other on the leaderboard, and it also provides individual scores to identify the best player on the team.

Because each player is motivated to get the highest score, they are inclined to grab all the sandwich orders or get all the ingredients the cooks require. Each sandwich is identified by a sandwich ID that is unique to the team, which makes it impossible for two teams to make the same sandwich. Once one cook receives confirmation that the sandwich is made, the other cooks can no longer make that sandwich.

In our cooking game example, we’ll store the distinct entities together by using a DynamoDB design concept called “single table design.” In single table designs, multiple entities exist in the same table and are identified by their unique partition key structure. For example, we’ll use GID#<game_id> to uniquely identify each game, where GID is a static prefix on each item in the table that represents “Game Identifier.” The prefix is followed by the hash (#) symbol and a 21-character NanoID string that represents the <game_id>.

We’ll also use item collections to efficiently query the products gathered and sandwiches created by each team. We can use a sort key to implement this design pattern, where TID is a static prefix that represents “Team Identifier,” followed by the hash symbol and <team_id>, which is an eight-character NanoID string (TID#<team_id>). Thus, each partition key (GID#<game_id>) will have multiple related items grouped under it, allowing us to more efficiently query all the items for a given game.

The single table design approach allows us to store data for other entities in the table, too. For example, the team, which we earlier used as a sort key, can also be its own entity with a TID#<team_id> partition key that groups all the team members’ information. We can do the same for each role on the team, such as a “cook” or “gatherer” role, by creating a <role>#<user_id> sort key. This combination of partition keys and sort keys allows us to efficiently retrieve all the users in a team by using the Query operation. It is possible to get a more granular result and retrieve only the team members that are cooks by using the “begins with” key condition expression.
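The “begins with” query described above could be sketched as follows. The helper builds keyword arguments for the Query operation. The function name, the placeholder names, and the exact role string (for example, "cook") are assumptions for illustration; the attribute names PK and SK match the ones used in the code later in this post.

```python
def build_role_query(team_id, role):
    """Return Query keyword arguments for all members of a team with a role.

    Assumes team member items use PK = TID#<team_id> and
    SK = <role>#<user_id>, as described in the text above.
    """
    return {
        # Match the team partition, then narrow by the role prefix.
        "KeyConditionExpression": "PK = :team AND begins_with(SK, :role_prefix)",
        "ExpressionAttributeValues": {
            ":team": f"TID#{team_id}",
            ":role_prefix": f"{role}#",
        },
    }
```

With a Boto3 Table resource, this could be used as table.query(**build_role_query("GA5NPQ9g", "cook")). Dropping the begins_with clause would return every member of the team regardless of role.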

Finally, in our example, we also use the sandwich entity, which is represented by SID#<sandwich_id> as both the partition key and the sort key, where SID is a static prefix that represents “Sandwich Identifier,” followed by the hash symbol and <sandwich_id>, which is an eight-character NanoID string. This entity will contain the ingredients we need to make the sandwich.
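To make the key structure concrete, here is a small sketch of helpers that build the primary key for each entity described above. The function names are hypothetical; the prefixes and the PK and SK attribute names follow the design in this post.

```python
def game_team_key(game_id, team_id):
    """Key of a team's item within a game: GID#<game_id> / TID#<team_id>."""
    return {"PK": f"GID#{game_id}", "SK": f"TID#{team_id}"}


def team_member_key(team_id, role, user_id):
    """Key of a team member: TID#<team_id> / <role>#<user_id>."""
    return {"PK": f"TID#{team_id}", "SK": f"{role}#{user_id}"}


def sandwich_key(sandwich_id):
    """Key of a sandwich: SID#<sandwich_id> as partition key and sort key."""
    value = f"SID#{sandwich_id}"
    return {"PK": value, "SK": value}
```

Keeping the prefix logic in one place avoids typos in hand-built key strings and makes it easier to change a prefix later.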

Let’s pretend we are one player with the cook role, and we are about to make a BLT sandwich. For every sandwich, the game tells us which ingredients we need and how many of each. We first need to read our team’s collected ingredients with a GetItem operation to validate that we have the required food to produce the sandwich. The following image depicts a representation of a game stored as an item within our table, including our available ingredients.

That information is shown to the cook player, but they still need to press the Make button. This is where concurrency enters the picture. There is a high possibility the cooks will have the same information when they read, but only the first cook will get the sandwich done.

In this scenario, we need to ensure the sandwich is made only if there are enough ingredients to do so. This is where conditional updates come into the picture: we update the item only if the condition expression is met. The following is a Python snippet for this condition check:

import boto3
import botocore.exceptions

# Constants
TABLE_NAME = "chef-royale"
GAME_ID = "NZzPTNp3XKig5ZnxeyFfQ" # NanoID - Unique game identifier
TEAM_ID = "GA5NPQ9g"              # Team's unique identifier
USER_ID = "h3PkZ3pek88ezIhK3ORvl" # NanoID - Unique user identifier
SANDWICH_ID = "mVeOsKX_"          # Unique identifier per sandwich

# Initialize resources
table = boto3.resource("dynamodb").Table(TABLE_NAME)
primary_key = {"PK": f"GID#{GAME_ID}", "SK": f"TID#{TEAM_ID}"}

# Read how many ingredients the team currently has
current = table.get_item(Key=primary_key)

# Try to make the sandwich
try:
    response = table.update_item(
        Key=primary_key,
        UpdateExpression="ADD sandwiches :this_sandwich SET bacon = bacon + :minus_two, bread_slice = bread_slice + :minus_two, lettuce = lettuce + :minus_two, tomato = tomato + :minus_two",
        ConditionExpression="bacon >= :two AND bread_slice >= :two AND lettuce >= :two AND tomato >= :two AND not contains(sandwiches, :sandwich_id)",
        ExpressionAttributeValues={
            ":sandwich_id": SANDWICH_ID,
            ":minus_two": -2,
            ":two": 2,
            ":this_sandwich": set([SANDWICH_ID])
        },
        ReturnValues="UPDATED_NEW",
    )
    if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
        print("Success!")
        print(f"Updated values {response['Attributes']}")
    
except botocore.exceptions.ClientError as error:
    if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("The conditional expression is not met")
        print(f"Detail: {error.response}")
    else:
        print(f"There has been an error. Detail {error.response}")

With this code, when the conditions are met, we subtract the ingredients required for the sandwich and add the current sandwich_id to a String Set. The set can serve as a counter when the game is over (by measuring the length of the set, you can define the team’s score), and it also prevents double counting. The first part of the condition ensures there are enough ingredients to make the sandwich, and the last part validates that no one on the same team has already made that sandwich. This ensures consistency and atomicity in the operations. Nobody wants a bug where 10 players can get points from the same sandwich!

The first time we run this code, we will indeed make our sandwich as our item has the necessary ingredients, and we should see a result such as the following:

{'bacon': Decimal('4'), 'lettuce': Decimal('0'), 'SK': 'TID#GA5NPQ9g', 'PK': 'GID#NZzPTNp3XKig5ZnxeyFfQ', 'tomato': Decimal('2'), 'bread_slice': Decimal('1'), 'sandwiches': {'8MPyZK63', 'mVeOsKX_'}}

We now have two sandwiches made, and we have consumed the ingredients required to make one of them.

Assume now that another player, who still has the same “read” information, chooses the Make button. From our script’s perspective, that means they’re trying to make the same sandwich. This time, the script produces the following error:

The conditional expression is not met
Detail: {'Error': {'Message': 'The conditional request failed', 'Code': 'ConditionalCheckFailedException'}, 'ResponseMetadata': {'RequestId': '1G1HBHH898VK5KK8QHPP8J4SGFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 400, 'HTTPHeaders': {'server': 'Server', 'date': 'Thu, 22 Jun 2023 20:39:33 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '120', 'connection': 'keep-alive', 'x-amzn-requestid': '1G1HBHH898VK5KK8QHPP8J4SGFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '396270901'}, 'RetryAttempts': 0}, 'message': 'The conditional request failed'}

We know the player doesn’t meet the requirements to make the sandwich, but what condition was the one that failed? Are we short on lettuce? Did someone eat the bread? Or was it the bacon? Could it be possible that someone made the sandwich before? The only way to know is to issue another GetItem operation, because the information the player read only a couple of seconds ago is now stale.

With the introduction of the ReturnValuesOnConditionCheckFailure parameter, we can now determine the reason for the failure because we get the item’s current value as it is stored in the DynamoDB table. In the same exception handling logic, we have access to the item’s current value and can check which condition failed. The following updated code snippet allows us to see the attributes’ values:

try:
    response = table.update_item(
        Key=primary_key,
        UpdateExpression="ADD sandwiches :this_sandwich SET bacon = bacon + :minus_two, bread_slice = bread_slice + :minus_two, lettuce = lettuce + :minus_two, tomato = tomato + :minus_two",
        ConditionExpression="bacon >= :two AND bread_slice >= :two AND lettuce >= :two AND tomato >= :two AND not contains(sandwiches, :sandwich_id)",
        ExpressionAttributeValues={
            ":sandwich_id": SANDWICH_ID,
            ":minus_two": -2,
            ":two": 2,
            ":this_sandwich": set([SANDWICH_ID])
        },
        ReturnValues="UPDATED_NEW",
        ReturnValuesOnConditionCheckFailure="ALL_OLD"
    )
    if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
        print("Success!")
        print(f"Updated values {response['Attributes']}")
    
except botocore.exceptions.ClientError as error:
    if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("The conditional expression is not met")
        current_value = error.response.get("Item")
        print(f"Current Value: {current_value}")
        print(f"Detail: {error.response}")
    else:
        print(f"There has been an error. Detail {error.response}")

When run, we will see an error like the following, where we can understand that we ran out of lettuce, are short on bread, and someone made that sandwich before this request:

The conditional expression is not met
Current Value: {'bacon': {'N': '4'}, 'lettuce': {'N': '0'}, 'SK': {'S': 'TID#GA5NPQ9g'}, 'PK': {'S': 'GID#NZzPTNp3XKig5ZnxeyFfQ'}, 'tomato': {'N': '2'}, 'bread_slice': {'N': '1'}, 'sandwiches': {'SS': ['8MPyZK63', 'mVeOsKX_']}}
Detail: {'Error': {'Message': 'The conditional request failed', 'Code': 'ConditionalCheckFailedException'}, 'ResponseMetadata': {'RequestId': 'ICRV52MCH3P3TH0OGTOOM2PICRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 400, 'HTTPHeaders': {'server': 'Server', 'date': 'Thu, 22 Jun 2023 21:12:25 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '319', 'connection': 'keep-alive', 'x-amzn-requestid': 'ICRV52MCH3P3TH0OGTOOM2PICRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '933435066'}, 'RetryAttempts': 0}, 'Item': {'bacon': {'N': '4'}, 'lettuce': {'N': '0'}, 'SK': {'S': 'TID#GA5NPQ9g'}, 'PK': {'S': 'GID#NZzPTNp3XKig5ZnxeyFfQ'}, 'tomato': {'N': '2'}, 'bread_slice': {'N': '1'}, 'sandwiches': {'SS': ['8MPyZK63', 'mVeOsKX_']}}}

Thanks to this new feature, application developers save time because they don’t have to run the extra GetItem operation to learn the item’s current value. This reduces both the number of read operations and the application logic required to handle this scenario.
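As an illustration of how the returned item could be used, the following sketch inspects the Item from the error response and reports which parts of the condition failed. The function name, the constants, and the shape of the result are assumptions made for this example. The item is in DynamoDB JSON, as shown in the output above, so numbers are strings under the "N" type tag and the set is a list under "SS".

```python
REQUIRED_PER_INGREDIENT = 2  # Matches the ":two" value in the condition
INGREDIENTS = ("bacon", "bread_slice", "lettuce", "tomato")


def diagnose_failure(item, sandwich_id, required=REQUIRED_PER_INGREDIENT):
    """Report why the conditional update failed, based on the returned item.

    Returns a dict with the shortfall per ingredient and whether the
    sandwich was already made. A missing attribute is treated as zero.
    """
    missing = {}
    for name in INGREDIENTS:
        have = int(item.get(name, {}).get("N", "0"))
        if have < required:
            missing[name] = required - have
    made = item.get("sandwiches", {}).get("SS", [])
    return {"missing": missing, "already_made": sandwich_id in made}
```

In the exception handler, this could be called as diagnose_failure(error.response["Item"], SANDWICH_ID). The game could then show collectors which ingredients are short, or tell the cook that a teammate already made the sandwich.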

Access Control

In line with the AWS best practices of providing least privilege access to your AWS resources, the feature ReturnValuesOnConditionCheckFailure in DynamoDB follows the same approach. This feature allows granular control over permissions by focusing on the scope of the ReturnValues parameter. By carefully configuring the permissions related to ReturnValues, administrators can choose to either allow or deny the ability for DynamoDB to return the item as it existed at the moment of a condition check failure.

Applying the principle of least privilege ensures that users and applications are granted only the necessary permissions to perform their required actions. In the context of ReturnValuesOnConditionCheckFailure, this means that administrators can restrict access to the item’s previous state on a conditional check failure, providing an extra layer of security and preventing unauthorized viewing of sensitive data.

To configure the permission for ReturnValuesOnConditionCheckFailure, administrators can set the IAM condition key dynamodb:ReturnValues to either NONE, which means no value is returned, or ALL_OLD, which allows the item to be returned to the user. The following sample IAM policy allows PutItem and UpdateItem on a specific table only when dynamodb:ReturnValues is NONE or is not set, which prevents the item from being returned on a condition check failure:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyReturnValues",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:555555555555:table/ExampleTable",
            "Condition": {
                "StringEqualsIfExists": {
                    "dynamodb:ReturnValues": [
                        "NONE"
                    ]
                }
            }
        }
    ]
}
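To see why an Allow statement with this condition blocks ALL_OLD, here is a simplified model of the StringEqualsIfExists operator. This is a sketch for reasoning about the policy, not how IAM is implemented, and the function name and the request context dictionaries are hypothetical. It assumes, as this post describes, that the dynamodb:ReturnValues key carries the requested value.

```python
def string_equals_if_exists(request_context, key, allowed_values):
    """Simplified model of the IAM StringEqualsIfExists condition operator.

    If the key is absent from the request context, the condition is true.
    If it is present, its value must match one of the allowed values.
    """
    if key not in request_context:
        return True
    return request_context[key] in allowed_values


# Hypothetical request contexts for three UpdateItem calls.
allowed = ["NONE"]
no_return = string_equals_if_exists({}, "dynamodb:ReturnValues", allowed)
none_return = string_equals_if_exists(
    {"dynamodb:ReturnValues": "NONE"}, "dynamodb:ReturnValues", allowed
)
all_old = string_equals_if_exists(
    {"dynamodb:ReturnValues": "ALL_OLD"}, "dynamodb:ReturnValues", allowed
)
```

In this model, the first two requests satisfy the condition, so the Allow statement applies. The ALL_OLD request does not, so the statement does not apply and the request is denied unless another statement allows it.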

Additional resources

To successfully run the code as specified in this post, you need to use the Python SDK version 1.26.158. If you’re using any other SDK, remember to download the latest API version, so you can take advantage of ReturnValuesOnConditionCheckFailure. SDKs for all programming languages return the item as part of the exception body. For more information about this feature, refer to the UpdateItem API reference documentation.

Conclusion

By incorporating the ReturnValuesOnConditionCheckFailure parameter, you can reduce additional read operations and simplify error handling. You can now retrieve detailed information directly from the server side when a ConditionalCheckFailedException occurs, providing you with increased efficiency and improved decision-making. To get started, add the new parameter to your PutItem, UpdateItem, or DeleteItem operations and set the value to ALL_OLD. You can use your favorite coding language in our getting started guide.

About the Authors

Esteban Serna is a Senior DynamoDB Specialist Solutions Architect. Esteban has been working with databases for the last 15 years, helping customers choose the right architecture to match their needs. Fresh from university, he worked deploying the infrastructure required to support contact centers in distributed locations. When NoSQL databases were introduced, he fell in love with them and decided to focus on them because centralized computing was no longer the norm. Today, Esteban is focusing on helping customers design distributed massive scale applications that require single-digit millisecond latency using DynamoDB. Some people say he is an open book and he loves to share his knowledge with others.

Lee Hannigan is a Senior DynamoDB Specialist Solutions Architect. Lee has been a DynamoDB specialist for the past 4 years, with a strong background in big data technologies. With valuable insights gained from working with innovative startups, Lee brings a wealth of knowledge to his AWS customers in EMEA. He is passionate about helping AWS customers scale their applications, and his expertise lies in using DynamoDB and serverless technologies to achieve optimal performance and efficiency. By providing tailored solutions and guidance, Lee has successfully assisted hundreds of organizations in unlocking the full potential of DynamoDB and embracing serverless architectures. With a customer-centric approach and a deep understanding of AWS services, Lee is dedicated to empowering businesses to thrive in the world of cloud computing.

Kevin Willis is a Senior Product Manager on the DynamoDB team. He is focused on elevating the developer experience. With over a decade of experience in relational OLTP systems, he has become passionate about helping people with a similar background to get up to speed on using Amazon DynamoDB at scale.
