AWS Database Blog
Rate-limiting calls to Amazon DynamoDB using Python Boto3, Part 1
In this post, I present a technique where a Python script making calls to Amazon DynamoDB can rate limit its consumption of read and write capacity units. The technique uses Boto3 event hooks to apply the rate limiting without having to modify the client code performing the read and write calls. It’s a proven solution used in the Bulk Executor for Amazon DynamoDB, and its logic is encapsulated in the DynamoDBMonitor class, available on GitHub.
In Part 2 I show you how to coordinate rate limits across separate processes using an Amazon Simple Storage Service (Amazon S3) folder for shared state.
Reasons to rate limit
I’ve previously published several blog posts that illustrate scenarios where rate limiting can be useful:
- To avoid table-level throttling, especially when in provisioned mode – For example, when performing a bulk task, you might want to run at a controlled rate to avoid spikes that generate throttles before automatic scaling can adjust. See Handle traffic spikes with Amazon DynamoDB provisioned capacity, options 4 and 5.
- To avoid partition-level throttling – If your reads and writes might be hitting the same partition, you can rate limit to avoid creating hot partitions that generate throttles and impact other activity to the same partition. See Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance.
- To control costs – You might want to bound consumption to limit spend, or choose precise consumption to better utilize reserved capacity. See Cost-effective bulk processing with Amazon DynamoDB.
Overview of solution
The general approach to DynamoDB rate limiting with any SDK language is this:
- Send a ReturnConsumedCapacity parameter with each request, indicating that you want to know how much capacity was consumed. You can’t know the consumption in advance because, for example, a DeleteItem will consume write capacity based on the size of the item being deleted, and a TransactWriteItems might consume either read or write capacity, because of how the ClientRequestToken gets handled.
- Pull the consumption from the ConsumedCapacity data structure added to each response.
- Track consumption over time and add short delays to rate limit, if needed. A minimal sketch of the first two steps follows this list.
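To make the pattern concrete before introducing the hooks, here’s a minimal sketch of steps 1 and 2 using a plain Boto3 client. The table name and key are placeholders:

```python
import boto3

client = boto3.client("dynamodb")

# Step 1: ask DynamoDB to report how much capacity this call consumes
response = client.get_item(
    TableName="MyTable",  # placeholder table
    Key={"pk": {"S": "item-1"}},
    ReturnConsumedCapacity="TOTAL",
)

# Step 2: pull the consumption out of the response, for example
# {'TableName': 'MyTable', 'CapacityUnits': 0.5}
consumed = response["ConsumedCapacity"]["CapacityUnits"]
print(f"GetItem consumed {consumed} read capacity units")
```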
For rate limiting with the Python SDK, as explained in Programming Amazon DynamoDB with Python and Boto3, Boto3 provides an event system that enables runtime extensibility. The library exposes named event hooks where you can register a function to be called at certain points of the request/response process:
- provide-client-params – Called early in the request processing. This gives us the opportunity to add the ReturnConsumedCapacity parameter if it’s not already present.
- before-send – Called right before the request gets sent. This gives us the opportunity to add delays if required.
- after-call – Called during the response handling. This gives us the opportunity to track the ConsumedCapacity.
When you register these hooks on a Session object, they automatically run during DynamoDB calls made by any client built from that same Session. This logic is encapsulated in a Python class called DynamoDBMonitor that’s about 100 lines of code. The following is a sample usage:
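The snippet below sketches how usage might look. The import path and constructor parameter names are illustrative assumptions, so check the class in the GitHub repo for the exact signature:

```python
import boto3

from dynamodb_monitor import DynamoDBMonitor  # hypothetical import path

session = boto3.Session()

# Cap this process at 1,000 read and 500 write capacity units per second
# (parameter names are illustrative)
monitor = DynamoDBMonitor(session, max_read_rate=1000, max_write_rate=500)

# Any client built from this session is rate limited transparently.
# The calling code needs no changes.
client = session.client("dynamodb")
client.put_item(
    TableName="MyTable",
    Item={"pk": {"S": "item-1"}},
)
```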
Code walkthrough
This section reviews the essential parts of the DynamoDBMonitor code.
The constructor takes the session (on which the event hooks will be applied) and the maximum read and write rates. It creates a read bucket and write bucket to track read and write consumption. Each bucket constructor accepts a per-second accumulation rate, an initial quantity of tokens to start with, and a maximum overall bucket capacity (after which any further accumulation spills out). Here we choose to start with an initial quantity equal to one second of accumulation, with a maximum capacity equal to two seconds of accumulation. This configuration aims to let processing get started quickly and allow some bursty usage.
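The following is a simplified sketch of how such a bucket might work, consistent with the behavior described above; the real TokenBucket implementation is in the GitHub repo:

```python
import threading
import time

class TokenBucket:
    """Simplified sketch of a token bucket. Tokens accumulate at a fixed
    per-second rate up to a maximum capacity; consumers deduct tokens and
    wait whenever the balance drops to zero or below."""

    def __init__(self, rate_per_second, initial_tokens, max_capacity):
        self.rate = rate_per_second
        self.tokens = initial_tokens
        self.max_capacity = max_capacity
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Accumulate at the configured rate; anything beyond capacity spills out
        self.tokens = min(self.tokens + elapsed * self.rate, self.max_capacity)

    def consume(self, amount):
        # Deduct after the fact; the balance may go negative, which makes
        # a later caller wait
        with self.lock:
            self._refill()
            self.tokens -= amount

    def wait_until_positive(self):
        # Block until the balance is above zero
        while True:
            with self.lock:
                self._refill()
                if self.tokens > 0:
                    return
                deficit = -self.tokens
            time.sleep(max(deficit / self.rate, 0.01))

# Matching the configuration described above: initial tokens equal to one
# second of accumulation, capacity equal to two seconds' worth
read_bucket = TokenBucket(rate_per_second=1000, initial_tokens=1000, max_capacity=2000)
write_bucket = TokenBucket(rate_per_second=500, initial_tokens=500, max_capacity=1000)
```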
The class then registers the event hooks to run our callbacks during request and response processing.
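Registration might look like the following; the handler names are illustrative, and the handlers themselves appear in the next snippets. The .dynamodb suffix scopes each hook to DynamoDB operations on clients created from this session:

```python
import boto3

session = boto3.Session()

# Handlers (defined in the snippets that follow) fire for every DynamoDB
# operation on clients created from this session
session.events.register("provide-client-params.dynamodb", add_return_consumed_capacity)
session.events.register("before-send.dynamodb", maybe_sleep)
session.events.register("after-call.dynamodb", track_consumption)
```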
The provide-client-params event hook adds the ReturnConsumedCapacity parameter if not already specified:
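A handler for this hook might look like the following sketch (the function name is illustrative). Boto3 passes the call parameters and the operation model, so the handler can check whether the operation supports ReturnConsumedCapacity before adding it:

```python
def add_return_consumed_capacity(params, model, **kwargs):
    # Only add the parameter when the operation accepts it and the caller
    # hasn't already set a value
    accepts_it = (
        model.input_shape is not None
        and "ReturnConsumedCapacity" in model.input_shape.members
    )
    if accepts_it and "ReturnConsumedCapacity" not in params:
        params["ReturnConsumedCapacity"] = "TOTAL"
```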
The before-send event hook has to decide whether a sleep is required. It calls the appropriate bucket, asking it to wait until its token count is above zero.
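A sketch of that handler follows. Classifying an operation as a read or a write by its name is a simplification (as noted earlier, TransactWriteItems can consume either); the event_name keyword that Boto3 passes to every handler ends with the operation name:

```python
READ_OPERATIONS = {"GetItem", "BatchGetItem", "Query", "Scan", "TransactGetItems"}

def maybe_sleep(request, **kwargs):
    # event_name looks like 'before-send.dynamodb.PutItem'
    operation = kwargs["event_name"].split(".")[-1]
    bucket = read_bucket if operation in READ_OPERATIONS else write_bucket
    bucket.wait_until_positive()  # sleeps only if the bucket is depleted
    return None  # returning None lets the request proceed normally
```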
The after-call event hook reads the ConsumedCapacity data structure out of the response and updates metrics. Tracking covers only base table activity, not global secondary indexes (GSIs), and read/write metrics aren’t isolated per table.
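A sketch of that handler, consistent with the limitations just noted: it sums base-table CapacityUnits (ConsumedCapacity is a single dict for most calls and a list for batch and transact calls) and deducts the total from the matching bucket:

```python
def track_consumption(http_response, parsed, model, **kwargs):
    consumed = parsed.get("ConsumedCapacity")
    if consumed is None:
        return
    # Batch and transact responses report a list, one entry per table
    entries = consumed if isinstance(consumed, list) else [consumed]
    # Base-table CapacityUnits only; GSI consumption is ignored here
    total = sum(entry.get("CapacityUnits", 0) for entry in entries)
    bucket = read_bucket if model.name in READ_OPERATIONS else write_bucket
    bucket.consume(total)
```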
You can find the code for DynamoDBMonitor in the GitHub repo as part of the Bulk Executor for DynamoDB project. The code for TokenBucket is also there.
Other languages
This code relies on Python’s Boto3 event hook system. JavaScript and Java have similar systems for extensibility:
- JavaScript has a middlewareStack (see Programming Amazon DynamoDB with JavaScript)
- Java v2 has an ExecutionInterceptor (see Programming DynamoDB with the AWS SDK for Java 2.x)
Conclusion
Using the Boto3 event hook system makes it possible to track read and write capacity and add sleeps to impose rate limits without modifying the code performing the reads and writes. The DynamoDBMonitor class encapsulates this logic in reusable form. In Part 2, I show how to coordinate rate limits across separate processes using an S3 folder for shared state.
Try out this solution for your own use case, and share your feedback in the comments.