Amazon DynamoDB Accelerator (DAX): A Read-Through/Write-Through Cache for DynamoDB

Joseph Idziorek is a product manager at Amazon Web Services.

AWS recently launched Amazon DynamoDB Accelerator (DAX), a highly available, in-memory cache for Amazon DynamoDB. If you’re currently using DynamoDB or considering DynamoDB, DAX can offer you response times in microseconds and millions of requests per second.

When developers start using DAX, they tell us that the performance is great, and they love that DAX is API-compatible with DynamoDB. This means they no longer have to set up and develop against a side-cache. Instead of learning another database system with a new set of APIs and data types—and then rewriting applications to do the two-step dance needed for cache look-ups, population, and invalidation—you can simply point your existing DynamoDB application at the DAX endpoint. What used to take weeks or months now takes only moments with DAX.

How does DAX accomplish this? When you’re developing against DAX, instead of pointing your application at the DynamoDB endpoint, you point it at the DAX endpoint, and DAX handles the rest. As a read-through/write-through cache, DAX seamlessly intercepts the API calls that an application normally makes to DynamoDB so that both read and write activity are reflected in the DAX cache. For you, the API calls are the same, so there’s no need to rewrite the application.
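To make this concrete, here’s a minimal sketch in Python. It assumes the amazondax client package, which exposes the same interface as the low-level DynamoDB client; the table name, Region, and cluster endpoint are placeholders rather than values from this post.

```python
import botocore.session
from amazondax import AmazonDaxClient  # assumed: the amazondax Python package

session = botocore.session.get_session()

# Against DynamoDB directly, you would create the usual client:
#   client = session.create_client('dynamodb', region_name='us-east-1')

# Against DAX, only the client construction changes; the endpoint below
# is a placeholder for your own DAX cluster endpoint.
client = AmazonDaxClient(
    session,
    region_name='us-east-1',
    endpoints=['my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'])

# The API calls themselves are unchanged from DynamoDB.
resp = client.get_item(
    TableName='Movies',
    Key={'title': {'S': 'The Big New Movie'}})
print(resp.get('Item'))
```

Because both reads and writes flow through the same endpoint, DAX can keep its cache in sync as a side effect of the calls your application already makes.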

In this post, we take a step back and describe what a read-through/write-through cache is, and how it’s different from other caching strategies. We also discuss the design considerations for these different caching strategies.

Side-cache
When you’re using a cache in front of a backend data store, a side-cache is perhaps the most widely known approach. Canonical examples include Redis and Memcached. These are general-purpose caches that are decoupled from the underlying data store and can help with both read and write throughput, depending on the workload and durability requirements.

For read-heavy workloads, side-caches are typically used as follows (see the sketch after this list):

  1. For a given key-value pair, the application first tries to read the data from the cache. If the cache is populated with the data (a cache hit), the value is returned. If not, the application moves on to step 2.
  2. Because the desired key-value pair was not found in the cache, the application fetches the data from the underlying data store.
  3. To ensure that the data is present the next time it’s needed, the key-value pair from step 2 is then written to the cache.
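This read path is only a few lines of code in practice. Here’s a minimal sketch in Python using the redis-py client and boto3; the table, key schema, and five-minute TTL are illustrative assumptions, not recommendations.

```python
import json

import boto3
import redis

cache = redis.Redis(host='localhost', port=6379)
table = boto3.resource('dynamodb', region_name='us-east-1').Table('Movies')

def get_movie(title):
    # Step 1: try the cache first.
    cached = cache.get(title)
    if cached is not None:  # cache hit
        return json.loads(cached)

    # Step 2: cache miss, so fetch from the underlying data store.
    item = table.get_item(Key={'title': title}).get('Item')

    # Step 3: populate the cache so the next read is a hit.
    # (Assumes string attributes; numeric attributes come back as Decimal
    # and would need a custom JSON encoder.)
    if item is not None:
        cache.set(title, json.dumps(item), ex=300)  # 5-minute TTL
    return item
```

Note that every step of this dance lives in your application code, which is exactly what a read-through cache like DAX removes.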

Although the number of use cases for caching is growing, especially for Redis, the predominant use is still to serve repeat reads in read-heavy workloads. Developers also use Redis to better absorb spikes in writes. One of the more popular patterns is to write directly to Redis and then asynchronously invoke a separate workflow to de-stage the data to a separate data store (for example, DynamoDB).
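As a rough sketch of that write-absorption pattern (the queue name and mechanics here are assumptions for illustration), the producer below lands writes in Redis and enqueues them, while a separate worker de-stages them to DynamoDB:

```python
import json

import boto3
import redis

cache = redis.Redis(host='localhost', port=6379)
table = boto3.resource('dynamodb', region_name='us-east-1').Table('Movies')

def write(item):
    # Fast path: absorb the write in Redis and enqueue it for de-staging.
    payload = json.dumps(item)
    cache.set(item['title'], payload)
    cache.rpush('destage-queue', payload)

def destage_worker():
    # Runs as a separate process: drains the queue into the durable store.
    while True:
        _, payload = cache.blpop('destage-queue')  # blocks until work arrives
        table.put_item(Item=json.loads(payload))
```

Until the worker catches up, an item exists only in Redis, which is where the durability and consistency trade-offs discussed next come from.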

There are a few design points to note here. First, writes to the cache in this pattern are non-durable, and the underlying data store is only eventually consistent with the cache, so there is a possibility of data loss. Some applications, in IoT for example, can tolerate this trade-off. In addition, there are penalties in the form of multiple round trips and additional connection handling: on a cache miss, the application pays one round trip to the cache, another to the data store, and a third to populate the cache.
