Scale your Amazon ElastiCache for Redis clusters at a lower cost with data tiering

Amazon ElastiCache for Redis provides a convenient way to manage Redis at scale. You can use it for a variety of workloads, including caching, session stores, real-time analytics, gaming leaderboards, and messaging. We’re always looking for ways to increase price/performance for our customers. Some examples include releasing enhanced I/O handling, which increased the throughput per node by up to 83%, and support for M6g and R6g Graviton2-based nodes, which enable up to 57% price/performance improvements.

Today, we announced the availability of data tiering for the Graviton2-based R6gd node types in ElastiCache for Redis. When using R6gd nodes, ElastiCache automatically and transparently tiers data between DRAM and locally attached NVMe solid state drives (SSDs). SSDs provide slightly higher latencies for Redis workloads than memory, but also cost significantly less. When using clusters with data tiering, you can save over 60% per GB of capacity while having minimal performance impact on applications.

With the largest data tiering node size (cache.r6gd.16xlarge), you can now store up to 1 petabyte in a single 500-node cluster (500 TB when using one read replica). Data tiering is compatible with all Redis commands and data structures supported in ElastiCache, and no client-side changes are needed to use it. In this post, we describe how to use R6gd instances with data tiering in ElastiCache for Redis to scale capacity in a cost-optimal way.

How data tiering works

On a cluster with data tiering, ElastiCache monitors the last access time of every item it stores. When available memory (DRAM) is fully consumed, ElastiCache uses a least-recently used (LRU) algorithm to automatically move infrequently accessed items from memory to SSD. When data on SSD is subsequently accessed, ElastiCache automatically and asynchronously moves it back to memory before processing the request. If you have a workload that accesses only a subset of its data regularly, data tiering is a good option to scale your capacity cost-effectively.
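To make the movement of items between tiers concrete, here is a toy Python model of the behavior described above. It is only a conceptual sketch, not ElastiCache's tiering engine: the TieredStore class, its method names, and the item-count capacity (instead of a byte budget) are assumptions made for illustration, and the promotion here is synchronous for simplicity.

```python
from collections import OrderedDict
from typing import Optional


class TieredStore:
    """Toy two-tier store: a bounded 'memory' tier backed by an 'ssd' tier."""

    def __init__(self, memory_capacity: int) -> None:
        # Hypothetical capacity, counted in items rather than bytes.
        self.memory_capacity = memory_capacity
        self.memory = OrderedDict()  # ordered from least to most recently used
        self.ssd = {}

    def _demote_if_full(self) -> None:
        # Move least recently used items to SSD until memory fits again.
        while len(self.memory) > self.memory_capacity:
            key, value = self.memory.popitem(last=False)
            self.ssd[key] = value

    def set(self, key: str, value: bytes) -> None:
        self.ssd.pop(key, None)       # a new write supersedes any SSD copy
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._demote_if_full()

    def get(self, key: str) -> Optional[bytes]:
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.ssd:
            # Promote the item back to memory before serving the request.
            value = self.ssd.pop(key)
            self.memory[key] = value
            self._demote_if_full()
            return value
        return None


store = TieredStore(memory_capacity=2)
for name in ("a", "b", "c"):
    store.set(name, name.encode())
print(list(store.memory), list(store.ssd))  # ['b', 'c'] ['a']
store.get("a")                              # promotes 'a', demotes 'b'
print(list(store.memory), list(store.ssd))  # ['c', 'a'] ['b']
```

In this model, a workload that keeps touching the same small set of keys never leaves the memory tier, which is why a workload with a small, regularly accessed subset suits data tiering.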

ElastiCache for Redis stores data on NVMe SSDs using a purpose-built tiering engine, which is fine-tuned for high throughput and low latency. Security and data integrity were key areas of focus in the design of the tiering engine. Like all Graviton2-based hardware, ElastiCache R6gd nodes offer always-on 256-bit encrypted DRAM. Additionally, all items stored on NVMe SSDs are encrypted by default (even for clusters that didn’t configure encryption of data at rest) using an XTS-AES-256 block cipher implemented in a hardware module on the node. We perform data integrity validation using a crc32c checksum on each item read from NVMe SSDs.
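As an illustration of checksum-based integrity validation, the following Python sketch computes CRC-32C (the Castagnoli variant of CRC-32) and verifies it on read. The seal and unseal helpers and the record layout (value followed by a 4-byte little-endian checksum) are hypothetical; the post does not describe how ElastiCache lays out items on SSD.

```python
def _build_crc32c_table():
    """Precompute the 256-entry lookup table for CRC-32C (Castagnoli)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """Return the CRC-32C checksum of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def seal(value: bytes) -> bytes:
    """Append the checksum to the value (hypothetical on-disk layout)."""
    return value + crc32c(value).to_bytes(4, "little")


def unseal(record: bytes) -> bytes:
    """Verify the trailing checksum and return the value, or raise."""
    value, stored = record[:-4], int.from_bytes(record[-4:], "little")
    if crc32c(value) != stored:
        raise ValueError("checksum mismatch: item is corrupted")
    return value


assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
assert unseal(seal(b"hello")) == b"hello"
```

A pure-Python table lookup is slow compared with hardware CRC instructions, so this sketch is only meant to show the validate-on-read pattern.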

The following diagram illustrates the high-level architecture of an ElastiCache for Redis node with data tiering.

What our customers are saying

Rokt, a global leader in ecommerce marketing technology, participated in early access for ElastiCache data tiering. They use ElastiCache to power their low-latency, globally replicated identity graph. As they’re expecting their data capacity needs to accelerate, they were looking for a way to easily and economically accommodate this growth. Here’s what they had to say:

“ElastiCache allows Rokt to handle our dataset’s hyper growth without impacting our clients’ user experience. Given the nature of our workload, ElastiCache data tiering is perfect for us, allowing us to continue our growth at a fraction of the price with virtually unnoticeable performance impact. AWS enables Rokt to focus on product innovation rather than rethink underlying infrastructure, which is paramount as leaders in the ecommerce technology space.” — Corey Bertram, Chief Technology Officer, Rokt.

Get started with data tiering

To start using data tiering, complete the following steps:

1. On the ElastiCache console, choose Redis in the navigation pane.
2. Choose Create. This opens the cluster creation workflow.
3. For engine version, choose the recently launched 6.2 engine.
4. Select a node type in the R6gd family (data tiering isn’t supported on older engine versions or other node families).
5. Choose Save.
6. Enter the remaining required cluster configuration parameters, such as name and subnet group, and choose Create.

After a few minutes, your cluster’s status changes to Available. You can then connect to the cluster using the Redis command line interface or any Redis client. To migrate data from an existing ElastiCache cluster, you can restore a backup into your new R6gd cluster.
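If you prefer to script cluster creation instead of using the console, the steps above can be sketched in Python with the AWS SDK (boto3). This is a minimal sketch under assumptions: the group ID, description, subnet group name, and shard and replica counts are placeholders, and you should confirm the parameter names against the current create_replication_group documentation before relying on them.

```python
def build_replication_group_request(
    group_id,
    node_type="cache.r6gd.2xlarge",
    subnet_group="my-subnet-group",  # placeholder name
):
    """Build keyword arguments for ElastiCache create_replication_group."""
    if ".r6gd." not in node_type:
        # The post states data tiering is only supported on R6gd nodes.
        raise ValueError("data tiering requires an R6gd node type")
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": "Redis cluster with data tiering",
        "Engine": "redis",
        "EngineVersion": "6.2",
        "CacheNodeType": node_type,
        "NumNodeGroups": 1,         # placeholder: one shard
        "ReplicasPerNodeGroup": 1,  # placeholder: one read replica
        "CacheSubnetGroupName": subnet_group,
        "DataTieringEnabled": True,
    }


def create_cluster(request):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party SDK, imported lazily so the builder runs without it

    client = boto3.client("elasticache")
    return client.create_replication_group(**request)


request = build_replication_group_request("my-tiered-cluster")
print(request["CacheNodeType"], request["DataTieringEnabled"])
```

Keeping the request builder separate from the API call makes it easy to review or unit test the configuration before anything is created in your account.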

Performance analysis

ElastiCache data tiering is designed to have minimal performance impact on applications that access a small portion of their data regularly. Data tiering is ideal for large workloads that access up to 20% of their data regularly and for applications that can tolerate additional latency when data on the SSD tier is accessed. For these workloads, the working set (that is, the set of data being accessed regularly) is served fully in memory, with the remainder of the dataset served from SSD.

To measure performance, we used redis-benchmark to generate load. We tested against a single-node cache.r6gd.2xlarge cluster. The test setup used 396 million unique keys, 16-byte key length, 500-byte string values, and 200 client connections, with a 4:1 get/set ratio and no command pipelining. We used five Amazon Elastic Compute Cloud (Amazon EC2) instances in the same Availability Zone to generate the load. We configured the workload so that 10% of the requests were issued against items that were stored on NVMe SSD and ran the benchmark continuously over multiple weeks. We constructed our test this way to provide a representative sample of how ElastiCache performs with data tiering active and when infrequently accessed items are needed again.
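For a sense of scale, the raw size of this test dataset can be estimated from the stated key count and sizes. The Python sketch below counts only key and value bytes; it ignores per-item overhead in Redis, so the actual footprint would be larger.

```python
KEY_COUNT = 396_000_000  # unique keys in the test
KEY_BYTES = 16           # key length
VALUE_BYTES = 500        # string value length


def raw_payload_bytes(key_count, key_bytes, value_bytes):
    """Total bytes of keys plus values, excluding any per-item overhead."""
    return key_count * (key_bytes + value_bytes)


total = raw_payload_bytes(KEY_COUNT, KEY_BYTES, VALUE_BYTES)
print(f"raw payload: {total / 1e9:.1f} GB")  # raw payload: 204.3 GB

# A 4:1 get/set ratio means 4 of every 5 commands are reads.
get_share = 4 / (4 + 1)
print(f"share of GET commands: {get_share:.0%}")  # share of GET commands: 80%
```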

We observed that the cluster processed 240,000 commands every second. The P50 client-side latency was 800 microseconds, average client-side latency was 820 microseconds, and P99 client-side latency was 1.4 milliseconds.
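These numbers can be sanity-checked with Little's law, which says the average number of requests in flight equals throughput multiplied by average latency. The sketch below assumes a closed-loop benchmark in which each connection has one outstanding request at a time (consistent with no pipelining) and that the 200 connections are the total across all load generators; the post does not state the latter explicitly.

```python
def implied_concurrency(throughput_per_s, mean_latency_s):
    """Little's law: average requests in flight = throughput x latency."""
    return throughput_per_s * mean_latency_s


in_flight = implied_concurrency(240_000, 820e-6)
print(round(in_flight))  # 197
```

About 197 requests in flight lines up closely with 200 client connections, so the reported throughput and average latency are consistent with each other under these assumptions.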

Monitor clusters with data tiering using Amazon CloudWatch

With this launch, we’ve updated the Amazon CloudWatch metrics available for ElastiCache to reflect SSD usage on clusters with data tiering. Specifically, we’ve added four new metrics and additional metric dimensions for two of our preexisting metrics.

The new metrics come in two pairs: BytesReadFromDisk and BytesWrittenToDisk, which indicate how much data is being read from and written to the SSD tier, and NumItemsReadFromDisk and NumItemsWrittenToDisk, which indicate the volume of Redis items being read from and written to the SSD tier.

In addition to these four new metrics, we introduced the Tier metric dimension on the metrics CurrItems and BytesUsedForCache. Tier can have two values: Memory and SSD. For example, when you query the CurrItems metric, if you don’t specify any Tier, you retrieve the total number of items in your cluster, just as before. If you specify Tier=Memory or Tier=SSD, you see the total broken down by how many items are in memory vs. SSD, respectively. Note that these new metrics and metric dimensions are only available for clusters with data tiering.
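The following Python sketch shows how the Tier dimension might be added to a CloudWatch GetMetricData query for CurrItems. The helper name, cluster ID, and period are placeholders, and the exact dimension combination (CacheClusterId plus Tier) is an assumption; check which dimensions your metrics are published with before using it.

```python
def curr_items_query(query_id, cluster_id, tier=None, period_s=60):
    """Build one GetMetricData query for CurrItems, optionally for one tier."""
    dimensions = [{"Name": "CacheClusterId", "Value": cluster_id}]
    if tier is not None:
        if tier not in ("Memory", "SSD"):
            raise ValueError("tier must be 'Memory' or 'SSD'")
        # Assumption: Tier is combined with the CacheClusterId dimension.
        dimensions.append({"Name": "Tier", "Value": tier})
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/ElastiCache",
                "MetricName": "CurrItems",
                "Dimensions": dimensions,
            },
            "Period": period_s,
            "Stat": "Average",
        },
    }


cluster = "my-tiered-cluster-001"  # placeholder cluster ID
queries = [
    curr_items_query("total", cluster),                # no Tier: all items
    curr_items_query("in_memory", cluster, "Memory"),  # items in DRAM
    curr_items_query("on_ssd", cluster, "SSD"),        # items on SSD
]
print([q["Id"] for q in queries])  # ['total', 'in_memory', 'on_ssd']
```

The resulting list can be passed as MetricDataQueries to the CloudWatch get_metric_data call, together with a start and end time.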

Here’s a practical example of how to put the new metrics to use. Suppose you observe high client-side latency. You can inspect NumItemsReadFromDisk: if its value is high relative to overall command volume (for example, compared with GetTypeCmds + SetTypeCmds using CloudWatch metric math), this could indicate that the SSD tier is being accessed more frequently than is ideal for data tiering. In that case, you could scale up to a larger R6gd node type or scale out by adding shards so that more DRAM is available to serve your active dataset.
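One way to express this comparison is a CloudWatch metric math expression that divides SSD reads by total get and set commands. The Python sketch below builds such a query set for GetMetricData; the function names, cluster ID, period, and the use of the CacheClusterId dimension are assumptions for illustration, and what counts as a high ratio depends on your workload.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/ElastiCache"


def _sum_query(query_id, metric_name, cluster_id, period_s):
    """Query the per-period sum of one metric, used only as math input."""
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": NAMESPACE,
                "MetricName": metric_name,
                "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
            },
            "Period": period_s,
            "Stat": "Sum",
        },
        "ReturnData": False,
    }


def build_ssd_read_ratio_queries(cluster_id, period_s=300):
    """Three input metrics plus one metric math expression that is returned."""
    return [
        _sum_query("disk_reads", "NumItemsReadFromDisk", cluster_id, period_s),
        _sum_query("gets", "GetTypeCmds", cluster_id, period_s),
        _sum_query("sets", "SetTypeCmds", cluster_id, period_s),
        {
            "Id": "ssd_read_ratio",
            "Expression": "disk_reads / (gets + sets)",
            "Label": "Items read from SSD per command",
            "ReturnData": True,
        },
    ]


def fetch_ssd_read_ratio(cluster_id, hours=1):
    """Call CloudWatch. Requires boto3 and valid credentials."""
    import boto3  # third-party SDK, imported lazily

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_ssd_read_ratio_queries(cluster_id),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    # Only the expression has ReturnData=True, so it is the single result.
    return response["MetricDataResults"][0]["Values"]


queries = build_ssd_read_ratio_queries("my-tiered-cluster-001")
print(queries[-1]["Expression"])  # disk_reads / (gets + sets)
```

A ratio that trends upward over time suggests the working set is outgrowing DRAM, which is the signal to scale up or out.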

Conclusion

In this post, we showed how data tiering for ElastiCache for Redis provides a convenient way to scale your clusters to up to a petabyte of data at a lower cost. It can provide over 60% savings per GB of capacity while having minimal performance impact for workloads that access a subset of their data regularly.

The R6gd node type with data tiering for ElastiCache for Redis is available today in the US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland) Regions. For pricing, see Amazon ElastiCache pricing.

We’re excited to enable you to use data tiering for ElastiCache for Redis to scale your clusters at a lower cost. We’d love to hear your feedback and questions about data tiering, so please let us know on the Amazon ElastiCache discussion forum or in the comments.

About the authors

Joe Travaglini is a product manager on the Amazon ElastiCache team. Prior to joining ElastiCache, Joe spent 5 years as a product manager for Amazon Elastic File System, responsible for EFS’s security and compliance roadmap, and product lead for the launch of the EFS Standard-Infrequent Access storage class. Prior to EFS, Joe was Director of Products at Sqrrl, a cybersecurity analytics startup acquired by AWS in 2018.

Gourav Roy is a Principal Engineer for Amazon ElastiCache. He likes working on problems related to storage engine and distributed systems. Outside of work, he enjoys sport, hiking and spending time with family.

Prabhu Krishnamoorthy is a Senior Manager, Engineering in the AWS in-memory databases team. Prior to Amazon ElastiCache, Prabhu led various engineering teams in the developer tooling and data platform domains. He enjoys solving distributed systems problems, travel, and reading.

AWS Database Blog