Best practices for sizing your Amazon ElastiCache for Redis clusters
Amazon ElastiCache for Redis is a fully managed Redis- and Memcached-compatible service delivering real-time, cost-optimized performance for modern applications. It scales to hundreds of millions of operations per second with microsecond response time, and offers enterprise-grade security and reliability.
You can scale ElastiCache for Redis seamlessly to accommodate changes in your application usage patterns. It can be scaled up or down by changing the instance type used for cache nodes. Additionally, in cluster mode, it can be scaled out or in by changing the number of shards in the cluster.
When creating or modifying an ElastiCache for Redis cluster, you have several sizing-related choices: What instance type should I use? How many shards should I have? Should I prefer larger instances and a few shards, or the other way around?
Answering these questions correctly depends on identifying the type of workload you have, because different workloads utilize different resources. For instance, is your workload memory-bound (where memory is the most dominant resource), CPU-bound, or network-bound? In this post, we provide guidelines and recommendations to find a good balance between price and performance.
Types of workloads
The first step is to determine what type of workload you have:
- Memory-bound workload – The total size of your dataset often dictates the layout of your cluster. ElastiCache for Redis datasets can range from a few megabytes to terabytes of data. A dataset of a few gigabytes can easily fit inside a single ElastiCache instance and can use only one shard (one primary node with zero or more replica nodes), whereas larger datasets might not fit in a single instance due to memory limits and must be distributed across multiple shards. A large dataset that is not accessed with high concurrency can be considered memory-bound.
- CPU-bound workload – Workloads that require a high degree of concurrency, and whose scaling is driven by high CPU utilization, are CPU-bound. A single core can typically serve on the order of 100,000 requests per second (RPS) for simple GET/SET commands. However, the type of command and additional factors (such as TLS) can influence RPS capacity. For instance, an HGETALL command that iterates over and returns 100 sub-objects from a hash can be 50–100 times more CPU-expensive than a GET command that returns the same number of bytes. Generally speaking, commands that iterate over multiple keys or sub-keys are slower. In ElastiCache, you can see which commands are fast and which are slow using the XXXCmdLatency metrics in Amazon CloudWatch. You can also find the computational complexity (O(1), O(N), and so on) of each command in the official Redis command reference. Additionally, you can obtain visibility into the slowest commands used by your application using the SLOWLOG command.
- Network-bound workload – Storing and retrieving large objects in Redis often makes your responses large in terms of bytes on the network, which may indicate a network-bound workload. This pattern may cause a breach of instance network bandwidth limits if you’re using too few shards or instance types that are too small. This can be observed using the NetworkBandwidthInAllowanceExceeded and NetworkBandwidthOutAllowanceExceeded CloudWatch metrics. For network-bound workloads, you’ll generally have thousands of RPS with responses sized 10 KB or more.
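To make the three categories concrete, the following Python sketch estimates which resource a single node would exhaust first. It is a rough, hypothetical heuristic rather than an ElastiCache feature: the function name and parameters are made up for illustration, the CPU estimate assumes the roughly 100,000 RPS-per-core figure for simple GET/SET commands mentioned above, and it ignores command cost, TLS, replication, and protocol overhead.

```python
def dominant_resource(
    dataset_gb: float,
    node_memory_gb: float,
    rps: float,
    avg_response_bytes: float,
    node_bandwidth_gbps: float,
    node_rps_capacity: float = 100_000,  # assumed simple-command budget
) -> str:
    """Return "memory", "cpu" or "network": whichever has the highest
    estimated utilization on a single node. Hypothetical heuristic only."""
    utilization = {
        # Fraction of the node's memory the dataset would occupy.
        "memory": dataset_gb / node_memory_gb,
        # Fraction of the assumed RPS budget for simple commands.
        "cpu": rps / node_rps_capacity,
        # Response bits per second versus the node's bandwidth limit.
        "network": rps * avg_response_bytes * 8 / (node_bandwidth_gbps * 1e9),
    }
    return max(utilization, key=utilization.get)


# Hypothetical node: 25 KB responses at 3,000 RPS with a 0.75 Gbps limit.
print(dominant_resource(1, 6, 3_000, 25_000, 0.75))  # network
```

In a real cluster you would read these inputs from CloudWatch (memory, CPU, and network metrics) rather than estimate them, but the comparison of utilization ratios is the same idea.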
ElastiCache for Redis instance types
ElastiCache for Redis provides several instance families for you to choose from. These include general purpose instances (M-family); memory optimized instances (R-family), which have a higher ratio of memory to cores; data tiering instances (R6gd family), which tier their data between memory and SSD; network optimized instances (C7gn), which have higher network bandwidth limits; and burstable instances (T-family), which are usually best suited for non-production use cases.
Each instance family has several different instance sizes. Each instance size has a different number of cores, total RAM, and network bandwidth, and is therefore best suited to different types of workloads.
Can ElastiCache for Redis utilize more than one vCPU?
Yes! The ElastiCache team is constantly working to improve performance on multi-CPU instances. With built-in features such as Enhanced I/O (Redis 5.0.3+) and Enhanced I/O Multiplexing (Redis 7+), customers using instances with additional CPUs can expect additional RPS capacity and lower latency. These features provide performance boosts and are available in instances that have 4 vCPUs or more. For more information on feature compatibility, refer to Supported node types.
Choosing the right instance size
When creating or scaling clusters for a particular workload, you have the option to choose either larger instances with fewer shards (scale-up) or smaller instances with more shards (scale-out). We recommend choosing the instance size that best fits your workload and then selecting the number of shards depending on the total required capacity.
The chosen instance size should account for several factors, which we discuss in this section.
Expected concurrency for a single key or hash slot
ElastiCache for Redis scales out by spreading the keyspace across multiple shards. Each key consistently maps to one of 16,384 hash slots, and each hash slot is owned by a single shard (a shard can own many or even all of the hash slots). This is why a single key is always served by a single shard, and therefore the selected instance size must support the maximum expected RPS for a single key or hash slot. This requires choosing an instance type that has the appropriate number of cores as well as sufficient network bandwidth capacity. For instance, suppose we have a key with a 4 KB value that, at peak times, is read at a rate of 10,000 requests per second. Any node that might own this key would then need roughly 320 Mbit/s (4 KB × 8 bits × 10,000 requests per second) of extra bandwidth above its steady-state usage.
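The arithmetic behind this example is value size multiplied by read rate. Here is a minimal Python sketch of it (the function name is hypothetical); it uses decimal units (1 KB = 1,000 bytes) and counts payload bytes only, so real traffic will be somewhat higher once protocol overhead is included.

```python
def hot_key_bandwidth_mbps(value_bytes: int, reads_per_second: int) -> float:
    """Extra egress bandwidth (Mbit/s) needed to serve a single hot key.

    Payload only: RESP framing, TCP/IP headers and TLS records add more.
    """
    bits_per_second = value_bytes * 8 * reads_per_second
    return bits_per_second / 1_000_000


# A 4 KB value read 10,000 times per second.
print(hot_key_bandwidth_mbps(4_000, 10_000))  # 320.0
```

Because all of this traffic lands on the one shard that owns the key's hash slot, adding shards does not reduce it; only a larger instance raises that node's ceiling.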
Larger instance types have better support for usage spikes because in many applications, such hotspots can happen on different keys at different times.
For example, suppose our application requires a total of 1.2 Gbit/s of bandwidth at steady state but occasionally sees a spike due to a hot key or hash slot. We could use either two shards based on m7g.large instances or, at the same cost, a single shard based on an m7g.xlarge instance. In both cases, the total available bandwidth is roughly 1.875 Gbit/s. However, with the two large instances, we use 600 Mbit/s at steady state on each instance, leaving only 337 Mbit/s of extra capacity for single-key usage spikes. On the xlarge instance, we use 1.2 Gbit/s, leaving 676 Mbit/s of extra capacity.
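The comparison can be reproduced with a short Python sketch. The per-node baseline figures below are assumptions (about 937 Mbit/s for m7g.large and 1,876 Mbit/s for m7g.xlarge, which match the numbers in this example); check current instance specifications before relying on them. The sketch also assumes steady-state traffic is spread evenly across shards, and the names are hypothetical.

```python
# Assumed baseline bandwidth per node in Mbit/s (verify against current specs).
BASELINE_MBPS = {"m7g.large": 937, "m7g.xlarge": 1876}


def spike_headroom_mbps(node_type: str, shards: int, steady_total_mbps: float) -> float:
    """Spare bandwidth on one node for hot-key spikes, assuming the
    steady-state traffic is divided evenly across all shards."""
    per_node_steady = steady_total_mbps / shards
    return BASELINE_MBPS[node_type] - per_node_steady


print(spike_headroom_mbps("m7g.large", 2, 1200))   # 337.0
print(spike_headroom_mbps("m7g.xlarge", 1, 1200))  # 676.0
```

Total capacity is nearly identical in both layouts, but a spike on one key can only use the headroom of the single node that serves it, which is why the larger instance absorbs spikes better.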
Number of concurrent or new connections
In cluster mode, clients must maintain connections to all shards (and possibly all nodes) to be able to read arbitrary keys. This means that any node in the cluster must be able to support the total expected number of connections as well as the rate of new connections. Larger instance types are better suited to handle more connections for several reasons:
- Instance type networking capacity and limits are higher for larger instances than smaller instances.
- Additional vCPUs (on instances with 4 or more vCPUs) are used by ElastiCache for Redis to offload and parallelize both network I/O operations and TLS session establishment.
Encryption in transit (TLS) enabled clusters
When using TLS, we recommend utilizing the Enhanced I/O features that are automatically included in ElastiCache for Redis version 6.2 and above on m7g.xlarge, r7g.xlarge, or larger instance types (4 or more vCPUs). These enhancements improve throughput and reduce client connection establishment time by offloading encryption to other vCPUs. Workloads that have many connections will see improved latency because long TLS handshakes won’t block the main thread from processing other requests.
Auto Scaling ElastiCache for Redis clusters
After you have identified the type of workload by its most dominant resource and have chosen the most appropriate instance type, it can be straightforward to create an effective auto scaling policy that can automatically detect and respond to usage changes and add additional shards. For instance, a CPU-bound workload can set policies based on the EngineCPUUtilization CloudWatch metric, a memory-bound workload can use the BytesUsedForCache metric, and a network-bound workload can use the NetworkBytesOut/In metric. With Auto Scaling policies, you can set a target value for one of these metrics according to the capacity of the instance type you have selected. For additional details, see Auto Scaling ElastiCache for Redis clusters.
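As an illustration, the following Python sketch builds the parameters for an Application Auto Scaling target tracking policy that adds or removes shards. It is a sketch under assumptions: the replication group ID and policy naming are hypothetical, and it uses the predefined metric types that Application Auto Scaling offers for ElastiCache (ElastiCachePrimaryEngineCPUUtilization, which tracks EngineCPUUtilization on primaries, and ElastiCacheDatabaseMemoryUsageCountedForEvictPercentage, a memory percentage rather than raw BytesUsedForCache bytes). A network-based policy is not shown. Verify metric names and limits against the Application Auto Scaling documentation.

```python
from typing import Any

# Predefined target tracking metric types for ElastiCache (verify against
# the current Application Auto Scaling documentation).
PREDEFINED_METRIC = {
    "cpu": "ElastiCachePrimaryEngineCPUUtilization",
    "memory": "ElastiCacheDatabaseMemoryUsageCountedForEvictPercentage",
}


def shard_scaling_policy(replication_group_id: str, workload: str,
                         target_value: float) -> dict[str, Any]:
    """Keyword arguments for application-autoscaling put_scaling_policy that
    scale the number of shards (node groups) of a replication group."""
    return {
        "PolicyName": f"{replication_group_id}-{workload}-target",  # hypothetical name
        "ServiceNamespace": "elasticache",
        "ResourceId": f"replication-group/{replication_group_id}",
        "ScalableDimension": "elasticache:replication-group:NodeGroups",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": PREDEFINED_METRIC[workload],
            },
        },
    }


params = shard_scaling_policy("my-redis-cluster", "cpu", 60.0)
# boto3.client("application-autoscaling").put_scaling_policy(**params)
```

Before applying the policy, the replication group must be registered as a scalable target (for example, with register_scalable_target on the same boto3 client, using MinCapacity and MaxCapacity to bound the shard count). The target value should reflect the capacity of the instance type you selected.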
With all these considerations in mind, let’s review the guidance for sizing workloads based on their type.
Memory-bound workloads
Naturally, in-memory databases are often memory-bound, and a memory-optimized instance type is a default choice.
The following table provides basic sizing guidelines for such workloads. It relies on R-family instances with instance size of xlarge and 2xlarge that are large enough to fit the requirements of most production workloads in terms of memory and number of connections.
| Total Data Size | Instance Type | Shards | Notes |
|---|---|---|---|
| 0–416 GB | r7g.xlarge | 1–16 | Auto Scaling policy based on BytesUsedForCache |
| 416 GB+ | r7g.2xlarge | 8+ | Auto Scaling policy based on BytesUsedForCache |
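The table can be read as a lookup followed by a division. The Python sketch below encodes it; the per-node capacities (26 GB and 52 GB) are inferred from the table's own boundaries (416 GB over 16 shards and 416 GB over 8 shards), not from a specification, and the function name is hypothetical. Usable memory is lower in practice once reserved memory and buffers are accounted for, so treat the result as a lower bound on shards.

```python
import math

# Per-node capacity implied by the table above, in GB.
NODE_CAPACITY_GB = {"r7g.xlarge": 26.0, "r7g.2xlarge": 52.0}


def memory_bound_layout(data_gb: float) -> tuple[str, int]:
    """Pick the instance type from the table and the shards needed for data_gb."""
    node_type = "r7g.xlarge" if data_gb <= 416 else "r7g.2xlarge"
    shards = max(1, math.ceil(data_gb / NODE_CAPACITY_GB[node_type]))
    return node_type, shards


print(memory_bound_layout(100))   # ('r7g.xlarge', 4)
print(memory_bound_layout(1000))  # ('r7g.2xlarge', 20)
```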
Additionally, workloads with large datasets that don’t require a high level of concurrency may optimize cost and performance further by using data tiering-enabled instances (r6gd family). These instances tier their data between memory and local solid state drive (SSD) storage and can achieve over 60% cost savings compared to r6g nodes. Data tiering clusters are best suited for workloads where a small subset of keys is commonly accessed and a larger amount of data is accessed infrequently.
Network-bound workloads
These workloads are characterized by large requests and responses in terms of network bytes sent and received. Such workloads may often exceed the instance bandwidth limit before reaching memory or CPU limits. Each instance type has a defined bandwidth limit (see Memory optimized instances for more information). When the limit is hit, throttling may occur.
To demonstrate, an application running GET commands that return 25 KB values can expect a maximum RPS of 3,750 on an m6g.large instance type due to its 0.75 Gbps bandwidth limit. If the application sends a higher rate of commands, throttling will take place and packet delays or loss will eventually occur, leading to TCP retransmits, timeouts, and possibly interference with replication and cluster bus communication.
This is why, in network-bound workloads, it’s especially important to choose an instance size that can support a spike in requests for a particular key (or hash slot).
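The 3,750 RPS ceiling in the example above is the bandwidth limit divided by the bits in each response. A minimal Python sketch, using decimal units and ignoring protocol overhead (the function name is hypothetical):

```python
def max_rps_for_bandwidth(bandwidth_gbps: float, response_bytes: int) -> float:
    """Upper bound on responses per second before hitting the bandwidth limit.

    Payload only: framing and TCP/TLS overhead lower the real ceiling.
    """
    bits_per_response = response_bytes * 8
    return bandwidth_gbps * 1e9 / bits_per_response


# 25 KB GET responses on a node with a 0.75 Gbps limit.
print(max_rps_for_bandwidth(0.75, 25_000))  # 3750.0
```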
The following table provides guidelines for network-bound workloads. We use M-family instances, which have a lower price per Gbps of network bandwidth than R-family instances.
| Expected Bandwidth | Instance Type | Shards | Notes |
|---|---|---|---|
| 0–3.7 Gbps | m7g.xlarge | 1–2 | Auto Scaling policy based on NetworkBytesOut |
| 3.7–120 Gbps | m7g.2xlarge | 1–32 | Auto Scaling policy based on NetworkBytesOut |
| 120+ Gbps | m7g.4xlarge | 16+ | Auto Scaling policy based on NetworkBytesOut |
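This table can also be encoded as a tier lookup plus a shard calculation. In the Python sketch below, the per-node bandwidth values (about 1.875, 3.75, and 7.5 Gbps) are inferred from the table's own boundaries (3.7 Gbps over 2 shards, 120 Gbps over 32 shards, and 120 Gbps over 16 shards) rather than taken from a specification, and the function name is hypothetical. It sizes for steady-state bandwidth only, so leave extra room for hot-key spikes as discussed earlier.

```python
import math

# (upper bound of tier in Gbps, instance type, assumed per-node Gbps)
TIERS = [
    (3.7, "m7g.xlarge", 1.875),
    (120.0, "m7g.2xlarge", 3.75),
    (math.inf, "m7g.4xlarge", 7.5),
]


def network_bound_layout(expected_gbps: float) -> tuple[str, int]:
    """Pick the instance type from the table and the shards needed."""
    for upper_gbps, node_type, per_node_gbps in TIERS:
        if expected_gbps <= upper_gbps:
            return node_type, max(1, math.ceil(expected_gbps / per_node_gbps))
    raise ValueError("unreachable: the last tier is unbounded")


print(network_bound_layout(10))   # ('m7g.2xlarge', 3)
print(network_bound_layout(200))  # ('m7g.4xlarge', 27)
```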
Customers running workloads where network bandwidth is the dominant resource may choose C7gn instances, which provide higher network performance but less memory than M and R instances:
| Expected Bandwidth | Instance Type | Shards | Notes |
|---|---|---|---|
| 12.5+ Gbps | c7gn.xlarge | 1+ | Auto Scaling policy based on NetworkBytesOut |
CPU-bound workloads
We categorize CPU-bound workloads into two types:
- High RPS workloads – These workloads perform many fast commands (such as GET and SET).
- High computational complexity workloads – These workloads perform slow commands (such as EVAL, HGETALL, LRANGE, and so on).
High RPS workloads, consisting of mostly fast commands, usually require considerable CPU resources to read and write from the network as well as to perform TLS encryption and decryption (if using encryption in transit). These workloads can benefit from utilizing the built-in Enhanced I/O features, where network and TLS work is offloaded to additional threads. For a full list of compatible instance types and engine versions, see Supported node types.
| Instance Type | Shards | Notes |
|---|---|---|
| m7g.xlarge | 1–32 | Auto Scaling policy based on EngineCPUUtilization |
| m7g.2xlarge | 16+ | Auto Scaling policy based on EngineCPUUtilization |
High computational complexity workloads place the most strain on the main thread. As a result, these workloads gain less from additional cores, and scale-out is usually the best way to gain additional capacity. These workloads can be identified using the SetTypeCmdLatency metrics, where the average latency value is above 20 microseconds.
| Instance Type | Shards | Notes |
|---|---|---|
| m7g.large | 1–32 | Auto Scaling policy based on EngineCPUUtilization |
| m7g.xlarge | 16+ | Auto Scaling policy based on EngineCPUUtilization |
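The 20 microsecond threshold mentioned above can also be checked directly on a node. The Python sketch below swaps in a different data source from the CloudWatch latency metrics: it parses the raw text of the Redis INFO commandstats section (as printed by redis-cli) and returns the commands whose usec_per_call exceeds the threshold. Keep in mind that usec_per_call is a cumulative average since the statistics were last reset, not a windowed average. The function name and the sample input are hypothetical.

```python
def slow_commands(commandstats: str, threshold_usec: float = 20.0) -> dict[str, float]:
    """Map command name -> usec_per_call for commands above the threshold."""
    slow: dict[str, float] = {}
    for raw in commandstats.splitlines():
        line = raw.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line.partition(":")
        # Fields look like calls=...,usec=...,usec_per_call=...,...
        stats = dict(field.split("=", 1) for field in fields.split(","))
        per_call = float(stats["usec_per_call"])
        if per_call > threshold_usec:
            slow[name.removeprefix("cmdstat_")] = per_call
    return slow


# Made-up sample in the INFO commandstats format.
sample = """# Commandstats
cmdstat_get:calls=1000,usec=1500,usec_per_call=1.50,rejected_calls=0,failed_calls=0
cmdstat_hgetall:calls=200,usec=9000,usec_per_call=45.00,rejected_calls=0,failed_calls=0
"""
print(slow_commands(sample))  # {'hgetall': 45.0}
```

If most of your traffic falls into the commands this returns, the workload is likely in the high computational complexity category, and the scale-out guidance in the preceding table applies.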
Conclusion
When creating or scaling ElastiCache for Redis clusters, it’s important to understand workload characteristics, the different instance types, and the ElastiCache for Redis scaling semantics to achieve a good price-performance balance. We recommend using the guidelines in this post to simplify these choices. Data tiering is another powerful tool that should be considered where appropriate, as is enabling Auto Scaling to deal with changing workload needs.
About the authors
Elad Bernstein is a Senior Software Engineer on the AWS ElastiCache team. He has 20 years of software development and software architecture experience ranging from low level kernel development, reverse engineering, OS internals and high-performance networking to distributed cloud systems. He is passionate about high performance code and highly scalable systems. In his spare time Elad enjoys cooking and traveling with his family.
Karthik Konaparthi is a Senior Product Manager on the Amazon In-Memory Databases team and is based in Seattle, WA. He is passionate about all things data and spends his time working with customers to understand their requirements and building exceptional products. In his spare time, he enjoys traveling to new places and spending time with his family.