How do I choose the correct instance type and size for my ElastiCache for Redis cluster?

Last updated: 2022-06-27

How do I select an Amazon ElastiCache for Redis node size and cluster configuration for my workloads to get the best performance?

Short description

When selecting a node size and cluster configuration for your ElastiCache cluster, keep the following in mind:

  • How much CPU processing power do I need?
  • How much data do I need to store?
  • Will the selected node size be able to handle the expected network traffic?
  • What cluster mode should I choose?

Resolution

How much CPU processing power do I need?

Review node options

Review the available node options. Because Redis is a single-threaded process, performance doesn't increase when you add CPUs to your node. The power of the single CPU core on the node determines the performance. Each node type has its own baseline performance. Current generation node types provide more memory and computational power at a lower cost compared to previous generations. For example, M5 and R5 instances offer better performance at a lower cost compared to M4 and R4.

Benchmark your nodes

It's a best practice to benchmark your nodes to get an estimate of how well the instance performs on your expected workload. To do this, use the redis-benchmark utility. For information on baseline recommendations, see Five workload characteristics to consider when right sizing Amazon ElastiCache Redis clusters.

When performing the benchmark tests, mimic the workload of your application traffic, including number of requests, key distribution, and item size. Monitor the Amazon CloudWatch metrics for memory usage, processor utilization, cache hits, and cache misses. You might notice that your cluster doesn't have the hit rate that you want or that keys are being evicted too often. If this occurs, choose a different node size with larger CPU and memory specifications.
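As a rough illustration of shaping a benchmark run to your workload, the following Python sketch builds a redis-benchmark command line. The helper name, the default values, and the endpoint are hypothetical placeholders, not recommendations. Note that the -r option draws keys uniformly at random, so it only approximates a real key distribution.

```python
import shlex


def build_benchmark_command(host, port=6379, clients=50, requests=100_000,
                            item_size_bytes=1024, keyspace=1_000_000,
                            commands=("get", "set")):
    """Return a redis-benchmark argument list shaped like the expected workload.

    All default values are illustrative assumptions; replace them with
    numbers measured from your own application traffic.
    """
    return [
        "redis-benchmark",
        "-h", host,
        "-p", str(port),
        "-c", str(clients),          # number of parallel client connections
        "-n", str(requests),         # total number of requests to send
        "-d", str(item_size_bytes),  # value size in bytes for SET/GET
        "-r", str(keyspace),         # use random keys from a keyspace of this size
        "-t", ",".join(commands),    # run only the listed tests
        "-q",                        # quiet mode: print only the summary per test
    ]


# Hypothetical endpoint, used only to show the resulting command line.
cmd = build_benchmark_command("my-cluster.example.cache.amazonaws.com")
print(shlex.join(cmd))
```

To run the benchmark, pass the list to subprocess.run from a client instance that can reach the cluster, and watch the CloudWatch metrics while it runs.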

Because Redis is single-threaded, multiply the reported CPU usage by the number of CPU cores to estimate the utilization of the core that Redis runs on. For example, on a four-core node that reports 20 percent CPU usage, the one core that Redis runs on might be at 80 percent utilization.
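The adjustment described above can be sketched in Python as follows. The function name is hypothetical, and the result is an upper-bound estimate that assumes Redis accounts for all of the reported CPU usage.

```python
def redis_core_utilization(reported_cpu_percent, core_count):
    """Estimate the utilization of the single core that Redis runs on.

    Assumes that all reported CPU usage comes from the Redis process,
    so the result is a worst-case estimate capped at 100 percent.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return min(float(reported_cpu_percent) * core_count, 100.0)


# The example from the text: a four-core node reporting 20 percent usage.
print(redis_core_utilization(20, 4))
```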

How much data do I need to store?

To estimate storage needs, multiply the average item size by the number of items that are in the cache at the same time. To estimate item size, serialize your cache items and count the characters. To estimate the storage that each shard needs, divide the total by the number of shards in your cluster.
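A minimal Python sketch of this estimate follows. It assumes that items are serialized as UTF-8 JSON, which might differ from the serialization your application uses, and the function names are hypothetical. The result doesn't include the per-key overhead that Redis adds, so treat it as a lower bound.

```python
import json


def estimate_item_size_bytes(sample_items):
    """Average serialized size of the sample items, in bytes.

    Assumes UTF-8 encoded JSON; swap in your application's own serializer.
    """
    sizes = [len(json.dumps(item).encode("utf-8")) for item in sample_items]
    return sum(sizes) / len(sizes)


def estimate_data_per_shard_bytes(avg_item_bytes, item_count, shard_count=1):
    """Total cached data divided evenly across the shards in the cluster."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return avg_item_bytes * item_count / shard_count


# Illustrative sample items; use real items from your application instead.
sample = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
avg = estimate_item_size_bytes(sample)
print(estimate_data_per_shard_bytes(avg, item_count=1_000_000, shard_count=4))
```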

Keep in mind your Redis version's reserved-memory or reserved-memory-percent group parameter. This parameter reserves memory for non-data use such as system backups and general system stability. By default, this value is set to 25% of your max memory. Factor this in when determining what size is enough for your data.

For example, cache.r4.large has 12.3 GB of available memory for use. Because your reserved memory is set to 25%, the node stores up to 9.2 GB (12.3 x 75%) of data. The rest is reserved for other Redis functions and system stability.
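The same arithmetic can be expressed as a short Python sketch. The function names are hypothetical, and the 25 percent default mirrors the default value described above.

```python
def usable_memory_gb(node_memory_gb, reserved_memory_percent=25):
    """Memory left for data after subtracting the reserved percentage."""
    if not 0 <= reserved_memory_percent <= 100:
        raise ValueError("reserved_memory_percent must be between 0 and 100")
    return node_memory_gb * (100 - reserved_memory_percent) / 100


def data_fits(data_gb, node_memory_gb, reserved_memory_percent=25):
    """Return True if the data set fits in the node's usable memory."""
    return data_gb <= usable_memory_gb(node_memory_gb, reserved_memory_percent)


# The cache.r4.large example from the text: 12.3 GB with 25 percent reserved.
print(round(usable_memory_gb(12.3), 1))
print(data_fits(10.0, 12.3))
```

In this sketch, a 10 GB data set doesn't fit on the node even though it is smaller than the 12.3 GB total, because only about 9.2 GB is available for data.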

Will the node type that I selected be able to handle the expected network traffic?

Make sure that the node type that you select can handle the amount of data being pushed without reaching the network maximum.

To determine the maximum throughput on an ElastiCache node, run a benchmark on a similar Amazon Elastic Compute Cloud (Amazon EC2) node.

For example, for a cache.m4.large node, run your tests on an EC2 m4.large instance. This test finds the theoretical maximum throughput for your node. It also provides an estimate of how much bandwidth you can push between your client and your Redis node. A common sign that you're maxing out your networking is a flat line on the bandwidth graph, which indicates that you've reached the limit of your hardware. For instructions on using iperf3 to measure network performance between instances, see How do I benchmark network throughput between Amazon EC2 Linux instances in the same Amazon VPC?

Because ElastiCache nodes are similar to their EC2 counterparts, look at the Network Performance listed for each instance type. If you're maxing out the network throughput for your node, you might need to select the next node size up for better network performance. For example, an m4.large instance has moderate networking. However, if your workload is maxing out your networking and you are receiving poor performance, identify a new node type that has better networking performance. Better network performance might be listed as High or 10 Gigabit in the Network Performance column on the Pricing page.

Keep in mind that network performance also applies to the client. If your client is a t2.micro with low to moderate network performance and you're pushing data to an m4.10xlarge, then you max out the network throughput on your client first.
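To illustrate the client-side bottleneck, the following Python sketch compares a rough bandwidth requirement against the slower of the two endpoints. The function names and all throughput figures are hypothetical; measure real values with a tool such as iperf3. The estimate counts payload bytes only and ignores protocol overhead.

```python
def required_gbps(requests_per_second, avg_item_bytes):
    """Approximate payload bandwidth in gigabits per second."""
    return requests_per_second * avg_item_bytes * 8 / 1e9


def effective_gbps(client_gbps, node_gbps):
    """The slower endpoint limits the throughput of the whole path."""
    return min(client_gbps, node_gbps)


# Hypothetical figures: a small client instance and a 10 Gigabit node.
needed = required_gbps(requests_per_second=50_000, avg_item_bytes=2_000)
available = effective_gbps(client_gbps=0.1, node_gbps=10.0)
print(f"needed={needed:.2f} Gbps, available={available:.2f} Gbps")
print("network bound" if needed > available else "within limits")
```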

What cluster mode should I choose?

Choosing the right mode for your application workload depends on the traffic:

If the primary load on your cluster consists of applications reading data, you can choose a Redis (cluster mode disabled) cluster. Then, scale your cluster to support more read operations by adding read replicas. There is a maximum of 5 read replicas. Keep in mind that cluster mode disabled clusters have only one shard. So, the node type must be large enough to accommodate all of the cluster's data, plus necessary overhead.

If the load on your cluster is write-heavy and your write workload exceeds what one node can offer, then use a Redis (cluster mode enabled) cluster. Cluster mode enabled clusters spread your keys out among multiple primary nodes. So, the write performance is split between multiple nodes instead of a single node.

If you have a large workload that needs extreme performance, use a cluster mode enabled cluster with multiple shards and multiple read replicas for each shard. For example, you can create a cluster with 15 shards and 5 replicas per primary node.
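When you plan a configuration like this, it helps to count the total number of nodes, because each shard has one primary node plus its replicas. The following Python sketch is a hypothetical helper that uses the limit of 5 replicas mentioned in this article.

```python
MAX_REPLICAS_PER_SHARD = 5  # replica limit stated in this article


def total_nodes(shard_count, replicas_per_shard):
    """Count primary and replica nodes in a cluster configuration."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    if not 0 <= replicas_per_shard <= MAX_REPLICAS_PER_SHARD:
        raise ValueError("replicas_per_shard must be between 0 and 5")
    return shard_count * (1 + replicas_per_shard)


# The example from the text: 15 shards with 5 replicas per primary node.
print(total_nodes(15, 5))
# A cluster mode disabled cluster has one shard.
print(total_nodes(1, 5))
```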

