Why am I seeing high or increasing memory usage in my ElastiCache cluster?

Last updated: 2022-09-15

I'm seeing high or increasing memory usage in my Amazon ElastiCache cluster. How is memory usage determined on ElastiCache cluster nodes?

Resolution

To determine overall memory usage on your cluster and its nodes, review these Redis metrics. These metrics are published in Amazon CloudWatch for each node in a cluster:

  • BytesUsedForCache: The total number of bytes allocated by Redis for dataset, buffers, and so on. This metric is derived from a Redis node's INFO command output. Use this metric to determine the memory utilization of your cluster.
  • FreeableMemory: This host-level metric shows the amount of free memory available on the host. When memory usage increases because of cached data or overhead, FreeableMemory decreases. A low FreeableMemory value means that little free memory is left on the host, and the host might start swapping if FreeableMemory drops too low.
  • DatabaseMemoryUsagePercentage: This metric is derived from the Redis INFO command output (used_memory divided by maxmemory). It's the percentage of the memory available to the cluster node that's in use. When this metric reaches 100%, Redis starts evicting keys according to its maxmemory-policy eviction policy.
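To see how these values relate, the following Python sketch parses the text output of the Redis INFO memory command and computes a DatabaseMemoryUsagePercentage-style value as used_memory divided by maxmemory. It assumes that BytesUsedForCache corresponds to the used_memory field. The function names and sample values are hypothetical, and the sketch doesn't call ElastiCache or CloudWatch.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse Redis INFO output ("field:value" lines) into a dict of strings."""
    fields: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()  # INFO lines end with \r\n
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def database_memory_usage_percentage(info: dict[str, str]) -> float:
    """Return used_memory as a percentage of maxmemory (hypothetical helper)."""
    used = int(info["used_memory"])
    maxmemory = int(info["maxmemory"])
    if maxmemory == 0:
        # In Redis, maxmemory 0 means "no limit", so a percentage is undefined.
        raise ValueError("maxmemory is 0 (no limit)")
    return 100.0 * used / maxmemory


if __name__ == "__main__":
    # Hypothetical sample of INFO memory output, trimmed to two fields.
    sample = "# Memory\r\nused_memory:750000000\r\nmaxmemory:1000000000\r\n"
    info = parse_info(sample)
    print(info["used_memory"])                     # value behind BytesUsedForCache
    print(database_memory_usage_percentage(info))  # 75.0
```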

Keep in mind that by default ElastiCache for Redis reserves 25% of maxmemory for non-data usage, such as failover and backup. If you don't reserve enough memory for non-data usage, the chance of swapping increases. For more information, see Managing reserved memory.
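As a worked example of the reserved memory arithmetic, the following sketch computes how much of maxmemory is left for data when a given percentage is reserved. It's a simplification that assumes the reserved share is taken directly out of maxmemory; the function name and sample values are hypothetical.

```python
def memory_for_data(maxmemory_bytes: int, reserved_memory_percent: int = 25) -> int:
    """Bytes left for the dataset after reserving a share of maxmemory (hypothetical helper).

    Assumes the reserved share is simply subtracted from maxmemory.
    """
    if not 0 <= reserved_memory_percent <= 100:
        raise ValueError("reserved_memory_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    reserved_bytes = maxmemory_bytes * reserved_memory_percent // 100
    return maxmemory_bytes - reserved_bytes


if __name__ == "__main__":
    # With the default 25% reserved, 1,000,000,000 bytes leaves 750,000,000 for data.
    print(memory_for_data(1_000_000_000))  # 750000000
```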

Causes of sudden high memory usage

  • Recently added keys: Adding new key-value pairs increases memory usage. Adding elements to existing keys also increases memory usage. Check the SetTypeCmds metric to determine whether there were recent data changes on the node. This metric counts the total number of write-type commands and is derived from the Redis commandstats statistic.
  • Increase in buffer usage: Clients connect to Redis over the network. If a client isn't reading from the cache fast enough, Redis keeps the response data in a memory space called the client output buffer, and the client continues reading from that buffer. The same applies to Pub/Sub clients when subscribers aren't reading messages fast enough.
    If there is a bottleneck in network bandwidth, or if the cluster is continuously under heavy load, then buffer usage might keep accumulating. This accumulation causes memory exhaustion and performance degradation. By default, ElastiCache for Redis doesn't restrict output buffer growth, and each client has its own buffer. Use the CLIENT LIST command to check buffer usage; the omem field shows the memory used by each client's output buffer.
  • Large number of new connections: A large number of new connections might elevate memory usage. All new connections create a file descriptor that consumes memory. The aggregate memory consumption with a large number of new connections might be high, leading to data eviction or OOM errors. Check the NewConnections metric for the total number of accepted new connections.
  • High swap usage: It's normal to see some swap usage on a cache node when there is free memory. However, too much swap usage might lead to performance issues. High swap usually starts happening in a node that's running under memory pressure, resulting in low freeable memory. Use the SwapUsage metric to monitor swap on the host.
  • High memory fragmentation: A high memory fragmentation ratio indicates inefficient memory management within the operating system. Redis might not free up memory when keys are removed. Use the MemoryFragmentationRatio metric to monitor the fragmentation ratio. If you're running into fragmentation issues, you can turn on the activedefrag parameter for active memory defragmentation.
  • Big keys: A key with a large data size or a large number of elements is called a big key. You might see high memory usage because of a big key even if the CurrItems metric stays low. To detect big keys in your dataset, run redis-cli with the --bigkeys option.
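The following sketch shows one way to find clients with large output buffers from CLIENT LIST output. Each line of that output is a set of space-separated field=value pairs, and the omem field is the memory used by that client's output buffer. The function names and the sample output are hypothetical and trimmed to a few fields; in practice, you'd capture the text by running CLIENT LIST from redis-cli.

```python
def parse_client_list(raw: str) -> list[dict[str, str]]:
    """Turn CLIENT LIST output into one dict of field=value pairs per client."""
    clients = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = {}
        for token in line.split():
            # Split on the first "=" only; values such as addr contain ":".
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
        clients.append(fields)
    return clients


def total_omem(clients: list[dict[str, str]]) -> int:
    """Sum of output buffer memory (omem) across all clients, in bytes."""
    return sum(int(c.get("omem", "0")) for c in clients)


def top_output_buffers(clients: list[dict[str, str]], n: int = 5) -> list[tuple[str, int]]:
    """Return (addr, omem) for the n clients with the largest output buffers."""
    pairs = [(c.get("addr", "?"), int(c.get("omem", "0"))) for c in clients]
    return sorted(pairs, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical, trimmed CLIENT LIST output for two clients.
    sample = (
        "id=5 addr=10.0.0.5:51000 fd=8 name= age=120 idle=0 flags=N db=0 oll=12 omem=1048576 cmd=get\n"
        "id=6 addr=10.0.0.6:52000 fd=9 name= age=30 idle=2 flags=P db=0 oll=0 omem=0 cmd=subscribe\n"
    )
    clients = parse_client_list(sample)
    print(total_omem(clients))
    print(top_output_buffers(clients, n=1))
```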
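The fragmentation ratio compares the memory that the operating system has allocated to the Redis process (used_memory_rss in INFO) with the memory that Redis reports using (used_memory). The following sketch computes the ratio and labels it. The 1.5 threshold is an assumed example, not an ElastiCache recommendation. A ratio below 1.0 means that resident memory is smaller than what Redis allocated, which usually indicates that part of that memory has been swapped out.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Ratio of OS-resident memory to memory allocated by Redis."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def describe_ratio(ratio: float, high_threshold: float = 1.5) -> str:
    """Label a ratio; high_threshold is an assumed example value."""
    if ratio < 1.0:
        return "rss below used_memory: part of the memory may be swapped out"
    if ratio > high_threshold:
        return "possible fragmentation: consider turning on activedefrag"
    return "within the assumed normal range"


if __name__ == "__main__":
    ratio = fragmentation_ratio(used_memory_rss=1_600_000, used_memory=1_000_000)
    print(ratio, describe_ratio(ratio))
```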

Best practices to control high memory usage

  • Use TTL on keys: You can set a time to live (TTL) on keys so that they expire. Redis then removes these keys when they expire, without waiting for memory pressure to trigger eviction. This prevents Redis from filling up with unnecessary keys. A small number of evictions isn't a concern, but a high number of evictions means that your node is running under memory pressure.
  • Use eviction policy: When cache memory starts to fill up, Redis evicts keys to free up space based on the maxmemory-policy. The default maxmemory-policy is volatile-lru. It's a best practice to choose an eviction policy that fits the needs of your workload.
  • Allocate reserved memory: To avoid issues during failover or backup, it's a best practice to set reserved-memory-percent to at least 25% for non-data usage. If there isn't enough reserved memory to perform failover or backup, swapping and performance issues can occur.
  • Use connection pooling: Connection pooling helps control high numbers of new connections attempted by the Redis client. Review the AWS best practice guidelines for handling a large number of new connections.
  • Adjust output buffer size limits: You can adjust the output buffer limit to control the buffer space usage. ElastiCache for Redis parameter groups provide several parameters starting with client-output-buffer-limit-* to avoid unlimited growth of client output buffer usage. Be aware that there isn't a suggested limit for these parameters as every workload is unique. It's a best practice to benchmark your workload so that you can choose an appropriate value.
  • Consider using hash mapping: In Redis, the total memory footprint of the database grows linearly with the number of keys. Storing data as many individual keys takes more memory than storing the same data as fields in a smaller number of hash-mapped keys. Hash mapping helps with data structures that have a large number of keys. In addition to hash mapping, you can take advantage of ziplist encoding, which reduces the memory footprint compared to hash tables. Note that using hash mapping might cause a spike in Redis engine CPU usage, because hash commands are more complex and need more CPU than simple SET operations.
  • Scale the cluster: Sometimes you might still experience memory pressure after taking the necessary precautions. If this occurs and the usage is due to expected workload, consider scaling the cluster appropriately to ease the memory bottleneck.
  • Set an alarm for memory usage: You can use CloudWatch alarms to trigger an alarm when memory usage crosses a preset threshold. Use the BytesUsedForCache or DatabaseMemoryUsagePercentage metric to create an alarm from the CloudWatch console for monitoring and scaling purposes.
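The following sketch shows one common way to apply hash mapping: keys that share a prefix and end in a numeric ID are grouped into buckets, so that the key user:1234567 becomes field 67 in the hash user:12345. The key format, the function name, and the bucket size of 100 are assumptions for illustration. To keep the memory-efficient ziplist encoding, keep the bucket size at or below the hash-max-ziplist-entries value in your parameter group.

```python
def bucket_key(key: str, bucket_size: int = 100) -> tuple[str, str]:
    """Map a hypothetical 'prefix:numeric_id' key to a (hash key, field) pair."""
    prefix, sep, raw_id = key.rpartition(":")
    if not sep or not prefix:
        raise ValueError(f"expected 'prefix:id', got {key!r}")
    object_id = int(raw_id)  # raises ValueError if the ID isn't numeric
    # IDs in the same block of bucket_size share one hash; the remainder is the field.
    bucket, field = divmod(object_id, bucket_size)
    return f"{prefix}:{bucket}", str(field)


if __name__ == "__main__":
    print(bucket_key("user:1234567"))  # ('user:12345', '67')
```

With a Redis client such as redis-py, you'd then write the value with HSET on the returned hash key and field instead of SET on the original key, and read it back with HGET.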
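The following sketch builds the arguments for a CloudWatch alarm on DatabaseMemoryUsagePercentage for one node. The parameter names follow the CloudWatch PutMetricAlarm API as exposed by boto3. The cluster ID, node ID, SNS topic ARN, 80% threshold, and evaluation settings are placeholder assumptions that you'd adjust for your workload.

```python
def memory_alarm_params(
    cluster_id: str,
    node_id: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,  # assumed example threshold
) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper)."""
    return {
        "AlarmName": f"{cluster_id}-{node_id}-memory-high",
        "Namespace": "AWS/ElastiCache",
        "MetricName": "DatabaseMemoryUsagePercentage",
        "Dimensions": [
            {"Name": "CacheClusterId", "Value": cluster_id},
            {"Name": "CacheNodeId", "Value": node_id},
        ],
        "Statistic": "Average",
        "Period": 300,            # 5-minute periods (assumed)
        "EvaluationPeriods": 3,   # alarm after 3 breaching periods (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = memory_alarm_params(
        "my-redis-001", "0001", "arn:aws:sns:us-east-1:123456789012:example-topic"
    )
    print(params["AlarmName"])
```

To create the alarm, pass the dictionary to boto3, for example boto3.client("cloudwatch").put_metric_alarm(**params). Building the arguments separately keeps the alarm definition easy to review and reuse for each node.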