I need to implement disaster recovery or fault tolerance for my Amazon ElastiCache Redis cluster data. What options are available?

ElastiCache for Redis offers several options for disaster recovery and fault tolerance of your cached data. Each option has a different balance of data durability, performance impact, and cost.

The following options are listed by level of data protection, or data durability, from lowest to highest.

Daily automatic backups

When daily automatic backups are enabled, ElastiCache creates a backup of the cluster and writes all data from the cache to a Redis RDB file.

A daily automatic backup is a suitable low-cost disaster recovery option.

  • Data loss potential - High (up to a day’s worth). Daily automatic backups are retained for up to 35 days.
  • Performance impact - Medium to high. Taking a backup consumes resources on the node that performs it, and running multiple backups throughout the day compounds that impact. If the primary node handles a sustained client workload, consider enabling RDB snapshots on a designated persistence-only secondary node. At the same time, disable both RDB snapshots and AOF on the primary node and all other secondary nodes. Offloading backups from the primary node to a dedicated secondary node frees resources on the primary and gives clients better performance. For more information, see Performance Impact of Backups.
  • Cost - Low to medium. Storage costs increase with the number of backups and the data retention duration. For more information, see Backup Costs.
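You configure automatic backups on the replication group. The following minimal sketch assumes the boto3 SDK and builds the parameters for its `modify_replication_group` call:

  • `SnapshotRetentionLimit` sets the retention period, from 0 to 35 days.
  • `SnapshotWindow` sets a daily UTC time range.
  • `SnapshottingClusterId` is optional. It takes the daily backup from a designated secondary node, as suggested above.

The helper names, group and node IDs, and window value are hypothetical placeholders. The format check is a simplification, not ElastiCache's full validation.

```python
import re
from typing import Optional

# Daily backup window in UTC, hh24:mi-hh24:mi (format check only; simplified).
_WINDOW = re.compile(r"^([01]\d|2[0-3]):[0-5]\d-([01]\d|2[0-3]):[0-5]\d$")


def automatic_backup_settings(replication_group_id: str, retention_days: int,
                              window: str,
                              snapshot_source: Optional[str] = None) -> dict:
    """Build hypothetical modify_replication_group parameters for daily backups."""
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be 0 (disabled) to 35 days")
    if not _WINDOW.match(window):
        raise ValueError("window must use the hh24:mi-hh24:mi format")
    params = {
        "ReplicationGroupId": replication_group_id,
        "SnapshotRetentionLimit": retention_days,
        "SnapshotWindow": window,
    }
    if snapshot_source is not None:
        # Take the daily backup from a designated secondary node
        # instead of loading the primary.
        params["SnapshottingClusterId"] = snapshot_source
    return params


def apply_backup_settings(client, params: dict) -> dict:
    """`client` is assumed to be boto3.client("elasticache")."""
    return client.modify_replication_group(**params)
```

For example, `automatic_backup_settings("my-redis", 7, "05:00-06:00", snapshot_source="my-redis-002")` keeps a week of backups and takes them from the hypothetical secondary node `my-redis-002`.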

For more information, see Making Manual Backups. For comprehensive information about implementing backups for ElastiCache clusters running Redis, see ElastiCache for Redis Backup & Restore.

Manual backups and Redis append-only files (AOF)

Manual backups are retained indefinitely and are useful for testing and archiving. You can create up to 20 manual backups per node within any 24-hour period. To reduce the performance impact of backups on your cluster, enable RDB snapshots on a designated persistence-only secondary node. At the same time, disable both RDB snapshots and AOF on the primary node and all other secondary nodes.
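To stay within the limit of 20 manual backups per node, your own tooling can count recent backups before it requests another one. This minimal sketch assumes boto3's `create_snapshot` call. The helper names and the timestamped `SnapshotName` scheme are hypothetical. The backup history is also assumed to be tracked by your tooling rather than read from ElastiCache.

```python
from datetime import datetime, timedelta, timezone

MAX_MANUAL_BACKUPS = 20             # per node, in any 24-hour period
QUOTA_WINDOW = timedelta(hours=24)  # sliding window, not a calendar day


def can_take_manual_backup(history: list, now: datetime) -> bool:
    """True if fewer than 20 backups of this node fall in the last 24 hours."""
    recent = [t for t in history if now - QUOTA_WINDOW < t <= now]
    return len(recent) < MAX_MANUAL_BACKUPS


def manual_backup_request(cluster_id: str, now: datetime) -> dict:
    """Build create_snapshot parameters with a hypothetical naming scheme."""
    return {
        "CacheClusterId": cluster_id,
        "SnapshotName": f"{cluster_id}-manual-{now:%Y%m%d-%H%M%S}",
    }


def take_manual_backup(client, cluster_id: str, history: list) -> dict:
    """`client` is assumed to be boto3.client("elasticache")."""
    now = datetime.now(timezone.utc)
    if not can_take_manual_backup(history, now):
        raise RuntimeError("20 manual backups already taken in the last 24 hours")
    response = client.create_snapshot(**manual_backup_request(cluster_id, now))
    history.append(now)  # record the backup for the next quota check
    return response
```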

Important:

When using AOF, the following considerations apply:

  • AOF is not enabled by default. To enable AOF for a Redis cluster, create a parameter group with the appendonly parameter set to yes and assign the parameter group to your cluster. After you enable AOF for a Redis cluster, you can configure how often the AOF output buffer is written to disk by setting the value of the appendfsync parameter.
  • AOF is supported only for use with Redis versions 2.8.21 and earlier.
  • AOF is subject to the limitations described in Mitigating Failures: Redis Append Only Files (AOF).
  • AOF is disabled for T1 and T2 node types. For nodes of these types, the appendonly parameter value is ignored.
  • AOF is disabled for Multi-AZ replication groups. For Multi-AZ groups, the appendonly parameter value is ignored.
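You can check the AOF considerations above before you change anything. This minimal sketch does two things:

  • It encodes the version, node-type, and Multi-AZ restrictions listed above.
  • It builds the boto3 requests that create a parameter group, set `appendonly` and `appendfsync`, and assign the group to a cluster with `modify_cache_cluster`.

The helper names, IDs, and the `redis2.8` family default are assumptions. Depending on the parameter, a change might take effect only after the nodes are rebooted.

```python
# Hypothetical helpers: check AOF eligibility, then enable AOF through a
# custom parameter group.

def aof_supported(engine_version: str, node_type: str, multi_az: bool) -> bool:
    """Apply the AOF restrictions listed in this article."""
    # Compare numeric version components, e.g. "2.8.21" -> (2, 8, 21).
    parts = tuple(int(p) for p in engine_version.split(".")[:3] if p.isdigit())
    if parts > (2, 8, 21):
        return False  # AOF is supported only on 2.8.21 and earlier
    if node_type.startswith(("cache.t1.", "cache.t2.")):
        return False  # appendonly is ignored on T1/T2 nodes
    return not multi_az  # appendonly is ignored for Multi-AZ groups


def aof_parameter_requests(group_name: str, fsync: str = "everysec",
                           family: str = "redis2.8") -> tuple:
    """Build request parameters for creating and configuring the group."""
    if fsync not in ("always", "everysec", "no"):
        raise ValueError("appendfsync must be always, everysec, or no")
    create = {
        "CacheParameterGroupName": group_name,
        "CacheParameterGroupFamily": family,
        "Description": "Redis with AOF enabled",
    }
    modify = {
        "CacheParameterGroupName": group_name,
        "ParameterNameValues": [
            {"ParameterName": "appendonly", "ParameterValue": "yes"},
            {"ParameterName": "appendfsync", "ParameterValue": fsync},
        ],
    }
    return create, modify


def enable_aof(client, cluster_id: str, group_name: str,
               fsync: str = "everysec") -> dict:
    """`client` is assumed to be boto3.client("elasticache")."""
    create, modify = aof_parameter_requests(group_name, fsync)
    client.create_cache_parameter_group(**create)
    client.modify_cache_parameter_group(**modify)
    # Assign the new parameter group to the cluster.
    return client.modify_cache_cluster(
        CacheClusterId=cluster_id,
        CacheParameterGroupName=group_name,
        ApplyImmediately=True,
    )
```

The sketch defaults to `everysec` because the performance discussion below advises against `always`.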

This is a suitable option for maintaining a high level of data persistence at a relatively low cost by using the functionality that is native to Redis versions 2.8.21 and earlier.

  • Data loss potential - Low to medium. Although AOF provides a measure of fault tolerance, it can't protect your data from a hardware-related cache node failure, so some risk of data loss remains.
  • Performance impact - Low to high. The performance impact of AOF depends heavily on the appendfsync parameter value, which controls how often the AOF output buffer is written to disk. The more frequently the buffer is written to disk, the greater the impact on performance. Setting appendfsync to always flushes the buffer every time the cache data is modified, so this setting isn't recommended. Instead, choose everysec to write to disk once per second, or choose no to let the operating system decide when to flush. Because the AOF file can grow quickly, it's a best practice to verify your disk space requirements. Another performance consideration is the time required to replay an AOF file. Populating the Redis nodes with the cache data can take several minutes. During this time, your application must satisfy queries by querying your database directly, as if the data were uncached.
  • Cost - Low to medium. AOF cost correlates most closely with the time and performance considerations involved whenever you need to replay an AOF file. Disk space requirements are also greater than for the snapshot-based options described earlier.
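While a node replays its AOF file, the application has to read from the database. A common way to handle this is a cache-aside read that falls back to the database when the cache is unavailable, as in this minimal sketch. The `Cache` protocol and the `CacheUnavailable` exception are hypothetical stand-ins for your Redis client. For comparison, a Redis node that is still loading its dataset replies with a LOADING error, which the redis-py library raises as `BusyLoadingError`.

```python
from typing import Callable, Optional, Protocol


class CacheUnavailable(Exception):
    """Hypothetical error raised while a node is still loading its dataset."""


class Cache(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


def read_through(cache: Cache, key: str,
                 load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read that falls back to the database during AOF replay."""
    try:
        cached = cache.get(key)
    except CacheUnavailable:
        # The node is replaying its AOF; answer straight from the database.
        return load_from_db(key)
    if cached is not None:
        return cached
    value = load_from_db(key)
    try:
        cache.set(key, value)  # repopulate the cache for later reads
    except CacheUnavailable:
        pass  # best effort; a later read repopulates the entry
    return value
```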

For more information, see ElastiCache for Redis Append Only Files (AOF).

Multi-AZ with Automatic Failover

Multi-AZ with Automatic Failover provides fault tolerance if your cluster’s read/write primary node becomes unreachable or fails. Use this option when data retention, minimal downtime, and application performance are a priority.

  • Data loss potential - Low. Multi-AZ provides fault tolerance for node and Availability Zone failures, including hardware-related issues. Replication from the primary to the replicas is asynchronous, so writes that haven't been replicated when the primary fails can still be lost.
  • Performance impact - Low. Of the available options, Multi-AZ provides the fastest recovery, because no manual procedure is required once it's configured. Automatic failover saves valuable time that is easily lost when you respond to a failure by manually running a restore process.
  • Cost - Low to high. Multi-AZ is the highest-cost option, because it requires at least one replica node in addition to the primary node. Use Multi-AZ when you can't risk losing data as a result of hardware failure or can't afford the downtime that the other options require when you respond to an outage.
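Multi-AZ with automatic failover needs a replication group that has a primary node and at least one replica. This minimal sketch builds the parameters for boto3's `create_replication_group` call for a group with cluster mode disabled. It places one node in each listed Availability Zone.

The helper name, IDs, node type, and AZ names are hypothetical. Requiring at least two distinct Availability Zones is a design choice of this sketch. Older SDK versions might not include the `MultiAZEnabled` parameter.

```python
# Hypothetical helper; cluster mode disabled (one primary plus replicas).
MAX_NODES = 6  # primary + up to 5 replicas


def multi_az_replication_group(group_id: str, node_type: str, azs: list,
                               description: str = "Redis with Multi-AZ") -> dict:
    """Build create_replication_group parameters, one node per entry in azs."""
    if len(azs) < 2:
        raise ValueError("automatic failover needs a primary and a replica")
    if len(azs) > MAX_NODES:
        raise ValueError("at most 6 nodes with cluster mode disabled")
    if len(set(azs)) < 2:
        # Design choice: spreading nodes across AZs survives an AZ outage.
        raise ValueError("place nodes in at least two Availability Zones")
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": description,
        "Engine": "redis",
        "CacheNodeType": node_type,
        "NumCacheClusters": len(azs),
        "PreferredCacheClusterAZs": list(azs),
        "AutomaticFailoverEnabled": True,
        "MultiAZEnabled": True,
    }
```

You could pass the result to `boto3.client("elasticache").create_replication_group(**params)`. The SDK's `test_failover` call takes `ReplicationGroupId` and `NodeGroupId`, and you can use it to exercise the failover path before you rely on it.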

For more information, see Minimizing Downtime Multi-AZ with Automatic Failover.


Published: 2016-01-27

Updated: 2018-10-30