I need to implement disaster recovery or fault tolerance for my ElastiCache Redis cluster data. What options are available for this purpose and what should I implement for my usage scenario?

An Amazon ElastiCache Redis cluster provides several options for implementing disaster recovery or fault tolerance of your cached data. The options are listed here in order of the level of data protection or ‘data durability’ provided, from lowest to highest.

Daily automatic backups

When daily automatic backups are enabled, ElastiCache creates a backup of the cluster and writes all data from the cache to a Redis .RDB file. Automatic backups can be retained for up to 35 days.

Important

  • Redis backup and restore is not supported for T2 node types.
  • Before implementing backup and restore, consider the limitations described at Constraints.
  • The backup process differs depending on the version of Redis you are running. Redis versions 2.8.22 and later implement a ‘forkless’ backup which can improve performance. For more information, see How Synchronization and Backup are Implemented and Performance Impact of Backups.
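
As a rough illustration, the 35-day retention ceiling can be checked before any settings are sent to the service. The sketch below only builds a settings dictionary and makes no AWS call. The function name and the cluster ID are hypothetical; the keys are assumed to mirror the `SnapshotRetentionLimit` and `SnapshotWindow` parameters of the boto3 ElastiCache `modify_cache_cluster` call.

```python
MAX_RETENTION_DAYS = 35  # retention ceiling for automatic backups


def automatic_backup_settings(cluster_id, retention_days, window_utc):
    """Build keyword arguments for enabling daily automatic backups.

    Hypothetical helper: it validates the inputs and returns a dict;
    it does not contact AWS.
    """
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention_days must be between 1 and {MAX_RETENTION_DAYS}"
        )
    return {
        "CacheClusterId": cluster_id,
        "SnapshotRetentionLimit": retention_days,  # days to keep each backup
        "SnapshotWindow": window_utc,  # daily window in UTC, e.g. "05:00-06:00"
    }


settings = automatic_backup_settings("my-redis-cluster", 7, "05:00-06:00")
print(settings["SnapshotRetentionLimit"])
```

With credentials configured, a dictionary like this could be passed as `boto3.client("elasticache").modify_cache_cluster(**settings)`. The helper rejects a retention value of 0 because that value turns automatic backups off.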

Manual backups

Manual backups are retained indefinitely and are useful for testing and archiving. You can take up to 20 manual backups per node in any 24-hour period. To reduce the performance impact of backups on an ElastiCache Redis cluster, you can enable RDB snapshots on a designated ‘persistence-only’ secondary node while disabling both RDB snapshots and AOF on the primary node and all other secondary nodes.

Redis append-only file (AOF)

When this feature is enabled, the node writes all of the commands that change cache data to an append-only file on disk. When a node is rebooted and the cache engine starts, the AOF is "replayed"; the result is a warm Redis cache with all of the data intact. For more information, see Redis Append Only Files (AOF).
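
The log-and-replay idea can be sketched with a toy key-value store. This is not how Redis implements AOF; the class name, the tab-separated log format, and the file name are all invented for illustration, and values containing tabs or newlines are not handled.

```python
import os
import tempfile


class TinyCache:
    """Toy key-value store that logs every write to an append-only file."""

    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, re-apply each logged command in its original order.
        if not os.path.exists(self.aof_path):
            return
        with open(self.aof_path, encoding="utf-8") as log:
            for line in log:
                command, key, *rest = line.rstrip("\n").split("\t")
                if command == "SET":
                    self.data[key] = rest[0]
                elif command == "DEL":
                    self.data.pop(key, None)

    def _append(self, *fields):
        # Commands are only ever appended; the log is never rewritten.
        with open(self.aof_path, "a", encoding="utf-8") as log:
            log.write("\t".join(fields) + "\n")

    def set(self, key, value):
        self._append("SET", key, value)
        self.data[key] = value

    def delete(self, key):
        self._append("DEL", key)
        self.data.pop(key, None)


path = os.path.join(tempfile.mkdtemp(), "toy.aof")
cache = TinyCache(path)
cache.set("user:1", "alice")
cache.set("user:2", "bob")
cache.delete("user:2")

restarted = TinyCache(path)  # simulates a reboot: state is rebuilt from the log
print(restarted.data)
```

Because the log lives on the same machine as the store, losing that machine loses the log as well, which is the limitation noted for AOF later in this article.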

Important

When using AOF, the following considerations apply:

  • AOF is not enabled by default. To enable AOF for a Redis cluster, create a parameter group with the appendonly parameter set to yes and assign the parameter group to your cluster. After you enable AOF for a Redis cluster, you can configure how often the AOF output buffer is written to disk by setting the value of the appendfsync parameter.
  • AOF is only supported for use with Redis versions 2.8.21 and earlier.
  • AOF is subject to the limitations described at Mitigating Cluster Failures: Redis Append Only Files (AOF).
  • AOF is disabled for T1 or T2 node types. For nodes of these types, the appendonly parameter value is ignored.
  • AOF is disabled for Multi-AZ replication groups. For Multi-AZ groups, the appendonly parameter value is ignored.
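
The two parameters named above can be assembled as follows. This is a minimal sketch that only builds and validates the values; the function name is hypothetical, and the list shape is assumed to match the `ParameterNameValues` argument of the boto3 ElastiCache `modify_cache_parameter_group` call. No AWS call is made.

```python
VALID_APPENDFSYNC = ("always", "everysec", "no")


def aof_parameters(appendfsync="everysec"):
    """Return parameter settings that enable AOF with the given fsync policy.

    Hypothetical helper: it only builds the list of name/value pairs.
    """
    if appendfsync not in VALID_APPENDFSYNC:
        raise ValueError(f"appendfsync must be one of {VALID_APPENDFSYNC}")
    return [
        {"ParameterName": "appendonly", "ParameterValue": "yes"},
        {"ParameterName": "appendfsync", "ParameterValue": appendfsync},
    ]


for parameter in aof_parameters("everysec"):
    print(parameter["ParameterName"], "=", parameter["ParameterValue"])
```

The resulting parameter group still has to be assigned to the cluster, and the settings are ignored on the node types and replication group configurations listed above.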

Multi-AZ with Automatic Failover

Multi-AZ with Automatic Failover provides fault tolerance if your cluster’s read/write primary node becomes unreachable or fails for any reason.

Each of the available options involves choosing between varying levels of data durability, performance, and cost:

Daily automatic backups

A daily automatic backup is a suitable low-cost disaster recovery option.

  • Data loss potential - High (up to a day’s worth). Daily automatic backups can be retained for a maximum of 35 days.
  • Performance impact - Low to medium. When the backup can be performed during a service window with relatively low client activity, performance impact is nominal. If client activity is steady or very unpredictable, you can offset the performance impact by enabling RDB snapshots on a designated ‘persistence-only’ secondary node while simultaneously disabling both RDB snapshots and AOF on the primary node and all other secondary nodes. For more information, see Performance Impact of Backups.
  • Cost - Low. Daily automatic backups are a very low-cost disaster recovery option when the loss of up to one day of data is acceptable. For more information, see Costs.

For more information, see Scheduling Automatic Backups. For comprehensive information about implementing backups for ElastiCache clusters running Redis, see ElastiCache Backup & Restore (Redis).

Manual backups

Manual backups are also a disaster recovery option.

  • Data loss potential - Medium to high. Because manual backups can be scheduled to occur up to 20 times per node for any 24-hour period, manual backups can be suitable when a couple of hours of data loss is acceptable. Manual backups do not place any limits on data retention.
  • Performance impact - Medium to high. Running multiple file backups throughout the day impacts performance. If the primary node is busy processing client requests for any significant duration, consider enabling RDB snapshots on a designated ‘persistence-only’ secondary node while simultaneously disabling both RDB snapshots and AOF on the primary node and all other secondary nodes. By offloading backups from the primary node to a dedicated secondary node, you free up resources on the primary node and provide better performance for clients. For more information, see Performance Impact of Backups.
  • Cost - Low to medium. Storage costs increase with the number of backups and the data retention duration. For more information, see Costs.

For more information, see Taking Manual Backups. For comprehensive information about implementing backups for ElastiCache clusters running Redis, see ElastiCache Backup & Restore (Redis).
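
The limit of 20 manual backups per node in any 24-hour period implies a minimum spacing if backups are spread evenly, and that spacing is roughly the amount of data at risk between backups. The sketch below works out the arithmetic and adds a simple sliding-window check; the function names and timestamps are hypothetical, and the check is not the service's own enforcement logic.

```python
from datetime import datetime, timedelta

MAX_MANUAL_BACKUPS = 20  # per node, per 24-hour period
WINDOW = timedelta(hours=24)


def min_even_interval():
    """Shortest spacing that keeps evenly spread backups within the limit."""
    return WINDOW / MAX_MANUAL_BACKUPS


def can_take_backup(previous_backups, now):
    """Return True if one more backup stays within the 24-hour quota."""
    recent = [taken for taken in previous_backups if now - taken < WINDOW]
    return len(recent) < MAX_MANUAL_BACKUPS


now = datetime(2016, 1, 27, 12, 0)
hourly = [now - timedelta(hours=i) for i in range(1, 20)]  # 19 backups so far

print(min_even_interval())           # spacing between evenly spread backups
print(can_take_backup(hourly, now))  # a 20th backup is still allowed
```

At the maximum rate, evenly spaced backups are 72 minutes apart, which is consistent with treating manual backups as suitable when a couple of hours of data loss is acceptable.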

Redis append-only file (AOF)

This is a suitable option for maintaining a high level of data persistence at a relatively low cost using functionality that is native to Redis versions 2.8.21 and earlier.

  • Data loss potential - Low to medium. Although AOF provides a measure of fault tolerance, it cannot protect your data from a hardware-related cache node failure, so there is some risk of data loss.
  • Performance impact - Low to high. The performance impact of AOF depends mainly on the appendfsync parameter value, which controls how often the AOF output buffer is written to disk. The more frequently the buffer is written to disk, the greater the impact on performance. The ‘always’ option flushes the buffer every time the cache data is modified and is therefore not recommended. Choose ‘everysec’ or ‘no’ instead to write to disk every second or as needed, respectively. Keep in mind that the AOF file can grow quickly, so always test to verify your disk space requirements. Another consideration is the time required to replay an AOF file: populating the Redis nodes with the cache data might take several minutes. During this time, your application can satisfy requests for data that is not yet cached only by querying your database directly.
  • Cost - Low to medium. AOF cost is driven mostly by the time and performance overhead of replaying an AOF file. Disk space requirements are also greater than for the ‘snapshot’ options already described.

For more information, see Redis Append Only Files (AOF).

Multi-AZ with Automatic Failover

This is the option to use when data retention, minimal downtime, and application performance have the utmost priority.

  • Data loss potential - Low. Multi-AZ provides fault tolerance for every scenario, including hardware-related issues.
  • Performance impact - Low. Of the available options, Multi-AZ provides the fastest time to recovery because there is no manual procedure to follow after the process is implemented. Automatic failover buys valuable time that is easily lost when responding to a failure by manually implementing a restore process.
  • Cost - Low to high. Multi-AZ is the lowest-cost option when you cannot afford to risk losing data as a result of hardware failure or when you cannot afford the downtime required by other options to respond to an outage.

For more information, see Replication: Multi-AZ with Automatic Failover (Redis).
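
The ratings given for the four options can be collected in one place and filtered by the worst-case data loss you can tolerate. This is only a sketch of one way to read the comparison; the function names and the simple low/medium/high ranking are assumptions made for illustration, not an AWS-provided scoring method.

```python
# (option, data loss potential, performance impact, cost) as rated above
OPTIONS = [
    ("Daily automatic backups", "High", "Low to medium", "Low"),
    ("Manual backups", "Medium to high", "Medium to high", "Low to medium"),
    ("Redis append-only file (AOF)", "Low to medium", "Low to high", "Low to medium"),
    ("Multi-AZ with Automatic Failover", "Low", "Low", "Low to high"),
]

RANK = {"low": 0, "medium": 1, "high": 2}


def worst_case(rating):
    """Return the upper end of a rating such as 'Low to medium'."""
    return rating.lower().split(" to ")[-1]


def options_within(max_data_loss):
    """List options whose worst-case data loss does not exceed the tolerance."""
    limit = RANK[max_data_loss.lower()]
    return [
        name
        for name, loss, _performance, _cost in OPTIONS
        if RANK[worst_case(loss)] <= limit
    ]


print(options_within("medium"))
```

A filter like this covers data loss only; performance impact, cost, and the node type and Redis version restrictions described earlier still need to be weighed for each candidate.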

ElastiCache Redis, AOF, Multi-AZ with Automatic Failover, performance, data persistence, disaster recovery, fault tolerance, high availability, RDB


Published: 2016-1-27