
Amazon ElastiCache

How does ElastiCache manage patches and upgrades?

We frequently upgrade our ElastiCache fleet, with patches and upgrades applied to instances seamlessly in one of two ways: (a) service updates and (b) continuous managed maintenance. These updates are required to apply upgrades that strengthen security, reliability, and operational performance.

Service updates give you the flexibility to apply them on your own. They are timed, and may be moved into the maintenance window to be applied by us after their due date lapses. You have the option to manage updates yourself at any time prior to the scheduled maintenance window. When you manage an update yourself, your instance receives the OS update when you relaunch the node, and your scheduled maintenance window is cancelled.

Continuous managed maintenance happens from time to time and is applied directly in your maintenance windows without requiring any action from you. These updates are separate from those offered by service updates.

Service Updates


Service updates enable you to apply security patches or minor software updates at your discretion. These updates help strengthen security, reliability, and operational performance of your clusters.

There are three types of service updates: security-update, engine-update, and engine-major-version-update. We strongly recommend that you apply any updates of type security-update as soon as possible to ensure that your ElastiCache clusters are always up to date with current security patches. The engine-update type typically involves patches or minor engine version updates related to performance or stability optimizations for your current engine version. The engine-major-version-update type involves upgrading the engine version of your cache to a newer major version, typically because the engine version of your cache is reaching its scheduled End of Life date, and should be prepared for accordingly. To learn more about considerations related to major engine version upgrades, see the version management documentation.
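As a rough illustration, the following sketch uses the AWS SDK for Python (boto3) to list available service updates along with their type, severity, and recommended apply-by date; the region shown is an assumption.

```python
# Minimal sketch: list available ElastiCache service updates with boto3 so that
# security-update items can be prioritized. Region is an assumption.
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")

response = elasticache.describe_service_updates(ServiceUpdateStatus=["available"])
for update in response["ServiceUpdates"]:
    print(
        update["ServiceUpdateName"],
        update.get("ServiceUpdateType"),                    # e.g. security-update
        update.get("ServiceUpdateSeverity"),                # critical, important, medium, low
        update.get("ServiceUpdateRecommendedApplyByDate"),
    )
```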

When service updates applicable to your clusters become available, we will notify you via several channels, including the ElastiCache console, email, Amazon Simple Notification Service (SNS), AWS Personal Health Dashboard, and Amazon CloudWatch.

Updates available via our continuous managed maintenance are separate from those offered by service updates, and are scheduled directly in your maintenance windows without any action needed from your side. Service updates are timed and give you control over when to apply them, up to the "Recommended Apply by Date". If they are still not applied by then, ElastiCache may schedule these updates in your maintenance window.

If your ElastiCache cluster is participating in a HIPAA, PCI, or FedRAMP compliance program, you must apply service updates by their “Recommended Apply by Date” in order to maintain compliance. For more information, please see Self-Service Security Updates for Compliance.

For other clusters, we recommend that you apply service updates as per your business cadence. Even if you are unable to apply a service update by its "Recommended Apply by Date", you will be able to apply it until its "Update Expiration Date". However, the "Update Expiration Date" can change at any time depending on the availability of new updates.

When a service update is applied to one or more clusters, the update is applied to no more than one node at a time within each shard until all selected clusters are updated. The nodes being updated will experience downtime of a few seconds, while the rest of the cluster will continue to serve traffic. There will be no change in the cluster configuration. Your CloudWatch metrics will resume once the node is available post-update.
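As an illustration, the following sketch applies a named service update to a replication group using the AWS SDK for Python (boto3); the replication group ID and update name are placeholders, not values from this FAQ.

```python
# Minimal sketch: apply a specific service update to a replication group.
# The IDs and update name below are placeholders.
import boto3

elasticache = boto3.client("elasticache")

response = elasticache.batch_apply_update_action(
    ReplicationGroupIds=["my-replication-group"],    # placeholder cluster
    ServiceUpdateName="elasticache-example-update",  # placeholder update name
)

print(response["ProcessedUpdateActions"])    # update actions ElastiCache accepted
print(response["UnprocessedUpdateActions"])  # update actions that could not be scheduled
```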

Service updates are applied to the clusters in the same way as “Continuous Managed Maintenance Updates”, through node replacement. Please refer to the following questions on this page for details about how the update is applied and how to prepare your application to minimize the impact.

  • How does a node replacement impact my application?
  • What best practices should I follow for a smooth replacement experience and minimize data loss?
  • What client configuration best practices should I follow to minimize application interruption during maintenance?

ElastiCache will schedule a service update after the "Recommended Apply by Date" if the value of the "Auto-Update after Due Date" attribute is "yes". The update will be scheduled in the cluster's maintenance window, and you will receive a new notification with the scheduled date one week before the update is applied.

Yes, the node is replaced by a new, empty node. The previous cache contents will no longer be there, and the cache will start fresh.

You can determine whether you can opt out of a service update by checking the value of its "Auto-Update after Due Date" attribute. If the value of the "Auto-Update after Due Date" attribute of a service update is "no", the service update can be opted out of. However, if the value of the "Auto-Update after Due Date" attribute of a service update is "yes" and the "Recommended Apply by Date" has passed, ElastiCache will automatically schedule the service update for any remaining clusters during an upcoming maintenance window. This automatic service update will be scheduled before the "Update Expiration Date", and you will receive a notification one week prior to the update with the scheduled time. We strongly recommend applying security updates even if they can be opted out of. If you choose to apply the service update to the remaining clusters prior to the maintenance window, ElastiCache will not reapply the service update during the maintenance window.
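As an illustration, the following sketch checks this attribute with the AWS SDK for Python (boto3); the AutoUpdateAfterRecommendedApplyByDate field is the API-level counterpart of "Auto-Update after Due Date", and the update name is a placeholder.

```python
# Minimal sketch: check whether a service update can be opted out of.
# The update name below is a placeholder.
import boto3

elasticache = boto3.client("elasticache")

response = elasticache.describe_service_updates(ServiceUpdateName="elasticache-example-update")
for update in response["ServiceUpdates"]:
    if update.get("AutoUpdateAfterRecommendedApplyByDate"):
        print(f"{update['ServiceUpdateName']}: auto-scheduled after the apply-by date")
    else:
        print(f"{update['ServiceUpdateName']}: can be opted out of")
```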

ElastiCache doesn't directly apply service updates during maintenance windows, to give you flexibility on when to apply them. You control when and which updates are applied to your self-designed clusters. For clusters that are not participating in the ElastiCache-supported compliance programs, you can choose not to apply these updates, or to apply them at a reduced frequency throughout the year. This is true only when the value of the "Auto-Update after Due Date" attribute of a service update is "no". For more information, see Can I opt out of service updates?

No, service updates are separate from the continuous managed maintenance updates applied directly by ElastiCache during your clusters' maintenance windows.

A list of attributes is available on the Service Updates page in the ElastiCache console.

To help determine how soon to apply the available service updates, you can refer to the “Severity” service update attribute which has the following values (in order of priority):

1. Critical: We recommend that you apply this update immediately (within 14 days or less)
2. Important: We recommend that you apply this update as soon as your business flow allows (within 30 days or less)
3. Medium: We recommend that you apply this update within 60 days or less
4. Low: We recommend that you apply this update within 90 days or less

For more details, please refer to our Applying Updates documentation.

Service updates are released based on the severity and urgency of the update. 

This attribute reflects whether your cluster was updated by the “Recommended Apply by Date”. If a service update is applied after the “Recommended Apply by Date”, the attribute “Service Update SLA Met” is set to “no”.

This information is relevant for ElastiCache clusters participating in HIPAA, PCI DSS, and FedRAMP compliance programs. For more information, please see Self-Service Security Updates for Compliance.

Yes, unless noted otherwise in the service update "Description" attribute, service updates are always cumulative. If you miss applying them by the "Update Expiration Date", they will be included in the next service update. Service updates of type security-update fall under this cumulative category.

No, service updates are applied at the cluster level. If you cancel an ongoing update, a cluster may have some nodes updated and some nodes not updated. In this case, the cluster will continue to show up in the list of clusters to apply the service update to. The cluster will continue to operate normally.
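As an illustration, the following sketch checks per-cluster progress of a service update, including how many nodes have been updated, using the AWS SDK for Python (boto3); the update name and replication group ID are placeholders.

```python
# Minimal sketch: check how far a service update has progressed on a cluster.
# Names below are placeholders.
import boto3

elasticache = boto3.client("elasticache")

response = elasticache.describe_update_actions(
    ServiceUpdateName="elasticache-example-update",
    ReplicationGroupIds=["my-replication-group"],
)
for action in response["UpdateActions"]:
    print(
        action.get("ReplicationGroupId") or action.get("CacheClusterId"),
        action["UpdateActionStatus"],   # e.g. in-progress, stopped, complete
        action.get("NodesUpdated"),     # e.g. "2/3" nodes updated so far
    )
```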

There are two cases when this may happen:

(a) You missed applying a service update that was optional and the update is now in "expired" status. Note that clusters participating in compliance programs must always apply all service updates.
(b) Your node was replaced for another reason, such as a planned maintenance event or a node failover, and ElastiCache provisioned a new node with the latest service updates included.

In both cases, the cluster will continue to operate normally.

New nodes contain all applicable service updates, so you can manually replace the existing nodes that haven’t been updated to get the latest updates.

Yes, a service update may be applicable only to Valkey, Redis OSS, or Memcached, or it can be applicable to all engine types. You can check the "Engine" and "Engine Version" service update attributes to determine the scope of each update.

While you can defer the service update by changing the maintenance window, we strongly recommend that you apply the update as early as possible, as security at AWS is a shared responsibility.

The scheduled update will only be applied to the cluster if the scheduled date matches the cluster's maintenance window.

Once you change the maintenance window, if the previously scheduled date has passed, the service update will be rescheduled to the newly specified window in the following weeks. You will receive a new notification one week before the new date.

Your cluster may be part of different service updates. Most of the updates do not require you to apply them separately. Applying one update to your cluster will mark the other updates as completed wherever applicable. You may need to apply multiple updates to the same cluster separately if the status does not change to “completed” automatically.

To maintain cluster stability, ElastiCache applies updates to only one node at a time within each shard. If the service update cannot be applied to the entire cluster within a single maintenance window, it will be scheduled to continue in subsequent maintenance windows. You will receive new notifications with the next scheduled date and can prepare accordingly.

You cannot roll back a service update once it starts. If you find an issue after applying a service update, please reach out to AWS Support.

Continuous Managed Maintenance Updates


These updates are mandatory and applied directly in your maintenance windows without any action needed from your side. They are separate from those offered by service updates.

A replacement typically completes within a few seconds, but may take longer with certain instance configurations and traffic patterns. For example, Valkey or Redis OSS primary nodes may not have enough free memory and may be experiencing high write traffic. When an empty replica syncs from such a primary, the primary node may run out of memory trying to handle the incoming writes while also syncing the replica. In that case, the primary disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully.

Memcached nodes do not need sync, so their replacement completes faster, irrespective of node sizes.

For Valkey or Redis OSS nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful replication. For single node clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ with auto-failover is enabled, replacing the primary triggers a failover to a read replica. For cluster configurations that are set up to use cluster clients and non-cluster configurations with auto failover enabled, the planned node replacements complete while the cluster serves incoming write requests. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary node is unavailable during this time, leading to longer write interruption.

For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.

For Valkey or Redis OSS nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable.

You can provision primary and read replicas in different Availability Zones. In this case, when a node is replaced, the data will be synced from a peer node in a different Availability Zone. We also recommend that you upgrade your Redis OSS version to 5.0.6 or higher as those engine versions have improved stability and enable your clusters to continuously serve incoming write requests during patching activities if they have auto-failover enabled.

Finally, if your configuration includes only one primary and a single replica per shard, we recommend adding additional replicas prior to the patching. This reduces the risk of reduced availability during the patching process. For single-node clusters, we recommend that sufficient memory is available to Valkey or Redis OSS as described here. For replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
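As an illustration, the following sketch adds replicas to each shard ahead of a maintenance window using the AWS SDK for Python (boto3); the replication group ID and target replica count are placeholders.

```python
# Minimal sketch: add replicas per shard before patching so the cluster keeps a
# healthy replica count during node replacement. Values are placeholders.
import boto3

elasticache = boto3.client("elasticache")

elasticache.increase_replica_count(
    ReplicationGroupId="my-replication-group",  # placeholder replication group
    NewReplicaCount=2,                          # desired replicas per shard (assumption)
    ApplyImmediately=True,
)
```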

For Memcached nodes, you need to schedule your maintenance window during periods with low incoming traffic to minimize application impact. Since Memcached stores data purely in memory, data loss cannot be avoided during node replacements, so it's crucial to test your application for failover scenarios. When nodes are replaced, they will be substituted with new empty nodes, and the existing cache contents will be completely removed, requiring a fresh start. During the switch, the new node will be temporarily unavailable, and applications may experience performance degradation while the empty new node gets populated with data.

For Valkey or Redis OSS, the cluster mode configuration has the best availability during managed or unmanaged operations, and we recommend always using a cluster-mode-capable client that connects to the cluster discovery endpoint. For cluster mode disabled, we recommend always using the primary endpoint for all write operations. The individual node endpoints of the replica nodes can be used for read operations.

If auto-failover is enabled on the cluster, the primary node may change; therefore, the application should confirm the role of the node and update all read endpoints to ensure that you aren't placing a major load on the primary. With auto-failover disabled, the role of the node will not change; however, the downtime during managed or unmanaged operations is higher compared to clusters with auto-failover enabled.
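As an illustration, the following sketch connects to a cluster-mode enabled cache through its cluster discovery (configuration) endpoint using the redis-py client, which also works with Valkey; the endpoint hostname and options are placeholders.

```python
# Minimal sketch: connect through the cluster discovery endpoint so the client can
# rediscover the topology after a node replacement or failover. Endpoint is a placeholder.
from redis.cluster import RedisCluster

client = RedisCluster(
    host="my-cluster.xxxxxx.clustercfg.use1.cache.amazonaws.com",  # placeholder endpoint
    port=6379,
    ssl=True,            # if in-transit encryption is enabled
    socket_timeout=2,    # fail fast so the application can retry
)

client.set("greeting", "hello")
print(client.get("greeting"))
```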

We recommend that you allow ElastiCache to manage your node replacements for you during your scheduled maintenance window. You can specify your preferred time for replacements via the weekly maintenance window when you create an ElastiCache cluster. To change your maintenance window to a more convenient time later, you can use the ModifyCacheCluster API or choose Modify in the ElastiCache Management Console.
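As an illustration, the following sketch moves the weekly maintenance window with the AWS SDK for Python (boto3); the replication group ID and window are placeholders, and modify_cache_cluster can be used the same way for standalone or Memcached clusters.

```python
# Minimal sketch: move the weekly maintenance window. Values are placeholders.
import boto3

elasticache = boto3.client("elasticache")

elasticache.modify_replication_group(
    ReplicationGroupId="my-replication-group",         # placeholder replication group
    PreferredMaintenanceWindow="sat:16:00-sat:17:00",  # ddd:hh24:mi-ddd:hh24:mi, in UTC
)
```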

If you choose to manage the replacement yourself, you can take various actions depending on your use case and cluster configuration:

• Change the maintenance window.
• Re-launch your instance using the Backup & Restore process.
• If your cluster configuration is Cluster Mode Disabled:

    Replace a read replica (Cluster Mode Disabled) – A procedure to manually replace a read replica in a replication group.
    Replace the primary node (Cluster Mode Disabled) – A procedure to manually replace the primary node in a replication group.
    Replace a standalone node (Cluster Mode Disabled) – Two different procedures to replace a standalone node.

• If your cluster configuration is Cluster Mode Enabled:

    Replace a node in a cluster with one or more shards – You can use either backup and restore, or a scale-out followed by a scale-in, to replace the nodes.

For more instructions on all these options, please refer to the Actions You Can Take When a Node is Scheduled for Replacement page.

For Memcached, you can just delete and re-create the clusters. Post-replacement, your instance should no longer have a scheduled event associated with it.

To receive notifications, you can set up Amazon SNS notifications for significant events such as a scheduled replacement event. You can do this in the ElastiCache Management Console under the Events section or by using the describe-events API to check for the upcoming ElastiCache:NodeReplacementScheduled event.
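As an illustration, the following sketch scans recent events for node-replacement notices using the AWS SDK for Python (boto3); the duration and the keyword used to match the event text are assumptions.

```python
# Minimal sketch: look for upcoming node-replacement events in recent event history.
# The duration and message filter are assumptions.
import boto3

elasticache = boto3.client("elasticache")

response = elasticache.describe_events(
    SourceType="cache-cluster",
    Duration=14 * 24 * 60,   # minutes of event history to scan
)
for event in response["Events"]:
    if "replacement" in event["Message"].lower():   # match node-replacement notices
        print(event["SourceIdentifier"], event["Date"], event["Message"])
```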

Yes, you can change your cluster's maintenance window. To change your maintenance window to a more convenient time, you can use the API (ModifyCacheCluster or ModifyReplicationGroup) or choose Modify in the ElastiCache Management Console.

Once you change your maintenance window, ElastiCache will schedule your node for maintenance during the newly specified window. Please see examples on how the changes take effect below.

For example, let's say it's Thursday 11/09 at 1500 and the next maintenance window is Friday 11/10 at 1700. Following are three scenarios with their outcomes:

• You change your maintenance window to Friday at 1600 (after the current date and time and before the next scheduled maintenance window). The node will be replaced on Friday 11/10 at 1600.
• You change your maintenance window to Saturday at 1600 (after the current date and time and after the next scheduled maintenance window). The node will be replaced on Saturday 11/11 at 1600.
• You change your maintenance window to Wednesday at 1600 (earlier in the week than the current date and time). The node will be replaced the following Wednesday, 11/15, at 1600.

These replacements are needed to apply mandatory software updates to your underlying host. The updates help strengthen our security, reliability, and operational performance.

We may replace multiple nodes from the same cluster depending on the cluster configuration, while maintaining cluster stability. For sharded clusters, we try not to replace multiple nodes within the same shard at the same time. In addition, we try not to replace a majority of the primary nodes in the cluster across all shards. For non-sharded clusters, we will attempt to stagger node replacements over the maintenance window as much as possible to maintain cluster stability.

Yes, it is possible that these nodes will be replaced at the same time if your maintenance window for these clusters is configured to be the same.