What happens when I make a configuration change to my Amazon OpenSearch Service cluster?

Last updated: 2021-08-05

I'm trying to minimize the downtime during a configuration change. What happens if I make a configuration change to my Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) cluster?

Resolution

When you change your OpenSearch Service cluster configuration, a blue/green deployment can be triggered. During a blue/green deployment, the cluster state changes to "Processing" while a new OpenSearch Service domain is created. While the new domain is being created, the following occurs:

  • The total number of nodes temporarily equals the combined node count of the old and new domains. If the node count is unchanged, the total number of nodes doubles.
  • The extra nodes remain until the old domain's nodes are terminated.
  • After shard relocation to the new nodes is complete, the cluster state returns to "Active".

Note: During blue/green deployment, you might observe some latency. To avoid any latency issues, it's a best practice to run blue/green deployment when the cluster is healthy and there is low network traffic.
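
To wait for the cluster state to leave "Processing", you can poll the domain status. The following is a minimal sketch, assuming that the AWS CLI is installed and configured. The function names, the 60-second default interval, and the optional checker argument are illustrative choices and not part of the service.

```shell
#!/bin/sh
# Sketch: poll a domain until it is no longer in the "Processing" state.

# Prints "true" while a configuration change is in progress, "false" otherwise.
# Reads the Processing flag from the describe-domain response.
domain_processing() {
  aws opensearch describe-domain --domain-name "$1" \
    --query 'DomainStatus.Processing' --output json
}

# Usage: wait_for_domain DOMAIN_NAME [INTERVAL_SECONDS] [CHECKER]
# CHECKER defaults to domain_processing. It is a parameter only so that the
# loop can be tried out with a stand-in command.
wait_for_domain() {
  domain=$1
  interval=${2:-60}
  checker=${3:-domain_processing}
  while :; do
    state=$("$checker" "$domain") || return 2
    case $state in
      false) return 0 ;;                    # change finished
      true)  sleep "$interval" ;;           # still processing, check again later
      *)     echo "unexpected state: $state" >&2; return 2 ;;
    esac
  done
}
```

For example, wait_for_domain my-domain 120 checks every two minutes, where my-domain is a placeholder domain name. The loop has no upper time limit, so add one if you run it unattended.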

Configuration change duration

The duration of a configuration change depends on the cluster size, workload, shard size, and shard count. Use the _cat/recovery API to monitor the status of your shard relocation.

To see which shards are still relocating, use the following command syntax:

curl -s -X GET "https://<end_point>/_cat/recovery?v=true&pretty" | awk '/peer/ {print $1" "$2" "$3" "$4" "$18}' | grep -v '100\.0%'

To list the shard relocation by byte percentages, use the following command syntax:

curl -s -X GET "https://<end_point>/_cat/recovery?v=true&pretty" | awk '/peer/ {print $1" "$2" "$3" "$4" "$18}' | tr -d "%" | sort -k 5 -n

Note: To sort the data by byte percentage (which is in the fifth column), you must specify "5" for -k.
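
The commands above depend on bytes_percent being the 18th column. The following sketch is one way to avoid the fixed column numbers, assuming the response includes the header row (v=true). It looks up the columns by name, keeps only rows whose type is exactly "peer", and sorts the unfinished recoveries by byte percentage. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list peer recoveries that are below 100% of bytes, least complete first.
# Reads the output of "_cat/recovery?v=true" on stdin.
# Prints: index shard bytes_percent (without the % sign).
unfinished_peer_recoveries() {
  awk '
    NR == 1 {                               # header row: map column name to position
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["type"]) == "peer" {
      pct = $(col["bytes_percent"])
      sub(/%/, "", pct)                     # "42.5%" becomes "42.5"
      if (pct + 0 < 100) print $(col["index"]), $(col["shard"]), pct
    }
  ' | sort -k 3,3n
}
```

To use it, pipe the response into the function: curl -s "https://<end_point>/_cat/recovery?v=true" | unfinished_peer_recoveries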

If you observe minimal progress for the shard relocation, your cluster might be stuck.

Reasons your blue/green deployment process is stuck

Your blue/green deployment process might get stuck for the following reasons:

  • An unhealthy cluster state from before the configuration change.
  • Consistently high JVM memory pressure. Aim to keep your JVM memory pressure below 75% to avoid out of memory (OOM) issues.
  • Consistently high CPU utilization. Aim to keep your CPU utilization below 80%.
  • Too many shards on a cluster or incorrect shard sizing. It's a best practice to keep your shard size between 10 GiB and 50 GiB. For more information about indexing strategy, see Choosing the number of shards.
  • Invalid configuration setup or too many configuration changes at the same time. Make sure to verify your configuration settings and wait to send a configuration change until the first configuration change completes.
  • Insufficient disk space or capacity for the relocation process or requested instance type.
  • Lack of available IPs on the requested subnet for a cluster inside a virtual private cloud (VPC).
  • Using a volume size that the instance type doesn't support. Your volume size must be within the limits for that instance type.
  • Using index settings such as "index.routing.allocation.require._name": "NODE_NAME" or "index.blocks.write": true. The first setting pins shards to a named node, and the second sets a write block. Make sure to remove these settings from your index settings before you proceed.
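
One way to remove the two index settings named in the last item is to set them to null through the update settings API, which resets a setting to its default. The following is a sketch only. The function names and the ENDPOINT and INDEX arguments are hypothetical, and any request signing or authentication that your domain's access policy requires is left out.

```shell
#!/bin/sh
# Sketch: reset the allocation filter and the write block on one index.

# Prints the JSON body for the update-settings request.
# A null value resets the setting to its default.
clear_settings_body() {
  printf '%s' '{"index.routing.allocation.require._name":null,"index.blocks.write":null}'
}

# Usage: clear_blocking_settings ENDPOINT INDEX
# ENDPOINT is assumed to look like https://<end_point>.
clear_blocking_settings() {
  curl -s -X PUT "$1/$2/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(clear_settings_body)"
}
```

Before you clear a setting, confirm that nothing else relies on it, for example a write block that was set on purpose before an index operation.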

For more information, see Why is my Amazon OpenSearch Service domain stuck in the "Processing" state?
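
To check shard sizes against the 10 GiB to 50 GiB guidance before you submit a change, you can filter the output of the _cat/shards API. The following sketch assumes that the request selects the columns index, shard, prirep, and store and asks for sizes in whole gigabytes (h=index,shard,prirep,store&bytes=gb). The function name and its optional minimum and maximum arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: print shards whose store size falls outside a range.
# Reads rows of "index shard prirep store" on stdin, store in whole gigabytes.
# Usage: shards_outside_range [MIN_GB] [MAX_GB]   (defaults: 10 and 50)
shards_outside_range() {
  awk -v min="${1:-10}" -v max="${2:-50}" '
    NF < 4 { next }                         # unassigned shards report no store size
    $4 + 0 < min || $4 + 0 > max { print $1, $2, $3, $4 }
  '
}
```

For example: curl -s "https://<end_point>/_cat/shards?h=index,shard,prirep,store&bytes=gb" | shards_outside_range. Small indexes naturally have shards below the minimum, so treat the output as a list to review rather than a list of errors.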

