Why is OpenSearch Dashboards in red status on my Amazon OpenSearch Service domain?

Last updated: 2021-07-30

OpenSearch Dashboards keeps showing red status on my Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) domain. Why is this happening, and how do I troubleshoot it?

Short description

OpenSearch Dashboards shows green status when all health checks pass on every node of the OpenSearch Service cluster. If a health check fails, then OpenSearch Dashboards enters red status. OpenSearch Dashboards also shows red status when OpenSearch Service is in red cluster status. The OpenSearch Dashboards status can turn red for the following reasons:

  • Node failure caused by an issue with an Amazon Elastic Compute Cloud (Amazon EC2) instance or Amazon Elastic Block Store (Amazon EBS) volume. For more information about node crashes, see Why did my Amazon OpenSearch Service node crash?
  • Insufficient memory for your nodes.
  • Upgrading OpenSearch Service to a newer version.
  • Incompatibility between OpenSearch Dashboards and OpenSearch Service versions.
  • A single-node cluster is running with a heavy load and no dedicated leader nodes. The dedicated leader node could also be unreachable. For more information about how OpenSearch Service increases cluster stability, see Dedicated leader nodes.

Resolution

Use one or more of the following methods to resolve OpenSearch Dashboards red status for your OpenSearch Service domain.

Note: If your cluster shows a circuit breaker exception, first increase the circuit breaker limit. If you don't have a circuit breaker exception, try the other methods before you increase the circuit breaker limit.

Tune queries

If you're running complex queries (such as heavy aggregations), then tune them for maximum performance. Field data and the data structures used for aggregation queries can cause sudden spikes in heap memory consumption.

Review the following API calls to identify the cause of the spike, replacing os-endpoint with your domain endpoint:

$ curl os-endpoint/_nodes/stats/breaker?pretty
$ curl "os-endpoint/_nodes/stats/indices/fielddata?level=indices&fields=*"

For more information about managing memory usage, see Tune for search speed on the Elasticsearch website.
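To help interpret the first API call, the following sketch flags nodes whose circuit breaker memory estimate is approaching the configured limit. The node ID and byte values in the sample are illustrative, not real output; the field names (`limit_size_in_bytes`, `estimated_size_in_bytes`) match the structure of the `_nodes/stats/breaker` response.

```python
# Minimal sketch: given the parsed JSON from _nodes/stats/breaker, flag
# any (node, breaker) pair whose estimated size is close to its limit,
# which suggests heap pressure from queries.

def breakers_near_limit(stats, threshold=0.85):
    """Return (node_id, breaker_name) pairs where the estimated size
    is at or above threshold * limit."""
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            if limit and used / limit >= threshold:
                flagged.append((node_id, name))
    return flagged

# Illustrative sample response (not real output)
sample = {
    "nodes": {
        "abc123": {
            "breakers": {
                "fielddata": {"limit_size_in_bytes": 100,
                              "estimated_size_in_bytes": 90},
                "request": {"limit_size_in_bytes": 100,
                            "estimated_size_in_bytes": 10},
            }
        }
    }
}
print(breakers_near_limit(sample))  # [('abc123', 'fielddata')]
```

If a breaker shows up here repeatedly, the queries that populate it (typically field data from aggregations) are the ones to tune first.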

Use dedicated leader nodes

It's a best practice to allocate three dedicated leader nodes for each OpenSearch Service domain. For more information about improving cluster stability, see Get started with Amazon OpenSearch Service: Use dedicated leader instances to improve cluster stability.

Scale up

To scale up your domain, increase the number of nodes or choose an Amazon EC2 instance type that holds more memory. For more information about scaling, see How can I scale up or scale out my Amazon OpenSearch Service domain?

Check your shard distribution

Check the index that you're ingesting data into to confirm that its shards are evenly distributed across all data nodes. If the shards are unevenly distributed, one or more data nodes can run out of storage space.

Use the following formula to confirm that the shards are distributed evenly:

Total number of shards = shards per node * number of data nodes

For example, if there are 24 shards in the index, and there are eight data nodes, then each node holds three shards. For more information about the number of shards needed, see Get started with Amazon OpenSearch Service: How many shards do I need?
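The arithmetic above can be sketched as a quick check: compute the shards per node and confirm the shard count divides evenly across the data nodes.

```python
# Sketch of the shard-distribution formula from the article.

def shards_per_node(total_shards, data_nodes):
    """Return (shards per node, evenly_distributed)."""
    per_node, remainder = divmod(total_shards, data_nodes)
    return per_node, remainder == 0

print(shards_per_node(24, 8))  # (3, True)  -- evenly distributed
print(shards_per_node(25, 8))  # (3, False) -- uneven; some nodes hold an extra shard
```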

Check your versions

Important: Your OpenSearch Dashboards and OpenSearch Service versions must be compatible.

Run the following API call to confirm that your versions are compatible, replacing os-endpoint with your domain endpoint:

$ curl os-endpoint/.kibana/config/_search?pretty

Note: An unsuccessful command can indicate a compatibility issue between your OpenSearch Dashboards and OpenSearch Service versions. For more information about compatible versions, see Set up on the Elasticsearch website.

Monitor resources

Set up Amazon CloudWatch alarms that notify you when resources are used above a certain threshold. For example, if you set an alarm for JVM memory pressure, then take action before the pressure reaches 100%. For more information about CloudWatch alarms, see Recommended CloudWatch alarms and Improve the operational efficiency of Amazon OpenSearch Service domains with automated alarms using Amazon CloudWatch.

Increase the circuit breaker limit

To prevent the cluster from running out of memory, try increasing the parent or field data circuit breaker limit. For more information about field data circuit breaker limits, see Circuit breaker on the Elasticsearch website.
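As a sketch of that change, the following builds the transient `_cluster/settings` body that raises `indices.breaker.fielddata.limit` (40% of heap by default in Elasticsearch 7.x). This is the self-managed OpenSearch/Elasticsearch form; whether a given cluster setting can be changed on a managed OpenSearch Service domain depends on the service, so verify it applies to your domain first.

```python
# Hedged sketch: build the request body for raising the field data
# circuit breaker limit via PUT _cluster/settings.
import json

def fielddata_breaker_body(limit_percent):
    """Return the JSON body that transiently sets
    indices.breaker.fielddata.limit to the given heap percentage."""
    return json.dumps({
        "transient": {"indices.breaker.fielddata.limit": f"{limit_percent}%"}
    })

body = fielddata_breaker_body(60)
print(body)
# You would then send it with, for example:
# curl -X PUT "os-endpoint/_cluster/settings" \
#      -H 'Content-Type: application/json' -d "$body"
```

Raise the limit gradually and keep monitoring JVM memory pressure; a higher breaker limit postpones the exception but does not reduce the underlying memory demand.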