How do I monitor my Amazon OpenSearch Service cluster using CloudWatch alarms?

I want to monitor my Amazon OpenSearch Service cluster for stability issues. How can I effectively monitor my cluster?

Resolution

Important: Different versions of Elasticsearch use different thread pools to process calls to the _index API.

Elasticsearch versions 1.5 and 2.3 use the index thread pool.
Elasticsearch versions 5.x, 6.0, and 6.2 use the bulk thread pool. (Currently, the OpenSearch Service console doesn't include a graph for the bulk thread pool.)
Elasticsearch versions 6.3 and later use the write thread pool.
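The version mapping above can be sketched as a small lookup helper. This is a minimal illustration, not part of any AWS SDK: the function name `thread_pool_for_version` is hypothetical, and treating every version below 5 as using the index thread pool is an assumption, because the list above names only 1.5 and 2.3. It applies to Elasticsearch version numbers only, not to OpenSearch engine versions.

```python
def thread_pool_for_version(version: str) -> str:
    """Return the thread pool that processes _index calls.

    Hypothetical helper for Elasticsearch version strings such as
    "2.3", "5.x", or "6.3". Do not pass OpenSearch engine versions.
    """
    parts = version.split(".")
    major = int(parts[0])
    # Treat "5.x" or a missing minor component as minor version 0.
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0

    if (major, minor) >= (6, 3):
        return "write"
    if major >= 5:
        return "bulk"
    # Assumption: every version below 5 behaves like the listed 1.5 and 2.3.
    return "index"
```

For example, `thread_pool_for_version("6.2")` returns `"bulk"`, so the queue metric for that domain would not appear as a console graph.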

To monitor the health of your OpenSearch Service cluster, set the recommended Amazon CloudWatch alarms and the following OpenSearch Service cluster metric alarms:

MasterReachableFromNode
KibanaHealthyNodes
DiskQueueDepth
ThreadpoolIndexQueue
ThreadpoolSearchQueue

You can configure your OpenSearch Service metric alarms like this:

MasterReachableFromNode:
Statistic = Maximum
Value = ‘=0’
Frequency = 1 period
Period = 1 minute
Issue: Indicates that the leader (master) node is down or unreachable.

KibanaHealthyNodes:
Statistic = Average
Value = ‘=0’
Frequency = 1 period
Period = 1 minute
Issue: Indicates that the Kibana index is unhealthy.

DiskQueueDepth:
Statistic = Average
Value = ‘>=100’
Frequency = 1 period
Period = 5 minutes
Issue: Disk queue depth is the number of I/O requests that are queued against the storage at a given time. A high value could indicate a surge in requests or Amazon EBS throttling, either of which increases latency.

ThreadpoolIndexQueue and ThreadpoolSearchQueue:
Statistic = Maximum
Value = ‘>=20’
Frequency = 1 period
Period = 1 minute
Issue: Indicates that requests are queuing up and might be rejected. To verify the request status, check CPU utilization and the index or search thread pool rejections.
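The alarm settings above can also be written down as data, which makes them easier to review or feed into automation. The following is a minimal sketch under stated assumptions: `AlarmSpec`, `RECOMMENDED_ALARMS`, and `to_alarm_kwargs` are hypothetical names, and the alarm naming scheme is illustrative. Because CloudWatch has no "equal to" comparison operator, the `=0` conditions are expressed here as `LessThanOrEqualToThreshold` with a threshold of 0, which is equivalent only if the metric never goes negative. The dictionary keys follow the parameters of the CloudWatch `PutMetricAlarm` API, with the `AWS/ES` namespace and the `DomainName` and `ClientId` dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmSpec:
    metric_name: str
    statistic: str
    comparison: str          # CloudWatch ComparisonOperator value
    threshold: float
    period_seconds: int
    evaluation_periods: int = 1   # "Frequency = 1 period" in the list above


# Values transcribed from the settings listed above.
RECOMMENDED_ALARMS = [
    AlarmSpec("MasterReachableFromNode", "Maximum", "LessThanOrEqualToThreshold", 0, 60),
    AlarmSpec("KibanaHealthyNodes", "Average", "LessThanOrEqualToThreshold", 0, 60),
    AlarmSpec("DiskQueueDepth", "Average", "GreaterThanOrEqualToThreshold", 100, 300),
    AlarmSpec("ThreadpoolIndexQueue", "Maximum", "GreaterThanOrEqualToThreshold", 20, 60),
    AlarmSpec("ThreadpoolSearchQueue", "Maximum", "GreaterThanOrEqualToThreshold", 20, 60),
]


def to_alarm_kwargs(spec: AlarmSpec, domain_name: str, account_id: str) -> dict:
    """Build keyword arguments in the shape PutMetricAlarm expects."""
    return {
        "AlarmName": f"{domain_name}-{spec.metric_name}",  # illustrative naming
        "Namespace": "AWS/ES",
        "MetricName": spec.metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},  # AWS account ID
        ],
        "Statistic": spec.statistic,
        "Period": spec.period_seconds,
        "EvaluationPeriods": spec.evaluation_periods,
        "Threshold": float(spec.threshold),
        "ComparisonOperator": spec.comparison,
    }
```

Keeping the thresholds in one list means a change to, say, the queue depth limit is made in a single place rather than in each alarm by hand.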

To set up an Amazon CloudWatch alarm for your OpenSearch Service cluster, perform the following steps:

1. Open the Amazon CloudWatch console.

2. Go to the Alarm tab.

3. Choose Create Alarm.

4. Choose Select Metric.

5. Choose ES for your metric.

6. Select Per-Domain and Per-Client Metrics.

7. Select a metric and choose Next.

8. Configure the following settings for your Amazon CloudWatch alarm:

Statistic = Maximum
Period = 1 minute
Threshold type = Static
Alarm condition = Greater than or equal to
Threshold value = 1

9. Choose the Additional configuration tab.

10. Update the following configuration settings:

Datapoints to alarm = The Frequency value listed for the metric in the alarm settings above
Missing data treatment = Treat missing data as ignore (maintain the alarm state)

11. Choose Next.

12. Choose the action that you want your alarm to take, and choose Next.

13. Set a name for your alarm, and then choose Next.

14. Choose Create Alarm.
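The console steps above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and a CloudWatch client passed in by the caller. The function name `create_domain_alarm`, the alarm naming scheme, and the example domain, account ID, and SNS topic ARN are all hypothetical. The settings mirror steps 8 through 13: Maximum statistic, 1-minute period, static threshold of 1 with "greater than or equal to", missing data treated as ignore, and an optional alarm action.

```python
def create_domain_alarm(cloudwatch, domain_name, account_id, metric_name,
                        datapoints_to_alarm=1, alarm_actions=()):
    """Create one alarm for an OpenSearch Service domain metric.

    `cloudwatch` is expected to be a boto3 CloudWatch client, for example
    boto3.client("cloudwatch"). Returns the parameters that were sent.
    """
    params = {
        "AlarmName": f"{domain_name}-{metric_name}",   # step 13 (illustrative name)
        "Namespace": "AWS/ES",                         # step 5
        "MetricName": metric_name,                     # step 7
        "Dimensions": [                                # step 6: per-domain, per-client
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",                        # step 8
        "Period": 60,                                  # 1 minute, in seconds
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 1.0,
        "EvaluationPeriods": datapoints_to_alarm,      # step 10
        "DatapointsToAlarm": datapoints_to_alarm,
        "TreatMissingData": "ignore",                  # maintain the alarm state
        "AlarmActions": list(alarm_actions),           # step 12, e.g. an SNS topic ARN
    }
    cloudwatch.put_metric_alarm(**params)
    return params


# Example usage (requires boto3 and AWS credentials; all values are placeholders):
# import boto3
# create_domain_alarm(
#     boto3.client("cloudwatch"),
#     domain_name="my-domain",
#     account_id="123456789012",
#     metric_name="ClusterStatus.red",
#     alarm_actions=["arn:aws:sns:us-east-1:123456789012:my-topic"],
# )
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.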

Note: If the alarm is triggered for CPUUtilization or JVMMemoryPressure, check your Amazon CloudWatch metrics to see if there's a spike coinciding with incoming requests. In particular, monitor these Amazon CloudWatch metrics: IndexingRate, SearchRate, and OpenSearchRequests.
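To compare those three request metrics over the same time window, one option is to fetch them together with the CloudWatch `GetMetricData` API. The following sketch only builds the query list. The function name `build_request_rate_queries` is hypothetical, and the default statistic of Average with a 1-minute period is an assumption that you might need to adjust for each metric.

```python
REQUEST_METRICS = ("IndexingRate", "SearchRate", "OpenSearchRequests")


def build_request_rate_queries(domain_name, account_id,
                               period_seconds=60, stat="Average"):
    """Build a MetricDataQueries list for the request metrics named above."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    queries = []
    for metric in REQUEST_METRICS:
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": metric.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ES",
                    "MetricName": metric,
                    "Dimensions": dimensions,
                },
                "Period": period_seconds,
                "Stat": stat,  # assumption: tune per metric if needed
            },
            "ReturnData": True,
        })
    return queries


# Example usage (requires boto3 and AWS credentials; values are placeholders):
# import boto3
# from datetime import datetime, timedelta, timezone
# end = datetime.now(timezone.utc)
# boto3.client("cloudwatch").get_metric_data(
#     MetricDataQueries=build_request_rate_queries("my-domain", "123456789012"),
#     StartTime=end - timedelta(hours=3),
#     EndTime=end,
# )
```

Lining up the returned timestamps against the time the CPUUtilization or JVMMemoryPressure alarm fired shows whether the spike coincided with incoming requests.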

Related information

ClusterBlockException

Using Amazon CloudWatch alarms
