How do I use CloudWatch alarms to monitor my Amazon OpenSearch Service cluster?

Last updated: September 30, 2021

I want to monitor my Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) cluster for stability issues. How can I monitor my cluster effectively?

Resolution

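Important: Different versions of Elasticsearch use different thread pools to process calls to the _index API.

  • Elasticsearch versions 1.5 and 2.3 use the index thread pool.
  • Elasticsearch versions 5.x, 6.0, and 6.2 use the bulk thread pool. (Currently, the OpenSearch Service console doesn't include a graph for the bulk thread pool.)
  • Elasticsearch versions 6.3 and later use the write thread pool.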
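
The version rules above can be sketched as a small lookup, which may help when deciding which thread pool queue to watch for a given domain. This is a minimal sketch, not part of any AWS SDK: the function name `index_thread_pool` is hypothetical, and treating a wildcard minor version such as "5.x" as minor 0 is an assumption made for illustration.

```python
def index_thread_pool(version: str) -> str:
    """Return the thread pool that serves _index calls for an Elasticsearch version.

    Hypothetical helper that encodes the three rules listed above.
    """
    parts = version.split(".")
    major = int(parts[0])
    # Assumption: a missing or wildcard minor ("5" or "5.x") is treated as 0.
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0

    if major < 5:
        return "index"  # versions 1.5 and 2.3
    if (major, minor) < (6, 3):
        return "bulk"   # versions 5.x, 6.0, and 6.2
    return "write"      # versions 6.3 and later


for v in ("2.3", "5.x", "6.2", "6.3", "7.10"):
    print(v, "->", index_thread_pool(v))
```

Comparing `(major, minor)` as a tuple avoids the string-comparison trap where "6.10" would sort before "6.3".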

To monitor the health of your OpenSearch Service cluster, set up the recommended Amazon CloudWatch alarms along with alarms on the following OpenSearch Service cluster metrics:

  • MasterReachableFromNode
  • KibanaHealthyNodes
  • DiskQueueDepth
  • ThreadpoolIndexQueue
  • ThreadpoolSearchQueue

You can configure your OpenSearch Service metric alarms like this:

MasterReachableFromNode:
Statistic = Maximum
Value = ‘=0’
Frequency = 1 period
Period = 1 minute
Issue: Leader node is down.

KibanaHealthyNodes:
Statistic = Average
Value = ‘=0’
Frequency = 1 period
Period = 1 minute
Issue: Indicates that the Kibana index is unhealthy.

DiskQueueDepth:
Statistic = Average
Value = ‘>=100’
Frequency = 1 period
Period = 5 minutes
Issue: Disk Queue Depth is the number of I/O requests that are queued at a time against the storage. This could indicate a surge in requests or Amazon EBS throttling, resulting in increased latency.

ThreadpoolIndexQueue and ThreadpoolSearchQueue:
Statistic = Maximum
Value = ‘>=20’
Frequency = 1 period
Period = 1 minute
Issue: Indicates that there are requests getting queued up, which can be rejected. To verify the request status, check the CPU Utilization and Threadpool Index or Search rejects.
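
The alarm settings above can also be written down as data, which makes the thresholds easy to review or check against sample values. The following is a minimal sketch under stated assumptions: `AlarmRule`, `RULES`, and `breaching` are hypothetical names, and the function only compares already-aggregated values locally. It does not call CloudWatch.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlarmRule:
    metric: str
    statistic: str                            # statistic applied over each period
    compare: Callable[[float, float], bool]   # breach test: compare(value, threshold)
    threshold: float
    period_seconds: int


# Values copied from the settings listed above.
RULES = [
    AlarmRule("MasterReachableFromNode", "Maximum", operator.eq, 0, 60),
    AlarmRule("KibanaHealthyNodes", "Average", operator.eq, 0, 60),
    AlarmRule("DiskQueueDepth", "Average", operator.ge, 100, 300),
    AlarmRule("ThreadpoolIndexQueue", "Maximum", operator.ge, 20, 60),
    AlarmRule("ThreadpoolSearchQueue", "Maximum", operator.ge, 20, 60),
]


def breaching(datapoints: Dict[str, float]) -> List[str]:
    """Return the metrics whose aggregated value for one period breaches its rule."""
    return [
        rule.metric
        for rule in RULES
        if rule.metric in datapoints
        and rule.compare(datapoints[rule.metric], rule.threshold)
    ]


sample = {"MasterReachableFromNode": 1, "DiskQueueDepth": 150, "ThreadpoolSearchQueue": 19}
print(breaching(sample))
```

Because every rule uses a frequency of 1 period, a single breaching datapoint is enough to put the corresponding alarm into the ALARM state.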

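To set up an Amazon CloudWatch alarm for your OpenSearch Service cluster, perform the following steps:

1.    Open the Amazon CloudWatch console.

2.    Go to the Alarms tab.

3.    Choose Create alarm.

4.    Choose Select metric.

5.    Choose ES for your metric.

6.    Select Per-Domain, Per-Client Metrics.

7.    Select a metric, and then choose Next.

8.    Configure the following settings for your Amazon CloudWatch alarm: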

Statistic = Maximum
Period = 1 minute
Threshold type = Static
Alarm condition = Greater than or equal to
Threshold value = 1

9.    Choose the Additional configuration tab.

10.    Update the following configuration settings:

Datapoints to alarm = Frequency stated above
Missing data treatment = Treat missing data as ignore (maintain the alarm state)

11.    Choose Next.

12.    Choose the action that you want the alarm to take, and then choose Next.

13.    Set a name for the alarm, and then choose Next.

14.    Choose Create alarm.
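
The console steps above can also be expressed programmatically. The sketch below builds the keyword arguments for the CloudWatch `PutMetricAlarm` API as a plain dictionary, without making any AWS call. Several things here are assumptions for illustration: the helper name `build_alarm_params`, the alarm naming scheme, and the placeholder domain name and account ID. Note also that CloudWatch static thresholds have no "equal to" operator, so a "=0" condition on a metric that never goes negative would need `LessThanOrEqualToThreshold` with a threshold of 0.

```python
def build_alarm_params(domain_name, client_id, metric_name, statistic,
                       comparison_operator, threshold, period_seconds,
                       datapoints_to_alarm=1, alarm_actions=()):
    """Build keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper)."""
    return {
        "AlarmName": f"{domain_name}-{metric_name}",  # naming scheme is an assumption
        "Namespace": "AWS/ES",                        # the "ES" metric namespace
        "MetricName": metric_name,
        # Per-domain, per-client metrics are keyed by these two dimensions.
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": client_id},  # the AWS account ID
        ],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": datapoints_to_alarm,
        "DatapointsToAlarm": datapoints_to_alarm,
        "ComparisonOperator": comparison_operator,
        "Threshold": threshold,
        "TreatMissingData": "ignore",                  # maintain the alarm state
        "AlarmActions": list(alarm_actions),
    }


# Placeholder values; replace with your own domain name and account ID.
params = build_alarm_params(
    domain_name="my-domain",
    client_id="123456789012",
    metric_name="ThreadpoolSearchQueue",
    statistic="Maximum",
    comparison_operator="GreaterThanOrEqualToThreshold",
    threshold=20,
    period_seconds=60,
)
print(params["AlarmName"], params["ComparisonOperator"], params["Threshold"])

# With boto3 installed and credentials configured, the dictionary could be used as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a dictionary makes it straightforward to create one alarm per metric in a loop and to review the settings before anything is sent to CloudWatch.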

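Note: If the alarm is triggered for CPUUtilization or JVMMemoryPressure, check your Amazon CloudWatch metrics to see whether there is a spike that coincides with incoming requests. In particular, monitor these Amazon CloudWatch metrics: IndexingRate, SearchRate, and OpenSearchRequests.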

