How do I troubleshoot high JVM memory pressure on my Amazon Elasticsearch Service cluster?

Last updated: 2021-06-07

My Amazon Elasticsearch Service (Amazon ES) cluster has high JVM memory pressure. What do the different JVM memory pressure levels mean, and how do I reduce JVM memory pressure?

Resolution

JVM memory pressure is the percentage of the Java heap that is in use on a cluster node. The following guidelines describe what each JVM memory pressure level means:

  • If JVM memory pressure reaches 75%, then Amazon ES triggers the Concurrent Mark Sweep (CMS) garbage collector. Garbage collection is a CPU-intensive process. If JVM memory pressure stays at this level for a few minutes, then you might encounter ClusterBlockException errors, JVM OutOfMemoryError, or other cluster performance issues.
  • If JVM memory pressure exceeds 92% for 30 minutes, then Amazon ES blocks all write operations.
  • If JVM memory pressure reaches 100%, then the Amazon ES JVM is configured to exit on OutOfMemory (OOM), and it eventually restarts.
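The thresholds above can be written as a small classifier. This is a minimal sketch. It assumes you already have a node's heap usage in bytes, for example from the `jvm.mem.heap_used_in_bytes` and `jvm.mem.heap_max_in_bytes` fields of the Elasticsearch `GET _nodes/stats/jvm` response. It also assumes you track how long pressure has stayed above 92% yourself. The helper names `jvm_memory_pressure` and `classify_jvm_pressure` are hypothetical. Amazon ES applies these levels internally, and this code only mirrors the documented behavior.

```python
def jvm_memory_pressure(heap_used_bytes: int, heap_max_bytes: int) -> float:
    """Return JVM memory pressure as a percentage of the maximum heap."""
    if heap_max_bytes <= 0:
        raise ValueError("heap_max_bytes must be positive")
    return 100.0 * heap_used_bytes / heap_max_bytes


def classify_jvm_pressure(pressure_pct: float, minutes_above_92: float = 0.0) -> str:
    """Map a pressure percentage to the documented behavior (hypothetical helper).

    minutes_above_92 is an ASSUMED input: how long the node has stayed above 92%.
    """
    if pressure_pct >= 100:
        return "oom-exit"        # JVM exits on OutOfMemory and eventually restarts
    if pressure_pct > 92 and minutes_above_92 >= 30:
        return "writes-blocked"  # all write operations are blocked
    if pressure_pct >= 75:
        return "cms-gc"          # CMS garbage collection runs (CPU-intensive)
    return "normal"


if __name__ == "__main__":
    # 24 GiB used out of a 32 GiB heap is exactly 75% pressure.
    pct = jvm_memory_pressure(24 * 2**30, 32 * 2**30)
    print(pct, classify_jvm_pressure(pct))  # → 75.0 cms-gc
```

The 92% level depends on duration as well as the percentage. A brief spike above 92% is therefore still classified as `cms-gc` until the 30-minute window is reached.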

The following issues can cause high JVM memory pressure:

  • Spikes in the number of requests to the cluster.
  • Queries that use aggregations, wildcards, or wide time ranges.
  • Unbalanced shard allocations across nodes or too many shards in a cluster.
  • Field data or index mapping explosions.
  • Instance types that can't handle incoming loads.
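To check a cluster for the shard-related causes listed above, you can count shards per node from the `GET _cat/shards?format=json` response. The following is a minimal Python sketch under several stated assumptions:

  • The sample data is hypothetical.
  • The heap estimate assumes Amazon ES gives half of an instance's RAM to the Java heap, up to 32 GiB.
  • The ceiling of 20 shards per GiB of heap is a commonly cited sizing guideline, not a limit that Amazon ES enforces.
  • The 1.5x imbalance ratio is an arbitrary illustrative threshold.

The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical, trimmed sample of a `GET _cat/shards?format=json` response.
# Replace it with the real response from your domain.
cat_shards = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-a"},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-b"},
    {"index": "logs-2", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-a"},
    {"index": "logs-2", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-a"},
    {"index": "logs-3", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-a"},
    {"index": "logs-3", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-c"},
    {"index": "logs-4", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]


def shards_per_node(shards):
    """Count started shards on each node (unassigned shards have no node)."""
    return Counter(s["node"] for s in shards if s.get("state") == "STARTED")


def heap_gib_for_instance(ram_gib):
    """ASSUMPTION: the heap is half of the instance RAM, capped at 32 GiB."""
    return min(ram_gib / 2, 32)


def flag_nodes(counts, heap_gib, max_shards_per_gib=20, imbalance_ratio=1.5):
    """Flag nodes with too many shards for their heap, or far above the mean.

    ASSUMPTION: 20 shards per GiB of heap and a 1.5x imbalance ratio are
    illustrative thresholds, not limits enforced by Amazon ES.
    """
    if not counts:
        return {}
    limit = heap_gib * max_shards_per_gib
    mean = sum(counts.values()) / len(counts)
    flagged = {}
    for node, n in counts.items():
        reasons = []
        if n > limit:
            reasons.append("too many shards for heap")
        if n > mean * imbalance_ratio:
            reasons.append("unbalanced")
        if reasons:
            flagged[node] = reasons
    return flagged


if __name__ == "__main__":
    counts = shards_per_node(cat_shards)
    print(dict(counts))
    print(flag_nodes(counts, heap_gib_for_instance(16)))
```

With this sample, node-a holds four of the six started shards, so it is the only node flagged, and only as unbalanced. A 16 GiB instance leaves plenty of headroom under the assumed per-heap ceiling.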

To resolve high JVM memory pressure, reduce the traffic to the cluster, and follow these best practices:

  • Clear the field data cache with the POST /index_name/_cache/clear?fielddata=true API operation.
    Note: Clearing the cache can disrupt queries that are in progress.
  • Avoid aggregating on text fields, or change the field's mapping type to keyword.
  • Scale the domain so that the maximum heap size per node is 32 GB.
  • Enable slow logs to identify faulty requests.
    Note: Verify that the JVM memory pressure is below 90%. For more information about slow Elasticsearch queries, see Advanced tuning: finding and fixing slow Elasticsearch queries on the Elasticsearch website.
  • Optimize search or indexing by choosing the correct number of shards. For more information about indexing and shard count, see Get started with Amazon Elasticsearch Service: How many shards do I need?
  • Reduce the number of shards by deleting old or unused indices.
  • For advanced users, you can update the parent, fielddata, or request circuit breaker settings according to your use case. For more information about JVM circuit breakers, see JVM OutOfMemoryError.
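Several of the practices above come down to a few REST requests. The following Python sketch only builds and prints those requests in Kibana Dev Tools style; it does not send anything. All index names, the `status` field, and every threshold or limit value are hypothetical placeholders, not recommendations.

The sketch makes these assumptions:

  • It uses Elasticsearch 7.x typeless mapping syntax.
  • An existing field's type can't be changed in place. The keyword mapping therefore goes into a new index, and the data is then reindexed.
  • On Amazon ES, slow log thresholds take effect only after you also enable slow log publishing to CloudWatch Logs for the domain.
  • Support for individual circuit breaker settings can vary by domain version.

```python
import json
from typing import Optional, Tuple

# (HTTP method, path, optional JSON body)
Request = Tuple[str, str, Optional[dict]]


def clear_fielddata(index: str) -> Request:
    # Clearing the cache can disrupt queries that are in progress.
    return ("POST", f"/{index}/_cache/clear?fielddata=true", None)


def keyword_mapping(new_index: str, field: str) -> Request:
    # ASSUMPTION: 7.x typeless mappings; the field name is a placeholder.
    return ("PUT", f"/{new_index}", {"mappings": {"properties": {field: {"type": "keyword"}}}})


def reindex(source: str, dest: str) -> Request:
    # Copies documents into the index that has the corrected mapping.
    return ("POST", "/_reindex", {"source": {"index": source}, "dest": {"index": dest}})


def search_slowlog(index: str, warn: str = "5s", info: str = "2s") -> Request:
    # Placeholder thresholds; tune them to your own latency expectations.
    return ("PUT", f"/{index}/_settings", {
        "index.search.slowlog.threshold.query.warn": warn,
        "index.search.slowlog.threshold.query.info": info,
    })


def breaker_limits(fielddata: str, request: str, total: str) -> Request:
    # "total" is the parent circuit breaker; the values passed in are placeholders.
    return ("PUT", "/_cluster/settings", {"persistent": {
        "indices.breaker.fielddata.limit": fielddata,
        "indices.breaker.request.limit": request,
        "indices.breaker.total.limit": total,
    }})


def to_console(req: Request) -> str:
    """Render a request the way you would type it into Kibana Dev Tools."""
    method, path, body = req
    text = f"{method} {path}"
    if body is not None:
        text += "\n" + json.dumps(body, indent=2)
    return text


if __name__ == "__main__":
    for req in (
        clear_fielddata("index_name"),
        keyword_mapping("index_name_v2", "status"),
        reindex("index_name", "index_name_v2"),
        search_slowlog("index_name"),
        breaker_limits(fielddata="40%", request="60%", total="70%"),
    ):
        print(to_console(req), end="\n\n")
```

Keeping request construction separate from sending lets you review each change before you apply it. This matters most while JVM memory pressure is already high.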

For more information about how to troubleshoot high JVM memory pressure, see Why did my Elasticsearch node crash?