How do I troubleshoot slow logs in Amazon Elasticsearch Service?

Last updated: 2020-08-26

I enabled Elasticsearch Search Slow Logs for my Amazon Elasticsearch Service (Amazon ES) domain. However, I receive an error, or the slow logs don't appear in my Amazon CloudWatch log group. How do I resolve this?

Resolution

I receive an error when I try to set up slow logs

If your AWS account has already reached the limit of ten CloudWatch Logs resource policies for your Region, you receive the following error message in Amazon CloudWatch Logs:

"Unable to create the Resource Access Policy - You have reached the maximum limit for number of Resource Access Policies for CloudWatch Logs. Please select an existing policy and edit it or delete an older policy and try again."

To resolve this error message, create a resource policy that includes multiple log groups like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream"
      ],
      "Resource": [
        "ARN-Log-Group-1",
        "ARN-Log-Group-2",
        "ARN-Log-Group-3",
        "ARN-Log-Group-4"
      ]
    }
  ]
}

Note: The limit of ten CloudWatch Logs resource policies per Region can't be increased, which is why multiple log groups must share a single policy.
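A combined policy like the one above can also be generated programmatically. The following is a minimal Python sketch, not an official AWS tool: the function names build_resource_policy and put_policy are hypothetical, the ARNs are placeholders, and the upload step assumes that boto3 is installed and that your credentials allow logs:PutResourcePolicy.

```python
import json


def build_resource_policy(log_group_arns):
    """Return a CloudWatch Logs resource policy (as a JSON string)
    that grants Amazon ES write access to several log groups at once."""
    if not log_group_arns:
        raise ValueError("at least one log group ARN is required")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": ["logs:PutLogEvents", "logs:CreateLogStream"],
                # One statement lists every log group, so only one of the
                # ten available resource policies is consumed.
                "Resource": list(log_group_arns),
            }
        ],
    }
    return json.dumps(policy, indent=2)


def put_policy(policy_name, policy_document):
    """Upload the policy. Assumes boto3 is installed and configured;
    policy_name is whatever name you choose for the shared policy."""
    import boto3  # imported here so the builder works without boto3

    logs = boto3.client("logs")
    return logs.put_resource_policy(
        policyName=policy_name, policyDocument=policy_document
    )


if __name__ == "__main__":
    # Placeholder ARNs, as in the example policy above.
    print(build_resource_policy(["ARN-Log-Group-1", "ARN-Log-Group-2"]))
```

Keeping the builder separate from the upload step lets you review the generated JSON before you replace an existing policy.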

I don't see any slow logs being delivered

If you don't see your slow logs being delivered to CloudWatch, check your IAM policy or Amazon ES thresholds.

Because Amazon ES requires permission to write to CloudWatch Logs, you must have the proper IAM policy to log your queries. To update your IAM policy, navigate to Search Slow Logs, and then choose Select Setup. Your IAM policy should look like the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream"
      ],
      "Resource": "arn:aws:logs:us-east-1:588671893395:log-group:/aws/aes/domains/myes/search-logs:*"
    }
  ]
}

Also, be sure to set an appropriate timing threshold for your Amazon ES domain. For example, if all your requests complete before the set threshold, then logs aren't delivered to your log group.

You can also set individual index-level thresholds for each debug level (TRACE, DEBUG, INFO, and WARN). In the following example, which you can run from Kibana, the WARN threshold for the query phase of the YOURINDEXNAME index is set to ten seconds:

PUT /YOURINDEXNAME/_settings
{"index.search.slowlog.threshold.query.warn": "10s"}

Note: You can set TRACE to "0" milliseconds to log all queries for your Amazon ES domain. However, logging all queries can affect Amazon ES domain performance, because it is a resource-intensive operation.

Then, check your threshold using the following command:

GET /YOURINDEXNAME/_settings?pretty

Amazon ES then logs any queries that exceed the defined threshold.
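To see why a domain can have slow logs enabled and still deliver nothing, it can help to model the threshold check. The sketch below is an illustration only and not Amazon ES code. It assumes that a query is logged at the most severe level whose threshold it exceeds, and that a missing or negative threshold means the level is disabled. The function name slowlog_level and the sample durations are hypothetical.

```python
# Levels ordered from most to least severe.
LEVELS = ("warn", "info", "debug", "trace")


def slowlog_level(took_ms, thresholds_ms):
    """Return the level at which a query taking took_ms milliseconds
    would be logged, or None if it stays under every threshold.

    thresholds_ms maps level name -> threshold in milliseconds.
    A missing or negative value is treated as "disabled" (assumption).
    """
    for level in LEVELS:
        threshold = thresholds_ms.get(level, -1)
        if threshold >= 0 and took_ms > threshold:
            return level
    return None


if __name__ == "__main__":
    thresholds = {"warn": 10_000, "info": 5_000}
    for took in (12_000, 7_000, 3_000):
        print(took, "ms ->", slowlog_level(took, thresholds))
```

In this model, a three-second query against a five-second INFO threshold produces no log entry, which matches the symptom of an empty log group when every request completes before the threshold.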

Best practices

  • Avoid making multiple configuration changes (such as enabling or disabling logs that are published to CloudWatch) at the same time. Too many configuration changes at one time trigger multiple blue/green deployments. Multiple blue/green deployments can cause the Amazon ES domain to get stuck in a processing state. For more information about blue/green deployment, see Configuration changes.
  • Set your threshold for both the query phase and fetch phase to identify slow search queries.
  • Test with a low threshold value, and slowly increase the threshold to log only the queries that are affecting performance or requiring optimization.
  • Choose the appropriate number of shards for your Elasticsearch cluster to optimize cluster performance. For more information about shard maintenance, see Amazon Elasticsearch Service best practices.
  • For slow logs, enable logging at the TRACE, DEBUG, INFO, and WARN debug levels. Because each debug level logs different categories of information, it's a best practice to enable logging according to the request status.
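The threshold-related practices above (cover both the query and fetch phases, and set a threshold for each debug level) can be combined into one settings request. The following Python sketch only builds the request text so that you can paste it into Kibana. The function names and the sample threshold values are hypothetical, so choose values that fit your own workload.

```python
import json

ALLOWED_LEVELS = ("warn", "info", "debug", "trace")


def build_slowlog_settings(query_thresholds, fetch_thresholds):
    """Build an index settings body with search slow log thresholds
    for both the query phase and the fetch phase.

    Each argument maps a level name to a time value such as "10s".
    """
    settings = {}
    phases = (("query", query_thresholds), ("fetch", fetch_thresholds))
    for phase, thresholds in phases:
        for level, value in thresholds.items():
            if level not in ALLOWED_LEVELS:
                raise ValueError(f"unknown level: {level}")
            key = f"index.search.slowlog.threshold.{phase}.{level}"
            settings[key] = value
    return settings


def render_request(index_name, settings):
    """Render the request in the console format used in this article."""
    return f"PUT /{index_name}/_settings\n{json.dumps(settings, indent=2)}"


if __name__ == "__main__":
    # Sample values only: start low, then raise them gradually.
    query = {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"}
    fetch = {"warn": "1s", "info": "800ms", "debug": "500ms", "trace": "200ms"}
    print(render_request("YOURINDEXNAME", build_slowlog_settings(query, fetch)))
```

Because all thresholds go into a single request, the change is applied in one update to the index settings rather than as a series of separate calls.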