How do I troubleshoot slow logs in Amazon OpenSearch Service?

Last updated: 2021-07-30

I enabled search slow logs for my Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) domain. However, I receive an error, or the slow logs don't appear in my Amazon CloudWatch log group. How do I resolve this?

Resolution

I receive an error when I try to set up slow logs

If your AWS account has reached the limit of ten resource policies per Region in Amazon CloudWatch Logs, you receive the following error message:

"Unable to create the Resource Access Policy - You have reached the maximum limit for number of Resource Access Policies for CloudWatch Logs. Please select an existing policy and edit it or delete an older policy and try again."

To resolve this error message, create a resource policy that includes multiple log groups.

For example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "es.amazonaws.com"
            },
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogStream"
            ],
            "Resource": [
                "ARN-Log-Group-1",
                "ARN-Log-Group-2",
                "ARN-Log-Group-3",
                "ARN-Log-Group-4"
            ]
        }
    ]
}

Note: The limit of ten CloudWatch Logs resource policies per Region can't be increased.

I don't see any slow logs being delivered

If you don't see your slow logs being delivered to CloudWatch, check your IAM policy or OpenSearch Service thresholds.

Because OpenSearch Service requires permission to write to CloudWatch Logs, you must have the proper IAM policy to log your queries. To update your IAM policy, navigate to Search Slow Logs, and then choose Select Setup.

For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream"
      ],
      "Resource": "arn:aws:logs:us-east-1:588671893395:log-group:/aws/aes/domains/myes/search-logs:*"
    }
  ]
}

Also, make sure to set an appropriate timing threshold for your domain. For example, if all your requests complete before the set threshold, then no slow logs are generated, and nothing is delivered to your log group.

You can also set individual index-level thresholds for each log level (TRACE, DEBUG, INFO, and WARN).

For example, you can set the threshold for the WARN log level to ten seconds for the YOURINDEXNAME index in OpenSearch Dashboards:

PUT /YOURINDEXNAME/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s"
}

Note: You can set the TRACE threshold to zero milliseconds ("0ms") to log all queries for the index. However, because logging all queries is resource-intensive, your domain performance might be impacted.
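
For instance, a request like the following would log every query on the index at the TRACE level. Treat it as a sketch for short-lived troubleshooting: the setting name follows the same pattern as the WARN example, and YOURINDEXNAME is a placeholder for your own index:

PUT /YOURINDEXNAME/_settings
{
  "index.search.slowlog.threshold.query.trace": "0ms"
}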

Then, check your threshold using the following command:

GET /YOURINDEXNAME/_settings?pretty

OpenSearch Service logs any queries that exceed the defined threshold.

Best practices

  • Avoid making multiple configuration changes (such as enabling or disabling logs that are published to CloudWatch) at the same time. Too many configuration changes at one time trigger multiple blue/green deployments. Multiple blue/green deployments can cause the OpenSearch Service domain to get stuck in a processing state. For more information about blue/green deployment, see Making configuration changes in OpenSearch Service.
  • Set your threshold for both the query phase and fetch phase to identify slow search queries.
  • Test with a low threshold value, and slowly increase the threshold to log only the queries that are affecting performance or requiring optimization.
  • Choose the appropriate number of shards for your cluster to optimize cluster performance. For more information about shard maintenance, see Amazon OpenSearch Service best practices.
  • For slow logs, enable logging at the TRACE, DEBUG, INFO, and WARN log levels. Because each log level captures a different category of information, it's a best practice to set a separate threshold for each level according to how severe you consider a slow request.
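
The threshold-related practices above could be combined into a single settings request. The following is a minimal sketch, not a recommendation: YOURINDEXNAME is a placeholder, and every time value is an illustrative assumption to replace with values suited to your workload. It sets a query-phase and a fetch-phase threshold at each of the four log levels:

PUT /YOURINDEXNAME/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.search.slowlog.threshold.fetch.trace": "200ms"
}

With a layout like this, a query that spends six seconds in the query phase would be logged at the INFO level, because it exceeds the 5s threshold but not the 10s WARN threshold.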