How do I resolve an HTTP 503 Service Unavailable error in Amazon OpenSearch Service?
Last updated: 2021-07-30
When I query my Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) domain, I get an HTTP 503 Service Unavailable error. How do I resolve this error?
A load balancer sits in front of each OpenSearch Service domain. The load balancer distributes incoming traffic to the data nodes. An HTTP 503 error indicates that one or more data nodes in the cluster are overloaded. When a node is overloaded by expensive queries or incoming traffic, it doesn't have enough capacity to handle any other incoming requests.
Note: You can use the RequestCount metric in Amazon CloudWatch to track HTTP response codes.
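As a rough illustration of tracking server-side errors in CloudWatch, the following sketch sums 5xx responses for a domain over a recent time window. It assumes that the domain publishes a 5xx metric in the AWS/ES namespace with DomainName and ClientId (the AWS account ID) dimensions, and that cloudwatch_client is a client created with boto3.client("cloudwatch"). The function names and default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_5xx_query(domain_name, account_id, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Assumption: the domain reports a "5xx" metric in the "AWS/ES"
    namespace, keyed by DomainName and ClientId (the AWS account ID).
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "5xx",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Sum"],
    }


def count_5xx(cloudwatch_client, domain_name, account_id, hours=1):
    """Return the total number of 5xx responses in the time window."""
    response = cloudwatch_client.get_metric_statistics(
        **build_5xx_query(domain_name, account_id, hours)
    )
    # Each datapoint holds the sum for one period; add them up.
    return sum(point["Sum"] for point in response["Datapoints"])
```

A total that rises together with traffic or heavy queries suggests that the errors come from overloaded nodes rather than from a one-off failure.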
Use one of the following methods to resolve HTTP 503 errors:
Provision more compute resources
- Scale up your domain by switching to larger instances, or scale out by adding more nodes to the cluster. For more information, see Creating and managing Amazon OpenSearch Service domains.
- Confirm that you are using an instance type that is appropriate for your use case. For more information, see Choosing instance types and testing.
Reduce the resource utilization for your queries
- Confirm that you are following best practices for shard and cluster architecture. A poorly designed cluster can't use all available resources. Some nodes might be overloaded while other nodes sit idle. OpenSearch Service can't fetch documents from overloaded nodes. For more information about shard and cluster best practices, see Get started with Amazon OpenSearch Service: How many shards do I need?
- Reduce the number of concurrent requests to the domain.
- Reduce the scope of your query. For example, if you run a query for a specific time frame, reduce the date range. You can also filter the results by configuring the index pattern in OpenSearch Dashboards.
- Avoid running select * queries on large indices. Instead, use filters to query a part of the index and search as few fields as possible.
- Re-index and reduce the number of shards. The more shards you have in your cluster, the more likely you are to get a Courier Fetch error. Because each shard has its own resource allocation and overhead, a large number of shards can strain your cluster. To lower your shard count, see Why is my Amazon OpenSearch Service domain stuck in the "Processing" state?
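Two of the suggestions above, narrowing the query scope and limiting concurrent requests, can be sketched on the client side as follows. This is a minimal example under stated assumptions: the @timestamp field name, the function names, and the default values are hypothetical, and send stands for whatever callable your application uses to submit a search body to the domain.

```python
from concurrent.futures import ThreadPoolExecutor


def build_scoped_search(start, end, fields, size=100):
    """Build a search body limited to a time range and a few fields.

    Assumption: documents carry a date field named "@timestamp".
    """
    return {
        "size": size,
        # Return only the listed fields instead of whole documents.
        "_source": fields,
        "query": {
            "bool": {
                # A filter clause narrows the documents without scoring them.
                "filter": [
                    {"range": {"@timestamp": {"gte": start, "lte": end}}}
                ]
            }
        },
    }


def run_searches(send, bodies, max_concurrent=2):
    """Send search bodies with at most max_concurrent requests in flight.

    `send` is a caller-supplied callable that takes one search body and
    returns its result. Results come back in the order of `bodies`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(send, bodies))
```

For example, build_scoped_search("now-1h", "now", ["status", "path"]) limits a search to the last hour and two fields. Lowering max_concurrent trades client throughput for less pressure on the data nodes.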