How can I prevent HTTP 503 Service Unavailable errors in Amazon Elasticsearch Service?
Last updated: 2019-08-23
When I query my Amazon Elasticsearch Service (Amazon ES) domain, I get an HTTP 503 Service Unavailable error.
Note: There are many possible causes of this error. This article covers common root causes and solutions.
A load balancer sits in front of each Amazon ES domain. The load balancer distributes incoming traffic to the data nodes. An HTTP 503 error indicates that one or more data nodes in the cluster are overloaded and don't have enough capacity to handle the request. This situation is often caused by excessive incoming traffic or expensive queries.
Tip: You can use the RequestCount Amazon CloudWatch metric to track HTTP response codes.
Use one or more of the following methods to resolve HTTP 503 errors:
Provision more compute resources
- Scale up your domain by switching to larger instances, or scale out by adding more nodes to the cluster. For more information, see Configuring Amazon ES Domains.
- Confirm that you are using an instance type that is appropriate for your use case. For more information, see Choosing Instance Types and Testing.
Reduce the resource utilization for your queries
- Confirm that you are following best practices for shard and cluster architecture. A poorly designed cluster can't use all available resources. Some nodes might be overloaded while other nodes sit idle. Elasticsearch can't fetch documents from overloaded nodes.
- Reduce the number of concurrent requests to the domain.
- Reduce the scope of your query. For example, if you query by time frame, narrow the date range, or filter the results by configuring the index pattern in Kibana.
- Avoid running "select *"-style queries (queries that match every document and return every field) on large indices. Instead, use filters to query part of the index, and search as few fields as possible.
- Reindex and reduce the number of shards. The more shards you have in your Elasticsearch cluster, the more likely you are to get a courier fetch error in Kibana. Because each shard has its own resource allocation and overhead, a large number of shards strains the cluster. To reduce the number of shards in your cluster, see My Amazon Elasticsearch Service domain has been stuck in the Processing state for a long time.
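Two of the suggestions above, narrowing the scope of a query and limiting concurrent requests, can be sketched in client code. The following is a minimal illustration rather than a recommended implementation. The helper names build_scoped_query and run_bounded, the @timestamp field name, and the default limits are assumptions chosen for this example. The send argument stands in for whatever function your application uses to submit a search body to the domain.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_scoped_query(start, end, fields, size=100, time_field="@timestamp"):
    """Build a search body limited to a time range and a few returned fields.

    time_field defaults to "@timestamp", which is an assumption about the
    index mapping; change it to match your own documents.
    """
    return {
        "size": size,             # cap the number of hits returned
        "_source": list(fields),  # return only the fields you need
        "query": {
            "bool": {
                # Clauses under "filter" run in filter context (no scoring).
                "filter": [
                    {"range": {time_field: {"gte": start, "lt": end}}}
                ]
            }
        },
    }


def run_bounded(send, bodies, max_in_flight=4):
    """Call send(body) for each body, with at most max_in_flight calls at once.

    send is a hypothetical callable supplied by the caller, for example a
    wrapper around the HTTP client that talks to the domain endpoint.
    Results are returned in the same order as bodies.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, bodies))
```

For example, build_scoped_query("now-15m", "now", ["status", "path"]) produces a body that looks only at the last 15 minutes and returns two fields per hit. Passing a list of such bodies to run_bounded with a small max_in_flight value keeps the client from sending every request to the domain at the same time.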