How can I improve indexing performance on my Elasticsearch cluster?

Last updated: 2019-08-23

I want to optimize indexing operations in Amazon Elasticsearch Service (Amazon ES) so that I can get maximum ingestion throughput.


Use one or more of the following methods to improve indexing performance on a cluster in Amazon ES.

Be sure that the shards for the index you're ingesting into are distributed evenly across the data nodes

Use the following formula to confirm that the shards are distributed evenly:

Number of shards for index = k * (number of data nodes), where k is the number of shards per node

For example, if there are 24 shards in the index, and there are eight data nodes, you should have three shards per node. For more information, see Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?
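The formula above can be checked with a quick shell calculation. The shard and node counts below are the example values from this article; substitute your own:

```shell
# Example values: 24 primary shards spread across 8 data nodes.
shards=24
nodes=8

if [ $((shards % nodes)) -eq 0 ]; then
  # Shard count is a multiple of the node count, so shards distribute evenly.
  echo "Even distribution: $((shards / nodes)) shard(s) per node"
else
  echo "Uneven: choose a shard count that is a multiple of $nodes"
fi
```

If the shard count isn't a multiple of the node count, some nodes carry more shards than others and become ingestion hot spots.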

Increase refresh_interval to 60 seconds or more

Refreshing your Elasticsearch index makes your documents available for search. Although this is a lightweight operation, refreshing your index requires resources that would otherwise be used by the indexing threads.

The default refresh interval is one second. When you increase the refresh interval, the data node performs refresh operations less frequently, leaving more resources for indexing. The longer the refresh interval is, the faster indexing is. Increasing the refresh interval also helps prevent 429 errors.
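You can change the refresh interval with the index settings API. This is a sketch against a live cluster; es-endpoint and index-name are placeholders for your domain endpoint and index:

```shell
# Set the refresh interval to 60 seconds (the default is 1s).
# es-endpoint and index-name are placeholders.
curl -XPUT 'es-endpoint/index-name/_settings?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "60s"}}'
```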

Change replica count to zero

If you're anticipating an hour or two of heavy indexing, consider setting index.number_of_replicas to 0. Each replica duplicates the indexing process, so disabling replicas improves performance. When indexing is over, enable replicas again.

Important: If a node fails while replicas are disabled, you might lose data. Only disable replicas if you can tolerate data loss for an hour or two.
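The replica count is a dynamic index setting, so you can toggle it around the heavy-indexing window. As above, es-endpoint and index-name are placeholders:

```shell
# Disable replicas before the heavy indexing begins.
# es-endpoint and index-name are placeholders.
curl -XPUT 'es-endpoint/index-name/_settings?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'

# When indexing is over, restore the replica count (1 is the default).
curl -XPUT 'es-endpoint/index-name/_settings?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'
```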

Experiment to find the optimal bulk request size

Start with a bulk request size of 5–15 MiB. Then, slowly increase the request size until indexing performance stops improving. For more information, see Using and Sizing Bulk Requests in the Elasticsearch documentation.

Note: Some instance types limit bulk requests to 10 MiB. For more information, see Network Limits.
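One way to stay under that limit is to check the byte size of each bulk payload before you send it. The NDJSON file below is a minimal illustration with the sample documents from this article; a real payload would contain many more action/document line pairs:

```shell
# Build a small bulk payload in NDJSON format
# (an action line followed by a document line).
printf '%s\n' \
  '{ "index" : { "_index" : "test2", "_id" : "1" } }' \
  '{ "user" : "testuser" }' > bulk.ndjson

# Check the payload size in bytes before sending. Some instance types
# reject bulk requests larger than 10 MiB (10485760 bytes).
size=$(wc -c < bulk.ndjson)
echo "Payload size: $size bytes"

# To send it (es-endpoint is a placeholder):
#   curl -XPOST 'es-endpoint/_bulk' \
#     -H 'Content-Type: application/json' --data-binary @bulk.ndjson
```

If the payload exceeds your target size, split it into smaller files, keeping each action line and its document line together in the same request.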

Use an instance type that has SSD instance store volumes, such as I3

I3 instances provide fast, local, non-volatile memory express (NVMe) storage. These instances deliver significantly better ingestion performance than instances that use General Purpose SSD (gp2) Amazon Elastic Block Store (Amazon EBS) volumes. For more information, see Run Petabyte-Scale Clusters on Amazon Elasticsearch Service Using I3 instances.

Reduce response size

To reduce the size of the Elasticsearch response, use the filter_path parameter to exclude fields that you don't need. Be sure that you don't filter out fields that you need to identify or retry failed requests. These fields vary by client.

In the following example, the _index, _type, and took fields are excluded from the response:

curl -X POST "es-endpoint/index-name/type-name/_bulk?pretty&filter_path=-took,-items.index._index,-items.index._type" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test2", "_id" : "1" } }
{ "user" : "testuser" }
{ "update" : {"_id" : "1", "_index" : "test2"} }
{ "doc" : {"user" : "example"} }
'

For more information, see Reducing Response Size.

Increase the value of index.translog.flush_threshold_size

By default, index.translog.flush_threshold_size is set to 512 MB. This means that the translog is flushed when it reaches 512 MB. The heavier the indexing load, the more often the translog is flushed. When you increase index.translog.flush_threshold_size, the node performs this expensive operation less frequently. This usually improves indexing performance. Another benefit to increasing the size is that the cluster creates a few large segments instead of many small segments. Large segments merge less often, which means more threads are used for indexing instead of merging.

The downside to increasing index.translog.flush_threshold_size is that translog flushes take longer. If a shard fails, recovery takes longer, because the translog is bigger.

Before increasing index.translog.flush_threshold_size, call the following API operation to get current flush operation statistics. Replace these values in the example:

  • es-endpoint: your Elasticsearch cluster endpoint
  • index-name: the name of your index
$ curl 'es-endpoint/index-name/_stats/flush?pretty'

In the output, note the number of flushes and the total time. In the following example, there were 124 flushes, which took 17690 milliseconds:

"flush" { "total" : 124, "total_time_in_millis" : 17690 }

To increase the flush threshold size, call the following API operation. In this example, the flush threshold size is set to 1024 MB, which is ideal for instances with more than 32 GB of memory. Choose the threshold size that works best for your use case.

$ curl -XPUT 'es-endpoint/index-name/_settings?pretty' -H 'Content-Type: application/json' -d '{"index":{"translog.flush_threshold_size" : "1024MB"}}'

Run the _stats API operation again to see how flush activity changed:

$ curl 'es-endpoint/index-name/_stats/flush?pretty' 

Note: It's a best practice to start by increasing index.translog.flush_threshold_size for the current index only. After you confirm that you're getting the desired results, apply the changes to the index template.

Disable the _all field

The _all field concatenates the values of all other fields into one string. It requires more CPU and disk space than other fields. Most use cases don't require the _all field. You can concatenate multiple fields using the copy_to parameter.
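If you need a combined search field, copy_to gives you the same effect as _all for only the fields you choose. The following mapping is a sketch; the field names (first_name, last_name, full_name) and the es-endpoint, index-name, and type-name values are placeholders:

```shell
# Hypothetical mapping that copies two fields into a single searchable
# full_name field, instead of relying on the _all field.
curl -XPUT 'es-endpoint/index-name?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "mappings": {
      "type-name": {
        "properties": {
          "first_name": { "type": "text", "copy_to": "full_name" },
          "last_name":  { "type": "text", "copy_to": "full_name" },
          "full_name":  { "type": "text" }
        }
      }
    }
  }'
```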

The _all field is disabled by default in Elasticsearch versions 6.0 and later. To disable the _all field in earlier versions, set enabled to false. Example:

curl -XPUT <es-endpoint>/<index-name>?pretty -H 'Content-Type: application/json' -d '{"mappings" : {"type-name" : {"_all": {"enabled": false}}}}'

In addition to disabling the _all field, you can also prune the _source field. This is recommended only for advanced users. For more information, see _source field in the Elasticsearch documentation.
