Why is the Deleted Documents metric so high in my Amazon Elasticsearch Service cluster?

Last updated: 2020-07-14

I've deleted documents in my Amazon Elasticsearch Service (Amazon ES) cluster, but I don't see any disk space reclaimed. How do I free up more disk space?

Short description

In Amazon ES, the DeletedDocuments metric is a counter that shows the number of documents that are marked for deletion. The metric increases after delete requests are processed, and it decreases after the index segments that contain those documents are merged within your Elasticsearch cluster.

During routine cleanup, Amazon ES automatically merges index segments in the background. During a merge, the live documents from the existing segments are written into a new segment, and documents that are marked for deletion are left out. Existing segments are never modified in place, so new write requests go into new segments. The disk space that deleted documents use is reclaimed only after the segments that contain them are merged and the old segments are removed.

To maintain the index metadata while reclaiming disk space in your Elasticsearch cluster, consider the following approaches:

  • Check the number of deleted documents.
  • Confirm the size of your documents.
  • Expunge the deleted documents.
  • Reduce the number of documents in your Elasticsearch cluster.
  • Add storage space to your Amazon ES domain.

To reclaim disk space immediately, you can also delete an index instead of deleting individual documents. Deleting an index doesn't create any delete markers. Therefore, the disk space is immediately reclaimed.

Resolution

Check the number of deleted documents

To check the number of deleted documents in your Elasticsearch cluster, run the cluster stats API. The value obtained from the cluster stats API call appears in the DeletedDocuments metric for your Elasticsearch cluster.

The output returns a summation of deleted documents for all the indices present in the Elasticsearch cluster. This count can be checked using the "docs.deleted" field in the response output.

For example, if your cluster has three indices (index1, index2, and index3), you can run the index stats API call:

GET index1/_stats
…
"docs" : {
        "count" : 100,
        "deleted" : 1
      }
… 
GET index2/_stats
…
"docs" : {
        "count" : 100,
        "deleted" : 5
      }
… 
GET index3/_stats
…
"docs" : {
        "count" : 100,
        "deleted" : 8
      }
… 

The cluster stats API call then sums the "docs.deleted" values for all indices that are present in your Elasticsearch cluster:

…
"docs" : {
      "count" : 300,
      "deleted" : 14
    } 
…

If you delete index2, the cluster stats API call calculates only the values for index1 and index3:

GET _cluster/stats
…
"docs" : {
      "count" : 200,
      "deleted" : 9 
    } 

Deleting index2 removes its segments and its index metadata, so its deleted documents are no longer counted. As a result, the DeletedDocuments metric value decreases to 9.
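The arithmetic in this example can be reproduced with a short Python sketch. It assumes that each index stats response has already been parsed into a dictionary and that the count is read from the `_all.primaries.docs` path of the response (the snippets above elide the surrounding structure, so that path is an assumption). The `fake_stats` helper is hypothetical and only builds the part of the response that the sketch reads.

```python
from typing import Dict, Iterable


def deleted_docs(index_stats: Dict) -> int:
    """Return docs.deleted for one parsed index stats response."""
    # Assumption: the value is read from the primary-shard totals.
    return index_stats["_all"]["primaries"]["docs"]["deleted"]


def total_deleted(all_stats: Iterable[Dict]) -> int:
    """Sum docs.deleted across several index stats responses."""
    return sum(deleted_docs(stats) for stats in all_stats)


def fake_stats(count: int, deleted: int) -> Dict:
    # Hypothetical helper: mimics only the fields used in this sketch.
    return {"_all": {"primaries": {"docs": {"count": count, "deleted": deleted}}}}


stats_by_index = {
    "index1": fake_stats(100, 1),
    "index2": fake_stats(100, 5),
    "index3": fake_stats(100, 8),
}

print(total_deleted(stats_by_index.values()))  # 14

# Deleting index2 removes its contribution to the total.
del stats_by_index["index2"]
print(total_deleted(stats_by_index.values()))  # 9
```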

Confirm the size of your documents

To check the document count and the storage size for an index, use the cat indices API. When you replace a document, be sure that the new document is about the same size as the existing document in your Elasticsearch cluster. Keeping the document size the same makes sure that deleted documents don't take up additional disk space over time. Meanwhile, Amazon ES works in the background to free up disk space by merging the segments and automatically removing any deleted documents.
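As a rough illustration, the following Python sketch summarizes rows returned by a request such as `GET _cat/indices?format=json&bytes=b&h=index,docs.count,docs.deleted,pri.store.size`. The sample row is made up, and the average document size is only an estimate: it divides the primary store size by the live and deleted documents together, because deleted documents still occupy space until their segments are merged.

```python
from typing import Dict, List


def summarize(rows: List[Dict[str, str]]) -> List[Dict]:
    """Estimate the deleted-document ratio and average document size per index."""
    summary = []
    for row in rows:
        # The cat API returns every value as a string, even with format=json.
        live = int(row["docs.count"])
        deleted = int(row["docs.deleted"])
        size_bytes = int(row["pri.store.size"])
        total = live + deleted
        summary.append({
            "index": row["index"],
            "deleted_ratio": deleted / total if total else 0.0,
            "avg_doc_bytes": size_bytes / total if total else 0.0,
        })
    return summary


# Hypothetical sample row, not real cluster output.
sample_rows = [
    {"index": "index1", "docs.count": "900", "docs.deleted": "100",
     "pri.store.size": "2000000"},
]

for entry in summarize(sample_rows):
    print(entry)
```

A high `deleted_ratio` suggests that an index is a candidate for expunging deleted documents.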

Expunge the deleted documents

To manually reclaim disk space, run the force merge API with the only_expunge_deletes parameter set to true:

POST /<index-name>/_forcemerge?only_expunge_deletes=true

When you run a force merge with this parameter, segments that contain deleted documents are merged into new segments, and Amazon ES expunges the deleted documents in the process. After the new segments are created, the old segments are removed. As a result, the force merge decreases the amount of disk space being used. For more information, see Lucene's handling of deleted documents on the Elasticsearch website.
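If you script this call, a minimal Python sketch for building the request might look like the following. The endpoint and index name are placeholders, and the sketch only constructs the request without sending it. Depending on your domain's access policy, the request may also need to be signed, which is not shown here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_forcemerge_request(endpoint: str, index: str) -> Request:
    """Build a POST request that expunges deleted documents from one index."""
    query = urlencode({"only_expunge_deletes": "true"})
    url = f"{endpoint.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
    return Request(url, method="POST")


# Placeholder endpoint and index name, for illustration only.
req = build_forcemerge_request(
    "https://search-example.us-east-1.es.amazonaws.com", "index1"
)
print(req.get_method(), req.full_url)
```

To send the request, you could pass `req` to `urllib.request.urlopen`. Keep in mind that the call can take a long time to return on large indices.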

Reduce the number of documents in your Elasticsearch cluster

To reduce the number of documents in your Elasticsearch cluster (without disabling any write operations), use force merge by itself. However, be aware of the following:

  • Perform force merge on your Elasticsearch cluster only when there is enough free storage space. The action is a resource-intensive operation.
  • The force merge operation triggers an I/O intensive process. The API call blocks until the merge is complete, and any new force merge requests against the same indices also block until the ongoing merge finishes.
  • Call the force merge operation only against read-only indices, when no additional data is being written to the index. If force merge is called against a read/write index, the action can produce very large segments (larger than 5 GB per segment). The automatic merge policy then doesn't consider these segments for future merges until the segments consist mostly of deleted documents. As a result, disk usage increases and search performance worsens.
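The considerations above can be turned into a simple pre-check. The Python sketch below is illustrative only: it assumes that you have parsed the response of `GET <index-name>/_settings` into a dictionary, that the index was made read-only by setting `index.blocks.write` to true, and that requiring free space at least equal to the index size is an acceptable safety margin. The function names and the margin are hypothetical, not an Amazon ES recommendation.

```python
from typing import Dict


def is_write_blocked(settings_response: Dict, index: str) -> bool:
    """Return True if index.blocks.write is set to true for the index."""
    index_settings = settings_response[index]["settings"]["index"]
    # Setting values are returned as strings, for example "true".
    return index_settings.get("blocks", {}).get("write") == "true"


def ok_to_force_merge(settings_response: Dict, index: str,
                      index_bytes: int, free_bytes: int) -> bool:
    """Hypothetical guard: index is read-only and enough free space exists."""
    # Assumption: a merge writes new segments before old ones are removed,
    # so require free space of at least the current index size.
    return is_write_blocked(settings_response, index) and free_bytes >= index_bytes


# Made-up settings response for illustration.
settings = {"index1": {"settings": {"index": {"blocks": {"write": "true"}}}}}

print(ok_to_force_merge(settings, "index1", index_bytes=10_000, free_bytes=50_000))
print(ok_to_force_merge(settings, "index1", index_bytes=10_000, free_bytes=5_000))
```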

You can also use the delete by query API or delete API to manually delete any documents in your Elasticsearch cluster.
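For example, a delete by query request needs a query body that selects the documents to remove. The Python sketch below builds such a body with a range query. The field name `@timestamp`, the 30-day cutoff, and the index name are assumptions chosen for illustration. Note that documents removed this way are only marked for deletion, so the DeletedDocuments metric rises until the affected segments are merged.

```python
import json


def build_delete_by_query(field: str, older_than: str) -> dict:
    """Build a body that matches documents older than the given age."""
    # Uses Elasticsearch date math, for example "now-30d".
    return {"query": {"range": {field: {"lt": f"now-{older_than}"}}}}


# Hypothetical field name and retention period.
body = build_delete_by_query("@timestamp", "30d")

print("POST /index1/_delete_by_query")
print(json.dumps(body, indent=2))
```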

Reclaim disk space immediately

To reclaim disk space immediately, use the delete index API. This deletes an existing index and helps to free up disk space.

Note: It's a best practice to delete old indices that aren't being used. If you are deleting an active index, be sure to block the automatic creation of indices. For more information, see Create indices automatically on the Elasticsearch website.