Why does the rollover index action in my ISM policy keep failing in Amazon Elasticsearch Service?

Last updated: 2021-07-08

I want to use Index State Management (ISM) to roll over my indices on my Amazon Elasticsearch Service (Amazon ES) cluster. However, my index fails to roll over, and I receive an error. Why is this happening and how do I resolve this?

Short description

If you received a "Failed to rollover index" error, your rollover action might have failed for one of the following reasons:

  • Rollover target doesn't exist.
  • Rollover alias is missing.
  • Index name doesn't match the index pattern.
  • Rollover alias is pointing to a duplicated alias in an index template.
  • Maximum resource utilization on an Amazon ES cluster.

To resolve this issue, use the explain API to identify the cause of your error. Then, check your ISM policy. For more information about setting up the rollover action in your ISM policy, see How do I use Index State Management (ISM) to manage low storage space in Amazon Elasticsearch Service?

Resolution

Using the explain API

To identify the root cause of your "Failed to rollover index" error, use the explain API:

GET _opendistro/_ism/explain/logs-000001?pretty

Here's an example output of the explain API:

{
  "logs-000001" : {
    "index.opendistro.index_state_management.policy_id" : "rollover-workflow",
    "index" : "logs-000001",
    "index_uuid" : "JUWl2CSES2mWYXqpJJ8qlA",
    "policy_id" : "rollover-workflow",
    "policy_seq_no" : 2,
    "policy_primary_term" : 1,
    "rolled_over" : false,
    "state" : {
      "name" : "open",
      "start_time" : 1614738037066
    },
    "action" : {
      "name" : "rollover",
      "start_time" : 1614739372091,
      "index" : 0,
      "failed" : true,
      "consumed_retries" : 0,
      "last_retry_time" : 0
    },
    "retry_info" : {
      "failed" : false,
      "consumed_retries" : 0
    },
    "info" : {
      "cause" : "rollover target [rolling-indices] does not exist",
      "message" : "Failed to rollover index [index=logs-000001]"
    }
  }
}

This example output indicates that the index failed to roll over because the target rollover alias (rolling-indices) doesn't exist.
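When many indices are managed, it can help to pull the failure details out of the explain response programmatically. The following is a minimal Python sketch, assuming the response has already been fetched and parsed into a dictionary shaped like the example output above. The function name failed_actions is hypothetical and not part of any ISM client.

```python
def failed_actions(explain_response):
    """Map each index whose current ISM action failed to (message, cause)."""
    failures = {}
    for index, meta in explain_response.items():
        # Skip any top-level entry that is not per-index metadata.
        if not isinstance(meta, dict):
            continue
        action = meta.get("action") or {}
        if action.get("failed"):
            info = meta.get("info") or {}
            failures[index] = (info.get("message"), info.get("cause"))
    return failures


# Trimmed copy of the example explain output shown above.
sample = {
    "logs-000001": {
        "policy_id": "rollover-workflow",
        "action": {"name": "rollover", "failed": True},
        "info": {
            "cause": "rollover target [rolling-indices] does not exist",
            "message": "Failed to rollover index [index=logs-000001]",
        },
    }
}

for index, (message, cause) in failed_actions(sample).items():
    print(f"{index}: {message} ({cause})")
```

The "cause" string is the part to match against the sections that follow.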

Rollover target doesn't exist

If the explain API returns the cause as "rollover target [rolling-indices] does not exist", then check whether the index was bootstrapped with the rollover alias:

GET _cat/aliases

The output lists all the current aliases in the cluster and their associated indices. If ISM indicates that your rollover target doesn't exist, then the output doesn't show the rollover alias associated with the failed index.
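The same check can be scripted. This Python sketch assumes that the alias list was requested as JSON (GET _cat/aliases?format=json) and parsed into a list of rows, each with "alias" and "index" fields. The helper name and the sample rows are illustrative only.

```python
def indices_for_alias(cat_aliases_rows, alias):
    """Return the sorted index names that the given alias is attached to."""
    return sorted(
        row["index"] for row in cat_aliases_rows if row.get("alias") == alias
    )


# Hypothetical rows: the rollover alias from the explain output is absent.
rows = [
    {"alias": "other-alias", "index": "metrics-000001"},
    {"alias": "other-alias", "index": "metrics-000002"},
]

attached = indices_for_alias(rows, "rolling-indices")
if "logs-000001" not in attached:
    print("rolling-indices is not attached to logs-000001")
```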

To resolve the failed index association, attach the rollover alias to the index:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "logs-000001", "alias" : "rolling-indices" } }
    ]
}

After you attach the rollover alias, retry the rollover action on the managed index in Amazon ES:

POST _opendistro/_ism/retry/logs-000001

For more information, see Retry failed index on the Open Distro for Elasticsearch website.

When you retry the failed index, you might receive an "Attempting to retry" status message. If Amazon ES is attempting to retry, then wait for the next ISM cycle to run. ISM cycles run every 30 to 48 minutes. If the rollover action is successful, you receive the following message: "Successfully rolled over index".

Rollover alias is missing

If the explain API output identifies the cause of your rollover failure to be a missing rollover alias, then check the settings of the failed index:

GET <failed-index-name>/_settings

If you see that the index.opendistro.index_state_management.rollover_alias setting is missing, then manually add the setting to your index:

PUT /<failed-index-name>/_settings
{
  "index.opendistro.index_state_management.rollover_alias":"<rollover-alias>"
}
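To check several indices for the missing setting, the settings response can be inspected in code. The sketch below assumes that the response of GET <failed-index-name>/_settings is already parsed into a dictionary, and it handles both the nested form and the flat form (flat_settings=true). The function name is hypothetical.

```python
ROLLOVER_KEY = "index.opendistro.index_state_management.rollover_alias"


def rollover_alias_setting(settings_response, index):
    """Return the rollover alias configured on the index, or None if missing."""
    settings = settings_response.get(index, {}).get("settings", {})
    # Flat form: the full dotted key appears directly under "settings".
    if ROLLOVER_KEY in settings:
        return settings[ROLLOVER_KEY]
    # Nested form: walk index -> opendistro -> index_state_management -> rollover_alias.
    node = settings
    for part in ROLLOVER_KEY.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Illustrative response in which only the policy ID is set.
response = {
    "logs-000001": {
        "settings": {
            "index": {
                "opendistro": {
                    "index_state_management": {"policy_id": "rollover-workflow"}
                }
            }
        }
    }
}

if rollover_alias_setting(response, "logs-000001") is None:
    print("rollover_alias is missing on logs-000001")
```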

Use the retry failed index API to retry the rollover operation on the failed index. While the rollover action is being retried, update your index template so that the rollover alias is also applied to newly created indices. Make sure to keep the other settings from your existing template. For example:

PUT _template/<existing-template>
{
  "index_patterns": ["<index-pattern*>"], 
  "settings": {
    "index.opendistro.index_state_management.policy_id": "<policy_id>",
    "index.opendistro.index_state_management.rollover_alias":"<rollover-alias>"
  }
}

Index name doesn't match the index pattern

If the explain API indicates that your rollover operation failed because your index name and index pattern don't match, then check the failed index's name. For successful rollovers, the index names must match the following regex pattern:

^.*-\d+$

This regex pattern means that an index name must end with a hyphen (-) followed by one or more digits, such as logs-000001. If the index name doesn't follow this pattern, and your first index has data written onto it, then consider reindexing the data. When you reindex the data, use the correct name for your new index. For example:

POST _reindex
{
  "source": {
    "index": "<failed-index>"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
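Before you choose a destination name, you can test candidate index names against the rollover naming pattern. This is a small Python sketch that uses the standard re module. The helper name is hypothetical.

```python
import re

# Rollover naming pattern: any text, a hyphen, then one or more digits at the end.
ROLLOVER_NAME = re.compile(r"^.*-\d+$")


def is_rollover_compatible(index_name):
    """Return True if the index name ends with a hyphen followed by digits."""
    return ROLLOVER_NAME.match(index_name) is not None


for name in ["logs-000001", "my-new-index-000001", "logs", "logs-2021.07.08"]:
    print(name, is_rollover_compatible(name))
```

Names such as logs-2021.07.08 fail the check because the characters after the last hyphen aren't all digits.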

While the reindex operation is running, detach the rollover alias from the failed index and add the alias to the new index. This allows the data source to continue writing incoming data to the new index. For example:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "<failed-index>", "alias" : "<rollover-alias>" } },
        { "add" : { "index" : "my-new-index-000001", "alias" : "<rollover-alias>" } }
    ]
}
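If you script this step, building both actions into a single _aliases request body keeps the removal and the addition in one call. The following Python sketch only constructs the body. The function name and argument values are illustrative, and sending the request is left to whichever HTTP client you use.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build an _aliases request body that moves an alias between two indices."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Placeholder names; substitute your own alias and index names.
body = swap_alias_body("my-rollover-alias", "old-index", "my-new-index-000001")
print(json.dumps(body, indent=2))
```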

Manually attach the ISM policy to the new index using the following API call:

POST _opendistro/_ism/add/my-new-index-*
{
  "policy_id": "<policy_id>"
}

Update the existing template to reflect the new index pattern name:

PUT _template/<existing-template>

Note: The index pattern in your template must match the names of the successive indices that the rollover creates. Otherwise, the ISM policy and rollover alias aren't applied to the new indices.

Rollover alias is pointing to a duplicated alias in an index template

If the explain API indicates that your index rollover failed because a rollover alias is pointing to a duplicated alias, then check your index template settings:

GET _template/<template-name>

Check whether your template contains an additional aliases section (with another alias that points to the same index):

"index_patterns": ["my-index*"],
"settings": {
  "index.opendistro.index_state_management.policy_id": "rollover-policy",
  "index.opendistro.index_state_management.rollover_alias": "rollover-alias"
},
"aliases": {
  "another_alias": {
    "is_write_index": true
  }
}

The presence of an additional alias confirms the reason for your rollover operation failure, because multiple aliases cause the rollover to fail. To resolve this failure, update the template settings without specifying any aliases:

PUT _template/<template-name>
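One way to prepare the body for this update is to start from the existing template definition and drop its aliases section. The Python sketch below assumes that the response of GET _template/<template-name> is parsed into a dictionary keyed by the template name. The function name and the sample template are hypothetical.

```python
import copy


def template_without_aliases(get_template_response, template_name):
    """Return a copy of the template definition with the aliases section removed."""
    body = copy.deepcopy(get_template_response[template_name])
    body.pop("aliases", None)  # no error if the section is already absent
    return body


# Illustrative template that carries an extra alias.
existing = {
    "my-template": {
        "index_patterns": ["my-index*"],
        "settings": {
            "index.opendistro.index_state_management.policy_id": "rollover-policy",
            "index.opendistro.index_state_management.rollover_alias": "rollover-alias",
        },
        "aliases": {"another_alias": {"is_write_index": True}},
    }
}

cleaned = template_without_aliases(existing, "my-template")
print(sorted(cleaned))
```

The original dictionary is left unchanged, so the two versions can be compared before the update is sent.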

Then, perform the retry API on the failed index:

POST _opendistro/_ism/retry/my-index-000001

Important: If an alias points to multiple indices, then make sure that only one index has write access enabled. The rollover API automatically enables write access for the new index that the rollover alias points to. This means that you don't need to set "is_write_index" on any alias in the template when ISM performs the rollover operation.

Maximum resource utilization on an Amazon ES cluster

The maximum resource utilization on your cluster could be caused by either a circuit breaker exception or lack of storage space.

Circuit breaker exception

If the explain API returns a circuit breaker exception, your cluster was likely experiencing high JVM memory pressure when the rollover API was called. To troubleshoot JVM memory pressure issues, see How do I troubleshoot high JVM memory pressure on my Amazon Elasticsearch Service cluster?

After the JVM memory pressure falls below 75%, you can retry the activity on the failed index with the following API call:

POST _opendistro/_ism/retry/<failed-index-name>

Note: You can use index patterns (*) to retry the activities on multiple failed indices.

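If you experience infrequent JVM spikes on your cluster, you can also update the ISM policy with the following retry block for the rollover action. The retry block sits inside the same action object as the rollover action:

"actions": [
    {
        "retry": {
            "count": 3,
            "backoff": "exponential",
            "delay": "10m"
        },
        "rollover": {
            <rollover-conditions>
        }
    }
]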

In your ISM policy, each action has an automated retry based on the count parameter. If your previous operation fails, check the "delay" parameter to see how long you'll need to wait for ISM to retry the action.
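To get a rough idea of how long the retries can take, the waits can be estimated from the "count" and "delay" values. The following Python sketch assumes that exponential backoff doubles the wait after each failed attempt, starting from the configured delay. That doubling rule is an assumption for illustration, and ISM acts only when its cycle runs, so actual waits can be longer.

```python
def retry_waits_minutes(count, delay_minutes):
    """Estimated wait before each retry, assuming the delay doubles every time."""
    return [delay_minutes * 2 ** attempt for attempt in range(count)]


# Values from the retry block above: count 3, delay 10m.
waits = retry_waits_minutes(3, 10)
print(waits)       # [10, 20, 40]
print(sum(waits))  # 70 minutes in total under the doubling assumption
```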

Lack of storage space

If your cluster is running out of storage space, Amazon ES triggers a write block on the cluster, which causes all write operations to return a ClusterBlockException. The ClusterIndexWritesBlocked metric shows a value of "1", indicating that the cluster is blocking requests. As a result, any attempt to create a new index fails. The explain API call also returns a 403 IndexCreateBlockException, indicating that the cluster is out of storage space. To troubleshoot the cluster block exception, see How do I resolve the 403 "index_create_block_exception" error in Amazon Elasticsearch Service?

After the ClusterIndexWritesBlocked metric returns to "0", retry the ISM action on the failed index. A write block can also be triggered when your JVM memory pressure exceeds 92% for more than 30 minutes. If that is the cause of your write block, then troubleshoot the JVM memory pressure instead of the storage space. For more information about how to troubleshoot JVM memory pressure, see How do I troubleshoot high JVM memory pressure on my Amazon Elasticsearch Service cluster?
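The two retry conditions in this section can be combined into one check before you call the retry API. This Python sketch assumes that you have already read the latest ClusterIndexWritesBlocked and JVM memory pressure values from your monitoring. The function and constant names are hypothetical.

```python
JVM_PRESSURE_RETRY_THRESHOLD = 75.0  # percent, from the circuit breaker section


def safe_to_retry(cluster_index_writes_blocked, jvm_memory_pressure):
    """Return True when writes are unblocked and JVM memory pressure is below 75%."""
    return (
        cluster_index_writes_blocked == 0
        and jvm_memory_pressure < JVM_PRESSURE_RETRY_THRESHOLD
    )


# Illustrative metric readings.
print(safe_to_retry(0, 62.5))  # True
print(safe_to_retry(1, 62.5))  # False: the cluster is still blocking writes
print(safe_to_retry(0, 93.0))  # False: JVM memory pressure is still too high
```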

