Why does the rollover index action in my ISM policy keep failing in Amazon OpenSearch Service?

Last updated: 2022-10-03

I want to use Index State Management (ISM) to roll over my indices on my Amazon OpenSearch Service cluster. However, my index fails to roll over, and I receive an error. Why is this happening and how do I resolve this?

Short description

If you received a "Failed to rollover index" error, your rollover action might have failed for one of the following reasons:

  • The rollover target doesn't exist.
  • The rollover alias is missing.
  • The index name doesn't match the index pattern.
  • The rollover alias is pointing to a duplicated alias in an index template.
  • You have maximum resource utilization on your cluster.

To resolve this issue, use the explain API to identify the cause of your error. Then, check your ISM policy. For more information about setting up the rollover action in your ISM policy, see How do I use ISM to manage low storage space in OpenSearch Service?

Note: The following resolution applies only to the OpenSearch API. For the legacy Open Distro API, refer to Open Distro's ISM API operations.

Resolution

Using the explain API

To identify the root cause of your "Failed to rollover index" error, use the explain API:

GET _plugins/_ism/explain/logs-000001?pretty

Here's an example output of the explain API:

{
     "logs-000001": {
          "index.plugins.index_state_management.policy_id": "rollover-workflow",
          "index": "logs-000001",
          "index_uuid": "JUWl2CSES2mWYXqpJJ8qlA",
          "policy_id": "rollover-workflow",
          "policy_seq_no": 2,
          "policy_primary_term": 1,
          "rolled_over": false,
          "state": {
               "name": "open",
               "start_time": 1614738037066
          },
          "action": {
               "name": "rollover",
               "start_time": 1614739372091,
               "index": 0,
               "failed": true,
               "consumed_retries": 0,
               "last_retry_time": 0
          },
          "retry_info": {
               "failed": false,
               "consumed_retries": 0
          },
          "info": {
               "cause": "rollover target [rolling-indices] does not exist",
               "message": "Failed to rollover index [index=logs-000001]"
          }
     }
}

This example output shows that the index failed to roll over because the target rollover alias (rolling-indices) doesn't exist.
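If you manage many indices, you might prefer to scan the explain output with a script. The following minimal Python sketch assumes that you've already fetched the response body of GET _plugins/_ism/explain/<index> and parsed it into a dict (the HTTP call isn't shown). The function name failed_rollovers is hypothetical. The function reports every index whose current action failed, along with the cause and message from the info block:

```python
from typing import Any, Dict, List, Tuple


def failed_rollovers(explain: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (index, cause, message) for every index whose current action failed.

    `explain` is the parsed JSON body of an ISM explain call. Top-level values
    that aren't per-index objects (for example, a managed-index count) are skipped.
    """
    failures = []
    for index, entry in explain.items():
        if not isinstance(entry, dict):
            continue  # not a per-index entry
        action = entry.get("action", {})
        if action.get("failed"):
            info = entry.get("info", {})
            failures.append((index, info.get("cause", ""), info.get("message", "")))
    return failures


# The sample output from this article, trimmed to the fields the function reads.
sample = {
    "logs-000001": {
        "action": {"name": "rollover", "failed": True},
        "info": {
            "cause": "rollover target [rolling-indices] does not exist",
            "message": "Failed to rollover index [index=logs-000001]",
        },
    }
}

for index, cause, message in failed_rollovers(sample):
    print(f"{index}: {cause}")
```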

The rollover target doesn't exist

If the explain API returns the cause as "rollover target [rolling-indices] does not exist", then check whether the index was bootstrapped with the rollover alias:

GET _cat/aliases

The output lists all the current aliases in the cluster and their associated indices. If the rollover alias isn't listed for the failed index, then either the alias itself or its association with the failed index is missing. That's why ISM reports that the rollover target doesn't exist.
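To run the same check in a script, you can request the output as JSON (GET _cat/aliases?format=json). This returns one object per alias-index pair, with alias and index fields. The following Python sketch assumes that shape. The helper name alias_attached and the sample response are hypothetical:

```python
import json
from typing import Any, Dict, List


def alias_attached(cat_aliases: List[Dict[str, Any]], alias: str, index: str) -> bool:
    """Return True if `alias` is associated with `index` in _cat/aliases JSON output."""
    return any(row.get("alias") == alias and row.get("index") == index for row in cat_aliases)


# Hypothetical response: the rollover alias exists, but only on a different index.
body = json.loads('[{"alias": "rolling-indices", "index": "logs-000000", "is_write_index": "false"}]')

print(alias_attached(body, "rolling-indices", "logs-000001"))  # False: the association is missing
```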

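To restore the missing association, attach the rollover alias to the index. The alias name must match the index's index.plugins.index_state_management.rollover_alias setting (rolling-indices in the example output):

POST /_aliases
{
     "actions": [{
          "add": {
               "index": "logs-000001",
               "alias": "rolling-indices"
          }
     }]
}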

After you attach the rollover alias, retry the rollover action on the managed index in OpenSearch Service:

POST _plugins/_ism/retry/logs-000001

For more information, see Retry failed index on the OpenSearch website.

When you retry the failed index, the explain API might return an "Attempting to retry" status message. This message means that the retry is pending, so wait for the next ISM cycle to run. ISM cycles run every 30 to 48 minutes. If the rollover action succeeds, then the explain API returns the message "Successfully rolled over index".
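Because the retry completes only on a later ISM cycle, you might want to poll the explain API instead of checking it by hand. The following Python sketch is illustrative only. fetch_explain is a hypothetical callable that you supply, for example a function that sends GET _plugins/_ism/explain/<index> with your HTTP client and returns the parsed JSON. The sketch assumes that a successful rollover shows up as "rolled_over": true or as a message containing "Successfully rolled over index". The 10-minute default interval is an arbitrary choice. Because ISM cycles are 30 to 48 minutes apart, polling more often doesn't speed anything up.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_rollover(
    fetch_explain: Callable[[str], Dict[str, Any]],
    index: str,
    interval_s: float = 600.0,
    max_checks: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll ISM explain output for `index` until its rollover succeeds.

    Returns the last info.message on success, or None after `max_checks` polls.
    """
    for check in range(max_checks):
        entry = fetch_explain(index).get(index, {})
        message = entry.get("info", {}).get("message", "")
        if entry.get("rolled_over") or "Successfully rolled over index" in message:
            return message
        if check < max_checks - 1:
            sleep(interval_s)  # the next ISM cycle may be 30 to 48 minutes away
    return None
```

The sleep parameter is injectable so that you can test the loop without waiting.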

The rollover alias is missing

If the explain API output identifies a missing rollover alias as the cause of your rollover failure, then check the settings of the failed index:

GET <failed-index-name>/_settings

If you see that the index.plugins.index_state_management.rollover_alias setting is missing, then manually add the setting to your index:

PUT /<failed-index-name>/_settings
{
     "index.plugins.index_state_management.rollover_alias" : "<rollover-alias>"
}
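If you request the settings in flat form (GET <failed-index-name>/_settings?flat_settings=true), the setting appears as a single dotted key, which makes it easy to check in a script. The following Python sketch assumes that response shape. missing_rollover_alias is a hypothetical helper name, and the sample response is made up for illustration:

```python
from typing import Any, Dict, List

ROLLOVER_ALIAS_KEY = "index.plugins.index_state_management.rollover_alias"


def missing_rollover_alias(settings_response: Dict[str, Any]) -> List[str]:
    """Return the names of indices whose flat settings lack the ISM rollover alias."""
    return [
        index
        for index, body in settings_response.items()
        if ROLLOVER_ALIAS_KEY not in body.get("settings", {})
    ]


# Hypothetical flat-settings response for two indices.
response = {
    "logs-000001": {"settings": {"index.number_of_shards": "1"}},
    "logs-000002": {"settings": {ROLLOVER_ALIAS_KEY: "rolling-indices"}},
}
print(missing_rollover_alias(response))  # ['logs-000001']
```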

Use the retry failed index API to retry the rollover operation on the failed index. While the rollover action is being retried, update your index template so that newly created indices also get the rollover alias setting:

PUT _index_template/<template-name>

Because this request replaces the whole template, include all the settings from your existing index template along with the rollover alias. For example:

PUT _index_template/<existing-template> 
{
     "index_patterns": [
          "<index-pattern*>"
     ],
     "template": {
          "settings": {
               "plugins.index_state_management.rollover_alias": "<rollover-alias>"
          }
     }
}
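Because a PUT to _index_template replaces the whole template, one way to avoid dropping existing settings is to start from the current definition and change only the rollover alias. The following Python sketch assumes that you've extracted the index_template object from the GET _index_template/<template-name> response, and that its settings use the flat dotted form shown in the example above. If your response nests the settings under an index object, adjust the lookup accordingly. with_rollover_alias is a hypothetical helper that returns a new request body and leaves its input unchanged:

```python
import copy
from typing import Any, Dict


def with_rollover_alias(index_template: Dict[str, Any], alias: str) -> Dict[str, Any]:
    """Return a copy of a composable index template with the ISM rollover alias set."""
    body = copy.deepcopy(index_template)  # don't mutate the caller's dict
    settings = body.setdefault("template", {}).setdefault("settings", {})
    settings["plugins.index_state_management.rollover_alias"] = alias
    return body


existing = {
    "index_patterns": ["logs-*"],
    "template": {"settings": {"number_of_replicas": 1}},
}
updated = with_rollover_alias(existing, "rolling-indices")
print(updated["template"]["settings"])
# {'number_of_replicas': 1, 'plugins.index_state_management.rollover_alias': 'rolling-indices'}
```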

The index name doesn't match the index pattern

If the explain API indicates that your rollover operation failed because your index name doesn't match the index pattern, then check the failed index's name. For a rollover to succeed, the index name must match the following regex pattern:

`^.*-\d+$`

This pattern means that the index name must end with a hyphen (-) followed by one or more digits, as in logs-000001. If the index name doesn't match this pattern and data has already been written to the index, then consider reindexing the data into a new index that has a compliant name. For example:

POST _reindex
{
     "source": {
          "index": "<failed-index>"
     },
     "dest": {
          "index": "my-new-index-000001"
     }
}
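You can check a candidate name against this pattern before you reindex. The following Python sketch uses the same regex. next_rollover_name is a hypothetical helper. It assumes that the rollover increments the trailing number and zero-pads it to six digits, which is how the underlying rollover API usually names the new index (for example, logs-000001 becomes logs-000002):

```python
import re

# The pattern that rollover index names must match (from this article).
ROLLOVER_NAME = re.compile(r"^.*-\d+$")


def is_rollover_name(index: str) -> bool:
    """Return True if `index` ends with a hyphen followed by one or more digits."""
    return ROLLOVER_NAME.match(index) is not None


def next_rollover_name(index: str) -> str:
    """Sketch of the next index name: increment the trailing number, zero-pad to 6 digits."""
    if not is_rollover_name(index):
        raise ValueError(f"{index!r} doesn't end with -<digits>")
    prefix, _, number = index.rpartition("-")
    return f"{prefix}-{int(number) + 1:06d}"


for name in ["my-new-index-000001", "my-index", "logs-2022.10.03"]:
    print(name, is_rollover_name(name))
print(next_rollover_name("my-new-index-000001"))  # my-new-index-000002
```

A date-style suffix such as logs-2022.10.03 fails the check because the characters after the last hyphen aren't all digits.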

While the reindex operation is running, move the rollover alias from the failed index to the new index so that your data source continues to write incoming data to the new index. You can put both actions in a single _aliases request, which applies them atomically. For example:

POST /_aliases
{
     "actions": [{
          "remove": {
               "index": "<failed-index>",
               "alias": "<rollover-alias>"
          }
     },
     {
          "add": {
               "index": "my-new-index-000001",
               "alias": "<rollover-alias>"
          }
     }]
}

Manually attach the ISM policy to the new index using the following API call:

POST _plugins/_ism/add/my-new-index-*
{
     "policy_id": "<policy_id>"
}

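Update the existing template to reflect the new index pattern. Because this request replaces the entire template, include the rollover alias setting (and any other settings that you still need) in the body. For example:

PUT _index_template/<existing-template>
{
     "index_patterns": ["<my-new-index-pattern*>"],
     "template": {
          "settings": { "plugins.index_state_management.rollover_alias": "<rollover-alias>" }
     }
}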

Note: Your ISM policy and rollover alias must also cover the successive indices that rollovers create with the same index pattern.
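To confirm that successive indices will still match the updated template, you can test their names against its index_patterns. The following Python sketch assumes that the patterns use only the * wildcard. simple_match and matches_template are hypothetical helpers, not OpenSearch APIs:

```python
import re
from typing import Iterable


def simple_match(pattern: str, name: str) -> bool:
    """Match `name` against a pattern in which * stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def matches_template(index_patterns: Iterable[str], name: str) -> bool:
    """Return True if `name` matches any of the template's index patterns."""
    return any(simple_match(p, name) for p in index_patterns)


patterns = ["my-new-index-*"]
for name in ["my-new-index-000001", "my-new-index-000002", "my-index-000001"]:
    print(name, matches_template(patterns, name))
```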

The rollover alias is pointing to a duplicated alias in an index template

If the explain API indicates that your index rollover failed because a rollover alias is pointing to a duplicated alias, then check your index template settings:

GET _index_template/<template-name>

Check whether your template contains an additional aliases section (with another alias that points to the same index):

{
     "index_patterns": ["my-index*"],
     "template": {
          "settings": {
               "index.plugins.index_state_management.rollover_alias": "<rollover-alias>"
          },
          "aliases": {
               "another_alias": {
                    "is_write_index": true
               }
          }
     }
}

If your template contains an additional alias, then that's the cause of your rollover failure, because multiple aliases cause the rollover to fail. To resolve this failure, update the template so that it doesn't specify any aliases:

PUT _index_template/<template-name>
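One way to build that request body is to copy the current template and drop its aliases while keeping the rollover alias setting. The following Python sketch handles both the composable layout, where aliases sit under template, and a top-level aliases key. without_aliases is a hypothetical helper:

```python
import copy
from typing import Any, Dict


def without_aliases(index_template: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an index template body with every aliases section removed."""
    body = copy.deepcopy(index_template)  # leave the caller's dict untouched
    body.pop("aliases", None)  # top-level aliases, if present
    template = body.get("template")
    if isinstance(template, dict):
        template.pop("aliases", None)  # composable layout; settings and mappings are kept
    return body


current = {
    "index_patterns": ["my-index*"],
    "template": {
        "settings": {"index.plugins.index_state_management.rollover_alias": "<rollover-alias>"},
        "aliases": {"another_alias": {"is_write_index": True}},
    },
}
print(without_aliases(current))
```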

Then, call the retry API on the failed index:

POST _plugins/_ism/retry/logs-000001

Important: If an alias points to multiple indices, then make sure that only one of them has write access enabled. The rollover API automatically enables write access for the index that the rollover alias points to. This means that you don't need to set "is_write_index" on any alias in your index template when ISM performs the rollover.
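To check that each alias has at most one write index, you can count the rows of GET _cat/aliases?format=json that have is_write_index set. The following Python sketch assumes that the JSON output reports is_write_index as a text value such as "true", "false", or "-". write_indices is a hypothetical helper, and the sample rows are made up:

```python
from collections import defaultdict
from typing import Any, Dict, List


def write_indices(cat_aliases: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each alias to the indices on which it's marked as the write index."""
    result: Dict[str, List[str]] = defaultdict(list)
    for row in cat_aliases:
        # _cat output is text-based, so accept both "true" and a real boolean.
        if str(row.get("is_write_index")).lower() == "true":
            result[row["alias"]].append(row["index"])
    return dict(result)


rows = [
    {"alias": "rolling-indices", "index": "logs-000001", "is_write_index": "false"},
    {"alias": "rolling-indices", "index": "logs-000002", "is_write_index": "true"},
    {"alias": "another_alias", "index": "logs-000002", "is_write_index": "true"},
]
conflicts = {a: ix for a, ix in write_indices(rows).items() if len(ix) > 1}
print(conflicts or "each alias has at most one write index")
```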

You have maximum resource utilization on your cluster

Maximum resource utilization on your cluster can appear either as a circuit breaker exception or as a lack of storage space.

Circuit breaker exception

If the explain API returns a circuit breaker exception, your cluster was likely experiencing high JVM memory pressure when the rollover API was called. To troubleshoot JVM memory pressure issues, see How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster?

After the JVM memory pressure falls below 75%, retry the failed action on the index with the following API call:

POST _plugins/_ism/retry/<failed-index-name>

Note: You can use a wildcard index pattern (*) to retry failed actions on multiple indices at once.

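If you experience infrequent JVM spikes on your cluster, then you can also add a retry block to the rollover action in your ISM policy. The retry block goes inside the same action object as rollover, within the actions array of a state:

{
     "actions": [{
          "retry": {
               "count": 3,
               "backoff": "exponential",
               "delay": "10m"
          },
          "rollover": {
               "min_size": "<size>"
          }
     }]
}

ISM automatically retries each failed action up to the number of times set by count. The delay parameter sets the base wait between retries, and backoff (exponential, constant, or linear) controls how that wait changes from one retry to the next. If an operation fails, check these two parameters to see how long ISM waits before it retries the action.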
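To estimate how long ISM might wait between retries, the following Python sketch computes the wait before each retry for the three backoff types. The formulas are an assumption based on a common reading of ISM's behavior (constant: delay every time; linear: delay × n; exponential: delay × 2^(n−1) for the nth retry), not an authoritative reference. The actual timing also depends on when the next ISM cycle runs. retry_waits is a hypothetical helper:

```python
from typing import List


def retry_waits(count: int, backoff: str, delay_minutes: float) -> List[float]:
    """Assumed wait in minutes before each of `count` retries, per backoff type."""
    waits = []
    for n in range(1, count + 1):
        if backoff == "constant":
            waits.append(delay_minutes)  # same wait every time
        elif backoff == "linear":
            waits.append(delay_minutes * n)  # grows by one delay per retry
        elif backoff == "exponential":
            waits.append(delay_minutes * 2 ** (n - 1))  # doubles each retry
        else:
            raise ValueError(f"unknown backoff type: {backoff!r}")
    return waits


# The retry block above: count 3, exponential backoff, 10m delay.
print(retry_waits(3, "exponential", 10))  # [10, 20, 40]
```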

Lack of storage space

If your cluster is running out of storage space, then OpenSearch Service applies a write block to the cluster, and all write operations return a ClusterBlockException. The ClusterIndexWritesBlocked metric shows a value of "1", which indicates that the cluster is blocking requests. As a result, any attempt to create a new index fails, including the new index that a rollover creates. The explain API call also returns a 403 IndexCreateBlockException, indicating that the cluster is out of storage space. To troubleshoot the cluster block exception, see How do I resolve the 403 "index_create_block_exception" error in OpenSearch Service?

After the ClusterIndexWritesBlocked metric returns to "0", retry the ISM action on the failed index. A write block can also be triggered when JVM memory pressure exceeds 92% for more than 30 minutes. If that's the cause of your write block, then troubleshoot the JVM memory pressure instead. For more information, see How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster?
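If you export JVMMemoryPressure datapoints (for example, from CloudWatch at one-minute resolution), a sketch like the following can flag the condition described above: pressure above 92% for more than 30 consecutive minutes. The input format (a list of per-minute percentages) and the function name sustained_above are assumptions for illustration:

```python
from typing import Sequence


def sustained_above(
    per_minute: Sequence[float], threshold: float = 92.0, minutes: int = 30
) -> bool:
    """Return True if values stay above `threshold` for more than `minutes` in a row."""
    run = 0
    for value in per_minute:
        run = run + 1 if value > threshold else 0  # reset the streak on any dip
        if run > minutes:
            return True
    return False


spike = [95.0] * 10 + [70.0] * 50    # short spike: no sustained breach
plateau = [70.0] * 10 + [93.5] * 35  # 35 minutes above 92%
print(sustained_above(spike), sustained_above(plateau))  # False True
```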

