How do I resolve the manual snapshot error in my Amazon Elasticsearch Service cluster?

Last updated: 2021-03-25

I want to restore a manual snapshot of my Amazon Elasticsearch Service (Amazon ES) cluster. However, I receive an error when I try to register a repository or access a registered repository. Why is this happening and how do I resolve this?

Short description

To successfully migrate data from a manual snapshot in Amazon ES, perform the following steps:

1.    Choose an Amazon Simple Storage Service (Amazon S3) bucket where you want to store your snapshot.

2.    Register the Amazon S3 bucket with your Amazon ES source cluster.

3.    Take a snapshot of the Amazon ES source cluster, and then store it in your Amazon S3 bucket.

4.    Register your destination cluster with the same Amazon S3 bucket to make sure that you can view the manual snapshot.

5.    Restore the manual snapshot on the destination cluster in Amazon ES.

If any of these steps is skipped or misconfigured, you might encounter one of the following issues:

  • 403 Unauthorized error
  • repository_missing_exception
  • concurrent_snapshot_execution_exception
  • snapshot_restore_exception
  • a_w_s_security_token_service_exception
  • "PARTIAL" snapshot status
  • Amazon S3 Glacier storage class issue
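
The five steps in the short description map to a small number of REST calls. The following Python sketch lists them in order without sending anything. The function name, repository name, snapshot name, bucket, Region, and role ARN are placeholders chosen for illustration, and the registration body assumes the S3 repository settings (bucket, region, role_arn) described in Manual snapshot prerequisites.

```python
import json


def snapshot_migration_requests(repo, snapshot, bucket, region, role_arn):
    """Return the ordered (cluster, method, path, body) calls for a migration.

    Nothing is sent; the caller decides how to sign and send each request.
    """
    # Assumed S3 repository settings; the same body is used on both clusters
    # so that the destination can see snapshots written by the source.
    register_body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return [
        ("source", "PUT", f"/_snapshot/{repo}", register_body),        # step 2
        ("source", "PUT", f"/_snapshot/{repo}/{snapshot}", None),      # step 3
        ("destination", "PUT", f"/_snapshot/{repo}", register_body),   # step 4
        ("destination", "GET", f"/_snapshot/{repo}/_all", None),       # step 4 (verify)
        ("destination", "POST", f"/_snapshot/{repo}/{snapshot}/_restore", None),  # step 5
    ]


if __name__ == "__main__":
    calls = snapshot_migration_requests(
        "snapshot-repository-name",
        "snapshot-name",
        "my-snapshot-bucket",                               # hypothetical bucket
        "us-west-2",                                        # hypothetical Region
        "arn:aws:iam::012345678912:role/TheSnapshotRole",
    )
    for cluster, method, path, body in calls:
        print(cluster, method, path, json.dumps(body) if body else "")
```

Registering the same bucket on both clusters is what lets the destination list and restore the snapshot that the source wrote.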

Resolution

403 Unauthorized error

If you've enabled fine-grained access control (FGAC) on your Amazon ES domain, you might receive the following error when you register a snapshot repository or take a snapshot:

{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"},"status":403}

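To resolve the 403 Unauthorized error, make sure to pass your credentials with the -u username:password option whenever you take a manual snapshot. Enclose the credentials in single quotes so that the shell doesn't interpret special characters in the password:

curl -XPUT -u 'username:password123$' 'elasticsearch-domain-endpoint/_snapshot/snapshot-repository-name/snapshot-name'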

Note: You must be a superuser to enable fine-grained access control for your Amazon ES domain. You can either use your superuser name and password or set an AWS Identity and Access Management (IAM) role as the superuser. When you access your cluster snapshot, specify your superuser credentials or IAM role. If you specify an IAM role, the HTTP requests must be signed with that role's credentials using AWS Signature Version 4 (SigV4). For more information about using fine-grained access control and IAM roles, see Creating and managing Amazon Elasticsearch Service domains.
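
To make the signing requirement concrete, here is a minimal sketch of Signature Version 4 header generation that uses only the Python standard library. The function name sigv4_headers and its parameters are illustrative. The sketch assumes a request with no query string and a path that contains only unreserved characters, and it signs only the host and x-amz-date headers (plus the session token header for temporary role credentials). In practice, a maintained signer from an AWS SDK is the safer choice.

```python
import datetime
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, host, path, body, region, access_key, secret_key,
                  session_token=None, now=None, service="es"):
    """Return the headers to send with a SigV4-signed request (no query string)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    headers = {"host": host, "x-amz-date": amz_date}
    if session_token:  # present when the credentials come from an assumed role
        headers["x-amz-security-token"] = session_token
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{name}:{headers[name]}\n" for name in names)

    # Canonical request: method, URI, (empty) query string, headers, signed list, payload hash.
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> Region -> service -> "aws4_request".
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

The returned headers are sent along with the snapshot request in place of the -u username:password option.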

You must also register a snapshot repository for your snapshots, and map the manage_snapshots role to an IAM role. The mapped IAM role must have the iam:PassRole permission so that it can pass the snapshot role (TheSnapshotRole) to Amazon ES. For more information, see Manual snapshot prerequisites.
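
As an illustration of the permissions involved, the following Python sketch builds an identity policy that allows passing TheSnapshotRole and sending PUT requests to the domain. The function name, account ID, Region, and domain name are placeholders, and the es:ESHttpPut statement is an assumption about what the caller needs to register a repository; adjust it to your own setup.

```python
import json


def snapshot_caller_policy(account_id, region, domain_name, role_name="TheSnapshotRole"):
    """Build an IAM policy document for the identity that registers the repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the caller hand the snapshot role to Amazon ES.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            },
            {
                # Assumed: lets the caller send PUT requests to the domain.
                "Effect": "Allow",
                "Action": "es:ESHttpPut",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_caller_policy("012345678912", "us-west-2", "my-domain"), indent=2))
```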

To map the manage_snapshots role to an IAM role, perform the following steps:

1.    Open the Kibana console.

2.    Log in as a master user.

3.    Choose Security.

4.    Choose Roles.

5.    Choose manage_snapshots as your role.

6.    Choose Mapped users.

7.    Add the ARN of your IAM user or role (for example: "arn:aws:iam::012345678912:user/username"). Enter a user ARN under Users and a role ARN under Backend roles.

8.    Register your manual snapshot repository.

repository_missing_exception

Before you take a manual index snapshot, you must register a manual snapshot repository with Amazon ES. Your IAM role (TheSnapshotRole) must also be set up to work with Amazon S3.

If you haven't registered your snapshot repository before taking a manual snapshot, or you used an incorrect repository name, you receive the following error:

{"error":{"root_cause":[{"type":"repository_missing_exception","reason":"[snapshot-repository-name] missing"}],"type":"repository_missing_exception","reason":"[snapshot-repository-name] missing"},"status":404}

To resolve this error, make sure that you meet the manual snapshot prerequisites. Also, make sure that you check for typos in the repository name.
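
One way to catch a mistyped repository name is to compare it against the repositories that the domain reports. The sketch below assumes that you have already fetched the response of GET elasticsearch-domain-endpoint/_snapshot (a JSON object keyed by repository name) and parsed it into a dictionary. The function name and the sample data are hypothetical.

```python
import difflib


def check_repository(name, get_snapshot_response):
    """Report whether `name` is registered, and suggest a close match if not."""
    registered = sorted(get_snapshot_response)  # the keys are repository names
    if name in registered:
        return f"'{name}' is registered"
    close = difflib.get_close_matches(name, registered, n=1)
    if close:
        return f"'{name}' is not registered; did you mean '{close[0]}'?"
    return f"'{name}' is not registered; register it before taking a snapshot"


if __name__ == "__main__":
    # Hypothetical response body, trimmed to the parts this check uses.
    response = {
        "snapshot-repository-name": {"type": "s3", "settings": {}},
        "cs-automated": {"type": "s3"},
    }
    print(check_repository("snapshot-repositry-name", response))
```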

concurrent_snapshot_execution_exception

If a snapshot is currently in progress, you receive the following error when you try to take another snapshot:

{"error":{"root_cause":[{"type":"concurrent_snapshot_execution_exception","reason":"[snapshot-repository-name:snapshot-name] a snapshot is already running"}],"type":"concurrent_snapshot_execution_exception","reason":"[snapshot-repository-name:snapshot-name] a snapshot is already running"},"status":503}

To check if there is another snapshot in progress, run the following command:

curl -XGET 'elasticsearch-domain-endpoint/_snapshot/_status'

If a snapshot is already in progress, wait for the current snapshot to complete. Or, if you suspect that your snapshot is stuck, check your history of hourly snapshots. For more information, see How do I resolve a "Prior snapshot operation has not yet completed" error while upgrading my Amazon Elasticsearch Service cluster?
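
If you script this check, a small helper can summarize the _snapshot/_status response. This sketch assumes that the response has already been parsed into a dictionary with a top-level "snapshots" list whose entries carry "repository", "snapshot", and "state" fields. Called without a repository name, the endpoint is expected to return only snapshots that are currently running, so any entry means that a new snapshot should wait. The function name and sample data are hypothetical.

```python
def running_snapshots(status_response):
    """Return (repository, snapshot, state) for each snapshot reported as running."""
    return [
        (snap.get("repository"), snap.get("snapshot"), snap.get("state"))
        for snap in status_response.get("snapshots", [])
    ]


if __name__ == "__main__":
    # Hypothetical response body, trimmed to the fields this helper reads.
    sample = {
        "snapshots": [
            {
                "snapshot": "snapshot-name",
                "repository": "snapshot-repository-name",
                "state": "STARTED",
            }
        ]
    }
    in_progress = running_snapshots(sample)
    if in_progress:
        print("Wait before taking a new snapshot:", in_progress)
    else:
        print("No snapshot is in progress")
```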

snapshot_restore_exception

If you try to migrate data from an on-premises Elasticsearch cluster to an Amazon ES domain, you might encounter the following exception:

{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
      }
    ],
    "type": "snapshot_restore_exception",
    "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
  },
  "status": 500
}

This error occurs when the snapshot was created on a later version of Elasticsearch than the version that the destination Amazon ES domain runs. In the preceding example, the snapshot was created on version 6.8.0, but the domain runs version 6.7.0. To resolve the error, consider upgrading the destination domain to a version that's the same as or later than the version that created the snapshot. Or, you can use the remote reindex API to migrate your indices.
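
The two versions can be pulled out of the error's "reason" string and compared before you decide whether to upgrade or reindex. This is a minimal sketch that assumes the reason text follows the wording of the example above; the function names are illustrative.

```python
import re

# Matches "... version [6.8.0] ... version of this node [6.7.0]".
VERSION_PATTERN = re.compile(
    r"version \[(\d+(?:\.\d+)*)\].*version of this node \[(\d+(?:\.\d+)*)\]"
)


def parse_version(text):
    """Turn '6.8.0' into (6, 8, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def versions_from_reason(reason):
    """Return (snapshot_version, node_version), or None if the text doesn't match."""
    match = VERSION_PATTERN.search(reason)
    if match is None:
        return None
    return parse_version(match.group(1)), parse_version(match.group(2))


if __name__ == "__main__":
    reason = (
        "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] "
        "the snapshot was created with Elasticsearch version [6.8.0] "
        "which is higher than the version of this node [6.7.0]"
    )
    snapshot_version, node_version = versions_from_reason(reason)
    if snapshot_version > node_version:
        print("Upgrade the destination domain or use remote reindex")
```

Comparing tuples of integers avoids the string-comparison trap where "6.10.0" would sort before "6.8.0".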

a_w_s_security_token_service_exception

If the IAM role associated with your manual snapshot doesn't have a trust relationship established for "es.amazonaws.com", you receive the following exception:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_exception",
        "reason" : "[es_01082021_repo] Could not determine repository generation from root blobs"
      }
    ],
    "type" : "repository_exception",
    "reason" : "[es_01082021_repo] Could not determine repository generation from root blobs",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Exception when listing blobs by prefix [index-]",
      "caused_by" : {
        "type" : "a_w_s_security_token_service_exception",
        "reason" : "a_w_s_security_token_service_exception: User: arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::679203657591:role/ES_Backup_Role (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: 36d09b93-d94f-457e-8fa5-b0a50ba436c3)"
      }
    }
  },
  "status" : 500
}

With Amazon ES snapshots, an internal role is created (such as arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591). This internal role assumes the IAM role associated with the manual snapshot, and then performs any required operations.

To resolve the security token exception, make sure to specify the IAM role associated with the manual snapshot. If you don't have an IAM role associated with the manual snapshot, then you must create one. For more information, see Manual snapshot prerequisites.

Also, check the trust relationship for the IAM role associated with the manual snapshot. The trust relationship for the role must specify Amazon ES in the Principal statement, like this:

{
   "Version": "2012-10-17",
   "Statement": [{
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
          "Service": "es.amazonaws.com"
          },
      "Action": "sts:AssumeRole"
  }]
}
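
The same check can be run programmatically against a role's trust policy document. The sketch below assumes that the policy has already been loaded into a dictionary (for example, from the IAM console's JSON view) and accepts both the string and list forms that IAM allows for Action and Service. The function name is hypothetical.

```python
def _as_list(value):
    """IAM allows a single string or a list in several fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trusts_amazon_es(trust_policy, service="es.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole."""
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):  # for example, the wildcard "*"
            continue
        actions = _as_list(statement.get("Action", []))
        services = _as_list(principal.get("Service", []))
        if "sts:AssumeRole" in actions and service in services:
            return True
    return False


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "es.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    print(trusts_amazon_es(policy))
```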

PARTIAL snapshot status

A snapshot enters a "PARTIAL" state when data from at least one shard couldn't be stored. You can still restore data from a partial snapshot, but you must use earlier snapshots to restore any missing indices. To check whether a snapshot has entered a "PARTIAL" state, check your snapshot history. For more information, see Restoring snapshots.
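
When you review the snapshot history, it helps to list the partial snapshots together with the indices that need an earlier snapshot. This sketch assumes a parsed response from GET elasticsearch-domain-endpoint/_snapshot/snapshot-repository-name/_all, where each entry in "snapshots" has a "state" and, for partial snapshots, a "failures" list whose items name the affected "index". The function name and sample data are hypothetical.

```python
def partial_snapshots(list_response):
    """Map each PARTIAL snapshot name to the sorted indices that had shard failures."""
    report = {}
    for snap in list_response.get("snapshots", []):
        if snap.get("state") != "PARTIAL":
            continue
        failed = {f.get("index") for f in snap.get("failures", []) if f.get("index")}
        report[snap.get("snapshot")] = sorted(failed)
    return report


if __name__ == "__main__":
    # Hypothetical response body, trimmed to the fields this helper reads.
    sample = {
        "snapshots": [
            {"snapshot": "snap-1", "state": "SUCCESS", "failures": []},
            {
                "snapshot": "snap-2",
                "state": "PARTIAL",
                "failures": [
                    {"index": "logs-2021-03", "shard_id": 0},
                    {"index": "logs-2021-03", "shard_id": 2},
                    {"index": "metrics", "shard_id": 1},
                ],
            },
        ]
    }
    for name, indices in partial_snapshots(sample).items():
        print(name, "is missing data for:", ", ".join(indices))
```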

Amazon S3 Glacier storage class issue

Manual snapshots don't support the Amazon S3 Glacier storage class, so avoid applying an Amazon S3 Glacier lifecycle rule to the bucket that stores your snapshots. If a lifecycle policy has already transitioned snapshot objects to Amazon S3 Glacier, you must move those objects back to the standard Amazon S3 storage class.

After you move the objects back to the standard Amazon S3 storage class, you can restore from those snapshots. For more information, see Manual snapshot prerequisites.
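
To find out which snapshot objects were transitioned, you can inspect the storage class that Amazon S3 reports for each object. The sketch below assumes one page of an object listing that has already been parsed into a dictionary, with a "Contents" list whose items have "Key" and "StorageClass" fields (the shape that an S3 ListObjectsV2 call returns). The function name, the set of archive classes, and the sample keys are assumptions for illustration.

```python
def archived_keys(listing_page, archive_classes=("GLACIER", "DEEP_ARCHIVE")):
    """Return the keys whose storage class indicates an archived object."""
    return [
        obj["Key"]
        for obj in listing_page.get("Contents", [])
        if obj.get("StorageClass") in archive_classes
    ]


if __name__ == "__main__":
    # Hypothetical listing page, trimmed to the fields this helper reads.
    page = {
        "Contents": [
            {"Key": "index-0", "StorageClass": "STANDARD"},
            {"Key": "indices/abc/0/__1", "StorageClass": "GLACIER"},
        ]
    }
    for key in archived_keys(page):
        print("Move back to the standard storage class:", key)
```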