How can I run a select query on objects stored in the Amazon S3 Glacier storage class or an Amazon S3 Glacier vault?

Last updated: 2019-06-20

I want to run a select query on objects that I stored using the Amazon Simple Storage Service (Amazon S3) Glacier storage class. Or, I want to run a select query on objects that I archived in an Amazon S3 Glacier vault. How can I run this select query using the AWS Command Line Interface (AWS CLI)?

Resolution

Amazon S3 Glacier storage class

Follow these steps to run a select query on objects stored in the Amazon S3 Glacier storage class using the AWS CLI:

1.    Create a JSON file using parameters for the restore-object AWS CLI command. Be sure to include the following parameters:
For the Expression parameter, enter the select query.
For the OutputLocation parameter, enter the S3 location where you want the query results to be stored.

For example, the following JSON file runs the query SELECT * FROM object. Then, it sends the query results to the S3 location awsexamplebucket/outputJob:

{
    "Type": "SELECT",
    "Tier": "Standard",
    "SelectParameters": {
        "InputSerialization": {
            "CSV": {
                "FileHeaderInfo": "USE"
            }
        },
        "ExpressionType": "SQL",
        "Expression": "SELECT * FROM object",
        "OutputSerialization": {
            "CSV": {}
        }
    },
    "OutputLocation": {
        "S3": {
            "BucketName": "awsexamplebucket",
            "Prefix": "outputJob",
            "StorageClass": "STANDARD"
        }
    }
}

Note: This example includes only the required parameters.

2.    Run the restore-object command and specify the JSON file that you created as the --restore-request value:

$ aws s3api restore-object --bucket awsexamplebucket --key file.csv --restore-request file://restoreJob.json

3.    After you run the command, you receive the S3 location where the select query results are stored. This output is similar to the following:

{
    "RestoreOutputPath": "outputJob/example_iNHt-0sE2cqt8ZtwNdE1nHi7nKzRaIWLyzzcsW4QWIUaHduHVReDDdavy-aLfEdnD6ql3AR8s4f4oZzhdhBg/"
}

For more information on the query output, see Select Output.

Amazon S3 Glacier vault

Follow these steps to run a select query on objects archived in an Amazon S3 Glacier vault using the AWS CLI:

1.    Create a JSON file using parameters for the initiate-job AWS CLI command. Be sure to include the following parameters:
For the Expression parameter, enter the select query.
For the OutputLocation parameter, enter the S3 location where you want the query results to be stored.

For example, the following JSON file runs the query SELECT * FROM archive. Then, it and sends the query results to the S3 location awsexamplebucket/outputJobGlacier:

{
    "Type": "SELECT",
    "ArchiveId": "example_QpOx5CphSp5VY8Jbhkmd1-TyIF5KFr_VYDAyyqb0n5r5i5_CBVKY0F0OfwzbsWrewjIJ1Kd1d9Tgp97C$
    "Tier": "Standard",
    "SelectParameters": {
        "InputSerialization": {
            "csv": {
                "FileHeaderInfo": "USE"
            }
        },
        "ExpressionType": "SQL",
        "Expression": "SELECT * FROM archive",
        "OutputSerialization": {
            "csv": {}
        }
    },
    "OutputLocation": {
        "S3": {
            "BucketName": "awsexamplebucket",
            "Prefix": "outputJobGlacier",
            "StorageClass": "STANDARD"
        }
    }
}

Note: This example includes only the required parameters.

2.    Run the initiate-job command and specify the JSON file that you created as the --job-parameters value:

$ aws glacier initiate-job --account-id 111122223333 --vault-name awsexamplevault --job-parameters file://restoreJobGlacier.json

3.    After you run the command, you receive a response that includes the S3 location where the query results are stored (jobOutputPath). This output is similar to the following:

{
    "jobOutputPath": "outputJobGlacier/example_N4Zpv9bExi9TCv1gSSwWUhsK-371E9KBCFlAEMSnD2Gzq4GXhaT4aK7Gj8_s-jAn4aAvMNgUEbhTlxUfNYT9/",
    "location": "/111122223333/vaults/awsexamplevault/jobs/example_N4Zpv9bExi9TCv1gSSwWUhsK-371E9KBCFlAEMSnD2Gzq4GXhaT4aK7Gj8_s-jAn4aAvMNgUEbhTlxUfNYT9",
    "jobId": "example_N4Zpv9bExi9TCv1gSSwWUhsK-371E9KBCFlAEMSnD2Gzq4GXhaT4aK7Gj8_s-jAn4aAvMNgUEbhTlxUfNYT9"
}

For more information on the query output, see Glacier Select Output


Did this article help you?

Anything we could improve?


Need more help?