I want to copy a large (100 MB or more) file to an Amazon S3 bucket in multiple parts. I want to ensure the integrity of the file after the file is uploaded. How can I do this using the AWS Command Line Interface (CLI)?

Uploading a file to S3 in multiple parts can improve upload speeds by using multiple threads to upload file parts in parallel. Also, if one or more parts fail to upload, you will only have to re-upload those parts and not the entire file.

You verify the integrity of a multipart message upload by sending base64 MD5 checksum values to Amazon S3. The checksum of the original file is stored as object metadata when you initiate the upload, and the checksum of each file part is sent when that part is uploaded. S3 rejects any part whose content does not match its checksum, and the stored metadata lets you verify the complete file later. These are the steps:

  1. Obtain the base64 MD5 checksum of the file to be uploaded.
  2. Split the file into multiple parts and obtain the base64 MD5 checksum of each file part.
  3. Initiate a multipart message upload to Amazon S3 and receive a response with a unique UploadId value. This value associates the original file with each of the file parts.
  4. Upload the file parts to S3, specifying the UploadId and the base64 MD5 checksum for each file part. S3 returns a distinct ETag value for each file part.
  5. After all parts have been uploaded, run a command to complete the multipart upload process. This command includes several details such as the destination bucket, a list of the file parts, and other information. S3 uses this information to recreate and verify the integrity of the original file.
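The steps above map onto three AWS CLI commands. The following is a minimal sketch, not part of the original procedure, that builds each command as a Python argument list using only the parameters shown in this article. The helper names are hypothetical. Passing an argument list to subprocess.run sidesteps most of the quoting differences between command processors.

```python
from typing import List


def create_upload_args(bucket: str, key: str, file_md5: str) -> List[str]:
    # Step 3: initiate the upload and store the whole-file checksum as metadata.
    return ["aws", "s3api", "create-multipart-upload",
            "--bucket", bucket, "--key", key,
            "--metadata", f"md5={file_md5}"]


def upload_part_args(bucket: str, key: str, upload_id: str,
                     part_number: int, body: str, part_md5: str) -> List[str]:
    # Step 4: upload one part together with its base64 MD5 checksum.
    return ["aws", "s3api", "upload-part",
            "--bucket", bucket, "--key", key,
            "--part-number", str(part_number), "--body", body,
            "--upload-id", upload_id, "--content-md5", part_md5]


def complete_upload_args(bucket: str, key: str, upload_id: str,
                         parts_file: str) -> List[str]:
    # Step 5: ask S3 to assemble the parts listed in the JSON file.
    return ["aws", "s3api", "complete-multipart-upload",
            "--multipart-upload", f"file://{parts_file}",
            "--bucket", bucket, "--key", key,
            "--upload-id", upload_id]


# To run a command (assumes the AWS CLI is installed and configured):
#   import subprocess
#   subprocess.run(create_upload_args("targetBucket", "testfile", "..."), check=True)
```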

Complete the following steps using the AWS CLI to perform a multipart message upload of a file to an Amazon S3 bucket.

Note
The AWS CLI is accessible from several different command processors running on multiple operating systems. Command processors often have different syntax rules for accepting parameter values. For example, some command processors process double-quoted text while others only accept single-quoted text. All of the sample input and output in this document was created by running the AWS CLI from a Windows command prompt. Linux, macOS, and Windows PowerShell use slightly different syntax. For more information, see Specifying Parameter Values for the AWS Command Line Interface.

Using Windows – Download the File Checksum Integrity Verifier (FCIV) utility and extract the contents to a folder. Then add the location of the folder to the Windows system path by running the following command from an elevated (run as Administrator) command prompt, replacing c:\fciv with the folder that contains the extracted FCIV utility files:

C:\>set path=%path%;c:\fciv

Note that when you modify the Windows system path from a command prompt, the change does not persist when Windows is restarted. If you want to modify the Windows system path environment variable permanently, check the Windows documentation or search the Web for "Change Windows X path variable", substituting your version of Windows for X.

After installing the FCIV utility and updating the %path% environment variable with the location of the extracted FCIV utility files, run this command to return the hexadecimal MD5 checksum of the file to be uploaded to S3. Replace c:\S3\testfile with the location of the file you are uploading to S3:

fciv.exe c:\S3\testfile

Note
If the path to the file contains spaces, enclose the path with quotation marks (").

The value returned will be similar to the following value returned when calculating the MD5 checksum of the file C:\Windows\explorer.exe:

fciv C:\Windows\explorer.exe

//

// File Checksum Integrity Verifier version 2.05.

//

e1b0af69bfb6cbde9b53c55e4bf91992 c:\windows\explorer.exe

Important
The MD5 checksum returned by the FCIV utility is hexadecimal and must be converted to base64 before it can be used as a checksum value for uploading multipart messages to S3. If you use a hexadecimal MD5 checksum, you will receive the error message "The Content-MD5 you specified is invalid". There are several hexadecimal to base64 string decoders available on the Web; if you prefer, you can download script code such as the HexToBase64 script available at http://www.rlmueller.net/Programs/HexToBase64.txt. You can also find examples of hexadecimal to base64 functions for use with spreadsheets if you plan to make extensive use of multipart message uploading to S3. The base64 encoded equivalent of the hexadecimal value returned by the FCIV utility in the example is 4bCvab+2y96bU8VeS/kZkg==.
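If Python is available, the conversion can also be done locally. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and the sample value is the FCIV output from the example above.

```python
import base64


def hex_md5_to_base64(hex_digest: str) -> str:
    """Convert a hexadecimal MD5 digest to the base64 form used with S3."""
    raw = bytes.fromhex(hex_digest.strip())
    if len(raw) != 16:
        raise ValueError("an MD5 digest must be exactly 16 bytes")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(hex_md5_to_base64("e1b0af69bfb6cbde9b53c55e4bf91992"))
    # prints 4bCvab+2y96bU8VeS/kZkg==
```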

Using Linux – Linux natively provides the ability to calculate the base64 MD5 checksum of a file with the openssl command. To determine the base64 MD5 checksum for a file in Linux, run the following command from a Linux shell: 

openssl md5 -binary PATH/TO/FILE | base64

The value returned will be similar to the following value returned when retrieving the base64 MD5 checksum of the file /bin/bash: 

user@example:/home$ openssl md5 -binary /bin/bash | base64

+e9lnJtCrdoKwYqg9wlFwA==
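The same value can be computed on any operating system with Python, which avoids the separate hexadecimal-to-base64 conversion on Windows. This is a sketch under the assumption that Python 3 is installed; the function name is hypothetical. It reads the file in chunks so that large files do not have to fit in memory.

```python
import base64
import hashlib


def base64_md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

The result should match the output of the openssl command for the same file.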

Using Windows – You can easily split a file into parts from Windows with available freeware utilities such as HJ-Split for Windows. HJ-Split for Windows also provides a UI for calculating the hexadecimal MD5 checksum value for each file part. Because these checksum values are hexadecimal, you must convert them to base64 before using them to upload multipart messages.

Using Linux – Linux natively provides the ability to split a file with the split command. You can then compute the base64 MD5 checksum of each file part with the openssl command described earlier. For more information about how to use the Linux split command, enter one or more of the following commands in the Linux command shell:

split --help - displays arguments for the split command

info split - displays general information about the split command

man split - displays the manual page for the split command

HJ-Split is also available for Linux if you prefer to use a GUI.
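As an alternative to a separate splitting utility, the splitting and per-part checksum calculation can be sketched in Python. This is an illustration, not the method used to produce the examples in this article. The function name is hypothetical, and the .001, .002 suffixes are chosen to match the part names used later. Each part is read fully into memory, so choose a part size that fits. Note that S3 requires every part except the last to be at least 5 MB.

```python
import base64
import hashlib
from pathlib import Path
from typing import List, Tuple


def split_file(path: str, part_size: int) -> List[Tuple[str, str]]:
    """Write path.001, path.002, ... and return (part_path, base64_md5) pairs."""
    source = Path(path)
    parts = []
    number = 1
    with source.open("rb") as handle:
        while True:
            data = handle.read(part_size)  # one whole part held in memory
            if not data:
                break
            part_path = Path(f"{source}.{number:03d}")
            part_path.write_bytes(data)
            checksum = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
            parts.append((str(part_path), checksum))
            number += 1
    return parts
```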

This table shows the information you need to gather to initiate the multipart message upload.

Item             Parameters and values

Target bucket    --bucket targetBucket

File name        --key testfile --metadata md5=mvhFZXpr7J5u0ooXDoZ/4Q==

Part 1           --part-number 1 --body testfile.001 --content-md5 Vuoo2L6aAmjr+4sRXUwf0w==

Part 2           --part-number 2 --body testfile.002 --content-md5 317wIkbjGrgH2m9igCwa6A==

Using the information in the table, you can run the AWS CLI command aws s3api create-multipart-upload to retrieve the unique UploadId value that associates the original file with the file parts: 

aws s3api create-multipart-upload --bucket targetBucket --key testfile --metadata md5=mvhFZXpr7J5u0ooXDoZ/4Q==

This command response contains the UploadId value that is required whenever a message part is uploaded:

{

    "Bucket": "targetBucket",

    "UploadId":"sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk",

    "Key": "testfile"

}

In this example, the command to upload the first message part specifies the target bucket, the original file name, the first file part, the UploadId value, and the base64 MD5 checksum for the first file part:

aws s3api upload-part --bucket targetBucket --key testfile --part-number 1 --body testfile.001 --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 Vuoo2L6aAmjr+4sRXUwf0w==

This command response contains an ETag value for this part of the message. Save the ETag value to use later to complete the multipart message upload process:

{

      "ETag": "\"56ea28d8be9a0268ebfb8b115d4c1fd3\""

}

The command to upload the second message part is similar, with the differences being the number of the part (--part-number 2), the name of the second part (--body testfile.002), and a different MD5 checksum value:

aws s3api upload-part --bucket targetBucket --key testfile --part-number 2 --body testfile.002 --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 317wIkbjGrgH2m9igCwa6A==

The command to upload the second message part returns a different ETag value:

{

      "ETag": "\"df5ef02246e31ab807da6f62802c1ae8\""

}

Optionally, run the following command to list the parts that have been successfully uploaded:

aws s3api list-parts --bucket targetBucket --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk

In this example, the following output is returned. It describes the account owner, the initiator of the multipart message upload, the parts uploaded so far (with the ETag and size of each), and the storage class of the object:

{

    "Owner": {

        "DisplayName": "multipartmessage",

        "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    },

    "Initiator": {

        "DisplayName": "multipart",

        "ID": "arn:aws:iam::22xxxxxxxxxx:user/multipart"

    },

    "Parts": [

        {

            "LastModified": "2016-01-13T13:43:33.000Z",

            "PartNumber": 1,

            "ETag": "\"56ea28d8be9a0268ebfb8b115d4c1fd3\"",

            "Size": 79520768

        },

        {

            "LastModified": "2016-01-13T13:56:47.000Z",

            "PartNumber": 2,

            "ETag": "\"df5ef02246e31ab807da6f62802c1ae8\"",

            "Size": 79519704

        }

    ],

    "StorageClass": "STANDARD"

}
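In the example output, each part's ETag is the hexadecimal form of the MD5 checksum that was sent with that part. The following sketch uses that observation to compare saved list-parts output against the checksums calculated locally. It assumes the relationship holds for your upload, which may not be the case for every bucket encryption configuration, and the function name is hypothetical.

```python
import base64
import json
from typing import Dict, List


def mismatched_parts(list_parts_json: str, expected_md5: Dict[int, str]) -> List[int]:
    """Return part numbers whose ETag does not match the expected base64 MD5."""
    response = json.loads(list_parts_json)
    bad = []
    for part in response.get("Parts", []):
        etag_hex = part["ETag"].strip('"')  # the ETag value carries literal quotes
        etag_b64 = base64.b64encode(bytes.fromhex(etag_hex)).decode("ascii")
        if expected_md5.get(part["PartNumber"]) != etag_b64:
            bad.append(part["PartNumber"])
    return bad
```

An empty list means every listed part matches its locally calculated checksum.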

After the parts have been uploaded, S3 requires some additional information to reassemble the original file. First, create a JSON-formatted file that lists each part number with the ETag value returned when that message part was uploaded. In this example, the file is named fileparts and is saved in the same directory as the original file and its constituent file parts:

# fileparts

{

    "Parts": [

    {

        "ETag": "56ea28d8be9a0268ebfb8b115d4c1fd3",

         "PartNumber":1

    },

    {

        "ETag": "df5ef02246e31ab807da6f62802c1ae8",

        "PartNumber":2

    }

    ]

}
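Writing this file by hand becomes error-prone as the number of parts grows. The following is a minimal sketch that generates the same structure from the ETag values you saved. The function name is hypothetical. Parts are sorted by part number because S3 expects them in ascending order.

```python
import json
from typing import Dict


def build_parts_document(etags_by_part: Dict[int, str]) -> str:
    """Return the JSON text for the --multipart-upload parameter."""
    parts = [
        # Strip the literal quotes that surround the ETag in CLI output.
        {"ETag": etag.strip('"'), "PartNumber": number}
        for number, etag in sorted(etags_by_part.items())
    ]
    return json.dumps({"Parts": parts}, indent=4)


if __name__ == "__main__":
    etags = {
        2: '"df5ef02246e31ab807da6f62802c1ae8"',
        1: '"56ea28d8be9a0268ebfb8b115d4c1fd3"',
    }
    with open("fileparts", "w") as handle:
        handle.write(build_parts_document(etags))
```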

The following command completes the multipart message upload process. This command determines each of the file's parts by reading the JSON-formatted file you created earlier and attempts to piece the parts back together as the original file in the specified S3 bucket. The --upload-id parameter is also specified here to identify the multipart upload that the parts belong to.

aws s3api complete-multipart-upload --multipart-upload file://fileparts --bucket targetBucket --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk

If this final step is successful, output similar to the following is displayed:

{

    "ETag": "\"13115fdae01633ff0af167d925cad279-2\"",

    "Bucket": "targetBucket",

    "Location": "https://targetBucket.s3.amazonaws.com/testfile",

    "Key": "testfile"

}

You can run the following command to retrieve object header data from the uploaded file:  

aws s3api head-object --bucket targetBucket --key testfile

{

    "AcceptRanges": "bytes",

    "ContentType": "binary/octet-stream",

    "LastModified": "Wed, 13 Jan 2016 13:15:00 GMT",

    "ContentLength": 159040472,

    "ETag": "\"13115fdae01633ff0af167d925cad279-2\"",

    "Metadata": {

        "md5": "mvhFZXpr7J5u0ooXDoZ/4Q=="

    }

}

The object header data contains the base64-encoded MD5 checksum value of the file in its metadata, which you can use to verify the integrity of the file if it is subsequently downloaded or moved elsewhere. The MD5 checksum value for the file should match the MD5 checksum of the file calculated before the file was split and uploaded as a multipart message.
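That comparison can be sketched as follows, assuming you have saved the head-object output as text and downloaded the object to a local path. The function name is hypothetical, and the md5 metadata key is the one set with --metadata when the upload was initiated.

```python
import base64
import hashlib
import json


def matches_stored_md5(path: str, head_object_json: str) -> bool:
    """Compare a local file's base64 MD5 with the md5 value in object metadata."""
    stored = json.loads(head_object_json).get("Metadata", {}).get("md5")
    if stored is None:
        return False  # no checksum was stored with the object
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    local = base64.b64encode(digest.digest()).decode("ascii")
    return local == stored
```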

If you encountered issues uploading one or more parts of a multipart message, you can try re-uploading the message part or parts as described in Step 4. If a message part is uploaded but becomes stranded, be sure to remove it to avoid accruing unnecessary storage charges. To list any incomplete multipart message uploads for your bucket, run the following command and substitute the name of your bucket for targetBucket:

aws s3api list-multipart-uploads --bucket targetBucket

The response lists any multipart message uploads that have been initiated but not yet completed or aborted.

{

    "Uploads": [

        {

            "Initiator": {

                "DisplayName": "myaccount",

                "ID": "5b7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx "

            },

            "Initiated": "2016-03-31T06:13:15.000Z",

            "UploadId": "MuQzVbEvQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-",

            "StorageClass": "STANDARD",

            "Key": "music.mp4",

            "Owner": {

                "DisplayName": "myaccount",

                "ID": "5b7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

            }

        }

    ]

}

You can use this information to remove unwanted message parts. For example, to abort the multipart message upload described in the example and remove its uploaded parts, run the following command:

aws s3api abort-multipart-upload --bucket targetBucket --key music.mp4 --upload-id MuQzVbEvQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-
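When a bucket has several incomplete uploads, the abort commands can be derived from the list-multipart-uploads output. The following sketch only builds the argument lists and does not run them, so you can review them first. The function name is hypothetical, and treating empty output as "no uploads" is an assumption about how the command output was saved.

```python
import json
from typing import List


def abort_commands(bucket: str, list_uploads_json: str) -> List[List[str]]:
    """Build one abort-multipart-upload argument list per incomplete upload."""
    if not list_uploads_json.strip():
        return []  # assumption: empty output means nothing to abort
    response = json.loads(list_uploads_json)
    commands = []
    for upload in response.get("Uploads", []):
        commands.append([
            "aws", "s3api", "abort-multipart-upload",
            "--bucket", bucket,
            "--key", upload["Key"],
            "--upload-id", upload["UploadId"],
        ])
    return commands
```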




Published: 2016-01-25