I want to copy a large file (more than 100 MB) to an Amazon Simple Storage Service (Amazon S3) bucket in multiple parts. I also want to confirm the integrity of the file after the file parts are uploaded and joined back together into the original file. How can I do this using the AWS Command Line Interface (AWS CLI)?

Uploading a file to Amazon S3 in multiple parts can improve upload speed because you can upload several parts in parallel. Another benefit is that if one or more parts fail to upload, you only need to re-upload the failed parts, not the entire file.

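To verify the integrity of a multipart upload, you calculate base64-encoded MD5 checksums for the original file and for each file part. You store the checksum of the original file as object metadata when you initiate the upload, and you send the checksum of each part to Amazon S3 when you upload that part. Amazon S3 rejects any part whose data doesn't match its checksum, and you can later compare the stored checksum of the original file with your local copy. These are the steps: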

  1. Obtain the base64 MD5 checksum of the file to be uploaded.
  2. Split the file into multiple parts and obtain the base64 MD5 checksum of each file part.
  3. Initiate a multipart upload to Amazon S3, and then receive a response with a unique UploadId value. This value associates the original file with each of the file parts.
  4. Upload the file parts to Amazon S3, specifying the UploadId and the base64 MD5 checksum for each file part. Amazon S3 returns a distinct ETag value for each file part.
  5. After all parts are uploaded, run a command to complete the multipart upload. This command specifies the destination bucket, the object key, the UploadId, and a list of the part numbers with their ETag values. Amazon S3 uses this information to assemble the original file from its parts.

Complete the following steps using the AWS CLI to perform a multipart upload of a file to an Amazon S3 bucket.

Note: The AWS CLI is accessible from several different command processors running on multiple operating systems. Command processors often have different syntax rules for accepting parameter values. For example, some command processors accept double-quoted text, whereas others accept only single-quoted text. The sample input and output in this document were created by running the AWS CLI from a Windows command prompt. Linux, macOS, and Windows PowerShell use different syntax. For more information, see Specifying Parameter Values for AWS CLI.

Step 1: Obtain the base64 MD5 checksum of the file to be uploaded

Using Windows - Download the File Checksum Integrity Verifier (FCIV) utility and extract the contents to a folder. Add the location of the folder to the Windows system path by running this command from an elevated privilege (that is, run as Administrator) command prompt, replacing c:\fciv with the folder that contains the extracted FCIV utility files:

C:\>set path=%path%;c:\fciv

Note: When you modify the Windows system path from a command prompt, the change does not persist when Windows is restarted. To modify the Windows system path environment variable permanently, check the Windows documentation or search the internet for "Change Windows X path variable," substituting your version of Windows for X.

After installing the FCIV utility and updating the %path% environment variable with the location of the extracted FCIV utility files, run this command to return the hexadecimal MD5 checksum of the file to be uploaded to Amazon S3. Replace c:\S3\testfile with the location of the file you are uploading to Amazon S3:

fciv.exe c:\S3\testfile

Note: If the path to the file contains spaces, enclose the path in quotation marks (").

The value returned is similar to the following when calculating the MD5 checksum of the file C:\Windows\explorer.exe:

fciv C:\Windows\explorer.exe
//
// File Checksum Integrity Verifier version 2.05.
//
e1b0af69bfb6cbde9b53c55e4bf91992 c:\windows\explorer.exe

Important: The MD5 checksum returned by the FCIV utility is hexadecimal and must be converted to base64 before it can be used as a checksum value for a multipart upload to Amazon S3. If you use a hexadecimal MD5 checksum, you receive the error message "The Content-MD5 you specified is invalid." The conversion decodes the 32 hexadecimal digits into the 16 raw bytes of the digest and base64-encodes those bytes. It does not base64-encode the hexadecimal text itself. Several hexadecimal-to-base64 converters are available on the internet. You can also download script code such as the HexToBase64 VB script. If you plan to make extensive use of multipart uploads to Amazon S3, you can find examples of hexadecimal-to-base64 functions for use with spreadsheets. The base64-encoded equivalent of the hexadecimal value returned by the FCIV utility in this example is 4bCvab+2y96bU8VeS/kZkg==.
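You might prefer to script the conversion instead of using an online converter or the HexToBase64 VB script. The following minimal Python sketch does that. It is an illustration, not an AWS-provided tool, and the function name hex_md5_to_base64 is hypothetical. It decodes the hexadecimal digest to its raw bytes and base64-encodes them, using only the Python standard library.

```python
import base64


def hex_md5_to_base64(hex_digest: str) -> str:
    """Convert a hexadecimal MD5 digest, such as FCIV prints, to base64."""
    # Turn the 32 hex characters back into the raw 16-byte digest.
    raw = bytes.fromhex(hex_digest.strip())
    if len(raw) != 16:
        raise ValueError("an MD5 digest is 16 bytes (32 hexadecimal characters)")
    # Base64-encode the raw bytes, not the hex text.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # FCIV output for C:\Windows\explorer.exe from the example above.
    print(hex_md5_to_base64("e1b0af69bfb6cbde9b53c55e4bf91992"))
```

Run against the FCIV value above, the sketch prints 4bCvab+2y96bU8VeS/kZkg==, the same base64 value given in this example.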

Using Linux - On Linux, you can calculate the base64 MD5 checksum of a file with the openssl command. Run this command from a Linux shell:

openssl md5 -binary PATH/TO/FILE | base64

The value returned is similar to the following value when retrieving the base64 MD5 checksum of the file /bin/bash:

user@example:/home$ openssl md5 -binary /bin/bash |base64
+e9lnJtCrdoKwYqg9wlFwA==
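On systems without openssl, you can do a roughly equivalent calculation with Python's standard hashlib and base64 modules. This sketch assumes Python 3 is installed, and base64_md5_of_file is a hypothetical helper name. It reads the file in chunks so that files larger than available memory can still be hashed. Like openssl md5 -binary | base64, it encodes the raw digest rather than the hex string.

```python
import base64
import hashlib
import sys


def base64_md5_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the base64-encoded MD5 digest of a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Hash the file in fixed-size chunks so it never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # Encode the raw digest, like "openssl md5 -binary | base64".
    return base64.b64encode(md5.digest()).decode("ascii")


if __name__ == "__main__":
    # Print "<base64 MD5> <path>" for each file named on the command line.
    for file_path in sys.argv[1:]:
        print(base64_md5_of_file(file_path), file_path)
```

For the same file, the value printed should match the openssl output.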

Step 2: Split the file into multiple parts and obtain the base64 MD5 checksum of each file part

Using Windows - You can split a file into parts on Windows with freeware utilities such as HJ-Split for Windows. HJ-Split for Windows also provides a UI for calculating the hexadecimal MD5 checksum value of each file part. Because these checksum values are hexadecimal, you must convert them to base64 before using them in a multipart upload.

Using Linux - Linux natively provides the split command for splitting a file, and you can compute the base64 MD5 checksum of each part with the openssl command shown in Step 1. For more information about how to use the Linux split command, enter one or more of these commands in the Linux command shell:

split --help - displays arguments for the split command
info split - displays general information about the split command
man split - displays the manual page for the split command

HJ-Split is also available for Linux if you prefer a GUI.
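You can also script the split-and-checksum step in Python. This minimal example rests on several assumptions. Parts are named testfile.001, testfile.002, and so on, to match the names used in this article. The names split_file and MIN_PART_SIZE are hypothetical. Each part is read fully into memory, which is acceptable for parts of tens of megabytes. Amazon S3 requires every part except the last to be at least 5 MiB, so the sketch rejects smaller part sizes unless you explicitly disable the check (useful only for testing).

```python
from __future__ import annotations

import base64
import hashlib
from pathlib import Path

# Amazon S3 requires every part except the last to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def split_file(path: str, part_size: int,
               enforce_minimum: bool = True) -> list[tuple[int, Path, str]]:
    """Split a file into parts named <name>.001, <name>.002, ...

    Returns (part_number, part_path, base64_md5) for each part.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if enforce_minimum and part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB Amazon S3 minimum")
    source = Path(path)
    parts = []
    part_number = 1
    with source.open("rb") as f:
        while True:
            data = f.read(part_size)  # one whole part in memory
            if not data:
                break
            part_path = source.with_name(f"{source.name}.{part_number:03d}")
            part_path.write_bytes(data)
            # --content-md5 expects the base64 of the raw 16-byte MD5 digest.
            checksum = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
            parts.append((part_number, part_path, checksum))
            part_number += 1
    return parts
```

For example, calling split_file("testfile", 79520768) on the 159,040,472-byte file in this example would produce two parts of 79,520,768 and 79,519,704 bytes. These match the part sizes that list-parts reports later in this article.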

Step 3: Initiate a multipart upload

This table summarizes the parameters used in this example to initiate the multipart upload and to upload each file part.

Item            Parameters and values
Target bucket   --bucket multirecv
File name       --key testfile --metadata md5=mvhFZXpr7J5u0ooXDoZ/4Q==
Part 1          --part-number 1 --body testfile.001 --content-md5 Vuoo2L6aAmjr+4sRXUwf0w==
Part 2          --part-number 2 --body testfile.002 --content-md5 317wIkbjGrgH2m9igCwa6A==

Using the information in the table, run the AWS CLI command aws s3api create-multipart-upload to retrieve the unique UploadId value that associates the original file with the file parts:

aws s3api create-multipart-upload --bucket multirecv --key testfile --metadata md5=mvhFZXpr7J5u0ooXDoZ/4Q==

The response contains the UploadId value, which is required whenever you upload a file part:

{
    "Key": "testfile",
    "Bucket": "multirecv",
    "UploadId": "sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk"
}
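Every later command needs the UploadId, so capturing it programmatically is less error-prone than copying it by hand. The AWS CLI's global --query and --output options can extract it directly (for example, --query UploadId --output text). Alternatively, the following Python sketch parses the JSON response shown above and checks that it refers to the intended bucket and key. The helper upload_id_from_response is hypothetical, and the sketch assumes that the CLI's JSON output is piped to it as text.

```python
import json
import sys


def upload_id_from_response(response_text: str, bucket: str, key: str) -> str:
    """Return the UploadId from create-multipart-upload JSON output."""
    response = json.loads(response_text)
    # Guard against using the response from a different upload.
    if response.get("Bucket") != bucket or response.get("Key") != key:
        raise ValueError("response does not match the expected bucket and key")
    return response["UploadId"]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: aws s3api create-multipart-upload ... | python script.py BUCKET KEY
    print(upload_id_from_response(sys.stdin.read(), sys.argv[1], sys.argv[2]))
```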

Step 4: Upload the file parts

In this example, the command to upload the first file part specifies the target bucket, the object key (the original file name), the first file part, the UploadId value, and the base64 MD5 checksum of the first file part:

aws s3api upload-part --bucket multirecv --key testfile --part-number 1 --body testfile.001 --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 Vuoo2L6aAmjr+4sRXUwf0w==

The response contains an ETag value for this part. Save the ETag value because you need it later to complete the multipart upload:

{
    "ETag": "\"56ea28d8be9a0268ebfb8b115d4c1fd3\""
}
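As an extra check, you can compare the ETag returned for a part with the Content-MD5 value that you sent. For objects that are not encrypted with SSE-KMS or SSE-C, the ETag of an uploaded part is typically the hexadecimal MD5 of that part, which is the same digest as your base64 value in a different encoding. In this example, Vuoo2L6aAmjr+4sRXUwf0w== decodes to 56ea28d8be9a0268ebfb8b115d4c1fd3. The following Python sketch performs that comparison, using the hypothetical function name etag_matches_content_md5. Because encryption settings can change the ETag format, treat a mismatch as a reason to investigate rather than as proof of corruption.

```python
import base64
import json


def etag_matches_content_md5(upload_part_output: str, content_md5_b64: str) -> bool:
    """Check an upload-part ETag against the --content-md5 value that was sent."""
    etag = json.loads(upload_part_output)["ETag"]
    # The CLI wraps the ETag value in literal double quotes; remove them.
    etag_hex = etag.strip('"').lower()
    # Decode the base64 checksum to raw bytes and render them as hex.
    expected_hex = base64.b64decode(content_md5_b64).hex()
    return etag_hex == expected_hex
```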

The command to upload the second file part is similar. The differences are the part number (--part-number 2), the name of the second part (--body testfile.002), and the MD5 checksum value:

aws s3api upload-part --bucket multirecv --key testfile --part-number 2 --body testfile.002 --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 317wIkbjGrgH2m9igCwa6A==

The command to upload the second part returns a different ETag value:

{
    "ETag": "\"df5ef02246e31ab807da6f62802c1ae8\""
}

Optionally, run this command to list the parts that are successfully uploaded:

aws s3api list-parts --bucket multirecv --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk

In this example, the following output is returned. It describes the account owner, the initiator of the multipart upload, the parts uploaded, and the storage class of the object being uploaded:

{
    "Owner": {
        "DisplayName": "multipartmessage",
        "ID": "290xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "Initiator": {
        "DisplayName": "multipart",
        "ID": "arn:aws:iam::22xxxxxxxxxx:user/multipart"
    },
    "Parts": [
        {
            "LastModified": "2016-01-13T13:43:33.000Z",
            "PartNumber": 1,
            "ETag": "\"56ea28d8be9a0268ebfb8b115d4c1fd3\"",
            "Size": 79520768
        },
        {
            "LastModified": "2016-01-13T13:56:47.000Z",
            "PartNumber": 2,
            "ETag": "\"df5ef02246e31ab807da6f62802c1ae8\"",
            "Size": 79519704
        }
    ],
    "StorageClass": "STANDARD"
}
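You can check every part at once by using the list-parts output. The following Python sketch returns the part numbers that are missing from the upload or whose ETag does not match the locally computed base64 MD5. The name find_bad_parts is hypothetical, and the sketch uses the same assumption as before: that part ETags are plain MD5 digests. You can re-upload any part it reports with upload-part before you complete the upload.

```python
from __future__ import annotations

import base64
import json


def find_bad_parts(list_parts_output: str, expected_md5s: dict[int, str]) -> list[int]:
    """Return part numbers that are missing or whose ETag doesn't match.

    expected_md5s maps each part number to the base64 MD5 computed locally.
    """
    uploaded = {
        part["PartNumber"]: part["ETag"].strip('"').lower()
        for part in json.loads(list_parts_output).get("Parts", [])
    }
    bad = []
    for number, md5_b64 in sorted(expected_md5s.items()):
        # Compare in hex, because list-parts reports ETags as hex digests.
        if uploaded.get(number) != base64.b64decode(md5_b64).hex():
            bad.append(number)
    return bad
```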

After the parts are uploaded, Amazon S3 needs additional information to recreate the original file. You provide it in a JSON-formatted file that lists each part number and the ETag value returned when that part was uploaded. In this example, the file is named fileparts and is saved to the same directory as the original file and its parts:

{
    "Parts": [
    {
        "ETag": "56ea28d8be9a0268ebfb8b115d4c1fd3",
        "PartNumber":1
    },
    {
        "ETag": "df5ef02246e31ab807da6f62802c1ae8",
        "PartNumber":2
    }
    ]
}
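For uploads with many parts, writing the fileparts file by hand is error-prone. The following Python sketch builds the same JSON structure from a mapping of part numbers to ETag values, using the hypothetical helper name write_fileparts. It strips the surrounding double quotes that the CLI prints and lists the parts in ascending part-number order, which complete-multipart-upload requires.

```python
from __future__ import annotations

import json


def write_fileparts(etags: dict[int, str], path: str = "fileparts") -> str:
    """Write the JSON file passed to complete-multipart-upload --multipart-upload.

    etags maps each part number to the ETag returned by upload-part.
    """
    document = {
        "Parts": [
            # Strip the literal quotes the CLI prints around each ETag.
            {"ETag": etags[number].strip('"'), "PartNumber": number}
            # Parts must be listed in ascending part-number order.
            for number in sorted(etags)
        ]
    }
    text = json.dumps(document, indent=4)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    return text
```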

Step 5: Complete the multipart upload

Run the following command to complete the multipart upload. The command reads the part numbers and ETag values from the fileparts JSON file that you created. Amazon S3 then assembles the parts, in part-number order, into the original file in the specified bucket. The --upload-id parameter identifies the multipart upload that the parts belong to.

aws s3api complete-multipart-upload --multipart-upload file://fileparts --bucket multirecv --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk

If this final step is successful, then output similar to the following appears:

{
    "ETag": "\"13115fdae01633ff0af167d925cad279-2\"",
    "Bucket": "multirecv",
    "Location": "https://multirecv.s3.amazonaws.com/testfile",
    "Key": "testfile"
}

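Run this command to retrieve the metadata of the uploaded object. The Metadata section contains the md5 value that you stored when you initiated the upload, and you can compare it with the checksum of your local file. Amazon S3 stores this value as is and does not check it against the object data. The ETag of an object created by a multipart upload is not the MD5 checksum of the whole file. Instead, it ends with a hyphen and the number of parts (-2 in this example).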

aws s3api head-object --bucket multirecv --key testfile
{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "LastModified": "Wed, 13 Jan 2016 13:15:00 GMT",
    "ContentLength": 159040472,
    "ETag": "\"13115fdae01633ff0af167d925cad279-2\"",
    "Metadata": {
        "md5": "mvhFZXpr7J5u0ooXDoZ/4Q=="
    }
}
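Because the multipart ETag is not the MD5 of the whole file, you can't compare it directly with the checksum from Step 1. For objects that aren't encrypted with SSE-KMS or SSE-C, the multipart ETag is commonly observed to be the MD5 of the concatenated binary MD5 digests of the parts, followed by a hyphen and the part count. The following Python sketch computes that value from the base64 part checksums so that you can compare it with the ETag returned by complete-multipart-upload and head-object, ignoring the surrounding quotes. The names expected_multipart_etag and etag_matches are hypothetical. Treat this as a consistency check based on observed behavior, not a documented guarantee.

```python
from __future__ import annotations

import base64
import hashlib


def expected_multipart_etag(part_md5s_b64: list[str]) -> str:
    """Compute the usual multipart ETag from per-part base64 MD5 values.

    part_md5s_b64 must be in ascending part-number order.
    """
    # Concatenate the raw 16-byte MD5 digest of every part.
    combined = b"".join(base64.b64decode(m) for m in part_md5s_b64)
    # Take the MD5 of the concatenation, then append "-" and the part count.
    return f"{hashlib.md5(combined).hexdigest()}-{len(part_md5s_b64)}"


def etag_matches(reported_etag: str, part_md5s_b64: list[str]) -> bool:
    """Compare with an ETag printed by the CLI, ignoring the surrounding quotes."""
    return reported_etag.strip('"').lower() == expected_multipart_etag(part_md5s_b64)
```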

If you encounter issues uploading one or more parts, you can re-upload the failed part or parts as described in step 4. If a multipart upload is initiated but never completed or aborted, its uploaded parts remain in the bucket and accrue storage charges, so be sure to remove them. To list incomplete multipart uploads for your bucket, run this command, substituting the name of your bucket for multirecv:

aws s3api list-multipart-uploads --bucket multirecv

The response lists any multipart uploads that were initiated but not yet completed or aborted:

{
    "Uploads": [
        {
            "Initiator": {
                "DisplayName": "myaccount",
                "ID": "5b7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            },
            "Initiated": "2016-03-31T06:13:15.000Z",
            "UploadId": "MuQzVbEvQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-",
            "StorageClass": "STANDARD",
            "Key": "music.mp4",
            "Owner": {
                "DisplayName": "myaccount",
                "ID": "5b7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            }
        }
    ]
}

You can use this information to remove unwanted uploads. For example, to abort the multipart upload described in the example and delete its uploaded parts, run the following command:

aws s3api abort-multipart-upload --bucket multirecv --key music.mp4 --upload-id MuQzVbEvQpHp7eHc_J5s9U.kzM3GAHeOJh1P8wVTmRqEVojwiwu3wPX6fWYzADNtOHklJI6W6Q9NJUYgjePKCVpbl_rDP6mGIr2AQJNKB_A-
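If list-multipart-uploads returns many incomplete uploads, you can generate the abort commands from its output. The following Python sketch only builds the argument lists for aws s3api abort-multipart-upload, using the --bucket, --key, and --upload-id options shown above. The helper name abort_commands is hypothetical. Review the commands before running each one, for example with subprocess.run(cmd, check=True). To clean up incomplete uploads automatically, you can also configure an S3 Lifecycle rule that aborts incomplete multipart uploads after a specified number of days.

```python
from __future__ import annotations

import json


def abort_commands(list_uploads_output: str, bucket: str) -> list[list[str]]:
    """Build one abort-multipart-upload command per incomplete upload.

    The commands are only returned, not executed.
    """
    uploads = json.loads(list_uploads_output).get("Uploads", [])
    return [
        [
            "aws", "s3api", "abort-multipart-upload",
            "--bucket", bucket,
            "--key", upload["Key"],
            "--upload-id", upload["UploadId"],
        ]
        for upload in uploads
    ]
```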

Published: 2016-01-25

Updated: 2018-02-22