Check the Integrity of Data in Amazon S3 with Additional Checksums

TUTORIAL

Overview

Organizations are constantly creating and migrating digital assets to Amazon S3. These assets include images, binary files, post-production renders, and more, all of which are business-critical. As assets are migrated and used across workflows, you want to make sure the files are not altered by network corruption, hard drive failure, or other unintentional issues. Today, the industry uses algorithms to scan a file byte by byte to generate a unique fingerprint for it, known as a checksum.
 
With checksums, you can verify that assets are not altered when copied. Generating a checksum consists of running an algorithm sequentially over every byte in a file.

Amazon S3 offers multiple checksum options to accelerate integrity checking of data. These capabilities calculate an object's checksum when it is uploaded. Customers migrating large volumes of data to Amazon S3 want to perform these integrity checks as a durability best practice and to confirm that every byte is transferred without alteration, maintaining end-to-end data integrity. The checksum is created the moment the object is uploaded, preserved throughout the lifespan of the object, and validated again when the object is downloaded. The additional algorithms supported by Amazon S3 are SHA-1, SHA-256, CRC32, and CRC32C. With these data integrity checking features, you can verify that your files were not altered during upload, download, or other data transfer.
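
To illustrate what such a calculation looks like, here is a minimal sketch in Python that computes a file's SHA-256 checksum in chunks and base64 encodes it, which is the format Amazon S3 displays for additional checksums. The file name image.jpg is just an example.

    import base64
    import hashlib

    def sha256_base64(path, chunk_size=1024 * 1024):
        # Iterate over the file in chunks so large files need not fit in memory.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        # Base64 encode the raw digest, matching the value shown in the S3 console.
        return base64.b64encode(digest.digest()).decode("ascii")

    print(sha256_base64("image.jpg"))  # example file name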

What you will accomplish

  • Upload a file to Amazon S3
  • Compare the checksum on Amazon S3 and your local file to verify data integrity

Prerequisites

AWS experience: Beginner

Time to complete: 20 minutes

Cost to complete: Less than $1 (see the Amazon S3 pricing page)

Requires: An AWS account

Services used: Amazon S3

Last updated: August 15, 2022

Implementation

Step 1: Create an Amazon S3 bucket

  • 1.1 — Sign in to the Amazon S3 console
    • If you have not already done so, create an AWS account.
    • Log in to the AWS Management Console using your account information.
    • From the AWS console services search bar, enter S3. Under the services search results section, select S3. You may notice an option for S3 Glacier. This option is for the Glacier service prior to integration with Amazon S3. We recommend Amazon S3 Glacier users use the Amazon S3 console for an enhanced user experience.

  • 1.2 — Create an S3 bucket
    • Choose Buckets from the Amazon S3 menu on the left and then choose the Create bucket button.
  • 1.3
    • Enter a descriptive, globally unique name for your bucket. Select the AWS Region in which you would like your bucket created. The default Block Public Access setting is appropriate for this workload, so leave this section as is.
    • You can leave the remaining options as defaults, navigate to the bottom of the page, and choose Create bucket.
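
If you prefer to script bucket creation, the following is a minimal sketch using the AWS SDK for Python (boto3). The bucket name and Region are placeholders; replace them with your own, and note that bucket names must be globally unique.

    import boto3

    region = "us-west-2"  # example Region; use the Region you chose above
    bucket_name = "my-checksum-tutorial-bucket-example"  # placeholder name

    s3 = boto3.client("s3", region_name=region)

    # Block Public Access is left at its default, as in the console steps above.
    # In us-east-1, omit CreateBucketConfiguration entirely.
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )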

Step 2: Upload a file and specify the checksum algorithm

Now that your bucket is created and configured, you are ready to upload a file and have the checksum calculated by Amazon S3.
  • 2.1 — Upload an object
    • If you have logged out of your AWS Management Console session, log back in. Navigate to the S3 console and select the Buckets menu option. From the list of available buckets, select the name of the bucket you just created.
  • 2.2
    • Next, select the Objects tab. Then, from within the Objects section, choose the Upload button.
  • 2.3 — Add files
    • Choose the Add files button and then select the file you would like to upload from your file browser.
  • 2.4 — Expand properties
    • Navigate down the page to the Properties section, then select Properties to expand the section.
  • 2.5 — Select additional checksums
    • Under Additional checksums select the On option and choose SHA-256.

If your object is less than 16 MB and you have already calculated its SHA-256 checksum (base64 encoded), you can provide it in the Precalculated value input box. To use this functionality for objects larger than 16 MB, use the AWS CLI or an SDK (a sketch using the SDK for Python appears at the end of this step). When Amazon S3 receives the object, it calculates the checksum using the specified algorithm. If the checksum values do not match, Amazon S3 generates an error and rejects the upload.

  • 2.6 — Upload
    • Navigate down the page and choose the Upload button.
  • 2.7
    • After your upload completes, choose the Close button.
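
As noted in step 2.5, you can also upload through the AWS CLI or an SDK, which is required if you want to supply a precalculated checksum for objects larger than 16 MB. The following is a minimal boto3 sketch; the bucket and file names are placeholders, and your credentials and Region are assumed to be configured.

    import boto3

    s3 = boto3.client("s3")

    bucket_name = "my-checksum-tutorial-bucket-example"  # placeholder name
    file_name = "image.jpg"                              # example file

    with open(file_name, "rb") as f:
        response = s3.put_object(
            Bucket=bucket_name,
            Key=file_name,
            Body=f,
            # Ask S3 to calculate and verify a SHA-256 checksum on receipt.
            ChecksumAlgorithm="SHA256",
            # Alternatively, pass a precalculated base64 value via ChecksumSHA256="...".
        )

    # S3 returns the checksum it calculated and verified for the object.
    print(response["ChecksumSHA256"])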

Step 3: Verify checksum

  • 3.1
    • Select the uploaded file by choosing its filename. This takes you to the object's Properties page.
  • 3.2 — Locate the checksum value
    • Navigate down the properties page to find the Additional checksums section.
    • This section displays the base64-encoded checksum that Amazon S3 calculated and verified at the time of upload.
  • 3.3 — Compare
    • To compare against the copy on your local computer, open a terminal window and navigate to the directory containing your file.
    • Use a utility such as shasum to calculate the file's checksum. The following command performs a SHA-256 calculation on the same file and converts the hex output to base64: shasum -a 256 image.jpg | cut -f1 -d\ | xxd -r -p | base64
    • The resulting value should match the value shown in the Amazon S3 console.
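
The same comparison can be scripted with boto3: retrieve the checksum Amazon S3 stored for the object, recompute it locally, and compare. The bucket and file names below are placeholders, and the call assumes you have permission for s3:GetObjectAttributes.

    import base64
    import hashlib

    import boto3

    s3 = boto3.client("s3")

    bucket_name = "my-checksum-tutorial-bucket-example"  # placeholder name
    file_name = "image.jpg"                              # example file

    # Fetch the checksum S3 calculated and stored at upload time.
    attrs = s3.get_object_attributes(
        Bucket=bucket_name,
        Key=file_name,
        ObjectAttributes=["Checksum"],
    )
    remote_checksum = attrs["Checksum"]["ChecksumSHA256"]

    # Recompute the checksum locally, base64 encoded like the console value.
    digest = hashlib.sha256()
    with open(file_name, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    local_checksum = base64.b64encode(digest.digest()).decode("ascii")

    print("Match" if local_checksum == remote_checksum else "MISMATCH")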

Step 4: Clean up

In the following steps, you clean up the resources you created in this tutorial. It is a best practice to delete resources that you are no longer using so that you do not incur unintended charges.
  • 4.1 — Delete test object
    • If you have logged out of your AWS Management Console session, log back in. Navigate to the S3 console and select the Buckets menu option.
    • First, delete the test object from your test bucket. Select the name of the bucket you have been working with for this tutorial, select the checkbox to the left of your test object name, and then choose the Delete button.
    • On the Delete objects page, verify that you have selected the proper object and enter permanently delete into the Permanently delete objects confirmation box. Then, choose the Delete objects button to continue. A banner indicates whether the deletion was successful.
  • 4.2 — Delete test bucket
    • Finally, delete the test bucket you created. Return to the list of buckets in your account, select the radio button to the left of the bucket you created for this tutorial, and then choose the Delete button. Review the warning message. If you want to continue deleting this bucket, enter the bucket name into the Delete bucket confirmation box and choose Delete bucket.
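
The clean-up steps can also be scripted; here is a minimal boto3 sketch using the same placeholder names. It assumes the bucket contains only the single test object, because a bucket must be empty before it can be deleted.

    import boto3

    s3 = boto3.client("s3")

    bucket_name = "my-checksum-tutorial-bucket-example"  # placeholder name
    file_name = "image.jpg"                              # example file

    # Delete the test object first; the bucket must be empty before deletion.
    s3.delete_object(Bucket=bucket_name, Key=file_name)

    # Then delete the now-empty bucket.
    s3.delete_bucket(Bucket=bucket_name)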

Conclusion

Congratulations! You have learned how to upload a file to Amazon S3, calculate additional checksums, and compare the checksum on Amazon S3 and your local file to verify data integrity.

Next steps

To learn more about checksums, visit the following resources.

Explore additional checksums

To learn more about additional checksums in Amazon S3, read the launch blog and visit the Checking object integrity documentation.

Discover trailing checksums

Amazon S3 also introduced trailing checksums, a new feature for the AWS SDK. Using trailing checksums, the SDK will calculate the checksum in a single pass as it uploads your file to Amazon S3. Read more about it in the Building scalable checksums blog post.