AWS Developer Tools Blog

Announcing Amazon S3 checksums support in the AWS SDK for Kotlin

We are excited to announce support for Amazon S3 checksums in the AWS SDK for Kotlin!

A checksum is a compact fingerprint calculated from a set of data that can be used to check whether the data has been altered or corrupted during transfer. Configuring a checksum is a valuable precaution to help maintain the integrity of your data.
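As a rough illustration of the idea, the following sketch uses the JDK's java.util.zip.CRC32 class rather than the SDK. The helper name crc32Of is made up for this example. It shows that identical data produces the same checksum, while changing a single byte produces a different one.

```kotlin
import java.util.zip.CRC32

// Hypothetical helper: computes the CRC-32 checksum of a byte array with the JDK.
fun crc32Of(data: ByteArray): Long {
    val crc = CRC32()
    crc.update(data)
    return crc.value
}

fun main() {
    val original = "hello, checksums".toByteArray()
    val sameData = "hello, checksums".toByteArray()
    val corrupted = "jello, checksums".toByteArray() // first byte altered

    println(crc32Of(original) == crc32Of(sameData))  // prints true
    println(crc32Of(original) == crc32Of(corrupted)) // prints false
}
```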

Amazon S3 is an object storage service that enables you to specify a checksum algorithm or value to be used when uploading objects. With Amazon S3 checksums, the checksum is automatically calculated and validated during uploads and downloads. This post provides an introduction to using this feature with the AWS SDK for Kotlin.

There are currently four supported checksum algorithms: SHA-1, SHA-256, CRC-32, and CRC-32C. The AWS SDK for Kotlin can calculate a checksum automatically when transferring objects. If you already know the checksum, supplying the predetermined value can help reduce computation in your client at the time of transfer.

You can use Amazon S3 checksums with the AWS SDK for Kotlin when uploading and downloading an S3 object.

Uploading a Single Object

Objects are uploaded using S3’s PutObject API. The request type has a checksumAlgorithm property that enables checksum computation. A request to upload an object with a SHA-256 checksum might look like this:

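import aws.sdk.kotlin.services.s3.model.ChecksumAlgorithm
import aws.sdk.kotlin.services.s3.model.PutObjectRequest
import aws.smithy.kotlin.runtime.content.ByteStream
import aws.smithy.kotlin.runtime.content.fromFile
import java.io.File

val request = PutObjectRequest {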
    bucket = "bucket"
    key = "key"
    body = ByteStream.fromFile(File("body"))
    checksumAlgorithm = ChecksumAlgorithm.Sha256
}

When this request is sent, a SHA-256 checksum will be automatically calculated and applied to the object being uploaded.

Using a predetermined checksum value

Providing a predetermined checksum value disables automatic calculation by the SDK and uses the provided value instead. If the checksum you provide is incorrect, the upload will fail with an exception.

The following example shows how to define a checksum value on a PutObject request:

val request = PutObjectRequest {
    bucket = "bucket"
    key = "key"
    body = ByteStream.fromFile(File("body"))
    checksumAlgorithm = ChecksumAlgorithm.Sha256
    // Base64-encoded SHA-256 digest of the object
    checksumSha256 = "z7bQbabm9Rwirj5UnjOVnbt1Tbdak2Zbi1eWBUZM4pk="
}

Note that both checksumAlgorithm and checksumSha256 must be specified in the request.

Be sure the algorithm specified by checksumAlgorithm matches the checksum property containing the predetermined value. Otherwise, the provided checksum will be ignored and a new one will be calculated.
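The S3 API reference describes the checksum fields as Base64-encoded digests. Assuming that encoding, a predetermined SHA-256 value could be computed ahead of time with the JDK alone. This is only a sketch: sha256Base64 is a hypothetical helper name, not an SDK function.

```kotlin
import java.security.MessageDigest
import java.util.Base64

// Hypothetical helper: returns the Base64-encoded SHA-256 digest of the given bytes.
fun sha256Base64(data: ByteArray): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(data)
    return Base64.getEncoder().encodeToString(digest)
}

fun main() {
    // The resulting string is the kind of value you might assign to checksumSha256.
    println(sha256Base64("Hello, S3!".toByteArray()))
}
```

Computing the value once and reusing it can be worthwhile when the same content is uploaded repeatedly.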

Uploading a Multipart Object

S3 offers multipart uploads, which allow you to split your object’s upload across multiple requests.

You can also apply checksums to multipart uploads. To do this, specify the checksum algorithm in both the CreateMultipartUpload request and each UploadPart request. Additionally, the CompleteMultipartUpload request must include each part’s checksum. The following example shows what these requests might look like:

val multipartUpload = s3.createMultipartUpload {
    bucket = "bucket"
    key = "key"
    checksumAlgorithm = ChecksumAlgorithm.Sha1
}

val partFilesToUpload = listOf("data-part1.csv", "data-part2.csv", "data-part3.csv")

val completedParts = partFilesToUpload
    .mapIndexed { i, fileName ->
        val uploadPartResponse = s3.uploadPart {
            bucket = "bucket"
            key = "key"
            body = ByteStream.fromFile(File(fileName))
            uploadId = multipartUpload.uploadId
            partNumber = i + 1 // Part numbers begin at 1
            checksumAlgorithm = ChecksumAlgorithm.Sha1
        }

        CompletedPart {
            eTag = uploadPartResponse.eTag
            partNumber = i + 1
            checksumSha1 = uploadPartResponse.checksumSha1
        }
    }

s3.completeMultipartUpload {
    uploadId = multipartUpload.uploadId
    bucket = "bucket"
    key = "key"
    multipartUpload {
        parts = completedParts
    }
}
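If you want to cross-check the per-part checksums that UploadPart returns, one option is to compute them locally. The sketch below uses only the JDK and rests on assumptions: PartSummary, sha1Base64, and summarizeParts are hypothetical names, and the Base64 encoding follows the description in the S3 API reference.

```kotlin
import java.security.MessageDigest
import java.util.Base64

// Hypothetical local record of a part: its 1-based number and its SHA-1 checksum.
data class PartSummary(val partNumber: Int, val checksumSha1: String)

// Hypothetical helper: returns the Base64-encoded SHA-1 digest of the given bytes.
fun sha1Base64(data: ByteArray): String =
    Base64.getEncoder().encodeToString(MessageDigest.getInstance("SHA-1").digest(data))

// Pairs each part with its part number (starting at 1) and its locally computed checksum.
fun summarizeParts(parts: List<ByteArray>): List<PartSummary> =
    parts.mapIndexed { i, bytes -> PartSummary(partNumber = i + 1, checksumSha1 = sha1Base64(bytes)) }

fun main() {
    val parts = listOf("part one".toByteArray(), "part two".toByteArray())
    summarizeParts(parts).forEach { println("${it.partNumber}: ${it.checksumSha1}") }
}
```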

Downloading an Object

S3 objects are downloaded using the GetObject API. You can opt in to checksum validation while downloading the object by setting the checksumMode property in the request to ChecksumMode.Enabled.

Note that no validation will occur if the object was uploaded without a checksum. Checksums can be set during upload by using the AWS Management Console or any AWS SDK.

There may be multiple checksums associated with the object in S3, but only one checksum will be validated. Which checksum is validated is determined by the following priority list: CRC-32C, CRC-32, SHA-1, SHA-256. For example, if a response contains both CRC-32 and SHA-256 checksums, only the CRC-32 checksum will be validated.

val request = GetObjectRequest {
    bucket = "bucket"
    key = "key"
    checksumMode = ChecksumMode.Enabled
}

When this request is sent, the AWS SDK for Kotlin will automatically compute the checksum and validate it against the value returned by S3.
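The priority rule described above can be sketched in a few lines of plain Kotlin. This is an illustration of the selection logic only, not the SDK's implementation. The names validationPriority and selectChecksumToValidate, and the string labels for the algorithms, are assumptions made for the example.

```kotlin
// Validation priority from the list above, highest priority first.
val validationPriority = listOf("CRC32C", "CRC32", "SHA1", "SHA256")

// Hypothetical helper: picks the single checksum that would be validated
// from the set of checksums present on a response, or null if there are none.
fun selectChecksumToValidate(available: Set<String>): String? =
    validationPriority.firstOrNull { it in available }

fun main() {
    // A response carrying both CRC-32 and SHA-256 checksums
    println(selectChecksumToValidate(setOf("SHA256", "CRC32"))) // prints CRC32
}
```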

Validating the response

The checksum is validated as you consume the response’s body because the AWS SDK for Kotlin uses streaming responses when downloading objects from S3. This means you must consume the object in order for the checksum to be validated. A ChecksumMismatchException will be thrown if the checksum is invalid. The following example shows how to validate a checksum by fully consuming the response.

val request = GetObjectRequest {
    bucket = "bucket"
    key = "key"
    checksumMode = ChecksumMode.Enabled
}

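s3.getObject(request) {
    // decodeToString() reads the whole body, which triggers checksum validation
    println(it.body?.decodeToString())
    // Reaching this line means the checksum matched
}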

The response is fully consumed in this example because decodeToString() will read the entire body into a string. Other possible options that fully consume a response include writeToFile(), toByteArray(), or fully exhausting a reader returned by readFrom().

In the following example, the checksum will not be validated, because the response is not used in any way.

s3.getObject(request) {
    println("Got the object!")
}
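To see why the body must be consumed before a checksum can be checked, consider the following sketch. It uses only JDK streams and is not how the SDK is implemented. The function name readAndValidate is hypothetical, and SHA-256 with Base64 encoding is assumed for the expected value. The digest is updated as bytes are read, so it can only be compared once the stream has been read to the end.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.security.DigestInputStream
import java.security.MessageDigest
import java.util.Base64

// Hypothetical helper: reads the stream fully while updating a SHA-256 digest,
// then compares the result with the expected Base64-encoded checksum.
fun readAndValidate(source: InputStream, expectedBase64: String): ByteArray {
    val digest = MessageDigest.getInstance("SHA-256")
    // The digest only covers bytes that were actually read from the stream.
    val bytes = DigestInputStream(source, digest).use { it.readBytes() }
    val actual = Base64.getEncoder().encodeToString(digest.digest())
    check(actual == expectedBase64) { "Checksum mismatch: expected $expectedBase64 but was $actual" }
    return bytes
}

fun main() {
    val payload = "streamed object".toByteArray()
    val expected = Base64.getEncoder()
        .encodeToString(MessageDigest.getInstance("SHA-256").digest(payload))

    // Validation happens only after the whole stream has been consumed.
    val body = readAndValidate(ByteArrayInputStream(payload), expected)
    println(body.decodeToString()) // prints streamed object
}
```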

Conclusion

In this blog post, you learned how to begin using Amazon S3 checksums with the AWS SDK for Kotlin. To learn more about how to use this feature and other features of the AWS SDK for Kotlin, visit our Developer Guide and API Reference. If you’re curious about how the feature is implemented, check out its design on GitHub. If you have questions or come across any issues, please open an issue on our GitHub repository.

We’re eager to hear your thoughts about the SDK and this new feature in our developer survey!

Matas Lauzadis

Matas is a Software Development Engineer working on the AWS SDK for Kotlin. He’s excited about building tools that enhance developer experience. Find him on GitHub @lauzadis.