Discovering and Deleting Incomplete Multipart Uploads to Lower Amazon S3 Costs
This blog post is contributed by Steven Dolan, Senior Enterprise Support TAM
Amazon S3’s multipart upload feature allows you to upload a single object to an S3 bucket as a set of parts, providing benefits such as improved throughput and quick recovery from network issues. In general, when your object size reaches 100 MB, you should consider using multipart upload instead of uploading the object in a single operation. You can use multipart upload for objects from 5 MB to 5 TB in size.
Not sure if you are making use of multipart upload? Keep in mind that many tools and applications use multipart upload by default based on the size of the file you are uploading. Some examples include the AWS Management Console, the aws s3 cp command in the AWS Command Line Interface (AWS CLI), and Amazon S3 Replication.
A multipart upload is a three-step process:
- The upload is initiated
- The object parts are uploaded
- Once all of the object parts are uploaded, the multipart upload completes (Amazon S3 constructs the object from the uploaded parts, and you can then access the object)
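The three steps above can be sketched in Python with boto3. This is a minimal illustration rather than a production uploader: the function names, the 8 MiB default part size, and the choice to abort on any error are our assumptions, and most tools (including the AWS CLI) handle all of this for you.

```python
import os
from typing import Iterator, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def part_ranges(total_size: int, part_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, offset, length) for each part of an object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")
    part_number, offset = 1, 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        yield part_number, offset, length
        part_number += 1
        offset += length


def multipart_upload(path: str, bucket: str, key: str,
                     part_size: int = 8 * 1024 * 1024) -> None:
    """Upload a local file in parts; abort the upload if anything fails."""
    import boto3  # imported lazily so part_ranges works without boto3 installed

    s3 = boto3.client("s3")
    total_size = os.path.getsize(path)

    # Step 1: initiate the upload and remember its upload ID.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        # Step 2: upload each part and record the ETag S3 returns for it.
        with open(path, "rb") as f:
            for number, offset, length in part_ranges(total_size, part_size):
                f.seek(offset)
                resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                      PartNumber=number, Body=f.read(length))
                parts.append({"PartNumber": number, "ETag": resp["ETag"]})
        # Step 3: ask S3 to assemble the object from the uploaded parts.
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Without this abort, the uploaded parts would stay in the bucket.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

If the process is killed between step 1 and step 3, neither the complete nor the abort call runs, which is exactly how incomplete multipart uploads accumulate.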
But what happens to a multipart upload that doesn’t complete?
If the complete multipart upload request isn’t sent successfully, Amazon S3 will not assemble the parts and will not create any object. The parts remain in your Amazon S3 account until the multipart upload completes or is aborted, and you pay for the parts that are stored in Amazon S3. These parts are charged according to the storage class specified when the parts were uploaded. The exception is parts uploaded to the S3 Glacier or S3 Glacier Deep Archive storage classes. In-progress multipart parts for a PUT to the S3 Glacier storage class are billed as S3 Glacier Staging Storage at S3 Standard storage rates until the upload completes. Similarly, in-progress multipart parts for a PUT to the S3 Glacier Deep Archive storage class are billed as S3 Glacier Deep Archive Staging Storage at S3 Standard storage rates until the upload completes.
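To get a feel for what leftover parts cost, here is a rough Python sketch of the billing rule described above. The per-GB rates are placeholders we made up for illustration (check current S3 pricing for your Region), and we assume a billing GB of 2^30 bytes.

```python
GIB = 1024 ** 3  # assumption: one billed GB is 2**30 bytes

# Placeholder prices in USD per GB-month, for illustration only.
EXAMPLE_RATES = {"STANDARD": 0.023, "STANDARD_IA": 0.0125}

# Parts destined for these classes are billed as staging storage
# at the S3 Standard rate until the upload completes.
STAGED_AT_STANDARD = {"GLACIER", "DEEP_ARCHIVE"}


def billing_rate(storage_class: str, rates: dict) -> float:
    """Return the rate that applies to in-progress parts of this class."""
    if storage_class in STAGED_AT_STANDARD:
        return rates["STANDARD"]
    return rates[storage_class]


def monthly_cost_usd(incomplete_bytes: int, storage_class: str, rates: dict) -> float:
    """Estimate the monthly charge for parts of incomplete uploads."""
    return incomplete_bytes / GIB * billing_rate(storage_class, rates)
```

With these placeholder rates, 500 GB of abandoned parts works out to about 11.50 USD per month whether the upload targeted S3 Standard or S3 Glacier, and the charge repeats every month until the upload is completed or aborted.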
While it is possible to manually list and abort incomplete multipart uploads in your S3 buckets, this can quickly become a cumbersome task as the number of uploads, buckets, and accounts within your organization increases. Also note that you aren’t able to view the parts of your incomplete multipart upload in the AWS Management Console.
In this blog post, our Senior Enterprise Support TAM, Steven Dolan, will show you how to save on S3 costs by easily discovering and aborting incomplete multipart uploads in your S3 buckets, freeing the storage consumed by any previously uploaded parts.
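For a single bucket, the manual approach might look like the following Python sketch using boto3. The function names, the 7-day threshold, and the dry-run default are our own choices for illustration; the script only sees buckets that the calling credentials can access.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def stale_uploads(uploads: List[Dict[str, Any]], now: datetime,
                  max_age_days: int = 7) -> List[Dict[str, Any]]:
    """Return the uploads that were initiated at least max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [u for u in uploads if u["Initiated"] <= cutoff]


def abort_stale_uploads(bucket: str, max_age_days: int = 7,
                        dry_run: bool = True) -> List[Dict[str, Any]]:
    """List incomplete uploads in one bucket and abort the stale ones."""
    import boto3  # imported lazily so stale_uploads works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_multipart_uploads")
    now = datetime.now(timezone.utc)
    selected = []
    for page in paginator.paginate(Bucket=bucket):
        # "Uploads" is absent from the response when nothing is in progress.
        for upload in stale_uploads(page.get("Uploads", []), now, max_age_days):
            if not dry_run:
                s3.abort_multipart_upload(Bucket=bucket, Key=upload["Key"],
                                          UploadId=upload["UploadId"])
            selected.append(upload)
    return selected
```

Calling abort_stale_uploads("BUCKETNAME") with the default dry_run=True only reports what would be aborted. Repeating this for every bucket in every account is what makes the manual route cumbersome.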
Discover Incomplete Multipart Uploads Using S3 Storage Lens
Amazon S3 Storage Lens provides visibility into storage usage and activity trends at the organization or account level, with drill-downs by Region, storage class, bucket, and prefix.
We will use S3 Storage Lens to discover which of our AWS accounts and S3 buckets contain incomplete multipart uploads. We will also be able to see how much data exists as a result of these incomplete multipart uploads. For information on how to set up S3 Storage Lens, see the Amazon S3 Storage Lens documentation. If you are setting up a new S3 Storage Lens dashboard or accessing your default dashboard for the first time, be mindful that it can take up to 48 hours to generate your initial metrics.
S3 Storage Lens provides four Cost Efficiency metrics for analyzing incomplete multipart uploads in your S3 buckets. These metrics are free of charge and automatically configured for all S3 Storage Lens dashboards.
- Incomplete Multipart Upload Storage Bytes – The total bytes in scope with incomplete multipart uploads
- % Incomplete MPU Bytes – The percentage of bytes in scope that are results of incomplete multipart uploads
- Incomplete Multipart Upload Object Count – The number of objects in scope that are incomplete multipart uploads
- % Incomplete MPU Objects – The percentage of objects in scope that are incomplete multipart uploads
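The two percentage metrics are simple ratios of the two absolute metrics to the totals in scope. As a small worked example in Python, assuming the denominator is the total storage bytes (or total object count) for the same scope:

```python
def percent_incomplete(incomplete: int, total: int) -> float:
    """Share of the in-scope total that belongs to incomplete uploads."""
    if total == 0:
        return 0.0  # avoid dividing by zero for an empty scope
    return 100.0 * incomplete / total
```

For instance, a bucket holding 1,000 GB in total, 25 GB of which are parts of incomplete multipart uploads, would show 2.5 for % Incomplete MPU Bytes under this assumption.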
For this example, Steven created an organizational dashboard in S3 Storage Lens to collect storage metrics and usage data for all AWS accounts that are part of our AWS Organizations hierarchy. You can use your default dashboard, but keep in mind that you will only be able to view storage metrics for the account in which you are logged in.
We access Storage Lens from the Amazon S3 console and select the organizational dashboard.
Now that we are in the dashboard, we scroll down to the Top N overview section. This lets us see the metrics across the top N AWS accounts because we enabled S3 Storage Lens to work with AWS Organizations. Again, if you are using the default dashboard (instead of an organizational dashboard), you’ll only see data from the AWS account in which you are logged in.
For Metric, we choose Incomplete MPU Bytes. This shows us the total bytes in scope with incomplete multipart uploads.
Now we can view which of our AWS accounts, Regions, and buckets contain incomplete multipart uploads. We can also see the amount of data for which we’re charged. At first glance, the amount of incomplete multipart upload bytes you discover may not seem very significant. However, remember that you are charged for this data every month until the multipart upload is completed or aborted, so these charges accumulate over time.
You can also use the filters in S3 Storage Lens (at the top of your dashboard) to narrow down the results within an account or Region to help in targeting all of the buckets that contain incomplete multipart uploads.
Abort Incomplete Multipart Uploads Using S3 Lifecycle
Thanks to S3 Storage Lens, we now know which of our AWS accounts and S3 buckets contain incomplete multipart uploads.
Next, we’ll configure a lifecycle rule for one of our S3 buckets to automatically abort 7-day-old incomplete multipart uploads, which also deletes the in-progress multipart parts. Here’s how to set up the lifecycle rule using the AWS Management Console.
After selecting the bucket in the S3 console, we select the Management tab. Here we select Create lifecycle rule.
We give the lifecycle rule a name and choose to apply the rule to all objects in the bucket. Don’t forget to select the checkbox to acknowledge that you are applying this rule to all objects in the bucket.
Next, we select the checkbox to Delete expired delete markers or incomplete multipart uploads under the Lifecycle rule actions section. We’re now given the option to delete incomplete multipart uploads and must specify how many days after the start of a multipart upload the cleanup should occur. Note that the count starts when the multipart upload was initiated, not when its last part was uploaded. We recommend 7 days as a good starting point.
For information on expired delete markers, check Jeff Barr’s blog post.
Finally, we select Create rule. The lifecycle rule is now in place and ready to run. We can now see the enabled rule under our lifecycle rules.
Important Note: Lifecycle rules run once a day at midnight Coordinated Universal Time (UTC), and new lifecycle rules can take up to 48 hours to complete the first run.
If we wanted to create and apply this same lifecycle rule using the AWS CLI, we first create the following JSON file. Here we named our file lifecycle.json.
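As a minimal sketch of what such a file can contain, the Python snippet below builds the rule and writes it to lifecycle.json. The rule ID matches the one used in the CloudFormation example, and the empty prefix filter (our assumption) applies the rule to the whole bucket.

```python
import json


def build_lifecycle_config(days: int = 7,
                           rule_id: str = "delete-incomplete-mpu-7days") -> dict:
    """Build a lifecycle configuration that aborts incomplete uploads."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # An empty prefix applies the rule to every key in the bucket.
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def write_lifecycle_file(path: str = "lifecycle.json", days: int = 7) -> None:
    """Write the configuration as JSON for use with the AWS CLI."""
    with open(path, "w") as f:
        json.dump(build_lifecycle_config(days), f, indent=4)
```

Running write_lifecycle_file() produces a lifecycle.json in the current directory. Keep in mind that put-bucket-lifecycle-configuration replaces the bucket's existing lifecycle configuration, so merge any rules you already have into the same file.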
We then run the following command to apply the rule to the S3 bucket:
aws s3api put-bucket-lifecycle-configuration --bucket BUCKETNAME --lifecycle-configuration file://lifecycle.json
Be sure to replace BUCKETNAME with your own bucket name.
Finally, here’s how we would configure an S3 bucket with this rule using an AWS CloudFormation template (YAML):
- Id: delete-incomplete-mpu-7days
Be sure to replace BUCKETNAME with your own bucket name.
An easy way to verify that your lifecycle rule is working (after giving the rule adequate time to run) is to check the Trends and Distributions section in your Storage Lens dashboard. We have selected the Incomplete Multipart Upload Storage Bytes and Incomplete Multipart Upload Object Count metrics. Here we can clearly see that our lifecycle rule is working properly, as there is a significant drop in both of these metrics since enabling the rule. Keep in mind that Storage Lens is updated daily, so you may need to give Storage Lens some time to reflect the cleanup.
Click here for more tips on how to verify that cleanup is taking place.
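Another quick check is to confirm that the rule is actually attached to the bucket and enabled. Here is a hedged Python sketch using boto3; the helper names are ours, and the first function only inspects the shape of the response returned by get_bucket_lifecycle_configuration.

```python
from typing import Optional


def has_abort_rule(lifecycle_config: dict, max_days: Optional[int] = None) -> bool:
    """True if an enabled rule aborts incomplete multipart uploads."""
    for rule in lifecycle_config.get("Rules", []):
        abort = rule.get("AbortIncompleteMultipartUpload") or {}
        if rule.get("Status") != "Enabled" or "DaysAfterInitiation" not in abort:
            continue
        # Optionally require the rule to act within max_days of initiation.
        if max_days is None or abort["DaysAfterInitiation"] <= max_days:
            return True
    return False


def bucket_has_abort_rule(bucket: str, max_days: Optional[int] = None) -> bool:
    """Fetch a bucket's lifecycle configuration and check it."""
    import boto3  # imported lazily so has_abort_rule works without boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        config = s3.get_bucket_lifecycle_configuration(Bucket=bucket)
    except ClientError as err:
        # A bucket with no lifecycle rules at all returns this error code.
        if err.response["Error"]["Code"] == "NoSuchLifecycleConfiguration":
            return False
        raise
    return has_abort_rule(config, max_days)
```

This confirms only that the rule is configured, not that cleanup has already run, so it complements the Storage Lens trend rather than replacing it.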
Note: In-progress multipart parts uploaded to the S3 Glacier storage class are not subject to an S3 Glacier early delete fee when they are deleted.
In this post, we explained how to cut S3 costs by identifying and automatically aborting incomplete multipart uploads, preventing you from paying for unused S3 storage. Steven Dolan used Storage Lens to discover the AWS accounts and S3 buckets that contained incomplete multipart uploads and used S3 Lifecycle to automatically abort incomplete multipart uploads in one of the S3 buckets. Remember that applying the demonstrated lifecycle rule will not delete any successfully uploaded and assembled objects in your S3 bucket.
If you would like information on how to automate applying lifecycle rules and policies to your S3 buckets, click here.