Manage tens to billions of objects at scale with S3 Batch Operations
S3 Batch Operations is an Amazon S3 data management feature that lets you manage billions of objects at scale with just a few clicks in the Amazon S3 Management Console or a single API request. With this feature, you can make changes to object metadata and properties, or perform other storage management tasks, such as copying or replicating objects between buckets, replacing object tag sets, modifying access controls, and restoring archived objects from S3 Glacier. Instead of spending months developing custom applications for these tasks, you can complete them with a single job.
S3 Batch Operations
S3 Batch Operations is a managed solution for performing storage actions such as copying and tagging objects at scale, whether for one-time tasks or for recurring batch workloads. It can act on billions of objects and petabytes of data with a single request.
S3 Batch Operations complements any event-driven architecture you may be operating today. For new objects, S3 events and Lambda functions work well for converting file types, creating thumbnails, performing data scans, and carrying out other operations. For example, customers use S3 events and Lambda functions to create smaller, low-resolution versions of raw photographs when images are first uploaded to S3. S3 Batch Operations complements these event-driven workflows by providing a simple mechanism for performing the same actions across your existing objects.
How it works: S3 Batch Operations
To perform work in S3 Batch Operations, you create a job. The job consists of the list of objects, the action to perform, and the set of parameters you specify for that type of operation. You can create and run multiple jobs at a time in S3 Batch Operations, or use job priorities to define the precedence of each job, ensuring the most critical work happens first. S3 Batch Operations also manages retries, tracks progress, sends completion notifications, generates reports, and delivers events to AWS CloudTrail for all changes made and tasks executed.
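The job described above (object list, operation, parameters, priority, and completion report) can be sketched as a `CreateJob` request to the S3 Control API. This is a minimal illustration, not a definitive recipe: the account ID, ARNs, and tag values below are hypothetical placeholders you would replace with your own.

```python
def build_tagging_job(account_id, manifest_arn, manifest_etag,
                      report_bucket_arn, role_arn, priority=10):
    """Build a CreateJob request that replaces each listed object's tag set.
    All ARNs and the account ID are caller-supplied placeholders."""
    return {
        "AccountId": account_id,
        "ConfirmationRequired": True,   # job waits for confirmation before running
        "Priority": priority,           # higher numbers take precedence
        "RoleArn": role_arn,            # IAM role that S3 assumes to do the work
        "Operation": {
            "S3PutObjectTagging": {     # the action applied to every object
                "TagSet": [{"Key": "project", "Value": "archive-2024"}]
            }
        },
        "Manifest": {                   # the list of objects to act on
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {                     # completion report with per-task results
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "ReportScope": "AllTasks",
        },
    }

# Submitting the job requires AWS credentials and the boto3 SDK, e.g.:
#   boto3.client("s3control").create_job(**build_tagging_job(...))
```

The manifest can be a CSV you supply or an S3 Inventory report, and the completion report separates successful from failed tasks for follow-up handling.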
S3 Batch Operations tutorial
Teespring was founded in 2011 and enables users to create and sell custom on-demand products online. Because every piece of custom merchandise requires multiple assets, Teespring stores petabytes of data in Amazon S3.
"Amazon S3 Batch Operations helped us optimize our storage by utilizing Amazon S3’s Glacier storage class. We used our own storage metadata to create batches of objects that we could move to Amazon S3 Glacier. With Amazon S3 Glacier we saved more than 80% of our storage costs. We are always looking for opportunities to automate storage management, and with S3 Batch Operations, we can manage millions of objects in minutes.”
James Brady, VP of Engineering - Teespring
Capital One is a bank founded at the intersection of finance and technology and one of America’s most recognized brands.
Capital One used Amazon S3 Batch Operations to copy data between two AWS regions to increase their data’s redundancy and to standardize their data footprint between those two locations.
"With Amazon S3 Batch Operations we created a job to copy millions of objects in hours, work that had traditionally taken months to complete. We used Amazon S3’s inventory report, which gave a list of objects in our bucket, as the input to our Amazon S3 Batch Operations job. Amazon S3 was instrumental in copying the data, providing progress updates, and delivering an audit report when the job was complete. Having this feature saved our team weeks of manual effort and turned this large-scale data transfer into something routine.”
Franz Zemen, Vice President, Software Engineering - Capital One
ePlus, an AWS Advanced Consulting Partner, works with customers to optimize their IT environments and uses solutions like S3 Batch Operations to save clients time and money.
"S3 Batch Operations is simply amazing. Not only did it help one of our clients reduce the time, complexity, and painstaking chore of having to pull together a wide selection of S3 operations, scheduling jobs, then rendering the information in a simple to use dashboard, it also helped tackle some daunting use cases I don't think we would have been able to tackle in the fraction of the time it took S3 Batch Operations to do.
For example, S3 Batch Operations made quick work of copying over 2 million objects across regions within the same account while keeping the metadata intact. The solution worked seamlessly performing similar tasks across accounts, and most notably, generated a completion report that automatically sifted and separated successful versus failed operations amongst 400 million objects allowing for simpler handling of the failed operations in a single file.”
David Lin, Senior Solutions Architect & AWS Certified Professional - ePlus
S3 Batch Operations blog posts
AWS News Blog
Amazon S3 Batch Operations can be used to easily process hundreds, millions, or billions of S3 objects in a simple and straightforward fashion. You can copy objects to another bucket, set tags or access control lists (ACLs), initiate a restore from S3 Glacier, or invoke an AWS Lambda function on each one.
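Each of the actions named in that post corresponds to a different `Operation` payload in the `CreateJob` request. The shapes below are illustrative sketches based on the S3 Control API; the ARNs are placeholders, and field values such as the restore tier are example choices, not requirements.

```python
# Copy each object to another bucket, keeping its metadata intact.
COPY_OPERATION = {
    "S3PutObjectCopy": {
        "TargetResource": "arn:aws:s3:::example-destination-bucket",
        "MetadataDirective": "COPY",
    }
}

# Set a canned ACL on each object.
ACL_OPERATION = {
    "S3PutObjectAcl": {
        "AccessControlPolicy": {"CannedAccessControlList": "private"}
    }
}

# Initiate a restore from S3 Glacier for each archived object.
RESTORE_OPERATION = {
    "S3InitiateRestoreObject": {
        "ExpirationInDays": 7,      # how long the restored copy stays available
        "GlacierJobTier": "BULK",   # lowest-cost retrieval tier
    }
}

# Invoke an AWS Lambda function once per object in the manifest.
LAMBDA_OPERATION = {
    "LambdaInvoke": {
        "FunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-object"
    }
}
```

A job specifies exactly one such operation; the same manifest can be reused across several jobs to apply different actions to the same set of objects.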
AWS Storage Blog
This post demonstrates how to create a list of objects, filter it to include only unencrypted objects, set up permissions, and run an S3 Batch Operations job to encrypt those objects. Encrypting existing objects is one of the many ways you can use S3 Batch Operations to manage your Amazon S3 objects.
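The filtering step in that walkthrough can be sketched with a few lines of Python. This assumes the S3 Inventory report was configured to include the encryption status field, where (as documented for S3 Inventory) unencrypted objects are marked `NOT-SSE`; the column order and sample bucket names here are assumptions for illustration.

```python
import csv
import io

def unencrypted_keys(inventory_csv):
    """Yield (bucket, key) pairs for inventory rows whose encryption status
    marks the object as unencrypted. Assumes the inventory CSV columns are
    Bucket, Key, EncryptionStatus, in that order."""
    reader = csv.reader(io.StringIO(inventory_csv))
    for bucket, key, status in reader:
        if status == "NOT-SSE":   # S3 Inventory's value for unencrypted objects
            yield bucket, key

# Example inventory fragment: one encrypted object, one unencrypted object.
sample = "my-bucket,photos/a.jpg,SSE-S3\nmy-bucket,logs/b.txt,NOT-SSE\n"
rows = list(unencrypted_keys(sample))
```

The filtered `(bucket, key)` list becomes the manifest for a Batch Operations copy job that rewrites each object with server-side encryption enabled.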