How Amazon Photos uses Amazon S3 Intelligent-Tiering to significantly reduce storage costs
Amazon Photos provides unlimited photo storage and 5 GB of video storage to Amazon Prime members in eight marketplaces worldwide. Customers back up, relive, and share memories on the Amazon Photos mobile, web, and desktop apps, and view them on Amazon smart screen devices such as Amazon Echo Show and Amazon Fire TV. Customers in the US can also order their favorite photos as prints, greeting cards, calendars, gifts, and photo books.
Given the number of ways customers can access their memories and the volume of customer content we maintain, storing these photos and videos securely at virtually unlimited scale is a top priority. Amazon Photos uses Amazon S3 to do this, storing hundreds of petabytes of content and billions of images and videos in the Amazon S3 Intelligent-Tiering storage class.
In this blog post, we discuss how we used S3 Storage Class Analysis to optimize costs by choosing the right storage class for our business. We also talk through how we used S3 object tags to run performance tests, and how we significantly reduced our storage costs by using S3 Intelligent-Tiering.
Exploring storage class options
Amazon Photos segments customer content into two categories: original content and generated assets. Original content is the full-resolution photos and videos uploaded by customers. Generated assets are high-quality, lower-resolution versions of the original content that support different customer experiences, such as thumbnails in our apps or auto-curated content played back as a slideshow. Since the launch of Amazon Photos in 2011, we've stored both original content and generated assets in the S3 Standard storage class. However, as additional S3 storage classes became available, we started to explore more cost-effective options that would better suit the data access, resiliency, and cost requirements of our workloads.
For Amazon Photos to select a new storage class, the following criteria had to be met:
- Read/write latency comparable to S3 Standard storage
- Minimal technical effort to migrate content to other S3 storage classes
- Availability and durability comparable to S3 Standard storage
To determine which storage class would best serve our needs, we enabled Storage Class Analysis on our original content bucket. This tool allowed us to monitor access patterns across media objects. Storage Class Analysis showed that our original content was long-lived, with frequent access for the first 30 days but infrequent access thereafter. Given this pattern, we considered both the S3 Standard-Infrequent Access (S3 Standard-IA) and S3 Intelligent-Tiering storage classes as potential options.
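As an illustration of what enabling Storage Class Analysis can look like programmatically, here is a minimal sketch using the AWS SDK for Python (boto3). It is not the configuration Amazon Photos used: the configuration ID, export bucket ARN, and prefix are hypothetical placeholders, and the export destination is optional.

```python
def build_storage_class_analysis_config(config_id, export_bucket_arn, export_prefix):
    """Build an AnalyticsConfiguration payload for S3 Storage Class Analysis.

    The DataExport section asks S3 to write daily CSV reports of the
    observed access patterns to a separate bucket.
    """
    return {
        "Id": config_id,
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": export_bucket_arn,
                        "Prefix": export_prefix,
                    }
                },
            }
        },
    }


def enable_storage_class_analysis(bucket, config):
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_analytics_configuration(
        Bucket=bucket,
        Id=config["Id"],
        AnalyticsConfiguration=config,
    )


# All three values below are hypothetical placeholders.
config = build_storage_class_analysis_config(
    "original-content-analysis",
    "arn:aws:s3:::example-analysis-exports",
    "storage-class-analysis/",
)
```

Calling `enable_storage_class_analysis("example-original-content", config)` would then start the analysis; the bucket name there is also a placeholder.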
From there, Amazon Photos compared the technical and financial costs of the two storage classes. Using S3 Standard-IA would require us to store content in S3 Standard for 30 days, and then use an S3 Lifecycle policy to migrate the content to S3 Standard-IA. Additionally, accessing data costs more in S3 Standard-IA than in S3 Intelligent-Tiering, which has no retrieval charges. This operational and financial overhead made S3 Standard-IA a non-viable option for our business needs. In contrast, S3 Intelligent-Tiering automatically moves data to the most cost-effective access tier when access patterns change, without performance impact or operational overhead. Further, with the addition of the Archive Instant Access tier to S3 Intelligent-Tiering in 2021, there was potential to save up to 68% on data that is rarely accessed but needs instant access. With no need to configure the movement of data between S3 storage classes, only a small monitoring and automation charge, and the cost optimizations afforded by the additional access tiers, we decided that S3 Intelligent-Tiering was the optimal storage class for Amazon Photos.
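To make the operational difference concrete, the sketch below contrasts the two approaches as boto3 request payloads. It is illustrative only: the rule ID, bucket, and key names are hypothetical, and the empty prefix filter (which applies the rule to every object) is an assumption.

```python
def standard_ia_lifecycle(days=30):
    """Lifecycle configuration that moves objects to S3 Standard-IA.

    S3 requires objects to be at least 30 days old before a lifecycle
    rule can transition them to Standard-IA.
    """
    return {
        "Rules": [
            {
                "ID": "transition-to-standard-ia",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # assumption: apply to all objects
                "Transitions": [{"Days": days, "StorageClass": "STANDARD_IA"}],
            }
        ]
    }


def intelligent_tiering_put_args(bucket, key, body):
    """PutObject arguments that write directly to S3 Intelligent-Tiering.

    No lifecycle rule is needed; S3 handles the tiering after upload.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": "INTELLIGENT_TIERING",
    }


lifecycle = standard_ia_lifecycle()
put_args = intelligent_tiering_put_args(
    "example-original-content", "photos/example.jpg", b"..."  # placeholders
)
```

With boto3, the first payload would be passed to `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)` and the second to `put_object(**put_args)`. The Standard-IA path adds a per-bucket rule to maintain, while the Intelligent-Tiering path is a single parameter on the existing upload call.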
Next, we wanted to understand the performance difference between S3 Intelligent-Tiering and S3 Standard. To do this, we compared the GET/PUT latency of S3 Standard with that of the S3 Intelligent-Tiering Frequent Access, Infrequent Access, and Archive Instant Access tiers. We used our end-to-end upload and view experiences as the performance benchmark. For upload experience benchmarking, we ran a load test on the PutObject API, uploading to both S3 Standard and S3 Intelligent-Tiering with an average file size of 2.5 MB for a duration of 3 hours. For view experience benchmarking, we ran a load test on the GetObject API, with an average file size of 1.25-2 MB for a duration of 3 hours.
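The post does not describe the load-testing tooling, so the following is only a rough sketch of how per-request latency could be collected and summarized for such a comparison. The function names are hypothetical, and `operation` stands in for whatever call is being measured (for example, a PutObject or GetObject request against one storage class).

```python
import statistics
import time


def measure_latencies(operation, iterations):
    """Call `operation` repeatedly and return each call's latency in ms."""
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return mean, median, and 99th-percentile latency (needs >= 2 samples)."""
    # quantiles(n=100) yields 99 cut points; index 49 is p50, index 98 is p99.
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": cut_points[49],
        "p99_ms": cut_points[98],
    }


# Stand-in operation; a real test would issue S3 requests here instead.
samples = measure_latencies(lambda: None, iterations=100)
summary = summarize(samples)
```

Running the same harness once per storage class and comparing the resulting summaries gives a like-for-like view of mean and tail latency.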
To gather the necessary metrics for each storage class, we tagged the objects that were uploaded to S3 Intelligent-Tiering. S3 Intelligent-Tiering moves objects to the Infrequent Access tier after 30 consecutive days without access and to the Archive Instant Access tier after 90 days, so we made certain that a portion of the uploaded content was not accessed for a 90-day period.
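A minimal sketch of tagging an object at upload time follows. The tag keys and values are hypothetical, since the post does not say which tags were used. The PutObject API takes the tag set as a URL-query-encoded string, which `urllib.parse.urlencode` produces.

```python
from urllib.parse import urlencode


def tagged_put_args(bucket, key, body, tags):
    """PutObject arguments for an Intelligent-Tiering upload with object tags."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": "INTELLIGENT_TIERING",
        # S3 expects the tag set encoded like "k1=v1&k2=v2".
        "Tagging": urlencode(tags),
    }


# Hypothetical tags marking a benchmark object that must stay untouched
# long enough to reach the colder access tiers.
put_args = tagged_put_args(
    "example-original-content",
    "benchmark/object-0001.jpg",
    b"...",
    {"benchmark": "it-latency", "cohort": "no-access-90d"},
)
```

With boto3, these arguments would be passed as `put_object(**put_args)`. Tags like these can then be used to filter metrics or reports down to the benchmark objects.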
The following graphs outline the availability comparisons in the AWS US East (Northern Virginia) Region:
Figure 1: Amazon S3 Standard vs. Amazon S3 Intelligent-Tiering Put/Get availability comparison
This test showed no difference in availability between the S3 Intelligent-Tiering access tiers and S3 Standard for our use case. During the same benchmark, we also monitored the FirstByteLatency metric and saw minimal difference between the two storage classes. Given the performance parity, the automatic data movement, and the projected cost optimizations, we decided to adopt S3 Intelligent-Tiering as our preferred storage class for original content.
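The post does not state how availability was computed. One common approach, shown here purely as an assumption, is to derive it from request counts such as the `AllRequests` and `5xxErrors` S3 request metrics in Amazon CloudWatch. The counts in the example are made up for illustration and are not results from the test.

```python
def availability_percent(total_requests, server_errors):
    """Percentage of requests that did not fail with a server-side (5xx) error."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= server_errors <= total_requests:
        raise ValueError("server_errors must be between 0 and total_requests")
    return 100.0 * (total_requests - server_errors) / total_requests


# Hypothetical request counts for two storage classes over the same window.
standard = availability_percent(total_requests=1_000_000, server_errors=10)
intelligent_tiering = availability_percent(total_requests=1_000_000, server_errors=10)
```

Comparing the two percentages over the same time window is one way to check for the kind of parity described above.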
Since the launch of Amazon Photos, we have been using Amazon S3. While S3 Standard storage was able to grow with the scale of our business, the need to optimize for cost and performance of a large (and growing) volume of data presented challenges. With the launch of S3 Intelligent-Tiering in 2018, and the recent addition of the S3 Intelligent-Tiering Archive Instant Access tier, the Amazon Photos team was able to instantly use an AWS solution with minimal to no changes to our existing services, and in the process save over 10% in storage costs.
Thanks for reading this blog post. If you have any comments or questions, please leave them in the comments section.