How Capital One uses Amazon S3 Glacier to optimize data storage costs and maximize resources

At Capital One, our goal is to help our customers succeed by bringing ingenuity, simplicity, and humanity to banking. Our founders believed that banking would be revolutionized by technology. Based on that vision, our technology leaders made an aggressive effort to build a talented in-house software engineering practice. This foresight helped us become a leader in digital banking, and it continues to allow us to provide exceptional banking experiences for our customers. We started our cloud journey as part of our continuous innovation to bring even better services to our customers. We adopted the cloud in 2014 as a pilot, and in 2020 we closed our final data center to be all in on AWS. The agility, scalability, elasticity, and security of AWS help us with our mission to change banking for good. AWS provides us with a virtually infinite compute and storage supply that helps us build the bank of the future.

At Capital One, I am responsible for tactical and strategic implementations. I primarily focus on resource usage, operations efficiency improvement, governance, and compliance of infrastructure assets for our Enterprise Platforms and Products Technology line of business. I am accountable for AWS Lambda and integration services such as Amazon CloudWatch, AWS Step Functions, Amazon EventBridge, Amazon Kinesis, Amazon SNS, and Amazon SQS. Working closely with AWS solutions architects, service teams, and customer solution managers, we build capabilities relevant to our enterprise’s strategic goals. This tight collaboration and planning allows us to improve observability, resilience, and operational efficiency across our assets.

Our storage optimization and data governance journey started back in 2016. Throughout our journey, we’ve had many learnings around Amazon S3, Amazon S3 Glacier, and Amazon S3 Glacier Deep Archive. In this blog, I share some of our key learnings with the hope that our insights can help you make more informed decisions around effectively managing resources while balancing data storage spend and performance in the cloud.

Storage is a vital part of Capital One’s technology footprint

Resource usage efficiency is key to augmenting the benefits of cloud migration. Data is the heart of our organization, making data storage a vital part of Capital One’s technology footprint. Storage optimization without compromising on capability-driven data durability, resiliency, and security standards is one of our key focus areas. We use Amazon S3 extensively to store data objects ranging from application data and backup data to analytical data in our internal data lake.

Our S3 usage has grown tremendously since we migrated our platforms and applications to AWS. As such, storage optimization has become increasingly crucial to keep storage costs in check across all our enterprise platforms. One such enterprise platform is our internal data lake powered by S3. Based on usage, S3 costs incurred on these central data platforms are charged back to data producing and consuming applications across multiple divisions. Inefficient storage practices can result in unforeseen chargebacks to these applications from central platforms.

S3 Versioning and S3 Lifecycle are two key features that play a critical role in enabling users to govern and optimize the data stored in S3. With versioning, we can recover more easily from both unintended user actions and application failures. Additionally, lifecycle policies allow for fine-grained control of storage optimization, and versioning helps with our record retention configurations. We use S3 Lifecycle policies to manage our objects so that they are stored cost effectively throughout their lifecycle.

Our requirements around data management vary based on the use cases that a given platform is serving to end customers or internal applications. Rolling out a single S3 Lifecycle policy across the enterprise around storage and retention was not an option for us. We currently have several unique lifecycle policy configurations across our S3 buckets. These configurations are based on the type of objects being stored. The configurations are also based on access frequency, data retention, and archival requirements. We are continuing to find even more opportunities to optimize our storage costs.

Now, let’s discuss how we got here, our knowledge gained, and where we are headed.

S3 Versioning

It is fundamental to initially assess whether the S3 buckets must have versioning enabled. Enabling versioning on S3 means every version of every data object in the bucket is stored. If there are multiple writes in parallel, then S3 stores all those versions of data. By enabling versioning, you can recover and restore the data as part of disaster recovery, resilience exercises, accidental deletes, or updates to the data.
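To make the versioning behavior above concrete, here is a toy in-memory model of a single key in a versioning-enabled bucket. It is only a sketch for intuition, not how S3 is implemented; the class and method names (`VersionedKey`, `undelete`, and so on) are made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    version_id: int
    body: Optional[bytes]  # None means this version is a delete marker


@dataclass
class VersionedKey:
    """Toy in-memory model of a single key in a versioning-enabled bucket."""
    versions: List[Version] = field(default_factory=list)  # newest last
    next_id: int = 1

    def _append(self, body: Optional[bytes]) -> None:
        self.versions.append(Version(self.next_id, body))
        self.next_id += 1

    def put(self, body: bytes) -> None:
        # Each write adds a new current version; older ones become non-current.
        self._append(body)

    def delete(self) -> None:
        # A plain delete keeps the data and adds a delete marker on top.
        self._append(None)

    def get(self) -> Optional[bytes]:
        # Reads only see the current (newest) version.
        return self.versions[-1].body if self.versions else None

    def stored_bytes(self) -> int:
        # Every retained version still counts toward storage.
        return sum(len(v.body) for v in self.versions if v.body is not None)

    def undelete(self) -> None:
        # Removing the delete marker makes the previous version current again.
        if self.versions and self.versions[-1].body is None:
            self.versions.pop()


key = VersionedKey()
key.put(b"v1-data")
key.put(b"v2-data-longer")
key.delete()
print(key.get())           # the key now reads as deleted
print(key.stored_bytes())  # but both versions are still stored
key.undelete()
print(key.get())           # the latest write is readable again
```

The point of the model is that a delete on a versioned key adds to what is stored rather than reducing it, which is why versioned buckets need lifecycle rules to keep costs in check.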

Our business use case around key storage buckets required us to allow versioning. We used enterprise tooling to enforce versioning for these defined buckets via automated measures that detect and correct misconfigured buckets.
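The detection half of such a control could look roughly like the sketch below. It is not our enterprise tooling. It assumes you have already collected one GetBucketVersioning-style response per bucket (where `Status` is `Enabled` or `Suspended`, and is absent if versioning was never configured); the bucket names and the helper names are hypothetical.

```python
from typing import Dict, List


def versioning_status(response: Dict) -> str:
    """Normalize a GetBucketVersioning-style response.

    "Status" is "Enabled" or "Suspended"; the key is absent when
    versioning was never configured on the bucket.
    """
    return response.get("Status", "Unversioned")


def find_misconfigured(responses: Dict[str, Dict], required: List[str]) -> List[str]:
    """Return the required buckets whose versioning is not enabled."""
    return sorted(
        name
        for name in required
        if versioning_status(responses.get(name, {})) != "Enabled"
    )


# Hypothetical bucket names and responses, for illustration only.
responses = {
    "analytics-prod": {"Status": "Enabled"},
    "backups-prod": {"Status": "Suspended"},
    "scratch-dev": {},
}
required = ["analytics-prod", "backups-prod", "scratch-dev"]
print(find_misconfigured(responses, required))
```

The correction step would then re-enable versioning on each bucket in the returned list through whatever automation you already use.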

S3 Lifecycle policy

Each object in a versioning-enabled S3 bucket has one current version and zero or more earlier versions (non-current versions). If we don’t configure policies to manage these versioned S3 buckets and objects, we may end up with higher storage costs than intended.

More than 90% of our buckets have versioning enabled due to our business requirements, so efficient lifecycle management is crucial. An enterprise-wide default lifecycle policy rollout helps ensure that versioned objects are being stored in the cost-optimal storage class.

1. Amazon S3 Intelligent-Tiering

We use S3 Intelligent-Tiering on buckets where data access patterns are unknown. This storage class delivers automatic cost savings by moving objects between four access tiers when access patterns change.

Here are the key fields we use for the S3 Intelligent-Tiering policy:

[{"ID":"<<Intelligent Tiering>>",
"Filter":{"Prefix":""},
"Status":"Enabled",
"Transitions":[{"Days":<X Days>,"StorageClass":"INTELLIGENT_TIERING"}],
"NoncurrentVersionTransitions":[]}]

Recent announcements from AWS make S3 Intelligent-Tiering even more of an “easy” button for optimization and encourage teams to explore additional optimization opportunities. We are in the process of onboarding several more buckets onto S3 Intelligent-Tiering. We expect this move to bring considerable savings to our business.

2. Default policy rollout

Earlier versions get deleted after <X> days. A non-production environment should have lower retention compared to the production environment.

Here are the key fields we use in our default policy configuration:

[{"ID":"<<Default Rule ID>>",
"Filter":{"Prefix":""},
"Status":"Enabled",
...,
"NoncurrentVersionExpiration":{"NoncurrentDays":<X days>},
...,
"AbortIncompleteMultipartUpload":{"DaysAfterInitiation":<Y days>}}]

Field descriptions:

ID – Unique identifier for the rule
Filter – Prefix or tags of the S3 object
Status – Whether the rule is enabled (applied to objects matching the filter) or disabled
NoncurrentVersionExpiration – S3 permanently deletes the old versions after X number of days
AbortIncompleteMultipartUpload – Waits for Y number of days after initiation before aborting an incomplete multipart upload and deleting its parts
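As a minimal sketch of how a default rule like this could be generated per environment, the snippet below builds the rule as a Python dictionary and wraps it in the `{"Rules": [...]}` shape that the S3 lifecycle configuration API expects. The retention numbers and the `default_rule` helper are placeholders invented for the example; the actual values of <X> and <Y> depend on your own retention requirements.

```python
import json
from typing import Dict

# Hypothetical retention values; substitute your own <X> and <Y>.
RETENTION_DAYS = {
    "production": {"noncurrent": 90, "abort_multipart": 7},
    "non-production": {"noncurrent": 14, "abort_multipart": 3},
}


def default_rule(environment: str) -> Dict:
    """Build a default lifecycle rule for the given environment."""
    days = RETENTION_DAYS[environment]
    return {
        "ID": f"default-{environment}",
        "Filter": {"Prefix": ""},  # empty prefix: applies to every object
        "Status": "Enabled",
        # Permanently delete old versions after the retention window.
        "NoncurrentVersionExpiration": {"NoncurrentDays": days["noncurrent"]},
        # Clean up parts of multipart uploads that never completed.
        "AbortIncompleteMultipartUpload": {
            "DaysAfterInitiation": days["abort_multipart"]
        },
    }


config = {"Rules": [default_rule("non-production")]}
print(json.dumps(config, indent=2))
```

Keeping the retention values in one table makes the difference between production and non-production explicit and easy to review.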

3. Amazon S3 Standard-Infrequent Access (IA)

We use S3 Standard-Infrequent Access for use cases where old versions are not frequently accessed but must be available with millisecond latency when we do need them. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a lower per-GB storage price and a per-GB retrieval fee. We roll out a rule that moves non-current version data to S3 Standard-IA after <X> days and keeps it there for <Y> months before it is deleted.

Here are the key fields we use for this policy:

[{"ID":"<<IA Rule>>",
"Filter":{"Prefix":""},
"Status":"Enabled",
"Transitions":[{"Days":<X Days>,"StorageClass":"STANDARD_IA"}],
"NoncurrentVersionTransitions":[{"StorageClass":"STANDARD_IA","NoncurrentDays":<Y Days>}]}]

Field descriptions:

Transitions – Moves current version objects to the specified storage class <X> days after creation
NoncurrentVersionTransitions – Moves non-current version objects to the specified storage class <Y> days after they become non-current
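One detail worth checking before rolling out a rule like this: S3 requires objects to have been stored (or to have been non-current) for at least 30 days before a lifecycle transition to S3 Standard-IA or S3 One Zone-IA. The sketch below checks a rule dictionary against that minimum. The function name and the sample rule are illustrative assumptions, and the check covers only this one constraint.

```python
from typing import Dict, List

IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}
MIN_DAYS_BEFORE_IA = 30  # minimum age before a lifecycle transition to IA


def ia_transition_problems(rule: Dict) -> List[str]:
    """Return a message for each IA transition scheduled too early."""
    problems = []
    for t in rule.get("Transitions", []):
        if t["StorageClass"] in IA_CLASSES and t["Days"] < MIN_DAYS_BEFORE_IA:
            problems.append(
                f"Transitions: Days={t['Days']} is below {MIN_DAYS_BEFORE_IA}"
            )
    for t in rule.get("NoncurrentVersionTransitions", []):
        if (t["StorageClass"] in IA_CLASSES
                and t["NoncurrentDays"] < MIN_DAYS_BEFORE_IA):
            problems.append(
                "NoncurrentVersionTransitions: "
                f"NoncurrentDays={t['NoncurrentDays']} is below {MIN_DAYS_BEFORE_IA}"
            )
    return problems


# Hypothetical rule with one valid and one too-early transition.
rule = {
    "ID": "ia-rule",
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "Transitions": [{"Days": 45, "StorageClass": "STANDARD_IA"}],
    "NoncurrentVersionTransitions": [
        {"StorageClass": "STANDARD_IA", "NoncurrentDays": 10}
    ],
}
for problem in ia_transition_problems(rule):
    print(problem)
```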

4. Amazon S3 Glacier and Amazon S3 Glacier Deep Archive

Analytical data storage is a huge portion of our total storage footprint. All versions of the data were initially kept in Amazon S3 Standard due to use case-specific financial audit requirements. However, technical, product, and business teams were able to revisit the data retention and archival requirements in 2020 and updated the recommendations for non-current versions. Per Capital One’s revised requirements, we moved a major portion of the old versioned data to S3 Glacier and S3 Glacier Deep Archive. We were able to transition this data by using updated S3 Lifecycle policies.

We updated S3 Lifecycle policies across hundreds of buckets to move old versioned data to S3 Glacier after <X> days in S3 Standard storage. Data will stay in the S3 Glacier or S3 Glacier Deep Archive storage classes for <Y> days before the data gets purged permanently.

Here are the key fields we use in the lifecycle policy:

[{"ID":"<<glacier>>",
"Filter":{"Prefix":"/"},
"Status":"Enabled",
"Expiration":{"Days":<>},
"Transitions":[{"Days":<>,"StorageClass":"GLACIER"}],
"NoncurrentVersionExpiration":{"NoncurrentDays":<>},
"NoncurrentVersionTransitions":[{"StorageClass":"GLACIER","NoncurrentDays":<>}]}]

[{"ID":"<<DEEP_ARCHIVE>>",
"Prefix":"",
"Status":"Enabled",
"Expiration":{"ExpiredObjectDeleteMarker":true},
"Transitions":[],
"NoncurrentVersionExpiration":{"NoncurrentDays":<>},
"NoncurrentVersionTransitions":[{"StorageClass":"DEEP_ARCHIVE","NoncurrentDays":<>}]}]

Field descriptions:

ExpiredObjectDeleteMarker – Deleting an object in a versioned bucket does not permanently remove it. Instead, S3 places a delete marker on it. Once every non-current version has expired and only the delete marker remains, the only action that can be taken on it is a “Delete” call. Setting this field to true explicitly tells S3 to remove these expired object delete markers.
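When filling in the day values for rules like these, it helps to check how long non-current versions will actually sit in the archive class before they expire. S3 Glacier has a 90-day minimum storage duration and S3 Glacier Deep Archive a 180-day minimum. The sketch below flags rules that expire data sooner. It assumes a rule with a single non-current transition, and the helper names and sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# Minimum storage duration, in days, for each archive storage class.
MIN_STORAGE_DAYS = {"GLACIER": 90, "DEEP_ARCHIVE": 180}


def noncurrent_archive_days(rule: Dict) -> List[Tuple[str, int]]:
    """Return (storage class, days spent there before expiration).

    Assumes the rule has a single non-current transition, so the time in
    the archive class runs from that transition until expiration.
    """
    expiry = rule.get("NoncurrentVersionExpiration", {}).get("NoncurrentDays")
    if expiry is None:
        return []  # versions never expire, so there is no early deletion
    return [
        (t["StorageClass"], expiry - t["NoncurrentDays"])
        for t in rule.get("NoncurrentVersionTransitions", [])
        if t["StorageClass"] in MIN_STORAGE_DAYS
    ]


def short_stays(rule: Dict) -> List[Tuple[str, int]]:
    """Return archive stays shorter than the class minimum."""
    return [
        (cls, days)
        for cls, days in noncurrent_archive_days(rule)
        if days < MIN_STORAGE_DAYS[cls]
    ]


# Hypothetical values: archive after 30 days, expire after 120 days.
rule = {
    "ID": "deep-archive-example",
    "Status": "Enabled",
    "NoncurrentVersionExpiration": {"NoncurrentDays": 120},
    "NoncurrentVersionTransitions": [
        {"StorageClass": "DEEP_ARCHIVE", "NoncurrentDays": 30}
    ],
}
print(short_stays(rule))
```

A non-empty result means the rule would delete data before the minimum storage duration and incur early deletion charges.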

5. Different data archival configuration per prefix

We have some buckets where data access patterns and retention requirements are different for different datasets. We use the Filter - Prefix configuration to apply the policies accordingly and help optimize the storage.

Example: A bucket with three datasets and varied retention and availability requirements.

Prefix 1: The current version of the data must be in the S3 Standard storage class for 30 days. After this point, it can move to S3 Glacier, where it is kept for 90 days before being deleted.
Prefix 2: The current version of the data must be in the S3 Standard storage class. Old versions are moved to S3 Standard-Infrequent Access and then deleted after 30 days.
Prefix 3: Keep the current version in the S3 Standard storage class. Old data is moved to S3 Glacier after 30 days and then stored in S3 Glacier Deep Archive for 1 year before being deleted.

The “Prefix” field in the lifecycle policy plays a key role here in applying the appropriate configuration to each dataset.

{"ID":"<<XYZ>>",
"Filter":{"Prefix":"<<123>>"},
"Status":"Enabled",
"Expiration":{"Days":<X Days>},
"Transitions":[{"Days":<Y Days>,"StorageClass":"GLACIER"},
{"Days":<N Days>,"StorageClass":"DEEP_ARCHIVE"}],
"NoncurrentVersionTransitions":[],
"AbortIncompleteMultipartUpload":{"DaysAfterInitiation":<M Days>}}
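To illustrate how prefix filters route objects to different rules, the sketch below resolves which enabled rules apply to a given object key. A prefix filter matches every key that begins with the prefix. The dataset prefixes and rule IDs are hypothetical, and the first rule encodes the Prefix 1 example above (30 days in S3 Standard, then 90 days in S3 Glacier, so expiration 120 days after creation). Tag filters and combined filters are left out of this simplified version.

```python
from typing import Dict, List


def rule_prefix(rule: Dict) -> str:
    # Newer configurations nest the prefix under "Filter"; older ones
    # carry a top-level "Prefix" instead.
    return rule.get("Filter", {}).get("Prefix", rule.get("Prefix", ""))


def matching_rules(rules: List[Dict], key: str) -> List[str]:
    """Return IDs of enabled rules whose prefix matches the object key."""
    return [
        r["ID"]
        for r in rules
        if r.get("Status") == "Enabled" and key.startswith(rule_prefix(r))
    ]


# Hypothetical prefixes and rule IDs, for illustration only.
rules = [
    {
        "ID": "dataset1-archive",
        "Filter": {"Prefix": "dataset1/"},
        "Status": "Enabled",
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": 120},  # 30 days in Standard + 90 in Glacier
    },
    {"ID": "dataset2-ia", "Filter": {"Prefix": "dataset2/"}, "Status": "Enabled"},
    {"ID": "dataset3-deep", "Prefix": "dataset3/", "Status": "Enabled"},
    {"ID": "abort-uploads", "Filter": {"Prefix": ""}, "Status": "Enabled"},
]

print(matching_rules(rules, "dataset1/2021/part-0001.parquet"))
print(matching_rules(rules, "dataset3/archive/file.csv"))
```

Note that a rule with an empty prefix matches every key, so a bucket-wide rule and a prefix-level rule can both apply to the same object.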

Key enablers

We used S3 APIs and AWS Config to build an internal dashboarding and reporting tool. Insights into S3 usage data are essential in driving the optimization. We also used the recently announced Amazon S3 Storage Lens to gain that visibility into our storage usage. S3 Storage Lens gives us organization-wide visibility into object storage usage and activity trends, and makes actionable recommendations to improve cost efficiency and apply data protection best practices.

Here are the metrics and insights that we focused on:

- S3 bucket details by application tags, AWS accounts, environment (production, non-production), objects, folders up to three levels, S3 storage class
- Storage size of current, non-current versioned data
- Number of objects in current vs non-current versioned data
- Max, min, and average object size
- Access metrics (write, read, delete)
- Visibility into lifecycle rules across buckets and accounts

These reports and insights have played a crucial role in prioritizing the datasets, buckets, and accounts for our storage optimization.
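As a rough idea of how the current versus non-current metrics can be derived, the sketch below aggregates version records shaped like the entries returned by the S3 ListObjectVersions API (each with `Key`, `IsLatest`, and `Size`). It is not our internal reporting tool; the function name and the sample records are invented for the example.

```python
from typing import Dict, Iterable


def summarize_versions(versions: Iterable[Dict]) -> Dict:
    """Aggregate object count and bytes for current vs non-current versions."""
    summary = {
        "current": {"objects": 0, "bytes": 0},
        "noncurrent": {"objects": 0, "bytes": 0},
    }
    for v in versions:
        group = "current" if v["IsLatest"] else "noncurrent"
        summary[group]["objects"] += 1
        summary[group]["bytes"] += v["Size"]
    total = summary["current"]["bytes"] + summary["noncurrent"]["bytes"]
    # Share of stored bytes held by old versions; 0.0 for an empty listing.
    summary["noncurrent_share"] = (
        summary["noncurrent"]["bytes"] / total if total else 0.0
    )
    return summary


# Hypothetical version records, for illustration only.
versions = [
    {"Key": "a", "IsLatest": True, "Size": 100},
    {"Key": "a", "IsLatest": False, "Size": 300},
    {"Key": "b", "IsLatest": True, "Size": 50},
    {"Key": "a", "IsLatest": False, "Size": 50},
]
print(summarize_versions(versions))
```

A high non-current share is a signal that a bucket is a good candidate for a non-current version transition or expiration rule.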

S3 optimization implementation details

We categorized the buckets into three categories for storage optimization based on the data analysis reports, and applied lifecycle policies accordingly.

1. S3 buckets where a tuning policy can be rolled out at the bucket level.
2. S3 buckets where lifecycle policy rules require a review with business and senior leadership.

- We focused on analytical data storage-related applications that contribute a considerable percentage of our S3 storage costs. Most of these applications have longer retention in S3 Standard for older versions of data. Access metrics did not justify the need to keep old versioned data in S3 Standard storage for more than 30–45 days.
- The application team engaged the business and senior leadership to help review and revise the retention and archival requirements. This was done to ensure that the optimization would not impact data retention agreements and standards. We updated our S3 Lifecycle rules per the revised requirements.

3. S3 buckets where we can enforce aggressive and granular rules at the prefix level to optimize our storage usage.

Critical learnings

Here are four key learnings we experienced at Capital One:

1. Enabling versions and hard delete.

Dating back to 2017, multiple buckets were previously configured with S3 Lifecycle policies to delete data older than 21 days. When our enterprise policy was rolled out enabling versions across buckets, we noticed S3 charges increasing due to soft deletes. We had to handle deleted objects and delete markers for deleted objects to clean or hard delete the data.

Configurations were updated to reflect:

- Non-current version expiration <N> days.
- Expired object delete markers set to true.

2. S3 Glacier transition fees are applied on a per object basis and the charges should be carefully considered if you transfer billions of small objects.

I recommend aggregating the data before pushing it to S3 Glacier or S3 Glacier Deep Archive if you have millions of small old version objects that cannot be deleted. Note that there is no native capability to aggregate small objects during the transfers. Add the required metadata if you decide to aggregate. Data can be disaggregated if you ever need to retrieve it back from S3 Glacier Deep Archive. Some transfers may have higher upfront costs and take longer to break even. However, S3 Glacier transition is a high return on investment when objects stay in S3 Glacier for months or years to meet retention requirements. It is important to note that effective March 1, 2021, AWS lowered the charges for PUT and Lifecycle requests to S3 Glacier by 40% for all AWS Regions.

3. S3 Glacier and S3 Glacier Deep Archive are best suited for objects that need to be restored 1–2 times a year. Consider S3 Standard-Infrequent Access if you need to restore the data more frequently and quickly. The data retrieval fee from S3 Glacier and S3 Glacier Deep Archive will be higher if the data request must be expedited and your data volume is high ($2,000 to $30,000 per PB of data retrieval depending on the retrieval speed).

4. S3 Glacier Deep Archive was a great fit for our data as it is long-lived, well beyond the 180-day minimum. It offers the lowest cost archive storage and allows indefinite data retention. Pay attention to early deletion expenses. If you configure the lifecycle to delete or transition the data early, you will be charged for the request. You will also be charged for the prorated remaining minimum retention days.
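The cost trade-offs in learnings 2 and 4 come down to simple arithmetic, sketched below. All prices in the example are placeholders, not actual AWS prices, and the function names are invented for the illustration. The model also ignores the per-object metadata overhead that archive storage classes add, and it approximates a month as 30 days when prorating.

```python
def transition_break_even_months(
    num_objects: int,
    total_gb: float,
    request_price_per_1000: float,
    standard_price_gb_month: float,
    archive_price_gb_month: float,
) -> float:
    """Months of storage savings needed to repay per-object transition fees."""
    transition_cost = num_objects / 1000 * request_price_per_1000
    monthly_saving = total_gb * (standard_price_gb_month - archive_price_gb_month)
    if monthly_saving <= 0:
        return float("inf")  # archiving never pays off at these prices
    return transition_cost / monthly_saving


def early_deletion_charge(
    size_gb: float,
    days_stored: int,
    min_days: int,
    archive_price_gb_month: float,
    days_per_month: int = 30,
) -> float:
    """Prorated storage charge for the unused part of the minimum duration."""
    remaining_days = max(min_days - days_stored, 0)
    return size_gb * archive_price_gb_month * remaining_days / days_per_month


# Placeholder prices (USD) for illustration; look up current pricing.
REQUEST_PRICE = 0.05   # per 1,000 lifecycle transition requests
STANDARD_PRICE = 0.023  # per GB-month
ARCHIVE_PRICE = 0.001   # per GB-month

# Same 10,000 GB of data as a billion small objects vs a million larger ones.
small = transition_break_even_months(
    1_000_000_000, 10_000, REQUEST_PRICE, STANDARD_PRICE, ARCHIVE_PRICE
)
aggregated = transition_break_even_months(
    1_000_000, 10_000, REQUEST_PRICE, STANDARD_PRICE, ARCHIVE_PRICE
)
print(f"break-even, small objects: {small:.1f} months")
print(f"break-even, aggregated:    {aggregated:.3f} months")
print(f"early deletion charge:     {early_deletion_charge(100, 60, 180, ARCHIVE_PRICE):.2f}")
```

Because the transition fee scales with object count while the savings scale with data volume, aggregating small objects before archiving shortens the break-even period by the same factor as the reduction in object count.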

Conclusion

The long-term, secure, and durable S3 Glacier and S3 Glacier Deep Archive data archiving storage classes have enabled us to meet data retention requirements and optimize our costs. One lifecycle policy does not fit all buckets and use cases. Enforcing a single default configuration across the enterprise is not an option in our scenario. Optimizing S3 Lifecycle policies without breaking audit and data retention compliance is a continuous journey. It can become challenging at times if requirements around the storage environment are not clear or are still evolving. Granular insights into S3 usage and storage metrics from Amazon S3 Storage Lens help us tune the policies.

We currently use S3 Standard and S3 Glacier Deep Archive more often compared to other S3 storage classes. We also started exploring other storage classes such as S3 Intelligent-Tiering and S3 Standard-Infrequent Access that are designed for different use cases. Working with AWS, we look forward to innovating and optimizing our storage costs even further. Hopefully this blog post provided you with some valuable insights for your own cost-saving journey in the cloud.

Thanks for reading the blog post! If you have any questions or suggestions, don’t hesitate to leave your feedback in the comments section.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

AWS Storage Blog
