AWS Storage Blog
20 years of Amazon S3: A storage professional’s journey to AWS Hero
I’ve been working with data storage technologies for more than 20 years. Over that time, storage has matured to keep pace with exponential data growth: solid-state storage replaced spinning disks for the most critical workloads, and drive capacities grew from tens of gigabytes to tens of terabytes. On March 14, 2006, in the middle of this data storage renaissance, AWS launched Amazon S3, opening a new frontier for data expansion. The 20th anniversary of S3 is a great time to reminisce about my first experiences with cloud storage and some of the most exciting innovations that have made S3 what it is today.
If I’m being honest, my first experiments with cloud storage left a lot to be desired. In 2010, I had a project to replace an aging backup tape library protecting maybe 30 terabytes of data. Our backup software vendor announced support for S3, so I took a look. Of course, with zero cloud experience, the learning curve was steeper than I had time to climb, especially on a short timeline to retire legacy hardware. Wanting to do my due diligence, I pivoted to a tabletop exercise focused on economics. At the time, tape offered the lowest price for storing data with long retention periods, and backup to S3 simply didn’t fit my project budget. That said, I did see a future where I wouldn’t have admins driving through snowstorms to fix offline tape drives, and that was exciting!
With a full plate of data center projects, I set cloud storage aside. I replaced the tape library, added a few hundred terabytes of storage to the data center, retired old storage systems, and deployed some cutting-edge NAS. Then, in mid-2016, cloud storage seemed to pop up in everything I read. I checked back in, and the cloud had matured. S3 pricing had come down to compete with my data center backup solution, additional storage classes were alive and well in S3, and I could archive data for fractions of a penny per gigabyte! Even better, I was being added to a cloud migration project that would encompass over 1 petabyte of data. It seemed like every night I was figuring out some new pattern in the cloud! Since then, numerous S3 feature and capability launches have changed the way I build in the cloud.
The first feature release that accelerated my S3 adoption was the launch of Server Message Block (SMB) support for the Amazon S3 File Gateway. I had been using file gateways, running on Amazon EC2, as a Network File System (NFS) target for database backups. NFS was brittle when connecting from Windows database servers, and gateway updates forced a reboot of my entire SQL environment. Reboots meant late-night work and, more importantly, downtime for my customers. The ability to connect to S3 over SMB improved uptime, strengthened data security, and simplified maintenance. Once I’d standardized on this architecture, I went on to automate gateway deployments with Terraform and even gave a chalk talk on it at AWS re:Invent in 2018!
The second S3 innovation to call out is the launch of the S3 Intelligent-Tiering storage class. Lifecycle policies on an S3 bucket are very straightforward in cases where data has a well-defined retention period. More often than not, though, the only way I could establish retention policies was to spend hours upon hours studying metadata: file ages, update frequencies, and usage patterns. S3 Intelligent-Tiering came along and eliminated that toil with a single bucket setting change. It gave me a way to assure my users that they were storing data as cost-efficiently as possible, and it eliminated the risk of a misconfigured Lifecycle policy causing a cost spike.
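To make the "single setting change" concrete, here is a minimal sketch of my own (not from the original project) of enrolling every object in a bucket in Intelligent-Tiering via a Lifecycle rule, using boto3. The bucket name is a hypothetical placeholder; the rule transitions objects to the INTELLIGENT_TIERING storage class immediately (Days=0), after which S3 tiers them based on access patterns.

```python
"""Sketch: move a bucket's objects to S3 Intelligent-Tiering.

Assumptions (not from the article): boto3 is installed, AWS credentials
are configured, and "my-archive-bucket" is a hypothetical bucket name.
"""

def intelligent_tiering_rule(prefix: str = "") -> dict:
    """Build an S3 Lifecycle configuration that transitions objects into
    the INTELLIGENT_TIERING storage class as soon as they are eligible
    (Days=0), so S3 handles access-pattern-based tiering from then on."""
    return {
        "Rules": [
            {
                "ID": "all-objects-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                ],
            }
        ]
    }

if __name__ == "__main__":
    import boto3  # applying the rule requires real AWS credentials

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",  # placeholder name
        LifecycleConfiguration=intelligent_tiering_rule(),
    )
```

New uploads can also land in the class directly by passing `StorageClass="INTELLIGENT_TIERING"` to `put_object`, skipping the transition entirely.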
The third S3 feature to make my list is the move from eventual to strong consistency. If you’ve worked with S3 long enough, you’ve certainly run into the impact of eventual consistency. Applications had to build in techniques to verify that updates to an object had fully completed and propagated before triggering a downstream process to access that data. The move to strong consistency was outstanding. It just happened, entirely behind the scenes, and had been quietly rolled out before the official launch announcement at re:Invent. Not only did strong consistency make it easier for applications to interact with data, but I would also argue this was the best launch AWS has done. Zero downtime, zero customer impact. Amazing work by the S3 team!
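As an illustration of what changed, here is a small sketch of my own (bucket and key names are hypothetical) of the write-then-read pattern that used to need HEAD polling or retry loops under eventual consistency, and that strong read-after-write consistency now guarantees on the first GET:

```python
"""Sketch: read-after-write under S3 strong consistency.

Assumptions (not from the article): the bucket and key names are
hypothetical, and the caller supplies a boto3-style S3 client.
"""

def put_then_read(s3, bucket: str, key: str, data: bytes) -> bytes:
    """Write an object, then immediately read it back.

    Since S3 moved to strong read-after-write consistency, the GET is
    guaranteed to see the bytes just PUT -- no polling or retry loop
    is needed before kicking off a downstream process."""
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()

if __name__ == "__main__":
    import boto3  # requires AWS credentials and an existing bucket

    client = boto3.client("s3")
    body = put_then_read(client, "my-demo-bucket", "hello.txt", b"hello")
    print(body)
```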
These are just a few highlights from my experience over the first 20 years of S3. Whether enabling data lakes, powering the training of large language models, or serving as the backbone for vector stores, S3 has been the foundation. What started as Amazon’s Simple Storage Service has become something far more powerful than simple storage. No matter what innovations come next, one thing is certain: S3 will be right in the middle of it all!