Amazon S3 – Enterprise Grade, Internet Scale, and Ready for Big Data
I saw an interesting quote this past weekend from inventor and entrepreneur Dean Kamen. In response to a claim of instant success for one of his products, Dean responded that it wasn’t in fact instant, but was actually the result of between 15 and 20 years of research and development.
Amazon S3 hasn’t been around for nearly that long, but the service is growing very rapidly. This kind of rapid growth comes from a wide variety of customers with an equally wide variety of use cases. Our customers tell us that they use S3 in heavily regulated industries, for any application that needs access to data at Internet scale, and for solutions that address today’s big data challenges. Let’s look at how S3 is being put to use by our customer base in these ways.
S3 in Mature and Regulated Industries
Customers in heavily regulated industries such as finance, health care, and government are using S3 today. For example:
NASDAQ OMX is the largest exchange company in the world, and uses S3 for their FinQloud and Market Replay offerings. FinQloud provides NASDAQ OMX’s clients with efficient storage and management of financial data to help address regulations such as the U.S. Securities and Exchange Commission (SEC) Rule 17a-4 (Books and Records), which requires storage of certain regulated financial data for specific periods of time. Among other use cases, FinQloud utilizes S3 for data storage to help broker-dealers meet record archival and retrieval requirements. The Market Replay offering helps customers quickly access historical stock price information. As noted in the case study, NASDAQ OMX saw that Amazon S3 would enable them to deliver hundreds of thousands of small files per day to AWS, and then back to the customer in seconds, making it an ideal solution at a low cost.
Toshiba Medical Systems Corporation is a leading Japanese manufacturer of diagnostic imaging systems. They run a health care cloud service called Healthcare@Cloud which utilizes Amazon S3 for X-ray, CT and MRI image data recorded by health care institutions. S3 allows them to meet guidelines on safety management for medical information systems, as well as guidelines on employees who are entrusted with medical data (more information can be found in the case study).
The National Renewable Energy Laboratory’s Open Energy Information Initiative (OpenEI) is an open source knowledge-sharing platform created to facilitate access to data and tools that accelerate the transition to clean energy systems. OpenEI, which follows guidelines set by the White House’s Open Government Initiative, utilizes S3 for storage of datasets that users can upload and share. There are several hundred datasets today, including global energy and mining data from the World Bank as well as air emissions data from the EPA.
S3 at Internet Scale
Many of the largest Internet companies rely on S3 to store vast amounts of data. Here are a few examples:
Netflix, a leading online subscription service for watching movies and TV programs, runs their streaming video business on AWS. Netflix uses S3 to store petabytes of video content, which they then distribute to their customers’ devices via a CDN. When Netflix needs to create a video format for a new device, they stream their S3 video content to thousands of EC2 instances for transcoding.
Instagram enables its users to quickly and easily share photos with their friends and family from their mobile devices. S3 provides the storage backend behind Instagram’s offering, which has now grown to over 100 million users per month.
Spotify is an online music service offering instant access to over 16 million licensed songs. As noted in the case study, using S3 gives them confidence in their ability to expand storage quickly while also providing high data durability, allowing them to add over 20,000 tracks a day to their catalog.
S3 for Big Data
Many of our customers store their application and web server logs in S3 for later analysis. These files can occupy a lot of space but S3 handles them with ease. Here are a few examples:
Yelp is best known for sharing in-depth reviews and insights on all types of local businesses. Yelp uses S3 to store daily logs and photos, and Amazon Elastic MapReduce to process these logs to power features like “People Who Viewed this Also Viewed” and “Review highlights.”
Pinterest is an online pinboard that lets their customers share things they love with their friends. Fortune Magazine recently reported that they’re one of the fastest growing social networks of all time. They use S3 for file and log storage, and process these logs on Elastic MapReduce to draw key insights into their business.
Etsy provides a website for individuals to sell handmade items, vintage items, and craft supplies. They have over 25 million members and 18 million items listed today. They store their HTTP server logs in S3, and use Elastic MapReduce for web log analysis and recommendation algorithms.
S3 for You
If you find these examples interesting, you can get started at no charge using the AWS Free Usage Tier. You can also read the S3 documentation and use the AWS SDKs to build your own apps on top of S3 storage.
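To give a feel for the log-storage pattern described above, here is a minimal sketch in Python using boto3, the AWS SDK for Python. The date-partitioned key layout, the function names, and the bucket name in the usage note are illustrative assumptions, not something taken from any of the customers mentioned in this post.

```python
from datetime import date
from pathlib import Path


def daily_log_key(service: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key (hypothetical layout).

    Grouping logs by service and day makes it easy to point a later
    analysis job at a single day's worth of files.
    """
    return f"logs/{service}/{day:%Y/%m/%d}/{filename}"


def upload_log(bucket: str, path: Path, service: str, day: date) -> str:
    """Upload one local log file to S3 and return the object key used."""
    # Imported here so the key helper above works without the SDK installed.
    import boto3

    key = daily_log_key(service, day, path.name)
    # upload_file(Filename, Bucket, Key) reads credentials from the
    # environment or your AWS configuration files.
    boto3.client("s3").upload_file(str(path), bucket, key)
    return key
```

With credentials configured and a bucket you own (the name `my-log-bucket` is a placeholder), a call such as `upload_log("my-log-bucket", Path("access.log"), "web", date.today())` would store the file under a key for today's date.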