AWS Case Study: Scribd – Reading on a Cloud

Scribd is the largest social publishing and reading site on the Internet, with over 50 million monthly readers. Scribd allows users to turn PDF, Microsoft Office Word, and Microsoft Office PowerPoint files into Web documents that are readable through Scribd.com, mobile devices, downloads, or print, and accessible to readers through sites such as Facebook or Twitter and search engines such as Google. Many well-known businesses, media companies, government organizations, and professional publishers use Scribd each month to get their message out to a large audience, including the Chicago Tribune, Ford Motor Company, Harvard University Press, Lonely Planet, O’Reilly Media, Random House Publishing Group, the Red Cross, The New York Times Dealbook, The World Bank, UNICEF, and the World Economic Forum.

Scribd was launched in San Francisco in 2007, using Amazon Web Services (AWS) from the outset to store and process all documents uploaded to the site. The Scribd team uses Amazon Simple Storage Service (Amazon S3) to host original and converted document assets, and Amazon Elastic Compute Cloud (Amazon EC2) to convert the documents from their original format to Web-readable HTML. The team also uses the Ruby AWS library.

When Scribd decided to switch the site from Adobe Flash to HTML5, the team had to reprocess every document ever uploaded to the site, millions of files in all, into HTML format. The computational power required for that job was monumental for a small company like Scribd, with its team of 50, and the deadline for completing the conversion was very tight. The Scribd team decided to handle the conversion on Amazon EC2.

“With AWS, we got to talk directly with an account manager at Amazon who listened to our business problem and helped us design a large batch job,” says Jared Friedman. Scribd runs a scalable grid of slave nodes that process files and a master node that controls them. The master node scales the grid in response to changing demand, and can purchase either on-demand or spot instances depending on the time sensitivity of requests.
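The master node's two decisions described above (how large the grid should be, and which kind of instance to buy) can be illustrated with a small Python sketch. This is not Scribd's code: the `Job` type, the function names, the one-hour urgency threshold, and the `jobs_per_node` figure are all hypothetical, chosen only to make the idea concrete. The cap of 2,000 nodes echoes the peak spot-instance count reported for the batch job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A conversion request (hypothetical shape, for illustration only)."""
    name: str
    deadline_hours: float  # time remaining until the result is needed


def choose_purchase_option(job: Job, urgent_threshold_hours: float = 1.0) -> str:
    """Pick an instance type by time sensitivity.

    Urgent work goes to on-demand instances, which are not reclaimed;
    everything else goes to cheaper spot instances. The threshold is an
    assumed value, not one taken from the case study.
    """
    if job.deadline_hours <= urgent_threshold_hours:
        return "on-demand"
    return "spot"


def target_grid_size(queued_jobs: int, jobs_per_node: int = 10,
                     max_nodes: int = 2000) -> int:
    """Scale the grid with demand: enough nodes for the queue, up to a cap."""
    needed = -(-queued_jobs // jobs_per_node)  # ceiling division
    return min(max(needed, 0), max_nodes)


if __name__ == "__main__":
    print(choose_purchase_option(Job("fresh-upload", 0.5)))
    print(choose_purchase_option(Job("archive-reconversion", 72)))
    print(target_grid_size(25))
```

In a real deployment the master node would also have to resubmit work from spot instances that are reclaimed mid-job; that handling is omitted here.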

Scribd made extensive use of Amazon EC2 spot instances for its batch conversion, saving 63%, or $10,500, compared to what the same job would have cost on on-demand instances. Scribd ran its batch processing job using up to 2,000 Amazon EC2 spot instances at a time. The only change required to make use of spot instances was writing “a couple of really small scripts.” Friedman’s team was able to move from on-demand to spot instances in “a couple of hours, coffee breaks included.”
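The two savings figures imply rough totals for the job. The short Python calculation below is only a back-of-the-envelope check, assuming the 63% and the $10,500 describe the same saving relative to the on-demand price; because 63% is a rounded number, the results are approximate and are not figures stated by Scribd or AWS.

```python
def implied_costs(savings_usd: float, savings_fraction: float) -> tuple[float, float]:
    """Back out the on-demand and spot totals from the reported savings.

    Assumes savings_usd = savings_fraction * on_demand_cost.
    """
    on_demand = savings_usd / savings_fraction
    spot = on_demand - savings_usd
    return on_demand, spot


on_demand, spot = implied_costs(10_500, 0.63)
print(f"Implied on-demand cost: ${on_demand:,.0f}")  # about $16,667
print(f"Implied spot cost:      ${spot:,.0f}")       # about $6,167
```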

When the Scribd team ran into issues midway through the process because of software overload across the thousands of instances doing the processing, Amazon jumped on the problem with urgent support. “Helpful, technical people there looked through our account data, explained to us what was wrong, and helped us understand how to fix it,” says Friedman. “Thanks to Amazon’s help, we were able to get the job done, actually ahead of time and under budget—at amazingly low cost.”

Friedman recommends taking advantage of Amazon’s premium support options, saying it is well worth the cost when you’re caught in a pinch. In the future, Scribd plans to make greater use of spot instances, which Friedman describes as a great way to save money on Amazon EC2. “AWS has supported Scribd’s business for three years with nary a hiccup,” says Friedman, “allowing us to easily handle our millions of users.”

To learn more, visit http://www.scribd.com/.


©2011, Amazon Web Services LLC or its affiliates. All rights reserved.