Launched in San Francisco in 2007, Scribd is an Internet company that provides social publishing and reading services. Scribd converts documents into a Web format readable on Scribd.com. Scribd members can share documents across the web, mobile devices, and social platforms such as Facebook and Twitter.
Scribd members range from individuals to large corporations. John Wiley & Sons, The World Bank, and Facebook are just some of the organizations that use Scribd. With over 90 million monthly readers, Scribd is one of the largest social publishing and reading sites on the web.
From the beginning, Scribd used Amazon Web Services (AWS) to store and process the documents uploaded to its site. Jared Friedman, co-founder of Scribd, states, “It was by far the easiest way to scale our web storage to meet customer demand.” Scribd uses Amazon Simple Storage Service (Amazon S3) to host original and converted document assets and Amazon Elastic Compute Cloud (Amazon EC2) to convert the documents to Web-readable HTML. It also uses the AWS SDK for Ruby.
When Scribd decided to migrate its documents from Adobe Flash to HTML5, the team had only weeks to convert millions of files to HTML format. Converting that many files on such a tight deadline required an enormous amount of computing power.
“We talked with an account manager at AWS, who listened to our business problem and helped us design a large batch job using Amazon EC2 Spot Instances,” says Friedman. Scribd runs a scalable grid of slave nodes that processes files, and a master node that controls them. The master node scales the grid in response to changing demand, and can purchase On-Demand or Spot Instances, depending on the time sensitivity of requests. Scribd estimates that it saved 63%, or $10,500, by using Spot Instances for the batch conversion instead of On-Demand Instances for this particular job.
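The master node’s job can be pictured as two decisions: how many worker nodes the grid needs to clear the backlog in time, and whether the work is urgent enough to justify On-Demand prices over cheaper, interruptible Spot capacity. The sketch below illustrates that idea only; it is not Scribd’s code. The function name `plan_capacity`, the throughput parameter `files_per_node_hour`, and the one-hour urgency threshold are all hypothetical assumptions. The 2,000-node cap mirrors the peak Spot fleet size mentioned in this case study.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    """How many worker nodes to request, and from which purchasing option."""
    nodes: int
    market: str  # "spot" or "on-demand"


def plan_capacity(pending_files: int,
                  files_per_node_hour: int,
                  hours_until_deadline: float,
                  urgent_threshold_hours: float = 1.0,  # hypothetical cutoff
                  max_nodes: int = 2000) -> ScalingDecision:
    """Hypothetical master-node policy (not Scribd's actual implementation).

    Sizes the grid so the backlog finishes before the deadline, and falls
    back to On-Demand capacity when the work is too time-sensitive to risk
    Spot interruptions.
    """
    if files_per_node_hour <= 0:
        raise ValueError("files_per_node_hour must be positive")
    if pending_files <= 0:
        # Nothing queued: scale the grid down to zero workers.
        return ScalingDecision(nodes=0, market="spot")

    # Guard against a zero or negative deadline; the cap below bounds the result.
    hours = max(hours_until_deadline, 1e-9)
    needed = math.ceil(pending_files / (files_per_node_hour * hours))
    nodes = min(needed, max_nodes)

    # Urgent work goes to On-Demand; everything else uses cheaper Spot capacity.
    market = "on-demand" if hours_until_deadline < urgent_threshold_hours else "spot"
    return ScalingDecision(nodes=nodes, market=market)


if __name__ == "__main__":
    # A large batch with a full day to finish runs on Spot Instances.
    print(plan_capacity(1_000_000, files_per_node_hour=50, hours_until_deadline=24))
    # A small, urgent batch is routed to On-Demand Instances.
    print(plan_capacity(1_000, files_per_node_hour=50, hours_until_deadline=0.5))
```

Separating the sizing decision from the market decision reflects the trade-off the case study describes: Spot Instances cost less but can be reclaimed, so they suit large batch jobs with slack in the schedule.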
Scribd used up to 2,000 Spot Instances at a time to run the batch-processing job. “We only had to write a couple of really small scripts,” says Friedman. “We were able to move from On-Demand to Spot Instances in a couple of hours, coffee breaks included.”
The team called on AWS Support when they ran into issues midway through the process because the software on the instances became overloaded. “Helpful technical people looked through our account data, explained the issue, and helped us understand how to fix it,” says Friedman. “Thanks to AWS, we were able to get the job done, ahead of time and under budget—at an amazingly low cost.”
In the future, Scribd plans to make greater use of Spot Instances, which Friedman describes as a great way to save money on Amazon EC2. He continues, “AWS has supported our business for five years, allowing us to easily handle our millions of users.” When AWS released Amazon Glacier, a low-cost, durable storage service, Scribd quickly adopted it to back up all of its data, including files that previously had no backups. Scribd found Amazon Glacier particularly helpful for database snapshots and log files. Friedman comments, “Amazon Glacier's extremely low prices have made it economically viable for us to do far more comprehensive backups than we were previously able to do.”
To learn more about how AWS can help meet your web application needs, visit our Web Applications details page: http://aws.amazon.com/web-mobile-social/
Added November 6, 2012