AideRSS was a young, bootstrapped company, so its founders architected their new service to take advantage of the Amazon Web Services (AWS) cloud computing platform. “Without EC2, the project would have been impossible: we would have had to provision our own large cluster of machines and pay for the hosting, management, and all other associated fees. Instead, we were able to make use of the per-hour pricing model to keep the R&D costs low,” states Ilya Grigorik, Founder and CTO.
“Due to the large amounts of data that our service collects and analyzes, we needed a platform that would allow us to scale our processing capacity on demand – both up and down. This decision has proven its worth to us. On the very first day the service went live, we had to scale to over a hundred EC2 instances to handle the traffic caused by wide industry coverage. Since then, we scale to an average of 80 instances on a daily basis.”
The diagram below depicts AideRSS/PostRank’s current architecture on AWS. Amazon EC2 powers their front-end web servers, MySQL servers, and servers for indexing, crawling, analytics, staging, and more. Alongside Amazon EC2, PostRank makes heavy use of Amazon S3, EBS, and SQS, combining compute, data storage, and queuing services into a single solution. Amazon SQS (Simple Queue Service) offers an easy way to manage and distribute work among the EC2 instances, and it is particularly useful for coordinating the workflow of their large cluster of ~70 web crawlers. PostRank also uses the Amazon S3 storage service as a backup repository for databases, log files, and other PostRank data.
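The queue-based work distribution described above can be illustrated with a small local sketch. This is not PostRank's code: Python's in-process `queue.Queue` stands in for an SQS queue, each thread stands in for one crawler instance, and the names `run_crawl`, `crawler`, and the example feed URLs are hypothetical.

```python
import queue
import threading


def run_crawl(feed_urls, num_workers=4):
    """Hand out feed URLs to workers through one shared queue.

    Assumption: queue.Queue is a local stand-in for SQS, and each
    thread is a stand-in for a crawler running on its own instance.
    Returns a dict mapping each URL to the worker that handled it.
    """
    work = queue.Queue()
    for url in feed_urls:
        work.put(url)

    results = {}
    lock = threading.Lock()

    def crawler(worker_id):
        while True:
            try:
                url = work.get_nowait()  # take the next pending job
            except queue.Empty:
                return  # queue drained, this worker is done
            # Placeholder: a real crawler would fetch and parse the feed here.
            with lock:
                results[url] = worker_id

    threads = [
        threading.Thread(target=crawler, args=(i,))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    urls = [f"http://example.com/feed/{i}" for i in range(20)]
    handled = run_crawl(urls, num_workers=4)
    print(f"{len(handled)} feeds crawled")
```

Because workers pull jobs rather than having jobs pushed to them, adding or removing workers requires no change to the producer, which is what makes this pattern a natural fit for a cluster that grows and shrinks. One difference worth noting: a real SQS consumer must explicitly delete each message after processing it, otherwise the message becomes visible to other consumers again once its visibility timeout expires.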
Grigorik singles out one of the most important aspects of using AWS: “Instead of focusing on managing infrastructure of our cluster, we’ve been able to focus on our product and what we do best – engagement ranking and filtering of online content.”
(Published March 2009)