AWS Official Blog

Fotopedia and AWS

by Jeff Barr | in Amazon CloudFront, Amazon EC2, Amazon Elastic Load Balancer, Amazon Elastic MapReduce, Europe

Hi there, this is Simone Brunozzi, Technology Evangelist for AWS in Europe. I’ll steal the keyboard from Jeff Barr for a few minutes to share something really interesting with you: it is always fascinating to see how our customers are using Amazon Web Services to power their businesses.

Olivier Gutknecht, Director of Server Software at France-based Fotonauts Inc., spent some time with me describing how they use AWS to power Fotopedia, a collaborative photo encyclopedia.

We have been very lucky with our development timeframe: we developed this project while Amazon was building out its rich set of services. Early in development we tested Amazon S3 as the main data store for our images and thumbnails; switching our first implementation to S3 was a matter of days. Last year, when our widgets were featured on the LeWeb 08 site, we enabled Amazon CloudFront for distribution of our images – literally days after the official CloudFront introduction. Before that, we had moved our processing to EC2 instances with persistent EBS volumes. And in recent months, we integrated Elastic Load Balancing and Elastic MapReduce into our stack.

It is interesting to see how AWS services have replaced our initial implementations. We’re not in the business of configuring Hadoop for the cloud, for example, so we’re quite happy to use such a service if it fits our needs. The same happened to our HTTP fault-tolerance layer, which we quickly replaced with Elastic Load Balancing.

So Amazon S3, CloudFront, and EC2 (with Elastic Block Store (EBS) volumes for the data stores) are the three key services they are using to power Fotopedia, but they also take advantage of other AWS services.

We regularly analyze a full Wikipedia dump to extract abstracts and compute a graph of related articles for our photo encyclopedia. We use Elastic MapReduce with custom Hadoop jobs and Pig scripts to analyze the Wikipedia content – it’s nice to be able to go from eight hours to less than two hours of processing time.
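The actual pipeline runs as Hadoop jobs and Pig scripts on Elastic MapReduce, but the core idea of a related-articles graph can be sketched in a few lines of plain Python. This is an illustrative sketch only – the scoring here (counting shared outgoing links between two articles) is an assumption, not Fotonauts’ actual algorithm:

```python
from collections import defaultdict

def related_articles(links):
    """links maps article title -> set of linked titles.
    Returns article -> list of (other article, shared-link count),
    most closely related first."""
    related = defaultdict(lambda: defaultdict(int))
    titles = list(links)
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            # Score relatedness by how many outgoing links two articles share.
            shared = len(links[a] & links[b])
            if shared:
                related[a][b] = shared
                related[b][a] = shared
    return {a: sorted(n.items(), key=lambda kv: -kv[1])
            for a, n in related.items()}

# Toy "dump": three articles and their outgoing links.
links = {
    "Paris": {"France", "Seine", "Eiffel Tower"},
    "Lyon": {"France", "Seine"},
    "France": {"Paris", "Lyon", "Seine"},
}
print(related_articles(links)["Paris"])  # [('Lyon', 2), ('France', 1)]
```

At Wikipedia scale this pairwise comparison is exactly the kind of work that gets expressed as a MapReduce job instead of a nested loop, which is where Elastic MapReduce earns its keep.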

We’re also using on-demand instances and Hadoop to analyze our logs: all service logs are aggregated and archived into an S3 bucket, and we regularly analyze these to extract business metrics and user-visible stats that we then integrate into the site.
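The real analysis runs as Hadoop jobs over logs archived in S3, but the essence of a metrics extraction step looks something like this. The combined-log-format field layout below is an assumption for illustration; Fotopedia’s actual log format may differ:

```python
import re
from collections import Counter

# Assumed Apache combined-style access log line; adjust to the real format.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def count_requests(lines):
    """Count hits per request path across raw access-log lines."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            hits[m.group("path")] += 1
    return hits

sample = [
    '127.0.0.1 - - [10/Nov/2009:10:00:00 +0000] "GET /en/Paris HTTP/1.1" 200',
    '127.0.0.1 - - [10/Nov/2009:10:00:01 +0000] "GET /en/Lyon HTTP/1.1" 200',
    '127.0.0.1 - - [10/Nov/2009:10:00:02 +0000] "GET /en/Paris HTTP/1.1" 304',
]
print(count_requests(sample).most_common(1))  # [('/en/Paris', 2)]
```

In the Hadoop version, the parse-and-count step becomes the mapper and the summation becomes the reducer, so the same logic scales across a day’s worth of logs pulled from S3.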

And there’s the secret sauce that binds this together: Chef. Chef is a young and extremely powerful systems integration framework. The Fotonauts team is working on a detailed “how we use Chef” blog post, because they consider Chef an essential component of their stack.

For instance, when we provision a new EC2 instance, we set up the instance with a simple boot script. On first boot, the instance automatically configures our SSH keys, installs some base packages (Ruby, essentially) and registers itself in our DNS. Finally, Chef registers the instance into our Chef server. At this point we have a “generic”, passive machine added to our grid. Then we just associate a new role with this instance – let’s say we need a new backend for our main Rails application. From there, it is just a matter of waiting for the instance to configure itself: installing Rails and monitoring probes, checking out our source code, and finally launching the application. A few minutes later, the machine running our load balancer and web cache notices a new backend and immediately reconfigures itself.
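The lifecycle described above – generic first-boot steps followed by role-specific recipes – can be sketched as a simple ordered plan. Every step and role name below is illustrative, not Fotonauts’ actual Chef recipes or tooling:

```python
# Hypothetical sketch of the provisioning flow; names are illustrative.
BOOT_STEPS = [
    "configure_ssh_keys",        # first boot: admin access
    "install_base_packages",     # essentially Ruby, so Chef can run
    "register_dns",
    "register_with_chef_server", # instance is now a "generic" passive node
]

ROLE_RECIPES = {
    "rails_backend": [
        "install_rails",
        "install_monitoring_probes",
        "checkout_source",
        "launch_application",
    ],
}

def provision_plan(instance_id, role):
    """Ordered actions a fresh instance runs: generic first-boot steps,
    then the recipes attached to its assigned role."""
    return [f"{instance_id}: {step}" for step in BOOT_STEPS + ROLE_RECIPES[role]]

for action in provision_plan("i-1a2b3c", "rails_backend"):
    print(action)
```

The key design point is the split: the boot script knows nothing about roles, and the role is a pure Chef-side decision, so the same generic image can become any kind of node after launch.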

It will be interesting to see how they benefit from the Boot-From-EBS feature that we recently added.

What is great about this Amazon & Chef setup is that it helps you think about your application globally. Running a complex application like Fotopedia is not just a matter of running some Rails code and a MySQL database, but of coordinating a long list of software services: some written by us, some installed as packages from the operating system, some built and installed from source (sometimes because the software is so recent that it is not yet available in our Linux distribution, sometimes because we need to patch it for our needs). Automation is the rule, not the exception.

But putting the technical questions aside, our decision to base our infrastructure on Amazon Web Services has had several positive consequences for our process and workflow: less friction to experiment and prototype, an easy way to set up a testing and development platform, and more control over our production costs and requirements. We also recently migrated some instances to Reserved Instance billing.

I asked Olivier what’s next in their AWS Experiments and this is what he told me: “Amazon Relational Database Service.”

Thanks Olivier, and good luck with Fotopedia!

Simone Brunozzi (@simon on Twitter)
Technology Evangelist for AWS in Europe