foursquare Labs, Inc. is a location-based social network whose more than 10 million users check in via a smartphone app or SMS to exchange travel tips and share their location with friends. By checking in frequently, users earn points and virtual badges. To perform analytics across more than 5 million daily check-ins, foursquare uses Amazon Elastic MapReduce, Amazon EC2 Spot Instances, Amazon S3, and the open-source technologies MongoDB and Apache Flume.
Foursquare streams hundreds of millions of application logs each day into Amazon S3 using the open-source technology Apache Flume. Using Amazon Elastic MapReduce, they perform a wide range of analytics on this data to better understand and serve their customers. This includes:
Foursquare chose Elastic MapReduce for three main reasons:
"Using Amazon Elastic MapReduce to analyze data stored in Amazon S3 rather than maintaining our own Hadoop cluster was the clear choice," said Foursquare Software Engineer Matthew Rathbone. "Hadoop clusters can be difficult to manage, leading to weeks spent debugging minor issues. Amazon Elastic MapReduce gets rid of this wasted time without requiring dedicated support personnel. Additionally, if you want to update your application or need a modified configuration, you can simply terminate the cluster and start a new one."
foursquare runs Amazon Elastic MapReduce clusters using a mix of High Memory and High CPU instances. Previously, this Amazon EMR analytics cluster ran on On-Demand Amazon EC2 Instances. The company recently purchased over $1 million of 1-year Heavy Utilization Reserved Instances, reducing their costs by 35% while still using some On-Demand Instances for the flexibility to scale up or shed instances as needed. The recent Amazon EC2 price drop lowers foursquare’s costs even further: the price reduction will help foursquare save another 22%, and their overall Amazon EC2 Reserved Instance usage for their Amazon Elastic MapReduce cluster qualifies them for an additional 10% volume tier discount on top of that. Combined, the price drop and the move to Reserved Instances will help foursquare reduce their Amazon EC2 instance costs by over 53% without sacrificing any of the scaling provided by Amazon EC2 and Amazon Elastic MapReduce.
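As a rough check on how the three savings figures above could add up to "over 53%", the short Python sketch below compounds them. It assumes each discount applies to the cost remaining after the previous one, which the case study does not state explicitly. The function name `combined_reduction` is purely illustrative.

```python
def combined_reduction(discounts):
    """Return the overall fractional cost reduction, assuming each
    discount applies to the cost left after the previous discount."""
    remaining = 1.0
    for discount in discounts:
        remaining *= 1.0 - discount
    return 1.0 - remaining


# Figures quoted in the text: 35% (Reserved Instances),
# 22% (EC2 price drop), 10% (volume tier discount).
# Assumption: the discounts compound multiplicatively.
overall = combined_reduction([0.35, 0.22, 0.10])
print(f"Overall reduction: {overall:.1%}")
```

Under that assumption the combined reduction comes to roughly 54%, which is consistent with the "over 53%" figure quoted above. Simply adding the percentages (67%) would overstate the savings.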
"Amazon Elastic MapReduce had already significantly reduced the time, effort, and cost of using Hadoop to generate customer insights," said Rathbone. "By expanding our clusters with Reserved Instances and On-Demand Instances, plus the Amazon EC2 price reductions, we have reduced our analytics costs by over 50% when compared to hosting it ourselves. Additionally, we have decreased the processing time for urgent data analysis, all without requiring additional application development or adding risk to our analytics."
Unlock your city with foursquare® at http://www.foursquare.com/.
Added March 22, 2012