Yelp Case Study

2014

Yelp was founded in 2004 with the main goal of helping people connect with great local businesses. The Yelp community is best known for sharing in-depth reviews and insights on local businesses of every sort. In their ten years of operation Yelp went from a one-city wonder (San Francisco) to an international phenomenon spanning 29 countries and more than 120 markets. As of June 2014, Yelp had an average of 138 million monthly unique visitors and more than 61 million local reviews have been written by yelpers.

start a python tutorial
kr_quotemark

With AWS, our developers can now do things they couldn’t before. Our systems team can focus their energies on other challenges.”

Dave Marin
Search and Data-Mining Engineer

The Challenge

Yelp has established a loyal consumer following, due in large part to the fact that they are vigilant in protecting the user from shill or suspect content. Yelp uses an automated review filter to identify suspicious content and minimize exposure to the consumer. The site also features a wide range of other features that help people discover new businesses (lists, special offers, and events), and communicate with each other. Additionally, business owners and managers are able to set up free accounts to post special offers, upload photos, and message customers.

The company has also been focused on developing mobile apps and was recently voted into the iTunes Apps Hall of Fame. Yelp apps are also available for Android, Blackberry, Windows 7, Palm Pre and WAP.

Local search advertising makes up the majority of Yelp’s revenue stream. The search ads are colored light orange and clearly labeled “Sponsored Results.” Paying advertisers are not allowed to change or re-order their reviews.

Why Amazon Web Services

Yelp originally depended upon giant RAIDs to store their logs, along with a single local instance of Hadoop. When Yelp made the move to Amazon Elastic MapReduce (Amazon EMR), they replaced the RAIDs with Amazon Simple Storage Service (Amazon S3) and immediately transferred all Hadoop jobs to Amazon Elastic MapReduce.

“We were running out of hard drive space and capacity on our Hadoop cluster,” says Yelp search and data-mining engineer Dave Marin.

Yelp uses Amazon S3 to store daily logs and photos, generating around 1.2TB of logs per day. The company also uses Amazon EMR to power approximately 20 separate batch scripts, most of those processing the logs. Features powered by Amazon Elastic MapReduce include:

  • People Who Viewed this Also Viewed
  • Review highlights
  • Auto complete as you type on search
  • Search spelling suggestions
  • Top searches
  • Ads

Their jobs are written exclusively in Python, while Yelp uses their own open-source library, mrjob, to run their Hadoop streaming jobs on Amazon EMR, with boto to talk to Amazon S3. Yelp also uses s3cmd and the Ruby Elastic MapReduce utility for monitoring.
 
Yelp developers advise others working with AWS to use the boto API as well as mrjob to ensure full utilization of Amazon Elastic MapReduce job flows. Yelp runs approximately 250 Amazon Elastic MapReduce jobs per day, processing 30TB of data and is grateful for AWS Support that helped with their Hadoop application development.

The Benefits

Using Amazon Elastic MapReduce Yelp was able to save $55,000 in upfront hardware costs and get up and running in a matter of days not months. However, most important to Yelp is the opportunity cost. “With AWS, our developers can now do things they couldn’t before,” says Marin. “Our systems team can focus their energies on other challenges.”


About Yelp

Yelp was founded in 2004 with the main goal of helping people connect with great local businesses. The Yelp community is best known for sharing in-depth reviews and insights on local businesses of every sort.

Benefits of AWS

  • Over $50,000 savings in hardware costs
  • New system up and running in days, not months
  • Increased storage capacity

AWS Services Used

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. 

Learn more »

Amazon Elastic MapReduce

Amazon EMR is the industry leading cloud-native big data platform, allowing teams to process vast amounts of data quickly, and cost-effectively at scale.

Learn more »

AWS Support

AWS Support brings Amazon’s tradition of customer-obsession to the B2B technology world. We focus on helping you achieve the outcomes you need to make your business successful.

Learn more »


Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.