ItemSimilarity
ItemSimilarity is a simple Hadoop streaming Python application that attempts to find similar items for each item in the input dataset. This example application finds similar artists using the Audioscrobbler user playlist dataset and Amazon Elastic MapReduce.
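As a rough illustration of what "finding similar items for each item" can mean, here is a minimal in-memory sketch. It is not the ItemSimilarity source code: the similarity measure (Jaccard overlap of the users who played each artist), the `(user, item)` input format, and the name `similar_items` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def similar_items(plays, top_n=3):
    """Rank items by Jaccard similarity of their user sets.

    `plays` is an iterable of (user, item) pairs. This input format is an
    assumption for illustration, not the format used by the application.
    """
    # Collect the set of users seen for each item.
    users_by_item = defaultdict(set)
    for user, item in plays:
        users_by_item[item].add(user)

    # Score every pair of items that shares at least one user.
    scores = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared == 0:
            continue
        union = len(users_by_item[a] | users_by_item[b])
        sim = shared / union
        scores[a].append((b, sim))
        scores[b].append((a, sim))

    # Keep the top_n most similar neighbors, breaking ties by name.
    return {
        item: sorted(neighbors, key=lambda p: (-p[1], p[0]))[:top_n]
        for item, neighbors in scores.items()
    }


if __name__ == "__main__":
    sample = [
        ("u1", "Radiohead"), ("u1", "Portishead"),
        ("u2", "Radiohead"), ("u2", "Portishead"), ("u2", "Bjork"),
        ("u3", "Bjork"), ("u3", "Radiohead"),
    ]
    for item, neighbors in sorted(similar_items(sample).items()):
        print(item, neighbors)
```

Comparing every pair of items in memory only works for small inputs. On a dataset the size of the Audioscrobbler playlists, the same idea has to be split into map and reduce steps.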


Submitted By: Jai@AWS
Created On: March 31, 2009 4:14 AM GMT
Last Updated: April 2, 2009 9:49 PM GMT

Provided By Peter Skomoroch, President at Data Wrangling LLC

Data Wrangling blogger and AWS developer Peter Skomoroch gives us an introduction to Amazon Elastic MapReduce. In this sample application, we use the Amazon Elastic MapReduce Ruby client to run a multi-step Python streaming job that identifies similar artists based on Audioscrobbler playlists. We also run the same Amazon Elastic MapReduce job on Netflix Prize data to identify similar movies based on user ratings, and take the AWS Management Console for a test drive.
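To give a feel for the shape of a multi-step streaming job, the sketch below chains two steps in plain Python and simulates Hadoop's sort between them with `sorted()`. Everything here is hypothetical: the `user<TAB>artist` record format, the function names, and the choice to count co-occurring artist pairs are assumptions for illustration, not the steps the actual application runs.

```python
from itertools import combinations, groupby


def pairs_mapper(lines):
    """Step 1 map: emit 'user<TAB>artist' so records are grouped by user."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            yield f"{parts[0]}\t{parts[1]}"


def pairs_reducer(sorted_lines):
    """Step 1 reduce: for each user, emit every pair of artists they played."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for _user, group in groupby(keyed, key=lambda kv: kv[0]):
        artists = sorted({artist for _, artist in group})
        for a, b in combinations(artists, 2):
            # Assumes artist names never contain the '|' separator.
            yield f"{a}|{b}\t1"


def count_reducer(sorted_lines):
    """Step 2 reduce: sum the counts emitted for each artist pair."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for pair, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{pair}\t{sum(int(n) for _, n in group)}"


def run_locally(lines):
    """Chain both steps, using sorted() in place of Hadoop's shuffle/sort."""
    step1 = pairs_reducer(sorted(pairs_mapper(lines)))
    return list(count_reducer(sorted(step1)))
```

In a real streaming job each function would read from standard input and write to standard output as a separate script, with Hadoop sorting records by key between the map and reduce phases. Running the chain locally on a few lines is a cheap way to check the logic before launching a job flow.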

Source Location on Amazon S3:
Source License: Apache License, Version 2.0
How to Run this Application:

You can run this application using the AWS Management Console or the Command Line Tools.

Further Reading:
Terms and Conditions for Associated Sample Data: Music Listening Dataset originally provided by Audioscrobbler. The May 6, 2005 version of the data is released under the Creative Commons Attribution-Noncommercial-Share Alike (CC-BY-NC-SA) license, Version 1.0.
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.