Category: Mechanical Turk
I recently interviewed Sharon Chiarella for The AWS Report. Sharon is an Amazon Vice President with responsibility for the Amazon Mechanical Turk. After we talked about the Mechanical Turk concept in general terms (“a marketplace for work”), we zoomed in and talked about the kinds of work that are being done and who is doing it.
Sharon told me that the most popular kinds of work are transcription, writing, data cleansing, sentiment analysis, and moderation. She connected data cleansing to big data in an interesting way; I learned that data cleansing is an essential noise reduction tool, especially when processing data that has been aggregated from multiple sources, as is the case with the Amazon.com product catalog.
We also discussed the Mechanical Turk workforce (500,000 people in 190 countries, with obvious hotspots in the US, India, Canada, and the UK, and some not-so-obvious ones such as Kenya) and the fact that its members are generally well-educated. The workers value the flexibility that Mechanical Turk affords them: they can choose what they work on, how much work they want to do, and when they want to do it. Demographically, the workforce includes stay-at-home mothers, retirees, and students.
I enjoyed our talk, and look forward to bringing you additional interviews on The AWS Report.
Categorization is one of the more popular use cases for the Amazon Mechanical Turk. A categorization HIT (Human Intelligence Task) asks the Worker to select from a list of options. Our customers use HITs of this type to assign product categories, to match URLs to business listings, and to discriminate between line art and photographs.
Using our new Categorization App, you can start categorizing your own items or data in minutes, without the learning curve that has traditionally accompanied this type of activity. The App includes everything that you need to be successful, including:
- Predefined HITs (no HTML editing required).
- Pre-qualified Master Workers (see Jinesh’s previous blog post on Mechanical Turk Masters).
- Price recommendations based on complexity and comparable HITs.
- Analysis tools.
The Categorization App guides you through the four simple steps that are needed to create your categorization project.
First, you create the project, assign a name to it, and enter the question that you want to ask of the workers.
Next, you provide the categories that the Master Workers will use. You simply enter the names and the App will generate the HTML for you. You also have the opportunity to supply instructions as part of this step.
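The App writes this HTML for you, but it may help to see roughly what a categorization form involves. Here is a minimal Python sketch that turns a question and a list of category names into a set of radio buttons. The function name `category_radio_html` and the markup it emits are hypothetical, not what the Categorization App actually generates.

```python
from html import escape


def category_radio_html(question, categories, field_name="category"):
    """Render a question followed by one radio button per category name.

    Hypothetical helper: it shows the kind of form a categorization HIT
    needs, not the markup the Categorization App actually produces.
    """
    parts = [f"<p>{escape(question)}</p>"]
    for i, name in enumerate(categories):
        # A unique id per option lets each <label> point at its radio button.
        option_id = escape(f"{field_name}-{i}")
        parts.append(
            f'<input type="radio" name="{escape(field_name)}" id="{option_id}" '
            f'value="{escape(name)}" required>'
            f'<label for="{option_id}">{escape(name)}</label><br>'
        )
    return "\n".join(parts)


print(category_radio_html("Is this image line art or a photograph?",
                          ["Line art", "Photograph"]))
```

Escaping every category name matters because the names come straight from the Requester and end up inside HTML attributes.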
The next step is to upload the data to be categorized. You simply upload a CSV file and select the fields that you’d like the Workers to see. If one of your fields contains links to images to be displayed as part of the HIT, you can also set that up in this step.
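To make the field-selection step concrete, here is a small Python sketch, assuming a CSV file with a header row, that keeps only the columns you want Workers to see and carries one column along as an image URL. The helper `rows_for_workers`, the `image_url` key, and the sample data are illustrative assumptions, not part of the App.

```python
import csv
import io


def rows_for_workers(csv_text, visible_fields, image_field=None):
    """Keep only the columns Workers should see, plus an optional image URL column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    required = list(visible_fields) + ([image_field] if image_field else [])
    missing = [f for f in required if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV file is missing fields: {missing}")
    items = []
    for row in reader:
        item = {f: row[f] for f in visible_fields}
        if image_field:
            # Stored under a fixed key so a HIT template could render it as an image.
            item["image_url"] = row[image_field]
        items.append(item)
    return items


sample = ("sku,title,internal_cost,photo\n"
          "1,Blue mug,2.50,http://example.com/1.jpg\n"
          "2,Red pen,0.40,http://example.com/2.jpg\n")
print(rows_for_workers(sample, ["title"], image_field="photo"))
```

Columns that are not selected (here `sku` and `internal_cost`) never reach the Workers.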
Finally, you review the pricing information and the total cost for your HITs. Once everything looks good, you click Publish and await the results.
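The total cost is essentially arithmetic: items × Workers per item × reward, plus Amazon’s commission on the rewards. The sketch below assumes a flat percentage fee; `fee_rate` is a placeholder that you would replace with the rate from the current Mechanical Turk pricing page, and the function ignores any minimums or surcharges the real pricing may include. It defaults to two assignments per item, the number of Master Workers the App assigns to each HIT.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_total_cost(num_items, reward, fee_rate, assignments_per_item=2):
    """Estimate project cost: Worker rewards plus a percentage commission on them.

    fee_rate is a placeholder; check current Mechanical Turk pricing for the real value.
    """
    rewards = Decimal(num_items) * assignments_per_item * Decimal(str(reward))
    fees = rewards * Decimal(str(fee_rate))
    # Round the total to whole cents.
    return (rewards + fees).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 1,000 items, $0.05 per assignment, two Workers per item, illustrative 10% fee.
print(estimate_total_cost(1000, "0.05", fee_rate="0.10"))  # 110.00
```

Using `Decimal` rather than floats keeps the cents exact when rewards are small fractions of a dollar.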
Two Master Workers will handle each HIT. After the Workers have finished all of the HITs in the project, you can use the Result Analysis tools to see a summary of the results. You’ll be able to see how often both of the Master Workers agreed on a categorization, and how often they disagreed. You can also download the actual categorization data for each of your items.
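Agreement between the two Master Workers is also easy to compute yourself from the downloaded results. This Python sketch assumes you have each Worker’s answers as a dictionary mapping an item ID to a category (a hypothetical layout, not the App’s actual download format) and reports which items they agreed on, which they disagreed on, and the overall agreement rate.

```python
def agreement_summary(labels_a, labels_b):
    """Compare two Workers' categorizations, each a dict of item id -> category."""
    shared = labels_a.keys() & labels_b.keys()
    agreed = sorted(i for i in shared if labels_a[i] == labels_b[i])
    disagreed = sorted(i for i in shared if labels_a[i] != labels_b[i])
    rate = len(agreed) / len(shared) if shared else 0.0
    return {"agreed": agreed, "disagreed": disagreed, "agreement_rate": rate}


worker_1 = {"item1": "Photograph", "item2": "Line art",
            "item3": "Photograph", "item4": "Photograph"}
worker_2 = {"item1": "Photograph", "item2": "Photograph",
            "item3": "Photograph", "item4": "Photograph"}
print(agreement_summary(worker_1, worker_2))
```

Items in the `disagreed` list are natural candidates for a second look, for example by sending them to another Worker.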
Read more about this feature on the Mechanical Turk Blog.
There are now more than 500,000 Mechanical Turk Workers in 190 countries on the Mechanical Turk marketplace. These Workers are ready and eager to work on a wide variety of tasks (Human Intelligence Tasks, or HITs), including data cleansing, image filtering, image tagging, and categorization.
Today, we’re excited to introduce Mechanical Turk Masters. This new feature gives Requesters direct access to the best Workers on the Mechanical Turk marketplace. Masters are an elite group of Workers who have demonstrated superior performance while completing thousands of HITs across the marketplace. Masters must maintain this high level of performance or they may lose this distinction. Mechanical Turk has built technology that analyzes Worker performance, identifies high-performing Workers, and monitors their performance over time. Starting today, Requesters will be able to access two types of Masters: Photo Moderation Masters and Categorization Masters. Each type of Master has demonstrated a specific skill set. For example:
- Photo Moderation Masters have proven that they can review photos against site guidelines, determine if a photo includes specific content, or pick the best photo given a set of criteria.
- Categorization Masters have proven that they can categorize web pages, advertisements, or products in a social catalog. They can also classify the sentiment of social media content.
These qualifications are automatically granted to Mechanical Turk Workers on an ongoing basis. Mechanical Turk Masters have exclusive access to HITs that require a Masters qualification. They also have access to a private forum, and they do not need to solve a CAPTCHA to work on HITs.
With this new system, skills developed on HITs from one Requester can now qualify Workers to handle HITs from other Requesters.
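Through the Mechanical Turk API, a Requester restricts a HIT to qualified Workers by attaching a qualification requirement. The Python sketch below builds a requirement in the shape used by the current API’s `QualificationRequirements` parameter (for example in boto3’s `create_hit`), using the `Exists` comparator, and then mimics the check locally. The qualification type ID is a placeholder, not a real Masters ID; look up the actual value in the Mechanical Turk documentation. The helper names are hypothetical.

```python
# Placeholder only: substitute the real Masters qualification type ID.
MASTERS_QUALIFICATION_ID = "EXAMPLE_MASTERS_QUALIFICATION_ID"


def masters_requirement(qualification_type_id=MASTERS_QUALIFICATION_ID):
    """Requirement that only Workers holding the qualification may accept the HIT."""
    return {"QualificationTypeId": qualification_type_id, "Comparator": "Exists"}


def worker_may_accept(worker_qualifications, requirements):
    """Local check mirroring 'Exists': the Worker must hold every required type."""
    for req in requirements:
        if req["Comparator"] != "Exists":
            raise NotImplementedError("this sketch only models the 'Exists' comparator")
        if req["QualificationTypeId"] not in worker_qualifications:
            return False
    return True


requirements = [masters_requirement()]
# With boto3 this list would be passed as QualificationRequirements=requirements.
print(worker_may_accept({MASTERS_QUALIFICATION_ID}, requirements))  # True
print(worker_may_accept(set(), requirements))                       # False
```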
Read more about this new feature on the Mechanical Turk blog.
The Mechanical Turk now has a blog of its very own.
The new Mechanical Turk blog will be used to share information with Mechanical Turk Requesters, Workers, and fans of the Mechanical Turk marketplace.
It will include product announcements, how-to guides, best practices, case studies, and examples of how businesses are using Mechanical Turk for everything from transcription to content moderation to data cleansing and testing of search algorithms.
Programs have always been good at dealing with highly structured, very uniform data. They sometimes stumble when asked to deal with data that is irregular, unstructured, or otherwise messy in some way. Normalizing data that came from a casual, real-world source where people are allowed to enter free-form text can be tedious and expensive.
A recent story on ReadWriteWeb tells the tale of data scientist Peter Skomoroch and his analysis of real-world data taken from LinkedIn. Peter and his colleagues used a processing pipeline that made use of the Amazon Mechanical Turk to tap into what he described as the “human brain-power of thousands of Turks.” They were, for example, able to figure out that “IBM,” “I.B.M.,” and “IBM UK” all referred to the same company.
Earlier, Pete had used this technique to create a view of the locations of thousands of Twitter users; the code for this project can be found here.
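The following Python sketch is not Pete’s code; it only illustrates one way to split this kind of job between software and people. A program can cheaply merge names that differ only in case and punctuation (“IBM” and “I.B.M.”), while a case like “IBM UK” needs human judgment and could be sent to Workers as a HIT. The helpers and the prefix heuristic used to pick pairs for review are hypothetical.

```python
import re
from collections import defaultdict


def canonical_key(name):
    """Lowercase and keep only letters and digits, so 'I.B.M.' becomes 'ibm'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_company_names(names):
    """Group names that collapse to the same key; order within a group is preserved."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return dict(groups)


def pairs_for_review(groups):
    """Hypothetical heuristic: flag key pairs where one key extends the other."""
    keys = sorted(groups)
    return [(a, b) for a in keys for b in keys if a != b and b.startswith(a)]


groups = group_company_names(["IBM", "I.B.M.", "IBM UK", "Acme Corp"])
print(groups)
print(pairs_for_review(groups))  # these pairs would go to Workers as HITs
```

The automatic step handles the easy, mechanical cases; only the ambiguous pairs cost money to send to Workers.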
If you are interested in the Amazon Mechanical Turk and other forms of workforce collaboration, you may also find the upcoming Net:Work 2010 conference to be of interest. Sharon Chiarella, VP of the Amazon Mechanical Turk, will be speaking. The conference will be held in San Francisco on December 9th; you can get a $100 discount by clicking here.
We’ve added some new workforce management tools to Amazon Mechanical Turk. If you are a current or potential Mechanical Turk Requester, these features are for you.
These features were designed to make it easier for you to find and reward the Workers who are doing good work on your HITs (Human Intelligence Tasks). The new tools help you to identify good Workers, allow you to manage Workers and Qualification Types, and save you time by sending more work to your best Workers.
The new features include:
- Worker Statistics – Information on who is doing work for you and how well they are doing at it.
- Qualification Types Management – The ability to create new Qualification Types using the web interface, and to assign them to Workers. You can use this pair of features to create groups of highly qualified Workers.
- Worker Management Tools – The ability to provide selected Workers with bonuses and to block undesirable Workers.
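As a rough illustration of how Worker statistics might feed bonus and block decisions, here is a Python sketch that computes each Worker’s approval rate and sorts Workers into bonus and block candidates. The `WorkerStats` record and the thresholds are hypothetical; in practice you would review the candidates and then award bonuses or block Workers through the Requester tools.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    worker_id: str
    approved: int
    rejected: int

    @property
    def submitted(self):
        return self.approved + self.rejected

    @property
    def approval_rate(self):
        return self.approved / self.submitted if self.submitted else 0.0


def triage_workers(stats, bonus_rate=0.98, block_rate=0.70, min_submitted=20):
    """Split Workers into bonus and block candidates; thresholds are illustrative."""
    bonus, block = [], []
    for s in stats:
        if s.submitted < min_submitted:
            continue  # too little history to judge either way
        if s.approval_rate >= bonus_rate:
            bonus.append(s.worker_id)
        elif s.approval_rate < block_rate:
            block.append(s.worker_id)
    return bonus, block


stats = [WorkerStats("A1", 99, 1), WorkerStats("B2", 50, 50),
         WorkerStats("C3", 5, 0), WorkerStats("D4", 85, 15)]
print(triage_workers(stats))
```

Workers with too little history are skipped, so a single early rejection does not put a new Worker on the block list.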
Here is a presentation with more information about each feature:
You’ll hear from Donghui Feng, Research Scientist for AT&T Interactive. He’ll talk about the use of the Mechanical Turk for data extraction and data analysis. Following his talk will be Neil Symes, Director of Listing Quality at AT&T Interactive.
Next on the roster will be Omar Alonso of Microsoft. He will talk about experiment design and execution, and will then present some guidelines for data preparation, interface design, quality control, and scheduling.
A member of the Mechanical Turk team will also be speaking.
Attendance is free but space is limited and you need to register.
PS – If you can’t attend, check out some of our on-demand case studies.
I was on the East Coast of the US last week and spent a very pleasant day at the Emerging Technologies for the Enterprise conference in Philadelphia. As a native of the city, I always enjoy going back. During my all-too-brief time there I spoke at the conference, met with a couple of developers, and had time for a cheese steak at Sonny’s Famous Steaks in Old City.
While at the conference I heard about a really cool and unique use for the Mechanical Turk and I just had to share it with you!
You can choose from a number of well-known tunes:
Then you enter your message:
You then pay for your croon using Amazon Payments and wait for an email message to indicate that it is ready!
A Mechanical Turk HIT (Human Intelligence Task) is then created, and your croon will be ready within half a day or so (mine actually took just 36 minutes):
The finished croon is stored in Amazon S3 and can be played directly. Here’s mine!
Is that cool, or what?
Here is some information about AWS events coming up in April, all on the East Coast of the US:
- I will be speaking at the Rochester AWS User Group at 6:00 PM on Monday, April 5th. My talk will cover some of the latest AWS developments including the Virtual Private Cloud and the Relational Database Service.
- I will be speaking at the New York City Cloud Computing Group at 6:00 PM on Tuesday, April 6th. I’ll cover VPC and RDS again.
- I will be speaking at the Emerging Tech for the Enterprise conference in Philadelphia on April 9th. I am looking forward to this visit to my home town! If you will be at ETE, please say hello, and also plan to see Chris Cera and David Brussin talk about Enterprise Cloud Computing: Pitfalls, Puzzles, and Great Rewards.
- As the final talk of my trip to the East Coast, I will be speaking to the RubyNation conference in Reston, Virginia on Saturday, April 10th. I worked in Reston back when the unofficial motto was “We’re not dead, we’re Reston.” Things have livened up considerably since then and I’m looking forward to connecting with some old friends and colleagues while I am in the area.
- There will be a Mechanical Turk Meetup in New York at 6:00 PM on April 13th. Learn more about Mechanical Turk’s global on-demand workforce, discover best practices, talk to existing Requesters, and mingle with members of the Mechanical Turk team. Preregister here.
- Terry Wise, Director of Business Development for Amazon Web Services, will be speaking at PegaWORLD in Philadelphia on April 26th. Terry will talk about how Tenet Healthcare uses Pega’s Cloud Computing solution to radically improve the way it builds its business process applications, reducing delivery time and cost by a factor of 5. Discount registrations for the conference are available here.
PS – Despite the route implied by my map, I will be traveling by plane and train!
The Amazon Mechanical Turk team will be holding Meetups in Seattle and Midtown New York in the next couple of weeks.
Attendees will learn about new ways to use the Mechanical Turk’s global on-demand workforce, discover best practices for recruiting and managing a qualified worker group, and see how to break down complex projects into workflow tasks. Attendees will also have the opportunity to learn from existing Requesters, to network with members of the Mechanical Turk team, and to hobnob with local tech leaders.
The Seattle event will be held on August 18th at 6:00 PM. The New York event will be held on September 1st in Midtown. More information about the events can be found here.
Space is limited, so register now if you would like to attend.