AWS News Blog

Mechanical Turk for Metadata Collection

There are some fascinating posts over at the ESV blog. ESV refers to the English Standard Version of the Bible.

To increase the accuracy of their database of Biblical quotations, the ESV team used Amazon Mechanical Turk. The HIT (Human Intelligence Task) was relatively simple: it asked the Worker to identify the name of the person who uttered each quote, in exchange for a payment of 2 cents. The first set of HITs was uploaded as a test of the speed and quality of the Mechanical Turk workforce. You can read the description of the work here.

A follow-up post recaps the experiment and describes the results. 3,100 quotations were uploaded using a Perl script. 78 workers responded to their invitation and dove right in. Since this was a test, the folks at ESV already knew the right answer for each HIT. A first-pass direct string comparison let them approve 85% of the submissions automatically. Further hand checking pushed the approval rate all the way up to 98.3%; they rejected just 54 (1.7%) of the submissions.
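To make the two-stage check a little more concrete, here is a minimal sketch of how a first-pass string comparison against known answers might split submissions into an auto-approved pile and a hand-check pile. This is not the ESV team's code (they used Perl, and the post does not show it). The `Submission` class, the `triage` function, the optional `normalize` flag, and the sample data are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    hit_id: str          # hypothetical identifier for one quotation HIT
    worker_answer: str   # the speaker name the Worker typed in


def triage(submissions, known_answers, normalize=False):
    """Split submissions into (auto_approved, needs_hand_check)."""
    auto_approved, needs_hand_check = [], []
    for sub in submissions:
        expected = known_answers[sub.hit_id]
        given = sub.worker_answer
        if normalize:
            # Assumption, not from the post: ignore case and stray whitespace.
            expected = " ".join(expected.lower().split())
            given = " ".join(given.lower().split())
        if given == expected:
            auto_approved.append(sub)
        else:
            # A mismatch is not a rejection; a person still has to look at it.
            needs_hand_check.append(sub)
    return auto_approved, needs_hand_check


# Made-up sample data for illustration only.
KNOWN_ANSWERS = {"q1": "Moses", "q2": "Peter", "q3": "Paul"}
SAMPLE = [
    Submission("q1", "Moses"),
    Submission("q2", "peter "),
    Submission("q3", "Simon"),
]

if __name__ == "__main__":
    approved, review = triage(SAMPLE, KNOWN_ANSWERS)
    print(f"auto-approved: {len(approved)}, hand check: {len(review)}")
```

With a strict comparison, an answer like "peter " lands in the hand-check pile even though it is right, which is presumably why hand checking raised the approval rate well past what the automatic pass could approve on its own.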

The blog post contains some fascinating statements about the process. Here are some of my favorites:

  • “Computers can’t do everything.”
  • “Mechanical Turk presents a new and helpful way to spread the work inexpensively among many people.”
  • “We got a database for about $75 that, as far as we can tell, no one has created before for the Bible.”
  • “We estimate that Mechanical Turk cut our costs by about 60% for a comparable-quality result.”
  • “Workers performed these HITs almost as fast as they were uploaded.”

Hard to argue with any of these; we’ve been talking about the use of Mechanical Turk for quality control, metadata collection, and text annotation for a while now.

— Jeff;

Modified 2/9/2021 – In an effort to ensure a great experience, expired links in this post have been updated or removed from the original post.
Jeff Barr

Jeff Barr is Chief Evangelist for AWS. He started this blog in 2004 and has been writing posts just about non-stop ever since.