Please introduce yourself and your company. I’m the CEO of Knewton, Inc., an online learning company. Knewton has used Mechanical Turk to dramatically accelerate our time to market and improve our business processes. It really is impressive how quickly problems are found, highlighted, and articulated. We’ve gotten amazing data back. Our first experience was testing the questions. The first thing we did with Mechanical Turk was actually just have people proofread and answer a question—a single question, a single HIT (Human Intelligence Task).
How have you incorporated Mechanical Turk as part of your business processes? We have tested various professionally designed logos. We tested all of them on MTurk and found one that tested best with various communities, with a sample of randomized people, people who were in our particular target market. We also used it to test out a number of taglines. We tested about 30 taglines, and found our final tagline.
This next one’s a very big thing. We basically used Mechanical Turk to compile a large database. And database compilation and maintenance is incredibly expensive and time-consuming. And in fact, it can cost hundreds of thousands of dollars to produce a big database, or it can cost $50,000 to buy a database. People are in the database-selling business, and they charge a lot of money for it. We found on MTurk, you can produce your own database very quickly, it’s higher quality, and you can update it for almost nothing.
We feel MTurk is going to change the way businesses procure data. You just MTurk for data on anything you need, for a few pennies a HIT. We wanted to build a database about colleges, so we just had students at the college collect the data for us and send it over through MTurk. Boom, instant database. So, I was once in a business that was developing a database for stores in New York City. We had a team that we paid about $10 an hour. They ran around and just knocked on every single door. That cost us maybe $100,000 and took a year’s time.
Tell me more about your core product and how you use MTurk to develop it? We produce educational content and we need to know exactly how good that content is, and exactly how that content works. We write practice questions and in some cases we license practice questions from third parties. And then we distribute them on MTurk to create our testing methods. This also helps us figure out which questions are broken and we can see if there is something wrong with that question. And this is all automated.
Secondly: What are the exact parameters of those questions? What is the difficulty level? Who got it right? Was there a particular wrong answer-choice that everybody fell for that was just a great trick choice? All of these questions make the product much better. It allows us to assemble those questions in a better format, and it allows us to dynamically generate the right kind of content for people who might need an extra-difficult question, for example.
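The per-question parameters described above (difficulty, who got it right, the most tempting wrong choice) can be illustrated with a small item-analysis sketch. This is not Knewton's actual pipeline; the function name `item_stats`, the answer-choice labels, and the sample data are assumptions made purely for illustration.

```python
from collections import Counter


def item_stats(responses, correct_choice):
    """Summarize one question from a list of submitted answer choices.

    `responses` is a list of choice labels (e.g. "A".."E"), one per worker.
    Returns None when there are no responses to analyze.
    """
    total = len(responses)
    if total == 0:
        return None

    n_correct = sum(1 for r in responses if r == correct_choice)

    # Count only the wrong answers to find the most tempting "trick" choice.
    wrong = Counter(r for r in responses if r != correct_choice)
    if wrong:
        top_distractor, top_count = wrong.most_common(1)[0]
    else:
        top_distractor, top_count = None, 0

    return {
        "p_correct": n_correct / total,            # share answering correctly
        "top_distractor": top_distractor,          # most-chosen wrong answer
        "top_distractor_share": top_count / total,
    }


# Hypothetical data: eight workers answered a question whose key is "A".
sample = ["A", "B", "B", "C", "B", "A", "B", "D"]
stats = item_stats(sample, "A")
```

A low `p_correct` combined with one dominant distractor might point to an effective trick choice, or to a broken question that deserves a manual look.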
How much have you spent on Mechanical Turk gathering the same amount of records? It cost less than a few thousand dollars, and it took just a few weeks. MTurk is going to be really revolutionary. Database systems are expensive and time-consuming. They make businesses run, especially 21st-century businesses that are about database marketing. With MTurk you have the ability to flip a switch, once you have an MTurk system in place, to procure information from resources around the world.
Why would you use Mechanical Turk, versus something like Dow Jones or some of the business-leads or database services offered out there—Hoovers, for example? Well, there are two very good reasons. One is: It’s much cheaper. Those businesses are very expensive. And two is: I’ve been involved with database businesses before, and the dirty secret is that most databases aren’t very good. It’s really hard to build databases—I should say it’s really difficult to keep them current. It’s really, really hard to keep them current. And the information gets stale very quickly.
That’s terrifying for the buyer of the database because you don’t know how good the database is. The whole reason you’re buying it is because you don’t want to build it yourself. You can’t check its quality. Well, with MTurk, you know the quality. It’s live, and it’s basically real-time.
We built our entire website off of Mechanical Turk. And I expect, ultimately, all websites, whether they’re startups or Fortune 500 companies, are going to build their websites with Mechanical Turk.
How have you incorporated Mechanical Turk as part of building your website? We tested all of our copy. Every word that went on the site was tested—literally every word. We tested our models. Once we chose the models we liked, the ones who best represented our product according to the marketplace, we tested the specific images. We showed hundreds of images, and we asked the Turkers to tell us exactly which images they preferred. We tested every image on the site, not just the images of models. Every single thing that goes onto the site gets tested via Mechanical Turk.
And it gets tested a few different ways. We test stuff for proofreading and for usability and things like that. So I’ll talk about some of these independently.
First off, proofreading. Right now, sites go up, and there are lots of errata on them and broken links, and it takes a few months to work all the stuff out of the system. Well, it didn’t take us any time, because we use MTurk for everything. There were essentially no broken links on the site after we asked the MTurk workers to scour it. We’ve discovered two. That’s it. That’s all we’ve discovered so far.
The Turkers also do quality assurance for us. They checked all of our website links, and they checked all of our user interactions. They checked the UI. They would just give us kind of qualitative and quantitative feedback like: “This link confused me. I went from here to here and I didn’t understand why that was happening,” or make suggestions like, “Have you thought about doing this?”
What are the best tasks to put in Mechanical Turk and what type of guidelines did you give the workforce? The workforce seems to be the most responsive when you make tasks objective. For best results, you must make the tasks black and white and the guidelines as clear as possible. We’ve gone outside of traditional uses and, through trial and error, we’ve been successful.
How do you use bonuses in your HITs? We use incentives by offering bonuses for finding errors. We put in bonuses for just about everything. Bonuses encourage people to look at more pages. They also encourage people to find little, little stuff. This is exactly what we wanted. We wanted them to find a stray period or a stray comma. We wanted them to find that stuff and so we were happy to offer incentives.
The interesting thing is, although we’re very diligent about putting bonuses in there to incentivize behavior, the Turkers seem to give you their best efforts anyway. We haven’t really had much trouble with them trying to beat the system.
We have also been very impressed by the long, written responses we get from Turkers. In some cases, we ask them to do something and then write up a little paragraph about it afterwards. And we often get these very long, detailed reports. The Turkers are very generous about giving their time.
Are the bonuses expensive? The most expensive single thing we did is we had people take an entire, full-length test, which takes three-plus hours. To do this, we gave a bonus for people who got above certain scores. And we had a tiered system: the higher your score, the more you get. We paid about $20 each on average for three hours of time. One Turker got a close-to-perfect score. We’ve had PhDs take that test; we’ve had high-school students, and everyone in between.
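The tiered bonus described above (the higher the score, the larger the bonus) could be expressed as a simple lookup. The interview does not give the actual thresholds or amounts, so the percentage cut-offs and dollar values below are hypothetical placeholders, as is the function name `bonus_for_score`.

```python
# Hypothetical tiers: (minimum score in percent, bonus in dollars).
# The real thresholds and amounts are not stated in the interview.
BONUS_TIERS = [(90, 15.00), (80, 10.00), (70, 5.00)]


def bonus_for_score(score, tiers=BONUS_TIERS):
    """Return the bonus for the highest tier whose minimum the score meets."""
    # Check tiers from the highest threshold down so the best match wins.
    for min_score, bonus in sorted(tiers, reverse=True):
        if score >= min_score:
            return bonus
    return 0.0  # below every tier: base payment only, no bonus
```

Keeping the tiers in a plain list makes it easy to adjust the amounts between batches of HITs without touching the lookup logic.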
Have you found that when you interact and humanize the workforce, you get better results? I think one of the areas where we do get that giving-the-best aspect is when we ask Turkers for their personal opinions and their impressions of our site and services. When we’re talking about the amount of feedback that we’ve gotten, with those requests that aren’t just sort of task-oriented, it seems like the Turk community likes it and enjoys feeling like they are contributing to something bigger. We’ve had some Turkers really get behind the service and say, “Gosh, I really hope you guys succeed. And I’m going to tell my friends about it.” We’ve actually been able to turn some of the Turkers into evangelists, because they feel like they have some ownership, really, in the process. It’s a good way to get the buzz out.
What are you using Mechanical Turk for that you didn’t plan? One really big thing is pricing. Pricing strategy is one of the most complex things that any new business has to figure out. I used to be a venture capitalist, and there would be arguments about whether a particular company should double their price or make it free. It’s so difficult.
We used MTurk extensively to figure out pricing. And we asked questions a few different ways. Without giving any information we would ask questions like, “Here’s exactly what the product does… what do you think it should cost?” And the information we got back was fascinating. We would also say, “Here’s the product,” and then make the price intentionally cheap, or intentionally expensive, just to see how the Turkers would react. We found it often depended on how much interaction the Turkers had with the site. The more interaction they had, the more valuable they perceived the product to be, and so worthy of a higher price. This was good news for us; the opposite would have been really bad.
People who knew a lot about our industry had the price come in more or less where we thought we should price it. So it really validated what we felt the price should be. People who didn’t know anything about the industry, who weren’t really in the market for this kind of a product, we used them as a control, and they were all over the map, but their estimates tended to be low. This was not surprising because we have a high-priced product relative to a lot of Internet-based products.
How are you performance testing your website with MTurk? Our online videos are educational content. If you’re watching a YouTube video and there’s a hiccup or something, you don’t really care. But if you’re paying $1000 for a class, you care a lot. It has to be perfect. We had Turkers validate the system backwards and forwards—check our system with all kinds of connection speeds and from all kinds of different locations around the world. This was just to make sure that the product was working the way we thought it would work. And it was, in fact, working that way. But we did tweak some things. All of our beta testers were Turkers. The way most software companies right now do alpha testing or beta testing is they get a bunch of people inside the community, whatever the community is, if they’re a gaming company or whatever, to do it for them. Well, the problem with that is you get a lot of fans that way, who may not give you very good feedback. This is also not a randomized selection.
How did your business change when you began using Mechanical Turk? Mechanical Turk changed a lot of things about the way we do business, so it’s difficult to pick just one thing. One very big thing that it changed was time to market. It made everything much, much faster. Instead of a three-month beta where we’re testing things and finding bugs, we found all of our bugs inside of a week. Another big thing that it changed was the way we do market research, from testing every single aspect of the site, the images on the site, the models we use on the site, the content and popularity of various portions of the site, QA on the site, and the marketing aspects of it all. MTurk has been sort of the one place we’ve looked to answer all of our questions.
How did your product change when you began using Mechanical Turk? The biggest thing that MTurk does for the product is QA. MTurk can find a broken product very, very quickly. It can differentiate between the most popular and least popular products very quickly. We take that information and we develop and market our most popular products.
What have your cost savings been in using Mechanical Turk? We have spent less than $10,000 for about a month to six weeks of non-stop QA testing. There’s a fascinating in-sourcing aspect to Mechanical Turk. In the last ten years of product development, you typically outsourced QA to teams in India or Ukraine or Russia, and we tried a little of that. The results were pretty awful in terms of expense, quality, and time. All of these things are very bad for any company, and certainly a startup. When we started going on MTurk to QA, the quality and the speed just shot up, and the price collapsed. It was just fantastic. It is also important to note that managing the work with MTurk is less time-consuming than managing an off-shore QA team.
We had earmarked about $70,000 just for QA and instead we spent less than $10,000 for MTurk QA plus market research, product enhancements, HR improvements and more. So this was a cost savings of more than 85%, plus we’re getting the work much faster and it is much easier to manage.
There are things we could have never done, like testing with a full classroom. With Mechanical Turk, in less than an hour, we had 60 people in that room, all by putting up a HIT that maybe took half an hour to develop and get online. This is my original point: Mechanical Turk will help you do some things faster and better, but it will also help you do things that just weren’t possible before.
What advice do you have for businesses who are considering using Mechanical Turk? Well, the first piece of advice is you should definitely use it. There’s no business that can’t be using Mechanical Turk for something, because there’s no business that doesn’t have a marketing department, and it’s invaluable for marketing. It takes a little bit of thought to get the greatest value out of it.
How many hours of development did you have to put in to really make this work as seamlessly as you have it? We probably only had about 20 hours of additional development to handle getting the assignment ID and the HIT IDs and other things into our system, so we can track the Turker metrics. We’re eventually going to hire one full-time MTurk junior resource. It’s kind of a neat idea that we’re going to have one person dedicated just to MTurk and that the resource will pay for itself. It’s a no-brainer.
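As a rough illustration of what capturing the assignment ID and HIT ID can involve, here is a minimal sketch. It assumes the task is hosted on the requester's own site as an MTurk external question, in which case MTurk appends `assignmentId`, `hitId`, and `workerId` as query parameters to the task URL. The interview does not say which integration Knewton used, and the function name `extract_turk_ids` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Value MTurk passes as assignmentId while a worker is only previewing a HIT.
PREVIEW_SENTINEL = "ASSIGNMENT_ID_NOT_AVAILABLE"


def extract_turk_ids(url):
    """Pull the MTurk tracking IDs out of an incoming task URL."""
    query = parse_qs(urlparse(url).query)

    def first(name):
        # parse_qs returns a list per key; take the first value if present.
        values = query.get(name)
        return values[0] if values else None

    assignment_id = first("assignmentId")
    return {
        "assignment_id": assignment_id,
        "hit_id": first("hitId"),
        "worker_id": first("workerId"),
        # Treat a missing or sentinel assignment ID as a preview, not real work.
        "is_preview": assignment_id in (None, PREVIEW_SENTINEL),
    }


# Hypothetical URLs for illustration only.
accepted = extract_turk_ids(
    "https://example.com/task?assignmentId=A1&hitId=H1&workerId=W1"
)
preview = extract_turk_ids(
    "https://example.com/task?assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=H1"
)
```

Storing these IDs alongside each submission is one way to tie results back to individual workers and HITs for the kind of metrics tracking mentioned above.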
Thank you for your time and for sharing your experiences. We are excited about the way you’re using Mechanical Turk. For those reading this article, if you know anyone taking a standardized test in 2009, send them to Knewton