Tim Tryzbiak, VP of Operations at Channel Intelligence, tells us about leveraging Amazon Mechanical Turk in their workflow. Channel Intelligence helps manufacturers, retailers, and affiliates improve and understand their data in order to sell more products. They help shoppers find and buy the products they are looking for, both online and in local stores. They also provide data intelligence that helps identify the latest trends about consumer buying habits.
Why did you choose Mechanical Turk?
We were introduced to Mechanical Turk by one of our developers. About seven months ago, he pulled a bunch of us together and said, “Hey, I found this new tool and I think it can really improve what we’re doing…” Certainly this piqued our interest and we decided to take a closer look. One of the things that we’re constantly doing is looking at different ways to make our work more efficient. We’ve always tried to think outside of the box and leverage different ideas and technology. I think the tag-line you guys have, “artificial artificial intelligence”, really grabbed our attention. When we started looking into Turk, we really started to see the benefits. We thought, “Here’s a way we can be more efficient in our operational tasks while still maintaining quality.” From there, we decided to try our first HIT and see what kind of results we’d get.
What was your first HIT?
The first HIT that we decided to try was part of our data optimization process. One of our first steps in the optimization of data is categorization. In our world, we deal with about 100 million products every day and we probably have a product churn of about 50,000 to 100,000 a day. The first thing we do is we take these products and categorize them. This helps us put them in context so we can do data cleansing and mining. Our categorization process is mostly automated and it gets about 80 percent of the items in the right place, right away. The other 20 percent doesn’t get caught because the data tends to be poor or the scripts simply don’t exist yet. This means we need manual intervention to try to figure out what these products are.
The challenge for us is that when you don’t know what a product is, you don’t know if it’s the most important product in the world or the least. If you can categorize the product, you’ve at least got a hint to the answer. So, when we heard about Turk, we thought it might be able to help us with this problem. What we essentially did was take 73,000 items that we couldn’t categorize automatically and we put them out to the Turkers and asked them to categorize the products for us. Quite frankly, we were pretty surprised with the results. We weren’t quite sure what to expect. Here, you’ve got thousands of people around the world that are looking at the data and giving advice. We just weren’t sure what we were going to get. What we found was that the workers actually did a really good job. We estimated that it would have taken us 600 man hours to categorize the 73,000 products correctly. The Turkers ended up doing it in four days. It was nonstop. We kept logging in to see if the work was being done and the HITs kept getting worked. Needless to say, it was a pretty exciting moment for us.
Our next questions were, “How good are the results going to be? What kind of quality are we getting?” Like we have in the past, we looked to the concept of ‘consensus.’ When we built the HIT, we made sure to build it in a way that we could ask the same questions, multiple times. Fortunately, this was something that Turk supported. Basically, we took the 73,000 items and we asked the same question three times: “Where would you place it?”
What we started to do was look at the different levels of consensus. So, in the case where three different people gave us the same exact answer, we went, “Wow! Great! A high confidence in this.” Then, we started doing spot checks. Well, guess what? The answers were right. We were seeing that people were really categorizing these products in the right place.
In the case where we would get two out of three people agreeing, we’d go in and try to figure out what the difference was. In the end, once we evaluated all the results in our spot checking and validation, we found that about 88.3 percent of the final results were accurate. For us, this was a real win. Remember, this was also done in four days.
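The consensus check described above can be sketched in a few lines of Python. This is only an illustration, not Channel Intelligence's actual code: the function name `consensus_level`, the level labels, and the whitespace and case normalization are all assumptions made for the example.

```python
from collections import Counter


def consensus_level(answers):
    """Classify the answers collected for one item (hypothetical helper).

    Returns a (level, winning_answer) pair, where level is
    "unanimous", "majority", or "none".
    """
    # Assumption: answers differing only in case or surrounding
    # whitespace are treated as the same vote.
    votes = Counter(a.strip().lower() for a in answers)
    top_answer, top_count = votes.most_common(1)[0]
    if top_count == len(answers):
        return "unanimous", top_answer  # e.g. three out of three agree
    if top_count > len(answers) / 2:
        return "majority", top_answer   # e.g. two out of three agree
    return "none", None                 # no usable agreement


print(consensus_level(["Digital Cameras", "digital cameras", "Digital Cameras "]))
print(consensus_level(["Digital Cameras", "Camcorders", "Digital Cameras"]))
print(consensus_level(["Digital Cameras", "Camcorders", "Tripods"]))
```

In a setup like this, unanimous items would be spot-checked and majority items would be the ones someone looks at more closely, mirroring the two cases mentioned in the answer.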
What do you estimate your cost savings have been?
The cost of using Turkers was about 22 percent of what we figured we would have paid if we had done it manually in-house or outsourced it to a temp agency. We were excited that the work was done faster and cheaper and we still had what we felt was a pretty high quality level.
What were the most common misconceptions you had about Mechanical Turk?
Of course, we initially had a different expectation of the Amazon Turkers when we first heard about the service. We thought, “Ok, here’s a bunch of people that…” I guess I’ll be blunt. “…that we thought didn’t have jobs and that probably wouldn’t do that great. We just didn’t understand the kind of dedication or work ethics involved.” We thought, “Anyone can press the button and vote for an answer and we’ll probably just get a bunch of spam. Are they really going to care about the results?”
I think the thing that blew us away was simply the level of dedication by the Turkers and the quality of work that they came out with. To be honest with you, every time we put out a new HIT type, we’re surprised.
I’ll give you another example. We put out a HIT type where we asked the Turkers to tell us how many retailers were showing up on a particular product page. We figured it would be easier if we gave them a drop-down and let them choose from ‘one’, ‘two’, ‘three’, ‘four’, or ‘five plus’. Then we gave them a comment box.
Nearly every one of the Turkers, when they responded back, filled in the comment box with the exact number of retailers that were showing up. So, instead of just choosing ‘five plus’ and being done, they’d write in ‘27’ or ‘13’ or whatever the real number was. It hit us at that point, that these people really did care about what they were doing and they wanted to get the answer right. It was a pretty amazing reality check for us that these Turkers would actually be good workers.
How has Mechanical Turk changed the way you do business?
The way we’re doing business is changing, in the sense that we’re becoming more efficient. Our operational goals have always been the same: increase efficiency, increase quality, and reduce costs. When I look at what we’ve achieved using Turkers so far, we find that our throughput has increased about four-fold. So, what was taking us, let’s say four days, we can now do in one.
What I’ve been able to do now in operations is layer in more quality assurance and protection to make sure that we’re doing the right thing by our clients without affecting our existing service level agreements or contracts. We feel our level of quality has certainly gone up because of it. Also, what we’re able to handle has increased because we’ve integrated these Turk tasks directly into our systems and our processes. This, I think, will be a very key point for people that are looking to use Turk.
How have you automated the process of working with Mechanical Turk?
Integrating Mechanical Turk into our existing operational processes became very important to us. Before using Turk at all, our entire categorization work flow was automated. Data would come in and we’d automatically categorize a product or have it flow through to the tools where our staff did it. When we implemented that first categorization HIT, we did it on the side instead of integrating the HIT within the existing workflow. The process was pretty painful. We exported the uncategorized products manually and then we had to tweak the format of the export in order to get it onto Turk. Once we saw the Turkers were done, we had to manually pull the results off of Amazon. After that, we had to figure out how to tie the results back into our existing categorization workflow.
We started calling this the “side process.” In looking back, it was funny because even though we were getting more work done by the Turkers, we were actually becoming less efficient. Turk wasn’t so much the problem; it was more the fact that we didn’t really integrate Turk with our existing systems.
Our philosophy has changed since then and for our big Turk jobs, we look at this like any other technology that we develop. We commit to it, we integrate it, and we make sure that we put the appropriate quality checks around it. In the case of this categorization HIT, we’ve now integrated it directly into the workflow and there is no “side process.”
In this new integrated solution, when our system does not recognize or categorize a new product, it shoots it over to a queue that will automatically send that work out to the Turkers. Then, we have an agent running that gets the results, compares the work, and pays automatically based on the thresholds we’ve set. The results that pass our consensus and quality tests get injected directly into our optimization engine.
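As a rough sketch of what such an agent might do once results come back, the Python below compares the answers for each product, accepts the ones that meet an agreement threshold, and records which submissions to pay. Everything here is hypothetical: the names `review_item` and `run_agent`, the data shapes, the `required` threshold, and in particular the policy of paying only workers who matched the accepted answer are assumptions for illustration. The interview does not describe the actual rules, and no real Mechanical Turk API calls are shown.

```python
from collections import Counter


def review_item(assignments, required):
    """Compare one product's answers against an agreement threshold.

    assignments: list of (worker_id, answer) pairs for a single product.
    required: how many matching answers are needed to accept a result.
    """
    votes = Counter(answer for _, answer in assignments)
    answer, count = votes.most_common(1)[0]
    if count >= required:
        # Assumed policy: approve the workers who gave the accepted answer.
        approved = [worker for worker, given in assignments if given == answer]
        return answer, approved
    return None, []


def run_agent(results, required=2):
    """Route each product to the engine or to manual review."""
    engine = {}          # accepted categories, keyed by product id
    manual_review = []   # products without enough agreement
    payments = []        # (product_id, worker_id) pairs to approve
    for product_id, assignments in results.items():
        answer, approved = review_item(assignments, required)
        if answer is None:
            manual_review.append(product_id)
        else:
            engine[product_id] = answer
            payments.extend((product_id, worker) for worker in approved)
    return engine, manual_review, payments


results = {
    "p1": [("w1", "Cameras"), ("w2", "Cameras"), ("w3", "Tripods")],
    "p2": [("w1", "Phones"), ("w2", "Tablets"), ("w3", "Laptops")],
}
engine, manual_review, payments = run_agent(results, required=2)
print(engine, manual_review, payments)
```

Keeping the threshold as a parameter is one way to express rules such as "two out of three" or "three out of four" without changing the agent itself.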
By doing this, we’ve been able to cut out all of the side processes that were slowing us down. We think of Turk like it’s a technology and we’ve embraced it. In general, we’ve found that with this type of integration, we are seeing about a 70 percent reduction in the amount of time it takes for us to get these jobs done. Of course, this is a rough average across all the different tasks that we do. We’re also finding that we’re reducing costs by about 85 percent as well. This has allowed us to reallocate resources to other important tasks.
Absolutely exciting! I think again about what my goals are: increase efficiency, increase quality, and reduce costs, and I see Turk helping me do all of these. When I can increase the speed at which I do things and maintain quality, I can do more. Add in the 85 percent reduction in cost for these specific tasks and it’s pretty exciting. Everyone from the CEO down recognizes why we’re embracing Turk as a technology and what we’re trying to do.
What advice would you give to others who are considering Mechanical Turk?
If I were to give advice to people interested in using Mechanical Turk, I’d say that there are four key things to pay attention to.
One, you really have to respect and work with the community. When I say ‘work’ with the community, I don’t mean it in a negative way. These people are actually trying to help you. We spend about 15 hours a week with the Turk community and to be honest with you, I love it. I have no problem with it because we see the results.
The Turkers actually help by responding to us and saying, “Hey, this HIT that you’re doing… we would be able to do it better and faster if you made a change.” We’ve taken the advice and made the changes and we’ve seen the results. So, listen to the Turkers, talk to them, and when someone comes back and says, “I don’t really understand what you want me to do”… respond. You’ll find you get a lot of feedback.
Another example of working with the community is that we started sending out messages to the Turkers about how well they’re doing with some of our tasks. I was just reading a response from one of the Turkers where we sent them a message saying, “Thanks for doing these HITs. You’ve got an 87 percent accuracy rating and you’re doing good.” He wrote back and said, “Thanks for the feedback. This is great! It lets me know how I’m doing.” Respecting the community and paying attention to them is key to making this work.
The second thing that I’d say is that you need to leverage consensus. As we try to manage quality, we leverage the fact that we can ask the same question multiple times rather cheaply. When three or four different people around the world give you the same answer back, you’ve got a pretty high confidence that the answer is right. Because we measure that confidence level, we’re able to tie results directly into our system.
The ability to ask multiple people and compare answers is, at least in our world, a huge benefit. Using rules like ‘three-out-of-four or 10-out-of-10 answers must agree’ allows us to control quality at different levels.
The third thing that we’ve already talked about is integration. If I were to be blunt about it, what I would say is that after someone has done their due diligence and they feel they want to engage Turkers, commit. You can decide not to fully integrate and keep something like Turk as a ‘side process,’ but you won’t see the full benefit like what we’ve seen. If you’re going to embrace technology, then build it into your systems. I think we’ve done it in a way that we’re able to use different types of intelligence: automated and human. Amazon Turk obviously is key to this.
The fourth and last thing is one of the lessons that we sort of keep learning. The Turkers seem to do really well when you create a small, very objective task. What we find is that when we give the Turkers tasks such as, “Click on this URL and tell us how many retailers are listed,” we get the best results. It’s very objective and very finite. When we say, “Here’s a big, broad product description and we want you to find the color of the product,” Turkers do really well.
Where we’ve had less success is when we ask them to, “Go through the product description and find all the attributes you can.” We end up getting so many different results (some good and some bad) that it’s harder for us to manage them. So, the way we changed this particular job was to give the Turkers a specific list of attributes and ask them to find the corresponding values. We found that the quality of the results increased probably three- or four-fold.
We try to make these jobs as objective as possible or break them into smaller chunks. I think one of the tricks that we do that works quite well is to create HITs that feed into each other. For example, let’s say we’re trying to harvest attributes. The first job of the sequence is to ask which attributes are found on the web pages we provide links to. For digital cameras, we may get back results like: color, mega-pixels, digital zoom, etc. This gives us a foundation to build off of.
Next, we ask the Turkers, “For this digital camera, tell us what the color is.” So, by breaking the overall task into multiple, smaller, more objective tasks, we get the same work done with a higher success rate. It may take an extra pass or two, but the net efficiency is still better than us trying to do it in-house or using typical outsourcing.
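The idea of HITs that feed into each other can be illustrated with a short Python sketch. It assumes, purely for the example, that the first pass returns one list of attribute names per worker, that an attribute is kept when at least `min_votes` workers mention it, and that the second pass asks one small question per kept attribute. The function names and the threshold are hypothetical.

```python
from collections import Counter


def common_attributes(stage_one_answers, min_votes=2):
    """Keep attributes named by at least min_votes workers in the first pass."""
    votes = Counter()
    for attributes in stage_one_answers:
        # A set gives each worker at most one vote per attribute.
        votes.update({name.strip().lower() for name in attributes})
    return sorted(name for name, count in votes.items() if count >= min_votes)


def stage_two_questions(product_name, attributes):
    """Turn each kept attribute into one small, objective question."""
    return [
        f"For this {product_name}, tell us what the {name} is."
        for name in attributes
    ]


stage_one = [
    ["Color", "Mega-pixels", "Digital zoom"],
    ["color", "digital zoom"],
    ["Color", "Weight"],
]
kept = common_attributes(stage_one)
for question in stage_two_questions("digital camera", kept):
    print(question)
```

Each second-pass question is narrow enough that the answers can be compared directly, which is what makes a consensus check practical on the follow-up HITs.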
What departments at Channel Intelligence are using Mechanical Turk?
We’re leveraging Turk in the operations group within Channel Intelligence. Our responsibility is to ensure we both run and protect our systems in order to better serve our clients. Finding efficient ways to run our business without sacrificing quality is absolutely key.
So, we leverage Turk, both to manage and maintain quality, as well as drive efficiency. We’ve talked a lot about the categorization piece, but we’re continuing to open up new capabilities. Using Turk to help with things like attribute harvesting allows us to use the best of both artificial and human intelligence. The results are better than any single solution could provide.
We definitely consider Mechanical Turk an asset that we use to grow our business. The Turkers play a large part in making sure we’re first in class in the services we provide.
We’ve seen great results with Mechanical Turk. We’re very happy.
Learn more about CI at channelintelligence.com or for questions about CI’s Mechanical Turk implementation, contact firstname.lastname@example.org.