Crowdsourcing: Challenges and Opportunities Guoliang Li Tsinghua University

Post on 11-Jan-2020


Page 1: Crowdsourcing: Challenges and Opportunitiesecheng/di/papers/Crowdsourcing... · Crowdsourcing Definition Coordinating a crowd (a large group of people on the web) to do micro-work

Crowdsourcing: Challenges and Opportunities

Guoliang Li, Tsinghua University


Tutorial Objectives

What is crowdsourcing?
How and when to use crowdsourcing?
How to do experiments for crowdsourcing?
What are research challenges of crowdsourcing?

Crowdsourcing @ HotDB2012 (2)


Tutorial Outline

Introduction Applications Platforms Challenges Opportunities


What is Crowdsourcing?

Outsourcing (外包)
A known agent (an employee)

Crowdsourcing (众包)
An undefined, generally large group of people via a group call
The application of open source principles to fields outside of software

Most successful story: Wikipedia


[Figure: $ / Customer plotted against # of Customers, with Large Businesses marked]


Crowdsourcing Definition

Coordinating a crowd (a large group of people on the web) to do micro-work (small contributions) that solves problems (that software or one user can’t do)

A collection of mechanisms and associated methodologies for scaling and directing crowd activities to achieve goals


volunteer, social, fun

Evolving & broadly defined


Example - Captcha

Captcha: 200M every day
ReCaptcha: 750M to date


Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham and Manuel Blum. reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science, 321: 1465-1468, 2008


NLP Example – Machine Translation

Machine Translation Problem:

Manual evaluation on translation quality is slow and expensive

Crowdsourcing:
Low cost of non-experts: $0.10 to translate a sentence
High agreement and equivalent quality between non-experts and experts
Complex tasks like human-mediated translation edit rate

C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009.

B. Bederson et al. Translation by Iterative Collaboration between Monolingual Users, GI 2010.


IR Example – Image Search


Tingxin Yan, Vikas Kumar, Deepak Ganesan: CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. MobiSys 2010:77-90


IR Example - Relevance and ads


Omar Alonso, Daniel E. Rose, Benjamin Stewart: Crowdsourcing for relevance evaluation. SIGIR Forum (SIGIR) 42(2):9-15 (2008)


CV Example - Painting Similarity


How similar is the artistic style in the paintings above?
Very similar / Similar / Somewhat dissimilar / Very dissimilar

Human and Machine Detection of Stylistic Similarity in Art. Adriana Kovashka and Matthew Lease. CrowdConf 2010


Audio Example

Mobile service that aids blind users with “visual questions” in near-realtime
An iPhone application


Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, Tom Yeh. VizWiz: nearly real-time answers to visual questions. UIST 2010 (Best Paper Award)

http://www.cs.rochester.edu/u/jbigham/vizwiz/video/


DB Example - CrowdDB

Use crowd to answer DB queries
Where to use crowd?
How to use crowd?
How to support SQL?
How to devise a system?
Quality?


Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, Reynold Xin: CrowdDB: answering queries with crowdsourcing. SIGMOD 2011:61-72


Crowdsourcing Overview

Requester: people submit tasks
Worker: people work on tasks
Platforms: task management

Submit tasks ➠ Publish tasks ➠ Find interesting tasks ➠ Collect answers ➠ Return answers


Crowdsourcing vs Human Computation

Human Computation
Design a solution using both automated computers and human computers
Maybe a closed set of workers
Crowdsourcing greatly facilitates human computation

Social computing: social behavior

Collective intelligence: may not be human


Alexander J. Quinn, Benjamin B. Bederson: Human computation: a survey and taxonomy of a growing field. CHI 2011:1403-1412


A Growing Field - Tutorials

WWW 2011: Managing Crowdsourced Human Computation

VLDB 2011: Crowdsourcing Applications and Platforms

SIGIR 2011: Crowdsourcing for Information Retrieval: Principles, Methods and Applications

AAAI 2011: Human Computation: Core Research Questions and State of the Art

WSDM 2011: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You

CVPR 2010: Mechanical Turk for Computer Vision

ECIR 2010: Crowdsourcing for Relevance Evaluation

HCIC 2011: Quality Crowdsourcing for Human Computer Interaction Research

CrowdConf 2011: Crowdsourcing for Fun and Profit


A Growing Field - Workshops

KDD 2009: 1st Human Computation Workshop - HCOMP 2009

KDD 2010 : 2nd Human Computation Workshop - HCOMP 2010

AAAI 2011 : 3rd Human Computation Workshop - HCOMP 2011

AAAI 2012 : 4th Human Computation Workshop - HCOMP 2012

SIGIR 2010 : Crowdsourcing for Search Evaluation

SIGIR 2011: Crowdsourcing for Information Retrieval

CVPR 2010 : Advancing Computer Vision with Humans in the Loop

NAACL 2010: Creating Speech and Language Data with Amazon’s Mechanical Turk

NIPS 2010 : Computational Social Science and Wisdom of the Crowds

Ubicomp 2010: Workshop on Ubiquitous Crowdsourcing

WSDM 2010: Crowdsourcing for Search and Data Mining Workshop

ICWE 2010 : Enterprise Crowdsourcing Workshop

CHI 2011: Workshop on Crowdsourcing and Human Computation

AMTA 2010: Collaborative Translation Technology, Crowdsourcing and the Translator

EC 2011: Workshop on Social Computing and User Generated Content


Many Related Areas

Human–Computer Interaction (HCI) Artificial Intelligence (AI) Machine Learning (ML) Information Retrieval (IR) Crowd Management (DB) Social Science Theory Statistics


Landscape


Platforms

Crowdsourcing

Are there any platforms?


Amazon Mechanical Turk (Mturk) - www.mturk.com


highly-available, cheap, programmable, a prototyping platform for crowd computing


Tasks

What are tasks?


Why Micro-tasks

Cheap, easy, and fast
Ready-to-use infrastructure: payments, workforce, interface widgets
Allow early, iterative, frequent trials: test new ideas
Many successful examples: image search, reCaptcha


Human Intelligence Tasks (HITs)

Human Intelligence Tasks are micro tasks
Requesters create HITs through the web services API or the dashboard, assess results, and pay per HIT satisfactorily completed
Workers (sometimes called “Turkers”) log in, choose HITs, and perform them
Currently >200,000 workers from 100 countries
Millions of HITs completed


Requester & Worker


Requester: Build HIT ➠ Test HIT ➠ Post HIT ➠ Reject or approve
Worker: Search for HITs ➠ Accept HIT ➠ Do work ➠ Submit HIT


The Requester

Sign up with your Amazon account
Amazon payments
Purchase prepaid HITs
There is no minimum or up-front fee
AMT collects a 10% commission
The minimum commission charge is $0.005 per HIT
Approve/reject answers
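
The fee rule on this slide can be worked through in a few lines. This is a minimal sketch that assumes only the figures quoted here (a 10% commission with a $0.005 minimum); the function name `hit_cost`, the use of `Decimal`, and the choice to apply the minimum once per paid assignment are illustrative assumptions, and current platform fees may differ.

```python
from decimal import Decimal

COMMISSION_RATE = Decimal("0.10")   # 10% commission, as quoted on the slide
MIN_COMMISSION = Decimal("0.005")   # minimum commission charge, as quoted on the slide

def hit_cost(reward, assignments=1):
    """Requester cost in dollars for a HIT paying `reward` per assignment.

    Assumption: the minimum commission applies to each paid assignment.
    """
    reward = Decimal(str(reward))
    fee = max(reward * COMMISSION_RATE, MIN_COMMISSION)
    return (reward + fee) * assignments

one_cent_hit = hit_cost("0.01")        # the minimum fee dominates for tiny rewards
five_workers = hit_cost("0.10", 5)     # the 10% fee applies for larger rewards
```

Under these assumptions a $0.01 HIT costs the requester $0.015, so the minimum fee is a 50% overhead on the cheapest tasks.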


Dashboard

Three tabs: Design, Publish, Manage
Design: HIT template
Publish: make work available
Manage: monitor progress


Amazon Web Services API

Rich set of services
Command line tools
More flexibility than dashboard


CreateHIT (Requirements, Pay rate, Description) – returns HIT Id and HIT Type Id

SubmitAssignment (AssignmentId) – notifies Amazon that this assignment has been completed

ApproveAssignment (AssignmentId) – requester accepts the assignment and money is transferred; see also RejectAssignment

GrantBonus (WorkerId, Amount, Message) – gives the worker the specified bonus and sends a message; should have a failsafe

NotifyWorkers (list of WorkerIds, Message) – e-mails the message to the workers
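
To make the order of these operations concrete, here is a small in-memory sketch of the requester and worker lifecycle. It is not the Amazon API: the class `ToyMarketplace`, its method names, and the bonus cap are hypothetical stand-ins that only mirror the operation names listed above, and amounts are kept in integer cents.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Assignment:
    assignment_id: str
    hit_id: str
    worker_id: str
    answer: str
    status: str = "Submitted"   # Submitted -> Approved or Rejected

@dataclass
class ToyMarketplace:
    """Hypothetical in-memory stand-in for a platform (not the real MTurk API)."""
    hits: dict = field(default_factory=dict)
    assignments: dict = field(default_factory=dict)
    earnings: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count, repr=False)

    def create_hit(self, description, reward_cents):
        hit_id = f"HIT{next(self._ids)}"
        self.hits[hit_id] = {"description": description, "reward_cents": reward_cents}
        return hit_id

    def submit_assignment(self, hit_id, worker_id, answer):
        assignment_id = f"ASG{next(self._ids)}"
        self.assignments[assignment_id] = Assignment(assignment_id, hit_id, worker_id, answer)
        return assignment_id

    def _pay(self, worker_id, cents):
        self.earnings[worker_id] = self.earnings.get(worker_id, 0) + cents

    def _review(self, assignment_id, approve):
        a = self.assignments[assignment_id]
        if a.status != "Submitted":
            raise ValueError("assignment already reviewed")
        a.status = "Approved" if approve else "Rejected"
        if approve:   # money moves only on approval
            self._pay(a.worker_id, self.hits[a.hit_id]["reward_cents"])

    def approve_assignment(self, assignment_id):
        self._review(assignment_id, approve=True)

    def reject_assignment(self, assignment_id):
        self._review(assignment_id, approve=False)

    def grant_bonus(self, worker_id, amount_cents, cap_cents=100):
        # Failsafe: refuse bonuses outside a sane range (the cap is an assumption).
        if not 0 < amount_cents <= cap_cents:
            raise ValueError("bonus outside allowed range")
        self._pay(worker_id, amount_cents)

market = ToyMarketplace()
hit = market.create_hit("Is this page relevant to the query?", reward_cents=2)
good = market.submit_assignment(hit, "W1", "yes")
bad = market.submit_assignment(hit, "W2", "")
market.approve_assignment(good)
market.reject_assignment(bad)
market.grant_bonus("W1", 2)
```

The bonus cap illustrates the "failsafe" note: a typo in the amount raises an error instead of paying out.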


Dashboard vs API

Dashboard
Easy to prototype
Set up and launch an experiment in a few minutes

API
Ability to integrate AMT as part of a system
Ideal if you want to run experiments regularly
Schedule tasks


The Worker

Sign up with your Amazon account
Tabs
Account: work approved/rejected
HIT: browse and search for work
Qualifications: browse and search for qualifications


Why Do Work on MTurk?

Money ($$$)
Fun (or avoid boredom)
Socialize
Earn acclaim/prestige
Altruism
Learn something new (e.g. English)
Unintended by-product (e.g. reCAPTCHA)
Create self-serving resource (e.g. Wikipedia)

Multiple incentives are typically at work in parallel


Who Are My Workers?

2008-2009 studies found workers less global and diverse than previously thought: 47% US, 34% India, 19% others

Female
Educated
Bored
Money is secondary


Survey on Workers

“Mturk money is always necessary to make ends meet.”
U.S. 5%, India 13%

“Mturk money is irrelevant.”
U.S. 12%, India 10%

“Mturk is a fruitful way to spend free time and get some cash.”
U.S. 69%, India 59%


Mturk Advantages
More participants
More diverse participants
High speed
Low cost
Speed of experimentation
Diversity

Crowdsourcing != MTurk

Disadvantages/Limitations
Lower quality feedback
Less interaction
Greater need for quality control
Less focused user groups
No control of users’ environment
Not designed for user studies
Spam
Uncertainty about user demographics, expertise
Lots of problems and missing features


Pay-based Marketplaces / Vendors

Amazon Mechanical Turk (since 2005, www.mturk.com)
Crowdflower (since 2007, www.crowdflower.com)
CloudCrowd (www.cloudcrowd.com/)
DoMyStuff (www.domystuff.com/)
Livework (https://www.livework.com/)
Clickworker (www.clickworker.com/)
SmartSheet (www.smartsheet.com/crowdsourcing)
uTest (www.utest.com/)
Elance (www.elance.com/)
oDesk (www.odesk.com/)
vWorker (www.vworker.com/)


Microtask Aggregators


Samasource.org


How and when to use crowdsourcing?


When to use crowdsourcing

Computers cannot do the task
A single person cannot do the task
The work can be split into small tasks


How to Do Experiments in Crowdsourcing

Experimental design
Choose a crowdsourcing platform
Decompose your tasks into micro tasks
Publish your tasks and wait for answers
Aggregate workers’ answers


Tutorial Outline

Introduction Applications Platforms Challenges Opportunities


Research Challenges in Crowdsourcing

Task Management: task assignment, payment, discovery

Human–Computer Interaction: payment / incentives, interface and interaction design, communication, reputation, recruitment, retention

Quality Control / Data Quality: trust, reliability, spam detection, consensus labeling

Human-Processing Unit (HPU) and CPU: how to combine

Scalability: large-scale data


Task Management

How to decompose a complex task?
How to find tasks for workers?
How much should a HIT pay?


Supporting Complex Tasks

Mturk works only for small tasks
How to support complex tasks?
Task decomposition: large tasks are divided into small problems
Job distributed among multiple workers
Collect all answers and combine them
Verifying performance of heterogeneous CPUs and HPUs


CrowdForge - MapReduce framework for crowds

Aniket Kittur, Boris Smus, Robert Kraut: CrowdForge : crowdsourcing complex work. CHI Extended Abstracts 2011:1801-1806
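
The MapReduce analogy can be illustrated with a toy pipeline. This is a sketch under stated assumptions rather than CrowdForge itself: the function `run_crowd_mapreduce` and the three "worker" functions are hypothetical, and human workers are simulated by ordinary Python functions.

```python
def run_crowd_mapreduce(task, partition, map_worker, reduce_worker):
    """Split a complex task, farm out the pieces, then merge the answers."""
    subtasks = partition(task)                      # partition: a worker splits the job
    partials = [map_worker(s) for s in subtasks]    # map: one micro-task per piece
    return reduce_worker(partials)                  # reduce: a worker merges the results

# Simulated workers (in a real deployment each call would be a posted HIT).
def outline_article(topic):
    return [f"{topic}: history", f"{topic}: attractions"]

def write_paragraph(heading):
    return f"[paragraph about {heading}]"

def merge_paragraphs(paragraphs):
    return "\n".join(paragraphs)

article = run_crowd_mapreduce(
    "New York City", outline_article, write_paragraph, merge_paragraphs
)
```

Each stage is itself a small task, which is what lets a marketplace built for micro-tasks carry out larger work.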


How to Discover Tasks?

Task discovery is very important. Heavy tailed distribution of completion times.


Task Assignment

Push methods: system ➠ workers
The system takes complete control over who is assigned which task.
Worker expertise recording for task assignment (employer/task finds worker)

Pull methods: workers ➠ system
The system merely sets up the environment to allow workers to assign themselves (or each other) tasks.
Task organization for task discovery (worker finds employer/task)


Task Recommendation

Content-based recommendation: find similarities between worker profile and task characteristics.

Collaborative filtering: make use of preference information about tasks (e.g., ratings) to infer similarities between workers.

Hybrid: a mix of content-based and collaborative filtering methods.
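
As one possible reading of the content-based approach, the sketch below scores each task by the Jaccard similarity between a worker's skill keywords and the task's keywords. The keyword sets, task names, and the choice of Jaccard similarity are all illustrative assumptions, not something prescribed by the tutorial.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_tasks(worker_skills, tasks, top_k=2):
    """Return up to top_k task names, most similar to the worker profile first."""
    scored = [(jaccard(worker_skills, keywords), name) for name, keywords in tasks.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # best score first, ties by name
    return [name for score, name in scored[:top_k] if score > 0]

tasks = {
    "label-paintings": {"image", "art", "rating"},
    "translate-sentence": {"translation", "english", "text"},
    "judge-relevance": {"search", "text", "rating"},
}
worker = {"image", "rating", "art", "photography"}
recommended = recommend_tasks(worker, tasks)
```

A collaborative-filtering variant would replace the keyword sets with workers' past ratings of tasks and compare workers with each other instead.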


Task Payments

How much is a HIT? A delicate balance:
Too little: no interest
Too much: attracts spammers
Paying a lot is a counter-incentive
Money does not improve quality but (generally) increases participation
Payment based on user effort: bonus
Example: $0.04 (2 cents to answer a yes/no question, plus 2 cents if you provide optional feedback)

Winter A. Mason, Duncan J. Watts: Financial incentives and the "performance of crowds". SIGKDD Explorations (SIGKDD) 11(2):100-108 (2009)


Optimization Goal

3 main goals for a task to be done:
Minimize cost (cheap)
Minimize completion time (fast)
Maximize quality (good)

Many optimization problems arise in trading off the three goals


Human-Assisted Graph Search/Classification

Given a DAG G containing unknown target nodes
Find target nodes by asking humans search queries at nodes in G: “Is there a target node reachable from the current node?”

Applications: classification, workflow debugging, interactive search


Aditya G. Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis, Jennifer Widom: Human-assisted graph search: it's okay to ask questions. PVLDB 4(5):267-278 (2011)
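
The question-asking idea can be sketched with a naive serial strategy: start at the root and repeatedly move to a child from which a target is reachable. This is not the optimized algorithm from the cited paper. The sketch assumes a single target that is reachable from the root, treats a node as reachable from itself, and simulates the human answer with a hypothetical oracle built from ground truth.

```python
def reachable(dag, node):
    """All nodes reachable from `node` (including itself) in an adjacency-list DAG."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))
    return seen

def make_oracle(dag, target):
    """Simulated human: answers 'is a target reachable from this node?' and counts questions."""
    def ask(node):
        ask.count += 1
        return target in reachable(dag, node)
    ask.count = 0
    return ask

def find_target(dag, root, ask):
    """Descend from the root, asking one question per candidate child."""
    node = root
    while True:
        child = next((c for c in dag.get(node, []) if ask(c)), None)
        if child is None:       # no child leads to a target, so this node is it
            return node
        node = child

taxonomy = {"vehicle": ["car", "bike"], "car": ["sedan", "suv"],
            "bike": [], "sedan": [], "suv": []}
ask = make_oracle(taxonomy, target="suv")
found = find_target(taxonomy, "vehicle", ask)
```

Here three questions locate the target; asking serially like this is exactly the cost and latency that motivates choosing a limited set of questions to ask in parallel.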


CrowdGraphSearch - Application

Classify an image


CrowdGraphSearch - Optimization

Do not ask about all nodes, and do not ask serially: too expensive or too slow

Given a limit of k questions, find the best nodes to ask in parallel
Minimize the size
To classify into a huge taxonomy, first ask k questions
Is the superset small enough?


CrowdSearch: Using crowd to improve image search

How to ensure a high enough accuracy, say over 95%?


Tingxin Yan, Vikas Kumar, Deepak Ganesan: CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. MobiSys 2010:77-90
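
One simple way to reason about the 95% question is to ask how many independent validators a majority vote needs. The sketch below is not the CrowdSearch model; it assumes every worker is independently correct with the same probability `p`, and the function names are illustrative.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a strict majority of n independent workers is correct."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def workers_needed(p, target=0.95, max_n=99):
    """Smallest odd number of workers whose majority reaches the target, else None."""
    for n in range(1, max_n + 1, 2):
        if majority_accuracy(p, n) >= target:
            return n
    return None

needed = workers_needed(0.8)   # workers who are each right 80% of the time
```

Under these assumptions, workers who are each right 80% of the time need a panel of 7 to pass 95%, which shows why accuracy trades directly against money and delay.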


CrowdSearch - Optimization

To minimize money: increase delay

To minimize delay: increase money

Goal: return one validated image before the deadline, while minimizing the money


Quality - Example

Get people to look at sites and classify them as:
G (general audience)
PG (parental guidance)
R (restricted)
X (porn)


Quality Control

Quality of workers’ answers is an extremely important part of the experiment

Approach it as “overall” quality, not just for workers
Bi-directional channel
You may think the worker is doing a bad job.
The same worker may think you are a lousy requester.


Methods for Measuring Agreement

What to look for: agreement, reliability, validity

Beforehand: qualification test; screening, selection, recruiting, training

During: assess labels as workers produce them; reward, penalize, weight

After: accuracy metrics; filter, weight


Qualification Tests: Pros and Cons

Qualification test: create questions on topics so the user gets familiar before starting assessments

Advantages
Great tool for controlling quality
Adjust passing grade

Disadvantages
Hard to verify subjective tasks like judging relevance
Slows down the experiment, difficult to “test” relevance
Extra cost to design and implement the test
Try creating task-related questions to get the worker familiar with the task before starting the task in earnest

No guarantees: still not a guarantee of good outcome


Quality Control

Majority vote
2 bad, 3 good ➠ good
3 bad, 2 good ➠ bad

Weighted majority vote
Identify workers that always disagree with the majority
Lower the weight of such workers
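
Both aggregation rules fit in a short sketch. The weighting scheme below (each worker's weight is the fraction of items on which they agreed with the plain majority) is one simple assumption among many possible ones, and the worker and site names are made up for illustration.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Most common label; on a tie, the label that appeared first wins."""
    return Counter(labels).most_common(1)[0][0]

def worker_weights(votes):
    """Weight each worker by how often they agree with the plain majority.

    `votes` maps item -> {worker: label}.
    """
    agree, total = defaultdict(int), defaultdict(int)
    for answers in votes.values():
        winner = majority_vote(list(answers.values()))
        for worker, label in answers.items():
            total[worker] += 1
            agree[worker] += (label == winner)
    return {w: agree[w] / total[w] for w in total}

def weighted_majority_vote(answers, weights):
    """Pick the label with the largest total weight; unknown workers count as 0."""
    score = defaultdict(float)
    for worker, label in answers.items():
        score[label] += weights.get(worker, 0.0)
    return max(score, key=score.get)

votes = {
    "site1": {"w1": "good", "w2": "good", "w3": "bad"},
    "site2": {"w1": "bad", "w2": "bad", "w3": "good"},
    "site3": {"w1": "good", "w2": "good", "w3": "bad"},
}
weights = worker_weights(votes)   # w3 always disagrees, so its weight drops to 0
decision = weighted_majority_vote({"w1": "good", "w3": "bad"}, weights)
```

With only w1 and w3 voting, a plain majority would tie, while the weighted vote follows the worker with the better track record.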


Dealing with Bad Workers

Pay for “bad” work instead of rejecting it?
Pro: preserve reputation, admit if poor design at fault
Con: promote fraud, undermine approval rating system

Use bonus as incentive
Pay the minimum $0.01 and $0.01 for bonus
Better than rejecting a $0.02 task

Worker blocking: spammer “caught”, block from future tasks
May be easier to always pay, then block as needed


Emails after Rejection

WORKER: this is not fair , you made me work for 10 cents and i lost my 30 minutes of time ,power and lot more and gave me 2 rejections atleast you may keep it pending. please show some respect to turkers

WORKER: I understood the problems. At that time my kid was crying and i went to look after. that's why i responded like that. I was very much worried about a hit being rejected. The real fact is that i haven't seen that instructions of 5 web page and started doing as i do the dolores labs hit, then someone called me and i went to attend that call. sorry for that and thanks for your kind concern.


Gold Sets / Honey Pots

Gold derived from:
Experts
Crowd using high quorum

Interject trap questions
Block users caught by a trap and invalidate their answers

Pros: often very effective, cost efficient

Cons: not always applicable, digging gold is hard
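
A gold-set check can be sketched as follows. The threshold of 0.7, the data layout, and the function names are illustrative assumptions; the labels reuse the G/PG/R/X site categories from the earlier quality example.

```python
def gold_accuracy(worker_answers, gold):
    """Fraction of gold (trap) questions the worker answered correctly, or None if none seen."""
    graded = [q for q in gold if q in worker_answers]
    if not graded:
        return None
    return sum(worker_answers[q] == gold[q] for q in graded) / len(graded)

def filter_by_gold(answers_by_worker, gold, min_accuracy=0.7):
    """Block workers who fail the gold questions and drop all of their answers."""
    kept, blocked = {}, []
    for worker, answers in answers_by_worker.items():
        acc = gold_accuracy(answers, gold)
        if acc is not None and acc < min_accuracy:
            blocked.append(worker)
        else:
            # Keep only the real (non-gold) answers for later aggregation.
            kept[worker] = {q: a for q, a in answers.items() if q not in gold}
    return kept, blocked

gold = {"g1": "G", "g2": "X"}
answers_by_worker = {
    "w1": {"g1": "G", "g2": "X", "q1": "PG"},
    "w2": {"g1": "X", "g2": "G", "q1": "G"},
}
kept, blocked = filter_by_gold(answers_by_worker, gold)
```

Workers who never saw a gold question are kept here by default, which is a design choice; a stricter policy would hold their answers until they have been graded.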


Quality Control

As a worker
I hate when instructions are not clear
I’m not a spammer – I just don’t get what you want
A good pay is ideal but not the only condition for engagement

As a requester
A task that would produce the right results and is appealing to workers
I want your honest answer for the task
I want qualified workers and I want the system to do some of that for me

Managing crowds and tasks is a daily activity and more difficult than managing computers


Human-Computer Interaction (HCI)

Getting input from users is important in HCI:
surveys
rapid prototyping
usability tests
cognitive walkthroughs
performance measures
quantitative ratings


HCI - The UI hurts the market!

Practitioners know that HITs on the 3rd page and after are not picked by workers.

Many such HITs are left to expire after months, never completed.

A badly designed task discovery interface hurts every participant in the market!

Better modeling of the market as a queuing system may reveal other such improvements.


HCI

Survey design
Questionnaire design
Right questions
Examples

UI design
Implementation

Generic tips
  Experiment should be self-contained.
  Keep it short and simple. Brief and concise.
  Be very clear with the relevance task.
  Engage with the worker. Avoid boring stuff.
  Always ask for feedback (open-ended question) in an input box.


Other Design Principles

Text alignment
Legibility
Reading level: complexity of words and sentences
Attractiveness (worker’s attention & enjoyment)
Multi-cultural / multi-lingual
Who is the audience (e.g. target worker community)
Special needs communities (e.g. simple color blindness)


System Design


DB & Crowd

How can crowd help databases?
  Fix broken data
  Add missing data
  Subjective comparison

How can DB help crowd apps?
  Lazy data acquisition
  Game the workers market
  Semi-automatically create user interfaces
  Manage the data sourced from the crowd
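
One possible reading of "add missing data" combined with "lazy data acquisition" is that a missing value is requested from the crowd only when a query actually touches it. The sketch below illustrates that reading and nothing more: `select_with_crowd`, `simulated_crowd`, the table contents, and the placeholder phone numbers are all hypothetical.

```python
def select_with_crowd(rows, column, predicate, ask_crowd):
    """Return rows matching predicate, filling a missing `column` on demand.

    ask_crowd(row, column) is a hypothetical stand-in for posting a task
    and waiting for a worker's answer.
    """
    result = []
    for row in rows:
        if not predicate(row):
            continue  # rows the query does not need are never sent to the crowd
        if row.get(column) is None:
            row[column] = ask_crowd(row, column)  # acquire lazily and store it
        result.append(row)
    return result


# Made-up table with two missing phone numbers.
departments = [
    {"university": "Berkeley", "name": "EECS", "phone": None},
    {"university": "MIT", "name": "EECS", "phone": "555-0100"},
    {"university": "Stanford", "name": "CS", "phone": None},
]
asked = []  # records which rows triggered a crowd request


def simulated_crowd(row, column):
    asked.append(row["university"])
    return "555-0199"  # placeholder answer, not real data


eecs = select_with_crowd(
    departments, "phone", lambda r: r["name"] == "EECS", simulated_crowd
)
```

Only the Berkeley row triggers a crowd request. The Stanford row also lacks a phone number, but the query never selects it, so no money is spent on it.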


CrowdDB

Use crowd to answer DB queries
  Find missing data
  Make subjective comparison
  Recognize patterns

Main operations
  Join
  Sort
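
To make the idea of a crowd-powered sort concrete, here is a minimal sketch in which every pairwise comparison is answered by several workers and decided by majority vote. It is not CrowdDB's implementation: `crowd_sort`, `simulated_worker`, the hidden quality scores, and the choice of three workers per comparison are assumptions made for illustration.

```python
from collections import Counter
from functools import cmp_to_key


def majority(votes):
    """Return the most common vote."""
    return Counter(votes).most_common(1)[0][0]


def crowd_sort(items, ask_worker, workers=3):
    """Sort items with a comparator whose answers come from the crowd.

    ask_worker(a, b, worker_index) is a hypothetical stand-in for posting a
    comparison task; it returns -1 if a should come first, otherwise 1.
    An odd number of workers avoids ties between the two possible votes.
    """
    def compare(a, b):
        votes = [ask_worker(a, b, i) for i in range(workers)]
        return majority(votes)

    return sorted(items, key=cmp_to_key(compare))


# Simulated crowd: a hidden quality score stands in for subjective judgment.
quality = {"photo_a": 0.9, "photo_b": 0.4, "photo_c": 0.7}


def simulated_worker(a, b, worker_index):
    if worker_index == 2:  # one careless worker who always gives the same answer
        return 1
    return -1 if quality[a] > quality[b] else 1


ranked = crowd_sort(list(quality), simulated_worker)
```

The two careful workers outvote the careless one, so the result orders the photos from highest to lowest hidden score. Each comparison costs three answers here, which is why reducing the number of comparisons matters for cost.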


Research Challenges

Data Model
  Uncertainty
  User view vs System view
  How to get data?

Query processing
  User-defined functions (UDF)
  CrowdSQL


Example DB Systems

CrowdDB (Berkeley)
Qurk (MIT)
Scoop (Stanford)
Hlog (Wisconsin)
Freebase (Google)


Tutorial Outline

Introduction Applications Platforms Challenges Opportunities


Opportunities

Problems with the current platform
  Very rudimentary
  No tools for data analysis
  No integration with databases
  Very limited search and browse features

Opportunities
  What is the database model for crowdsourcing?
  MapReduce with crowdsourcing
  Can you integrate human-computation into a language?
  Task management


Opportunities

Quality control
  Human factors vs. outcomes
  Pricing tasks
  Predicting worker quality from observable properties (e.g. task completion time)
  HIT / Requester ranking or recommendation
  Expert search: who are the right workers given task nature and constraints

Privacy
  Workers and requesters
  Tasks


Opportunities

Crowdsourcing is cheap but not free, and cannot scale to the web without help

How to scale out?
  Improve the accuracy and efficiency of human computation algorithms.
  Indexing & Pruning techniques

Dealing with uncertainty
  Temporal and labeling uncertainty
  Learning algorithms
  Search evaluation

Combining CPU + HPU
  MapReduce with human computation?
  Integration points with enterprise systems
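
As an illustration of pruning and of combining CPU with HPU, the sketch below lets a cheap machine similarity decide the obvious record pairs and sends only the uncertain ones to people. It is a toy under stated assumptions: `hybrid_match`, token-set Jaccard as the similarity measure, the thresholds 0.2 and 0.9, and the sample records are all hypothetical choices, not techniques prescribed by the tutorial.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def hybrid_match(records, ask_crowd, low=0.2, high=0.9):
    """Find matching record pairs, asking the crowd only about unclear pairs.

    ask_crowd(a, b) is a hypothetical stand-in for a paid human judgment.
    low and high are assumed thresholds chosen for this example.
    """
    matches, questions = [], 0
    for a, b in combinations(records, 2):
        score = jaccard(a, b)
        if score >= high:      # machine is confident: accept without asking
            matches.append((a, b))
        elif score > low:      # uncertain: spend money on a human answer
            questions += 1
            if ask_crowd(a, b):
                matches.append((a, b))
        # score <= low: pruned as a non-match without asking anyone
    return matches, questions


records = [
    "iPhone 4 16GB black",
    "Apple iPhone 4 16GB",
    "Canon EOS 550D",
    "canon eos 550d",
]
# Simulated crowd that confirms every pair it is shown.
matches, questions = hybrid_match(records, lambda a, b: True)
```

Four records give six candidate pairs, but only one of them reaches the simulated crowd. Four pairs are pruned because they share no tokens, and one is accepted by the machine alone.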


Conclusion

Crowdsourcing for relevance evaluation works
  Fast turnaround, easy to experiment, cheap
  Still have to design the experiments carefully!

Quality
  Worker quality
  User feedback extremely useful
  Usability considerations

Platform
  MTurk is a popular platform and others are emerging
  Lots of opportunities to improve current platforms
  Scale out


References - Tutorials

AnHai Doan, Michael J. Franklin, Donald Kossmann, Tim Kraska: Crowdsourcing Applications and Platforms: A Data Management Perspective. PVLDB 4(12):1508-1509 (2011)

Omar Alonso, Matthew Lease: Crowdsourcing for information retrieval: principles, methods, and applications. SIGIR 2011:1299-1300

Omar Alonso, Matthew Lease: Crowdsourcing 101: putting the WSDM of crowds to work for you. WSDM 2011:1-2

Panagiotis G. Ipeirotis, Praveen K. Paritosh: Managing crowdsourced human computation: a tutorial. WWW (Companion Volume) 2011:287-288

Omar Alonso: Crowdsourcing for Information Retrieval Experimentation and Evaluation. CLEF 2011:2

This tutorial (for research purposes only) uses some slides from the above tutorials. Thanks to the authors.


References - Survey Papers

Man-Ching Yuen, Irwin King, Kwong-Sak Leung: A Survey of Crowdsourcing Systems. SocialCom/PASSAT 2011:766-773

Alexander J. Quinn, Benjamin B. Bederson: Human computation: a survey and taxonomy of a growing field. CHI 2011:1403-1412

Rajarshi Das, Maja Vukovic: Emerging theories and models of human computation systems: a brief survey. UbiCrowd 2011

Alon Y. Halevy, AnHai Doan, Raghu Ramakrishnan: Crowdsourcing systems on the World-Wide Web. Commun. ACM (CACM) 54(4):86-96 (2011)


References - DB

Adam Marcus, Eugene Wu, Samuel Madden, Robert C. Miller: Crowdsourced Databases: Query Processing with People. CIDR 2011:211-214

Aditya G. Parameswaran, Neoklis Polyzotis: Answering Queries using Humans, Algorithms and Databases. CIDR 2011:160-166

Salil S. Kanhere: Participatory Sensing: Crowdsourcing Data from Mobile Smartphones in Urban Spaces. Mobile Data Management 2011:3-6

Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, Reynold Xin: CrowdDB: answering queries with crowdsourcing. SIGMOD 2011:61-72

Adam Marcus, Eugene Wu, David R. Karger, Samuel Madden, Robert C. Miller: Human-powered Sorts and Joins. PVLDB 5(1):13-24 (2011)

Aditya G. Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis, Jennifer Widom: Human-assisted graph search: it's okay to ask questions. PVLDB 4(5):267-278 (2011)

Kurt D. Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor: Freebase: a collaboratively created graph database for structuring human knowledge. SIGMOD 2008:1247-1250


References - HCI

Giordano Koch, Johann Füller, Sabine Brunswicker: Online Crowdsourcing in the Public Sector: How to Design Open Government Platforms. HCI 2011:203-212

Ido Guy, Adam Perer, Tal Daniel, Ohad Greenshpan, Itai Turbahn: Guess who?: enriching the social graph through a crowdsourcing game. CHI 2011:1373-1382

Aniket Kittur, Boris Smus, Robert Kraut: CrowdForge: crowdsourcing complex work. CHI Extended Abstracts 2011:1801-1806

Yasuaki Sakamoto, Yuko Tanaka, Lixiu Yu, Jeffrey V. Nickerson: The Crowdsourcing Design Space. HCI 2011:346-355

Jon Noronha, Eric Hysen, Haoqi Zhang, Krzysztof Z. Gajos: Platemate: crowdsourcing nutritional analysis from food photographs. UIST 2011:1-12

Aniket Kittur, Boris Smus, Susheel Khamkar, Robert E. Kraut: CrowdForge: crowdsourcing complex work. UIST 2011:43-52

Jeffrey Heer, Michael Bostock: Crowdsourcing graphical perception: using mechanical turk to assess visualization design. CHI 2010:203-212


References - NLP

Omar Zaidan, Chris Callison-Burch: Crowdsourcing Translation: Professional Quality from Non-Professionals. ACL 2011:1220-1229

Derya Ozkan, Louis-Philippe Morency: Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts. ACL (Short Papers) 2011:335-340

Nitin Madnani, Martin Chodorow, Joel R. Tetreault, Alla Rozovskaya: They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems. ACL (Short Papers) 2011:508-513

Keith Vertanen, Per Ola Kristensson: The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources. EMNLP 2011:700-711

Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, Alessandro Marchetti: Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora. EMNLP 2011:670-679


References - AI

Yan Yan, Rómer Rosales, Glenn Fung, Jennifer G. Dy: Active Learning from Crowds. ICML 2011:1161-1168

Peng Dai, Mausam, Daniel S. Weld: Decision-Theoretic Control of Crowd-Sourced Workflows. AAAI 2010

Peter Welinder, Steve Branson, Serge Belongie, Pietro Perona: The Multidimensional Wisdom of Crowds. NIPS 2010:2424-2432

Yen-ling Kuo, Jane Yung-jen Hsu: Resource-Bounded Crowd-Sourcing of Commonsense Knowledge. IJCAI 2011:2470-2475

Edith Law, Haoqi Zhang: Towards Large-Scale Collaborative Planning: Answering High-Level Search Queries Using Human Computation. AAAI 2011


References - IR

Gabriella Kazai: In Search of Quality in Crowdsourcing for Search Engine Evaluation. ECIR 2011:165-176

Omar Alonso, Ricardo A. Baeza-Yates: Design and Implementation of Relevance Assessments Using Crowdsourcing. ECIR 2011:153-164

Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Duc Thanh Tran: Repeatable and reliable search system evaluation using crowdsourcing. SIGIR 2011:923-932

Matthew Lease, Vitor R. Carvalho, Emine Yilmaz: Crowdsourcing for search and data mining. SIGIR Forum (SIGIR) 45(1):18-24 (2011)

Krishna Yeswanth Kamath, James Caverlee: Transient crowd discovery on the real-time social web. WSDM 2011:585-594

Omar Alonso, Ralf Schenkel, Martin Theobald: Crowdsourcing Assessments for XML Ranked Retrieval. ECIR 2010:602-606

Hao Ma, Raman Chandrasekar, Chris Quirk, Abhishek Gupta: Improving search engines using human computation games. CIKM 2009:275-284

Gabriella Kazai, Jaap Kamps, Marijn Koolen, Natasa Milic-Frayling: Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking. SIGIR 2011:205-214


References - Theory

Shuchi Chawla, Jason D. Hartline, Balasubramanian Sivan: Optimal crowdsourcing contests. SODA 2012:856-868
