crowdsourcing beyond mechanical turk: building crowdmining services for your own research

DESCRIPTION

The keynote talk at CrowdKDD 2012 http://www.cse.ust.hk/~nliu/crowdkdd12/

TRANSCRIPT

Crowdsourcing beyond …

Kuan-Ta Chen

Institute of Information Science Academia Sinica

Building Crowdmining Services for Your Own Research

CrowdKDD’12 Aug 12, 2012

What I’m going to talk about

Crowdsourcing?

Crowdsourcing + Data Mining Research?

Common Fallacies of CS4DM Research

Pomics: A Crowdmining Service

Conclusion

CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3

Crowdsourcing = Crowd + Outsourcing

“soliciting solutions via open calls to large-scale communities”

A more formal definition

“Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” [1]

[1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/

What Can Crowdsourcing Do?

Brand Tagging

Data Entry

Reward: 4.4 USD/hour

General Questions

Reward: points on Yahoo! Answers

When crowdsourcing meets data mining…

Crowdsourcing Data mining

What’s in here?

Crowdsourcing for Data Mining: Issues

Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation

Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection

Crowdsourcing Uses in Data Mining Research

Image Semantics

Reward: 0.04 USD / task

main theme? key objects?

unique attributes?

0.02 USD/ task

find out photos of revolvers!

0.01 USD/ task

Human Skeleton

0.01 USD/ task

Photo Orientation

Perspectives for 3D Objects

Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, "Enhancing online 3D products through crowdsourcing," ACM CrowdMM'12.

Web Site Classifier

12 USD / hour

Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and Scalability,” Invited Talk at CSDM 2011.

Photographers’ Intention

to support a task?
to capture a bad feeling?
to preserve a good feeling?
to recall later on?
to publish it online?
to show it to friends and family?

Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’ Intentions: a Test Dataset,” ACM CrowdMM’12.

Linguistic Affective Judgement

Affective response (Snow et al. 2008)

USD 0.4 to label 20 headlines (140 labels)
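
The unit cost implied by these figures can be worked out directly. A minimal sketch, using only the numbers on this slide (USD 0.4, 20 headlines, 140 labels); the variable names are illustrative:

```python
# Numbers taken from the slide: USD 0.4 buys labels for 20 headlines (140 labels).
total_cost_usd = 0.4
headlines = 20
labels = 140

# Derived unit figures.
labels_per_headline = labels / headlines
cost_per_headline = total_cost_usd / headlines
cost_per_label = total_cost_usd / labels

print(f"{labels_per_headline:.0f} labels per headline")
print(f"USD {cost_per_headline:.3f} per headline")
print(f"USD {cost_per_label:.4f} per label")
```

That is 7 labels per headline at well under one cent per label.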

“Closing and cancellations top advice on flu outbreak”

A Lot More Examples

Document relevance evaluation Alonso et al. (2008)

Document rating collection Kittur et al. (2008)

Noun compound paraphrasing Nakov (2008)

Person name resolution Su et al. (2007)

And so on...

THE COMMON FALLACIES -- EXPERIENCES FROM CROWDMM’12

Thanks to the CrowdMM’12 co-organizers: Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu; thanks also to the “Crowdsourcing for Multimedia” special issue co-guest-editors Paul Bennett and Matt Lease.

Common Fallacies #1

Crowdsourcing is NOT JUST conducting user studies

The crowd is uncontrollable, and tasks are performed in uncontrolled conditions

How to manage the crowd?

Common Fallacies #2

Crowdsourcing is NOT JUST analyzing user-generated content

Cope with the noise in UGC, not only the information.

How to manage the imperfection & diversity in UGC?
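
One common way to cope with noisy, diverse contributions is to collect redundant answers per item and aggregate them. The following is a minimal sketch of majority voting with an agreement score. It is a generic technique rather than something described in the talk, and the function name and sample data are hypothetical:

```python
from collections import Counter

def majority_vote(labels):
    """Return (winning_label, agreement) for one item's redundant labels.

    agreement is the fraction of contributors who chose the winning label.
    Ties go to the label that was seen first.
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical redundant labels from five contributors per photo.
answers = {
    "photo_1": ["happy", "happy", "sleepy", "happy", "happy"],
    "photo_2": ["tears", "happy", "tears", "sleepy", "tears"],
}

for item, labels in answers.items():
    label, agreement = majority_vote(labels)
    print(item, label, f"{agreement:.2f}")
```

A low agreement score flags items where contributors genuinely disagree, which may reflect diversity in the content rather than noise.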

Common Fallacies #2 (cont.)

Crowdsourcing is NOT JUST analyzing user-generated content

Put the task element in the loop

Re-purposing the creation of UGC as your own microtasks

Common Fallacies #3

Crowdsourcing is NOT JUST posting tasks on Mechanical Turk

Explicit Crowdsourcing Implicit Crowdsourcing

Piggyback Crowdsourcing

Doan et al, "Crowdsourcing systems on the World-Wide Web," CACM, vol 54, no 4, 2011.

An implicit crowdmining platform for multimedia content

Crowdsourcing for Data Mining: Issues

Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation

Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection

The Era of Too Many Photos

With the prevalence of digital cameras, people today use pictures to record their daily experiences

How to Share Photos?

3 Common Ways

Photo browsing, Photo/video slideshow, Illustrated text

Photo Browsing

Photo/Video slideshow

Illustrated Text

A MISSING PIECE

Comics

Photo Comics – Baby Born

Photo Comics – Birthday Party

Photo Comics – Daily Fun

Media Comparison

Medium            Creation Cost   Viewer Req.   Viewer Control   Richness   Portability
Photo browsing    Low             Low           High             Low        Low
Slideshow         Medium          Low           Low              Medium     Low
Illustrated text  High            High          High             High       High
Comic             High            Low           High             High       High

The creation cost of comics is high. How to lower it?

Comic Making – Cartoonist’s Way

http://www.pomics.net

Goal of Pomics

Pomics = Picture to Comics

Computer-Aided Storytelling

[Flow diagram. Labels: Picture; Location, Timing, Aesthetics, and Semantics Analysis; User Preference (Own rating, Popularity); Auto Storytelling; Automated Adjustment; Machine Learning; Draft Story; User Editing; Final Story]

Technical Challenges #1

Semantics Analysis: Human recognition, Emotion recognition, Behavior recognition, Object recognition, Location identification, Natural language processing

Aesthetics Analysis: Exposure, Composition

Timing Analysis

Contextual Analysis

Technical Challenges #2

Automatic Storytelling: Significant photo selection, Pagination and page layout, Narrative design
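
To make the first two steps concrete, here is a minimal sketch of photo selection followed by pagination. The talk does not say how Pomics implements either step, so everything here is an assumption for illustration: the `Photo` fields, the single combined score, the top-k selection rule, and the fixed page size are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Photo:
    name: str
    taken_at: float  # seconds since the start of the event (hypothetical field)
    score: float     # combined aesthetics/semantics score in [0, 1] (hypothetical)

def select_significant(photos, k):
    """Pick the k highest-scoring photos, then restore chronological order."""
    top = sorted(photos, key=lambda p: p.score, reverse=True)[:k]
    return sorted(top, key=lambda p: p.taken_at)

def paginate(photos, frames_per_page):
    """Split the selected photos into pages of at most frames_per_page frames."""
    return [photos[i:i + frames_per_page]
            for i in range(0, len(photos), frames_per_page)]

# Hypothetical album from a birthday party.
album = [
    Photo("arrival.jpg", 0, 0.4),
    Photo("cake.jpg", 600, 0.9),
    Photo("blurry.jpg", 650, 0.1),
    Photo("candles.jpg", 700, 0.8),
    Photo("gifts.jpg", 1500, 0.7),
    Photo("goodbye.jpg", 3000, 0.5),
]

story = select_significant(album, k=4)
pages = paginate(story, frames_per_page=3)
for number, page in enumerate(pages, start=1):
    print(number, [p.name for p in page])
```

Sorting the selection back into time order keeps the narrative chronological. A real system would presumably also need to avoid near-duplicate shots and vary the frame sizes per page.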

Publish & share

Pomics as a Social Service

Web albums

Web resources

Live Demo

HOW IS POMICS RELATED TO CROWDSOURCING?

USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION

What pictures are used?

Why were the 3 pictures used?

Aesthetics information

Wizard Interface

Aesthetics information

The Page Layout

Semantics

Saliency info

Usage Statistics of Pomics (since July 15, 2012)

352 authors, 434 comic books

4,362 frames, 4,332 images used, 1,057 image annotations, 3,789 text balloons

3,000+ shares on Facebook
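
A few per-unit ratios help put these counts in perspective. The sketch below uses only the numbers reported on the slide. The ratios themselves are an illustrative reading rather than figures from the talk; for example, it assumes every annotation refers to one of the images used.

```python
# Counts as reported on the slide.
stats = {
    "authors": 352,
    "comic_books": 434,
    "frames": 4362,
    "images_used": 4332,
    "image_annotations": 1057,
    "text_balloons": 3789,
}

# Derived ratios (illustrative interpretation of the raw counts).
books_per_author = stats["comic_books"] / stats["authors"]
frames_per_book = stats["frames"] / stats["comic_books"]
annotations_per_image = stats["image_annotations"] / stats["images_used"]
balloons_per_frame = stats["text_balloons"] / stats["frames"]

print(f"books per author:      {books_per_author:.2f}")
print(f"frames per book:       {frames_per_book:.2f}")
print(f"annotations per image: {annotations_per_image:.2f}")
print(f"balloons per frame:    {balloons_per_frame:.2f}")
```

In round terms, that is about 10 frames per book and a little over one book per author.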

WHAT HAVE WE GATHERED SO FAR?

Picture Aesthetics Info

Picture Aesthetics (cont.)

Picture Saliency Info

Picture Semantics

Love / Like / Dear; Happy; Sleepy / sleeping; Tears; Wearing a hat; NO!

Can Pomics Do Micro-tasks?

The answer is YES!

Users were asked to create comics using a specific album

They were rewarded with a 200 MB quota if their books were “shared” by 20+ Facebook users
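
When several users build comics from the same album, the fraction of books that use each photo can serve as an implicit preference signal. The sketch below illustrates that idea together with the reward rule stated on this slide. The talk does not describe how Pomics computes its aesthetics data, so the data layout, function names, and sample values are hypothetical:

```python
from collections import Counter

REWARD_QUOTA_MB = 200  # reward size, from the slide
MIN_SHARES = 20        # "20+ FB users", from the slide

def selection_rate(album, books):
    """Fraction of comic books that used each photo of the shared album."""
    if not books:
        raise ValueError("need at least one book")
    used = Counter()
    for photos_in_book in books:
        # Count each photo at most once per book, ignoring photos outside the album.
        used.update(set(photos_in_book) & set(album))
    return {photo: used[photo] / len(books) for photo in album}

def earned_quota_mb(share_count):
    """Quota granted for one book under the rule stated on the slide."""
    return REWARD_QUOTA_MB if share_count >= MIN_SHARES else 0

# Hypothetical album and the photos four users picked from it.
album = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
books = [
    ["a.jpg", "b.jpg"],
    ["a.jpg", "c.jpg"],
    ["a.jpg", "b.jpg", "c.jpg"],
    ["b.jpg", "a.jpg"],
]

rates = selection_rate(album, books)
for photo in sorted(rates, key=rates.get, reverse=True):
    print(photo, f"{rates[photo]:.2f}")
print(earned_quota_mb(25), earned_quota_mb(3))
```

Because every participant draws from the same album, the selection rates are directly comparable across photos, which is what turns ordinary comic making into a microtask.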

Picture Aesthetics from Microtasks

Picture Saliency from Microtasks

Crowdmining Services

Advantages: Little or no hiring cost once the right incentives are given; Easy to scale up; The game rules can be changed to fit the research

Disadvantages: High development cost; Less flexible; Hard to find the right incentives (besides money)

Conclusion

Crowdmining is a promising and exciting area

Crowdsourcing != Mechanical Turking

A lot more can be done with crowdmining services

Build your own crowdmining service today!

CrowdMM 2012

Keynote: Prof. Masataka Goto (AIST, Japan)

11 oral+poster presentations

Annotation, Evaluation, Novel applications

An industrial panel discussion

You are welcome to join us!

(in conjunction with ACM Multimedia 2012)

http://crowdmm.org/

Kuan-Ta Chen Academia Sinica

Unleash the power of the Crowd!

Thank You!

http://www.iis.sinica.edu.tw/~swc