The keynote talk at CrowdKDD 2012: http://www.cse.ust.hk/~nliu/crowdkdd12/
Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research
Kuan-Ta Chen
Institute of Information Science, Academia Sinica
CrowdKDD'12, Aug 12, 2012
What I'm going to talk about
Crowdsourcing?
Crowdsourcing + Data Mining Research?
Common Fallacies of CS4DM Research
Pomics: A Crowdmining Service
Conclusion
CrowdKDD12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3
Crowdsourcing = Crowd + Outsourcing
soliciting solutions via open calls to large-scale communities
A more formal definition
Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call. [1]
[1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/
What Can Crowdsourcing Do?
Brand Tagging
Data Entry
Reward: 4.4 USD/hour
General Questions
Reward: points on Yahoo! Answers
When crowdsourcing meets data mining
Crowdsourcing Data mining
What's in here?
Crowdsourcing for Data Mining: Issues
Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation
Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection
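Quality control over redundant crowd labels is often approached with simple aggregation. The following is a minimal Python sketch of majority voting, assuming each item is labeled by several workers. The function name `majority_vote`, the worker IDs, and the sample labels are hypothetical and are not taken from the talk.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (item, worker, label) triples into one label per item.

    Returns {item: (label, agreement)}, where agreement is the fraction
    of that item's judgments matching the winning label. Ties go to the
    label that was seen first for the item.
    """
    votes = defaultdict(list)
    for item, _worker, label in judgments:
        votes[item].append(label)

    result = {}
    for item, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        result[item] = (label, count / len(labels))
    return result


# Hypothetical judgments: three workers on img1, two workers on img2.
judgments = [
    ("img1", "w1", "cat"),
    ("img1", "w2", "cat"),
    ("img1", "w3", "dog"),
    ("img2", "w1", "dog"),
    ("img2", "w2", "dog"),
]
aggregated = majority_vote(judgments)
for item, (label, agreement) in aggregated.items():
    print(item, label, round(agreement, 2))
```

A low agreement value can be used to flag items that need more judgments; cheat detection would need additional signals that this sketch does not cover.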
Crowdsourcing Uses in Data Mining Research
Image Semantics
Reward: 0.04 USD/task
Main theme? Key objects? Unique attributes?
0.02 USD/task
Find photos of revolvers!
Human Skeleton
0.01 USD/task
Photo Orientation
0.01 USD/task
Perspectives for 3D Objects
Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, "Enhancing online 3D products through crowdsourcing," ACM CrowdMM'12.
Web Site Classifier
12 USD/hour
Panos Ipeirotis, "Crowdsourcing using Mechanical Turk: Quality Management and Scalability," invited talk at CSDM 2011.
Photographer's Intention
To support a task?
To capture a bad feeling?
To preserve a good feeling?
To recall later on?
To publish it online?
To show it to friends and family?
Mathias Lux, Mario Taschwer, and Oge Marques, "A Closer Look at Photographers' Intentions: a Test Dataset," ACM CrowdMM'12.
Linguistic Affective Judgement
Affective response (Snow et al. 2008)
USD 0.4 to label 20 headlines (140 labels)
Example headline: "Closing and cancellations top advice on flu outbreak"
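The figures quoted from Snow et al. (2008) imply a very low unit cost. The short Python calculation below only restates the numbers on the slide (USD 0.4, 20 headlines, 140 labels); the variable names are illustrative.

```python
# Figures as quoted on the slide.
total_cost_usd = 0.4
headlines = 20
labels = 140

labels_per_headline = labels / headlines       # 7 labels per headline
cost_per_headline = total_cost_usd / headlines
cost_per_label = total_cost_usd / labels

print(f"labels per headline: {labels_per_headline:.0f}")
print(f"cost per headline:   {cost_per_headline:.3f} USD")
print(f"cost per label:      {cost_per_label:.4f} USD")
```

Under these figures each label costs roughly 0.3 US cents.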
A Lot More Examples
Document relevance evaluation: Alonso et al. (2008)
Document rating collection: Kittur et al. (2008)
Noun compound paraphrasing: Nakov (2008)
Person name resolution: Su et al. (2007)
And so on...
THE COMMON FALLACIES -- EXPERIENCES FROM CROWDMM'12
Thanks to the CrowdMM'12 co-organizers Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu, and to the Crowdsourcing for Multimedia special issue co-guest-editors Paul Bennett and Matt Lease.
Common Fallacies #1
Crowdsourcing is NOT JUST conducting user studies
The crowd is uncontrollable, and tasks are performed under uncontrolled conditions
How to manage the crowd?
Common Fallacies #2
Crowdsourcing is NOT JUST analyzing user-generated content
Cope with the noise in UGC, not only with the information it carries
How to manage the imperfection and diversity in UGC?
Common Fallacies #2 (cont.)
Crowdsourcing is NOT JUST analyzing user-generated content
Put the task element in the loop
Re-purposing the creation of UGC as your own microtasks
Common Fallacies #3
Crowdsourcing is NOT JUST posting tasks on Mechanical Turk
Explicit Crowdsourcing
Implicit Crowdsourcing
Piggyback Crowdsourcing
Doan et al., "Crowdsourcing systems on the World-Wide Web," CACM, vol. 54, no. 4, 2011.
An implicit crowdmining platform for multimedia content
Crowdsourcing for Data Mining: Issues
Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation
Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection
The Era of Too Many Photos
With the prevalence of digital cameras, people today use pictures to record their daily experiences
How to Share Photos?
3 Common Ways
Photo browsing
Photo/video slideshow
Illustrated text
Photo Browsing
Photo/Video slideshow
Illustrated Text
A MISSING PIECE
Comics
Photo Comics: Baby Born
Photo Comics: Birthday Party
Photo Comics: Daily Fun
Media Comparison
Medium            Creation Cost  Viewer Req.  Viewer Control  Richness  Portability
Photo browsing    Low            Low          High            Low       Low
Slideshow         Medium         Low          Low             Medium    Low
Illustrated text  High           High         High            High      High
Comic             High           Low          High            High      High
Comic creation cost is High. How to lower it?
Comic Making: The Cartoonist's Way
http://www.pomics.net
Goal of Pomics
Pomics = Picture to Comics
Computer-Aided Storytelling
[Diagram labels: Picture; Location Timing Analysis; Aesthetics Analysis; Semantics Analysis; User Preference (Own rating, Popularity); Auto Storytelling; Automated Adjustment; Machine Learning; Draft Story; User Editing; Final Story]
Technical Challenges #1
Semantics Analysis: Human recognition, Emotion recognition, Behavior recognition, Object recognition, Location identification, Natural language processing
Aesthetics Analysis: Exposure, Composition
Timing Analysis
Contextual Analysis
Technical Challenges #2
Automatic Storytelling: Significant photo selection, Pagination and page layout, Narrative design
Pomics as a Social Service
Web albums
Web resources
Publish & share
Live Demo
HOW IS POMICS RELATED TO CROWDSOURCING?
USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION
What pictures are used?
Why were these 3 pictures used?
Aesthetics information
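One way to read picture selections as implicit aesthetics votes is to compute, for each picture in an album, the fraction of comic books built from that album that actually used it. A minimal Python sketch follows. The data layout, the function name `selection_rates`, and the sample album are hypothetical illustrations, not the Pomics implementation.

```python
from collections import Counter


def selection_rates(album, books):
    """Return {picture: fraction of books that used it}.

    album: list of picture IDs available to the authors.
    books: list of comic books, each a list of the picture IDs it used.
    """
    if not books:
        return {pic: 0.0 for pic in album}

    album_set = set(album)
    used = Counter()
    for book in books:
        # Count a picture at most once per book, even if it fills several frames.
        used.update(set(book) & album_set)
    return {pic: used[pic] / len(books) for pic in album}


# Hypothetical album of four pictures and three comic books made from it.
album = ["p1", "p2", "p3", "p4"]
books = [["p1", "p2"], ["p1", "p3", "p1"], ["p1"]]

rates = selection_rates(album, books)
ranked = sorted(album, key=lambda pic: rates[pic], reverse=True)
print(ranked, rates)
```

Pictures that many authors pick independently rank first; pictures nobody picks get a rate of zero, which is itself a weak negative signal.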
Wizard Interface
Aesthetics information
The Page Layout
Semantics
Saliency info
Usage Statistics of Pomics (since July 15, 2012)
352 authors
434 comic books
4,362 frames
4,332 images used
1,057 image annotations
3,789 text balloons
3,000+ shares on Facebook
WHAT HAVE WE GATHERED SO FAR?
Picture Aesthetics Info
Picture Aesthetics (cont.)
Picture Saliency Info
Picture Semantics
Love / Like / Dear
Happy
Sleepy / sleeping
Tears
Wearing a hat
NO!
Can Pomics Do Micro-tasks?
The answer is YES!
Users were asked to create comics using a specific album
They were rewarded with a 200 MB quota if their books were shared by 20+ Facebook users
Picture Aesthetics from Microtasks
Picture Saliency from Microtasks
Crowdmining Services
Advantages
Little or no hiring cost once the right incentives are given
Easy to scale up
The game rules can be changed to fit the research
Disadvantages
High development cost
Less flexible
Hard to find the right incentives (besides money)
Conclusion
Crowdmining is a promising and exciting area
Crowdsourcing != Mechanical Turking
A lot more can be done with crowdmining services
Build your own crowdmining service today!
CrowdMM 2012 (in conjunction with ACM Multimedia 2012)
Keynote: Prof. Masataka Goto (AIST, Japan)
11 oral + poster presentations
Annotation, Evaluation, Novel applications
An industrial panel discussion
You are welcome to join us!
http://crowdmm.org/
Kuan-Ta Chen, Academia Sinica
Unleash the power of the Crowd!
Thank You!
http://www.iis.sinica.edu.tw/~swc