Expert Crowds
Crowdsourcing and Human Computation
Instructor: Chris Callison-Burch
Website: crowdsourcing-class.org
Thanks to Maria Christoforaki & Panos Ipeirotis for today’s slides!
Recruiting is hard
• MTurk, CrowdFlower, oDesk, or Freelancer gives us access to a lot of people
• But are they useful for specialized skills?
Attracting Contributors via Online Advertising
• Panos Ipeirotis spent a sabbatical at Google, where he was tasked with finding experts to help fill in Google's Knowledge Graph
“We have a billion users… leverage their knowledge…”
“Let’s create a new crowdsourcing system…”
“Crowdsource in a predictable manner, with knowledgeable users, without introducing monetary rewards”
Knowledge Graph: Things not Strings
Still incomplete…
• “Symptom of strep throat”
• “Side effects of treximet”
• “Who is Cristiano Ronaldo dating”
• “When is Jay Z playing in New York”
• “What is the customer service number for Google”
• …
Quizz
Calibration vs. Collection
• Calibration questions (known answer): evaluating user competence on the topic at hand
• Collection questions (unknown answer): asking questions about things we do not know
• Trust answers from competent users more
Tradeoff: learning more about user quality vs. getting answers (technical solution: use a Markov Decision Process)
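The slide names a Markov Decision Process as the technical solution to the calibration/collection tradeoff; that part is not reproduced here. The sketch below only illustrates the simpler idea of trusting competent users more: it estimates each user's accuracy from calibration questions (with a Beta(1, 1) prior so users with few answers are not over- or under-trusted), then aggregates collection answers with log-odds weights. The weighting assumes wrong answers are spread evenly over the other choices. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def estimate_competence(calibration, gold, prior_correct=1, prior_wrong=1):
    """Estimate each user's accuracy from calibration answers (known-answer questions).

    calibration: list of (user, question, answer); gold: {question: correct answer}.
    The Beta(prior_correct, prior_wrong) prior keeps estimates moderate for users
    who have answered only a few calibration questions.
    """
    right, total = defaultdict(int), defaultdict(int)
    for user, question, answer in calibration:
        total[user] += 1
        right[user] += answer == gold[question]
    return {u: (right[u] + prior_correct) / (total[u] + prior_correct + prior_wrong)
            for u in total}

def aggregate(collection, competence, n_choices):
    """Pick one answer per collection question, weighting each user by log-odds.

    Assumes wrong answers are spread evenly over the n_choices - 1 other options,
    so a user at chance accuracy (1 / n_choices) gets weight 0.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for user, question, answer in collection:
        p = competence.get(user, 1.0 / n_choices)  # unseen users: treat as chance
        p = min(max(p, 1e-6), 1 - 1e-6)            # avoid log(0) and division by 0
        scores[question][answer] += math.log(p * (n_choices - 1) / (1 - p))
    return {q: max(votes, key=votes.get) for q, votes in scores.items()}

# Toy data (hypothetical): two calibration questions, one collection question.
gold = {"c1": "A", "c2": "B"}
calibration = [
    ("alice", "c1", "A"), ("alice", "c2", "B"),  # 2/2 correct
    ("bob", "c1", "C"), ("bob", "c2", "D"),      # 0/2 correct
    ("carol", "c1", "A"), ("carol", "c2", "C"),  # 1/2 correct
]
collection = [("alice", "x", "Paris"), ("bob", "x", "Lyon"), ("carol", "x", "Lyon")]

competence = estimate_competence(calibration, gold)
print(competence)
print(aggregate(collection, competence, n_choices=4))  # {'x': 'Paris'}
```

A plain majority vote would pick "Lyon" here, but alice's strong calibration record outweighs carol (weight log 3) and bob (weight 0, since his smoothed accuracy equals chance).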
Challenges
• Why would anyone come and play this game?
• Why would knowledgeable users come?
• Wouldn’t it be simpler to just pay?
Attracting Visitors: Ad Campaigns
Running Ad Campaigns: Objectives
• We want to attract good users, not just clicks
• We do not want to think hard about keyword selection, appropriate ad text, etc.
• We want automation across thousands of topics (from treatment side effects to celebrity dating)
Solution: Treat Quizz as eCommerce Site
• Feedback: Value of click
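The slide only says that the feedback signal is the value of a click. One way to make that concrete, assuming (this is our assumption, not stated on the slide) that a visit is worth the information its answers contribute, is to score each answer with the mutual information of a symmetric-noise channel. For a user of accuracy q on an n-choice question whose correct answer is uniformly distributed, IG(q, n) = log2 n + q log2 q + (1 − q) log2((1 − q)/(n − 1)) bits. The function names are hypothetical; a per-visit total like this could then be reported to an ad system as a conversion value instead of counting raw clicks.

```python
import math

def info_gain(q, n):
    """Bits of information per answer from a user with accuracy q on an n-choice question.

    Assumes 0 <= q <= 1, n >= 2, a uniformly distributed correct answer, and wrong
    answers spread evenly over the other n - 1 choices (a symmetric channel).
    """
    h = -q * math.log2(q) if q > 0 else 0.0          # entropy term for correct answers
    if q < 1:
        h -= (1 - q) * math.log2((1 - q) / (n - 1))  # entropy term for wrong answers
    return math.log2(n) - h

def click_value(num_answers, q, n):
    """Hypothetical 'value of a click': total bits contributed during one visit."""
    return num_answers * info_gain(q, n)

for q in (0.25, 0.68, 0.9, 0.99):
    print(f"accuracy={q:.2f}  bits/answer={info_gain(q, 4):.3f}")
```

A user answering at chance (q = 1/n) contributes zero bits no matter how many answers they submit, so optimizing ads for this value rather than for clicks would favor knowledgeable visitors.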
Example of Targeting: Medical Quizzes
• Medical topics were the best-performing quizzes…
• Users coming from sites such as Mayo Clinic and WebMD
• Likely “prosumers” (proactive consumers, not professionals)
Self-selection and participation
• Low-performing users naturally drop out
• With paid users, monetary incentives keep them participating
[Plot: % correct vs. submitted answers]
Comparison with paid crowdsourcing
• Best paid user
  – 68% quality, 40 answers (~1.5 minutes per question)
  – Quality-equivalency: 13 answers @ 99% accuracy, 23 answers @ 90% accuracy
  – 5 cents/question, or $3/hr, to match advertising cost of unpaid users
• Knowledgeable users are much faster and more efficient
[Plot: % correct vs. submitted answers]
Targeted Advertising
• New way to run crowdsourcing: targeting with ads
• Engages unpaid users, avoids problems with extrinsic rewards
• Provides access to expert users not available on labor platforms
• Experts are not always professionals (e.g., Mayo Clinic users)
Online Labor Markets
• Help employers and employees connect
• Face a similar challenge
• How do they assess worker skills?
Skill Testing
• Skill certification through testing
• Workers take online tests
• Display score on profile
• Tests licensed from companies
• Domain experts paid to create questions
• Static question banks
ExpertRating Categories
• Airlines and Aviation
• Building & Construction
• Career guidance
• Clothing and Fashion
• Engineering
• English language skills
• Finance & Accounting
• Food and hospitality
• Foreign language skills
• Graphic design
• Healthcare
• IT & Computer skills
• Law
• Management
• Media
• Medical transcription and billing
• Office temp skills
• Sales and Marketing
Problems
• Static question banks
  – Questions become outdated
  – Cheating
• Lack of evaluation
  – Questionable long-term performance predictors
  – Questions may have errors or ambiguities
STEP: A Scalable Testing and Evaluation Platform
• Continuously generate new questions
  – Make tests more cheating-proof
  – Keep questions up-to-date
• Evaluate question quality
  – Identify errors or ambiguities
  – Use real-market performance data for evaluation
Christoforaki and Ipeirotis (2014)
STEP system summary
[Diagram: Inspiration from Question/Answer (Q/A) Sites; Question Bank; Test Questions; Tests; Test-taker Answers; High-Quality Questions; Low-Quality Questions]
Stack Overflow
“A Q/A site for professional and enthusiast programmers”
• 3 million subscribed users
• 8 million questions
• 35K tags
• 91% of questions have at least one answer
Questions by topic (count, % of all questions):
• Java: 737,563 (8.9%)
• Javascript: 723,150 (8.7%)
• C#: 714,774 (8.6%)
• PHP: 658,827 (8.0%)
• Android: 585,017 (7.1%)
• Jquery: 545,776 (6.6%)
• Python: 355,093 (4.3%)
• HTML: 352,146 (4.2%)
• C++: 325,667 (3.9%)
• mysql: 280,946 (3.4%)
Stack Overflow Challenges
• Volume of questions
  – Large base of candidate questions for tests
  – Not all Q/A threads are suitable to serve as test questions
• Properties of good Q/A threads
  – Relatively short text for easy processing and reformulation
  – Question relevant to the general topic at hand
  – Has a clear and objective correct answer
Question Spotter
• Identifies promising Q/A threads
• Trains a classifier on the obtained labels: ~90% precision
• Features
  – Question text length
  – Answer count
  – Answer score entropy
  – Popularity distribution of tags
  – Question popularity score
  – Weekly view count
  – Max answer author reputation
[Pipeline diagram: Q/A Threads; High-Quality Q/A Threads; Edited Questions; Reviewed Questions; Test Questions; Question Bank]
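A sketch of turning a Q/A thread into part of the feature vector listed above. The `QAThread` fields are hypothetical, negative answer scores are clipped to zero before computing the entropy (our assumption), and the tag-popularity and question-popularity features are omitted because they need site-wide statistics. The resulting vectors could be fed to any binary classifier; the slide's ~90% precision refers to the authors' classifier, not to this code.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class QAThread:
    """Hypothetical container for the thread fields these features need."""
    question_text: str
    answer_scores: List[int]      # vote score of each answer
    weekly_views: int
    max_answerer_reputation: int  # highest reputation among answer authors

def answer_score_entropy(scores: List[int]) -> float:
    """Shannon entropy (bits) of how the total answer score is split across answers.

    Negative scores are clipped to 0 (an assumption). Low entropy means one answer
    dominates, which hints at a clear, agreed-upon correct answer.
    """
    positive = [max(s, 0) for s in scores]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in positive if s > 0)

def spotter_features(thread: QAThread) -> List[float]:
    """A subset of the slide's features, in a fixed order for a classifier."""
    return [
        len(thread.question_text),
        len(thread.answer_scores),
        answer_score_entropy(thread.answer_scores),
        thread.weekly_views,
        thread.max_answerer_reputation,
    ]

thread = QAThread("How do I reverse a list?", [5, 5], weekly_views=120,
                  max_answerer_reputation=3000)
print(spotter_features(thread))  # [24, 2, 1.0, 120, 3000]
```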
Question Editor
• Humans with expertise in the topic at hand
• Visit and read a promising Q/A thread
• Reformulate it into a multiple-choice test question
• Discard questions not considered appropriate
Question Reviewer
• Have a good command of the English language; check spelling and grammar
• Check for compliance with test standards
  – Vocabulary usage
  – Question text length
  – Answer count
  – Answer text length
• Reviewers do not need to be topic experts
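The length and count checks above are mechanical, so a reviewer tool could flag them automatically and leave spelling, grammar, and vocabulary to the human reviewer. The `MCQuestion` class and all three limits below are hypothetical placeholders, not the test standards STEP actually used.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MCQuestion:
    """Hypothetical multiple-choice test question produced by a Question Editor."""
    text: str
    answers: List[str]  # answer option strings
    correct_index: int

# Hypothetical test standards; real limits would come from the testing company.
MAX_QUESTION_CHARS = 300
REQUIRED_ANSWER_COUNT = 4
MAX_ANSWER_CHARS = 120

def compliance_problems(q: MCQuestion) -> List[str]:
    """Return human-readable problems; an empty list means the question complies."""
    problems = []
    if len(q.text) > MAX_QUESTION_CHARS:
        problems.append(f"question text longer than {MAX_QUESTION_CHARS} characters")
    if len(q.answers) != REQUIRED_ANSWER_COUNT:
        problems.append(f"expected {REQUIRED_ANSWER_COUNT} answers, got {len(q.answers)}")
    for i, answer in enumerate(q.answers):
        if len(answer) > MAX_ANSWER_CHARS:
            problems.append(f"answer {i} longer than {MAX_ANSWER_CHARS} characters")
    if not 0 <= q.correct_index < len(q.answers):
        problems.append("correct_index does not point at an answer")
    return problems

good = MCQuestion("Which keyword declares a constant in Java?",
                  ["final", "const", "static", "let"], correct_index=0)
print(compliance_problems(good))  # []
```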
Question Bank
• Experimental Question Bank
  – Stores newly created questions
  – Not used for test-taker evaluation
  – Gathers answers while awaiting evaluation
• Production Question Bank
  – Questions are used for test-taker evaluation
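A sketch of the experimental-to-production lifecycle, assuming (hypothetically) that a question is judged once it has gathered a minimum number of test-taker answers, and that its IRT discrimination α, described in the Item Response Theory slides that follow, is the quality criterion. The thresholds, field names, and the "low_quality" label are illustrative assumptions; the example discriminations 1.83 and 0.45 are taken from the Question Quality Evaluation slide.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical promotion thresholds, for illustration only.
MIN_ANSWERS = 50
MIN_DISCRIMINATION = 0.5

@dataclass
class BankEntry:
    """Hypothetical record for one question in the question bank."""
    question_id: str
    bank: str = "experimental"  # new questions are not used for scoring yet
    answers_collected: int = 0
    discrimination: Optional[float] = None  # IRT alpha, once fitted from answers

def review(entry: BankEntry) -> str:
    """Promote, flag, or keep waiting; returns the entry's (possibly new) bank."""
    if entry.bank != "experimental" or entry.answers_collected < MIN_ANSWERS:
        return entry.bank  # still gathering answers, or already decided
    if entry.discrimination is not None and entry.discrimination >= MIN_DISCRIMINATION:
        entry.bank = "production"  # used for test-taker evaluation from now on
    else:
        entry.bank = "low_quality"  # candidate for correction or removal
    return entry.bank

print(review(BankEntry("q1", answers_collected=10)))                       # experimental
print(review(BankEntry("q2", answers_collected=80, discrimination=1.83)))  # production
print(review(BankEntry("q3", answers_collected=80, discrimination=0.45)))  # low_quality
```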
System Overview
[Diagram: Q/A Threads; High-Quality Q/A Threads; Edited Questions; Reviewed Questions; Test Questions; Question Bank; together with Test-taker Answers, High-Quality Questions, and Low-Quality Questions]
Item Response Theory
• Test takers have a single ability parameter θ
• Questions are modeled by an Item Characteristic Curve:
  – α: discrimination of the question
  – β: difficulty of the question
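The curve's formula did not survive extraction. With the symbols defined on this slide, the standard two-parameter logistic (2PL) item characteristic curve is P(θ) = 1 / (1 + exp(−α(θ − β))); we assume that is the model meant. The sketch evaluates it for the first example question on the Question Quality Evaluation slide (α = 1.83, β = 0.81); the function name `icc` is our own.

```python
import math

def icc(theta, alpha, beta):
    """Two-parameter logistic item characteristic curve (assumed form).

    Probability that a test taker with ability theta answers correctly a question
    with discrimination alpha and difficulty beta.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# At theta == beta the success probability is exactly 0.5, whatever alpha is.
for theta in (-2, 0, 0.81, 2, 4):
    print(f"theta={theta:+.2f}  P={icc(theta, alpha=1.83, beta=0.81):.3f}")
```

A larger α makes the curve steeper around θ = β, so the question separates test takers just below and just above that ability more sharply; β shifts the curve left or right.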
Item Response Theory
[Plot: probability of success P(θ) vs. normalized ability θ, illustrating α, the discrimination of the question]
Item Response Theory
[Plot: probability of success P(θ) vs. normalized ability θ, illustrating β, the difficulty of the question]
Question Quality Evaluation
[Plots: probability of success P(θ) and information gain I(θ) vs. ability θ for two example questions: discrimination = 1.83, difficulty = 0.81; discrimination = 0.45, difficulty = 6.14]
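Reading the slide's "information gain" as the standard IRT item information (an assumption), a 2PL question provides I(θ) = α² P(θ)(1 − P(θ)) at ability θ, which peaks at θ = β with height α²/4. The sketch compares the two example questions from the slide; the ability values in the loop are arbitrary.

```python
import math

def icc(theta, alpha, beta):
    """Assumed 2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def item_information(theta, alpha, beta):
    """Fisher information of a 2PL item: alpha^2 * P * (1 - P); peaks at theta == beta."""
    p = icc(theta, alpha, beta)
    return alpha ** 2 * p * (1 - p)

question_a = (1.83, 0.81)  # (discrimination, difficulty) from the slide
question_b = (0.45, 6.14)
for theta in (-2, 0, 0.81, 2, 6.14):
    ia = item_information(theta, *question_a)
    ib = item_information(theta, *question_b)
    print(f"theta={theta:+.2f}  I_a={ia:.3f}  I_b={ib:.3f}")
```

Question A peaks at about 0.84 near θ = 0.81, while question B never exceeds about 0.05 and only beats A at very high abilities (roughly θ above 3), which is why a low-discrimination question adds little to a test.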
Ability measures
• Endogenous measures
  – θ(u): test score of candidate u
  – Fit the function using logistic regression
  – Derive discrimination and difficulty values for each question
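A sketch of the endogenous fit for one question: regress correctness on the ability score with a logistic model P = σ(a + bθ), then convert to IRT parameters via α = b and β = −a/b (from α(θ − β) = a + bθ). Newton's method is used so the example needs only the standard library. Because no real test data is available here, the example simulates answers from a question with known α = 1.5 and β = 0.5 and draws standard-normal abilities as stand-ins for test scores; `fit_item` and all data are hypothetical.

```python
import math
import random

def fit_item(abilities, correct, iterations=25):
    """Fit P(correct) = sigmoid(a + b * theta) by Newton's method; return (alpha, beta).

    Under the 2PL form alpha * (theta - beta) = a + b * theta,
    so alpha = b and beta = -a / b.
    """
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for theta, y in zip(abilities, correct):
            p = 1.0 / (1.0 + math.exp(-(a + b * theta)))
            g0 += y - p               # gradient of the log-likelihood
            g1 += (y - p) * theta
            w = p * (1 - p)           # entries of the negated Hessian
            h00 += w
            h01 += w * theta
            h11 += w * theta * theta
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step: solve the 2x2 system
        b += (h00 * g1 - h01 * g0) / det
    return b, -a / b

# Synthetic check: simulate answers from a known question, then recover its parameters.
rng = random.Random(0)
true_alpha, true_beta = 1.5, 0.5
abilities = [rng.gauss(0, 1) for _ in range(5000)]
correct = [int(rng.random() < 1 / (1 + math.exp(-true_alpha * (t - true_beta))))
           for t in abilities]
alpha_hat, beta_hat = fit_item(abilities, correct)
print(f"alpha~{alpha_hat:.2f}, beta~{beta_hat:.2f}")
```

With 5,000 simulated answers the estimates should land close to the true values. On real data the fit can diverge if a question's answers are perfectly separable by score, so a production version would likely need regularization or a convergence check.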
Ability measures
• Exogenous measures
  – θ(u): hourly wage of candidate u after taking the test
  – Use wage data from oDesk
  – More robust to cheating
  – Evaluates importance of skills in the marketplace
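For the exogenous measure, the wage has to be put on an ability-like scale before a per-question logistic fit can use it. One plausible choice, which is our assumption rather than something the slide specifies, is to take the logarithm of the hourly wage and standardize it to mean 0 and unit variance. `wage_abilities` and the sample wages are hypothetical.

```python
import math
import statistics

def wage_abilities(hourly_wages):
    """Map each candidate's post-test hourly wage to a standardized ability score.

    Assumption: log-wages are z-scored so theta has mean 0 and unit spread,
    comparable to the 'normalized ability' axis used for the IRT curves.
    Requires at least two distinct wages (otherwise the spread is 0).
    """
    logs = {u: math.log(w) for u, w in hourly_wages.items()}
    values = list(logs.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {u: (x - mean) / sd for u, x in logs.items()}

wages = {"u1": 10.0, "u2": 20.0, "u3": 40.0}  # hypothetical USD/hour
thetas = wage_abilities(wages)
print(thetas)
```

Taking logs first keeps a few very high earners from dominating the scale; the resulting scores could be passed to the same per-question logistic fit used for the endogenous measure.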
STEP cost
• Using oDesk data
• Question cost
  – Static question bank licensing: $10 per question
  – STEP: $4 per question
  – Creating a question “from scratch” (IKM data): $25 per question
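A back-of-the-envelope comparison using the per-question costs above, for a hypothetical 100-question bank. It ignores recurring costs, such as refreshing a static bank as questions leak or go out of date.

```python
# Per-question costs quoted on the slide (oDesk data), in US dollars.
COST_PER_QUESTION = {
    "static bank licensing": 10,
    "STEP": 4,
    "from scratch (IKM data)": 25,
}

def bank_cost(num_questions):
    """Total cost of a question bank of the given size under each approach."""
    return {method: num_questions * cost for method, cost in COST_PER_QUESTION.items()}

print(bank_cost(100))
# {'static bank licensing': 1000, 'STEP': 400, 'from scratch (IKM data)': 2500}
```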
STEP performance
• Question quality (Java test example)
  – Static question bank: 87% acceptance rate
  – STEP-generated questions: 89% acceptance rate
[Chart: aggregate information gain, Static vs. STEP]
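The chart's data did not survive extraction, so the sketch below only shows how an aggregate measure can be computed: under the 2PL assumption, a test's information at ability θ is the sum of its questions' item informations. The two banks of (α, β) pairs are invented for illustration and do not represent the static or STEP tests.

```python
import math

def item_information(theta, alpha, beta):
    """Assumed 2PL item information: alpha^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return alpha ** 2 * p * (1 - p)

def aggregate_information(theta, items):
    """Test information at ability theta: sum over (alpha, beta) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)

# Hypothetical (discrimination, difficulty) pairs for two small banks.
bank_x = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]
bank_y = [(0.4, 3.0), (0.6, -2.0), (0.5, 0.5)]
for theta in (-2, -1, 0, 1, 2):
    print(f"theta={theta:+d}  X={aggregate_information(theta, bank_x):.3f}"
          f"  Y={aggregate_information(theta, bank_y):.3f}")
```

Summing or averaging this curve over a range of abilities gives a single number per test, which is one way to compare two question banks.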
STEP
• System that continuously generates new questions
• Makes tests more cheating-proof
• Assesses test quality with real-market performance data
• Identifies potential errors or ambiguities
• Is of equal or higher quality than existing tests
• Cheaper to generate questions than to license them
What would the ability to find and engage experts allow you to do?