one does not simply crowdsource the semantic web
TRANSCRIPT
1
ONE DOES NOT SIMPLY CROWDSOURCE THE SEMANTIC WEB
TECHNOLOGY DESIGN AND INCENTIVES
Elena [email protected] @esimperl
January 26th, 2016
2
CROWDSOURCING
PROBLEM SOLVING VIA OPEN CALLS
“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”
[Howe, 2006]
3
THE SEMANTIC WEB
WEB OF DATA THAT CAN BE PROCESSED BY MACHINES
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”
[W3C, 2011]
4
MAKING THE SEMANTIC WEB HUMANLY POSSIBLE
Crowdsourcing is increasingly used to help algorithms solve Semantic Web problems
Great challenges
  How to run a crowdsourcing project effectively?
  Which form of crowdsourcing for which task?
  How to combine crowd and machine intelligence?
  How to encourage participation?
5
DESIGNING CROWDSOURCING PROJECTS
6
DIFFERENT FORMS AND PLATFORMS TO CHOOSE FROM
Macrotasks
Microtasks
Challenges
Self-organized crowds
Crowdfunding
Source: [Prpić et al., 2015]
MANY QUESTIONS TO ANSWER
TASK DESIGN
WORKFLOW DESIGN AND EXECUTION
TASK INTERFACES
QUALITY ASSURANCE
TASK ASSIGNMENT
CROWD TRAINING AND FEEDBACK
INCENTIVES ENGINEERING
COLLABORATION, COMPETITION, SELF-ORGANIZATION
REAL-TIME DELIVERY
NICHESOURCING
EXTENSIONS TO TECHNOLOGIES
SOCIAL MACHINES ENGINEERING
8
SOME ANSWERS
IMPROVING PAID MICROTASKS @WWW15
Compared the effectiveness of microtasks on CrowdFlower vs. a self-developed game
Image labelling on ESP data set as gold standard
Evaluated accuracy, #labels, cost per label, avg/max #labels/contributor
For three types of tasks
  Nano: 1 image
  Micro: 11 images
  Small: up to 2000 images
Probabilistic reasoning to personalize furtherance incentives
Findings
  Gamification and payments work well together
  Furtherance incentives particularly interesting for top contributors
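The evaluation measures named on this slide (accuracy against a gold standard, number of labels, cost per label, average and maximum labels per contributor) can be sketched in a few lines of Python. This is only an illustration: the function name `evaluate_labels`, the tuple layout of the submissions, and the toy data are assumptions and do not come from the talk or the underlying study.

```python
from collections import Counter


def evaluate_labels(submissions, gold, total_cost):
    """Compute simple evaluation measures for an image-labelling run.

    submissions: list of (contributor_id, image_id, label) tuples (assumed layout).
    gold: dict mapping image_id to the set of labels accepted as correct.
    total_cost: total amount paid for the run, in any currency unit.
    """
    if not submissions:
        raise ValueError("no submissions to evaluate")

    # A label counts as correct if it appears in the gold set for its image.
    correct = sum(
        1 for _, image, label in submissions if label in gold.get(image, set())
    )
    # Number of labels contributed by each contributor.
    per_contributor = Counter(contributor for contributor, _, _ in submissions)
    n = len(submissions)

    return {
        "accuracy": correct / n,
        "num_labels": n,
        "cost_per_label": total_cost / n,
        "avg_labels_per_contributor": n / len(per_contributor),
        "max_labels_per_contributor": max(per_contributor.values()),
    }


# Toy data, invented for illustration only.
submissions = [
    ("w1", "img1", "dog"),
    ("w1", "img2", "car"),
    ("w2", "img1", "cat"),
    ("w2", "img2", "car"),
]
gold = {"img1": {"dog", "puppy"}, "img2": {"car"}}
report = evaluate_labels(submissions, gold, total_cost=0.40)
```

Running the same function once per platform or task size (nano, micro, small) would give directly comparable figures, under the assumption that both settings are scored against the same gold standard.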
HYBRID NER ON TWITTER @ESWC15
Identified content and crowd factors that impact effectiveness
Findings
  Shorter tweets with fewer entities work better
  Crowd is more familiar with people and places from recent news
  MISC as a NER category is sometimes confusing, but useful to identify partial and implicitly named entities
[Figure: factors studied: #entities in post, types of entities, content sentiment, skipped TP posts, avg. time/task, UI interaction]
11
CROWD-EMPOWERED SPARQL QUERIES @KCAP2015
A hybrid machine/human SPARQL query engine that enhances query answers
Uses a novel RDF completeness model to identify portions of a query with missing values
Resorts to microtask crowdsourcing to resolve the missing values
Evaluated # of answers, delivery time, and accuracy
50 queries against DBpedia in five domains: History, Life Sciences, Movies, Music, and Sports
Findings
  Size of query answer set increased on avg. 3.13 times
  12 minutes to get 98% of all answers
  Accuracy between 84% and 96%
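The general idea of sending missing values to the crowd and merging the answers back can be sketched as follows. This is a simplified stand-in, not the engine presented in the talk: it does not implement the RDF completeness model, it simply treats a `None` binding as missing, and it resolves crowd answers by majority vote. All function names, the data layout, and the sample values are hypothetical.

```python
from collections import Counter


def find_missing(bindings, variables):
    """Return (row_index, variable) pairs whose value is missing (None)."""
    return [
        (i, var)
        for i, row in enumerate(bindings)
        for var in variables
        if row.get(var) is None
    ]


def majority_vote(answers, min_agreement=0.5):
    """Return the most frequent answer if its share exceeds min_agreement."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) > min_agreement else None


def fill_with_crowd(bindings, variables, crowd_answers):
    """Copy the result rows and fill gaps where the crowd agrees.

    crowd_answers maps (row_index, variable) to the list of worker answers
    collected for that microtask (assumed layout).
    """
    filled = [dict(row) for row in bindings]
    for i, var in find_missing(bindings, variables):
        resolved = majority_vote(crowd_answers.get((i, var), []))
        if resolved is not None:
            filled[i][var] = resolved
    return filled


# Toy query result with placeholder values, invented for illustration.
bindings = [
    {"film": "Film A", "director": "Person X"},
    {"film": "Film B", "director": None},
    {"film": "Film C", "director": None},
]
crowd_answers = {
    (1, "director"): ["Person Y", "Person Y", "Person Z"],
    (2, "director"): ["Person P", "Person Q"],
}
result = fill_with_crowd(bindings, ["film", "director"], crowd_answers)
```

In this sketch the second row is filled because two of three workers agree, while the third row stays empty because the workers disagree. A real system would need a principled way to decide which gaps are worth asking about, which is the role the slide assigns to the completeness model.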
12
OPEN QUESTIONS
13
NOT CROWDSOURCING AS USUAL
Knowledge-intensive tasks
Structured, interlinked content
Content meant for machine consumption
Scale, shape, and quality of the data
Context is critical
Open-set answers
14
FUNDAMENTAL CHALLENGES
SCALE
  No ‘Big Crowd’
TIME
  From one-off and short-term to mid and long-term
SCOPE
  Problems technology cannot solve
15
PATHWAYS TO SOLUTIONS
SCALE
  Aligning incentives
  Better reuse of crowd outputs
TIME
  Sustaining engagement
  Building relationships
  Better integration with algorithms
SCOPE
  New problems and problem solving paradigms
  Novel human-computer interaction designs
16
[email protected] @esimperl