one does not simply crowdsource the semantic web


ONE DOES NOT SIMPLY CROWDSOURCE THE SEMANTIC WEB: TECHNOLOGY DESIGN AND INCENTIVES

Elena [email protected] @esimperl
January 26th, 2016


CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS

“Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”

[Howe, 2006]


THE SEMANTIC WEB: WEB OF DATA THAT CAN BE PROCESSED BY MACHINES

“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”

[W3C, 2011]


MAKING THE SEMANTIC WEB HUMANLY POSSIBLE

Crowdsourcing is increasingly used to help algorithms solve Semantic Web problems.

Great challenges:
How to run a crowdsourcing project effectively?
Which form of crowdsourcing for which task?
How to combine crowd and machine intelligence?
How to encourage participation?


DESIGNING CROWDSOURCING PROJECTS


DIFFERENT FORMS AND PLATFORMS TO CHOOSE FROM

Macrotasks
Microtasks
Challenges
Self-organized crowds
Crowdfunding

Source: [Prpić et al., 2015]

MANY QUESTIONS TO ANSWER

TASK DESIGN
WORKFLOW DESIGN AND EXECUTION
TASK INTERFACES
QUALITY ASSURANCE
TASK ASSIGNMENT
CROWD TRAINING AND FEEDBACK
INCENTIVES ENGINEERING
COLLABORATION, COMPETITION, SELF-ORGANIZATION
REAL-TIME DELIVERY
NICHESOURCING
EXTENSIONS TO TECHNOLOGIES
SOCIAL MACHINES ENGINEERING


SOME ANSWERS

IMPROVING PAID MICROTASKS @WWW15

Compared the effectiveness of microtasks on CrowdFlower vs. a self-developed game
Image labelling, with the ESP data set as gold standard
Evaluated accuracy, number of labels, cost per label, and average/maximum number of labels per contributor
For three types of tasks:
Nano: 1 image
Micro: 11 images
Small: up to 2000 images
Probabilistic reasoning to personalize furtherance incentives

Findings:
Gamification and payments work well together
Furtherance incentives are particularly interesting for top contributors
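As a minimal sketch of how the four evaluation measures listed above could be computed for one task type, the snippet below assumes each label is a (contributor, image, tag) record and that the gold standard maps each image to a set of acceptable tags. The `Label` record, the `evaluate` function and the input formats are hypothetical; the slide does not describe the authors' actual evaluation code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Label:
    """One tag submitted by one contributor for one image (hypothetical schema)."""
    contributor: str
    image_id: str
    tag: str


def evaluate(labels, gold, total_cost):
    """Compute accuracy, #labels, cost per label and avg/max labels per contributor.

    labels: list of Label records for a single task type (nano, micro or small)
    gold: dict mapping image_id -> set of acceptable tags (gold-standard style input)
    total_cost: total amount paid for the task, in any currency unit
    """
    n_labels = len(labels)
    # A label counts as correct if its tag appears among the gold tags for that image.
    correct = sum(1 for lab in labels if lab.tag in gold.get(lab.image_id, set()))
    # Number of labels produced by each contributor.
    per_contributor = Counter(lab.contributor for lab in labels)
    return {
        "accuracy": correct / n_labels if n_labels else 0.0,
        "num_labels": n_labels,
        "cost_per_label": total_cost / n_labels if n_labels else float("nan"),
        "avg_labels_per_contributor": (
            n_labels / len(per_contributor) if per_contributor else 0.0
        ),
        "max_labels_per_contributor": max(per_contributor.values(), default=0),
    }
```

Running the same function separately on the game data and on the paid-microtask data would give directly comparable figures for the two settings.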

HYBRID NER ON TWITTER @ESWC15

Identified content and crowd factors that impact effectiveness

Findings:
Shorter tweets with fewer entities work better
The crowd is more familiar with people and places from recent news
MISC as an NER category is sometimes confusing, but useful to identify partial and implicitly named entities

[Figure: factors examined: #entities in post, types of entities, content sentiment, skipped, TP posts, avg. time/task, UI interaction]
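The slide does not say how individual crowd answers were combined into a final NER label. Purely for illustration, the sketch below assumes a common baseline, per-token majority voting over the categories PER, LOC, ORG, MISC and O, and flags tokens without a clear majority so they could be routed to more workers or to a machine tagger. The function name `aggregate_ner`, the `min_agreement` parameter and the `"UNSURE"` marker are hypothetical and not the method used in the ESWC 2015 study.

```python
from collections import Counter


def aggregate_ner(annotations, min_agreement=0.5):
    """Majority-vote aggregation of crowd NER labels (illustrative baseline only).

    annotations: list of per-worker label sequences, one label per token,
                 e.g. ["PER", "O", "LOC"]; all sequences must have equal length.
    min_agreement: a label is accepted only if strictly more than this fraction
                   of workers chose it; otherwise the token is marked "UNSURE".
    """
    if not annotations:
        return []
    n_tokens = len(annotations[0])
    if any(len(worker) != n_tokens for worker in annotations):
        raise ValueError("all workers must label the same tokens")
    result = []
    for i in range(n_tokens):
        # Count how many workers assigned each label to token i.
        votes = Counter(worker[i] for worker in annotations)
        label, count = votes.most_common(1)[0]
        result.append(label if count / len(annotations) > min_agreement else "UNSURE")
    return result
```

For a tweet tokenized as "Obama visits Paris", three workers answering PER/O/LOC, PER/O/LOC and PER/O/MISC would yield PER, O, LOC, while a two-worker MISC vs. ORG split on a token would be left as UNSURE.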


CROWD-EMPOWERED SPARQL QUERIES @KCAP2015

A hybrid machine/human SPARQL query engine that enhances query answers
Uses a novel RDF completeness model to identify portions of a query with missing values
Resorts to microtask crowdsourcing to resolve the missing values
Evaluated number of answers, delivery time, and accuracy
50 queries against DBpedia in five domains: History, Life Sciences, Movies, Music, and Sports

Findings:
Size of the query answer set increased on average 3.13 times
12 minutes to get 98% of all answers
Accuracy between 84% and 96%
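The control flow described above (answer from the data where possible, consult a completeness model, and only then ask the crowd) can be sketched for a single triple pattern of the form (subject, predicate, ?o). This is a toy simplification under loud assumptions, not the KCAP 2015 engine: the completeness model is reduced to a hypothetical set of predicates for which the data is declared complete, and `ask_crowd` is a hypothetical callback standing in for posting a microtask. The function name `answer_pattern` and the example identifiers are illustrative only.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def answer_pattern(
    subject: str,
    predicate: str,
    triples: Iterable[Triple],
    complete_predicates: Set[str],
    ask_crowd: Callable[[str, str], List[str]],
) -> Tuple[List[str], str]:
    """Resolve the pattern (subject, predicate, ?o) machine-first, crowd-second.

    complete_predicates: hypothetical stand-in for an RDF completeness model;
        for these predicates an empty result is trusted as a true "no value".
    ask_crowd: hypothetical callback that would post a microtask and return
        the crowd's answers for the missing value(s).
    Returns (answers, source), where source is "data", "complete-empty" or "crowd".
    """
    # 1. Try to answer from the dataset itself.
    found = [o for s, p, o in triples if s == subject and p == predicate]
    if found:
        return found, "data"
    # 2. Data declared complete for this predicate: the empty answer is final.
    if predicate in complete_predicates:
        return [], "complete-empty"
    # 3. Value is potentially missing: resort to microtask crowdsourcing.
    return ask_crowd(subject, predicate), "crowd"
```

Trusting empty results for predicates marked complete is what keeps such a hybrid engine from paying the crowd for values that genuinely do not exist; only the portions flagged as potentially incomplete reach the crowd.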


OPEN QUESTIONS


NOT CROWDSOURCING AS USUAL

Knowledge-intensive tasks
Structured, interlinked content
Content meant for machine consumption
Scale, shape, and quality of the data
Context is critical
Open-set answers


FUNDAMENTAL CHALLENGES

SCALE: No ‘Big Crowd’
TIME: From one-off and short-term to mid- and long-term
SCOPE: Problems technology cannot solve


PATHWAYS TO SOLUTIONS

SCALE: Aligning incentives; better reuse of crowd outputs
TIME: Sustaining engagement; building relationships; better integration with algorithms
SCOPE: New problems and problem-solving paradigms; novel human-computer interaction designs


[email protected] @esimperl
