
Page 1: Crowdsourcing Linked Data management

HUMAN COMPUTATION IN THE LINKED DATA MANAGEMENT LIFE CYCLE
ELENA SIMPERL, UNIVERSITY OF SOUTHAMPTON
1st PRELIDA Workshop, 7/18/2013

Page 2: Crowdsourcing Linked Data management

HUMAN COMPUTATION

Outsourcing to humans tasks that machines find difficult to solve, for reasons of accuracy, efficiency, or cost.

Page 3: Crowdsourcing Linked Data management

SEMANTIC TECHNOLOGIES ARE ALL ABOUT AUTOMATION

…but many tasks rely on human input

• Modeling a domain
• Integrating data sources originating from different contexts
• Producing semantic markup for various types of digital artifacts
• ...


Page 4: Crowdsourcing Linked Data management

DIMENSIONS OF HUMAN COMPUTATION SYSTEMS

• What: tasks that require basic human skills
• How: distribution, coordination, aggregation
• Quality: closed vs. open answers; ground truth; quantitative vs. qualitative evaluation; who is the evaluator?
• Optimize! Incentives; reducing problem size; task assignment

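A minimal sketch of how the "how" and "quality" dimensions interact in practice: each unit of work is distributed redundantly to several workers, and their answers are aggregated by majority vote. All task and worker names are hypothetical.

```python
from collections import Counter

def assign(units, workers, redundancy=3):
    """Distribute each unit to `redundancy` distinct workers, round robin.
    Assumes redundancy <= len(workers)."""
    plan = {}
    for i, unit in enumerate(units):
        plan[unit] = [workers[(i + k) % len(workers)] for k in range(redundancy)]
    return plan

def aggregate(answers):
    """Majority vote over redundant answers; agreement rate as a confidence proxy."""
    label, votes = Counter(answers).most_common(1)[0]
    return label, votes / len(answers)

# Hypothetical units and workers.
print(assign(["task-1", "task-2"], ["ann", "bob", "cat"]))
print(aggregate(["yes", "yes", "no"]))  # ('yes', 0.666...)
```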

Page 5: Crowdsourcing Linked Data management

GAMES WITH A PURPOSE (GWAP)

Human computation disguised as casual games. Tasks are divided into parallelizable atomic units (challenges) that are solved (consensually) by players.

Game models:
• Single- vs. multi-player
• Selection-agreement vs. input-agreement vs. inversion-problem games

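A sketch of the consensus check behind a selection-agreement game: two players see the same challenge independently, and a label is accepted only if both select it. Challenge and label names are hypothetical.

```python
def selection_agreement(challenge_id, answers_a, answers_b):
    """Keep only labels both players selected independently; the matched
    labels become the (consensual) output of the challenge."""
    agreed = set(answers_a) & set(answers_b)
    return {challenge_id: sorted(agreed)} if agreed else {}

# Hypothetical round: two players tag the same image.
print(selection_agreement("image-42", {"cat", "pet"}, {"cat", "animal"}))
# {'image-42': ['cat']}
```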

Page 6: Crowdsourcing Linked Data management

MICROTASK CROWDSOURCING

Similar types of tasks, but a different incentive model (monetary reward, PPP). Successfully applied to transcription, classification, content generation, data collection, image tagging, website feedback, usability tests…

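A sketch, with hypothetical triples, of how a verification job can be decomposed into microtask units; a CSV batch of this shape is the kind of input microtask platforms typically accept.

```python
import csv

# Hypothetical triples whose correctness the crowd should verify.
triples = [
    ("dbpedia:Berlin", "dbo:country", "dbpedia:Germany"),
    ("dbpedia:London", "dbo:country", "dbpedia:France"),
]

# One row per microtask: each worker answers one independent question.
with open("microtasks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject", "predicate", "object", "question"])
    for s, p, o in triples:
        writer.writerow([s, p, o, f"Is the statement '{s} {p} {o}' correct? (yes/no)"])
```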

Page 7: Crowdsourcing Linked Data management

THE SAME, BUT DIFFERENT

• Tasks leveraging common human skills, appealing to large audiences
• Selection of domain and task is more constrained in games, to create a typical UX
• Tasks decomposed into smaller units of work to be solved independently
  • Complex workflows
• Creating a casual game experience vs. patterns in microtasks
• Quality assurance
  • Synchronous interaction in games
  • Levels of difficulty and near-real-time feedback in games
  • Many methods applied in both cases (redundancy, votes, statistical techniques)
• Different sets of incentives and motivators


Page 8: Crowdsourcing Linked Data management

HYBRID SYSTEMS

[Diagram (credit: Dave Robertson): the physical world (people and devices) and the virtual world (a network of social interactions), connected through design and composition, participation and data supply, and a model of social interaction.]

Page 9: Crowdsourcing Linked Data management


EXAMPLE: HYBRID DATA INTEGRATION

Two source tables to integrate:

  paper              conf
  Data integration   VLDB-01
  Data mining        SIGMOD-02

  title          author   email    venue
  OLAP           Mike     mike@a   ICDE-02
  Social media   Jane     jane@b   PODS-05

Generate plausible matches:
  – paper = title, paper = author, paper = email, paper = venue
  – conf = title, conf = author, conf = email, conf = venue

Ask users to verify, e.g.: "Does attribute paper match attribute author?" (Yes / No)

[McCann, Shen, Doan, ICDE 2008]
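A sketch of this generate-then-verify loop, using the attribute names from the slide; the crowd answers shown are hypothetical, and a real matcher would score and rank the candidates rather than ask about all of them.

```python
from itertools import product

source_a = ["paper", "conf"]
source_b = ["title", "author", "email", "venue"]

# Step 1: generate all plausible attribute matches (the cross product, as on the slide).
candidates = list(product(source_a, source_b))

# Step 2: turn each candidate into a yes/no question for users.
questions = {pair: f"Does attribute {pair[0]} match attribute {pair[1]}?"
             for pair in candidates}

# Step 3: keep only the matches users confirmed (answers hypothetical).
answers = {("paper", "title"): "yes", ("conf", "venue"): "yes"}
verified = [pair for pair in candidates if answers.get(pair) == "yes"]
print(verified)  # [('paper', 'title'), ('conf', 'venue')]
```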

Page 10: Crowdsourcing Linked Data management

EXAMPLES FROM THE LINKED DATA WORLD

ELENA SIMPERL

UNIVERSITY OF SOUTHAMPTON, UK


Page 11: Crowdsourcing Linked Data management

WHAT IS DIFFERENT ABOUT SEMANTIC SYSTEMS?

Semantic Web tools vs. applications

• Intelligent (specialized) Web sites (portals) with improved (local) search based on vocabularies and ontologies

• X2X integration (often combined with Web services)

• Knowledge representation, communication and exchange


Page 12: Crowdsourcing Linked Data management

TASKS NAMED IN METHODOLOGIES ARE TOO HIGH-LEVEL

Crowdsource very specific tasks that are (highly) divisible:

• Labeling (in different languages)
• Finding relationships
• Populating the ontology
• Aligning and interlinking
• Ontology-based annotation
• Validating the results of automatic methods
• …

Think about the context of the application (its social structure) and about how to hide tasks behind existing practices and tools.

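To illustrate the divisibility point: a coarse, methodology-level task like "label the ontology in different languages" decomposes into one atomic unit per (concept, language) pair. Concept names here are hypothetical.

```python
# Hypothetical ontology concepts and target languages.
concepts = ["ex:Restaurant", "ex:Cuisine"]
languages = ["de", "fr", "es"]

# Each (concept, language) pair is one atomic, independently solvable unit.
microtasks = [
    {"concept": c, "lang": lang, "instruction": f"Provide a label in '{lang}' for {c}"}
    for c in concepts
    for lang in languages
]
print(len(microtasks), "units from", len(concepts), "concepts")  # 6 units from 2 concepts
```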

Page 13: Crowdsourcing Linked Data management

TASTE IT! TRY IT!

• Restaurant review Android app developed in the INSEMTIVES project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study: 2,274 reviews by 180 reviewers, referring to 900 restaurants and using 5,667 DBpedia concepts

https://play.google.com/store/apps/details?id=insemtives.android&hl=en

[Chart: number of reviews and number of semantic annotations (type of cuisine; dishes) per venue type (cafe, fast food, pub, restaurant), on a 0–2,500 scale.]
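A sketch (not the app's actual code) of how such an app might pull DBpedia concepts, here cuisine types, to offer as structured review tags, via the public DBpedia SPARQL endpoint.

```python
# Requires: pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT DISTINCT ?cuisine WHERE { ?r dbo:cuisine ?cuisine } LIMIT 20
""")
sparql.setReturnFormat(JSON)

# Each binding is a DBpedia concept the app could offer as a structured review tag.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cuisine"]["value"])
```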

Page 14: Crowdsourcing Linked Data management

LODREFINE

http://research.zemanta.com/crowds-to-the-rescue/

Page 15: Crowdsourcing Linked Data management

DBPEDIA CURATION

http://aksw.org/Projects/TripleCheckMate.html

Page 16: Crowdsourcing Linked Data management

CROWDMAP

Experiments using MTurk, CrowdFlower, and established benchmarks. Enhancing the results of automatic techniques: fast, accurate, cost-effective. [Sarasua, Simperl, Noy, ISWC 2012]

Benchmark              Precision   Recall
CartP 301-304          0.53        1.0
100R50P Edas-Iasted    0.8         0.42
100R50P Ekaw-Iasted    1.0         0.7
100R50P Cmt-Ekaw       1.0         0.75
100R50P ConfOf-Ekaw    0.93        0.65
Imp 301-304            0.73        1.0
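For reference, a sketch of how precision and recall figures like these are computed against a benchmark alignment; the correspondence sets below are hypothetical.

```python
def precision_recall(found, reference):
    """precision = correct found / all found; recall = correct found / all reference."""
    correct = found & reference
    return len(correct) / len(found), len(correct) / len(reference)

# Hypothetical correspondences: (source attribute, target attribute) pairs.
found = {("paper", "title"), ("conf", "venue"), ("conf", "author")}
reference = {("paper", "title"), ("conf", "venue")}
print(precision_recall(found, reference))  # (0.666..., 1.0)
```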

Page 17: Crowdsourcing Linked Data management

ONTOLOGY POPULATION


Page 18: Crowdsourcing Linked Data management

LINKED DATA CURATION


Page 19: Crowdsourcing Linked Data management

PROBLEMS AND CHALLENGES

• What is feasible, and how can tasks be optimally translated into microtasks?
  • Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
• What to show to users
  • Natural-language descriptions of Linked Data/SPARQL
  • How much context
  • What form of rendering
  • What about links?
• How to combine with automatic tools
  • Which results to validate
    • Low precision (no fun for gamers...)
    • Low recall (vs. all possible questions)
• How to embed it into an existing application
  • Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
• What to do with the resulting data?
  • Integration into existing practices
  • Vocabularies!

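The "what to show to users" challenge admits a simple template-based sketch for rendering a triple as a natural-language question; the labels are hypothetical, and real systems would use per-property templates or grammar-based verbalizers rather than this naive fallback.

```python
def verbalize(subject_label, predicate_label, object_label):
    """Naive template rendering of an RDF triple as a yes/no worker question."""
    return f"Is it true that the {predicate_label} of {subject_label} is {object_label}?"

# Hypothetical labels, e.g. resolved via rdfs:label lookups.
print(verbalize("Berlin", "country", "Germany"))
# -> Is it true that the country of Berlin is Germany?
```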