Web Science & Technologies University of Koblenz Landau, Germany Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts Cristina Sarasua [email protected] Computational Social Science workshop Köln, 16.12.2013


Page 1: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Web Science & Technologies

University of Koblenz ▪ Landau, Germany

Exploring the challenge of linking scientific publications

and studies with crowd workers instead of domain experts

Cristina Sarasua

[email protected]

Computational Social Science workshop

Köln, 16.12.2013

Page 2: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

WeST, Cristina Sarasua: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Ideal workflow

Peter Schumacher (social scientist) would like to analyse the voting patterns of Germans in the last 20 years

Past observations

New analysis, new findings

1. Read publications
2. Access data
3. Reuse data

Page 3: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Reality

Publications and research data (coming from surveys and studies) are published independently

The link between them is missing

Researchers cannot easily access the research data

Page 4: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Scenario

We need a method to process publications and studies in order to be able to

1. Find references to studies inside publications
2. Identify which publication is connected to which study
3. Identify the type of relation between publication and study

[Diagram labels: publications; research data (studies)]
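
Task 1 (finding references to studies inside publications) could be approximated by simple name matching. The sketch below is only an illustration under stated assumptions: the study catalogue, the identifiers and the sample sentence are invented, and this is not the method used by the software in the case study.

```python
import re

# Hypothetical study catalogue: identifier -> known names (invented for illustration).
STUDIES = {
    "study-001": ["German Election Panel 2009", "GEP 2009"],
    "study-002": ["Household Attitudes Survey"],
}


def find_study_references(text, studies=STUDIES):
    """Return (study_id, matched_name, character_offset) for every study name found in text."""
    hits = []
    for study_id, names in studies.items():
        for name in names:
            # Word boundaries keep a short name from matching inside a longer token.
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((study_id, match.group(0), match.start()))
    # Report candidates in reading order.
    return sorted(hits, key=lambda hit: hit[2])


publication = "We analyse turnout using the German Election Panel 2009 (GEP 2009)."
candidates = find_study_references(publication)
```

Matching of this kind only produces candidate links; deciding whether a candidate is correct, and which type of relation it expresses, is left to a later step.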

Page 5: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Problem

Computers cannot perform these 3 tasks automatically with perfect accuracy

We need human intervention

Domain experts are often not available for this kind of task

[Figure: incorrect link between a publication and a study]

Page 6: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Solution: Crowdsourcing

“The process of outsourcing a task to a (potentially) large and undefined group of people in an open call” (Jeff Howe, 2006)

Microtask crowdsourcing

-Simple and independent tasks

-Paid crowdsourcing

-Online labor marketplaces (e.g. MTurk)

Page 7: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Amazon Mechanical Turk

Page 8: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Crowdsourced interlinking: the GESIS case study

Hybrid solution

1) Automatic processing of publications and studies

2) Ask crowd workers to review links

- Correct errors

- Identify primary literature / secondary literature

3) Generate Linked Data

[Diagram labels: SSOAR, da|ra, InfoLink, CrowdLINK, links, corrected links, Publications, Research data, Web portal, Researcher; steps numbered 1 to 3]
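
Step 3 of the hybrid solution (generating Linked Data from the reviewed links) could look roughly like the following. This is a minimal sketch: the `example.org` URIs and the `usesDataFrom` predicate are hypothetical placeholders, not the vocabulary used in the case study. Each accepted link is written as one N-Triples statement.

```python
# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"


def link_to_ntriple(publication_uri, study_uri, predicate_uri):
    """Serialize one publication-study link as a single N-Triples statement."""
    return f"<{publication_uri}> <{predicate_uri}> <{study_uri}> ."


# Invented review outcomes: the crowd accepted the first link and rejected the second.
reviewed_links = [
    {"publication": EX + "publication/42", "study": EX + "study/7", "accepted": True},
    {"publication": EX + "publication/43", "study": EX + "study/7", "accepted": False},
]

# Only links that survived the review become Linked Data.
triples = [
    link_to_ntriple(link["publication"], link["study"], EX + "vocab/usesDataFrom")
    for link in reviewed_links
    if link["accepted"]
]
```

A separate predicate per relation type (for example primary versus secondary literature) would fit the same pattern.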

Page 9: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

How is this related to CSS?

Page 10: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

On the one hand …

The GESIS case study

In collaboration with GESIS colleagues Katarina Boland, Daniel Hienert et al.

Page 11: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

On the other hand …

How to manage such a group of people to maximize their efficiency and make them happy?

Page 12: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Page 13: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Chart: Ipeirotis, 2010

Different background

Open call

We can impose some restrictions (e.g. language, country, reputation gained)

Spam

Charts: Ross et al., 2010

Different motivations

Different behaviour

CrowdFlower 11.12.2013

Page 14: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

The tasks at hand

They are not the “most exciting tasks” in the world

The data is in German

The domain is very specific

Page 15: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

First experiments of the GESIS case study

Adopted measures

Used majority voting

Included verification questions (e.g. “please type the date shown for the publication”)

Defined gold standard links to check who could be trusted
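
Two of the measures above, gold standard links and majority voting, can be combined as in the sketch below. It rests on assumptions: the 0.7 accuracy threshold, the worker and link identifiers and all judgments are invented, and the slides do not state how trust was actually computed.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose answers on gold links are correct at least min_accuracy of the time."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, link, answer in judgments:
        if link in gold:
            seen[worker] += 1
            correct[worker] += int(answer == gold[link])
    return {worker for worker in seen if correct[worker] / seen[worker] >= min_accuracy}


def majority_vote(judgments, trusted):
    """Aggregate the answers of trusted workers per link; ties are left undecided."""
    votes = defaultdict(Counter)
    for worker, link, answer in judgments:
        if worker in trusted:
            votes[link][answer] += 1
    decisions = {}
    for link, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            decisions[link] = ranked[0][0]
    return decisions


# Invented data: g1 and g2 are gold links, l1 is a link under review.
gold = {"g1": True, "g2": False}
judgments = [
    ("w1", "g1", True), ("w1", "g2", False), ("w1", "l1", True),
    ("w2", "g1", True), ("w2", "g2", False), ("w2", "l1", True),
    ("w3", "g1", False), ("w3", "g2", True), ("w3", "l1", False),
    ("w4", "g1", True), ("w4", "g2", False), ("w4", "l1", False),
]

trusted = trusted_workers(judgments, gold)
decisions = majority_vote(judgments, trusted)
```

Here worker w3 fails both gold links and is ignored, so link l1 is decided by the remaining three workers.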

Highlights of findings

We managed to get trusted workers quite quickly (e.g. 490 links reviewed in ~24 hours) and were able to improve the precision of the automatic software without losing considerable recall

The cases which required background knowledge showed worse results

The task of “relating publication and study” was solved with much better recall than the task of deciding “whether a publication is primaryLiterature of a study or not”. The precision was very high, though.
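
For reference, precision and recall of a set of links against a reference set can be computed as sketched below. All links and numbers in the example are invented to illustrate the definitions; they are not results from the experiments.

```python
def precision_recall(predicted, reference):
    """Compute precision and recall of predicted links against reference links."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Invented (publication, study) links.
reference = {("p1", "s1"), ("p2", "s1"), ("p3", "s2"), ("p4", "s2")}
automatic = {("p1", "s1"), ("p2", "s1"), ("p3", "s2"), ("p5", "s9"), ("p6", "s9")}
# Suppose the crowd review removed the two wrong links and kept the correct ones.
after_review = {("p1", "s1"), ("p2", "s1"), ("p3", "s2")}

before = precision_recall(automatic, reference)
after = precision_recall(after_review, reference)
```

In this toy case the review raises precision while recall stays the same, because only wrong links were removed.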

Page 16: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Ongoing research work

Can we improve their results by including mixed incentives? Not only money, but also competition at a microtask level, e.g. “there are only X links left, be quick!” or “there are three workers who were faster in reviewing links!”

How can we better instruct crowd workers in 1) the type of tasks we are running and 2) the domain we are working with?

Page 17: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Take-home message

We can employ crowd workers for connecting scientific publications and studies in the social sciences. This can improve automatically generated links.

How can we transfer the knowledge of domain experts to the crowd?

Page 18: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Call for discussion

Who?

1. Psychologists

2. Social Scientists

3. Computer scientists

Possible topics

Any feedback about the aforementioned ideas

Well-established methodologies in psychology to instruct or train a large group of people

Any suggestion on how to analyse crowd workers (i.e. criteria)

Page 19: Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts

Thank you.

Vielen Dank.