Human Computation and Crowdsourcing for Information Systems. Marta Sabou, Vienna University of Technology, Institute of Software Technology and Interactive Systems. 13th of December 2016, Vienna, Austria.

Upload: marta-sabou

Post on 13-Apr-2017

Page 1: Human Computation and Crowdsourcing for Information Systems

Human Computation and Crowdsourcing for Information Systems

Marta Sabou

Vienna University of Technology, Institute of Software Technology and Interactive Systems

13th of December 2016, Vienna, Austria

Page 2: Human Computation and Crowdsourcing for Information Systems

What is Human Computation and Crowdsourcing (HC&C)?

How are HC&C used in NLP, Semantic Web and Software Engineering?

What could be the impact of HC&C on Information Systems?

Page 3: Human Computation and Crowdsourcing for Information Systems

Human Computation: computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves

Marta Sabou | CONFENIS| 13.12.16 | Vienna

100 million people a day

13 million NYT articles; 2 million books per year

Source: L. von Ahn, B. Maurer, C. McMillen, D. Abraham, M. Blum, Science 321, 1465 (2008).

Page 4: Human Computation and Crowdsourcing for Information Systems

Human Computation and Crowdsourcing

Human Computation: A paradigm for utilizing human processing power to solve problems that computers cannot yet solve (von Ahn, 2005). Human computation replaces computers with humans.

Crowdsourcing: The act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call. Crowdsourcing replaces traditional human workers with members of the public.

Source: A. J. Quinn, B. B. Bederson. Human computation: a survey and taxonomy of a growing field. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). 1403-1412. 2011

Page 5: Human Computation and Crowdsourcing for Information Systems

Genre 1: Mechanised Labour

Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)

Page 6: Human Computation and Crowdsourcing for Information Systems

Genre 1: Mechanised Labour

Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)

Visual skills Language skills

Page 7: Human Computation and Crowdsourcing for Information Systems

Genre 1: Mechanised Labour

Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)

Page 8: Human Computation and Crowdsourcing for Information Systems

Genre 2: Games with a purpose

“Players took three weeks to solve the three dimensional structure of a simian retroviral protein that is used in animal models of HIV, but whose structure had eluded biochemists for more than a decade.”

Source: http://blogs.nature.com/spoonful/2012/04/foldit-games-next-play-crowdsourcing-better-drug-design.html

S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010.

Page 9: Human Computation and Crowdsourcing for Information Systems

Genre 3: Altruistic Crowdsourcing

• First year: 50 million classifications by 150,000 people

• By Oct 2016: 57 peer-reviewed papers published

Page 10: Human Computation and Crowdsourcing for Information Systems

HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.

How are HC&C used in NLP, Semantic Web and Software Engineering?

What could be the impact of HC&C on Information Systems?

Page 11: Human Computation and Crowdsourcing for Information Systems

Page 12: Human Computation and Crowdsourcing for Information Systems

Key Problem in NLP: Creation of Language Resources

Language Resources/Corpora: written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries

LR/corpora creation traditionally relies on a handful of well-trained experts:
• Time consuming
• Costly

Figure labels: Input, Process/Algorithm, Output, Evaluation; Training, Algorithm support, Testing

Page 13: Human Computation and Crowdsourcing for Information Systems

Crowdsourcing is revolutionising NLP research

Cheaper resource acquisition (affordable, large-scale resources)
• A variety of small-medium sized resources can be obtained with as little as $100 using AMT
• Crowdsourcing is also cost effective for large resources (Poesio, 2012)

Approach              $/label   1 M labels ($)
Traditional High Q.   1         1,000,000
Mechanical Turk       .38       380,000 (<40%)
Game                  .19       217,000 (20%)

Source: Sabou, M., Bontcheva, K., Scharl, A. Crowdsourcing Research Opportunities: Lessons from Natural Language Processing. iKNOW-2012.
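The percentages quoted next to the totals can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the 1 M label totals as printed on the slide rather than recomputing them from the per-label prices, and the names `TOTAL_COST_1M` and `relative_cost` are made up for this example.

```python
# Total cost (USD) of acquiring 1 million labels, as printed on the slide.
TOTAL_COST_1M = {
    "Traditional High Q.": 1_000_000,
    "Mechanical Turk": 380_000,
    "Game": 217_000,
}


def relative_cost(totals, baseline="Traditional High Q."):
    """Return each approach's total cost as a percentage of the baseline."""
    base = totals[baseline]
    return {name: 100.0 * cost / base for name, cost in totals.items()}


shares = relative_cost(TOTAL_COST_1M)
for name, share in shares.items():
    # One line per approach: its cost relative to expert annotation.
    print(f"{name}: {share:.1f}% of the traditional cost")
```

Mechanical Turk comes out at 38% of the traditional cost and the game at roughly 22%, in line with the "<40%" and "20%" annotations.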

Page 14: Human Computation and Crowdsourcing for Information Systems

Crowdsourcing is revolutionising NLP research

Diversification of the research agenda
• E.g., Urdu, Arabic, Haitian Creole; lexicons between English and 37 low-resourced languages (Irvine & Klementiev, 2010)
• Emails, Twitter feeds, augmented and alternative communication texts
• Speech: transcription, accent rating, assessment of dialog systems
• Sentiment detection, translation, word sense disambiguation, anaphora resolution, question answering, textual entailment, text summarization

Easier evaluation of algorithms

Source: Sabou, M., Bontcheva, K., Scharl, A. Crowdsourcing Research Opportunities: Lessons from Natural Language Processing. iKNOW-2012.

Page 15: Human Computation and Crowdsourcing for Information Systems

Novel Methodologies Needed

Methodology for crowdsourcing-based corpora creation:

1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus Delivery

Source: Sabou, M., Bontcheva, K., Derczynski, L. and Scharl, A. Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines. 9th Language Resources and Evaluation Conference (LREC-2014).

Page 16: Human Computation and Crowdsourcing for Information Systems

Challenge 1: Contributor Selection and Training

From: prior to resource creation
To: during the resource creation

Screening, Training, Profiling

Page 17: Human Computation and Crowdsourcing for Information Systems

Challenge 2: Aggregation and Quality Control

From: a few experts' annotations
To: multiple, noisy annotations from non-experts

Approach 1: Statistical techniques
• Simplest (and most popular): majority voting
• More complex: machine learning model trained on various features

Approach 2: Crowdsourcing the QC process itself
• HIT1 (Create): Translate the following sentence:
• HIT2 (Verify): Which of these 5 sentences is the best translation?
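Majority voting, the simplest statistical technique named above, can be sketched in a few lines. This is a minimal illustration, not the aggregation code used in any of the cited studies: the `(item_id, worker_id, label)` tuple layout, the function names and the sample judgements are all assumptions, and ties are reported as `None` so that more judgements can be collected.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None when the top labels tie."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no winner, e.g. collect more judgements
    return ranked[0][0]


def aggregate(annotations):
    """Map each item id to its winning label.

    `annotations` is a list of (item_id, worker_id, label) tuples.
    """
    by_item = {}
    for item_id, _worker_id, label in annotations:
        by_item.setdefault(item_id, []).append(label)
    return {item_id: majority_vote(labels) for item_id, labels in by_item.items()}


# Hypothetical sentiment judgements from non-expert workers on two tweets.
judgements = [
    ("tweet-1", "w1", "positive"),
    ("tweet-1", "w2", "positive"),
    ("tweet-1", "w3", "negative"),
    ("tweet-2", "w1", "negative"),
    ("tweet-2", "w2", "positive"),
]
result = aggregate(judgements)
print(result)
```

Requesting an odd number of judgements per item makes ties less likely for binary labels, which is one reason this technique is cheap to apply.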

Page 18: Human Computation and Crowdsourcing for Information Systems

Scientific American, May 2001:

Page 19: Human Computation and Crowdsourcing for Information Systems

Human Computation in the Semantic Web Life-Cycle

Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)

Problem: ontology creation is time-consuming and costly.

Page 20: Human Computation and Crowdsourcing for Information Systems

Context: Ontology Creation

Text Corpus

HC Task: (T3) Specification of relation type between concept pairs.

Coal Is a subcategory of Fossil Fuel
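A relation-specification task of this kind could be represented as a small data structure before being handed to a crowdsourcing platform. The sketch below is hypothetical: `RelationTask`, `build_tasks` and the wording of the question are made up for illustration, and only the option "is a subcategory of" comes from the slide, while the other answer options are assumptions.

```python
from dataclasses import dataclass

# Candidate relation types offered to workers. Only "is a subcategory of"
# comes from the slide; the other options are illustrative assumptions.
RELATION_OPTIONS = (
    "is a subcategory of",
    "is the same as",
    "is related to",
    "no relation",
)


@dataclass
class RelationTask:
    """One unit of work: pick the relation holding between two concepts."""

    source: str
    target: str
    options: tuple = RELATION_OPTIONS

    def question(self) -> str:
        return f"Which relation holds between '{self.source}' and '{self.target}'?"


def build_tasks(concept_pairs):
    """Turn (source, target) concept pairs into task units."""
    return [RelationTask(source, target) for source, target in concept_pairs]


tasks = build_tasks([("Coal", "Fossil Fuel")])
print(tasks[0].question())
print(tasks[0].options)
```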

Page 21: Human Computation and Crowdsourcing for Information Systems

How does MLab compare to GWAPs?

GWAP

MLab

Comparison Criteria:
• Cost
• Speed
• Quality

Source: M. Sabou, K. Bontcheva, A. Scharl,  M. Föls. 2013. Games with a Purpose or Mechanised Labour?: A Comparative Study. In i-Know '13.

Page 22: Human Computation and Crowdsourcing for Information Systems

GWAP: Climate Quiz

Source: A. Scharl, M. Sabou, M. Föls: Climate quiz: a web application for eliciting and validating knowledge from social networks. WebMedia 2012: 189-192

Page 23: Human Computation and Crowdsourcing for Information Systems

MLab using CrowdFlower

Page 24: Human Computation and Crowdsourcing for Information Systems

Experimental Study Results

Trade-offs: Cost; Timescale; Worker skills

Small, simple tasks, fast completion => MLab

Complex, large tasks, slower completion => GWAP

The two genres are highly complementary

Integration should be sought in Hybrid-genre workflows

Page 25: Human Computation and Crowdsourcing for Information Systems

Hybrid-genre workflows

Source: M. Sabou, A. Scharl, M. Föls. Crowdsourced Knowledge Acquisition: Towards Hybrid-Genre Workflows. Int. J. on Semantic Web and Information Systems (IJSWIS) 9(3):14-41, 2013

• Higher precision (an increase of 6%) than the GWAP-based workflow alone
• Players work on cleaner data and are therefore more motivated

Page 26: Human Computation and Crowdsourcing for Information Systems

How to embed Crowdsourcing into Ontology Engineering?

Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)

Page 27: Human Computation and Crowdsourcing for Information Systems

Evaluation

Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)

T1-T3

Human Computation can be embedded in ontology engineering workflows and leads to good quality ontologies obtained faster (ca. 40%) and cheaper (ca. 60%).

Page 28: Human Computation and Crowdsourcing for Information Systems

Page 29: Human Computation and Crowdsourcing for Information Systems

Use of Crowdsourcing in Software Engineering

Source: K. Mao, L. Capra, M. Harman, Y. Jia. A survey of the use of crowdsourcing in software engineering. Journal of Systems and Software, 2016.

Model Quality Assurance

Some tasks require expert workers: expert-sourcing.

Page 30: Human Computation and Crowdsourcing for Information Systems

Crowdsourcing-based Model Quality Checking

System Specification | System EER Diagram

Does the model completely and correctly represent the specification?

Current work: Use expert-sourcing to speed up MQC task.

Page 31: Human Computation and Crowdsourcing for Information Systems

HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.

HC&C is revolutionizing NLP research and supports Semantic Web ontology engineering and activities across the Software Engineering life cycle.

What could be the impact of HC&C on Information Systems?

Page 32: Human Computation and Crowdsourcing for Information Systems

Higher Quality IS Created Faster

Crowdsourcing improves all aspects of Software Engineering and leads to:
– Reduced time to market for IS
– Higher quality IS (more robust)

Source: T.D. LaToza, A. van der Hoek. Crowdsourcing in Software Engineering: Models, Motivations, and Challenges. IEEE Softw. 33(1):74-80. 2016
Source: K. Mao, L. Capra, M. Harman, Y. Jia. A survey of the use of crowdsourcing in software engineering. Journal of Systems and Software, 2016.

“The possibility to usability-test a system in a few hours is tantalizing, given that, in-house that task often takes considerably longer (LaToza, 2016).”

Page 33: Human Computation and Crowdsourcing for Information Systems

New Types of IS with Advanced Capabilities

IS to benefit from advanced capabilities developed thanks to HC
– E.g., sentiment detection in tweets; translation support for a broader set of languages

Emergence of “Crowdsourcing Information Systems”
– “socio-technical systems that produce informational products and/or services for internal or external customers by harnessing the potential of crowds” (Geiger et al., 2012)

Source: D. Geiger et al., Crowdsourcing Information Systems: Definition, Typology and Design. Proc. of the 33rd International Conference on Information Systems, 2012.

Crowdsourced IS (typology)

Value derived    Homogeneous contributions    Heterogeneous contributions
Emergent         Crowd Rating                 Crowd Creation
Non-emergent     Crowd Processing             Crowd Solving

Columns: differentiation between contributions.

Page 34: Human Computation and Crowdsourcing for Information Systems

Managing the Crowd-Workforce

Opportunities
– Fighting poverty
– Democratization of participation: liberating to choose when, what and where to contribute
– Support learning and self-improvement through work
– Advertise STEM research and science to young people

Challenges
– Ethical issues: low wages ($2/hour), lack of worker rights
– Prevent addiction, prolonged-use, user exploitation

Page 35: Human Computation and Crowdsourcing for Information Systems

THANK YOU!

HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.

HC&C is revolutionizing NLP research and supports Semantic Web ontology engineering and activities across the Software Engineering life cycle.

Impact on IS: IS will be created faster and will be more robust; IS will have advanced capabilities; IS must be aware of opportunities and challenges when managing crowdworkers.