Human Computation and Crowdsourcing for Information Systems
Marta Sabou
Vienna University of Technology, Institute of Software Technology and Interactive Systems
13th of December 2016, Vienna, Austria
What is Human Computation and Crowdsourcing (HC&C)?
How are HC&C used in NLP, Semantic Web and Software Engineering?
What could be the impact of HC&C on Information Systems?
Human Computation: computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves
Marta Sabou | CONFENIS| 13.12.16 | Vienna
100 million people a day
13 million NYT articles; 2 million books per year
Source: L. von Ahn, B. Maurer, C. McMillen, D. Abraham, M. Blum, Science 321, 1465 (2008).
Human Computation and Crowdsourcing
Human computation: a paradigm for utilizing human processing power to solve problems that computers cannot yet solve (von Ahn, 2005)
Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.
Crowdsourcing replaces traditional human workers with members of the public.
Human computation replaces computers with humans
Source: A. J. Quinn, B. B. Bederson. Human computation: a survey and taxonomy of a growing field. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). 1403-1412. 2011
Genre 1: Mechanised Labour
Participants (workers) are paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)
Visual skills; language skills
Genre 2: Games with a purpose
“Players took three weeks to solve the three dimensional structure of a simian retroviral protein that is used in animal models of HIV, but whose structure had eluded biochemists for more than a decade.”
Source: http://blogs.nature.com/spoonful/2012/04/foldit-games-next-play-crowdsourcing-better-drug-design.html
S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010.
Genre 3: Altruistic Crowdsourcing
• First year: 50 Mil classifications by 150K people
• By Oct 2016: 57 peer-reviewed papers published
HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.
How are HC&C used in NLP, Semantic Web and Software Engineering?
What could be the impact of HC&C on Information Systems?
Input, Process/Algorithm, Output, Evaluation
Key Problem in NLP: Creation of Language Resources
Language Resources/Corpora: written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain-specific databases and dictionaries
Uses: training, algorithm support, testing
LR/corpora creation traditionally relies on a handful of well-trained experts:
• Time-consuming
• Costly
Crowdsourcing is revolutionizing NLP research
Cheaper resource acquisition (affordable, large-scale resources)
• A variety of small to medium-sized resources can be obtained with as little as $100 using AMT
• Crowdsourcing is also cost-effective for large resources (Poesio, 2012)

| Approach | $/label | Cost of 1 M labels ($) |
|---|---|---|
| Traditional, high quality | 1 | 1,000,000 |
| Mechanical Turk | .38 | 380,000 (<40%) |
| Game | .19 | 217,000 (20%) |
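The per-label rates quoted here can be turned into a quick cost comparison. The sketch below is only flat-rate arithmetic (rate times number of labels), with rates held in cents so the calculation stays in integers; the function and variable names are illustrative and do not come from the cited paper. The slide's own total for the game row (217,000) is not the flat product of .19 and 1 M, so the sketch does not try to reproduce that figure.

```python
# Flat-rate cost comparison for the per-label prices quoted on the slide.
# Rates are stored in cents per label so the arithmetic stays in integers.
RATES_CENTS = {
    "Traditional, high quality": 100,
    "Mechanical Turk": 38,
    "Game": 19,
}


def flat_cost_dollars(rate_cents: int, n_labels: int) -> int:
    """Cost in whole dollars of n_labels at a flat per-label rate (rounded down)."""
    return rate_cents * n_labels // 100


def compare(n_labels: int) -> dict:
    """Map each approach to its flat-rate cost in dollars for n_labels labels."""
    return {name: flat_cost_dollars(rate, n_labels) for name, rate in RATES_CENTS.items()}
```

For example, `compare(1_000_000)["Mechanical Turk"]` evaluates to 380000, which matches the Mechanical Turk row.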
Source: Sabou, M., Bontcheva, K., Scharl, A. Crowdsourcing Research Opportunities: Lessons from Natural Language Processing. iKNOW-2012.
Diversification of the research agenda
• E.g., Urdu, Arabic, Haitian Creole; lexicons between English and 37 low-resourced languages (Irvine & Klementiev, 2010)
• Emails, Twitter feeds, augmented and alternative communication texts
• Speech: transcription, accent rating, assessment of dialog systems
• Sentiment detection, translation, word sense disambiguation, anaphora resolution, question answering, textual entailment, text summarization
Easier evaluation of algorithms
Source: Sabou, M., Bontcheva, K., Scharl, A. Crowdsourcing Research Opportunities: Lessons from Natural Language Processing. iKNOW-2012.
Novel Methodologies Needed
Methodology for crowdsourcing-based corpora creation:
1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus Delivery
Source: Sabou, M., Bontcheva, K., Derczynski, L. and Scharl, A. Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines. 9th Language Resources and Evaluation Conference (LREC-2014).
Challenge 1: Contributor Selection and Training
From: prior to resource creation
To: during the resource creation
Screening, training, profiling
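One common way to implement a screening step is to test contributors on a few items with known answers before admitting them to the real task. The sketch below assumes that setup, which the slide itself does not spell out; the gold items, the 0.8 accuracy threshold and all names are hypothetical.

```python
from typing import Dict, List

# Hypothetical gold items: item id -> known correct label.
GOLD: Dict[str, str] = {
    "q1": "positive",
    "q2": "negative",
    "q3": "neutral",
    "q4": "positive",
    "q5": "negative",
}


def gold_accuracy(answers: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold items answered correctly; unanswered items count as wrong."""
    if not gold:
        raise ValueError("gold set must not be empty")
    correct = sum(1 for item, label in gold.items() if answers.get(item) == label)
    return correct / len(gold)


def screen_workers(
    submissions: Dict[str, Dict[str, str]],
    gold: Dict[str, str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the talk
) -> List[str]:
    """Return the sorted ids of workers whose gold accuracy meets the threshold."""
    return sorted(
        worker
        for worker, answers in submissions.items()
        if gold_accuracy(answers, gold) >= threshold
    )
```

The same accuracy score could be recomputed while the project runs, which is closer to the "during the resource creation" profiling the slide points to.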
Challenge 2: Aggregation and Quality Control
From: a few experts' annotations
To: multiple, noisy annotations from non-experts
Approach 1: Statistical techniques
• Simplest (and most popular): majority voting
• More complex: machine learning model trained on various features
Approach 2: Crowdsourcing the QC process itself
• HIT1 (Create): Translate the following sentence:
• HIT2 (Verify): Which of these 5 sentences is the best translation?
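Majority voting, the simplest aggregation technique named above, can be sketched in a few lines. The tie handling (returning `None` so the item can be sent out for more judgments) and the function names are assumptions made for this illustration, not something the talk prescribes.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def majority_vote(labels: List[Hashable]) -> Optional[Hashable]:
    """Return the most frequent label, or None for an empty list or a tie for first place."""
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the item unresolved
    return ranked[0][0]


def aggregate(annotations: Dict[str, List[Hashable]]) -> Dict[str, Optional[Hashable]]:
    """Apply majority voting to every item's list of crowd labels."""
    return {item: majority_vote(votes) for item, votes in annotations.items()}
```

The same function covers a Verify-style HIT: if each worker's answer is the index of the translation they consider best, `majority_vote` over those indices picks the winning candidate.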
Scientific American, May 2001:
Human Computation in the Semantic Web Life-Cycle
Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)
Problem: ontology creation is time-consuming and costly.
Context: Ontology Creation
Text Corpus
HC Task: (T3) Specification of relation type between concept pairs.
Example: Coal is a subcategory of Fossil Fuel
How does MLab compare to GWAPs?
Comparison Criteria:
• Cost
• Speed
• Quality
Source: M. Sabou, K. Bontcheva, A. Scharl, M. Föls. 2013. Games with a Purpose or Mechanised Labour?: A Comparative Study. In i-Know '13.
GWAP: Climate Quiz
Source: A. Scharl, M. Sabou, M. Föls: Climate quiz: a web application for eliciting and validating knowledge from social networks. WebMedia 2012: 189-192
MLab using CrowdFlower
Experimental Study Results
Trade-offs: Cost; Timescale; Worker skills
Small, simple tasks, fast completion => MLab
Complex, large tasks, slower completion => GWAP
The two genres are highly complementary
Integration should be sought in Hybrid-genre workflows
Hybrid-genre workflows
Source: M. Sabou, A. Scharl, M. Föls. Crowdsourced Knowledge Acquisition: Towards Hybrid-Genre Workflows. Int. J. on Semantic Web and Information Systems (IJSWIS) 9(3):14-41, 2013
• Higher precision (an increase of 6%) than the GWAP-based workflow alone
• Players work on cleaner data and are therefore more motivated
How to embed Crowdsourcing into Ontology Engineering?
Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)
Evaluation
Source: G. Wohlgenannt, M. Sabou, F. Hanika: Crowd-based ontology engineering with the uComp Protégé plugin. Semantic Web 7(4): 379-398 (2015)
T1-T3
Human Computation can be embedded in ontology engineering workflows and leads to good-quality ontologies obtained faster (ca. 40%) and cheaper (ca. 60%).
Use of Crowdsourcing in Software Engineering
Source: K. Mao, L. Capra, M. Harman, Y. Jia. A survey of the use of crowdsourcing in software engineering. Journal of Systems and Software, 2016.
Model Quality Assurance
Some tasks require expert workers: expert-sourcing.
Crowdsourcing-based Model Quality Checking
System Specification; System EER Diagram
Does the model completely and correctly represent the specification?
Current work: Use expert-sourcing to speed up MQC task.
HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.
HC&C is revolutionizing NLP research and supports Semantic Web ontology engineering as well as activities across the Software Engineering life cycle.
What could be the impact of HC&C on Information Systems?
Higher Quality IS Created Faster
Crowdsourcing improves all aspects of Software Engineering and leads to:
• Reduced time to market for IS
• Higher quality IS (more robust)
Source: T.D. LaToza, A. van der Hoek. Crowdsourcing in Software Engineering: Models, Motivations, and Challenges. IEEE Softw. 33(1):74-80. 2016
Source: K. Mao, L. Capra, M. Harman, Y. Jia. A survey of the use of crowdsourcing in software engineering. Journal of Systems and Software, 2016.
“The possibility to usability-test a system in a few hours is tantalizing, given that, in-house that task often takes considerably longer (LaToza, 2016).”
New Types of IS with Advanced Capabilities
IS to benefit from advanced capabilities developed thanks to HC
• E.g., sentiment detection in tweets; translation support for a broader set of languages
Emergence of “Crowdsourcing Information Systems”
• “socio-technical systems that produce informational products and/or services for internal or external customers by harnessing the potential of crowds” (Geiger et al., 2012)
Source: D. Geiger et al., Crowdsourcing Information Systems: Definition, Typology and Design. Proc. of the 33rd International Conference on Information Systems, 2012.
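Crowdsourced IS, by value derived from contributions and differentiation between contributions:
• Emergent value, homogeneous contributions: Crowd Rating
• Emergent value, heterogeneous contributions: Crowd Creation
• Non-emergent value, homogeneous contributions: Crowd Processing
• Non-emergent value, heterogeneous contributions: Crowd Solving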
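The four types of crowdsourced IS on this slide form a two-by-two grid over two dimensions, so they can be represented as a small lookup table. The sketch below follows the grid as read from the slide layout (emergent value in the top row, homogeneous contributions in the left column); the enum and function names are illustrative and are not taken from Geiger et al.

```python
from enum import Enum


class Differentiation(Enum):
    """How much individual contributions differ from one another."""
    HOMOGENEOUS = "homogeneous"
    HETEROGENEOUS = "heterogeneous"


class ValueDerived(Enum):
    """Whether value comes from individual contributions or only from the collection."""
    NON_EMERGENT = "non-emergent"
    EMERGENT = "emergent"


# Grid cells as read from the slide; the cell assignment is an assumption
# based on the slide layout.
TYPOLOGY = {
    (Differentiation.HOMOGENEOUS, ValueDerived.NON_EMERGENT): "Crowd Processing",
    (Differentiation.HOMOGENEOUS, ValueDerived.EMERGENT): "Crowd Rating",
    (Differentiation.HETEROGENEOUS, ValueDerived.NON_EMERGENT): "Crowd Solving",
    (Differentiation.HETEROGENEOUS, ValueDerived.EMERGENT): "Crowd Creation",
}


def classify(differentiation: Differentiation, value: ValueDerived) -> str:
    """Look up the crowdsourced-IS type for a pair of dimension values."""
    return TYPOLOGY[(differentiation, value)]
```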
Managing the Crowd-Workforce
Opportunities
• Fighting poverty
• Democratization of participation: freedom to choose when, what and where to contribute
• Support learning and self-improvement through work
• Advertise STEM research and science to young people
Challenges
• Ethical issues: low wages ($2/hour), lack of worker rights
• Prevent addiction, prolonged use, user exploitation
THANK YOU!
HC&C: emerging computational paradigm where computer systems work together with large groups of human contributors to solve tasks that neither of them could solve by themselves.
HC&C is revolutionizing NLP research and supports Semantic Web ontology engineering as well as activities across the Software Engineering life cycle.
Impact on IS: IS will be created faster and will be more robust; IS will have advanced capabilities; IS must be aware of opportunities and challenges when managing crowdworkers.