Transcript
Page 1: UNDERSTANDING WIKIPEDIA Niki Kittur nkittur@cs.cmu.edu

UNDERSTANDING WIKIPEDIA

Niki Kittur nkittur@cs.cmu.edu

Page 2

Slowing growth

• Growth has slowed since 2007

• Why?
  – Fewer new topics to write about
  – Growing resistance to new contributions

[Figures: proportion of reverted edits (by editor class); number of active editors per month]

Suh, Convertino, Chi, & Pirolli, 2009

Page 3

Wisdom of crowds poll

What proportion of Wikipedia (in words) is made up of articles?

0-25% | 25-50% | 50-75% | 75-100%

Page 4

Wisdom of crowds poll

Page 5

Article

Page 6

Discussion

Page 7

Discussion

Page 8

Edit history

Page 9

Edit history

Page 10

Policies + Procedures

Page 11

How does it work?

• “Wisdom of crowds”: many independent judgments
  – “With enough eyeballs, all bugs are shallow”

• More contributors lead to:
  – more information
  – fewer errors
  – less bias

Page 12

Wilkinson & Huberman, 2007

• Examined featured articles vs. non-featured articles
  – Controlling for PageRank (i.e., popularity)

• Featured articles = more edits, more editors

• More work, more people => better outcomes

[Figure panels: Edits; Editors]

Page 13

Difficulties with generalizing results

• Cross-sectional analysis
  – Reverse causation: articles that become featured may subsequently attract more people

• Coarse quality metrics
  – Fewer than 2,000 out of more than 2,000,000 articles are featured

• What about coordination?

Page 14

Coordination costs

• Increasing contributors incurs process losses (Boehm, 1981; Steiner, 1972)

• Diminishing returns with added people (Hill, 1982; Sheppard, 1993)
  – Super-linear increase in communication pairs
  – Linear increase in added work

• In the extreme, costs may exceed benefits to quality (Brooks, 1975)

• The better coordination is supported, the more benefit comes from adding people

“Adding manpower to a late software project makes it later”

Brooks, 1975
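The contrast between super-linear communication cost and linear added work can be made concrete with a small calculation. This is a sketch under two assumptions not stated on the slides: every pair of editors is a potential communication channel (n(n-1)/2 pairs), and each editor contributes one fixed unit of work.

```python
def communication_pairs(n: int) -> int:
    """Number of distinct editor pairs that could need to communicate: n choose 2."""
    return n * (n - 1) // 2


def added_work(n: int, work_per_editor: int = 1) -> int:
    """Assumed linear growth in work: each editor adds the same fixed amount."""
    return n * work_per_editor


if __name__ == "__main__":
    for n in (2, 5, 10, 50, 100):
        pairs = communication_pairs(n)
        work = added_work(n)
        # Pairs per unit of work is (n - 1) / 2, so it grows linearly with n.
        print(f"{n:>4} editors: {work:>4} units of work, {pairs:>5} communication pairs")
```

With 10 editors there are 45 possible pairs; with 100 editors there are 4,950, while the assumed work grows only tenfold.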

Page 15

Research question

To what degree are editors in Wikipedia working independently versus coordinating?

Page 16

Research infrastructure

• Analyzed the entire history of Wikipedia
  – Every edit to every article

• Large dataset (as of 2008)
  – 10+ million pages
  – 200+ million revisions
  – 2.5+ TB

• Used distributed processing
  – Hadoop distributed filesystem
  – Map/reduce to process data in parallel
  – Reduced analysis time from weeks to hours
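The slides do not show the processing code. As a rough single-machine illustration of the map/reduce pattern (not Hadoop's actual API), the sketch below counts revisions per namespace and year. The record layout `(page_title, namespace, year)` and the namespace labels are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical revision record: (page_title, namespace, year).
Revision = Tuple[str, str, int]
Key = Tuple[str, int]


def map_revision(rev: Revision) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((namespace, year), 1) for every revision."""
    _title, namespace, year = rev
    yield (namespace, year), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework would between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce step: sum the counts for one (namespace, year) key."""
    return key, sum(values)


def run_job(revisions: Iterable[Revision]) -> Dict[Key, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    intermediate = (pair for rev in revisions for pair in map_revision(rev))
    return dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())
```

In a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data between them; here they are ordinary function calls.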

Page 17

Types of work

Direct work: editing articles

Indirect work: user talk, creating policy

Maintenance work: reverts, vandalism

Page 18

Less direct work

• Decrease in proportion of edits to article page

[Figure: proportion of edits to article pages by year, 2001 to 2006 (y-axis: edit proportion, 0.5 to 1); labeled value: 70%]
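The per-year proportions in this and the following charts can be computed by grouping edits by year and namespace. A minimal sketch, assuming each edit is available as a hypothetical `(year, namespace)` pair; in MediaWiki, articles live in the main namespace and policy pages in the "Wikipedia" project namespace.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical input: one (year, namespace) pair per edit, e.g. (2005, "Main").
Edit = Tuple[int, str]


def namespace_share_by_year(edits: Iterable[Edit], namespace: str = "Main") -> Dict[int, float]:
    """Return, for each year, the fraction of that year's edits made in `namespace`."""
    totals: Counter = Counter()
    in_namespace: Counter = Counter()
    for year, ns in edits:
        totals[year] += 1
        if ns == namespace:
            in_namespace[year] += 1
    return {year: in_namespace[year] / totals[year] for year in sorted(totals)}
```

Calling the same function with `"User talk"` or `"Wikipedia"` gives the indirect-work proportions shown on the next slides.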

Page 19

More indirect work

• Increase in proportion of edits to user talk

[Figure: proportion of edits to user talk pages by year, 2001 to 2006 (y-axis: edit proportion, 0 to 0.2); labeled value: 8%]

Page 20

More indirect work

• Increase in proportion of edits to user talk

• Increase in proportion of edits to policy pages

[Figure: proportion of edits to user talk and policy pages by year, 2001 to 2006 (y-axis: edit proportion, 0 to 0.2); labeled value: 11%]

Page 21

More maintenance work

• Increase in proportion of edits that are reverts

[Figure: proportion of edits that are reverts by year, 2001 to 2006 (y-axis: edit proportion, 0 to 0.2); labeled value: 7%]
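The slides do not say how reverts were detected. One common approach, assumed here, is the "identity revert": a revision whose full text is identical to an earlier revision of the same page. The sketch fingerprints each revision's text and treats a match with the immediately preceding revision as a null edit rather than a revert.

```python
import hashlib
from typing import Dict, List, Sequence


def text_digest(text: str) -> str:
    """Fingerprint a revision's full text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def flag_identity_reverts(revision_texts: Sequence[str]) -> List[bool]:
    """Flag revisions (oldest first) whose text matches an earlier, non-adjacent revision."""
    seen: Dict[str, int] = {}  # digest -> index of the latest revision with that text
    flags: List[bool] = []
    for i, text in enumerate(revision_texts):
        digest = text_digest(text)
        prev = seen.get(digest)
        # Matching only the immediately preceding revision is a null edit, not a revert.
        flags.append(prev is not None and prev < i - 1)
        seen[digest] = i
    return flags


def revert_proportion(revision_texts: Sequence[str]) -> float:
    """Fraction of revisions flagged as identity reverts."""
    flags = flag_identity_reverts(revision_texts)
    return sum(flags) / len(flags) if flags else 0.0
```

For a history like "a", "a b", "vandal", "a b", "a b c", only the fourth revision is flagged, since it restores the text that the vandal edit replaced.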

Page 22

More wasted work

• Increase in proportion of edits that are reverts

• Increase in proportion of edits reverting vandalism

[Figure: proportion of edits reverting vandalism by year, 2001 to 2005 (y-axis: edit proportion, 0 to 0.03); labeled value: 1-2%]

Page 23

Global level

• Coordination costs are growing
  – Less direct work (articles)
  + More indirect work (article talk, user, procedure)
  + More maintenance work (reverts, vandalism)

Kittur, Suh, Pendleton, & Chi, 2007

Page 24

Research question

How does coordination impact quality?

Page 25

Coordination types

• Explicit coordination
  – Direct communication among editors planning and discussing the article

• Implicit coordination
  – Division of labor and workgroup structure
  – Concentrating work in a core group of editors

Leavitt, 1951; March & Simon, 1958; Malone, 1987; Rouse et al., 1992; Thompson, 1967

Page 26

Explicit coordination: “Music of Italy”

planning

Page 27

Explicit coordination: “Music of Italy”

coverage

Page 28

Explicit coordination: “Music of Italy”

readability

Page 29

Coordination types


Page 30

Implicit coordination: “Music of Italy”

Page 31

Implicit coordination: “Music of Italy”

TUF-KAT: Set scope and structure

Page 32

Implicit coordination: “Music of Italy”

Filling in by many contributors

Page 33

Implicit coordination: “Music of Italy”

Restructuring by Jeffmatt

Page 34

Research question

• What factors lead to improved quality?– More contributors– Explicit coordination

• Number of communication edits

– Implicit coordination• Concentration among editors

Page 35

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each

Page 36

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

Page 37

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Page 38

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Gini = 0

Page 39

Measuring concentration

• If an article has 100 edits and 10 editors, it could have:
  – 10 editors making 10 edits each
  – 1 editor making 90 edits

• Measure concentration with Gini coefficient

Gini = 0 (10 editors making 10 edits each) | Gini ~ 1 (1 editor making 90 edits)
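The slides name the Gini coefficient but not the exact formula used. A minimal sketch, assuming the standard sorted-values formula applied to the per-editor edit counts of a single article. The concentrated distribution below is hypothetical: the slide only says one editor makes 90 edits, so the split of the remaining 10 edits is an assumption.

```python
from typing import Sequence


def gini(edit_counts: Sequence[float]) -> float:
    """Gini coefficient of per-editor edit counts (0 = perfectly equal).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(edit_counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    equal = [10] * 10
    concentrated = [90, 2, 1, 1, 1, 1, 1, 1, 1, 1]  # hypothetical split of the other 10 edits
    print(round(gini(equal), 3))         # 0.0
    print(round(gini(concentrated), 3))  # 0.808
```

With n editors this formula tops out at (n - 1)/n, reached when one editor makes every edit, so for 10 editors the maximum is 0.9; this is why the concentrated case is "~ 1" rather than exactly 1.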

Page 40

Measuring quality

• Wikipedia 1.0 quality assessment scale
  – Over 900,000 assessments
  – 6 classes of quality, from “Stub” up to “Featured”
  – Top 3 classes require increasingly rigorous peer review

• Validated community assessments with non-expert judges (r = .54***)
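The validation step correlates the community's class labels with the judges' ratings. A minimal sketch of a Pearson correlation, assuming the six classes are coded 1 to 6 in order as Stub, Start, B, GA, A, FA; both that coding and any judge ratings you feed in are illustrative assumptions, not the study's data.

```python
from math import sqrt
from typing import Sequence

# Assumed ordinal coding of the six Wikipedia 1.0 classes, lowest to highest.
QUALITY_RANK = {"Stub": 1, "Start": 2, "B": 3, "GA": 4, "A": 5, "FA": 6}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return cov / (sx * sy)
```

Usage: `pearson_r([QUALITY_RANK[c] for c in community_labels], judge_means)`, where `community_labels` and `judge_means` are hypothetical parallel lists for the same set of articles.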

Page 41

Analysis

Page 42

Analysis

Page 43

Analysis

Page 44

Editors + coordination

1. Editors: no effect on quality
2. Communication: increase in quality
3. Concentration: increase in quality

Page 45

Communication x Editors

• Communication does not scale to the crowd
  – Effective with few editors
  – Ineffective with many editors

Page 46

Concentration x Editors

• Concentration enables effective harnessing of the crowd
  – High concentration: more editors increase quality
  – Low concentration: more editors reduce quality
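One generic way to read the two interaction slides is through a regression with interaction terms. The model below is an illustrative assumption, not the study's reported specification: Q is article quality, E the number of editors, C the amount of communication (talk-page edits), and G the Gini concentration of edits.

```latex
Q = \beta_0 + \beta_1 E + \beta_2 C + \beta_3 G
    + \beta_4 (E \times C) + \beta_5 (E \times G) + \varepsilon

\frac{\partial Q}{\partial C} = \beta_2 + \beta_4 E,
\qquad
\frac{\partial Q}{\partial E} = \beta_1 + \beta_4 C + \beta_5 G
```

Under this reading, "communication does not scale" corresponds to \beta_4 < 0, so the benefit of communication, \beta_2 + \beta_4 E, shrinks as editors are added. "Concentration enables harnessing the crowd" corresponds to \beta_5 > 0, so at a fixed level of communication the marginal effect of adding editors can be negative when G is low and positive when G is high.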

Page 47

Summary

• Wikipedia involves a large degree of coordination

• Adding more editors does not by itself improve quality
  – Coordination between editors is critical

• The type of coordination matters
  – Communication does not scale to large groups
  – Concentration does scale to large groups

Page 48

TOOLS FOR SOCIAL COLLABORATION

Page 49

Profits and perils of user-generated content

• Content in Wikipedia can be added or changed by anyone

• Because of this, it has become one of the most important information resources on the web
  – One of the top 10 most popular websites (Alexa.com)
  – Millions of contributors

• This openness also causes problems
  – Conflict between contributors
  – Unknown trustworthiness

Page 50

Denning et al. (2005)

• Risks with using Wikipedia
  – Accuracy of content
  – Motives of editors
  – Expertise of editors
  – Stability of article
  – Coverage of topics
  – Quality of cited information

Insufficient information to evaluate trustworthiness

Page 51

History flow

Page 52

Details

Page 53

Vandalism

Page 54

Anonymous contribution

M$: many anonymous contributors

Brazil: few anonymous contributors

Page 55

Edit war

Page 56

Conflict at the user level

• How can we identify conflict between users?

Kittur et al., 2007; Suh et al., 2007; Brandes & Lerner, 2008

Page 57

Terri Schiavo

Mediators

Sympathetic to parents

Sympathetic to husband

Anonymous (vandals/spammers)

Page 58

Dokdo/Takeshima opinion groups

Group A

Group B Group C

Group D

Page 59

Ekstrand & Riedl, 2009

Page 60

Ekstrand & Riedl (2009)

Page 61

Ekstrand & Riedl (2009)

Page 62

Trust

• Numerous studies surface trust-relevant information
  – Editors [Adler & de Alfaro, 2007; Dondio et al., 2006; Zeng et al., 2006]
  – Stability [Suh et al., 2008]
  – Conflict [Kittur et al., 2007; Viégas et al., 2004]

• But how much impact can this have on user perceptions in a system which is inherently mutable?

Page 63

What would make you trust Wikipedia more?

Nothing

Page 64

What would make you trust Wikipedia more?

“Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed.”

Page 65

Hypotheses

1. Visualization will impact perceptions of trust

2. Compared to baseline, visualization will impact trust both positively and negatively

3. Visualization should have the most impact when there is high uncertainty about the article
  • Low quality
  • High controversy

Page 66

Design

• 3 x 2 x 2 design

Articles (quality x controversy):

                 Controversial                                   Uncontroversial
High quality     Abortion, George Bush                           Volcano, Shark
Low quality      Pro-life feminism, Scientology and celebrities  Disk defragmenter, Beeswax

Visualization:
• High trust
• Low trust
• Baseline (none)
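The design crosses three visualization conditions with two levels of quality and two of controversy. The sketch enumerates the 12 cells with `itertools.product`. The article-to-cell mapping is inferred from the slide's layout and is an assumption, as is everything about how participants were assigned to cells, which the slide does not state.

```python
from itertools import product

VISUALIZATION = ["high trust", "low trust", "baseline (none)"]
QUALITY = ["high quality", "low quality"]
CONTROVERSY = ["controversial", "uncontroversial"]

# Article-to-cell mapping, assumed from the layout of the design slide.
ARTICLES = {
    ("high quality", "controversial"): ["Abortion", "George Bush"],
    ("high quality", "uncontroversial"): ["Volcano", "Shark"],
    ("low quality", "controversial"): ["Pro-life feminism", "Scientology and celebrities"],
    ("low quality", "uncontroversial"): ["Disk defragmenter", "Beeswax"],
}

# Every combination of the three factors: 3 x 2 x 2 = 12 conditions.
CONDITIONS = list(product(VISUALIZATION, QUALITY, CONTROVERSY))

if __name__ == "__main__":
    for vis, qual, contro in CONDITIONS:
        articles = ", ".join(ARTICLES[(qual, contro)])
        print(f"{vis:<16} | {qual:<13} | {contro:<15} | {articles}")
```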

Page 67

Method

• Users recruited via Amazon’s Mechanical Turk
  – 253 participants
  – 673 ratings
  – 7 cents per rating
  – Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies
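The figures on this slide imply the study's direct payment cost and workload per participant. A small worked calculation, using integer cents to avoid floating-point rounding; it excludes any Mechanical Turk fees, which the slide does not mention.

```python
PARTICIPANTS = 253
RATINGS = 673
CENTS_PER_RATING = 7

total_cents = RATINGS * CENTS_PER_RATING          # 4711 cents
ratings_per_participant = RATINGS / PARTICIPANTS  # about 2.66

print(f"Total payment: ${total_cents // 100}.{total_cents % 100:02d}")   # $47.11
print(f"Average ratings per participant: {ratings_per_participant:.2f}")  # 2.66
```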

Page 68

Example: High trust visualization

Page 69

Example: Low trust visualization

Page 70

Summary info: Editor

• % from anonymous users

Page 71

Summary info: Editor

• % from anonymous users

• Last change by anonymous or established user
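A minimal sketch of the two editor summaries, assuming (as MediaWiki does) that anonymous edits are attributed to the editor's IP address. The slides do not define "established", so this sketch only distinguishes anonymous edits from edits by registered accounts; the function names and input format are hypothetical.

```python
import ipaddress
from typing import Optional, Sequence, TypedDict


class EditorSummary(TypedDict):
    percent_anonymous: float
    last_editor_anonymous: Optional[bool]


def is_anonymous(username: str) -> bool:
    """Treat an editor as anonymous if the recorded name parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def editor_summary(editors: Sequence[str]) -> EditorSummary:
    """Summarize a page's history given the editor name of each revision, oldest first."""
    if not editors:
        return {"percent_anonymous": 0.0, "last_editor_anonymous": None}
    anonymous = sum(is_anonymous(e) for e in editors)
    return {
        "percent_anonymous": 100.0 * anonymous / len(editors),
        "last_editor_anonymous": is_anonymous(editors[-1]),
    }
```

For a hypothetical history `["Jeffmatt", "192.0.2.7", "TUF-KAT", "2001:db8::1"]`, half the edits are anonymous and the last change is anonymous.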

Page 72

Summary info: Stability

• Stability of words

Page 73

Summary info: Stability

• Instability

Page 74

Summary info: Conflict

• Instability
• Conflict

Page 75

Results

1. Significant effect of visualization
  – High > low, p < .001

2. Both positive and negative effects
  – High > baseline, p < .001
  – Low < baseline, p < .01

3. No effect of article uncertainty
  – No interaction of visualization with either quality or controversy
  – Robust across conditions

Page 76

Results


Page 77

Results


Page 78

Page 79
