Collaborative work beneath the surface


Uploaded by nola-duffy, posted 01-Jan-2016 (PowerPoint presentation)



TRANSCRIPT

Page 1: Collaborative work beneath the surface

Collaborative work beneath the surface

• Visitors only look at article pages
• But much of Wikipedia is made up of other pages
  – Conflict resolution, coordination, policies and procedures

Page 2: Collaborative work beneath the surface

Types of work

• Direct work (article pages): immediately consumable
• Indirect work (talk, user, procedure pages): coordination, conflict
• Maintenance work: reverts, vandalism

Page 3: Collaborative work beneath the surface

Less direct work

• Decrease in proportion of edits to article page

[Figure: proportion of edits to article pages by year, 2001-2006 (y-axis: edit proportion, 0.5 to 1); annotated value: 70%]

Page 4: Collaborative work beneath the surface

More indirect work

• Increase in proportion of edits to user talk

[Figure: proportion of edits to user talk pages by year, 2001-2006 (y-axis: edit proportion, 0 to 0.2); annotated value: 8%]

Page 5: Collaborative work beneath the surface

More indirect work

• Increase in proportion of edits to user talk

• Increase in proportion of edits to procedure

[Figure: edit proportion by year, 2001-2006 (y-axis: 0 to 0.2); annotated value: 11%]

Page 6: Collaborative work beneath the surface

More maintenance work

• Increase in proportion of edits that are reverts

[Figure: proportion of edits that are reverts, by year, 2001-2006 (y-axis: edit proportion, 0 to 0.2); annotated value: 7%]

Page 7: Collaborative work beneath the surface

More wasted work

• Increase in proportion of edits that are reverts

• Increase in proportion of edits reverting vandalism

[Figure: edit proportion by year, 2001-2005 (y-axis: 0 to 0.03); annotated value: 1-2%]

Page 8: Collaborative work beneath the surface

Global level

• Coordination costs are growing
  – Less direct work (articles)
  + More indirect work (article talk, user, procedure)
  + More maintenance work (reverts, vandalism)

Kittur, Suh, Pendleton, & Chi, 2007
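The trend slides above reduce to a per-year tally: classify each edit by the kind of page it touches, then divide by that year's total. A minimal sketch, assuming a hypothetical revision log of `(year, namespace)` pairs; the `WORK_TYPE` mapping is an illustrative assumption, not the study's exact classification. Reverts (maintenance work) cut across namespaces, need separate detection, and are left out here.

```python
from collections import Counter, defaultdict

# Hypothetical namespace -> work-type mapping based on the "Types of work"
# slide; the study's exact classification may differ.
WORK_TYPE = {
    "article": "direct",
    "article_talk": "indirect",
    "user": "indirect",
    "user_talk": "indirect",
    "procedure": "indirect",
}


def edit_proportions(revisions):
    """Return {year: {work_type: share of that year's edits}}.

    `revisions` is an iterable of (year, namespace) pairs (assumed format);
    namespaces missing from WORK_TYPE are counted as "other".
    """
    counts = defaultdict(Counter)
    for year, namespace in revisions:
        counts[year][WORK_TYPE.get(namespace, "other")] += 1
    return {
        year: {kind: n / sum(c.values()) for kind, n in c.items()}
        for year, c in sorted(counts.items())
    }


# Toy log: 9 of 10 edits are direct in 2001, 7 of 10 in 2006.
log = (
    [(2001, "article")] * 9
    + [(2001, "user_talk")]
    + [(2006, "article")] * 7
    + [(2006, "procedure")] * 3
)
print(edit_proportions(log))
```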

Page 9: Collaborative work beneath the surface

Article lifespan

• How do articles change over time?
• High discussion and coordination
  – Kittur et al., 2007; Viegas et al., 2007
• When does this happen?
  – Hyp 1: Early, when articles are growing
  – Hyp 2: Late, when articles are more stable

Page 10: Collaborative work beneath the surface

Article lifespan

Page 11: Collaborative work beneath the surface

User lifespan

• How do users change over time?

Page 12: Collaborative work beneath the surface

Centralization in Wikipedia

• How much centralization?
• “Gang of 500” (Jimmy Wales, 2004)

– Small group of ~500 does half the work

• Masses do the work (Aaron Swartz, 2006)

– New users add most of the words

Page 13: Collaborative work beneath the surface

Hypotheses

• Masses dominate
• Elite privileged group
• Shift from elites to masses
  – Technology adoption (Rogers, 1962)

[Figure: hypothesized patterns labeled Masses, Elites, Shift]

Page 14: Collaborative work beneath the surface

Elites

• Admins
• Editing status (fixed-size)
• Editing status (scaling)

Page 15: Collaborative work beneath the surface

Admins

• Waxing and waning of admin influence

[Figure: proportion of all edits made by admins, by year, 2001-2006 (y-axis: 0 to 1)]

Nature News, 2/2007; Kittur, Chi, Pendleton, Suh, Mytkowicz, 2007

Page 16: Collaborative work beneath the surface

Admins

• Similar for changed words

[Figure: proportion of words changed by admins, by year, 2001-2006 (y-axis: 0 to 1)]

Page 17: Collaborative work beneath the surface

Elites

• Admins
• Editing status (fixed-size)
• Editing status (scaling)

Page 18: Collaborative work beneath the surface

Editing status (fixed size)

Page 19: Collaborative work beneath the surface

Elites

• Admins
• Editing status (fixed-size)
• Editing status (scaling)

Page 20: Collaborative work beneath the surface

Editing status (scaling)

• Proportional influence of elites still high
  – Though absolute number of elites growing

[Figure: proportion of edits made by the top 1%, 3%, and 5% of users, by year, 2001-2006 (y-axis: 0 to 1)]

Page 21: Collaborative work beneath the surface

Summary: Centralization

• Centralized elite influence is waning
  – Decline in admin influence
  – Decline in data-driven “Gang of 500”
• Decentralized proportional influence remains high
  – Top 1/3/5% of users account for ~50/70/80% of edits
  – The “Bourgeoisie”
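The “Top 1/3/5%” figures come from a ranking computation: sort users by edit count and take the share of all edits made by the head of the list. A minimal sketch, assuming per-user edit totals are already available; the function name `top_share` and the round-up cutoff are illustrative choices, and the toy counts are made up.

```python
import math


def top_share(edit_counts, percent):
    """Share of all edits made by the top `percent`% of users.

    `edit_counts` holds one edit total per user (assumed input). The cutoff
    rounds up, so at least one user is always included.
    """
    ranked = sorted(edit_counts, reverse=True)
    k = max(1, math.ceil(len(ranked) * percent / 100))
    return sum(ranked[:k]) / sum(ranked)


# Toy data: 100 users, five heavy editors and a long tail.
counts = [500, 300, 200, 100, 80] + [10] * 95
for pct in (1, 3, 5):
    print(f"top {pct}% of users: {top_share(counts, pct):.2f} of edits")
```

Run per year, the same function yields the kind of time series shown on the “Editing status (scaling)” slide.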

Page 22: Collaborative work beneath the surface

Challenges for Wikipedia

• Coordination costs
• Organization structure
• Conflict

Page 23: Collaborative work beneath the surface

Characterizing conflict

Page 24: Collaborative work beneath the surface

Conflict at the article level

• What leads to conflict in articles?
• Build a characterization model of article conflict
  – Identify page features and metrics associated with conflict
  – Automatically identify high-conflict articles

Page 25: Collaborative work beneath the surface

Page metrics

• Chose metrics for identifying conflict in articles
  – Easily computable, scalable

Metric type                       Page type
Revisions (#)                     Article, talk, article/talk
Page length                       Article, talk, article/talk
Unique editors                    Article, talk, article/talk
Unique editors / revisions        Article, talk
Links from other articles         Article, talk
Links to other articles           Article, talk
Anonymous edits (#, %)            Article, talk
Administrator edits (#, %)        Article, talk
Minor edits (#, %)                Article, talk
Reverts (#, by unique editors)    Article
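Most of these metrics are simple counts over a page's revision history, which is why they scale. A minimal sketch, assuming a hypothetical `Revision` record with the fields shown; the article/talk ratios would follow by running `page_metrics` once on the article and once on its talk page and dividing. Page length, links, and reverts need page text or link data and are omitted.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical per-revision record; field names are illustrative.
    editor: str       # user name or IP address
    anonymous: bool   # edit made without logging in
    minor: bool       # flagged as a minor edit
    admin: bool       # editor held administrator rights


def page_metrics(revisions):
    """Count-based conflict metrics for a single page (article or talk)."""
    n = len(revisions)
    unique = len({r.editor for r in revisions})
    anon = sum(r.anonymous for r in revisions)
    minor = sum(r.minor for r in revisions)
    admin = sum(r.admin for r in revisions)

    def share(count):
        # Guard against empty histories.
        return count / n if n else 0.0

    return {
        "revisions": n,
        "unique_editors": unique,
        "unique_editors_per_revision": share(unique),
        "anonymous_edits": anon,
        "anonymous_pct": share(anon),
        "minor_edits": minor,
        "minor_pct": share(minor),
        "admin_edits": admin,
        "admin_pct": share(admin),
    }


talk_page = [
    Revision("Alice", anonymous=False, minor=False, admin=True),
    Revision("192.0.2.7", anonymous=True, minor=False, admin=False),
    Revision("Alice", anonymous=False, minor=True, admin=True),
    Revision("Bob", anonymous=False, minor=False, admin=False),
]
print(page_metrics(talk_page))
```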

Page 26: Collaborative work beneath the surface

Defining conflict

• Operational definition for conflict
• Revisions tagged controversial
• Conflict revision count

Page 27: Collaborative work beneath the surface

Machine learning

• Predict conflict from page metrics
  – Training set of “controversial” pages
  – Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Schölkopf, 1998)
• Not just conflict/no conflict, but how much conflict

Page 28: Collaborative work beneath the surface

Performance: Cross-validation

• 5-fold cross-validation, R² = 0.897

[Figure: actual vs. predicted controversial revisions (both axes 0 to 10,000)]
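The cross-validation result can be outlined in code: split pages into five folds, fit on four, predict the held-out fold, and score all out-of-fold predictions together. A minimal sketch, assuming R² means the coefficient of determination (the slide does not say whether it is this or a squared correlation). An ordinary least-squares line on one feature stands in for the SMOreg support vector regression, and the data are synthetic; `fit_line` and `cross_val_predictions` are hypothetical helper names.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares on one feature (stand-in for SVM regression)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def cross_val_predictions(xs, ys, k=5, seed=0):
    """Out-of-fold predictions: each point is predicted by a model
    trained on the other k - 1 folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for fold in range(k):
        held_out = set(idx[fold::k])
        train = [i for i in idx if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


# Synthetic data: controversial revisions roughly proportional to talk revisions.
rng = random.Random(1)
talk_revs = list(range(50))
controversial = [3 * x + rng.gauss(0, 5) for x in talk_revs]
preds = cross_val_predictions(talk_revs, controversial, k=5)
print(f"5-fold R^2 = {r_squared(controversial, preds):.3f}")
```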

Page 29: Collaborative work beneath the surface

Performance: Cross-validation


Page 30: Collaborative work beneath the surface

Determinants of conflict

Highly weighted metrics of conflict model:
1. —Revisions (talk)
2. —Minor edits (talk)
3. ˜Unique editors (talk)
4. —Revisions (article)
5. ˜Unique editors (article)
6. —Anonymous edits (talk)
7. ˜Anonymous edits (article)

Page 31: Collaborative work beneath the surface

Identifying untagged articles

• Detect conflicts for unlabeled articles
  – Majority of articles have never been conflict tagged
• Testing model generalization
  – Applied model to untagged articles
  – Sample of 28 articles rated by 13 expert Wikipedians
• Significant positive correlation with predicted scores
  – By rank correlation, p < 0.013 (Spearman’s rho)
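Spearman's rho compares orderings rather than raw values, which suits comparing model scores with expert ratings on a different scale. A minimal sketch of the statistic as the Pearson correlation of average ranks; the scores and ratings below are hypothetical, and the significance test is not reproduced (`scipy.stats.spearmanr` also reports a p-value).

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


predicted = [120.0, 15.0, 300.0, 42.0, 8.0]  # model conflict scores (hypothetical)
expert = [4, 2, 5, 1, 3]                     # expert ratings (hypothetical)
print(f"rho = {spearman_rho(predicted, expert):.2f}")
```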

Page 32: Collaborative work beneath the surface

Characterizing conflict

Page 33: Collaborative work beneath the surface

Conflict at the user level

• How can we identify conflict between users?

• Reverts between users as a proxy for user conflict

• Force-directed layout to cluster users
  – Group similar viewpoints
  – Find conflicts between groups
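Using reverts as a conflict proxy requires detecting them first. One common heuristic, assumed here rather than taken from the slides, is the identity revert: a revision whose text exactly matches an earlier, non-adjacent revision undoes everything in between. A minimal sketch, assuming a page history of `(editor, text)` pairs; `revert_edges` is a hypothetical name.

```python
import hashlib
from collections import Counter


def revert_edges(history):
    """Count (reverter, reverted_editor) pairs in one page's history.

    `history` is a chronological list of (editor, text) pairs (assumed format).
    A revision is treated as an identity revert when its text exactly matches
    an earlier, non-adjacent revision; every other editor of the revisions in
    between is counted as reverted once per revert.
    """
    last_seen = {}  # content hash -> index of most recent revision with that text
    edges = Counter()
    for i, (editor, text) in enumerate(history):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            undone = {e for e, _ in history[j + 1 : i]} - {editor}
            for reverted in undone:
                edges[(editor, reverted)] += 1
        last_seen[digest] = i
    return edges


history = [
    ("Alice", "v1"),
    ("Bob", "v1 plus disputed claim"),
    ("Carol", "v1"),  # Carol reverts Bob
    ("Bob", "v1 plus disputed claim, reworded"),
    ("Bob", "v1 plus two disputed claims"),
    ("Dave", "v1"),   # Dave reverts both of Bob's edits
]
print(dict(revert_edges(history)))
```

Summed over pages, these weighted pairs form the user graph a force-directed layout would operate on; how the original layout turned reverts into forces is not stated in the slides.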

Page 34: Collaborative work beneath the surface

Dokdo/Takeshima opinion groups

[Figure: user clusters labeled Group A, Group B, Group C, Group D]

Page 35: Collaborative work beneath the surface

Terri Schiavo

[Figure: user clusters labeled Mediators; Sympathetic to parents; Sympathetic to husband; Anonymous (vandals/spammers)]

Page 36: Collaborative work beneath the surface

Cognitive atlas

Page 37: Collaborative work beneath the surface

Visualizing hypotheses

Page 38: Collaborative work beneath the surface

Distributed collaboration

• Lots of people
• Each doing a little bit of work
• Leads to high-quality outcome (i.e., “wisdom of crowds”)

[Images: Francis Galton; ox; scale]

Page 39: Collaborative work beneath the surface

Distributed collaboration

• Applications of distributed collaboration:
  – Judging: weight of an ox, temperature of a room
  – Search: Google PageRank
  – Predicting: Iowa Electronic Markets, Las Vegas, HP
  – Filtering: Digg, Reddit
  – Organizing: del.icio.us
• Common characteristics:
  – Independent judgments
  – Independent aggregation
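Why independence matters can be shown with a small simulation: averaging cancels errors that are independent across judges, but not an error they share. A minimal sketch with synthetic guesses; `simulate_crowd` and all parameter values are hypothetical and not taken from Galton's data.

```python
import random
import statistics


def simulate_crowd(true_value=1000.0, n_judges=800, noise_sd=75.0,
                   shared_sd=0.0, seed=42):
    """Return (crowd error, mean individual error) for averaged guesses.

    Each guess = true value + a shared error common to all judges
    (drawn once, sd=shared_sd) + an independent error (sd=noise_sd).
    """
    rng = random.Random(seed)
    shared = rng.gauss(0.0, shared_sd)
    guesses = [true_value + shared + rng.gauss(0.0, noise_sd)
               for _ in range(n_judges)]
    crowd_error = abs(statistics.mean(guesses) - true_value)
    individual_error = statistics.mean(abs(g - true_value) for g in guesses)
    return crowd_error, individual_error


print(simulate_crowd())                # independent errors only
print(simulate_crowd(shared_sd=75.0))  # errors partly shared across judges
```

With only independent errors, the crowd error shrinks roughly as `noise_sd / sqrt(n_judges)`; the shared term passes through the average untouched, which is the failure mode the next slide attributes to coordinated, non-independent work.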

Page 40: Collaborative work beneath the surface

Wikipedia and the wisdom of crowds

• But these are not characteristic of Wikipedia:
  – Not independent judgments, but high coordination costs (Kittur et al., 2007)
  – Not independent aggregation, but competitive aggregation (everyone is editing the same information)

• To the extent that judgments and aggregation of individual tasks are not independent and instead require coordination and engender conflict, having more editors may not be beneficial and may even be harmful

Page 41: Collaborative work beneath the surface

Travesty of the commoners?

• Increasing group size generally has negative consequences:
  – Increased coordination costs
  – Increased anonymity and social loafing
  – Decreased attribution and individual reward
  – More negative social relations
  – Greater conflict and misbehavior
  – Loss of control
  – Cognitive overload

See Bettenhausen, 1991; Levine & Moreland, 1990

Page 42: Collaborative work beneath the surface

Wilkinson & Huberman, 2007

• Examined featured articles vs. non-featured articles
  – Controlling for PageRank (i.e., popularity)

• Featured articles = more edits, more editors

• “More work, better outcome”: Wikipedia is similar to other distributed collaboration systems

Nature News (2/27/07)

Page 43: Collaborative work beneath the surface

Problem: Distribution of work

• However, articles can have different distributions of work, even with the same numbers of edits and editors
• If an article has 1000 edits and 100 editors, it could have:
  – 1 editor making 901 edits, 99 making 1 edit each
  – 100 editors making 10 edits each

Page 44: Collaborative work beneath the surface

Capturing skew

• Gini coefficient: measures inequality of a distribution
• Measure Gini coefficient for each article
  – Count how many edits each editor makes, calculate ratio
  – If an article is driven by few, Gini → 1
  – If an article is driven by many, Gini → 0

http://en.wikipedia.org/wiki/Gini_coefficient
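Applied to the two 1000-edit, 100-editor articles from the previous slide, the Gini coefficient separates them cleanly. A minimal sketch using the mean-absolute-difference definition G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2n² · mean), where x is the list of per-editor edit counts; the slides do not say which Gini variant (e.g., with or without an n/(n−1) correction) the study used, so that choice is an assumption.

```python
def gini(edit_counts):
    """Gini coefficient of edits per editor: 0 = equal, near 1 = concentrated.

    Equivalent to sum_i sum_j |x_i - x_j| / (2 * n**2 * mean), computed
    in O(n log n) via the sorted-values identity
    sum_i sum_j |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_i  (1-based i).
    """
    xs = sorted(edit_counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


skewed = [901] + [1] * 99  # 1 editor makes 901 edits, 99 make 1 each
even = [10] * 100          # 100 editors make 10 edits each
print(gini(skewed), gini(even))
```

Under this definition the maximum for n editors is (n − 1)/n, which is why a heavily concentrated article approaches, rather than reaches, 1.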

Page 45: Collaborative work beneath the surface

[Figure: Gini coefficient vs. number of edits (log scale, 1 to 100,000) for the top 15k articles by page hits and for featured articles]

* Significant difference between featured (M=.46) and Top5k (M=.39) Gini coefficients (p < .0001), and between Top5k (M=.39) and 5-15k (M=.34, p < .0001)

Old results

Page 46: Collaborative work beneath the surface

[Figure: Gini coefficient vs. number of edits (log scale) for the top 15k articles by page hits and for featured articles]

P(Featured | Gini quintile)

Page 47: Collaborative work beneath the surface

Probability of Being Featured

[Figure: P(Featured) by Gini quintile (1 to 5), y-axis 0 to 1; labels: Experts, Crowds]
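The quantity on this slide is a conditional probability: bin articles by Gini coefficient, then take the fraction of each bin that is featured. A minimal sketch, assuming per-article `(gini, is_featured)` pairs and equal-count quintiles (the slides do not say how the bins were formed); `featured_rate_by_quintile` and the toy data are hypothetical.

```python
def featured_rate_by_quintile(articles):
    """P(featured | Gini quintile) from (gini, is_featured) pairs.

    Articles are sorted by Gini and cut into five equal-count bins
    (quintile 1 = lowest Gini); bin sizes differ by at most one.
    """
    ranked = sorted(articles, key=lambda a: a[0])
    n = len(ranked)
    rates = []
    for q in range(5):
        bin_ = ranked[q * n // 5 : (q + 1) * n // 5]
        rates.append(sum(f for _, f in bin_) / len(bin_) if bin_ else float("nan"))
    return rates


# Toy data: 10 articles with Gini 0.1..1.0; four of them are featured.
articles = [(i / 10, i in (6, 8, 9, 10)) for i in range(1, 11)]
print(featured_rate_by_quintile(articles))
```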

Page 48: Collaborative work beneath the surface

[Figure: unique editors vs. edits (both log scale) for the top 15k articles by page hits and for featured articles]

Unique editors: featured vs. Top15k (M=381), p < .001***

Unique editors x Edits

Page 49: Collaborative work beneath the surface

New results

• Sampled articles at a variety of quality levels
  – Defined and rated by expert Wikipedians
  – Hundreds of thousands of articles rated

Page 50: Collaborative work beneath the surface

Cross-sectional analysis

• 900 articles sampled from Start-Class through Featured
  – Higher quality associated with higher Gini and more editors

[Figure: average article Gini by quality class (Start-Class, B-Class, GA-Class, A-Class, FA-Class), y-axis 0 to 0.6]

[Figure: average number of article editors by quality class (Start-Class, B-Class, GA-Class, A-Class, FA-Class), y-axis 0 to 400]

[Figures: article Gini (0 = equal contributions) vs. number of article editors (log scale, 1 to 10,000), one panel each for FA-Class, A-Class, GA-Class, B-Class, and Start-Class]