managing social communities

Web Science & Technologies University of Koblenz Landau, Germany Managing Social Communities Steffen Staab Acknowledgements to ROBUST Project team & WEST Team, in particular K. Dellschaft, J. Kunegis, F. Schwagereit

Upload: net2-project

Post on 18-Dec-2014


Page 1: Managing Social Communities

Web Science & Technologies University of Koblenz ▪ Landau, Germany

Managing Social Communities

Steffen Staab

Acknowledgements to ROBUST Project team & WEST Team, in particular

K. Dellschaft, J. Kunegis, F. Schwagereit

Page 2: Managing Social Communities

Steffen Staab [email protected]

Web Science Doctoral Summer School 2

Semantic Web

Web Retrieval

Interactive Web

Multimedia Web

Software Web

Institut WeST – Web Science & Technologies

eGovernment eMedia eScience eOrganizations ePerson

Institute for Computer Science

Institute for Information Systems

Leibniz Institute for Social Sciences (GESIS)

Page 3: Managing Social Communities


Plan for this Talk

1 Web

2 Science

Page 4: Managing Social Communities


Social Communities

…are everywhere


Page 5: Managing Social Communities

Content, User & Networks Analysis: understanding, response time

Opportunities: open innovation, improved user support, … increase business value

Data Storage and Processing: scalability, heterogeneity

Business Value: product support & innovation, CRM, expertise management, marketing, advertising

Online Communities: intranet, extranet, Internet

Risks: bad content quality, social ill behavior, … jeopardize business value

Page 6: Managing Social Communities


Large-scale Testbeds

SAP (B2B) Community Network

IBM (E2E) Developer Network

Polecat (C2C)

2009 99K accounts

2013 800K accounts

2009 1.5M users 150K access/day

2013 5M users 1200K accesses/day

2009 …

2013 millions posts/day 1TB data/day

Business Partner Network CRM for IT

Online Marketing

Corporate Knowledge Management

Page 7: Managing Social Communities

SAP Business Partner Use Case

SAP Developer Network

Metric                                   2007    2009    2013
Posts per day                            5000    6000    7000
Size of user-generated content (posts)   1M      4M      10.0M
Number of users                          1M      1.7M    4.8M

Page 8: Managing Social Communities

ROBUST: IBM Employee Use Case

Business data           Created per day         Number of users
                        2007   2009   2013      2007      2009      2013
IBM Activities Entry    700    2750   5000      53200     143600    200000
IBM Blogs Entries       120    30     60        34600     77750     100000
IBM Communities         3      23     50        3000      181950    250000
IBM Bookmarks           800    900    1000      8500      22400     50000
IBM Wikis               NA     40     100       NA        35450     100000
IBM Files               NA     290    1000      NA        45160     100000
IBM Overall             1623   4033   7210      500000*   500000*   500000*

Page 9: Managing Social Communities

Risks in Online Communities

Definition: Risk
• Probability of an event occurring
• Impact of the event occurring

Risk management
• Process for managing costs, benefits and likelihoods
• Detect high-impact risks in time, even if they generate expensive false alarms
• Ignore very low-impact risks, even if they can be reliably detected

Types of risks
• Non-compliance with the community policies/polity
• Scamming or spamming behavior
• Lower involvement and productivity
• Decrease of user satisfaction
• Loss of community dynamics

SAP: SCN Award Points scamming
• Experts' reputation decreases
• Business users leave the forum

Web: public communities
• Death of the TechCrunch forum due to spam and lack of management

Loss of 1% of experts ⇒ loss of high revenue
Loss of 10% of lurkers ⇒ low impact

(Figure axes: cost, benefit, likelihood)

Presenter notes (Matthew):
Question 7: Please explain what types of risks need to be detected in communities, which of these risks will be reliably detected by the proposed system, and what justifies the hypothesis that the cost of detecting such risks will be smaller than the cost of the risk going undetected.
• What happens if SAP / IBM / Pidgin lose 1% of customers? What can lead to such a loss?
• Risks detected: community health indicators (activity, loss of interest, …), off-topic content, spam
• What is the market perspective for the technology?
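
The cost argument on this slide can be illustrated with a small calculation. The sketch below assumes the common reading of the definition above, risk = probability × impact, and compares that expected loss with the cost of detection. The function names and the decision rule are illustrative assumptions, not part of the ROBUST system.

```python
def expected_loss(probability: float, impact: float) -> float:
    """Risk read as expected loss: probability of the event times its impact (assumed reading)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * impact


def worth_detecting(probability: float, impact: float, detection_cost: float) -> bool:
    """Hypothetical rule: monitor a risk only if its expected loss exceeds the detection cost."""
    return expected_loss(probability, impact) > detection_cost
```

Under this rule a rare but expensive event (for example, losing a few experts) can justify costly monitoring with false alarms, while a frequent but cheap event (losing lurkers) may not.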
Page 10: Managing Social Communities

Communities: dynamics and confidentiality

ROBUST supports decision making for users, hosts and service providers

Managing growth & decline
• Identify, encourage, safeguard core users
• Social matching
• Define/maintain etiquette and policies
• Manage negative behavior and conflicts
• Content matching
• Recognize, categorize decline and growth
• Redirect users to other communities

Merging communities
• Cross-community topic detection to stimulate inter-community interactions

Splitting communities
• Identification of clusters/compartments of members that can be separated

Presenter notes (Matthew):
Question 12: Please explain how risk detection could be appropriately reported when risks are detected over communities whose activities and dynamics are presumably kept confidential.
Question 8: Please explain in what way the system envisioned supports the merging and splitting of communities and helps managing their growth or decline.
Page 11: Managing Social Communities

Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Many related talks in this Summer School
  ROBUST partners: Alani (monitoring and analysis of social networks), Karnstedt (user churn)
  Closely related: Greene (network analysis), Bernstein (scalable infrastructures)

But here comes the biased account from work in our institute

Page 12: Managing Social Communities


Plan for this Talk

1 Web

2 Science

Page 13: Managing Social Communities

Image of a black hole

Flickr cc, Jan 7 2009 by thebadastronomer

Presenter notes:
The Web could absorb everything into one application (say, Facebook). From a macro perspective this could be a reasonable answer, just as people thought that a black hole generated by the Large Hadron Collider could swallow first Geneva, then Switzerland, and then the rest of planet Earth. But of course people like Hawking have looked more closely and observed that it is not sufficient to take only the macro perspective (a black hole digests everything); one must also look at the details. And the details say: if there is a black hole, it will dissolve because of its radiation. An interesting mechanism makes such radiation let a black hole evaporate over time.
Page 14: Managing Social Communities

Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Web Science Methodology: an explanation by analogy with physics and some initial (!) applications to online communities
  • Modeling dynamic systems at the micro level: understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

Page 15: Managing Social Communities

Better understanding of the tagging process

Cooperative classification of resources

Which factors influence the tagging process?
• Background knowledge of the user?
• Tag assignments of other users?

Hypothesis: Tagging involves imitation of other users AND selection of tags from background knowledge of users.

Page 16: Managing Social Communities


Methodology

Conceptualization

Own Knowledge

Shared terminology

Something else? User interface

Tagging Behavior

Joint Stochastic Model

Model of Own Knowledge

Model of Sharing

Model of User Interface Influence

Simulated Tagging Behavior

Comparison of Statistics

Page 17: Managing Social Communities

Components of Analysis

Properties of Tag Streams (observations in the real world)
• Stream view of folksonomies
• Co-occurrence streams
• Resource streams

Dynamic model for Tagging Systems (stochastic models of influence)
• Simulating background knowledge
• Simulating tag imitation

Simulation Results (which models best fit reality?)
• Co-occurrence streams
• Resource streams

Page 18: Managing Social Communities

Stream Views of a Folksonomy

Folksonomies:
• Vertices: users, tags, resources
• Edges: tag assignments

Postings:
• Tag assignments of a user to a single resource
• Can be ordered according to their time stamp

Page 19: Managing Social Communities

Co-occurrence Streams

Co-occurrence streams:
• All tags co-occurring with a given tag in a posting
• Ordered by posting time

Co-occurrence stream for 'apple':

{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}

⇒ tree, mac, ibook, macintosh, stevejobs

Tag    |Y|         |U|       |T|       |R|
ajax   2,949,614   88,526    41,898    71,525
blog   6,098,471   158,578   186,043   557,017
xml    974,866     44,326    31,998    61,843
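
The construction of a co-occurrence stream can be sketched in a few lines. The snippet below is a minimal illustration using the three postings from the slide; the tuple layout `(user, resource, tags, time)` and the function name are assumptions made for the example.

```python
# The three example postings from the slide: (user, resource, tags, time).
postings = [
    ("mackz", "r1", ["apple", "tree"], "13:25"),
    ("klaasd", "r2", ["apple", "mac", "ibook"], "13:26"),
    ("mackz", "r2", ["apple", "macintosh", "stevejobs"], "13:27"),
]


def cooccurrence_stream(postings, tag):
    """Return all tags that co-occur with `tag`, ordered by posting time."""
    stream = []
    # HH:MM strings sort correctly as text, so they can serve as the sort key.
    for _user, _resource, tags, _time in sorted(postings, key=lambda p: p[3]):
        if tag in tags:
            stream.extend(t for t in tags if t != tag)
    return stream


print(cooccurrence_stream(postings, "apple"))
# ['tree', 'mac', 'ibook', 'macintosh', 'stevejobs']
```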

Page 20: Managing Social Communities


Properties of Co-occurrence Streams – Tag Growth

linear growth

Page 21: Managing Social Communities


Properties of Co-occurrence Streams – Tag Frequencies

power law

Page 22: Managing Social Communities

Resource Streams

Resource streams:
• All tags assigned to a resource
• Ordered by posting time

Resource stream for 'r2':

{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}

⇒ apple, mac, ibook, apple, macintosh, stevejobs
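
A resource stream is built the same way, except that postings are filtered by resource and repeated tags are kept. A minimal sketch on the slide's example data follows; the tuple layout and function name are again assumptions for illustration.

```python
# The three example postings from the slide: (user, resource, tags, time).
postings = [
    ("mackz", "r1", ["apple", "tree"], "13:25"),
    ("klaasd", "r2", ["apple", "mac", "ibook"], "13:26"),
    ("mackz", "r2", ["apple", "macintosh", "stevejobs"], "13:27"),
]


def resource_stream(postings, resource):
    """Return all tags assigned to `resource`, ordered by posting time."""
    stream = []
    for _user, res, tags, _time in sorted(postings, key=lambda p: p[3]):
        if res == resource:
            stream.extend(tags)  # duplicates across postings are kept
    return stream


print(resource_stream(postings, "r2"))
# ['apple', 'mac', 'ibook', 'apple', 'macintosh', 'stevejobs']
```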

Page 23: Managing Social Communities


Properties of Resource Streams – Tag Frequencies

Page 24: Managing Social Communities


Properties of Resource Streams – Tag Frequencies

Page 25: Managing Social Communities


Simulating the Evolution of Tag Streams

Page 26: Managing Social Communities

Simulating tag streams

Which of my concepts represent this web page?
How do I tag this web page?

Which combination of inspirations produces the same statistics as those observed for Delicious?

Inspiration for conceptualization from:

1. Most popular tags

2. Most recently used tags

3. Tags used for this resource

4. Tags co-occurring with similar text documents

5. Creating completely new tags

6. …

Page 27: Managing Social Communities

The Delicious User Interface

Imitating previous tag assignments:
• Recommended tags: intersection of the user's tags and the tags already assigned to the resource.
• Your tags: tags of the user.
• Popular tags: the 7 most popular tags assigned to the resource.

Page 28: Managing Social Communities

Simulating a Tag Stream

• p(w|t): probability of selecting word w for topic t; modeled by word distributions in a topic-centered text corpus
• n: number of visible previous tags
• h: maximal number of previous tag assignments used for determining the ranking of the n distinct tags

• Start with an empty tag stream
• Each simulation step appends a new tag assignment
• Simulation of a single tag assignment:

Page 29: Managing Social Communities

Modeling Background Knowledge

• PBK: probability of selecting from background knowledge
• p(w|t): probability of selecting word w for topic t; modeled by word distributions in a topic-centered text corpus
• p(w|r): probability of selecting word w for resource r

(Figure labels: Text Corpora, Del.icio.us)

Page 30: Managing Social Communities

Modeling Tag Imitation

• PI = 1 – PBK: probability of imitating a previous tag assignment
• n: number of visible top-ranked tags
• h: maximal number of previous tag assignments used for determining the ranking of the n distinct tags

(Figure: with probability PBK the tag at step t comes from background knowledge; with probability 1 – PBK it is imitated from the n top-ranked tags among the previous assignments t-1, t-2, …, t-h)
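
The two mechanisms, background knowledge and imitation, can be combined into a small simulation loop. The sketch below is only an illustration of the parameters PBK, n and h named on these slides. It assumes that the background distribution p(w|t) is given as a plain word-to-weight dictionary and that an imitated tag is drawn from the n visible tags proportionally to its frequency; the slides do not state that selection rule, so it is a labeled assumption.

```python
import random
from collections import Counter


def simulate_tag_stream(steps, p_bk, background, n, h, seed=0):
    """Simulate a tag stream of length `steps`.

    p_bk       -- probability of drawing from background knowledge (PBK)
    background -- dict word -> weight, standing in for p(w|t)
    n          -- number of visible top-ranked tags
    h          -- number of previous tag assignments used for the ranking
    """
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream = []  # start with an empty tag stream
    for _ in range(steps):
        if not stream or rng.random() < p_bk:
            # Background knowledge: draw a word according to p(w|t).
            tag = rng.choices(words, weights=weights, k=1)[0]
        else:
            # Imitation (PI = 1 - PBK): rank the distinct tags among the
            # last h assignments and keep the n most frequent ones.
            visible = Counter(stream[-h:]).most_common(n)
            # Assumption: choose among visible tags proportionally to frequency.
            tags = [t for t, _ in visible]
            counts = [c for _, c in visible]
            tag = rng.choices(tags, weights=counts, k=1)[0]
        stream.append(tag)  # each step appends one tag assignment
    return stream
```

A call such as `simulate_tag_stream(1000, 0.2, {"apple": 5, "mac": 3, "tree": 1}, n=7, h=500)` would correspond to an imitation probability of 80%; the weights here are made up for the example.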

Page 31: Managing Social Communities


Simulation Results

Page 32: Managing Social Communities


Overall Scheme

Conceptualization

Own Knowledge

Shared terminology

Something else? User interface

Tagging Behavior

Joint Stochastic Model

Model of Own Knowledge Model of Sharing

Model of User Interface Influence

Simulated Tagging Behavior

Comparison of Statistics

Page 33: Managing Social Communities

Simulating Co-occurrence Streams

Tag growth: influenced by PBK and p(w|t)

Tag frequencies: influenced by PBK, p(w|t), n, h
• n: semantic breadth of a topic (blog: 100 tags, ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
• h: no hint for realistic values; good guesses may be 500 and 1000

Page 34: Managing Social Communities


Co-occ. Streams – Simulated Tag Growth

Page 35: Managing Social Communities


Co-occ. Stream – Simulated Tag Frequencies

Page 36: Managing Social Communities

Simulating Resource Streams

• PI and PBK: values comparable to co-occurrence streams
• p(w|r): approximated by p(w|t)
• n: 7 tags are visible (cf. Delicious user interface)
• h: smaller value than for co-occurrence streams

Page 37: Managing Social Communities


Res. Streams – Simulated Tag Frequencies

Page 38: Managing Social Communities

Model                 Frequency rank:         Frequency rank:     Tag growth
                      co-occurrence streams   resource streams
Polya Urn Model       o                       o                   fixed size
Simon Model           o                       o                   linear
YS Model w/ Memory    +                       o                   linear
Halpin et al. Model   o                       o                   linear
Our Model             +                       +                   power-law

Lessons learned

• Black holes do not only eat mass; they also dissolve by emitting radiation
• Imitation AND background knowledge are needed for explaining properties of tag streams
• Probability of imitating previous tag assignments: ~70-90%

[Dellschaft+Staab, ACM Hypertext 2008]

Epistemic Model

Page 39: Managing Social Communities


Solar System

Flickr, cc Sep 1 2008 by Image Editor

(Figure labels: Jupiter, Saturn, Neptune, Uranus)

Page 40: Managing Social Communities

Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Web Science Methodology: an explanation by analogy with physics and some initial (!) applications to online communities
  • Modeling dynamic systems at the micro level: understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

Page 41: Managing Social Communities


Overall Scheme

Conceptualization

Own Knowledge

Shared terminology

Something else? User interface

Tagging Behavior

Joint Stochastic Model

Model of Own Knowledge Model of Sharing

Model of User Interface Influence

Simulated Tagging Behavior

Comparison of Statistics

Page 42: Managing Social Communities


What is our Uranus?

What is this?

Page 43: Managing Social Communities

Uranus = Spam

Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream

[Dellschaft+Staab,WebSci 2010]

Page 44: Managing Social Communities

Why care? The Bibsonomy Example

• Complete snapshot of the Bibsonomy system
• Manually labeled ground truth of spammers in the data set

               Users    Tags      Resources   TAS
Spammers       29,248   297,846   1,197,354   13,258,759
Non-Spammers   2,467    61,154    234,143     816,196

Page 45: Managing Social Communities

Why care? The Delicious Example

• Crawled during the TAGora project
• Number of spammers not known exactly
• Estimation based on a random sample of 500 users: with 95% probability, between 1,972 and 12,949 spammers
• Delicious most likely already applies spam detection
• Why care about ~1.5% spammers in Delicious?

Users     Tags        Resources    TAS
532,938   2,482,850   18,778,566   140,305,446

Page 46: Managing Social Communities

Filtering Results (Users)

(Figure: bar chart "Number of Spammers and Non-Spammers"; x-axis 1 to 20, y-axis 0 to 16,000; series: Spammer, Non-Spammer)

Page 47: Managing Social Communities

Filtering Results (Tag Assignments)

(Figure: bar chart "Filtered and unfiltered number of TAS"; x-axis 1 to 20, y-axis 0 to 450,000; series: Spam, Non-Spam)

Page 48: Managing Social Communities

That’s why

Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream

Page 49: Managing Social Communities


How statistically significant is the epistemic model for normal users?

Page 50: Managing Social Communities

Lessons learned

• Uranus was discovered because it affected Neptune
• Pluto was discovered because it affected Uranus!
• Spammers can be discovered by their behavior, even if you do not know what kind of spam they are producing!

Page 51: Managing Social Communities


How do constellations in the sky evolve?

http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/

Page 52: Managing Social Communities

Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Web Science Methodology: an explanation by analogy with physics and some initial (!) applications to online communities
  • Modeling dynamic systems at the micro level: understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

Page 53: Managing Social Communities


Example: Network

Person Friendship

Page 54: Managing Social Communities


SUGGESTING WHOM TO LINK TO NEXT

Page 55: Managing Social Communities

Use Networks for Recommendation

• Goal: predict whom a person will add as a friend
• Facebook's algorithm: find friends-of-friends
→ Problem: the rest of the network is ignored! :-(

Presenter notes:
Restrict to unweighted, undirected, unipartite graphs.
Page 56: Managing Social Communities

Algebraic Graph Theory

Represent a network by an adjacency matrix A:
• $A_{ij} = 1$ when i and j are connected
• $A_{ij} = 0$ when i and j are not connected
• A is square and symmetric.

Example graph with nodes 1 to 6 and edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6:

\[
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]

Page 57: Managing Social Communities

Baseline: Friend of a Friend Model

Count the number of ways a person can be found as the friend of a friend.
Consider the matrix product $AA = A^2$:

\[
A^2 = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}^{2}
= \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 3 & 1 & 1 & 1 & 0 \\
1 & 1 & 2 & 1 & 1 & 0 \\
1 & 1 & 1 & 3 & 0 & 1 \\
0 & 1 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\]
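
The friend-of-a-friend count can be checked directly on the six-node example. The sketch below uses plain Python lists; the helper names `matmul` and `friend_of_friend_ranking` are illustrative assumptions, and nodes are indexed from 0, so node 1 of the slide is index 0.

```python
# Adjacency matrix of the six-node example graph from the slide
# (edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6).
A = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def friend_of_friend_ranking(A, user):
    """Rank non-friends of `user` by the number of common friends (entries of A^2)."""
    A2 = matmul(A, A)
    candidates = [
        (A2[user][j], j)
        for j in range(len(A))
        if j != user and A[user][j] == 0 and A2[user][j] > 0
    ]
    # Highest count first; ties broken by node index.
    return [j for _score, j in sorted(candidates, key=lambda c: (-c[0], c[1]))]


A2 = matmul(A, A)
print(A2[0])                           # [1, 0, 1, 1, 0, 0]
print(friend_of_friend_ranking(A, 0))  # [2, 3], i.e. nodes 3 and 4 of the slide
```

The diagonal of A² holds each node's degree, which is why recommendations skip the user itself.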

Page 58: Managing Social Communities

Eigenvalue Decomposition

Write the matrix A as a product:

\[ A = U \Lambda U^T \]

where
• U contains the eigenvectors: $U^T U = I$
• Λ contains the eigenvalues: $\Lambda_{ij} = 0$ when $i \neq j$

Page 59: Managing Social Communities

Computing $A^2$

Use the eigenvalue decomposition $A = U \Lambda U^T$:

\[ A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T \]

Exploit U and Λ:
• $U^T U = I$ because U contains eigenvectors
• $(\Lambda^2)_{ii} = \Lambda_{ii}^2$ because Λ contains eigenvalues

Result: Just square all eigenvalues!
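
As a worked illustration of this result, consider the smallest possible network, two people connected by one edge. This two-node example is not from the slides; it is chosen only because its decomposition can be written by hand.

```latex
\[
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad
\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\qquad
U^T U = I
\]
\[
A^2 = U \Lambda^2 U^T
    = U \begin{pmatrix} 1^2 & 0 \\ 0 & (-1)^2 \end{pmatrix} U^T
    = U I U^T
    = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
```

Squaring the eigenvalues 1 and −1 gives the identity, which matches the direct product: each node reaches only itself in two steps.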

Page 60: Managing Social Communities

Friend of a Friend of a Friend

Compute the number of friends-of-friends-of-friends:

\[ A^3 = U \Lambda U^T U \Lambda U^T U \Lambda U^T = U \Lambda^3 U^T \]

\[
A^3 = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}^{3}
= \begin{pmatrix}
0 & 3 & 1 & 1 & 1 & 0 \\
3 & 2 & 4 & 5 & 1 & 1 \\
1 & 4 & 2 & 4 & 1 & 1 \\
1 & 5 & 4 & 2 & 4 & 0 \\
1 & 1 & 1 & 4 & 0 & 2 \\
0 & 1 & 1 & 0 & 2 & 0
\end{pmatrix}
\]

Page 61: Managing Social Communities

Matrix Exponential

The matrix exponential can be written as a power sum with decreasing coefficients:

\[ \exp(A) = I + A + \tfrac{1}{2} A^2 + \tfrac{1}{6} A^3 + \dots \]

Example graph with nodes 1 to 7 and edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6, 6-7:

\[
\exp \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
= \begin{pmatrix}
1.66 & 1.72 & 0.93 & 0.98 & 0.28 & 0.06 & 0.01 \\
1.72 & 3.57 & 2.70 & 2.93 & 1.04 & 0.29 & 0.06 \\
0.93 & 2.70 & 2.86 & 2.71 & 0.99 & 0.28 & 0.06 \\
0.98 & 2.93 & 2.71 & 3.63 & 1.95 & 0.76 & 0.22 \\
0.28 & 1.04 & 0.99 & 1.95 & 2.35 & 1.59 & 0.64 \\
0.06 & 0.29 & 0.28 & 0.76 & 1.59 & 2.23 & 1.38 \\
0.01 & 0.06 & 0.06 & 0.22 & 0.64 & 1.38 & 1.59
\end{pmatrix}
\]

Recommendations for user ④: ① > ⑥ > ⑦ (scores 0.98, 0.76, 0.22)
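
The ranking for user ④ can be reproduced by summing the power series term by term. The sketch below is a plain-Python illustration that truncates the series after a fixed number of terms; the function names and the truncation length are assumptions for the example, and a numerical library routine would normally be used for larger networks.

```python
# Adjacency matrix of the seven-node example graph
# (edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6, 6-7); index 0 is node 1.
A = [
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
]


def mat_exp(A, terms=30):
    """Approximate exp(A) = I + A + A^2/2! + A^3/3! + ... with `terms` summands."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current summand, starts at A^0 / 0! = I
    for k in range(1, terms):
        # Multiply the previous summand by A and divide by k to get A^k / k!.
        term = [
            [sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def recommend(A, user):
    """Rank the nodes not yet linked to `user` by their exp(A) score, best first."""
    E = mat_exp(A)
    candidates = [j for j in range(len(A)) if j != user and A[user][j] == 0]
    return sorted(candidates, key=lambda j: -E[user][j])


print(recommend(A, 3))  # [0, 5, 6], i.e. nodes 1, 6 and 7 for user 4
```

Unlike the friend-of-a-friend baseline, node 7 receives a nonzero score here even though it is three steps away from user ④.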

Page 62: Managing Social Communities

Why the Matrix Exponential

• $A^n$ = number of paths of length n
• $aA^2 + bA^3 + cA^4 + \dots$ = number of paths, weighted by path length
→ New edges are more likely to appear when there are many paths already
→ When $a > b > c > \dots > 0$, short paths are weighted more

Page 63: Managing Social Communities

Computing Power Series

Let p(A) be a power series:

\[
\begin{aligned}
p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
     &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
     &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
     &= U\,p(\Lambda)\,U^T
\end{aligned}
\]

Therefore:

Power series change only the eigenvalues!

Page 64: Managing Social Communities


TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE

Page 65: Managing Social Communities

Diversity
• Many, equally-sized subcommunities
• High entropy
• ‘Flat’ structure

Regularity
• Few large subcommunities
• Low entropy
• Many ‘hubs’

Page 66: Managing Social Communities

Network Evolution

• How did a network look at time t?
• Idea: observe the change of diversity/regularity over time

Page 67: Managing Social Communities

Outline

1. Power-law exponent
2. Weighted spectral distribution
3. Network entropy
4. Network rank

Page 68: Managing Social Communities

1. Power-law Exponent

• Number of neighbors is unevenly distributed
• Results in a power law (Newman 2006):

\[ C(n) \sim n^{-\gamma} \]

• Higher exponent γ denotes less regularity

Epinions trust network (Massa et al. 2005)
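
One simple way to read an exponent off such a distribution is a straight-line fit in log-log space. The sketch below assumes that C(n) is available as a dictionary mapping a neighbor count n to its frequency; the function name is hypothetical, and a least-squares fit is only a rough estimator, not necessarily the procedure used to produce the plots in this talk.

```python
import math


def fit_power_law_exponent(counts):
    """Estimate gamma in C(n) ~ n^(-gamma) by least squares on log C(n) vs. log n.

    counts -- dict mapping a neighbor count n (> 0) to its frequency C(n) (> 0)
    """
    points = [(math.log(n), math.log(c)) for n, c in counts.items() if n > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two points with positive n and C(n)")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return -slope  # the slope of the log-log line is -gamma
```

Tracking this estimate over successive snapshots of a network gives the kind of time series discussed on the next slide.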

Page 69: Managing Social Communities


1. Power-law Exponent over Time

γ shrinks ⇒ Network becomes more regular

Epinions trust network (Massa et al. 2005)

Page 70: Managing Social Communities

2. Weighted Spectral Distribution

• Consider the n×n matrix N defined by
  $N_{ij} = 1 / \sqrt{d(i)\,d(j)}$ when (i,j) is an edge
  $N_{ij} = 0$ otherwise
  (where d(i) is the degree of node i)
• The distribution of the eigenvalues of N is called the weighted spectral distribution (WSD) (Fay et al. 2010)
• Eigenvalues nearer to ±1: diversity
• Eigenvalues nearer to 0: regularity

Page 71: Managing Social Communities

2. Weighted Spectral Distribution over Time

• The WSD shifts towards zero ⇒ regularization: the network becomes more regular

CiteULike user–tag network (Emamy et al. 2007)

Page 72: Managing Social Communities

3. Network Entropy

• Write the graph G as a sum of subgraphs $G_k$:

\[ G = G_1 \cup G_2 \cup \dots \cup G_r \]

  Each $G_k$ has weighted edges, with total weight $\lambda_k$.

• When picking an edge from G at random, the probability of it being in community $G_k$ is

\[ \frac{\lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_r} = \frac{\lambda_k}{L} \]

• The entropy of this distribution is (Kunegis et al. 2011)

\[ H(G) = -\sum_k \frac{\lambda_k}{L} \log \frac{\lambda_k}{L} \]

• Entropy: effective number of subcommunities
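
The entropy formula is straightforward to evaluate once the subgraph weights λ_k are known. The sketch below takes those weights as a plain list and uses the natural logarithm. Reading "effective number of subcommunities" as exp(H) is an assumption made for this example; the slide does not spell out the conversion.

```python
import math


def network_entropy(weights):
    """H = -sum_k (w_k / L) * log(w_k / L), with L the total weight."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def effective_communities(weights):
    """Assumed reading: exp(H), which equals r for r equally weighted subgraphs."""
    return math.exp(network_entropy(weights))
```

For four equally weighted subgraphs this gives an effective number of 4, while one dominant subgraph pulls the value towards 1.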

Page 73: Managing Social Communities

3. Network Entropy over Time

Entropy is constant ⇒ Constant number of communities

(Figure: entropy H(G) over time t, shown in absolute scale and zoomed in)

Enron email network (Klimt et al. 2004)

Page 74: Managing Social Communities

4. Network Rank

Decompose the network into subcommunities:

\[ G = G_1 \cup G_2 \cup \dots \cup G_r \]

The rank r is a measure of diversity:

\[ \operatorname{rank}(G) = r \]

Weighted rank:

\[ \operatorname{rank}^{*}(G) = \sum_k \frac{|G_k|}{|G_1|} \]

Robust measure of diversity (Kunegis et al. 2011)
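
The weighted rank can be computed from the sizes of the subcommunities alone. The sketch below assumes that |G_1| denotes the largest subcommunity and that sizes are given as a list of positive numbers; both are assumptions for illustration.

```python
def weighted_rank(sizes):
    """rank*(G) = sum_k |G_k| / |G_1|, taking G_1 as the largest subcommunity (assumption)."""
    if not sizes or max(sizes) <= 0:
        raise ValueError("need at least one subcommunity with positive size")
    return sum(sizes) / max(sizes)


print(weighted_rank([3, 3, 3]))  # 3.0, equal to the plain rank r
print(weighted_rank([4, 2, 2]))  # 2.0, smaller than r = 3
```

Tiny subcommunities barely change the value, which is one way to see why the weighted rank is less sensitive to noise than the plain rank r.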

Page 75: Managing Social Communities

4. Network Rank over Time

• Increasing network rank: increasing diversity
• Shrinking network rank: shrinking diversity

(Figure: network rank rank∗(G) over time t)

Enron email network (Klimt et al. 2004)

Page 76: Managing Social Communities

More Network Rank Plots

(Figure panels: hep-th citations, Wikipedia elections, Epinions trust network, frwikibooks edits, MIT conference contacts, YouTube social network)

(biased towards good examples of convex evolution)

Page 77: Managing Social Communities


Conclusion

• Power-law exponent shrinks – Connection diversity shrinking

• Weighted spectral distribution shifts to zero – Emerging main components

• Entropy is constant – Effective number of communities is constant

• Network rank increases, then shrinks – Two-phase model of expansion

Page 78: Managing Social Communities

Watch out!

KONECT – Koblenz Network Collection
http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf
Coming soon!
Follow #ictrobust or @kunegis or @ststaab

Page 79: Managing Social Communities

Why does the sky have the density it has?

Flickr, cc Oct 14, 2007, Michael Donough

Page 80: Managing Social Communities


Why do tagging systems have so little spam?

User Roles

Content Quality

Community Policy

Content Process

Administrative Process

Page 81: Managing Social Communities

Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Web Science Methodology: an explanation by analogy with physics and some initial (!) applications to online communities
  • Modeling dynamic systems at the micro level: understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

Page 82: Managing Social Communities


Yahoo Answers

• Ensure quality of user generated content

• Use of administrators and community moderators How?

• Policy influences community processes

Page 83: Managing Social Communities


SURVEY OF GOVERNANCE MODELS

Page 84: Managing Social Communities

Communities need Governance

• Steering and coordinating actions of community members
• Goal: successful and flourishing community
  • High-quality user-generated content
  • Active community members

[ http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/ ]

[Benz2004]

Page 85: Managing Social Communities

Motivation

• Different types of Web communities
• User-generated content (video, photos, comments, articles, questions, answers, postings, review text)

What are the most successful means of governance for user-generated content?

Analyze successful platforms and compare their means of governance!

Page 86: Managing Social Communities

Means of Governance

1. Direct intervention of community owner
   • Affecting content or users based on apparent properties

2. Functionality of the community platform (relating community members and user-generated content)
   • Assessment: ratings, text reviews, bookmarks, abuse reports
   • Selection & ranking: ratings, time, views, replies, score
   • Hide low quality
   • Content modification
   • Complex user roles

Page 87: Managing Social Communities

Method

• Selection of the 250 most prominent web sites with community functionality according to Alexa Page Rank
• Clustering of web sites into four groups according to purpose: Social Media, Editorial News, Social Networking, Social Reviewing
• Top-5 web sites of each group analyzed (*)

Page 88: Managing Social Communities


Key Results

(1) Abuse reports are a successful means of governance. [Schwagereit2010]
• 16 occurrences
• Restricted to filtering out unwanted content
• Staff needed: expensive but efficient

(2) Simple ratings are dominant, but there is a battle between “Like” and “Like/Dislike”
• “Like”: 9 occurrences
• “Like/Dislike”: 7 occurrences
• Tradeoff between simplicity and improved ranking ability
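
The tradeoff in result (2) can be illustrated with a small sketch. The two scoring functions are assumptions chosen for illustration (a plain like count versus likes minus dislikes); the surveyed platforms may compute their scores differently, and the item data is made up.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    likes: int
    dislikes: int = 0


def rank(items, score):
    """Return titles ordered by descending score (ties keep input order)."""
    return [item.title for item in sorted(items, key=score, reverse=True)]


def like_only(item):
    # "Like" button only: dislikes are never collected.
    return item.likes


def like_dislike(item):
    # "Like/Dislike": assumed net score, likes minus dislikes.
    return item.likes - item.dislikes


items = [
    Item("controversial", likes=10, dislikes=40),
    Item("well received", likes=10, dislikes=0),
    Item("niche", likes=3, dislikes=0),
]

print(rank(items, like_only))     # ['controversial', 'well received', 'niche']
print(rank(items, like_dislike))  # ['well received', 'niche', 'controversial']
```

With "Like" alone, the first two items are indistinguishable; the extra dislike signal separates them at the cost of a second button for users to understand.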

Page 89: Managing Social Communities


Key Results

(3) Creation time is the most frequently implemented ranking criterion
• 18 occurrences
• Others: score: 8, ratings: 6
• Important content is renewed; unimportant content will be forgotten

(4) Content modification and user roles are rarely implemented
• 2 occurrences
• Requires a complex role system and users who understand it

Page 90: Managing Social Communities


GOVERNANCE MODEL: DEEP DIVE - SIMULATION

Page 91: Managing Social Communities


Methodology Principle

1. Define a Web Community model (Lycos IQ, Yahoo Answers…)

2. Adapt this model to an existing community

3. Estimate parameters

4. Define quality measure

5. Simulate community behaviour

6. Compare simulation results with real data

7. Analyze quality measures with respect to variations of CoSiMo parameters
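
The steps above can be read as a simulate-and-measure loop. The following is a minimal sketch under stated assumptions, not the CoSiMo implementation: the user model (an activity rate and a compliance probability per user), the administrator behaviour (random inspection), the quality measure (share of compliant postings among those not deleted), and every name and number are hypothetical. Step 6, the comparison with real data, is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class User:
    postings_per_day: float   # assumed activity parameter (step 3)
    compliance_rate: float    # assumed probability that a posting is compliant


def simulate(users, admins, checks_per_admin, days, seed=0):
    """Toy community simulation (step 5); returns the quality measure (step 4).

    Quality is defined here, as an assumption, as the share of compliant
    postings among all postings that were not deleted.
    """
    rng = random.Random(seed)
    visible = []  # True = compliant, False = non-compliant
    for _ in range(days):
        todays = []
        for u in users:
            # Integer part of the rate plus a Bernoulli draw for the fraction.
            n = int(u.postings_per_day)
            if rng.random() < u.postings_per_day - n:
                n += 1
            todays.extend(rng.random() < u.compliance_rate for _ in range(n))
        # Administrators inspect a random sample of today's postings and
        # delete the non-compliant ones they see.
        budget = min(len(todays), admins * checks_per_admin)
        inspected = set(rng.sample(range(len(todays)), budget))
        visible.extend(ok for i, ok in enumerate(todays) if ok or i not in inspected)
    return sum(visible) / len(visible) if visible else 1.0


def sweep(users, admin_counts, **kwargs):
    """Step 7: vary one parameter and record the quality measure."""
    return {a: simulate(users, a, **kwargs) for a in admin_counts}


if __name__ == "__main__":
    rng = random.Random(1)
    users = [User(rng.uniform(0.1, 3.0), rng.uniform(0.7, 1.0)) for _ in range(200)]
    results = sweep(users, [0, 1, 5, 20], checks_per_admin=20, days=30)
    for admins, quality in results.items():
        print(f"{admins:>3} administrators -> quality {quality:.3f}")
```

Keeping the quality measure as a single return value makes it straightforward to sweep any other parameter, for example the number of checks per administrator, in the same way.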

Page 92: Managing Social Communities


Dataset Lycos IQ

• Time Period: 909 days
• Users: 34,327
• Administrators: 36
• Questions: 1,031,982
• Answers: 2,996,446
• Deleted non-compliant Answers: 21,139

Presenter notes: only answers are considered, because users do not need to be registered to post questions.
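
A few derived rates help put the dataset figures into proportion. The arithmetic below uses only the numbers reported on the slide; the per-user and per-administrator averages assume, for illustration, that all answers come from the listed users and that deletions are spread evenly over administrators and days.

```python
# Figures as reported for the Lycos IQ dataset.
DAYS = 909
USERS = 34_327
ADMINISTRATORS = 36
QUESTIONS = 1_031_982
ANSWERS = 2_996_446
DELETED_ANSWERS = 21_139

answers_per_day = ANSWERS / DAYS
answers_per_question = ANSWERS / QUESTIONS
answers_per_user = ANSWERS / USERS  # assumes every answer comes from a listed user
deleted_share = DELETED_ANSWERS / ANSWERS
# Assumes deletions are spread evenly over administrators and days.
deletions_per_admin_per_day = DELETED_ANSWERS / (ADMINISTRATORS * DAYS)

print(f"answers per day:               {answers_per_day:.1f}")
print(f"answers per question:          {answers_per_question:.2f}")
print(f"answers per user:              {answers_per_user:.1f}")
print(f"share of deleted answers:      {deleted_share:.2%}")
print(f"deletions per admin per day:   {deletions_per_admin_per_day:.2f}")
```

Roughly 0.7% of all answers were deleted as non-compliant, which corresponds to well under one deletion per administrator per day on average.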
Page 93: Managing Social Communities


Observed parameters (input to simulation)

[Figure: histograms of the number of users (logarithmic axis, 1 to 100,000) by answers per year (bins of 1,000, from 0-999 upward) and by rate of compliant answers (bins of 0.1, from 0.0 to 1.0).]

Page 94: Managing Social Communities


Example Behaviors and Example Policies

Reading Policies for Administrators:
• PA: random selection of postings
• PB: random selection of postings that no other administrator has examined so far
• PC: selection of postings that were most often reported by users for being non-compliant

Promotion Policy:
• PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points

Behaviors of Ordinary Users:
• Create new postings
• Read existing postings
• Report non-compliant postings OR give bonus points to poster

Moderator Users:
• Create new postings
• Read existing postings
• Delete non-compliant postings OR give bonus points to poster

Administrators:
• Read existing postings
• Delete non-compliant postings
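
The reading policies PA, PB, PC and the promotion policy PM-X can be written down as small selection functions. This is a hedged sketch for illustration only: the `Posting` fields, the function names, and the one-posting-per-step administrator behaviour are assumptions, not taken from CoSiMo.

```python
import random
from dataclasses import dataclass


@dataclass(eq=False)  # compare postings by identity, not by field values
class Posting:
    author: str
    compliant: bool
    reports: int = 0          # abuse reports filed by ordinary users
    examined: bool = False    # already examined by some administrator?


def policy_a(postings, rng):
    """PA: pick any posting uniformly at random."""
    return rng.choice(postings) if postings else None


def policy_b(postings, rng):
    """PB: pick a random posting that no administrator has examined yet."""
    fresh = [p for p in postings if not p.examined]
    return rng.choice(fresh) if fresh else None


def policy_c(postings, rng):
    """PC: pick the posting with the most abuse reports (rng is unused)."""
    return max(postings, key=lambda p: p.reports) if postings else None


def administrator_step(postings, policy, rng):
    """Read one posting chosen by `policy`; delete it if it is non-compliant."""
    chosen = policy(postings, rng)
    if chosen is None:
        return
    chosen.examined = True
    if not chosen.compliant:
        postings.remove(chosen)


def moderators(bonus_points, threshold):
    """PM-X: users with at least `threshold` bonus points become moderators."""
    return {user for user, points in bonus_points.items() if points >= threshold}


rng = random.Random(0)
postings = [
    Posting("ann", True),
    Posting("bob", False, reports=3),
    Posting("cid", False),
]
administrator_step(postings, policy_c, rng)   # removes bob's reported posting
print([p.author for p in postings])           # ['ann', 'cid']
print(moderators({"ann": 120, "cid": 40}, threshold=100))  # {'ann'}
```

Because every policy has the same signature, an administrator can be configured with any of them, or with a mix, without changing the surrounding simulation code.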

Page 95: Managing Social Communities


How many administrators are needed?

[Figure: recent posting quality (legend bands from 0.65 to 1.05) as a function of the number of administrators and the additional non-compliant postings per day.]

Page 96: Managing Social Communities


Fighting spam with administrators…

[Figure: recent posting quality (legend bands from 0.99 to 1) as a function of the applied policies and the number of administrators.]

Variation of policies and number of administrators
• Efficient policies result in high quality content
• A minimum of 18 administrators are needed
• Many moderators are needed to bring the quality to a high level

Page 97: Managing Social Communities


Fighting spam with user moderators…

[Figure: recent posting quality (legend bands from 0.85 to 1) as a function of the applied policies (PA, PA+PB, PA+PB+PC, and PA+PB+PC+PM-X for thresholds X of 3…, 1…, 800, 400, 200, 100, 50, 25 and 12) and the additional non-compliant postings per day (5 to 2560).]

Variation of policies and posting quality
• A limited number of administrators has a limited capacity for filtering a surge of non-compliant postings
• Moderators help to increase quality

Page 98: Managing Social Communities


Lessons Learned

• Strategy of selecting questionable postings is crucial

• Reporting by normal users is the most effective strategy

• Moderators are not as effective as expected if they hunt only incidentally for non-compliant content

• Sufficiently strong requirements regarding moderator profiles lead to high quality of moderators

• Policies for promoting users need to be based on a criterion that is time dependent

Page 99: Managing Social Communities


Agenda

• Risks and Opportunities in Social Communities: the ROBUST project

• Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
  • Modeling dynamic systems at the micro level, understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior, recognizing behavior deviating from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

Page 100: Managing Social Communities


Are we satisfied here? No! Not by far!

Understand how and why users tag or tweet?
-> What are people's limitations that affect the system? -> Psychology and Sociology!

What are their legal boundaries?
-> How can you shape the systems? -> Law!

What are organizations' incentives?
-> Why and how do organizations participate? -> Nice example: open source -> Economics!

Page 101: Managing Social Communities

Web Science & Technologies University of Koblenz ▪ Landau, Germany

Thank You!

Page 102: Managing Social Communities


References

J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. World Wide Web Conf., pp. 741–750, 2009.

J. Kunegis, A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.

J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.

J. Kunegis, D. Fay, C. Bauckhage. Network growth and the spectral evolution model. In Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.

Page 103: Managing Social Communities


References

B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi, On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pp. 37–42, 2009.

Page 104: Managing Social Communities


References

K. Dellschaft, S. Staab. An Epistemic Dynamic Model for Tagging Systems. HYPERTEXT 2008, Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, June 19-21, 2008 - Pittsburgh, Pennsylvania, USA.

K. Dellschaft, S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In: Proc. of WebSci-2010, Raleigh, April, 2010.

F. Schwagereit, S. Sizov, S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In: Proc. of WebSci-2010, Raleigh, US, April, 2010.