Semantic Stability in Social Tagging Streams Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman

Upload: claudia-wagner

Post on 11-Aug-2014


Page 1: Semantic Stability in Social Tagging Streams

Semantic Stability in Social Tagging Streams

Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman

Page 2: Semantic Stability in Social Tagging Streams

2

Ontologies: formal, shared and stable

Folksonomies: not formal but shared, and stable?

Page 3: Semantic Stability in Social Tagging Streams

4

1970

1990

2010

http://schwarzenegger.com/

Page 4: Semantic Stability in Social Tagging Streams

5

How can we measure semantic stability?

How can we compare the semantic stabilization process in different systems?

What impacts semantic stability?

Page 5: Semantic Stability in Social Tagging Streams

6

Measuring Semantic Stability: State of the Art

• Relative tag proportions per resource become stable with increasing number of tag assignments [Golder and Huberman, 2006]

• KL-divergence of rank-ordered tag frequency distribution per resource at different time points converges towards zero [Halpin et al., 2007]

• Power law distributions [Cattuto et al., 2006] – The scale invariance property ensures that, regardless of how large the system grows, the shape of the distribution stays the same
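The first bullet can be illustrated with a minimal sketch. The tag stream below is made up for illustration and the function name `relative_proportions` is hypothetical; the code only shows how the relative tag proportions of one resource could be tracked after each tag assignment.

```python
from collections import Counter

def relative_proportions(stream):
    """Yield the relative tag proportions after each tag assignment."""
    counts = Counter()
    for n, tag in enumerate(stream, start=1):
        counts[tag] += 1
        # Proportion of each tag among the first n assignments.
        yield {t: c / n for t, c in counts.items()}

# Made-up tag stream for a single resource (illustrative only).
stream = ["actor", "terminator", "actor", "governor", "actor", "terminator"]
snapshots = list(relative_proportions(stream))
print(snapshots[-1])
```

Plotting one curve per tag over the snapshots gives the kind of picture in which the proportions flatten out as more assignments arrive.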

Page 6: Semantic Stability in Social Tagging Streams

7

Some Limitations

• Do not allow comparing the semantic stabilization process of different systems

• Prune tag distributions to the top-k tags

– Cannot handle non-conjoint lists of tags

• A random tagging process also produces “stable” descriptions

– A tag assignment at timepoint t+1 has less impact on the tag distribution of a resource than a tag assignment at timepoint t
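The last point follows from simple arithmetic. As a sketch (the symbols are introduced here for illustration and do not appear on the slides): let a resource have n tag assignments, let c_i be the count of tag i, and let δ be 1 if the next assignment uses tag i and 0 otherwise. The change in the relative proportion of tag i is then bounded as follows:

```latex
\[
\left| \frac{c_i + \delta}{n+1} - \frac{c_i}{n} \right|
= \frac{\left| \delta n - c_i \right|}{n(n+1)}
\le \frac{1}{n+1},
\qquad \delta \in \{0, 1\},\; 0 \le c_i \le n .
\]
```

The bound shrinks with n no matter which tag is chosen, so even tags picked at random move the distribution less and less over time.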

Page 7: Semantic Stability in Social Tagging Streams

8

Example: KL-Divergence

• KL-divergence converges towards zero.

• But random baseline also converges towards zero if we assume a constant tagging rate.

• We do not always know the top k tags!
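The second bullet can be checked with a small simulation. This is a sketch under stated assumptions, not the setup used in the talk: tags are drawn uniformly at random from a hypothetical vocabulary of 50 tags, the distribution is pruned to the top 25 tags, snapshots are compared every 500 assignments, and add-one smoothing is used so that the divergence is always defined.

```python
import math
import random
from collections import Counter

def kl_divergence(p, q):
    """KL(p || q) for two equally long probability vectors with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rank_ordered(counts, k):
    """Top-k tag frequencies, sorted descending and normalized to sum to 1."""
    top = sorted(counts.values(), reverse=True)[:k]
    top += [0] * (k - len(top))
    # Add-one smoothing keeps every entry positive (an assumption of this sketch).
    total = sum(top) + k
    return [(c + 1) / total for c in top]

random.seed(0)
vocabulary = [f"tag{i}" for i in range(50)]   # hypothetical vocabulary
counts = Counter()
previous = None
divergences = []
for step in range(1, 5001):
    counts[random.choice(vocabulary)] += 1    # purely random tagging
    if step % 500 == 0:
        current = rank_ordered(counts, k=25)
        if previous is not None:
            divergences.append(kl_divergence(previous, current))
        previous = current

print(divergences)
```

The divergence between consecutive snapshots shrinks even though no agreement among taggers exists, which is why a random baseline is needed when interpreting such curves.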

Page 8: Semantic Stability in Social Tagging Streams

9

Example: Relative Tag Proportion

Page 9: Semantic Stability in Social Tagging Streams

Intuition and Approach

• Some descriptors are more important than others.

• Ranking of (top) descriptors remains stable over time

• All descriptors are equally important.

• Ranking of (top) descriptors changes over time

[Figure: tag distributions P(T) at tn and tn+m over the tags Schwarzenegger, actor, terminator, Hollywood, bodybuilding; one pair of panels labeled “stable”, the other “less stable”]

Page 10: Semantic Stability in Social Tagging Streams

Intuition and Approach

[Figure: same bullets and P(T) panels as on the previous slide, extended with the tags governor, California, republican, CA]

Page 11: Semantic Stability in Social Tagging Streams

13

Requirements

• Rank agreement of the descriptors of a resource over time

• Weighted rank agreement

• Non-conjoint lists of descriptors

• Random Baseline

Page 12: Semantic Stability in Social Tagging Streams

14

Rank Biased Overlap (RBO) [Webber et al., 2010]

• RBO falls in the range [0, 1], where 0 means disjoint, and 1 means identical

• p lies between 0 and 1 and determines how steep the decline in weights is

• The smaller p, the more top-weighted the metric
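The formula shown on this slide did not survive in the transcript. For reference, the basic definition from Webber et al. (2010) for two rankings S and T can be written as below, where A_d is the agreement at depth d, that is, the share of items that the two top-d prefixes have in common. The tie correction discussed later in the talk is not part of this form.

```latex
\[
\mathrm{RBO}(S, T, p) = (1 - p) \sum_{d=1}^{\infty} p^{\,d-1} A_d ,
\qquad
A_d = \frac{\left| S_{1:d} \cap T_{1:d} \right|}{d},
\qquad 0 < p < 1 .
\]
```

The weight p^{d-1} of depth d decays geometrically, which is why a smaller p concentrates the measure on the top ranks.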

Page 13: Semantic Stability in Social Tagging Streams

15

Example

[Figure: P(T) at tn with tags ranked novel, fiction, sf, book, London; P(T) at tn+m with tags ranked novel, sf, fiction, book, London]

Overlap at depth 1 = 1

Page 14: Semantic Stability in Social Tagging Streams

16

Example

[Figure: P(T) at tn with tags ranked novel, fiction, sf, book, London; P(T) at tn+m with tags ranked novel, sf, fiction, book, London]

Overlap at depth 2 = 0.5

Page 15: Semantic Stability in Social Tagging Streams

17

Example

[Figure: P(T) at tn with tags ranked novel, fiction, sf, book, London; P(T) at tn+m with tags ranked novel, sf, fiction, book, London]

Overlap at depth 3 = 1
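The overlap values of the three example slides can be reproduced with a short sketch. The function names are hypothetical, and `rbo_truncated` is a simple truncated sum over the available depths (no extrapolation, no tie correction), so it is a lower-bound illustration rather than the exact measure used in the study.

```python
def agreement(s, t, depth):
    """Fraction of items shared by the top-`depth` prefixes of s and t."""
    return len(set(s[:depth]) & set(t[:depth])) / depth

def rbo_truncated(s, t, p=0.9):
    """Truncated RBO: (1 - p) * sum over d = 1..k of p^(d-1) * A_d.

    k is the length of the longer list; deeper ranks are ignored.
    """
    k = max(len(s), len(t))
    return (1 - p) * sum(p ** (d - 1) * agreement(s, t, d)
                         for d in range(1, k + 1))

# Rankings read off the example slides.
at_tn = ["novel", "fiction", "sf", "book", "London"]
at_tn_m = ["novel", "sf", "fiction", "book", "London"]

for d in (1, 2, 3):
    print(d, agreement(at_tn, at_tn_m, d))   # 1.0, 0.5, 1.0

print(rbo_truncated(at_tn, at_tn_m, p=0.5))
```

Varying `p` in the last call shows how strongly the swap at ranks 2 and 3 is punished: the smaller `p`, the more weight the disagreement at depth 2 receives.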

Page 16: Semantic Stability in Social Tagging Streams

18

Effect of the Parameter p

Page 17: Semantic Stability in Social Tagging Streams

19

Tie correction for Rank Biased Overlap

• RBO does not penalize ties

• We want to penalize ties since they show that users have not agreed on a ranking

• Sum only over those depths which occur in at least one of the two rankings

Page 18: Semantic Stability in Social Tagging Streams

Same concordant pairs: (A,D) and (B,D) and (C,D)

[Figure: frequency bar charts of two rankings R1 and R2 over the items A, B, C, D at tn and tn+m]

No ties: RBOorig = 0.2, RBOmod = 0.2

Ties: RBOorig = 0.34, RBOmod = 0.17

Page 19: Semantic Stability in Social Tagging Streams

23

Semantic Stabilization on a Resource Level

• Tag distributions of Twitter users become semantically stable between 1k and 2k tag assignments

• The RBO values of random tagging distributions increase more slowly and are significantly lower

Page 20: Semantic Stability in Social Tagging Streams

24

Semantic Stabilization on a System Level

• How can we compare the semantic stabilization process in different systems?

• We call a resource description semantically stable after tn+m tag assignments if the RBO value between its tag distributions at points tn and tn+m is greater than or equal to a threshold k.
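The definition above can be turned into a system-level summary with a few lines. This is a minimal sketch: the function names are hypothetical and the RBO values are invented for illustration, not taken from the study.

```python
def is_stable(rbo_value, k):
    """A resource counts as semantically stable if its RBO value is >= k."""
    return rbo_value >= k

def stable_fraction(rbo_values, k):
    """Share of resources whose RBO between tn and tn+m reaches k."""
    return sum(is_stable(v, k) for v in rbo_values) / len(rbo_values)

# Hypothetical RBO values for five resources (not taken from the study).
rbo_values = [0.82, 0.64, 0.58, 0.91, 0.70]
print(stable_fraction(rbo_values, k=0.61))
```

Computing this fraction for a grid of thresholds k and stream lengths gives one surface per system, which makes different systems comparable.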

Page 21: Semantic Stability in Social Tagging Streams

25

Semantic Stabilization on a System Level

After 1250 tag assignments 90% of all resources have a stability above 0.61

Page 22: Semantic Stability in Social Tagging Streams

26

Empirical Study: Twitter

Medium level of semantic stability is reached after 1k-2k tag assignments

Page 23: Semantic Stability in Social Tagging Streams

27

Empirical Study: Twitter and Delicious

Tag streams in Delicious stabilize faster and at a significantly higher level than in Twitter

Page 24: Semantic Stability in Social Tagging Streams

28

Empirical Study: Twitter, Delicious and LibraryThing

The same is true for tag streams of books in LibraryThing

Page 25: Semantic Stability in Social Tagging Streams

29

Empirical Study: Random Baseline

Page 26: Semantic Stability in Social Tagging Streams

30

Difference between tag and word streams?

Page 27: Semantic Stability in Social Tagging Streams

31

What causes semantic stability?

• Simulations based on the epistemic tagging model [Dellschaft and Staab, 2008].

• Use parameter I as imitation rate and produce tag distributions for I=0, 0.1, ... 1
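The role of the imitation rate can be illustrated with a strongly simplified generator. This is not the epistemic model of Dellschaft and Staab itself: the function name, the Zipf-like weights standing in for background knowledge, the vocabulary size and the stream length are all assumptions made for this sketch.

```python
import random

def simulate_stream(n, imitation_rate, vocabulary, weights, rng):
    """Generate n tag assignments for one resource.

    With probability `imitation_rate` the next tag copies a uniformly chosen
    earlier assignment; otherwise it is drawn from `vocabulary` according to
    `weights`, which stand in for shared background knowledge.
    """
    stream = []
    for _ in range(n):
        if stream and rng.random() < imitation_rate:
            stream.append(rng.choice(stream))
        else:
            stream.append(rng.choices(vocabulary, weights=weights, k=1)[0])
    return stream

rng = random.Random(42)
vocabulary = [f"tag{i}" for i in range(100)]      # hypothetical vocabulary
weights = [1 / rank for rank in range(1, 101)]    # Zipf-like, an assumption
# One stream per imitation rate I = 0, 0.1, ..., 1.
streams = {i / 10: simulate_stream(2000, i / 10, vocabulary, weights, rng)
           for i in range(11)}
```

Each simulated stream can then be fed into the same stability measurement as the empirical streams, so that the effect of I is read off the resulting curves.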

Page 28: Semantic Stability in Social Tagging Streams

33

What causes stability?

Medium levels of semantic stability are reached after 1k-2k tag assignments

Page 29: Semantic Stability in Social Tagging Streams

34

What causes stability?

The same is true if we combine background knowledge (BK) and imitation when BK is dominant

Page 30: Semantic Stability in Social Tagging Streams

35

What causes stability?

If imitation and background knowledge (BK) are combined and imitation is dominant, higher levels of semantic stability are reached faster

Page 31: Semantic Stability in Social Tagging Streams

36

What causes stability?

• Combination of shared background knowledge and imitation behaviour (where imitation is more important) leads to the fastest and highest stabilization.

• Natural language systems show similar stabilization as social tagging systems where no imitation is supported

Page 32: Semantic Stability in Social Tagging Streams

37

Conclusions & Implications

• Attempt to formalize semantic stability in social streams

• Novel approach to measure and compare the semantic stabilization process in different social streams

Why is that useful?

• Identify social streams (e.g. tag stream of a URL or word stream of hashtags) which are semantically stable

– Extract shared and agreed-upon semantic knowledge from social streams

• Select systems that provide semantically stable streams

Page 33: Semantic Stability in Social Tagging Streams

40

References

• D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM conference on Hypertext and hypermedia, HT ’09, pages 359–360, New York, NY, USA, 2009. ACM.

• C. Cattuto. Semiotic dynamics on social tagging communities. The European Physical Journal C - Particles and Fields, Volume 46, Issue 2 Supplement, pages 33–37, August 2006.

• A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, Nov. 2009.

• K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT ’08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 71–80, New York, NY, USA, 2008. ACM.

• S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006.

• H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th international conference on World Wide Web, WWW ’07, pages 211–220, New York, NY, USA, 2007. ACM.

• A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Bibsonomy: A social bookmark and publication sharing system. In Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures, pages 87–102, 2006.

• C. T. Kello, G. D. A. Brown, R. Ferrer-i Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010.

• W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28(4):20:1–20:38, Nov. 2010.

Page 34: Semantic Stability in Social Tagging Streams

41

Thank you!

Special thanks to my collaborators (2/3 of them are here):

Page 35: Semantic Stability in Social Tagging Streams

42

Limitations and Future Work

• RBO measures ranking but ignores the differences in the frequencies

• Decay function to weight tag counts

– Old tag assignments are less important than new ones

• Number and diversity of users who tag a resource might impact the semantic stabilization process
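The decay idea from the second bullet could look roughly like the following. The slides do not specify a decay function, so the exponential form, the `half_life` parameter and the function name are assumptions made purely for illustration.

```python
from collections import defaultdict

def decayed_counts(assignments, now, half_life):
    """Sum exponentially decayed weights per tag.

    `assignments` is a list of (timestamp, tag) pairs. An assignment that is
    `half_life` time units old counts half as much as one made at `now`.
    """
    weights = defaultdict(float)
    for timestamp, tag in assignments:
        age = now - timestamp
        weights[tag] += 0.5 ** (age / half_life)
    return dict(weights)

# Made-up assignments: (timestamp, tag).
assignments = [(0, "actor"), (10, "governor"), (10, "actor")]
weighted = decayed_counts(assignments, now=10, half_life=10)
print(weighted)   # {'actor': 1.5, 'governor': 1.0}
```

Ranking tags by these weights instead of raw counts would let recent shifts in usage show up in the ranking sooner.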

Page 36: Semantic Stability in Social Tagging Streams

43

Alternatives to RBO

• Unweighted and conjoint measures

– Kendall tau, Spearman rho

• Weighted and conjoint measures

– Weighted Kendall tau

• Unweighted and non-conjoint measures

– Intersection metric

• Weighted and non-conjoint measures

– Cumulative overlap at increasing depths

Page 37: Semantic Stability in Social Tagging Streams

44

Dataset

Page 38: Semantic Stability in Social Tagging Streams

45

Categories of Semantically Unstable Resources

• Entity to which a resource refers changes

• Resource (i.e. website) changes

• Entity/topic to which a resource refers is controversial

– Website refers to a controversial entity/topic on which different viewpoints exist

• External conditions which impact viewpoints on the entity/topic change

– Website remains stable but the viewpoint of taggers on the entity or topic related to the site changes

Page 39: Semantic Stability in Social Tagging Streams

46

Relative Tag Proportion [Golder and Huberman, 2006]

[Figure: panels for tn and tn+m, labeled “stable” and “less stable”]

Page 40: Semantic Stability in Social Tagging Streams

47

Relative Tag Proportion [Golder and Huberman, 2006]

Page 41: Semantic Stability in Social Tagging Streams

48

KL-Divergence [Halpin et al., 2007]

• KL divergence between the rank-ordered frequency distribution of the top 25 tags at different time points

[Figure: panels for tn and tn+m, labeled “stable” and “less stable”]

Page 42: Semantic Stability in Social Tagging Streams

49

KL-Divergence

Page 43: Semantic Stability in Social Tagging Streams

50

Power Law [Cattuto, 2006]

• Is the rank-ordered frequency distribution a power law distribution?

• Is the frequency y of a tag inversely proportional to its rank r?

[Figure: panels for tn and tn+m]

Page 44: Semantic Stability in Social Tagging Streams

51

Power Law [Cattuto, 2006]

• Is it really a power law?

– Very likely yes, according to the maximum likelihood estimator and the Kolmogorov-Smirnov statistic [Clauset et al., 2009]

– Estimate alpha and xmin over some reasonable range

– Compare power law fit to the fit of the exponential function, the lognormal function and the stretched exponential (Weibull) function. Use the log-likelihood ratios to indicate which fit is better.

– We do not find significant differences between the power law fit and the lognormal fit
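The estimation step can be sketched with the continuous maximum likelihood estimator described by Clauset et al. This is a simplification: tag frequencies are discrete, xmin is taken as given instead of being selected via the Kolmogorov-Smirnov statistic, and the function name and sample values are hypothetical.

```python
import math

def alpha_mle(values, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    alpha = 1 + n / sum(ln(x_i / xmin)), taken over all x_i >= xmin.
    """
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1 + len(tail) / log_sum

# Made-up tag frequencies of one resource (illustrative only).
frequencies = [120, 45, 30, 12, 9, 7, 5, 3, 3, 2, 2, 1, 1, 1]
print(alpha_mle(frequencies, xmin=2))
```

Repeating the estimate over a range of xmin values and comparing the fits is the part that a dedicated power-law fitting package would normally take over.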

Page 45: Semantic Stability in Social Tagging Streams

52

RBO

Page 46: Semantic Stability in Social Tagging Streams

53

Stabilization going beyond Baseline Stability

Page 47: Semantic Stability in Social Tagging Streams

54

Stabilization not going beyond Baseline Stability