Semantic Stability in Social Tagging Streams

Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman

Ontologies: formal, shared and stable

Folksonomies: not formal but shared, and stable?

http://schwarzenegger.com/

How can we measure semantic stability?

How can we compare the semantic stabilization process in different systems?

What impacts semantic stability?

Measuring Semantic Stability: State of the Art

• Relative tag proportions per resource become stable with increasing number of tag assignments [Golder and Huberman, 2006]

• KL-divergence of rank-ordered tag frequency distribution per resource at different time points converges towards zero [Halpin et al., 2007]

• Power Law distributions [Cattuto et al., 2006] – Scale invariance property ensures that, regardless of how large the system grows, the shape of the distribution stays the same
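The first measure above, relative tag proportions per resource, can be sketched in a few lines of Python. This is a minimal illustration, not the original authors' code; the function name `tag_proportions` and the example tag stream are hypothetical.

```python
from collections import Counter

def tag_proportions(stream):
    """Yield the relative tag proportions of a resource after each tag assignment."""
    counts = Counter()
    for n, tag in enumerate(stream, start=1):
        counts[tag] += 1
        # Proportion of each tag among the first n assignments
        yield {t: c / n for t, c in counts.items()}

# Hypothetical tag stream for a single resource (illustrative data only)
stream = ["terminator", "actor", "terminator", "governor", "terminator", "actor"]
snapshots = list(tag_proportions(stream))
print(snapshots[-1])
```

Plotting one tag's proportion across `snapshots` gives the kind of curve that flattens out as more tag assignments arrive.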

Some Limitations

• Don’t allow comparing the semantic stabilization process of different systems

• Prune tag distributions to top-k tags

– Cannot handle non-conjoint lists of tags

• Random tagging process also produces “stable” description

– Tag assignment at timepoint t+1 has less impact on the tag distribution of a resource than a tag at timepoint t
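The last limitation can be checked numerically: even for a purely random stream, the n-th tag assignment can shift any tag's proportion by at most 1/n. The sketch below assumes a hypothetical vocabulary of 50 tags drawn uniformly at random; the function name `max_proportion_shift` is made up for illustration.

```python
import random
from collections import Counter

def max_proportion_shift(stream):
    """For each assignment, return the largest absolute change in any tag's proportion."""
    counts = Counter()
    shifts = []
    for n, tag in enumerate(stream, start=1):
        # Proportions before the n-th assignment (empty for the very first one)
        before = {t: c / (n - 1) for t, c in counts.items()} if n > 1 else {}
        counts[tag] += 1
        after = {t: c / n for t, c in counts.items()}
        shifts.append(max(abs(after[t] - before.get(t, 0.0)) for t in after))
    return shifts

rng = random.Random(0)                           # fixed seed for reproducibility
vocabulary = [f"tag{i}" for i in range(50)]      # hypothetical vocabulary
random_stream = [rng.choice(vocabulary) for _ in range(1000)]
shifts = max_proportion_shift(random_stream)
print(shifts[0], shifts[99], shifts[999])
```

Because the shift shrinks with n regardless of what users agree on, a stability measure needs a random baseline for comparison.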

Example: KL-Divergence

• KL-divergence converges towards zero.

• But random baseline also converges towards zero if we assume a constant tagging rate.

• We do not always know the top k tags!

Example: Relative Tag Proportion

Intuition and Approach

Stable:

• Some descriptors are more important than others.

• Ranking of (top) descriptors remains stable over time

Less stable:

• All descriptors are equally important.

• Ranking of (top) descriptors changes over time

[Figure: tag distributions of a resource at time points tn and tn+m for a stable and a less stable case; tag labels as extracted: Schwarz, geract, terminato, Hollywood, bodybuild, California, republican, CA]

Requirements

• Rank agreement of the descriptors of a resource over time

• Weighted rank agreement

• Non-conjoint lists of descriptors

• Random Baseline

Rank Biased Overlap (RBO) [Webber et al., 2010]

• RBO falls in the range [0, 1], where 0 means disjoint, and 1 means identical

• p lies between 0 and 1 and determines how steep the decline in weights is

• The smaller p, the more top-weighted the metric
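As a rough illustration of the properties listed above, the following sketch computes the extrapolated RBO of Webber et al. (2010) for two rankings truncated to a common evaluation depth k. It assumes rankings without ties or duplicates and does not implement the paper's handling of uneven list lengths; the function name `rbo_ext` and the example rankings are hypothetical.

```python
def rbo_ext(s, t, p=0.9, k=None):
    """Extrapolated RBO of rankings s and t, evaluated to depth k.

    Assumes no ties and no duplicate items. Both rankings are cut to the
    same depth k (default: length of the shorter ranking).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    limit = min(len(s), len(t))
    k = limit if k is None else min(k, limit)
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    overlap = 0           # size of the intersection of the two prefixes
    weighted_sum = 0.0    # running sum of agreement(d) * p^d
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1
        else:
            overlap += (x in seen_t) + (y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        weighted_sum += (overlap / d) * p ** d
    # Agreement at depth k is extrapolated to all deeper ranks
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted_sum

# Hypothetical rankings (illustrative data only)
a = ["terminator", "actor", "governor", "california"]
b = ["terminator", "governor", "actor", "republican"]
print(rbo_ext(a, a), rbo_ext(a, b), rbo_ext(a, ["x", "y", "z", "w"]))
```

Identical rankings score 1 and disjoint rankings score 0; lowering `p` shifts more of the weight onto the top ranks.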

Example

[Figure: tag distributions P(T) of a resource at tn (tags: fiction, sf, London) and at tn+m (tags: novel, sf, fiction, London)]

• Overlap at depth 1 = 1

• Overlap at depth 2 = 0.5

• Overlap at depth 3 = 1
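The overlap values in this example can be reproduced with a small sketch. The ranking order used below (London first at both time points) is an assumption inferred from the figure residue, chosen because it yields the stated overlaps; the function name `agreement_at_depth` is hypothetical.

```python
def agreement_at_depth(s, t, d):
    """Fraction of the top-d items shared by two rankings: |S[:d] & T[:d]| / d."""
    return len(set(s[:d]) & set(t[:d])) / d

# Assumed ranking order, most frequent tag first (not confirmed by the source)
ranking_tn = ["London", "sf", "fiction"]
ranking_tn_m = ["London", "fiction", "sf", "novel"]

for d in (1, 2, 3):
    print(d, agreement_at_depth(ranking_tn, ranking_tn_m, d))
```

RBO combines these per-depth agreements into one score, weighting shallow depths more heavily than deep ones.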

Effect of the Parameter p
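One way to see the effect of p: in the RBO sum, the agreement at depth d receives the weight (1 - p) * p^(d - 1), so the agreements at depths 1 to k together receive 1 - p^k. The sketch below tabulates this for a few values of p; the function name `cumulative_weight` is hypothetical.

```python
def cumulative_weight(p, k):
    """Total weight the RBO sum places on the agreements at depths 1..k."""
    return sum((1 - p) * p ** (d - 1) for d in range(1, k + 1))

for p in (0.5, 0.9, 0.98):
    # Share of the total weight that falls on the first 10 depths
    print(p, round(cumulative_weight(p, 10), 3))
```

With a small p nearly all of the weight sits on the first few depths, while a p close to 1 spreads it over many ranks.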

Tie correction for Rank Biased Overlap

• RBO does not penalize ties

• We want to penalize ties since they show that users have not agreed on a ranking

• Sum only over those depths which occur in at least one of the two rankings

[Figure: bar charts of the items A, B, C and D at tn and tn+m, for a case without ties and a case with ties. Same concordant pairs in both cases: (A,D), (B,D) and (C,D)]

No ties: RBOorig = 0.2, RBOmod = 0.2

Ties: RBOorig = 0.34, RBOmod = 0.17

Semantic Stabilization on a Resource Level

• Tag distributions of Twitter users become semantically stable between 1k and 2k tag assignments

• The RBO values of random tagging distributions increase more slowly and are significantly lower

Semantic Stabilization on a System Level

• How can we compare the semantic stabilization process in different systems?

• We call a resource description semantically stable after tn+m tag assignments if the RBO value between its tag distributions at points tn and tn+m is equal to or greater than k.

Semantic Stabilization on a System Level

After 1250 tag assignments 90% of all resources have a stability above 0.61
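A minimal sketch of this system-level view, assuming one RBO value per resource has already been computed between tn and tn+m: the first helper returns the share of resources that are stable at threshold k, the second the largest k that a given share of resources reaches. Function names and the RBO values are hypothetical.

```python
import math

def fraction_stable(rbo_values, k):
    """Share of resources whose RBO value is equal to or greater than k."""
    if not rbo_values:
        raise ValueError("need at least one RBO value")
    return sum(v >= k for v in rbo_values) / len(rbo_values)

def stability_threshold(rbo_values, share=0.9):
    """Largest k such that at least `share` of the resources have RBO >= k."""
    if not rbo_values:
        raise ValueError("need at least one RBO value")
    ordered = sorted(rbo_values, reverse=True)
    # Small epsilon guards against floating point noise in share * n
    needed = max(1, math.ceil(share * len(ordered) - 1e-9))
    return ordered[needed - 1]

# Hypothetical RBO values of ten resources (illustrative data only)
values = [0.9, 0.8, 0.75, 0.7, 0.65, 0.62, 0.61, 0.61, 0.61, 0.3]
print(fraction_stable(values, 0.61), stability_threshold(values, 0.9))
```

Repeating this for increasing numbers of tag assignments yields one curve per system, which makes stabilization in different systems comparable.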

Empirical Study: Twitter

Medium level of semantic stability is reached after 1k-2k tag assignments

Empirical Study: Twitter and Delicious

Tag streams in Delicious stabilize faster and at a significantly higher level than in Twitter

Empirical Study: Twitter, Delicious and LibraryThing

The same is true for tag streams of books in LibraryThing

Empirical Study: Random Baseline

Difference between tag and word streams?

What causes semantic stability?

• Simulations based on the epistemic tagging model [Dellschaft and Staab, 2008].

• Use parameter I as imitation rate and produce tag distributions for I=0, 0.1, ... 1
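The idea of varying the imitation rate I can be illustrated with a strongly simplified sketch. It is not the epistemic model of Dellschaft and Staab: here each tag assignment either imitates a uniformly chosen earlier assignment (with probability I) or is drawn from a background vocabulary with Zipf-like weights. The function name, the vocabulary and the weights are all assumptions.

```python
import random

def simulate_tag_stream(n, imitation_rate, vocabulary, weights, seed=0):
    """Generate n tag assignments for one resource (simplified, illustrative only)."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        if stream and rng.random() < imitation_rate:
            # Imitation: repeat a previously assigned tag
            stream.append(rng.choice(stream))
        else:
            # Background knowledge: draw a tag from the weighted vocabulary
            stream.append(rng.choices(vocabulary, weights=weights, k=1)[0])
    return stream

vocabulary = [f"tag{i}" for i in range(100)]           # hypothetical vocabulary
weights = [1 / rank for rank in range(1, 101)]         # assumed Zipf-like weights
streams = {i / 10: simulate_tag_stream(500, i / 10, vocabulary, weights)
           for i in range(11)}                         # I = 0, 0.1, ..., 1
print({rate: len(set(s)) for rate, s in streams.items()})
```

Each simulated stream can then be scored with the same stability measure as the empirical streams.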

What causes stability?

Medium levels of semantic stability are reached after 1k-2k tag assignments

The same is true if we combine background knowledge (BK) and imitation when BK is dominant

If imitation and BK are combined and imitation is dominant, higher levels of semantic stability are reached faster

• Combination of shared background knowledge and imitation behaviour (where imitation is more important) leads to the fastest and highest stabilization.

• Natural language systems show similar stabilization as social tagging systems where no imitation is supported

Conclusions & Implications

• Attempt to formalize semantic stability in social streams

• Novel approach to measure and compare the semantic stabilization process in different social streams

Why is that useful?

• Identify social streams (e.g. tag stream of URL or word stream of hashtags) which are semantically stable

– Extract shared and agreed-upon semantic knowledge from social streams

• Select systems that provide semantically stable streams

References

• D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM conference on Hypertext and hypermedia, HT ’09, pages 359–360, New York, NY, USA, 2009. ACM.

• C. Cattuto. Semiotic dynamics on social tagging communities. The European Physical Journal C - Particles and Fields, Volume 46, Issue 2 Supplement, pages 33–37, August 2006.

• A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, Nov. 2009.

• K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT ’08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 71–80, New York, NY, USA, 2008. ACM.

• S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006.

• H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th international conference on World Wide Web, WWW ’07, pages 211–220, New York, NY, USA, 2007. ACM.

• A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Bibsonomy: A social bookmark and publication sharing system. In Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures, pages 87–102, 2006.

• C. T. Kello, G. D. A. Brown, R. Ferrer-i Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010.

• W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28(4):20:1–20:38, Nov. 2010.

Thank you!

Special thanks to my collaborators (2/3 of them are here):

Limitations and Future Work

• RBO measures ranking but ignores the differences in the frequencies

• Decay function to weight tag counts

– old tag assignments are less important than new ones

• Number and diversity of users who tag a resource might impact the semantic stabilization process

Alternatives to RBO

• Unweighted and conjoint measures

– Kendall tau, Spearman rho

• Weighted and conjoint measures

– Weighted Kendall tau

• Unweighted and non-conjoint measures

– Intersection metric

• Weighted and non-conjoint measures

– Cumulative overlap at increasing depths

Dataset

Categories of Semantically Unstable Resources

• Entity to which a resource refers changes

• Resource (i.e. website) changes

• Entity/Topic to which a resource refers is controversial

– Website refers to a controversial entity/topic on which different viewpoints exist

• External conditions which impact viewpoints on the entity/topic change

– Website remains stable but the viewpoint of taggers on the entity or topic related to the site changes

Relative Tag Proportion [Golder and Huberman, 2006]

[Figure: relative tag proportions at tn and tn+m for a stable and a less stable resource]

KL-Divergence [Halpin et al., 2007]

• KL divergence between the rank-ordered frequency distribution of the top 25 tags at different time points
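A minimal sketch of this measure, assuming the tag assignments of a resource up to two time points are available as plain lists. The rank-ordered distribution keeps only the frequencies of the top 25 tags, ordered by rank; the small smoothing constant `eps` that avoids division by zero is an assumption, as are the function names and the example data.

```python
import math
from collections import Counter

def rank_ordered_distribution(tags, k=25):
    """Frequencies of the k most frequent tags, sorted by rank and zero-padded."""
    counts = sorted(Counter(tags).values(), reverse=True)[:k]
    return counts + [0] * (k - len(counts))

def kl_divergence(counts_p, counts_q, eps=1e-9):
    """KL divergence D(P || Q) between two rank-ordered frequency lists."""
    p = [c + eps for c in counts_p]   # smoothing so that zero counts are allowed
    q = [c + eps for c in counts_q]
    zp, zq = sum(p), sum(q)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(p, q))

# Hypothetical tag assignments at two time points (illustrative data only)
tags_tn = ["sf", "sf", "fiction", "London"]
tags_tn_m = tags_tn + ["sf", "fiction", "novel", "sf"]
p = rank_ordered_distribution(tags_tn)
q = rank_ordered_distribution(tags_tn_m)
print(kl_divergence(p, q))
```

Note that the rank-ordered distribution discards tag identity, so two time points with the same frequency profile but different tags would still yield a divergence of zero.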

[Figure: KL-divergence at tn and tn+m for a stable and a less stable resource]

Power Law [Cattuto, 2006]

• Is the rank-ordered frequency distribution a power law distribution?

• Is the frequency y of a tag inversely proportional to its rank r?

[Figure: rank-ordered tag frequency distributions at tn and tn+m]

Power Law [Cattuto, 2006]

• Is it really power law?

– Very likely yes according to the maximum likelihood estimator and Kolmogorov-Smirnov statistic [Clauset et al., 2009]

– Estimate alpha and xmin over some reasonable range

– Compare power law fit to the fit of the exponential function, the lognormal function and the stretched exponential (Weibull) function. Use the log-likelihood ratios to indicate which fit is better.

– We do not find significant differences between the power law fit and the lognormal fit
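For orientation, the sketch below shows the continuous maximum likelihood estimator for the exponent alpha at a fixed xmin, as given by Clauset et al. It covers only the estimation of alpha: the search over xmin, the Kolmogorov-Smirnov statistic and the likelihood ratio tests are omitted. Tag frequencies are discrete, so using the continuous formula is an approximation, and the function name and data are hypothetical.

```python
import math

def estimate_alpha(values, xmin):
    """Continuous MLE of the power law exponent: 1 + n / sum(ln(x / xmin))."""
    tail = [x for x in values if x >= xmin]   # only the tail above xmin is fitted
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need values strictly above xmin to estimate alpha")
    return 1 + len(tail) / log_sum

# Hypothetical tag frequencies of one resource (illustrative data only)
frequencies = [120, 60, 41, 30, 25, 19, 17, 15, 13, 12, 3, 2, 1, 1]
print(estimate_alpha(frequencies, xmin=10))
```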

Stabilization going beyond Baseline Stability

Stabilization not going beyond Baseline Stability
