Towards Mining Semantic Maturity in Social Bookmarking Systems Martin Atzmueller 1 , Dominik Benz 1 , Andreas Hotho 2 , Gerd Stumme 1 1 Knowledge and Data Engineering Group (KDE), University of Kassel, Germany 2 Data Mining and Information Retrieval Group University of Würzburg, Germany

Upload: dominik-benz

Post on 27-Jan-2015


Page 1: Towards Mining Semantic Maturity in Social Bookmarking Systems

Towards Mining Semantic Maturity in Social Bookmarking Systems

Martin Atzmueller¹, Dominik Benz¹, Andreas Hotho², Gerd Stumme¹

¹ Knowledge and Data Engineering Group (KDE), University of Kassel, Germany

² Data Mining and Information Retrieval Group, University of Würzburg, Germany

Page 2: Towards Mining Semantic Maturity in Social Bookmarking Systems

Let it grow!

Evidence for emergent semantics within social applications

Meaning of tags can be captured at different stages

Can we find indicators of "semantic maturity"?

Page 3: Towards Mining Semantic Maturity in Social Bookmarking Systems

2011/10/23Atzmueller et al: Towards Mining Semantic Maturity, SDOW 2011 3 / 18

The Story

Social Bookmarking & Emergent Semantics

Maturity Indicators

Mining Maturity Profiles

Evaluation

Page 4: Towards Mining Semantic Maturity in Social Bookmarking Systems


Social Tagging

Social tagging: Simple and intuitive way to organize all kinds of resources

Uncontrolled vocabulary: tags are "just strings"

Formal model: folksonomy F = (U, T, R, Y)

Users U, tags T, resources R

Tag assignments Y ⊆ U × T × R

[Figure: example folksonomy with the users Alice and Bob, the resources iswc.org and bonn.de, and the tags semantics, conference, and travel]
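The formal model can be sketched in Python as a set of (user, tag, resource) triples. This is only an illustration: the names come from the slide's example, but the concrete assignments (who tagged what) are hypothetical, since the transcript does not preserve them. The helper `tags_of` is likewise an assumption made for the example.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    user: str
    tag: str
    resource: str


# Hypothetical tag assignments Y; only the names are taken from the slide.
Y = {
    TagAssignment("Alice", "semantics", "iswc.org"),
    TagAssignment("Alice", "conference", "iswc.org"),
    TagAssignment("Bob", "conference", "iswc.org"),
    TagAssignment("Bob", "travel", "bonn.de"),
}

# Here U, T and R are simply the users, tags and resources occurring in Y.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}


def tags_of(resource: str) -> set[str]:
    """All tags that any user assigned to the given resource."""
    return {y.tag for y in Y if y.resource == resource}
```

Storing Y as a set of triples keeps the ternary relation intact, so any two-dimensional projection (tag-resource, tag-user) can be derived from it later.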

Page 5: Towards Mining Semantic Maturity in Social Bookmarking Systems

Capturing Tag Semantics

Co-occurrence distribution of a tag: its "semantic fingerprint"

Capture semantic relatedness / synonyms by cosine similarity between fingerprints
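A minimal sketch of the fingerprint idea, assuming tag-tag co-occurrence within a post as the context. The posts are hypothetical, and the slides do not specify this exact construction.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical posts: each post is the set of tags one user gave one resource.
posts = [
    {"game", "fun", "flash"},
    {"games", "fun", "flash"},
    {"game", "fun"},
    {"java", "programming"},
    {"games", "flash"},
]

# Co-occurrence fingerprint: for each tag, count how often every other tag
# appears in the same post.
fingerprint = defaultdict(Counter)
for post in posts:
    for a, b in permutations(post, 2):
        fingerprint[a][b] += 1


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sim_game_games = cosine(fingerprint["game"], fingerprint["games"])
sim_game_java = cosine(fingerprint["game"], fingerprint["java"])
```

In this toy data "game" and "games" never appear in the same post, yet their fingerprints are similar because both co-occur with "fun" and "flash". That is the effect the cosine measure is meant to capture.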


with Cattuto et al: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems (ISWC 2008)

Page 6: Towards Mining Semantic Maturity in Social Bookmarking Systems

Semantic Grounding

Compute folksonomy-based relatedness (via context vectors)

Sim( , ) = 0.74

Map tags to the WordNet synset taxonomy

Grounded similarity: SimTrue( , ) = 0.59 (we used the Jiang-Conrath distance)
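For orientation, the Jiang-Conrath distance between two concepts is IC(c1) + IC(c2) - 2 * IC(lcs), where IC is the information content and lcs is the lowest common subsumer in the taxonomy. The sketch below applies it to a hypothetical toy taxonomy with made-up counts. It does not use WordNet, and it does not reproduce how the slide turns the distance into the similarity value shown.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts per concept.
parent = {"entity": None, "animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal", "car": "artifact"}
own_count = {"entity": 0, "animal": 2, "artifact": 2, "dog": 8, "cat": 6, "car": 2}


def ancestors(c):
    """Path from concept c up to the root, including c itself."""
    path = []
    while c is not None:
        path.append(c)
        c = parent[c]
    return path


# A concept's probability mass includes the counts of all its descendants.
total = sum(own_count.values())
mass = {c: 0 for c in parent}
for c, n in own_count.items():
    for a in ancestors(c):
        mass[a] += n


def ic(c):
    """Information content IC(c) = -log p(c)."""
    return -math.log(mass[c] / total)


def jcn_distance(c1, c2):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lowest common subsumer)."""
    up1 = ancestors(c1)
    lcs = next(a for a in ancestors(c2) if a in up1)
    return ic(c1) + ic(c2) - 2 * ic(lcs)
```

Concepts that share a specific common ancestor ("dog" and "cat" under "animal") end up closer than concepts that only meet at the root.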

Page 7: Towards Mining Semantic Maturity in Social Bookmarking Systems

Appendix: Music Genre Taxonomy learned from last.fm

Page 8: Towards Mining Semantic Maturity in Social Bookmarking Systems


The Story

Co-occurrence fingerprints capture tag semantics

Maturity Indicators

Mining Maturity Profiles

Evaluation

Page 9: Towards Mining Semantic Maturity in Social Bookmarking Systems

Maturity Indicators (1)

Frequency

Intuition: "the more often used, the more mature"

Resource frequency

User frequency
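One plausible reading of the two frequency indicators, sketched under an assumption: user frequency is taken as the number of distinct users who applied a tag, and resource frequency as the number of distinct resources it was applied to. The slide does not spell out the definitions, and the tag assignments below are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag assignments (user, tag, resource).
Y = [
    ("alice", "python", "docs.python.org"),
    ("bob", "python", "docs.python.org"),
    ("bob", "python", "pypi.org"),
    ("carol", "web", "w3.org"),
]

users_of = defaultdict(set)
resources_of = defaultdict(set)
for user, tag, resource in Y:
    users_of[tag].add(user)
    resources_of[tag].add(resource)

# Count distinct users / resources per tag.
user_freq = {t: len(us) for t, us in users_of.items()}
resource_freq = {t: len(rs) for t, rs in resources_of.items()}
```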

Page 10: Towards Mining Semantic Maturity in Social Bookmarking Systems

Maturity Indicators (2)

Centrality

"Importance" within the co-occurrence network G = (V, E)

Intuition: the more important, the more mature

Degree, closeness, betweenness
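A small sketch of two of the three centrality measures on a hypothetical, unweighted co-occurrence network, using only the standard library. Betweenness is left out for brevity, and the normalizations chosen here are common conventions, not necessarily the ones used in the study.

```python
from collections import deque

# Hypothetical undirected tag co-occurrence network as an adjacency map.
G = {
    "web": {"java", "python", "semantic"},
    "java": {"web", "python"},
    "python": {"web", "java"},
    "semantic": {"web"},
}


def degree_centrality(g, v):
    """Fraction of the other nodes that v is directly connected to."""
    return len(g[v]) / (len(g) - 1)


def closeness_centrality(g, v):
    """(Number of reachable nodes) / (sum of shortest-path distances from v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:  # breadth-first search gives unweighted shortest paths
        node = queue.popleft()
        for nb in g[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

In this toy network "web" is adjacent to every other tag, so both of its scores are maximal, while the peripheral tag "semantic" scores lower on both.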

Page 11: Towards Mining Semantic Maturity in Social Bookmarking Systems


The Story

Co-occurrence fingerprints capture tag semantics

Frequency / centrality properties as maturity indicators

Mining Maturity Profiles

Evaluation

Page 12: Towards Mining Semantic Maturity in Social Bookmarking Systems

Pattern Mining using Subgroup Discovery

In a nutshell: "Find descriptions of subsets in the data that differ significantly from the total population with respect to a target concept."

Pattern: conjunctive description using tag properties

Representation as indicator rule:

description → target (with probability)


Page 13: Towards Mining Semantic Maturity in Social Bookmarking Systems

Finding Maturity Patterns

Tag       Ufreq  Rfreq  Deg   Bet   Clos  TARGET
java      0.13   0.9    0.2   0.1   0.4   0.8
game      0.2    0.3    0.01  0.04  0.02  1.0
games     0.4    0.1    0.1   0.12  0.2   1.0
semantic  0.1    0.2    0.4   0.3   0.35  0.5
web       0.8    0.7    0.6   0.7   0.43  0.6
web2.0    0.4    0.4    0.2   0.25  0.3   0.3
python    0.3    0.2    0.02  0.1   0.05  0.7

Which patterns maximize the target variable?
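To make the question concrete, the sketch below evaluates one candidate pattern on the example table by comparing the subgroup's mean target with the overall mean. The pattern and its thresholds (Deg < 0.15 and Clos < 0.25) are hypothetical and chosen only for illustration.

```python
COLS = ("Ufreq", "Rfreq", "Deg", "Bet", "Clos", "TARGET")
rows = [
    # tag,       Ufreq, Rfreq, Deg,  Bet,  Clos, TARGET
    ("java",     0.13,  0.9,   0.2,  0.1,  0.4,  0.8),
    ("game",     0.2,   0.3,   0.01, 0.04, 0.02, 1.0),
    ("games",    0.4,   0.1,   0.1,  0.12, 0.2,  1.0),
    ("semantic", 0.1,   0.2,   0.4,  0.3,  0.35, 0.5),
    ("web",      0.8,   0.7,   0.6,  0.7,  0.43, 0.6),
    ("web2.0",   0.4,   0.4,   0.2,  0.25, 0.3,  0.3),
    ("python",   0.3,   0.2,   0.02, 0.1,  0.05, 0.7),
]
table = {tag: dict(zip(COLS, vals)) for tag, *vals in rows}


def mean_target(tags):
    """Mean TARGET value over the given tags."""
    tags = list(tags)
    return sum(table[t]["TARGET"] for t in tags) / len(tags)


# Hypothetical pattern: low degree AND low closeness (thresholds are made up).
subgroup = [t for t, r in table.items() if r["Deg"] < 0.15 and r["Clos"] < 0.25]

population_mean = mean_target(table)
subgroup_mean = mean_target(subgroup)
```

On this table the pattern selects game, games and python, whose mean target lies above the mean over all seven tags. A subgroup discovery algorithm searches for descriptions with this property automatically.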

Page 14: Towards Mining Semantic Maturity in Social Bookmarking Systems

Mining Maturity Profiles – Target Variables

First: compute the most related tag t_sim for each tag t (resource context)

WordNet Synonym Identification (SYN)

Binary target variable

True if t_sim is a synonym of t, false otherwise

Grounded WordNet Maturity (MAT)

Binary target variable

Based on taxonomic shortest path length

True if sim(t_sim, t) > 0.5, false otherwise
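The two target variables can be sketched as follows. All inputs are hypothetical stand-ins: `cosine_sim` replaces the resource-context similarities, while `wordnet_synonyms` and `grounded_sim` replace real WordNet lookups. Only the 0.5 threshold is taken from the slide.

```python
# Hypothetical resource-context cosine similarities between tag pairs.
cosine_sim = {
    "car": {"auto": 0.77, "travel": 0.35},
    "game": {"fun": 0.62, "flash": 0.40},
}
# Hypothetical stand-ins for WordNet synonym sets and grounded similarities.
wordnet_synonyms = {"car": {"auto", "automobile"}, "game": set()}
grounded_sim = {("car", "auto"): 1.0, ("game", "fun"): 0.3}

MAT_THRESHOLD = 0.5  # from the slide: true if sim(t_sim, t) > 0.5


def targets(t):
    """Return (t_sim, SYN, MAT) for tag t."""
    # Most related tag: the partner with the highest cosine similarity.
    t_sim = max(cosine_sim[t], key=cosine_sim[t].get)
    syn = t_sim in wordnet_synonyms.get(t, set())
    mat = grounded_sim.get((t, t_sim), 0.0) > MAT_THRESHOLD
    return t_sim, syn, mat
```

Both targets judge the quality of the single most related tag, so a tag counts as mature when its folksonomy-derived neighbour is also close in WordNet.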


Page 15: Towards Mining Semantic Maturity in Social Bookmarking Systems

Pattern Mining - Algorithm

Patterns are similar to association rules

BUT: fixed target concept (of interest), i.e., high maturity

Pattern mining: k-best approach

Search through the space of descriptions (conjunctions of features)

Maximizing a quality function, e.g., the increase in target mean/share

Several efficient algorithms exist; we apply an exhaustive one.
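A minimal sketch of an exhaustive k-best search over conjunctive descriptions, run on the example table from the "Finding Maturity Patterns" slide. Several choices are assumptions made for illustration: the binarization into "low"/"high" at 0.25, the quality function (subgroup mean minus population mean), the maximum description length and the minimum subgroup size. None of this is presented as the authors' implementation.

```python
from itertools import combinations

FEATURES = ("Ufreq", "Rfreq", "Deg", "Bet", "Clos")
# (feature values..., TARGET) per tag, copied from the example table.
DATA = {
    "java":     (0.13, 0.9, 0.2,  0.1,  0.4,  0.8),
    "game":     (0.2,  0.3, 0.01, 0.04, 0.02, 1.0),
    "games":    (0.4,  0.1, 0.1,  0.12, 0.2,  1.0),
    "semantic": (0.1,  0.2, 0.4,  0.3,  0.35, 0.5),
    "web":      (0.8,  0.7, 0.6,  0.7,  0.43, 0.6),
    "web2.0":   (0.4,  0.4, 0.2,  0.25, 0.3,  0.3),
    "python":   (0.3,  0.2, 0.02, 0.1,  0.05, 0.7),
}
THRESHOLD = 0.25  # hypothetical cut between "low" and "high"


def level(value):
    return "low" if value < THRESHOLD else "high"


def covers(pattern, values):
    """True if every (feature, level) selector of the conjunction matches."""
    return all(level(values[FEATURES.index(f)]) == lv for f, lv in pattern)


def k_best(k=3, max_len=2, min_size=2):
    """Enumerate all conjunctions up to max_len and keep the k best."""
    pop_mean = sum(v[-1] for v in DATA.values()) / len(DATA)
    selectors = [(f, lv) for f in FEATURES for lv in ("low", "high")]
    scored = []
    for n in range(1, max_len + 1):
        for pattern in combinations(selectors, n):
            if len({f for f, _ in pattern}) < n:
                continue  # skip contradictory conjunctions on one feature
            targets = [v[-1] for v in DATA.values() if covers(pattern, v)]
            if len(targets) >= min_size:
                quality = sum(targets) / len(targets) - pop_mean
                scored.append((quality, pattern))
    scored.sort(key=lambda qp: qp[0], reverse=True)
    return scored[:k]
```

Calling `k_best()` returns (quality, pattern) pairs such as a quality score together with `(("Clos", "low"),)`. Exhaustive enumeration is feasible here because there are only five features with two levels each; larger feature spaces are where the efficient algorithms mentioned on the slide matter.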

Page 16: Towards Mining Semantic Maturity in Social Bookmarking Systems


The Story

Co-occurrence fingerprints capture tag semantics

Frequency / centrality properties as maturity indicators

Discover indicator subgroups which maximize the maturity target

[Figure: tag property table (Tag, Ufreq, Rfreq, Deg, Bet, Clos, TARGET) as on the "Finding Maturity Patterns" slide]

Evaluation

Page 17: Towards Mining Semantic Maturity in Social Bookmarking Systems


Social Bookmarking Data

Folksonomy crawled from Delicious in 2006, restricted to the top 10,000 tags:

476,378 users

10,000 tags

12,660,470 resources

101,491,722 tag assignments

Preprocessing & filtering:

Filter tags without a sufficiently similar partner (cos < 0.05)

Limit to tags with only one sense in WordNet

Number of finally considered tags: 1944
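The two filter steps can be sketched as a simple predicate over per-tag statistics. The 0.05 threshold and the one-sense condition come from the slide; the tag names, field names and values below are placeholders.

```python
MIN_COSINE = 0.05  # threshold from the slide

# Placeholder per-tag inputs: best cosine similarity to any other tag and the
# number of WordNet senses (0 would mean the tag is not found in WordNet).
candidates = {
    "tag_a": {"best_cosine": 0.62, "senses": 1},
    "tag_b": {"best_cosine": 0.03, "senses": 1},  # no sufficiently similar partner
    "tag_c": {"best_cosine": 0.40, "senses": 4},  # ambiguous in WordNet
    "tag_d": {"best_cosine": 0.05, "senses": 1},
}

# The slide filters tags with cos < 0.05, so a value of exactly 0.05 is kept.
kept = [
    tag for tag, stats in candidates.items()
    if stats["best_cosine"] >= MIN_COSINE and stats["senses"] == 1
]
```

Restricting to single-sense tags avoids having to decide which WordNet sense a tag refers to before grounding it.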

Page 18: Towards Mining Semantic Maturity in Social Bookmarking Systems

Direct Correlation to Target Variables

No significant correlation of any individual indicator with the maturity targets

Possibly higher correlation through a combination of indicators → consider subgroups

       Bet   Clos  Deg   Rfreq  Ufreq
MAT    0.09  0.14  0.12  0.15   0.12
SYN    0.12  0.14  0.13  0.15   0.15
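How such a direct correlation could be computed, assuming the Pearson coefficient (the slide does not name the coefficient used). The input is the Deg and TARGET columns of the small example table, not the Delicious data, so the result is not comparable with the values reported above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Deg and TARGET columns of the example table (java ... python).
deg = [0.2, 0.01, 0.1, 0.4, 0.6, 0.2, 0.02]
target = [0.8, 1.0, 1.0, 0.5, 0.6, 0.3, 0.7]

r_deg = pearson(deg, target)
```

A single coefficient per indicator summarizes only a global linear trend, which is why a weak value here does not rule out strong local patterns in subgroups.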

Page 19: Towards Mining Semantic Maturity in Social Bookmarking Systems

Results: Exemplary Patterns (1)

Target: Synonym Identification (SYN); mean = 0.13

Small groups with highest maturity (measured by increase of synonym discovery rate)

Larger group: degree centrality + user frequency

Synonym discovery rate 128 % higher than for all tags
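As a worked reading of the last figure, assuming "128 % higher" is a relative increase over the population mean of 0.13: the implied synonym discovery rate inside the subgroup is computed below. This value is derived here and is not reported on the slide.

```python
population_rate = 0.13    # mean of the SYN target over all tags (from the slide)
relative_increase = 1.28  # "128 % higher" (from the slide)

# Implied share of tags in the subgroup whose most related tag is a synonym,
# under the assumption stated above.
implied_subgroup_rate = population_rate * (1 + relative_increase)
```

Under that assumption, roughly three in ten tags of the larger group have a WordNet synonym as their most related tag, compared with about one in eight overall.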

Page 20: Towards Mining Semantic Maturity in Social Bookmarking Systems

Results: Exemplary Patterns (2)


Target: WordNet Maturity (MAT); mean = 0.59

Page 21: Towards Mining Semantic Maturity in Social Bookmarking Systems


Discussion & Implications

In general: centrality and frequency useful to assess maturity

Combined evidence of indicators leads to higher-quality patterns

Subgroup discovery generally useful technique

Open issues:

Further maturity indicators?

Alternative notions of maturity?

Temporal aspects

Mining of "immaturity"

…

Page 22: Towards Mining Semantic Maturity in Social Bookmarking Systems


The Story

Co-occurrence fingerprints capture tag semantics

Frequency / centrality properties as maturity indicators

Discover indicator subgroups which maximize the maturity target

[Figure: tag property table (Tag, Ufreq, Rfreq, Deg, Bet, Clos, TARGET) as on the "Finding Maturity Patterns" slide]

Combined evidence of indicators leads to higher-quality patterns

Thanks!

[email protected]

kassel.de
