Towards Mining Semantic Maturity in Social Bookmarking Systems
Martin Atzmueller¹, Dominik Benz¹, Andreas Hotho², Gerd Stumme¹
¹ Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
² Data Mining and Information Retrieval Group, University of Würzburg, Germany
Let it grow!
Evidence for emergent semantics within social applications
Meaning of tags can be captured at different stages
Can we find indicators of "semantic maturity"?
2011/10/23 Atzmueller et al: Towards Mining Semantic Maturity, SDOW 2011 3 / 18
The Story
Social Bookmarking & Emergent Semantics
Maturity Indicators
Mining Maturity Profiles
Evaluation
Social Tagging
Social tagging: Simple and intuitive way to organize all kinds of resources
Uncontrolled vocabulary: tags are "just strings"
Formal model: folksonomy F = (U, T, R, Y)
Users U, tags T, resources R
Tag assignments Y ⊆ U × T × R
[Figure: example folksonomy with users Alice and Bob, resources iswc.org and bonn.de, and tags semantics, conference, travel]
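The formal model can be written down directly as a set of triples. The following is a minimal sketch, not the authors' implementation: the individual tag assignments are hypothetical, since the slide names the users, resources, and tags but not which user assigned which tag to which resource.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: a user assigned a tag to a resource."""
    user: str
    tag: str
    resource: str


# Hypothetical assignments, for illustration only.
Y = {
    TagAssignment("Alice", "semantics", "iswc.org"),
    TagAssignment("Alice", "conference", "iswc.org"),
    TagAssignment("Bob", "conference", "iswc.org"),
    TagAssignment("Bob", "travel", "bonn.de"),
}

# Assumption: every user, tag, and resource occurs in at least one
# assignment, so U, T, and R can be read off as projections of Y.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}

folksonomy = (U, T, R, Y)
```

Storing Y as a set of named triples keeps the ternary relation explicit, which is what the later indicators are derived from.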
Capturing Tag Semantics
Co-occurrence distribution as "semantic fingerprint"
Capture Semantic Relatedness / Synonyms by Cosine Similarity
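A small sketch of how such fingerprints might be compared. It assumes the resource context (each tag is described by the resources it was assigned to), which a later slide mentions for finding the most related tag; the function names and the sample triples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(assignments):
    """Map each tag to a Counter over the resources it was assigned to."""
    vectors = defaultdict(Counter)
    for _user, tag, resource in assignments:
        vectors[tag][resource] += 1
    return vectors


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(v[key] * w[key] for key in v.keys() & w.keys())
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0
    return dot / (norm_v * norm_w)


# Hypothetical (user, tag, resource) triples.
assignments = [
    ("u1", "game", "r1"),
    ("u2", "game", "r2"),
    ("u1", "games", "r1"),
    ("u3", "games", "r2"),
    ("u1", "travel", "r3"),
]
vectors = context_vectors(assignments)
sim_game_games = cosine(vectors["game"], vectors["games"])
sim_game_travel = cosine(vectors["game"], vectors["travel"])
```

Here "game" and "games" share exactly the same resources, so their similarity is close to 1, while "game" and "travel" share none and score 0.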
with Cattuto et al: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems (ISWC 2008)
Semantic Grounding
Compute folksonomy-based relatedness (via context vectors)
Sim( , ) = 0.74
WordNet Synset Taxonomy
map
Grounded similarity: SimTrue( , ) = 0.59 (we used Jiang-Conrath distance)
Appendix: Music Genre Taxonomy learned from last.fm
The Story
Co-occurrence fingerprints capture tag semantics
Maturity Indicators
Mining Maturity Profiles
Evaluation
Maturity Indicators (1)
Frequency
Intuition: "the more often used, the more mature"
Resource frequency
User frequency
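The two frequency indicators can be sketched as simple counts over the tag assignments. This assumes that user frequency means the number of distinct users of a tag and resource frequency the number of distinct resources it was assigned to; the slides do not state the exact definition or any normalization, so raw counts are returned.

```python
from collections import defaultdict


def frequency_indicators(assignments):
    """Count distinct users and distinct resources per tag."""
    users = defaultdict(set)
    resources = defaultdict(set)
    for user, tag, resource in assignments:
        users[tag].add(user)
        resources[tag].add(resource)
    return {
        tag: {"ufreq": len(users[tag]), "rfreq": len(resources[tag])}
        for tag in users
    }


# Hypothetical (user, tag, resource) triples.
sample = [
    ("Alice", "conference", "iswc.org"),
    ("Bob", "conference", "iswc.org"),
    ("Bob", "travel", "bonn.de"),
]
freq = frequency_indicators(sample)
```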
Maturity Indicators (2)
Centrality: "importance" within co-occurrence network G = (V, E)
Intuition: the more important, the more mature
Degree, closeness, betweenness
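A sketch of two of the three centrality indicators on a tag co-occurrence graph, using only the standard library. Assumptions: two tags are linked when one user assigned both to the same resource, edges are unweighted, and closeness is computed within the reachable component. Betweenness is left out for brevity; a graph library such as NetworkX provides it.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_graph(assignments):
    """Link two tags if some user assigned both to the same resource."""
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    graph = defaultdict(set)
    for tags in posts.values():
        for tag in tags:
            graph[tag]  # touch so tags without neighbours still appear
        for a, b in combinations(sorted(tags), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes a node is directly connected to."""
    n = len(graph)
    return {t: (len(nb) / (n - 1) if n > 1 else 0.0) for t, nb in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical (user, tag, resource) triples.
sample = [
    ("u1", "web", "r1"),
    ("u1", "semantic", "r1"),
    ("u2", "web", "r2"),
    ("u2", "python", "r2"),
]
graph = cooccurrence_graph(sample)
deg = degree_centrality(graph)
clo = closeness_centrality(graph)
```

In this toy graph "web" is linked to both other tags and therefore gets the highest degree and closeness.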
The Story
Co-occurrence fingerprints capture tag semantics
Frequency / centrality properties as maturity indicators
Mining Maturity Profiles
Evaluation
Pattern Mining using Subgroup Discovery
In a nutshell: "Find descriptions of subsets in the data that differ significantly from the total population with respect to a target concept."
Pattern: conjunctive description using tag properties
Representation as indicator rule:
description → target (with probability)
Finding Maturity Patterns
Tag       Ufreq  Rfreq  Deg   Bet   Clos  TARGET
java      0.13   0.9    0.2   0.1   0.4   0.8
game      0.2    0.3    0.01  0.04  0.02  1.0
games     0.4    0.1    0.1   0.12  0.2   1.0
semantic  0.1    0.2    0.4   0.3   0.35  0.5
web       0.8    0.7    0.6   0.7   0.43  0.6
web2.0    0.4    0.4    0.2   0.25  0.3   0.3
python    0.3    0.2    0.02  0.1   0.05  0.7
Which patterns maximize the target variable?
Mining Maturity Profiles – Target Variables
First: compute the most related tag t_sim for each tag t (resource context)
WordNet Synonym Identification (SYN)
Binary target variable
True if t_sim is a synonym of t, false otherwise
Grounded WordNet Maturity (MAT)
Binary target variable
Based on taxonomic shortest path length
True if sim(t_sim, t) > 0.5, false otherwise
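The two target variables can be sketched as follows. All inputs here are hypothetical stand-ins: `similarity` plays the role of the folksonomy-based cosine similarity, `synonyms` stands in for a WordNet synonym lookup, and `grounded_sim` for the WordNet-based similarity; none of them reproduce the authors' actual setup.

```python
def most_related(tag, similarity, tags):
    """Return t_sim, the tag most similar to `tag` (excluding itself)."""
    candidates = [t for t in tags if t != tag]
    return max(candidates, key=lambda t: similarity(tag, t))


def syn_target(tag, t_sim, synonyms):
    """SYN: true if t_sim is listed as a synonym of tag."""
    return t_sim in synonyms.get(tag, set())


def mat_target(tag, t_sim, grounded_sim, threshold=0.5):
    """MAT: true if the grounded similarity exceeds the threshold."""
    return grounded_sim(t_sim, tag) > threshold


# Hypothetical similarity values and synonym sets, for illustration only.
pair_sims = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "web")): 0.1,
    frozenset(("automobile", "web")): 0.2,
}
tags = ["car", "automobile", "web"]


def similarity(a, b):
    return pair_sims[frozenset((a, b))]


synonyms = {"car": {"automobile"}, "automobile": {"car"}}

t_sim = most_related("car", similarity, tags)
syn = syn_target("car", t_sim, synonyms)
mat = mat_target("car", t_sim, grounded_sim=lambda a, b: 0.8)
```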
Pattern Mining - Algorithm
Patterns similar to association rules
BUT: fixed target concept (of interest), i.e., high maturity
Pattern mining: k-best approach
Search through the space of descriptions (conjunctions of features)
Maximizing a quality function, e.g., increase in target mean/share
Several efficient algorithms exist; we apply an exhaustive one.
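The k-best search can be illustrated with a deliberately naive exhaustive enumeration over the toy table from the "Finding Maturity Patterns" slide. This is a sketch under assumptions that are not in the slides: indicators are discretized into "low"/"high" at a threshold of 0.3, the quality function is the plain gain in target mean, and subgroups need at least two tags.

```python
from itertools import combinations

FEATURES = ["Ufreq", "Rfreq", "Deg", "Bet", "Clos"]

# Rows from the toy table: tag -> (indicator values, target).
ROWS = {
    "java": ([0.13, 0.9, 0.2, 0.1, 0.4], 0.8),
    "game": ([0.2, 0.3, 0.01, 0.04, 0.02], 1.0),
    "games": ([0.4, 0.1, 0.1, 0.12, 0.2], 1.0),
    "semantic": ([0.1, 0.2, 0.4, 0.3, 0.35], 0.5),
    "web": ([0.8, 0.7, 0.6, 0.7, 0.43], 0.6),
    "web2.0": ([0.4, 0.4, 0.2, 0.25, 0.3], 0.3),
    "python": ([0.3, 0.2, 0.02, 0.1, 0.05], 0.7),
}


def discretize(value, threshold=0.3):
    """Assumed binning; the slides do not say how indicators are discretized."""
    return "high" if value >= threshold else "low"


def k_best_subgroups(rows, k=3, max_len=2, min_size=2):
    """Enumerate all conjunctions up to max_len and keep the k best by mean gain."""
    records = [
        ({f: discretize(v) for f, v in zip(FEATURES, values)}, target)
        for values, target in rows.values()
    ]
    population_mean = sum(t for _, t in records) / len(records)
    selectors = sorted({(f, rec[f]) for rec, _ in records for f in FEATURES})
    results = []
    for length in range(1, max_len + 1):
        for description in combinations(selectors, length):
            if len({f for f, _ in description}) < length:
                continue  # a feature cannot be both low and high
            covered = [
                t for rec, t in records
                if all(rec[f] == v for f, v in description)
            ]
            if len(covered) < min_size:
                continue
            gain = round(sum(covered) / len(covered) - population_mean, 4)
            results.append((gain, len(covered), description))
    # Prefer higher gain, then larger subgroups, then shorter descriptions.
    results.sort(key=lambda r: (-r[0], -r[1], len(r[2]), r[2]))
    return results[:k]


top = k_best_subgroups(ROWS)
```

Each result is a triple of gain, subgroup size, and description. Exhaustive enumeration is only feasible for small description spaces like this one; the tie-breaking order is a design choice of the sketch, not something the slides prescribe.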
The Story
Co-occurrence fingerprints capture tag semantics
Frequency / centrality properties as maturity indicators
Discover indicator subgroups which maximize maturity target
Evaluation
Social Bookmarking Data
Folksonomy crawled from Delicious in 2006, restricted to top 10,000 tags
476,378 users
10,000 tags
12,660,470 resources
101,491,722 tag assignments
Preprocessing & filtering:
Filter tags without a sufficiently similar partner (cos < 0.05)
Limit to tags with only one sense in WordNet
Number of tags finally considered: 1944
Direct Correlation to Target Variables
No significant correlation of any individual indicator with the maturity targets
Possibly higher correlation by combining indicators → consider subgroups
      Bet   Clos  Deg   Rfreq  Ufreq
MAT   0.09  0.14  0.12  0.15   0.12
SYN   0.12  0.14  0.13  0.15   0.15
Results: Exemplary Patterns (1)
Target: Synonym Identification (SYN); mean = 0.13
Small groups with highest maturity (measured by increase of synonym discovery rate)
Larger group: degree centrality + user frequency
Synonym discovery rate 128% higher than for all tags
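As a worked reading of the 128% figure: assuming it is a relative increase over the overall SYN mean of 0.13, the implied synonym discovery rate inside that subgroup is roughly 0.30. The helper names below are hypothetical.

```python
def implied_subgroup_share(population_share, relative_increase):
    """Share inside the subgroup, given the relative increase over the population."""
    return population_share * (1 + relative_increase)


def relative_increase(subgroup_share, population_share):
    """Relative gain of the subgroup share over the population share."""
    return (subgroup_share - population_share) / population_share


share = implied_subgroup_share(0.13, 1.28)  # about 0.296
check = relative_increase(share, 0.13)      # recovers about 1.28
```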
Results: Exemplary Patterns (2)
Target: WordNet Maturity (MAT); mean = 0.59
Discussion & Implications
In general: centrality and frequency useful to assess maturity
Combined evidence of indicators leads to higher-quality patterns
Subgroup discovery is a generally useful technique
Open issues:
Further maturity indicators?
Alternative notions of maturity?
Temporal aspects
Mining of "immaturity"
…
The Story
Co-occurrence fingerprints capture tag semantics
Frequency / centrality properties as maturity indicators
Discover indicator subgroups which maximize maturity target
Combined evidence of indicators leads to higher-quality patterns
Thanks!
kassel.de