Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis @ ISWC 2008
Paper presented at the International Semantic Web Conference (ISWC) 2008.
TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721
Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis
Martin Szomszor, Harith Alani, Kieron O’Hara, Nigel Shadbolt
University of Southampton
Iván Cantador, Universidad Autónoma de Madrid
Outline
• Introduction and Motivation
  – Why is your folksonomy interaction useful?
  – How could it be exploited?
• Architecture
  – Matching user accounts
  – Collecting data
  – Tag filtering
  – Profile building
• Experiment and Evaluation
• Conclusions and Future Work
Introduction
[Figure: example online activity spread across sites: delicious.com bookmarks (http://slashdot.org/, http://news.bbc.co.uk/) and music artists (Dream Theater, Metallica, Rush)]
Increasing number of online identities
• A recent Ofcom study found that UK adults have, on average, 1.6 profiles; 39% of those who have a profile have at least two
• Many predict that in the near future individuals will have more than 10 profiles
  – [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
Profile of Interests
The Big Picture
[Figure: delicious.com and other accounts combined into a single Profile of Interests]
• Profiles could be exported to other sites to improve recommendation quality
• Personalisation: profiles could be used to support personalised searching
• Better user experience
Consolidation and Integration
[Figure: tags from two accounts (currency, travel, hotels, cuba; cuba, holiday, 2008) linked to the DBpedia resources http://dbpedia.org/resource/Cuba, http://dbpedia.org/resource/Travel, http://dbpedia.org/resource/Holiday and http://dbpedia.org/resource/Category:Tourism]
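The consolidation idea on this slide can be sketched as follows: tags from each account are mapped to DBpedia resource URIs, and the union of those URIs forms one merged set of concepts. The `TAG_TO_URI` table below is a hypothetical, hand-written stand-in for the real tag-to-Wikipedia matching step; only the URIs themselves come from the slide, and which tag maps to which URI is an assumption.

```python
# Hypothetical lookup table standing in for the tag-to-Wikipedia matching step.
TAG_TO_URI = {
    "cuba": "http://dbpedia.org/resource/Cuba",
    "travel": "http://dbpedia.org/resource/Travel",
    "holiday": "http://dbpedia.org/resource/Holiday",
}


def tags_to_concepts(tags):
    """Map each tag to a DBpedia URI, skipping tags with no known match."""
    return {TAG_TO_URI[t] for t in tags if t in TAG_TO_URI}


def consolidate(*tag_lists):
    """Union the concept sets derived from several accounts' tags."""
    merged = set()
    for tags in tag_lists:
        merged |= tags_to_concepts(tags)
    return merged


account_a = ["currency", "travel", "hotels", "cuba"]
account_b = ["cuba", "holiday", "2008"]
concepts = consolidate(account_a, account_b)
for uri in sorted(concepts):
    print(uri)
```

Both accounts use the tag cuba, but once it resolves to a shared URI it is counted as a single concept in the merged profile.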
User Tagging
[Figure: example delicious.com tagging]
Tag Clouds
Tagging Variation
[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
[Figure: raw tags compared with filtered tags]
Architecture for Building Profiles of Interests
Account Correlation
• Using Google’s Social Graph API
[Figure: a delicious.com account linked via its account homepage, http://users.ecs.soton.ac.uk/mns2]
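The correlation idea can be illustrated without the Social Graph API itself: accounts on different sites that declare the same homepage URL are treated as belonging to the same person. This is a minimal sketch under assumed input, where each record is a hypothetical `(site, username, homepage)` tuple; the usernames are invented and the URL normalisation rules are an assumption, not the method used in the paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise_url(url):
    """Lower-case scheme and host, drop any query string or fragment,
    and strip a trailing slash so trivially different URLs compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def correlate_accounts(accounts):
    """Group (site, username, homepage) records by normalised homepage URL,
    keeping only homepages claimed by accounts on more than one site."""
    by_homepage = defaultdict(set)
    for site, user, homepage in accounts:
        if homepage:
            by_homepage[normalise_url(homepage)].add((site, user))
    return {
        url: owners
        for url, owners in by_homepage.items()
        if len({site for site, _ in owners}) > 1
    }


# Hypothetical account records.
records = [
    ("delicious", "user_a", "http://users.ecs.soton.ac.uk/mns2/"),
    ("flickr", "user_b", "http://users.ecs.soton.ac.uk/mns2"),
    ("flickr", "user_c", "http://example.org/other"),
]
links = correlate_accounts(records)
print(links)
```

Only the homepage shared by a delicious and a Flickr account survives; a homepage claimed on a single site yields no cross-site link.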
Data Collection
• Delicious
  – Custom Python scripts
• Flickr
  – Using public API
• Only public information is harvested
Tag Filtering Process
• Three-stage process:
  1. Identify Wikipedia page
    • London is matched with http://en.wikipedia.org/wiki/London
  2. Extract category list
    • Host cities of the Summer Olympic Games | Host cities of the Commonwealth Games | London | 1st century establishments | British capitals | Capitals in Europe | Port cities and towns in the United Kingdom
  3. Select representative categories
    • Only choose categories that match the tag string
    • This excludes spurious categories such as:
      – Host cities of the Summer Olympic Games
      – Needs more sources
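Stage 3 can be sketched as a simple string filter over the category list from stage 2. The slides say only that categories must "match the tag string"; reading that as a case-insensitive substring test is an assumption made here for illustration.

```python
def select_categories(tag, categories):
    """Keep only the categories whose name contains the tag string
    (case-insensitive substring match, an assumed reading of 'match')."""
    needle = tag.lower()
    return [c for c in categories if needle in c.lower()]


# Category list for the London page, plus the 'Needs more sources' example.
london_categories = [
    "Host cities of the Summer Olympic Games",
    "Host cities of the Commonwealth Games",
    "London",
    "1st century establishments",
    "British capitals",
    "Capitals in Europe",
    "Port cities and towns in the United Kingdom",
    "Needs more sources",
]
print(select_categories("London", london_categories))  # ['London']
```

A plain substring test is deliberately strict: it drops maintenance categories such as "Needs more sources", but it also drops related categories such as "British capitals" that never mention the tag.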
Creating User Profiles
Profile of Interest
Experiment Setup
• Bootstrapped using 667,141 delicious profiles obtained in previous work
• Only accounts with a matching Flickr profile and > 50 distinct tags were added
• Final list contains 1,392 users
               Delicious    Flickr
Total Posts    1,134,527    2,215,913
Distinct Tags  138,028      307,182
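The selection rule from the setup can be sketched as a filter over two dictionaries that map a person identifier (assumed to come from the account-correlation step) to that person's tag uses. The slides do not say whether the "> 50 distinct tags" threshold applies to one account or both; this sketch assumes both, and the people and tags in the example are hypothetical.

```python
def select_users(delicious_tags, flickr_tags, min_distinct=50):
    """Return people who have both accounts and more than `min_distinct`
    distinct tags on each (assumed reading of the selection rule).

    Both arguments map a person identifier to the list of tags used.
    """
    selected = []
    for person, d_tags in delicious_tags.items():
        f_tags = flickr_tags.get(person)
        if f_tags is None:
            continue  # no matching Flickr profile
        if len(set(d_tags)) > min_distinct and len(set(f_tags)) > min_distinct:
            selected.append(person)
    return selected


many = [f"tag{i}" for i in range(60)]  # 60 distinct tags
few = [f"tag{i}" for i in range(10)]   # 10 distinct tags
delicious_tags = {"alice": many, "bob": many, "carol": many}
flickr_tags = {"alice": many, "carol": few}
print(select_users(delicious_tags, flickr_tags))  # ['alice']
```

Here bob is dropped for lacking a Flickr profile and carol for having too few distinct Flickr tags.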
Evaluation
• Four evaluation procedures:
  – The performance of the tag filtering and matching to Wikipedia entries
  – The difference between the most common categories found in delicious and Flickr
  – The amount learnt from merging profiles from the two folksonomies
  – The accuracy of matching tags to Wikipedia categories
Tag Filtering and Matching
Global Category View
• What are the differences in the interests that are learnt from each domain?
Delicious                          Flickr
Wikipedia Category   Total Freq    Wikipedia Category   Total Freq
Design               69,215        Travel               51,674
Blogs                68,319        Australia            51,617
Music                45,063        London               46,623
Photography          41,356        Festivals            42,504
Tools                35,795        Music                40,943
Video                34,318        Cats                 38,230
Arts                 29,966        Holidays             37,610
Software             28,746        Family               37,100
Maps                 26,912        Japan                36,513
Teaching             22,120        Concerts             35,374
Games                21,549        Surnames             34,947
How-to               19,533        Washington           33,924
Technology           18,032        Given Names          32,843
News                 17,737        Dogs                 32,206
Humor                15,816        Birthdays            22,290
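A global view like the table above can be produced by summing per-user category frequencies and keeping the most frequent ones. This sketch assumes each user profile is a mapping from Wikipedia category to frequency and that "Total Freq" is the sum over all users; the two small profiles in the example are hypothetical.

```python
from collections import Counter


def top_categories(user_profiles, n=15):
    """Sum per-user category frequencies and return the n most frequent
    as (category, total) pairs, highest first."""
    total = Counter()
    for profile in user_profiles:
        total.update(profile)  # adds counts for each category
    return total.most_common(n)


profiles = [
    {"Design": 3, "Music": 1},
    {"Design": 2, "Blogs": 4},
]
print(top_categories(profiles, n=2))  # [('Design', 5), ('Blogs', 4)]
```

Running this separately over the delicious profiles and the Flickr profiles gives one column pair of the table each.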
Learning More About Users
• How much more can we learn by using multiple profiles?
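One way to quantify "how much more we learn" is to count, for each user, the concepts that the second profile adds to the first and average over users. Which profile counts as the baseline is not stated on the slides; this sketch assumes the delicious profile is the baseline and the Flickr profile adds to it, and the concept sets are hypothetical.

```python
def new_concepts(primary, secondary):
    """Concepts in `secondary` that `primary` alone does not reveal."""
    return set(secondary) - set(primary)


def mean_new_concepts(pairs):
    """Average number of new concepts per user over
    (primary, secondary) profile pairs."""
    if not pairs:
        return 0.0
    return sum(len(new_concepts(p, s)) for p, s in pairs) / len(pairs)


pairs = [
    ({"Design", "Music"}, {"Music", "Travel", "London"}),  # adds 2
    ({"Blogs"}, {"Cats"}),                                 # adds 1
]
print(mean_new_concepts(pairs))  # 1.5
```

Applied to all users, this kind of average is what a figure such as "15 new concepts per user" summarises.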
Category Matching
• How good is the category matching?
• Take 100 random users and choose 1 Delicious tag and 1 Flickr tag from each
• Classify each tag into one of 3 classes:
  – Correct
  – Unresolved (not matched to any category)
  – Ambiguous (disambiguation required)
           Correct  Unresolved  Ambiguous
Delicious  66%      20%         14%
Flickr     63%      25%         12%
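The per-site rates above, and the pooled rates over both sites, can be computed from the manual labels as follows. The label lists here are rebuilt from the reported percentages, assuming exactly 100 labelled tags per site (one per sampled user); they are not the original annotation data.

```python
from collections import Counter

CLASSES = ("correct", "unresolved", "ambiguous")


def class_rates(labels):
    """Percentage of labelled tags falling in each class."""
    counts = Counter(labels)
    return {c: 100 * counts[c] / len(labels) for c in CLASSES}


# Rebuilt from the reported percentages (100 tags per site assumed).
delicious = ["correct"] * 66 + ["unresolved"] * 20 + ["ambiguous"] * 14
flickr = ["correct"] * 63 + ["unresolved"] * 25 + ["ambiguous"] * 12

print(class_rates(delicious))
print(class_rates(delicious + flickr))
```

Pooling the 200 tags gives 45/200 = 22.5% unresolved and 26/200 = 13% ambiguous, which is where the figures quoted under Future Work come from.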
Conclusions
• We have proposed a novel method for creating Profiles of Interest by exploiting an individual’s tagging activities across two popular folksonomy sites
• Frequently used tags often specify areas of interest, but not always!
  – Common delicious tags include daily, toread and howto
  – Flickr tags often include names of people
• Expanding the analysis across folksonomies increases the amount learnt
  – On average, 15 new concepts per user
Future Work
• Improve page matching
  – 22.5% of sample tags were unresolved
• Handle disambiguation
  – 13% of sample tags refer to ambiguous terms
• Co-occurrence networks
• Category hierarchy
• Increase network coverage
  – Already have the data to include Last.fm
• Understand which tags actually specify an interest of the individual
  – Filter out categories such as ‘Surname’